Large language models are used in AI for tasks such as generating content or predicting the next word in a sentence. But access to them has been limited, and the resources needed to build them have mostly resided with Big Tech companies.
A year ago, a consortium called BigScience, made up of more than 1,000 researchers from over 70 countries and at least 250 institutions, began developing an open source, multilingual large language model.
The result is BLOOM (BigScience Large Open-science Open-access Multilingual Language Model), a 176-billion-parameter, general-purpose multilingual AI model that is open source. That is one billion more parameters than OpenAI’s groundbreaking GPT-3, which has 175 billion.
BLOOM can generate text in 46 natural languages and 13 programming languages. For languages such as French, Spanish and Arabic, this is the first time they have been represented in a language model with more than 100 billion parameters.
The model can be downloaded and run on a local machine or in the cloud. For researchers who lack access to the large servers needed to run it, Hugging Face, the AI startup leading BigScience, is working on an inference API for large-scale use that requires no dedicated hardware or engineering. An early version of the API is already available for smaller-scale testing.
Teams from Nvidia’s Megatron, Microsoft’s DeepSpeed and the French National Research Agency came together to build BLOOM. The French research agencies CNRS and GENCI gave the researchers a compute grant worth $3 million to train the model on the Jean Zay supercomputer near Paris.
Researchers can download BLOOM under a Responsible AI License. This license, created by BigScience, places no restrictions on reuse, distribution or commercialization, provided that users commit not to apply the model to restricted use cases.
The restricted use cases include generating false information to harm others, impersonation, automated decision-making that harms an individual’s legal rights, and discrimination based on legally protected characteristics.