Large Language Models

Don’t expect new large language models like the next GPT-3 to be democratized.

In early May, Meta released Open Pretrained Transformer (OPT-175B), a large language model (LLM) capable of performing a wide variety of tasks. Large language models have become one of the most popular fields of artificial intelligence research in recent years. OpenAI's GPT-3, a deep neural network with 175 billion parameters, sparked the LLM arms race, and OPT-175B is the newest contender. GPT-3 demonstrated that LLMs can accomplish many tasks with minimal instruction, given only a few examples or none at all (few- or zero-shot learning). GPT-3 was later integrated into several Microsoft products, demonstrating the economic as well as the scientific potential of LLMs.
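As a purely illustrative sketch (modeled on the translation prompts used in the GPT-3 paper, not on any specific OPT demo), few-shot prompting simply lists a handful of solved examples as plain text and asks the model to continue the pattern:

```python
# Few-shot prompting, illustrated: the model receives worked examples inside
# the prompt itself and completes the pattern, with no task-specific training.
prompt = """Translate English to French.

English: cheese
French: fromage

English: bread
French: pain

English: water
French:"""

# Sent to an LLM such as GPT-3 or OPT, this prompt typically completes to "eau".
```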

What distinguishes OPT-175B is Meta's commitment to "openness," as the model's name implies. Meta has made the model available to the public (with some caveats) and has released a wealth of information about its training and development process, framing the release as an effort to democratize access to large-scale language models. Meta's decision to become more transparent is admirable. But the race over massive language models has reached a point where it can no longer be democratized.

Looking inside large language models

Meta's OPT-175B release includes several notable components. It contains both the pre-trained models and the code required to train and use them. Pre-trained models are especially valuable for organizations that lack the computational resources to train such a model themselves (training neural networks is far more resource-intensive than running them). Sharing pre-trained weights also helps reduce the vast carbon footprint of the compute required to train large neural networks.

Like GPT-3, OPT comes in a range of sizes, from 125 million to 175 billion parameters (models with more parameters have more learning capacity). As of this writing, all models up to OPT-30B are available for download. The full 175-billion-parameter model will be made available only to selected researchers and institutions who fill out a request form.
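As a rough sketch of what working with the downloadable checkpoints looks like (assuming the Hugging Face transformers library and PyTorch, which host the smaller OPT weights under names such as facebook/opt-125m; this is not Meta's own release tooling), loading and sampling from one of the small models takes only a few lines:

```python
# A minimal sketch, assuming the Hugging Face "transformers" package and the
# publicly downloadable facebook/opt-125m checkpoint; the larger checkpoints
# work the same way but need far more GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Large language models are", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```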

According to Meta, the model is being distributed under a noncommercial license focused on research use cases, in order to maintain its integrity and prevent misuse. Access will be granted to academic researchers; people affiliated with government, civil society, and academic organizations; and industry research laboratories around the world.

In addition to the models, Meta has released a detailed logbook documenting the development and training process. Published papers frequently include little more than the final model. According to Meta, the logbook reveals how much compute was needed to train OPT-175B and the human overhead required when the underlying infrastructure or the training process itself breaks down at scale.

Contrast with GPT-3

Meta argues that limited access to LLMs has hampered progress on efforts to improve their robustness and mitigate known problems such as bias and toxicity. This is a dig at OpenAI (and, by extension, Microsoft), which offers GPT-3 as a black-box API service instead of releasing the model's weights and source code. Among the reasons OpenAI has given for not releasing GPT-3 is the need to control misuse and the development of harmful applications.

Meta believes that by making the models more widely available, the community will be better equipped to evaluate and prevent potential harms. The company says OPT-175B will bring more voices to the frontier of large language model creation, help the community collectively design responsible release strategies, and add an unprecedented level of transparency and openness to the development of LLMs. It is important to note, however, that transparency and openness are not the same as democratizing large language models. The costs of training, configuring, and running large language models remain exorbitant, and they are only going to grow.

In its blog post, Meta says its researchers have managed to significantly reduce the cost of training large language models, and the company claims OPT-175B's carbon footprint is a seventh of GPT-3's. Experts have estimated that training GPT-3 cost up to $27.6 million.

Even at a fraction of GPT-3's cost, that suggests training OPT-175B still cost several million dollars. Fortunately, the pre-trained model removes the need to train it at all, and Meta says it will provide the code for training and deploying the full model "using only 16 NVIDIA V100 GPUs." That is roughly the equivalent of an Nvidia DGX-2 server, which costs over $400,000: a significant sum for a cash-strapped research lab or an independent researcher. (According to the paper that provides further details about OPT-175B, Meta trained the model on 992 80GB A100 GPUs, which are significantly faster than the V100.)
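A rough back-of-envelope estimate (assuming 32 GB V100s and 16-bit weights; this is not a figure from Meta) shows why 16 GPUs is about the minimum needed just to hold the model in memory:

```python
# Back-of-envelope check (assumptions: fp16 weights, 32 GB V100s, and
# ignoring activations, caches, and framework overhead).
params = 175e9                      # OPT-175B parameter count
bytes_per_param = 2                 # 16-bit (fp16) weights
weight_memory_gb = params * bytes_per_param / 1e9   # ~350 GB of weights
gpu_memory_gb = 16 * 32             # 16 V100s x 32 GB each = 512 GB

print(f"weights: ~{weight_memory_gb:.0f} GB, available: {gpu_memory_gb} GB")
```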

The (undemocratic) future of large language models

Language models like OPT and GPT are built on the transformer architecture. Transformers can process long sequences of data (such as text) in parallel and at scale. In recent years, researchers have shown that adding more layers and parameters to transformer models improves their performance on language tasks. Some scientists believe that reaching higher levels of intelligence is simply a matter of scale. As a result, cash-rich research labs such as Meta AI, Alphabet's DeepMind, and Microsoft-backed OpenAI are focused on building ever-larger neural networks.
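To make the scaling claim concrete, here is a minimal sketch (assuming PyTorch; it mirrors the general decoder-style transformer design rather than OPT's exact implementation) of a tiny transformer language model whose parameter count grows with the number of layers and the model width:

```python
# A tiny decoder-style transformer language model: an embedding layer,
# a stack of identical self-attention blocks, and an output projection.
# Parameter count grows roughly with n_layers * d_model^2.
import torch
import torch.nn as nn

class TinyTransformerLM(nn.Module):
    def __init__(self, vocab_size=50000, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        seq_len = token_ids.size(1)
        # Causal mask: each position attends only to earlier positions,
        # but the whole sequence is still processed in parallel.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        x = self.embed(token_ids)
        x = self.blocks(x, mask=mask)
        return self.lm_head(x)

# Doubling depth or width multiplies the parameter count rather than adding to it.
for layers in (4, 8):
    model = TinyTransformerLM(n_layers=layers)
    print(layers, "layers:", sum(p.numel() for p in model.parameters()), "parameters")
```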

Last year, Microsoft and Nvidia built Megatron-Turing NLG (MT-NLG), a language model with 530 billion parameters. Last month, Google introduced the Pathways Language Model (PaLM), an LLM with 540 billion parameters. And there are reports that OpenAI may release GPT-4 in the coming months. Larger neural networks, however, require even more financial and technical resources. And while larger language models will come with new bells and whistles (and new failure modes), they will inevitably concentrate power in the hands of a few wealthy companies by making it even harder for smaller research labs and independent researchers to work on LLMs.