EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J
JUL 13, 2021
By Anthony Alford, Development Group Manager at Genesys Cloud Services
A team of researchers from EleutherAI has open-sourced GPT-J, a six-billion-parameter natural language processing (NLP) AI model based on GPT-3. The model was trained on an 800GB open-source text dataset and has performance comparable to a GPT-3 model of similar size.
Developer Aran Komatsuzaki announced the release on his blog. The model was trained on EleutherAI’s Pile dataset using Google Cloud’s v3-256 TPUs; training took approximately five weeks. On common NLP benchmark tasks, GPT-J achieves an accuracy similar to OpenAI’s published results for their 6.7B parameter version of GPT-3. EleutherAI’s release includes the model code, pre-trained weight files, Colab notebook, and a demo website. According to Komatsuzaki,
“GPT-J is the best-performing publicly available Transformer [language model] in terms of zero-shot performance on various [down-stream] tasks.”
OpenAI first published a paper on generative pre-trained transformers (GPT), an unsupervised learning model that achieved state-of-the-art results on several NLP tasks, in 2018. In early 2019, OpenAI announced a 1.5B parameter model called GPT-2. OpenAI initially declined to release the largest trained model, citing “concerns about malicious applications of the technology,” but did release the model later that year. In 2020, OpenAI announced a 175B parameter model, GPT-3, but again did not release the trained model files. Instead, OpenAI provided an API that allows developers to integrate the model into their code via web service calls.
EleutherAI, a “decentralized grassroots collective of volunteer researchers,” released their first implementation of a GPT-like system, the 2.7B parameter GPT-Neo model, in March 2021. GPT-Neo was implemented in TensorFlow and trained on TPUs using the parallel library Mesh TensorFlow. The team also began developing GPT-NeoX, a GPU-based implementation that uses Microsoft’s DeepSpeed; although the code is open-sourced, there are currently no model files available.
The latest model, GPT-J, was trained using a new library, Mesh-Transformer-JAX. The library uses Google’s JAX linear algebra framework instead of a dedicated deep-learning framework such as TensorFlow. Komatsuzaki claims that GPT-J provides “more flexible and faster inference than Tensorflow,” and that developing the model took much less time than previous projects. Compared to the 2.7B parameter GPT-Neo model, GPT-J shows a 125% improvement in training efficiency.
In response to concerns about the misuse of its models, EleutherAI co-founder Connor Leahy posted a justification of the release on the organization’s blog. Leahy noted that GPT-like models are “simple and theoretically straight-forward,” making it infeasible to keep the technology out of the hands of bad actors. Instead, EleutherAI’s goal is to enable more widespread safety research, especially for “low-resource” researchers. Leahy also pointed out that many well-funded organizations have already trained even larger models than GPT-3, including Microsoft, NVIDIA, and Google.
In a Twitter discussion about the release, a user asked about the hardware requirements for running the model. Komatsuzaki replied:
“For inference, in principle you can modify the code to run it on any hardware that can hold a bit more than 12GB of memory. Best throughput can be achieved with TPUs, in which case you can just run as is. Fine-tuning is more demanding: you need at least TPU v3-8 to do that.”
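The 12GB figure is consistent with simple arithmetic on the parameter count. The sketch below is a back-of-the-envelope estimate, not taken from the GPT-J code: it assumes the weights are stored at 16 bits (2 bytes) per parameter, which the quote does not state, and it uses the rounded "six billion" figure. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Rounded parameter count from the article; the exact count is not given here.
NUM_PARAMS = 6e9

# Assumption: 16-bit storage (2 bytes per parameter) versus 32-bit (4 bytes).
for label, width in [("32-bit", 4), ("16-bit (assumed)", 2)]:
    print(f"{label}: {weight_memory_gb(NUM_PARAMS, width):.1f} GB")
```

Under that assumption the weights alone occupy about 12GB, and activations and other working buffers would account for the "bit more" in the reply.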