Beginner’s Guide to GPT-Neo (with Python Code)


Have you ever thought about writing code that can code for you?! Or generate contextualized text on the subject you want?! Well, the solution to these use cases came from OpenAI, a large-scale organization considered by many to be the world leader in artificial intelligence, when it presented the iconic GPT (Generative Pre-trained Transformer) paper, “Improving Language Understanding by Generative Pre-Training”, in June 2018.

Then, in the following years, OpenAI also introduced GPT-2 and GPT-3.

Generative Pre-trained Transformer, or GPT for short, is a transformer-based model architecture: stacks of transformer decoder blocks placed one after the other, pre-trained on a Wikipedia corpus (wow, seriously? Like everything on Wikipedia?!) as well as Common Crawl datasets (fun fact: Common Crawl holds petabytes of data, more than a decade’s worth of crawls of the internet) for extremely good performance on language-based use cases. “Generative”, as the word suggests, means the model generates text. That text can be poems, articles, essays, or even code!!

According to VentureBeat, a private corpus of 500 billion tokens was used to train the model, at a staggering computational cost of around $50 million USD.

The latest model, GPT-3, has over 175 BILLION parameters! As Hugo Cen put it, and I quote, “It’s the most powerful artificial intelligence tool in the world”, and I’m sure most of us believe that too! However, there is a catch:

GPT-3 is accessible only through a beta API with a waitlist, and to get access you need to submit an application to OpenAI. Crazy, right?

What if you want to leverage the power of GPT-3 but don’t want to bother going through the application process? Enter GPT-Neo, an open-source transformer model that resembles GPT-3 in both design and performance; its larger released version, with 2.7 billion parameters, is roughly equivalent to the smallest GPT-3 model.

Comparing GPT-Neo with GPT-3 Ada (the smallest version of GPT-3), the former performed better than the latter on Hellaswag and Piqa. Hellaswag is a clever multiple-choice sentence-completion benchmark that features one context paragraph and four candidate endings. Piqa measures common-sense reasoning: the machine has to choose which of two sentences makes the most sense. GPT-3 Ada is not the strongest model, though, as mentioned earlier; against its big brother GPT-3 Davinci, which has about 65 times more parameters than GPT-Neo, Davinci beat Neo comfortably. Yeah, nothing too unexpected.

You can train this model from scratch using the Mesh TensorFlow library, a great library for simple and efficient data and model parallelism with distributed support. These models train on tons of data and have lots of parameters, so parallelism is vital here: different segments of the training run execute simultaneously rather than one after the other, independently across different batches. Google Research has provided a simple example as well as an implementation in this notebook. Be sure to read the README to learn how to do this; the code from the notebook is provided below along with the steps.
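Mesh TensorFlow handles this at scale, but the core idea of data parallelism can be sketched in plain NumPy: split a batch across workers, compute each worker’s gradient independently, then average the results. This is an illustrative toy with a made-up linear model, not Mesh TensorFlow’s actual API:

```python
import numpy as np

def gradient(w, x, y):
    # Gradient of mean squared error for a linear model y_hat = x @ w
    return 2 * x.T @ (x @ w - y) / len(x)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))   # one "global" batch of 8 examples
y = rng.normal(size=8)
w = np.zeros(3)

# Serial: one gradient over the whole batch
g_serial = gradient(w, x, y)

# Data-parallel: each "worker" handles an equal shard, results are averaged
shards = zip(np.split(x, 4), np.split(y, 4))
g_parallel = np.mean([gradient(w, xs, ys) for xs, ys in shards], axis=0)

print(np.allclose(g_serial, g_parallel))  # True: the two gradients match
```

Because every shard has the same size, averaging the per-shard gradients recovers exactly the full-batch gradient, which is why data parallelism does not change the result, only the wall-clock time.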

  1. Clone the GPT-Neo GitHub repository using the Setup cell. Make sure you have a TPU runtime; otherwise go to Runtime -> Change runtime type -> TPU.
  2. Configure Google Cloud, since TPUs cannot read from local file systems; the cell below will therefore ask for your authentication credentials. If you don’t have a Google Cloud Platform account, no worries! You can create one for free and get 300 USD of credits, valid for 90 days. Then just follow along with the notebook!

The command below will walk you through setting up gcloud.

 from google.colab import auth
 !gcloud init 

Create a new configuration with a name of your choice and continue with the Google account you used to sign in to GCP. Create a project name, and be sure to follow the instructions carefully: mistakes here cause errors, and you will have to rerun the entire cell.

  3. You’re good to go once you get confirmation that the Google Cloud SDK is set up and ready to use.

  4. Now we need to configure the datasets (the list is in the notebook), tokenize them, and copy them to a bucket (storage for a particular project) in your GCP account.

     # Tokenize Data
 !python data/create_tfrecords.py --input_dir /content/GPTNeo/$dataset_path --name $dataset_name --files_per 1000 --output_dir $out_name --write_dataset_config --processes 1
 # copy the data to your bucket
 if not path_to_cloud_bucket.endswith('/'):
        path_to_cloud_bucket += '/'
 copy_loc = path_to_cloud_bucket + "datasets/" + dataset
 !gsutil -m cp -r /content/GPTNeo/$out_name $copy_loc
 !gsutil ls $path_to_cloud_bucket 
  5. Before you start training, the dataset and model configurations must be edited to point to the bucket you created in GCP. For this, modify the “path” field and replace the name of the given dataset with the dataset you have chosen.
     %%writefile configs/dataset_configs/Sampling_Only.json
   {
     "path": "gs://eleutherai/datasets/Sampling_Only/Sampling_Only*.tfrecords",
     "eval_path": "",
     "n_vocab": 50256,
     "tokenizer_is_pretrained": true,
     "tokenizer_path": "gpt2",
     "eos_id": 50256,
     "padding_id": 50257
   }
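As a sanity check before uploading, note that the dataset config is plain JSON once wrapped in braces, so you can validate it locally. A minimal sketch:

```python
import json

# The dataset config from above, as a complete JSON object
config_text = """
{
  "path": "gs://eleutherai/datasets/Sampling_Only/Sampling_Only*.tfrecords",
  "eval_path": "",
  "n_vocab": 50256,
  "tokenizer_is_pretrained": true,
  "tokenizer_path": "gpt2",
  "eos_id": 50256,
  "padding_id": 50257
}
"""
config = json.loads(config_text)  # raises an error if the JSON is malformed
assert config["path"].startswith("gs://")  # training data must live in a bucket
print(config["tokenizer_path"])  # -> gpt2
```

An empty "eval_path" simply means this config is used for training data only; the evaluation config later in the notebook does the reverse.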
  6. Configure the model settings; for a detailed breakdown, be sure to follow the GitHub README provided by EleutherAI, who created and open-sourced GPT-Neo.
     %%writefile configs/GPT3_XL.json
     {
     "n_head": 16,
     "n_vocab": 50257,
     "embed_dropout": 0,
     "lr": 0.0002,
     "lr_decay": "cosine",
     "warmup_steps": 3000,
     "beta1": 0.9,
     "beta2": 0.95,
     "epsilon": 1e-8,
     "opt_name": "adam",
     "weight_decay": 0,
     "train_batch_size": 256,
     "attn_dropout": 0,
     "train_steps": 600000,
     "eval_steps": 0,
     "predict_steps": 1,
     "res_dropout": 0,
     "eval_batch_size": 4,
     "predict_batch_size": 1,
     "iterations": 100,
     "n_embd": 2048,
     "datasets": [["pile", null, null, null]],
     "model": "GPT",
     "model_path": "gs://eleutherai/GPT3_XL",
     "n_ctx": 2048,
     "n_layer": 24,
     "scale_by_depth": true,
     "scale_by_in": false,
     "attention_types": [[["global", "local"], 12]],
     "mesh_shape": "x:4,y:2",
     "layout": "intermediate_expanded:x,heads:x,vocab:n_vocab,memory_length:y,embd:y",
     "activation_function": "gelu",
     "recompute_grad": true,
     "gradient_clipping": 1.0,
     "tokens_per_mb_per_replica": 2048,
     "precision": "bfloat16"
     }
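A quick back-of-the-envelope check on this config: each transformer layer carries roughly 12·n_embd² weights (attention projections plus the 4x-expanded feed-forward block), and the embedding tables add n_vocab·n_embd and n_ctx·n_embd more. This is an approximation that ignores biases and layer norms, not an exact count:

```python
# Values taken from the config above
n_layer, n_embd, n_vocab, n_ctx = 24, 2048, 50257, 2048

# ~4*d^2 for attention (Q, K, V, output) + ~8*d^2 for the MLP (4x expansion)
per_layer = 12 * n_embd ** 2
embeddings = n_vocab * n_embd + n_ctx * n_embd  # token + position embeddings

total = n_layer * per_layer + embeddings
print(f"{total / 1e9:.2f}B parameters")  # ≈ 1.3B
```

So despite the file name GPT3_XL.json, this config describes the 1.3B-parameter GPT-Neo model, not the 2.7B one.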

7. Finally, we can train the model from scratch using the following command.

!python3 main.py --model colab_XL --steps_per_checkpoint 500 --tpu colab

8. Upload the model to your bucket as shown below.

 # upload to your bucket
 bucket_base = "gs://" + path_to_cloud_bucket.replace('gs://', '').split('/')[0]
 !gsutil -m cp -r $path_to_local_weights $bucket_base 
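The bucket_base line above just strips the path back down to the bucket root, so the weights always land at the top of your bucket. For example (with a hypothetical bucket name):

```python
# Hypothetical bucket path, as you might have entered it earlier in the notebook
path_to_cloud_bucket = "gs://my-neo-bucket/datasets/"

# Same expression as in the upload cell: keep only the bucket root
bucket_base = "gs://" + path_to_cloud_bucket.replace('gs://', '').split('/')[0]
print(bucket_base)  # -> gs://my-neo-bucket
```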

9. If everything worked, you should see your model listed by the command below.

!gsutil ls $bucket_base

10. For evaluation, the notebook uses a WikiText dataset; download it as follows.

 wikitext103_src = ""
 !wget $wikitext103_src

11. This step creates a directory, tokenizes the text as needed, and copies it to the bucket.

 !mkdir wikitext
 !mv /content/GPTNeo/wikitext-103-raw/wiki.test.raw wikitext/wikitext_test.txt
 # Tokenize Data
 !python data/create_tfrecords.py --input_dir wikitext --name wikitext --files_per 1000 --output_dir wikitext_tokenized --write_dataset_config --processes 1 --wikitext-detokenize
 # copy the data to your bucket
 if not path_to_cloud_bucket.endswith('/'):
   path_to_cloud_bucket += '/'
 copy_loc = path_to_cloud_bucket 
 !gsutil -m cp -r wikitext_tokenized $copy_loc
 !gsutil ls $path_to_cloud_bucket 

12. Repeat the Configure Dataset Configuration step.

 %%writefile configs/dataset_configs/wikitext.json
 {
   "path": "",
   "eval_path": "gs://test-bucket-neo/wikitext_tokenized/*.tfrecords",
   "n_vocab": 50256,
   "tokenizer_is_pretrained": true,
   "tokenizer_path": "gpt2",
   "eos_id": 50256,
   "padding_id": 50257
 }

13. Run the model for evaluation on the tokenized text.

!python3 main.py --eval --tpu colab --model $pretrained_model

That was a complete breakdown of all the steps needed to train the GPT-Neo model from scratch; be sure to follow them in order. It requires serious computing power (thanks to the TPU it doesn’t take forever!!) and needs time to run, but it’s an amazing tour through GPT-Neo.

GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. Two sizes of GPT-Neo are provided: 1.3B and 2.7B parameters, to suit different resources. In this article, we will see how to use the GPT-Neo 2.7B model provided by HuggingFace using a few lines of code.

Let’s dig into the code!

Implementing GPT-Neo code

Importing dependencies

To install PyTorch, the easiest way is to go to the official PyTorch site, select your system requirements, and copy-paste the generated command. I’m using a Windows machine with a Google Colab notebook. Select the stable version, which at the time of writing is 1.8.1. Then select your operating system. I prefer the pip package in Google Colab, though conda may be preferred in Jupyter. If you have a GPU, select the matching CUDA version (for example, CUDA 10.2); it helps a lot.

You’ll see the command, and it’s ready to use!

Make sure you have the latest version of PyTorch. It may take a while if you are installing it for the first time, as you may need to uninstall older versions first and then install newer versions. It highly depends on your internet connectivity.

 !pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

By installing transformers we can take advantage of HuggingFace, and the amazing thing about it is the wide variety of pipelines for different tasks. Incredible, isn’t it?!

I highly recommend exploring more of Transformers on HuggingFace.

!pip install transformers

Import the pipeline from transformers; we will be using the text-generation pipeline.

from transformers import pipeline

Instantiating the generator

Download the GPT-Neo model with 2.7 billion parameters, which is quite huge. Again, this will take some time, as the size is around 10 gigabytes, so make sure you have a good internet connection. Alternatively, you can download the smaller GPT-Neo version with only 1.3 billion parameters.

Instantiate the model with a variable name; text-generation is the name of our pipeline, as mentioned earlier.

generator = pipeline('text-generation', model="EleutherAI/gpt-neo-2.7B")

Generating text using the prompt

We need to provide a prompt or topic that we want the text to be generated about.

prompt = "The current stock market"

Output text

Save the output to a variable named ‘res’. The arguments given to the generator created earlier are: the prompt, the maximum length of the generated text, whether to use sampling, and the temperature used to scale the next-token probabilities.

 res = generator(prompt, max_length=50, do_sample=True, temperature=0.9)
 # print the generated text
 print(res[0]['generated_text'])
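The temperature argument rescales the model’s next-token logits before sampling: lower values sharpen the distribution toward the top token, higher values flatten it. A small NumPy illustration with made-up logits (not real model output):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature, then apply a numerically stable softmax
    z = np.asarray(logits) / temperature
    z -= z.max()
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.1]  # made-up next-token scores
for t in (0.5, 1.0, 2.0):
    print(t, np.round(softmax_with_temperature(logits, t), 3))
# Lower temperature concentrates probability on the top token;
# higher temperature spreads it across all tokens.
```

That is why temperature=0.9 in the cell above gives slightly more focused text than the default of 1.0, while still leaving room for variety.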

The output will look like this.

Trying a different prompt, let’s say something like this.

prompt = "import pandas as pd"

Running this will give us something like this.

As you can see, it has already imported the base libraries used; you can imagine what level of contextuality this model has achieved. Incredible isn’t it? !

Save to file

Open a new text file named gpttext.txt to save our output using the write method.

 with open('gpttext.txt', 'w') as f:
     f.write(res[0]['generated_text'])

So that was all about trying out one of the best text-generation models available and leveraging it for different tasks. Try the notebook with different prompts and different arguments. Links are present here as well as in the notebook.

REMARK: Make sure you have enough RAM in Google Colab; otherwise, the runtime will crash after downloading the model. In that case, try the smaller version of GPT-Neo.

The notebook is provided here with all the code you need for reference.

