All About the GPT-3 Hype

Language models in NLP (Natural Language Processing) systems are machine learning models trained to learn and understand natural languages much as humans do. A simple example is a trained language model that predicts the next word in a sentence.

Some of the common language model use cases are:

  • Language Translation
  • Text Classification
  • Sentiment Extraction
  • Reading Comprehension
  • Named Entity Recognition
  • Question Answer Systems
  • News Article Generation, etc

Traditionally there have been two major kinds of language models: statistical and neural network based.

Figure 2 Language Model Techniques

Statistical language models typically predict a word from the probability distribution of that word given the preceding ones, using techniques such as N-grams and Hidden Markov Models.
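As a minimal sketch of the N-gram idea, the snippet below builds a bigram (2-gram) model in plain Python: it counts which word follows which and turns the counts into probabilities. The three-sentence corpus and the function names are made up for illustration only.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for every word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            following[current_word][next_word] += 1
    return following


def predict_next(model, word, top_k=3):
    """Return up to top_k (next_word, probability) pairs, most likely first."""
    counts = model.get(word.lower())
    if not counts:
        return []  # the word never appeared with a successor in training
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common(top_k)]


# Tiny made-up corpus, only for illustration.
corpus = [
    "the market closed higher today",
    "the market closed lower yesterday",
    "the market opened higher today",
]
model = train_bigram_model(corpus)
print(predict_next(model, "market"))
```

Here "closed" follows "market" in two of the three sentences, so it comes back as the most likely next word. A word the model never saw yields an empty list, which hints at why such models struggle outside their training data.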

Neural network based models are a little more sophisticated than statistical ones, as they use neural networks to model the language.

The challenges

The challenges with both of these traditional kinds of model are:

Lack of agility: A huge amount of time and effort is required to collect vast amounts of data, pre-process it, create sequences, encode the sequences, split the data for training and validation, deploy the model and run inference on it.

Figure 3 Stages of Language Model training
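The data preparation stages named above (pre-processing, encoding, creating sequences, splitting for training and validation) can be sketched in plain Python as follows. This is a toy illustration under simplifying assumptions: the function names, the two-word context window and the sample text are all hypothetical, and a real pipeline would add data collection, model training and deployment around it.

```python
import random
import re


def preprocess(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(tokens):
    """Map every distinct token to an integer id (0 is left free for padding)."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)), start=1)}


def make_sequences(token_ids, context_size):
    """Slide a window over the ids: context_size inputs, then one target id."""
    return [
        (token_ids[i:i + context_size], token_ids[i + context_size])
        for i in range(len(token_ids) - context_size)
    ]


def train_validation_split(examples, validation_fraction=0.2, seed=0):
    """Shuffle the examples and hold out a fraction for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


raw_text = "Stocks rose today. Stocks fell yesterday. Bonds rose today."
tokens = preprocess(raw_text)                          # pre-process
vocab = build_vocab(tokens)                            # build the encoding
token_ids = [vocab[t] for t in tokens]                 # encode
examples = make_sequences(token_ids, context_size=2)   # create sequences
train, validation = train_validation_split(examples)   # split
print(len(tokens), len(vocab), len(train), len(validation))
```

Even in this toy form, every stage is a separate piece of code that has to be written, tested and rerun whenever the data changes, which is the agility problem described above.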

Domain specific: Models trained on data from one domain cannot reliably predict data from another domain. For example, cloud-based Q&A and chatbot applications are trained to answer questions from a pre-defined document collection; if you rephrase the question, they might give you a different answer. Likewise, if you train a language model on the Reuters financial news feed, its next-word predictions reflect that domain, as follows

Figure 4 Model trained against finance news feeds data

Transfer Learning

Transfer learning is a machine learning technique that stores the knowledge gained while solving one problem and applies it to a different but related problem.

In 2018 the concept of pre-trained transformer models became popular after Google's BERT (Bidirectional Encoder Representations from Transformers) paper. BERT is an instance of the transfer learning technique described above: the model is pre-trained on a vast amount of data, and what it has learned is then transferred to related tasks.

Figure 5 Evolution of Language Models

Google's largest BERT model has 340 million parameters and was pre-trained on English Wikipedia and the BooksCorpus collection of books. Applied to a simple Q&A task, its accuracy was by far the best at that time. Facebook and Microsoft also created BERT-based models, RoBERTa and CodeBERT (natural language to programming language tasks), respectively. Following the trend that larger language models lead to better results, Microsoft Project Turing introduced Turing Natural Language Generation (T-NLG), a 17-billion-parameter model that was the largest ever trained when it was announced in early 2020. NLP tasks such as writing news articles and generating code became a lot simpler with these transformer models, sparing NLP engineers much of the processing headache.

Around June 2020, OpenAI released a beta version of its GPT-3 model. GPT-3 has 175 billion parameters and was trained on a very large body of text drawn mostly from the public web, which made it the largest and most sophisticated language model built up to that point. As a general trend, the more parameters a model has, the better its predictions tend to be, provided it is trained on enough data.
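To give a feel for where a figure like 175 billion comes from, here is a back-of-the-envelope estimate. A parameter is a learned weight inside the network, not a piece of training data. The sketch assumes the configuration reported in the GPT-3 paper (96 layers, a hidden size of 12,288, a vocabulary of 50,257 tokens) and a common rule of thumb of roughly 12 × d² weights per transformer layer; it ignores biases and other small terms, so it is an approximation and not OpenAI's own accounting.

```python
def approx_transformer_parameters(n_layers, d_model, vocab_size):
    """Rough weight count for a decoder-only transformer (approximation)."""
    # Rule of thumb: about 4*d^2 attention weights plus 8*d^2 feed-forward
    # weights per layer; biases and layer norms are ignored.
    per_layer = 12 * d_model ** 2
    embeddings = vocab_size * d_model  # token embedding matrix
    return n_layers * per_layer + embeddings


# Configuration values as reported in the GPT-3 paper.
estimate = approx_transformer_parameters(
    n_layers=96, d_model=12288, vocab_size=50257
)
print(f"{estimate / 1e9:.1f} billion parameters (approximate)")
```

The estimate lands just under 175 billion, which shows that the headline number is essentially determined by how deep and how wide the network is.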

OpenAI: GPT-3 (Generative Pre-trained Transformer, third version) was released by OpenAI, a San Francisco based AI research company founded in 2015 by Sam Altman, Elon Musk and others. Microsoft invested $1bn in 2019 and became OpenAI's exclusive cloud provider; the GPT-3 models were trained on Microsoft's AI supercomputer.

Figure 6 Stats about the data used to train GPT-3

Working with GPT-3

Before we start working with the GPT-3 API, let's first understand some key concepts.

Prompt: The text input given to the API

Completion: The text that the API generates in response to the input prompt

Token: A piece of text (a word or part of a word) produced by chopping a sentence into pieces; the length of prompts and completions is measured in tokens
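To make the idea of tokens concrete, the sketch below chops a prompt into pieces and estimates its token count. Both helpers are illustrative assumptions: GPT-3 actually uses a byte-pair-encoding tokenizer whose pieces often differ from whole words, and the "about four characters of English per token" figure is only a rule of thumb.

```python
import math
import re


def rough_tokens(text):
    """Split text into word and punctuation pieces (NOT GPT-3's real tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def estimate_token_count(text, chars_per_token=4):
    """Rule-of-thumb estimate: about four characters of English per token."""
    return math.ceil(len(text) / chars_per_token)


prompt = "Write an email to book a hotel room."
print(rough_tokens(prompt))
print(estimate_token_count(prompt))
```

Because usage limits apply to the tokens in the prompt and the completion together, a quick estimate like this is handy when deciding how many examples to pack into a prompt.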

Let’s look at how powerful the model is at some NLP tasks

Example 1: SQL generator. The prompt is a simple English sentence asking for the total number of employees in the HR department.

Figure 7 GPT-3 model converting  Natural language to SQL

With a couple of examples to prime the model, GPT-3 is able to produce the SQL statements accurately.
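Priming the model with a couple of examples simply means placing worked examples in the prompt ahead of the new request. The sketch below only assembles such a prompt as a string; it does not call the API. The "English:" and "SQL:" labels, the table and column names, and the function name are hypothetical choices for illustration, not the exact prompt used in the figure above.

```python
def build_few_shot_prompt(examples, question):
    """Join worked examples and the new question into one prompt string."""
    lines = ["Translate the English request into a SQL query.", ""]
    for english, sql in examples:
        lines.append(f"English: {english}")
        lines.append(f"SQL: {sql}")
        lines.append("")
    lines.append(f"English: {question}")
    lines.append("SQL:")  # the model is expected to continue from here
    return "\n".join(lines)


# Hypothetical table and column names, used only to prime the model.
examples = [
    ("List all employees.", "SELECT * FROM employees;"),
    ("Count the employees in the Sales department.",
     "SELECT COUNT(*) FROM employees WHERE department = 'Sales';"),
]
prompt = build_few_shot_prompt(
    examples, "Count the total employees in the HR department."
)
print(prompt)
```

The resulting string is what would be sent as the prompt. Ending it with an open "SQL:" line nudges the model to answer with a query in the same style as the examples.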

Example 2: Email message generator. I prompted the model to generate an email message for a typical hotel booking.

Figure 8 GPT-3 model writing an email message on my behalf 🙂

Sample code for both the above examples can be found here

There are a lot of examples provided by OpenAI in their playground.

Playground – OpenAI API

Though the model performs impressively, researchers fear it poses a serious disinformation threat: bad actors could use it to create an endless amount of fake news and spread misinformation. Here is the tweet by Sam Altman, the CEO of OpenAI

Based on my years of experience working with Financial Services Industry (FSI) customers, I believe there are some valuable use cases for which the GPT-3 model is well suited.

FSI Use Cases

  • Automated Named Entity Extraction 
  • Sales trader- Client Meeting notes summarization
  • Financial statement summarization
  • Financial sentiment analyzer
  • Domain specific speech to text translators, robotic form filling based on user voice inputs

Disclaimer:

This is entirely my personal view of the GPT-3 model. The opinions expressed here are my own and not those of my employer. In addition, my thoughts and opinions change from time to time; I consider this a necessary consequence of having an open mind.
