ChatGPT: A Palatable Introduction
A Tasty Treat of Tech: ChatGPT, OpenAI and our future.
Table of Contents
The Beginning (before ChatGPT)
The First GPT (understand its predecessors)
Finally, ChatGPT (what are its intricacies?)
Limitations and Applications (with examples)
Google, Meta & Microsoft (are they in fear?)
The New AI Startup Wave (what’s coming next?)
Will AI replace all jobs? (the future of work)
Conclusion (written by ChatGPT)
The Beginning
Back in 2017, some researchers at Google were studying how to give machine learning models the ability to learn context. Their efforts resulted in an incredible paper named “Attention Is All You Need,” which introduced the architecture we now know as the “Transformer,” built around an “attention mechanism.”
In short, this work enabled neural networks to focus on specific parts of the input data while ‘ignoring’ others. Humans tend to filter information to focus on what interests us (so we are not overwhelmed); this is pretty much what this architecture replicates.
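For the technically curious, the core operation from that paper is scaled dot-product attention, softmax(QKᵀ/√d_k)V: each position scores every other position, turns the scores into weights that sum to 1, and takes a weighted average. Below is a minimal pure-Python sketch with made-up toy vectors. It assumes a simplification: the input vectors are used directly as queries, keys and values, whereas real Transformers use learned projections, multiple attention heads and optimized matrix libraries.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Returns one output vector per query, plus the attention weights.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity between this query and every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # how much to "focus" on each position
        # Weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy "tokens" as 2-dimensional vectors (illustrative numbers only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(x, x, x)  # self-attention: Q, K and V all come from x
for row in w:
    print([round(p, 2) for p in row])
```

In this toy run, the first token puts equal weight on itself and on the third token (both share its first dimension) and less weight on the second: it "focuses" on the most similar positions and partially "ignores" the rest.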
Since then, many AI models have been built on the Transformer architecture, especially in Natural Language Processing (notably for text data). All models mentioned from here on are also based on Transformers (including ChatGPT).
The First GPT
As you can see in the image above, it was a long ride. Before 2018, most models were trained for specific tasks (such as question answering, sentiment classification, etc.).
This changed with the “first GPT,” introduced by OpenAI, which proposed a Generative Pre-trained Transformer (hence “GPT”) trained in two stages: the model first learns from “any text” (without a task in mind), and only then gets fine-tuned to a specific task.
Because the model could now learn from “unlabeled data,” massive datasets could be fed to it during pre-training. As a result, it needed fewer examples than its predecessors during the fine-tuning step to learn a specific task.
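Why does unlabeled text work for pre-training? GPT's pre-training objective is predicting the next word, so the “label” for each example is simply the word that actually comes next in the text. A minimal sketch of how raw text turns into training examples, assuming whitespace splitting and a tiny context window (real GPT models use subword tokenizers and much longer contexts; `next_word_pairs` is a hypothetical helper):

```python
def next_word_pairs(text, context_size=3):
    """Turn raw, unlabeled text into (context, next word) training examples.

    Splitting on whitespace is a simplification for illustration.
    """
    words = text.split()
    pairs = []
    for i in range(1, len(words)):
        context = words[max(0, i - context_size):i]  # up to context_size previous words
        pairs.append((context, words[i]))            # the "label" is just the next word
    return pairs

for context, target in next_word_pairs("the cat sat on the mat"):
    print(context, "->", target)
```

No human annotation is needed, which is what makes it practical to pre-train on very large amounts of text.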
Even though many other models were introduced after the first GPT, the basic recipe remained mostly the same. However, with each iteration, the models grew much larger, which makes them harder to train (more data, more cost).
GPT-1 had 117 million parameters, trained on a ~5GB dataset (est. US$500 training cost)
GPT-2 had 1.5 billion parameters, trained on a ~40GB dataset (est. US$43k training cost)
GPT-3 had 175 billion parameters, trained on ~45TB of raw text filtered down to ~570GB (est. US$4.6M training cost)
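A quick back-of-the-envelope calculation shows why size alone makes training harder. The sketch below only counts the memory needed to store the weights, assuming 16-bit floats (2 bytes per parameter) and using 1.5 billion for the largest GPT-2 model; actual training needs several times more memory for gradients and optimizer state.

```python
def weights_memory_gb(num_params, bytes_per_param=2):
    """Memory (in GB) needed just to store the weights; 2 bytes assumes 16-bit floats."""
    return num_params * bytes_per_param / 1e9

# Published parameter counts (largest model of each generation).
models = {"GPT-1": 117e6, "GPT-2": 1.5e9, "GPT-3": 175e9}
for name, params in models.items():
    print(f"{name}: ~{weights_memory_gb(params):.1f} GB of weights")
```

GPT-3's ~350 GB of weights alone is far more than a single typical GPU can hold, so the model has to be split across many GPUs, which is part of why training costs jump so sharply between generations.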
GPT-3 is the latest “family” of models from OpenAI, and ChatGPT is simply a variant of it (fine-tuned from a model in the GPT-3.5 series) specialized in the “dialogue” task (chatting).
Ta-dah! This is the history of ChatGPT. Hopefully, it is clear that even though the media is blowing up with news about ChatGPT, it is the result of a continual effort to learn how to train larger models more effectively.
All these big models are also known as “Large Language Models” (LLMs).
But hold on! I will get to its applications and limitations.
Quick disclaimer: there WERE architecture and training changes in each version, but I will skip them for simplicity. For most of us, those changes are not needed to tell the models apart.
Limitations and Applications
I started using ChatGPT when it launched because I already had an invite (I had previously used GPT-3 and DALL-E). My main tip is to write specific prompts to get tailored answers; otherwise, results can look generic.
For instance, instead of “Ideas for Data Science articles,” scoping it down to “Ideas for Data Science articles for the general public, focusing on tools, in a funny style” gives less generic results.
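If you reuse the same kind of request often, it can help to make the scoping explicit. A minimal sketch using a hypothetical `build_prompt` helper (not part of any OpenAI tool; ChatGPT just receives the final string) that layers audience, focus and tone onto a base request:

```python
def build_prompt(request, audience=None, focus=None, tone=None):
    """Append optional constraints to a base request to make it more specific.

    Hypothetical helper for illustration only.
    """
    prompt = request
    if audience:
        prompt += f" for {audience}"
    if focus:
        prompt += f", focusing on {focus}"
    if tone:
        prompt += f", in a {tone} style"
    return prompt + "."

print(build_prompt("Ideas for Data Science articles"))
print(build_prompt("Ideas for Data Science articles",
                   audience="the general public", focus="tools", tone="funny"))
```

Each optional argument narrows the request a bit more, which is the same move as the manual rewrite in the example.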
Frozen in Time: Once the model is trained, it only knows what was in its training dataset. New information (e.g., today's news on the internet) is therefore unknown to it, so it cannot offer some of the utility that simpler virtual assistants do.
Dialogue-limited Context: Although you can refer to what was previously mentioned in the chat, it cannot consider other contextual information (previous searches, location, response time, etc.).
Correctness: Because it’s a language model, it generates text by predicting which words are most likely to come next, given the prompt. It is not trying to provide correct information, so it may give a “convincing but incorrect” answer rather than an accurate but less convincing one. This is why corrections like the one below happen.
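To see why “most likely” is not the same as “correct,” here is a toy bigram model: it counts which word follows which in a small, made-up corpus and always picks the most frequent next word. Real LLMs use neural networks over subword tokens, condition on far longer context and often sample instead of always picking the top word, so treat this only as an illustration of the principle.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count which word follows which: a toy stand-in for a language model."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length=5):
    """Greedy decoding: always append the most frequent next word."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

# Made-up corpus where a wrong statement appears more often than the right one.
corpus = ("the capital of australia is sydney . "
          "the capital of australia is canberra . "
          "people say the capital of australia is sydney .")
counts = train_bigrams(corpus)
print(generate(counts, "the"))
```

The output is fluent and confident, yet it repeats the more frequent (and wrong) “sydney”: nothing in the procedure checks facts, only frequencies.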
My Favorite Applications
Brainstorming: ChatGPT can apply general frameworks to provide insightful examples if you get specific about what you want (e.g., “list ideas for a YouTube video title about X”).
Summarization: Whether you want a topic summarized or a concept explained, ChatGPT can give good answers to prompts such as “Explain Quantum Physics like I am 5 years old.” Still, be careful: sometimes the explanations are incorrect (as explained above), so double-check them.
Boilerplate: It can provide excellent boilerplate code. However, it still often outputs incomplete or incorrect code, so I would not recommend a “copy and paste” approach. But it can save time by giving you the overall structure.
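One lightweight habit that fits the “don't just copy and paste” advice: before reusing a generated snippet, pin down its behavior with a few assertions. The `slugify` function below is a hypothetical stand-in for the kind of helper a chatbot might produce (it is not actual ChatGPT output); the checks are the point.

```python
import re

def slugify(title):
    """Hypothetical generated helper: turn a title into a URL-friendly slug."""
    slug = title.lower().strip()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # collapse runs of non-alphanumerics into one hyphen
    return slug.strip("-")

# Quick checks before trusting it, including edge cases.
assert slugify("ChatGPT: A Palatable Introduction") == "chatgpt-a-palatable-introduction"
assert slugify("  Hello,   World!  ") == "hello-world"
assert slugify("") == ""
print("all checks passed")
```

Checks like these also surface behavior you might not want: here, accented letters are silently dropped, so `slugify("Café")` returns `"caf"`.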
Google, Meta & Microsoft
Let me correct the public perception that only OpenAI is behind this and that every other company is lost. First, much of the underlying research comes from other AI labs; second, they have even released similar products, such as Google's LaMDA and Meta's BlenderBot.
Notably, Big Tech companies are part of this revolution. Google laid the foundation for Transformers back in 2017, and Meta created PyTorch, the framework OpenAI and others use to build these LLMs. So, we shouldn’t disregard their contributions.
At the same time, it is undeniable that ChatGPT's strategy worked to grab a ton of media attention. In contrast, LaMDA and BlenderBot were considered “more boring” when released, possibly because those companies were conservative about publishing an AI that people could use in malicious ways. Google and Meta already face image issues regarding data privacy and similar topics, so they take this seriously.
Look at ChatGPT: when it launched, it was straightforward to ask it to “list potential websites to exploit” or “write code to exploit vulnerability X.” OpenAI is already correcting this, trying to block some of those malicious prompts.
ChatGPT made everyone realize that this tech is valuable and should be part of our future. I am excited to see what these companies will release this year, and I will cover each release in this Newsletter.
The New AI Startup Wave
OpenAI launched the “OpenAI Startup Fund,” allocating $100M to invest in AI-based startups and support them with its technology. So far, it has invested in the following companies:
Descript: Using AI to make video editing as simple as editing a text document.
Harvey: AI for Legal Workflows (research, drafting, analysis, comms, etc.)
Mem: Self-organizing workspace, connecting notes, meetings, and more.
Speak: AI tutor to teach languages (pronunciation, grammar, vocabulary, etc.)
Of course, many other AI-focused startups are out there, and I expect many more to emerge this year and next.
Even among companies not focused on AI, I can see many pivoting toward it, turning this into a race of “who will be the AI leader in each field?” (e.g., video editing, music generation, art, etc.)
Will AI replace all jobs?
As I’ve shown in this article, AI still has a long way to go before it produces near-perfect output that would justify replacing someone. ChatGPT depends heavily on prompt design (how you formulate the question) and currently can only output text, not actions (to run generated code, for instance, you need to set up the environment yourself). Therefore, it is far from being an end-to-end tool.
Nevertheless, some of those issues will be solved with time. Will it then replace our jobs? I don’t think so. But it will certainly change the way we work. For instance, instead of developers writing code from scratch, the most straightforward parts could be generated automatically, leaving developers to tune the more complex functions. Another example would be a financial analyst using these technologies to create code that automates part of their workflow.
The possibilities are endless, but I believe in a world where it will enhance our capabilities, not make them useless.
Below is my response to: “could ChatGPT and Midjourney replace the average UI/UX designer?”
And I think that is true for most professionals. We will hand less creative, repetitive tasks over to AI, whether writing boilerplate code, scheduling a haircut, or driving a car.
If you want to play with ChatGPT, go to chat.openai.com to create an account. The free version works very well, and the paid version offers better availability, faster responses, and early access to new features. I would only recommend subscribing after using the free version for a while and finding it to be a vital part of your workflow.
Finally, I asked ChatGPT to write my outro:
"In conclusion, ChatGPT is a technological marvel that will have you asking questions left and right, just to see what clever responses it will come up with next. So go ahead, strike up a conversation with ChatGPT and watch as it wows you with its wit and wisdom. Who needs human friends when you have a language model, right? 🤣 And if you want to stay up to date with the latest developments in AI and other tech advancements, don't forget to subscribe to The Overfit Newsletter! 📧"