ChatGPT (the GPT stands for Generative Pre-trained Transformer) is an AI chatbot system akin to the automated customer support chats seen online. It is, however, a massive step up, as it isn't limited to the handful of canned answer options that make those chats so frustrating.
Trained using machine learning techniques, the application can provide information and answer questions through a conversation. The BBC reports that during development, an early version of ChatGPT was fine-tuned via "conversations" with humans.
Twitter owner Elon Musk, who is no longer part of OpenAI’s board, claimed in a tweet that the system also learned from access to Twitter data and that he had paused access “for now.”
The dialogue format allows follow-up questions and, most lifelike of all, lets the chatbot admit its mistakes and reject inappropriate requests, according to the company's statement.
The full statement reads: “We’ve trained a model called ChatGPT, which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.
“ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response. We are excited to introduce ChatGPT to get users’ feedback and learn about its strengths and weaknesses.”
ChatGPT can do more than offer you a conversation; OpenAI has equipped it with the ability to correct grammar, summarize difficult text into simpler concepts, convert movie titles into emojis, and even fix bugs in Python code.
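As an illustration of that last capability, the snippet below shows the kind of Python defect one might paste into the chat, together with the sort of corrected version ChatGPT could return. Both snippets are invented for this example and are not from OpenAI.

```python
# Buggy snippet one might paste into ChatGPT: the range misses the last item.
def total(prices):
    subtotal = 0
    for i in range(len(prices) - 1):  # off-by-one: skips prices[-1]
        subtotal += prices[i]
    return subtotal

# The kind of fix ChatGPT might suggest: iterate over the list directly.
def total_fixed(prices):
    return sum(prices)
```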
Read more at the OpenAI blog.
OpenAI trained this model using Reinforcement Learning from Human Feedback (RLHF), following the same methods as InstructGPT but with slight differences in the data collection setup. They trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides, the user and an AI assistant. The trainers had access to model-written suggestions to help them compose their responses.
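As a rough illustration of that supervised fine-tuning step, here is a minimal sketch that fine-tunes a small causal language model on trainer-written dialogue transcripts with the standard next-token objective. The model choice, dialogue format, and hyperparameters are all assumptions for illustration; OpenAI has not published ChatGPT's training code.

```python
# Minimal sketch of supervised fine-tuning on human-written dialogues.
# Model, data format, and hyperparameters are illustrative assumptions.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

# Trainer-written conversations, with the trainer playing both roles.
dialogues = [
    "User: What is RLHF?\nAssistant: Reinforcement learning from human feedback...",
    "User: Why does my loop skip an item?\nAssistant: The index is off by one...",
]

def collate(batch):
    enc = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
    enc["labels"] = enc["input_ids"].clone()       # next-token prediction target
    enc["labels"][enc["attention_mask"] == 0] = -100  # ignore loss on padding
    return enc

loader = DataLoader(dialogues, batch_size=2, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for batch in loader:
    loss = model(**batch).loss  # cross-entropy over the whole dialogue
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```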
To create a reward model for reinforcement learning, OpenAI needed to collect comparison data consisting of two or more model responses ranked by quality. To collect this data, they took conversations that AI trainers had with the chatbot, randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, they could fine-tune the chatbot with Proximal Policy Optimization (PPO), and they performed several iterations of this process.
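To make the comparison step concrete, below is a minimal sketch of the pairwise ranking loss typically used to train such a reward model: the model is pushed to score the trainer-preferred response above the rejected one. The tiny scoring network and random token ids are placeholders; in practice the reward model is itself a fine-tuned language model trained on real ranked completions.

```python
# Sketch of the pairwise ranking objective for a reward model:
# maximize the margin between the preferred and rejected response scores.
# The toy encoder is a placeholder for a fine-tuned language model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, token_ids):  # (batch, seq_len) -> (batch,) scalar reward
        h = self.embed(token_ids).mean(dim=1)  # crude pooling over tokens
        return self.score(h).squeeze(-1)

reward_model = TinyRewardModel()
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-4)

# Each pair: token ids for the response trainers ranked higher vs. lower.
# Random ids stand in for real tokenized completions.
chosen = torch.randint(0, 1000, (8, 32))
rejected = torch.randint(0, 1000, (8, 32))

# -log sigmoid(r_chosen - r_rejected): small when chosen outscores rejected.
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
loss.backward()
optimizer.step()
```

The resulting scalar reward is what PPO then optimizes against when fine-tuning the chatbot itself.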
ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022. You can learn more about the GPT-3.5 series on OpenAI's website. ChatGPT and GPT-3.5 were trained on Azure AI supercomputing infrastructure.