A more truthful and less toxic GPT-3

About InstructGPT

The OpenAI API is powered by GPT-3 language models, which can be prompted with text to perform natural language tasks. Unfortunately, these models can also produce untruthful, toxic, or offensive output. This happens because they are trained to predict the next word on a large dataset of Internet text, rather than to carry out the language task the user actually wants. InstructGPT models are fine-tuned from GPT-3. They are better at following instructions, make up facts less often, and produce less toxic output. Our labelers even prefer outputs from the 1.3B-parameter InstructGPT model over outputs from the 175B-parameter GPT-3 model, despite the InstructGPT model having more than 100x fewer parameters.



