The latest large multimodal model from OpenAI
OpenAI, the creator of ChatGPT, has finally revealed GPT-4, a multimodal model that accepts both image and text inputs and generates text output.
OpenAI claims the model is “more creative and collaborative than ever before” and “can solve difficult problems with greater accuracy.” It can parse both text and image input, though it can only respond via text. OpenAI cautions that the system retains many of the same problems as earlier language models, including a tendency to make up information (or “hallucinate”) and the capacity to generate violent and harmful text.
GPT-4 is available today to OpenAI’s paying users via ChatGPT Plus (with a usage cap), and developers can sign up on a waitlist to access the API.
GPT-4-powered apps
OpenAI says it’s already partnered with a number of companies to integrate GPT-4 into their products, including Duolingo, Stripe, and Khan Academy. The new model is powering Microsoft’s Bing chatbot. It will also be accessible as an API for developers to build on. (There is a waitlist here, which OpenAI says will start admitting users today.)
Pricing is $0.03 per 1,000 “prompt” tokens (about 750 words) and $0.06 per 1,000 “completion” tokens (again, about 750 words). Tokens represent raw text; for example, the word “fantastic” would be split into the tokens “fan,” “tas” and “tic.” Prompt tokens are the parts of words fed into GPT-4 while completion tokens are the content generated by GPT-4.
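The per-token pricing above can be turned into a simple cost estimate. The sketch below is illustrative only: the function name and example token counts are hypothetical, and only the two per-1,000-token rates come from the article.

```python
# Rates from the article: $0.03 per 1,000 prompt tokens,
# $0.06 per 1,000 completion tokens.
PROMPT_PRICE_PER_1K = 0.03
COMPLETION_PRICE_PER_1K = 0.06

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated charge in dollars for one GPT-4 API call.

    Prompt tokens are what you send in; completion tokens are
    what the model generates back.
    """
    return (prompt_tokens / 1000) * PROMPT_PRICE_PER_1K \
         + (completion_tokens / 1000) * COMPLETION_PRICE_PER_1K

# A hypothetical call: a 1,500-token prompt with a 500-token reply.
print(f"${estimate_cost(1500, 500):.3f}")  # $0.075
```

At roughly 750 words per 1,000 tokens, a long prompt is billed at half the rate of the text the model writes back, so verbose completions dominate the bill.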