What is GPT-4o?
GPT-4o ("o" for "omni") represents a significant step toward more natural human-computer interaction: it accepts any combination of text, audio, image, and video as input and generates any combination of text, audio, and image as output.
How does GPT-4o work?
GPT-4o operates as a unified model trained end-to-end across text, vision, and audio modalities, processing all inputs and outputs through a single neural network.
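Because one network handles every modality, a multimodal request looks the same as a plain text request at the API level. The following is a minimal sketch of a mixed text-and-image call, assuming the official `openai` Python SDK (v1+), an `OPENAI_API_KEY` in the environment, and a placeholder image URL:

```python
# A minimal sketch of a single multimodal request, assuming the official
# openai Python SDK (v1+) and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One request mixing text and image input; the model replies with text.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    # Placeholder URL; substitute a real, publicly reachable image.
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```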
Features of GPT-4o
- Responds to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, similar to human conversational response times (see the latency sketch after this list).
- Matches GPT-4 Turbo performance in English text and code, with substantial enhancements in non-English text.
- Runs faster and is 50% cheaper in the API compared to GPT-4 Turbo.
- Demonstrates superior capabilities in understanding vision and audio inputs compared to existing models.
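The audio latency figures above describe the model's own response time; as a rough client-side illustration, the sketch below times a simple text round trip (network overhead included), assuming the same SDK setup as the earlier example:

```python
# A rough sketch for measuring end-to-end request latency from the client
# side. This captures network plus inference time for a text request; it is
# not the model's internal audio response latency quoted above.
import time

from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Round trip took {elapsed_ms:.0f} ms: {response.choices[0].message.content}")
```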
Model Evaluations
GPT-4o matches GPT-4 Turbo-level performance in text comprehension, reasoning, and coding, while setting new high-water marks in multilingual, audio, and visual understanding.
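To make the idea of a head-to-head comparison concrete, here is a minimal, hypothetical harness; the questions, answers, and exact-match scoring are illustrative placeholders, not the benchmark methodology behind these results:

```python
# A minimal, hypothetical harness for comparing two models on a small set
# of question/answer pairs. The questions, answers, and exact-match scoring
# are illustrative placeholders only.
from openai import OpenAI

client = OpenAI()

EVAL_SET = [  # hypothetical examples
    {"question": "What is 17 * 24?", "answer": "408"},
    {"question": "What is the capital of Japan?", "answer": "Tokyo"},
]

def score(model: str) -> float:
    """Return the fraction of questions whose reply contains the expected answer."""
    correct = 0
    for item in EVAL_SET:
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": item["question"]}],
        ).choices[0].message.content
        correct += item["answer"] in reply
    return correct / len(EVAL_SET)

for model in ("gpt-4o", "gpt-4-turbo"):
    print(f"{model}: {score(model):.0%}")
```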
Model Safety and Limitations
GPT-4o has safety built in by design across modalities, through techniques such as filtering training data and refining the model's behavior through post-training. Additional safety systems provide guardrails on outputs, particularly in voice applications.
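One common pattern for such a guardrail layer is screening generated output with a separate classifier before returning it. The sketch below uses OpenAI's public moderation endpoint as an illustration; whether this layering matches the internal systems applied to voice outputs is an assumption:

```python
# A minimal sketch of screening model output with a separate moderation
# pass. Using the public moderation endpoint is an illustration; it is not
# necessarily the same safety system OpenAI applies to voice outputs.
from openai import OpenAI

client = OpenAI()

reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a short story."}],
).choices[0].message.content

moderation = client.moderations.create(input=reply)
if moderation.results[0].flagged:
    print("Response withheld by moderation check.")
else:
    print(reply)
```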
Model Availability
GPT-4o represents OpenAI's latest step in pushing deep learning toward practical usability. Sustained efficiency improvements at every layer of the stack have made a GPT-4 level model much more broadly available.