
GPT-4o

GPT-4o ("omni") is OpenAI's flagship multimodal model that natively supports text, vision, and audio. It is optimized for fast, low-latency interaction while maintaining GPT-4-level intelligence across reasoning, coding, and knowledge tasks.

by OpenAI · Released 2024-05-13 · Proprietary

  • Context Window: 128K
  • MMLU: 82.0%
  • HumanEval: 87.2%
  • API Access: Available
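
A 128K-token context window still needs budgeting before a request is sent. The sketch below uses a crude ~4-characters-per-token heuristic as an assumption; a real tokenizer (e.g. tiktoken) is needed for exact counts.

```python
# Rough token-budget check against GPT-4o's 128K context window.
# The 4-chars-per-token ratio is a crude heuristic for English text,
# NOT the real tokenizer; use a proper tokenizer for exact counts.

CONTEXT_WINDOW = 128_000  # tokens, per the model card above

def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token)."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, reserved_for_output: int = 4_096) -> bool:
    """Check whether a prompt plausibly fits, leaving room for the reply."""
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("Hello, GPT-4o!"))  # → True
```

The reserved-output margin is an arbitrary example value; pick one that matches your `max_tokens` setting.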

Key Capabilities

  • + High-level reasoning across STEM, humanities, and social sciences
  • + Strong code generation and debugging in multiple languages
  • + Native multimodal understanding of text and images
  • + Low-latency, streaming-friendly responses
  • + Good performance on math and quantitative reasoning tasks
  • + Robust instruction following and tool-use orchestration
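
The multimodal and streaming capabilities above map onto the OpenAI Chat Completions request format. The sketch below only builds such a request body; the prompt and image URL are placeholders, and actually sending it requires the `openai` SDK and an API key.

```python
# A minimal sketch of a multimodal (text + image) request body in the
# OpenAI Chat Completions format, as used with gpt-4o. The URL and
# prompt are placeholders, not values from this model card.

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/chart.png"},  # placeholder
            },
        ],
    }
]

request = {
    "model": "gpt-4o",
    "messages": messages,
    "stream": True,  # stream tokens back for low-latency display
}

print(request["model"])  # → gpt-4o
```

With the official SDK, this dictionary would be sent as `client.chat.completions.create(**request)`, iterating over the returned chunks when `stream=True`.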

Limitations

  • - Proprietary, closed weights and training data
  • - Knowledge cutoff in late 2023; lacks awareness of newer events
  • - Can still hallucinate or produce incorrect code or facts
  • - Safety filters may block some legitimate edge-case content
  • - No public parameter count disclosed

Benchmark Performance

  • MATH (math): 76.6%
  • Grade School Math 8K (math): 95.8%
  • Graduate-Level Google-Proof Q&A (reasoning): 53.6%
  • BIG-Bench Hard (reasoning): 87.5%
  • Instruction Following Eval (reasoning): 85.4%
  • Massive Multitask Language Understanding (reasoning): 88.7%
  • Chatbot Arena (conversation): 1286.0 Elo
  • HumanEval (coding): 90.2%
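
The Chatbot Arena figure is an Elo-style rating, so a gap between two models translates into an expected head-to-head win rate via the standard Elo formula. The opponent rating below is an invented example, not a real model's score.

```python
# Expected win rate under the standard Elo model:
#   P(A beats B) = 1 / (1 + 10 ** ((R_B - R_A) / 400))
# The opponent rating of 1236 is an arbitrary example value.

def elo_win_prob(r_a: float, r_b: float) -> float:
    """Probability that a player rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

p = elo_win_prob(1286.0, 1236.0)  # a 50-point rating gap
print(round(p, 3))  # → 0.571
```

So a 50-point Elo lead corresponds to winning roughly 57% of pairwise comparisons, which is why small Arena gaps matter less than they look.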

Alternatives & Comparisons

Strong reasoning and writing with competitive multimodal capabilities.

Strengths
  • + Very strong general reasoning
  • + Competitive coding performance
Weaknesses
  • - Proprietary and closed weights
  • - Context window and pricing differ from GPT-4o

Extremely long context window and tight integration with Google ecosystem.

Strengths
  • + Extremely long context window
  • + Strong multimodal capabilities
Weaknesses
  • - Proprietary
  • - Pricing and quotas may vary by region

Other GPT-4 Models