This document outlines the various OpenAI products, categorizing them by modality (text, multimodal), multi-model reasoning, support for the Model Context Protocol (MCP) or tool routing, and whether agentic workflows are orchestrated client-side or service-side.
Conversational & General-Purpose Agents
These products serve as natural language interfaces for broad tasks, including search, reasoning, writing, coding, and tool invocation.
- ChatGPT (chat.openai.com): Interactive chat interface for everyday users and professionals. Powers productivity, education, and creative tasks.
- Custom GPTs: Tailored GPT instances for specialized workflows or use cases, integrated with custom instructions and tools.
Developer APIs
These offerings expose OpenAI models to developers for integration into apps and workflows.
- GPT-4 API: Access to GPT-4 and GPT-4o models for text and multimodal applications (a minimal call is sketched after this list).
- Assistants API: A managed orchestration layer for tools, memory, retrieval, and complex agentic behavior.
- OpenAI Labs / Playground: UI-driven access for testing prompts and exploring model behavior.
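As a rough illustration of the API surface above, the sketch below sends a single prompt through the Chat Completions endpoint using the official `openai` Python SDK. The model name, prompt, and the presence of an `OPENAI_API_KEY` environment variable are assumptions for the example, not requirements of any specific product tier.

```python
# Minimal Chat Completions request with the official openai Python SDK (v1+).
# Assumes OPENAI_API_KEY is set in the environment; model and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # any GPT-4 / GPT-4o variant available to the account
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain function calling in one paragraph."},
    ],
)

print(response.choices[0].message.content)
```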
Advanced Multi-Model Reasoning Models
Optimized for complex coordination of capabilities such as reasoning, coding, memory, and vision.
- o3: Reasoning model with enhanced multi-model synthesis, powering ChatGPT and Assistants API workflows.
- o4-mini: Optimized for tool usage and dense logic; supports deeply nested reasoning workflows.
Modality-Specific Capabilities
Models that specialize in input/output beyond text.
- Whisper API: Transcribes speech to text with multilingual support — ideal for voice-driven systems (a usage sketch for Whisper and DALL·E follows this list).
- DALL·E: Generates and edits images from textual prompts — suitable for marketing, prototyping, education.
- Sora: Text-to-video model for storytelling, advertising, simulation, and research on video generation.
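The following sketch shows single calls to the Whisper and DALL·E endpoints via the `openai` Python SDK. The audio file name, prompt, and image size are placeholders chosen for illustration.

```python
# Modality-specific endpoints: Whisper for speech-to-text, DALL·E for image generation.
# File path and prompt are placeholders; OPENAI_API_KEY is assumed to be set.
from openai import OpenAI

client = OpenAI()

# Speech to text with Whisper
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
print(transcript.text)

# Text to image with DALL·E 3
image = client.images.generate(
    model="dall-e-3",
    prompt="A cutaway diagram of a jet engine",
    n=1,
    size="1024x1024",
)
print(image.data[0].url)  # URL of the generated image
```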
Coding and Programming Interfaces
Designed for structured code generation, completion, and explanation.
- Codex (Legacy): Earlier code-focused model family; deprecated as a standalone API, but still relevant for embedded developer tooling.
Note: What does the "o" mean in GPT-4o, o3, and o4-mini?
The "o" in GPT-4o stands for "omni", highlighting the model's native multimodal capabilities. Unlike earlier versions that bolted vision or audio onto a primarily text-based model, omni models are trained to natively and fluidly handle text, vision, and audio in a unified architecture. The standalone o-series names (o3, o4-mini) follow a separate convention: they denote OpenAI's reasoning models, which emphasize routing, tool usage, and step-by-step problem solving rather than the "omni" multimodal branding.
- GPT-4o: The first GPT model with native support for simultaneous reasoning across modalities (e.g., interpreting an image while responding in natural language); a minimal multimodal call is sketched after this note.
- o3 / o4-mini: Production reasoning models optimized for routing, tool usage, and seamless integration into applications such as ChatGPT and the Assistants API.
In both cases, the "o" branding helps distinguish these newer-generation models from older GPT-4 variants that lacked native multimodal integration or unified context management.
| Product / Service | Modality | Multi-Model Reasoning | Deep Research Capabilities | Agentic / Tooling Support | Client-Side Routing | Service-Side Routing |
| --- | --- | --- | --- | --- | --- | --- |
| ChatGPT (chat.openai.com) | Text, Multimodal (Pro) | ✅ (GPT-4o with vision/audio) | Moderate (via model capabilities) | ✅ (via actions / plug-ins / tools) | Limited (via custom GPTs) | ✅ (via SaaS UI + plug-ins + retrieval) |
| GPT-4 API | Text (and, via GPT-4o, Multimodal) | ✅ | High (via external embedding, RAG, fine-tuning) | ✅ (via function calling) | ✅ (user-defined agents + routing) | ❌ |
| OpenAI Assistants API | Text, Multimodal (GPT-4o) | ✅ | High (with integrated retrieval & tools) | ✅ (tools, function calling, retrieval, code interpreter) | ❌ | ✅ (OpenAI manages routing) |
| o3 (April 2025) | Text, Multimodal | ✅ (advanced multi-model synthesis) | High | ✅ (tools, context memory, function calls) | ❌ | ✅ (ChatGPT/Assistants integrated) |
| o4-mini (April 2025) | Text, Multimodal | ✅ (optimized for reasoning & coding) | High | ✅ (internal tool orchestration, MCP-ready) | ❌ | ✅ (via Assistants API and SaaS) |
| Whisper API | Audio-to-Text | ❌ | ❌ | ❌ | ✅ (can be embedded in agent workflows) | ❌ |
| DALL·E | Text-to-Image | ❌ | ❌ | ✅ (image editing in service) | ❌ | ✅ |
| Sora | Text-to-Video | ✅ (video understanding/generation pipeline) | Medium | ❌ (currently preview only) | ❌ | ✅ (cloud-run video generation) |
| Codex (Legacy) | Text/Code | ❌ | Moderate | ✅ (function generation, code assist) | ✅ | ❌ |
| Custom GPTs | Text, Multimodal (if GPT-4o) | ✅ | Medium | ✅ (tools, APIs, memory) | ✅ (prompt + file config) | ✅ (executed in SaaS ChatGPT infra) |
| OpenAI Labs / Playground | Text (GPT-3.5/4) | ✅ | Moderate | ❌ | ✅ | ❌ |
Key Concepts
- Multimodal: Combines text, image, audio inputs or outputs.
- Multi-Model Reasoning: Supports switching or combining different models for coherent reasoning (e.g., image to text).
- Deep Research Capabilities: Useful for building systems with embedded domain knowledge, custom embeddings, fine-tuning.
- Agentic Tooling: Integration of tools, function calls, or plugins triggered via model reasoning.
- Client-Side Routing: Model/tool orchestration handled in the user’s environment (e.g., LangChain, Semantic Kernel); see the function-calling sketch after this list.
- Service-Side Routing: OpenAI-hosted routing of function/tool calls (e.g., Assistant API tools, plug-in invocation).
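The sketch below illustrates client-side routing with Chat Completions function calling: the model decides which tool to invoke, but the client executes it and feeds the result back. The `get_weather` tool and its return value are hypothetical placeholders, not part of any OpenAI product.

```python
# Client-side routing sketch: the model selects a tool, the client runs it.
# get_weather is a hypothetical tool defined only for this example.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical client-side tool
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Oslo right now?"}]

first = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
call = first.choices[0].message.tool_calls[0]  # assumes the model chose to call the tool
args = json.loads(call.function.arguments)

result = {"city": args["city"], "temp_c": 7}   # placeholder tool execution on the client

messages.append(first.choices[0].message)      # keep the assistant's tool request in context
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})

final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```

With service-side routing (e.g., Assistants API tools), the execution step above would instead happen inside OpenAI's infrastructure.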
What is Multi-Model Reasoning?
Multi-model reasoning refers to the capability of a system to dynamically coordinate and utilize multiple AI models—potentially with different specializations—to solve complex tasks. This can include combining:
- A language model (for text generation)
- A vision model (for image understanding)
- A code model (for logic or math)
- A speech model (for voice input/output)
Instead of a single monolithic model, these components may be orchestrated either internally (via service-managed routing, as with the OpenAI Assistants API) or externally (via client-managed libraries like LangChain). The goal is seamless handoff between specialized model capabilities to produce richer and more accurate responses.
This approach is particularly useful when solving:
- Visual QA: Image + text model reasoning
- Voice command systems: Audio + text + tool invocation (sketched after this list)
- Coding agents: Text + code + tool routing for execution
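As a small client-managed example of such a handoff, the pipeline below transcribes a voice command with Whisper and then hands the transcript to GPT-4o. The audio file name is a placeholder; in a real voice agent the transcript would also drive tool invocation, as in the function-calling sketch under Key Concepts.

```python
# Client-managed multi-model handoff: speech model -> language model.
# voice_command.wav is a placeholder file; OPENAI_API_KEY is assumed to be set.
from openai import OpenAI

client = OpenAI()

# Step 1: the speech model (Whisper) turns the voice command into text
with open("voice_command.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# Step 2: the language model (GPT-4o) reasons over the transcribed command
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a voice assistant; answer briefly."},
        {"role": "user", "content": transcript.text},
    ],
)

print(reply.choices[0].message.content)
```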
MCP Support Notes
- Assistants API: The most native support for MCP-style routing, with server-side orchestration of tools and documents (see the sketch after this list).
- ChatGPT: Indirectly supports MCP-like features via memory, plug-ins, and custom GPT configurations.
- GPT-4 API: Requires manual MCP logic using external orchestration libraries.
- O3 / O4-mini: Designed to support real-time tool usage and dynamic context fusion, aligning well with MCP patterns.
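To make the service-side case concrete, the sketch below uses the Assistants API (beta) with the hosted code interpreter tool: the tool runs inside OpenAI's infrastructure, so the client only creates the assistant, posts a message, and polls the run. Names, prompt, and the polling loop are illustrative assumptions, and the `beta` namespace may shift as the API evolves.

```python
# Service-side routing sketch with the Assistants API (beta): the code interpreter
# tool executes on OpenAI's side, not in this script. Names and prompt are illustrative.
import time
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="routing-demo",
    model="gpt-4o",
    instructions="Use the code interpreter when calculation is required.",
    tools=[{"type": "code_interpreter"}],  # hosted tool, orchestrated service-side
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is the standard deviation of 3, 9, 14, 21, 27?",
)

run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)  # newest message first
```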