Overview of OpenAI Product Offerings

A categorized survey of current OpenAI services and products

This document outlines OpenAI's products, categorizing them by modality (text, multimodal), multi-model reasoning, support for the Model Context Protocol (MCP) and tool routing, and whether agentic workflows are routed client-side or service-side.


Conversational & General-Purpose Agents

These products serve as natural language interfaces for broad tasks, including search, reasoning, writing, coding, and tool invocation.

  • ChatGPT (chat.openai.com): Interactive chat interface for everyday users and professionals. Powers productivity, education, and creative tasks.
  • Custom GPTs: Tailored GPT instances for specialized workflows or use cases, integrated with custom instructions and tools.

Developer APIs

These offerings expose OpenAI models to developers for integration into apps and workflows.

  • GPT-4 API: Access to GPT-4 and GPT-4o models for text and multimodal applications (a minimal call is sketched after this list).
  • Assistants API: A managed orchestration layer for tools, memory, retrieval, and complex agentic behavior.
  • OpenAI Labs / Playground: UI-driven access for testing prompts and exploring model behavior.
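
As a concrete starting point, here is a minimal sketch of a text request through the official openai Python SDK; the model name and prompt are illustrative, and an OPENAI_API_KEY is assumed to be set in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One round-trip text completion against a GPT-4-class model.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain tool routing in one paragraph."},
    ],
)
print(response.choices[0].message.content)
```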

Advanced Multi-Model Reasoning Models

Optimized for complex coordination of capabilities such as reasoning, coding, memory, and vision.

  • o3: OpenAI's flagship reasoning model; synthesizes reasoning, coding, and vision capabilities to power ChatGPT and Assistants API workflows.
  • o4-mini: A smaller, faster reasoning model optimized for tool usage and dense logic; supports deeply nested reasoning workflows (see the sketch after this list).
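
A hedged sketch of how a developer might invoke one of these reasoning models; it assumes the reasoning_effort hint OpenAI exposes for o-series models, and the prompt is illustrative:

```python
from openai import OpenAI

client = OpenAI()

# reasoning_effort ("low" | "medium" | "high") trades latency and cost
# against how long the model deliberates before answering.
response = client.chat.completions.create(
    model="o4-mini",
    reasoning_effort="medium",
    messages=[
        {"role": "user", "content": "Plan the steps to migrate a REST API to gRPC."},
    ],
)
print(response.choices[0].message.content)
```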

Modality-Specific Capabilities

Models that specialize in input/output beyond text.

  • Whisper API: Transcribes speech to text with multilingual support; ideal for voice-driven systems.
  • DALL·E: Generates and edits images from textual prompts; suitable for marketing, prototyping, and education (both the Whisper and DALL·E calls are sketched after this list).
  • Sora: Text-to-video model for storytelling, advertising, simulation, and research on video generation.
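
A minimal sketch of both modality-specific calls through the openai Python SDK; the file name and prompt are hypothetical:

```python
from openai import OpenAI

client = OpenAI()

# Speech-to-text with Whisper: upload an audio file, get a transcript back.
with open("meeting.mp3", "rb") as audio_file:  # hypothetical local recording
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)

# Text-to-image with DALL·E: generate one image from a prompt.
image = client.images.generate(
    model="dall-e-3",
    prompt="A cutaway diagram of a jet engine, technical illustration style",
    n=1,
    size="1024x1024",
)
print(image.data[0].url)
```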

Coding and Programming Interfaces

Designed for structured code generation, completion, and explanation.

  • Codex (Legacy): Earlier code-focused model; still relevant for embedded developer tooling.

Note: What does the "o" mean in GPT-4o, o3, and o4-mini?

The "o" in GPT-4o stands for "omni", highlighting the model's native multimodal capabilities. Unlike earlier versions that bolted vision or audio on top of a primarily text-based model, GPT-4o is trained to natively and fluidly handle text, vision, and audio in a unified architecture.

The lowercase "o" in o1, o3, and o4-mini is separate branding: it marks OpenAI's o-series of reasoning models, which are trained to spend more compute reasoning before they answer and are optimized for tool usage and complex problem solving.

  • GPT-4o: The first GPT model with native support for simultaneous reasoning across modalities (e.g., interpreting an image while responding in natural language; a minimal multimodal request is sketched below).
  • o3 / o4-mini: Production-grade reasoning models optimized for routing, tool usage, and seamless integration into applications like ChatGPT and the Assistants API.

The distinct branding separates the reasoning-focused o-series from the GPT-4 line, and GPT-4o from older GPT-4 variants that lacked native multimodal integration or unified context management.


| Product / Service | Modality | Multi-Model Reasoning | Deep Research Capabilities | Agentic / Tooling Support | Client-Side Routing | Service-Side Routing |
|---|---|---|---|---|---|---|
| ChatGPT (chat.openai.com) | Text, Multimodal (Pro) | ✅ (GPT-4o with vision/audio) | Moderate (via model capabilities) | ✅ (via actions / plug-ins / tools) | Limited (via custom GPTs) | ✅ (via SaaS UI + plug-ins + retrieval) |
| GPT-4 API | Text (Multimodal via GPT-4o) | — | High (via external embedding, RAG, fine-tuning) | ✅ (via function calling) | ✅ (user-defined agents + routing) | — |
| OpenAI Assistants API | Text, Multimodal (GPT-4o) | — | High (with integrated retrieval & tools) | ✅ (tools, function calling, retrieval, code interpreter) | — | ✅ (OpenAI manages routing) |
| o3 (April 2025) | Text, Multimodal | ✅ (advanced multi-model synthesis) | High | ✅ (tools, context memory, function calls) | — | ✅ (ChatGPT/Assistants integrated) |
| o4-mini (April 2025) | Text, Multimodal | ✅ (optimized for reasoning & coding) | High | ✅ (internal tool orchestration, MCP-ready) | — | ✅ (via Assistants API and SaaS) |
| Whisper API | Audio-to-Text | — | — | ✅ (can be embedded in agent workflows) | — | — |
| DALL·E | Text-to-Image | — | — | — | — | ✅ (image editing in service) |
| Sora | Text-to-Video | ✅ (video understanding/generation pipeline) | Medium | ❌ (currently preview only) | — | ✅ (cloud-run video generation) |
| Codex (Legacy) | Text/Code | — | Moderate | ✅ (function generation, code assist) | — | — |
| Custom GPTs | Text, Multimodal (if GPT-4o) | — | Medium | ✅ (tools, APIs, memory) | ✅ (prompt + file config) | ✅ (executed in SaaS ChatGPT infra) |
| OpenAI Labs / Playground | Text (GPT-3.5/4) | — | Moderate | — | — | — |

Key Concepts

  • Multimodal: Combines text, image, audio inputs or outputs.
  • Multi-Model Reasoning: Supports switching or combining different models for coherent reasoning (e.g., image to text).
  • Deep Research Capabilities: Support for building systems with embedded domain knowledge, custom embeddings, and fine-tuning.
  • Agentic Tooling: Integration of tools, function calls, or plugins triggered via model reasoning.
  • Client-Side Routing: Model/tool orchestration handled in the user's environment (e.g., LangChain, Semantic Kernel); a minimal routing loop is sketched after this list.
  • Service-Side Routing: OpenAI-hosted routing of function/tool calls (e.g., Assistants API tools, plug-in invocation).
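
To make the client-side/service-side distinction concrete, here is a minimal client-side routing loop using Chat Completions function calling; get_weather is a hypothetical local tool:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical local tool; in a real system this would call a weather API.
def get_weather(city: str) -> str:
    return f"Sunny and 22 °C in {city}"

# Describe the tool so the model can decide when to request it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Oslo right now?"}]
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = response.choices[0].message

# Client-side routing: the model only *requests* the call; our code runs it
# locally and sends the result back for the final answer.
if msg.tool_calls:
    call = msg.tool_calls[0]
    result = get_weather(**json.loads(call.function.arguments))
    messages.append(msg)
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```

With service-side routing, by contrast, this request/execute/respond loop runs inside OpenAI's infrastructure rather than in the caller's code.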

What is Multi-Model Reasoning?

Multi-model reasoning refers to the capability of a system to dynamically coordinate and utilize multiple AI models—potentially with different specializations—to solve complex tasks. This can include combining:

  • A language model (for text generation)
  • A vision model (for image understanding)
  • A code model (for logic or math)
  • A speech model (for voice input/output)

Instead of a single monolithic model, these components may be orchestrated either internally (via service-managed routing, as with the OpenAI Assistants API) or externally (via client-managed libraries like LangChain). The goal is seamless handoff between specialized model capabilities to produce richer and more accurate responses.

This approach is particularly useful when solving:

  • Visual QA: Image + text model reasoning
  • Voice command systems: Audio + text + tool invocation (a minimal pipeline is sketched after this list)
  • Coding agents: Text + code + tool routing for execution
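
A minimal sketch of a voice-command pipeline chaining three specialized models (speech-to-text, language, text-to-speech); the file names are hypothetical:

```python
from openai import OpenAI

client = OpenAI()

# Step 1: speech model transcribes the spoken command.
with open("command.wav", "rb") as audio:  # hypothetical recording
    text = client.audio.transcriptions.create(model="whisper-1", file=audio).text

# Step 2: language model reasons over the transcript and drafts a reply.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": text}],
).choices[0].message.content

# Step 3: speech model reads the answer back to the user.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
speech.write_to_file("reply.mp3")
```

Here the handoff between models is managed entirely client-side; a service-managed equivalent would delegate the same orchestration to the Assistants API.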

MCP Support Notes

  • Assistants API: Most native support for MCP-style routing, with server-side orchestration of tools and documents.
  • ChatGPT: Indirectly supports MCP-like features via memory, plug-ins, and custom GPT configurations.
  • GPT-4 API: Requires manual MCP logic using external orchestration libraries.
  • o3 / o4-mini: Designed to support real-time tool usage and dynamic context fusion, aligning well with MCP patterns.