GPT-5.5 API Is Now Available: Pricing, Rate Limits, and Quick Start

GPT-5.5 is now live in the OpenAI API. Here's the actual pricing ($5/$30 per MTok), rate limits by tier, what changed from GPT-4o, and how to make your first call.

by AnyCap

GPT-5.5 is now accessible through the OpenAI API. The model launched publicly on April 23, 2026, and API access opened alongside the consumer rollout — no waitlist, available to all API tiers.

Here is what you need to know to start building.


Pricing

| Token type   | Price per million tokens |
|--------------|--------------------------|
| Input        | $5.00                    |
| Output       | $30.00                   |
| Cached input | $2.50 (50% discount)     |

The output-to-input price ratio (6:1) is higher than GPT-4o's (3:1), reflecting GPT-5.5's longer, more structured outputs: the model generates more tokens per task by default, particularly on agentic and coding work.

Compared to other current frontier models:

| Model               | Input  | Output |
|---------------------|--------|--------|
| GPT-5.5             | $5.00  | $30.00 |
| Claude 4 Opus       | $15.00 | $75.00 |
| DeepSeek V4 (hosted)| ~$0.30 | ~$1.20 |
| Gemini 3.1 Pro      | $3.50  | $10.50 |

At these prices GPT-5.5 is not the most expensive frontier model (Claude 4 Opus costs roughly 2.5-3x more per token), but it is more than an order of magnitude pricier than DeepSeek V4 for high-volume inference.
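At these rates, per-request cost is easy to estimate. A minimal sketch with the prices from the table above hard-coded (`estimate_cost` is an illustrative helper, not part of any SDK):

```python
# Per-million-token prices for GPT-5.5, taken from the pricing table above.
PRICE_PER_MTOK = {"input": 5.00, "output": 30.00, "cached_input": 2.50}

def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the dollar cost of a single GPT-5.5 request."""
    uncached = input_tokens - cached_tokens
    return (
        uncached * PRICE_PER_MTOK["input"]
        + cached_tokens * PRICE_PER_MTOK["cached_input"]
        + output_tokens * PRICE_PER_MTOK["output"]
    ) / 1_000_000

# A 2,000-token prompt with a 1,500-token completion:
# 2000 * $5/1M + 1500 * $30/1M = $0.01 + $0.045 = $0.055
print(f"${estimate_cost(2_000, 1_500):.4f}")
```

Because of the 6:1 output ratio, the completion dominates the bill even when the prompt is larger than the response.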


What's New in GPT-5.5 vs GPT-4o

GPT-5.5 represents a significant capability jump from GPT-4o, particularly in:

Agentic task completion:
Terminal-Bench score of 82.7% — a benchmark measuring real terminal command sequences — compared to GPT-4o's ~61%. In practice, this shows up as more reliable multi-step task execution without mid-task failures.

Software engineering:
SWE-Bench Pro score of 58.6%, up from GPT-4o's ~38%. GPT-5.5 handles real-world GitHub issues at a meaningfully higher success rate, making it viable for autonomous PR generation on moderate-complexity issues.

Instruction following:
Structured output compliance and multi-constraint instruction following are substantially improved. GPT-5.5 is less likely to silently drop constraints in complex system prompts.

Context window:
128K tokens, same as GPT-4o. There is no 1M-token context option at launch.
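Since the window is unchanged, prompts that fit GPT-4o will fit GPT-5.5. If you want a cheap pre-flight check before sending, a rough sketch (the ~4 characters/token figure is a crude English-prose heuristic, not a real tokenizer; use one such as tiktoken for anything precise):

```python
CONTEXT_WINDOW = 128_000  # GPT-5.5's context window, per the section above

def rough_token_count(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, reserved_for_output: int = 4_096) -> bool:
    """Check that a prompt leaves room in the window for the completion."""
    return rough_token_count(prompt) + reserved_for_output <= CONTEXT_WINDOW
```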


Rate Limits by Tier

| Tier                | RPM    | TPM       | Daily token limit |
|---------------------|--------|-----------|-------------------|
| Tier 1 ($5+ spent)  | 500    | 200,000   | 1,000,000         |
| Tier 2 ($50+ spent) | 1,000  | 500,000   | 5,000,000         |
| Tier 3 ($100+ spent)| 2,000  | 1,000,000 | Unlimited         |
| Tier 4 ($250+ spent)| 5,000  | 2,000,000 | Unlimited         |
| Tier 5 (Enterprise) | Custom | Custom    | Custom            |

RPM = requests per minute, TPM = tokens per minute. Figures are estimated from OpenAI's standard tier scaling pattern.

For most development and moderate production workloads, Tier 2 is sufficient. Tier 3+ is relevant for high-volume agentic pipelines where parallel model calls are the norm.
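When you exceed these limits the API returns HTTP 429. The official Python SDK retries some transient failures itself (the client's `max_retries` option), but a hand-rolled sketch makes the pattern explicit (`with_backoff` is an illustrative helper, not part of the SDK):

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` with exponential backoff and jitter.

    `call` is any zero-argument function that makes the API request;
    in the openai SDK a 429 surfaces as openai.RateLimitError.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # narrow to openai.RateLimitError in real code
            if attempt == max_retries - 1:
                raise
            # Double the delay each attempt, plus jitter to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Usage: `response = with_backoff(lambda: client.chat.completions.create(...))`.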


Quick Start

Basic completion

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a Python function that validates email addresses."}
    ],
    max_tokens=1024
)

print(response.choices[0].message.content)
```

Structured output (JSON mode)

```python
response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "user", "content": "Extract the name, email, and company from this text: ..."}
    ],
    response_format={"type": "json_object"},
    max_tokens=512
)
```
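JSON mode guarantees syntactically valid JSON, not that the model used the exact keys your prompt asked for, so parse defensively. A sketch (`parse_contact` and its field list mirror the example prompt above and are illustrative):

```python
import json

EXPECTED_FIELDS = ("name", "email", "company")

def parse_contact(raw: str) -> dict:
    """Parse a JSON-mode completion and check the fields we asked for.

    json_object mode guarantees the string parses, but the model may
    still rename or drop keys, so validate before using the result.
    """
    data = json.loads(raw)
    missing = [f for f in EXPECTED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing expected fields: {missing}")
    return data

# contact = parse_contact(response.choices[0].message.content)
```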

Streaming

```python
stream = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Explain Mixture-of-Experts architecture."}],
    stream=True,
    max_tokens=2048
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Migrating from GPT-4o

Switching model="gpt-4o" to model="gpt-5.5" is a one-line change — the API schema is identical.

Things to watch when migrating:

  1. Output length: GPT-5.5 generates longer responses by default. If you rely on max_tokens to stay within budget, audit your limits — you may need to increase them to avoid truncation, or reduce them if you're optimizing for cost.

  2. Latency: GPT-5.5 is slower end to end than GPT-4o. First-token latency is comparable, but total generation time is higher because outputs are longer, so streaming matters more than it did with 4o.

  3. Cost modeling: At three times GPT-4o's $10/MTok output price, workloads that were cheap on GPT-4o become expensive on 5.5. Benchmark your actual token usage before committing to GPT-5.5 for all use cases; for many tasks, GPT-4o or DeepSeek V4 remains the better cost-quality tradeoff.
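One way to catch the truncation risk from point 1 is to check `finish_reason`, which the API sets to "length" when a completion hits `max_tokens`. A sketch (`completion_or_raise` is an illustrative helper, not part of the SDK):

```python
def completion_or_raise(response):
    """Detect silent truncation after swapping models.

    GPT-5.5's longer default outputs make it more likely to exhaust a
    max_tokens budget tuned for GPT-4o; finish_reason == "length" means
    the completion was cut off rather than finished naturally.
    """
    choice = response.choices[0]
    if choice.finish_reason == "length":
        raise RuntimeError("completion truncated: raise max_tokens or tighten the prompt")
    return choice.message.content
```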


Using GPT-5.5 Through AnyCap

If you want to use GPT-5.5 without managing the OpenAI API directly — or if you want to route between GPT-5.5 and cheaper alternatives like DeepSeek V4 based on task complexity — AnyCap's unified model API handles this through a single endpoint.

```python
import anycap

client = anycap.Client()

# Use GPT-5.5 for complex tasks, DeepSeek V4 for high-volume inference
response = client.generate(
    model="gpt-5.5",          # or "deepseek-v4", "claude-4-sonnet", etc.
    messages=[{"role": "user", "content": "..."}],
    max_tokens=2048
)
```

AnyCap provides unified billing, automatic rate limit handling, and model fallback — useful for production deployments where you're mixing models by task type.
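What routing by task complexity looks like depends on your workload; a toy sketch of the decision function (the thresholds and `pick_model` itself are illustrative, not part of the AnyCap SDK):

```python
def pick_model(prompt: str, *, multi_step: bool = False) -> str:
    """Toy complexity-based router for a unified model endpoint.

    The heuristic (an explicit multi_step flag plus prompt length) is
    illustrative; production routers usually classify by task type.
    """
    if multi_step or len(prompt) > 2_000:
        return "gpt-5.5"      # agentic / coding work
    return "deepseek-v4"      # high-volume commodity inference

# response = client.generate(model=pick_model(task, multi_step=True), ...)
```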


Bottom Line

GPT-5.5 is a meaningful capability upgrade from GPT-4o, particularly for agentic and coding workloads. The $5/$30 pricing is accessible for development and moderate production use, though high-volume inference costs will drive most teams toward DeepSeek V4 or Gemini 3.1 Pro for commodity tasks.

For software engineering agents, automated code review, and complex multi-step instruction following, GPT-5.5 is currently the best option at this price point.

