Claude AI API

The Claude AI API connects your code to Anthropic’s large language models (LLMs) with a single secure HTTPS call. Unlike many LLMs, Claude is built on Constitutional AI: a set of rules that pushes every answer toward being helpful, harmless, and honest. That means fewer jailbreak scares and safer user experiences.


Key numbers to remember

Feature                                 Value (mid-2025)
Context window (max text you can send)  200,000 tokens
Max images per request                  20
Batch jobs per upload                   100,000
Extended-thinking token cap             64,000 (Sonnet 4)

Need exact pricing? See our dedicated pricing guide.


Step 1 Open an Account & Grab Your Key

  1. Create a developer account in the Anthropic Console (regular Claude.ai logins won’t work for API calls).

  2. Generate an API key in Account → API keys.

  3. Store the key safely — environment variables, secret managers, or a .env file outside source control.

Tip: Spin up separate keys for dev, staging, and production using Workspaces. Each workspace gets its own spend cap.
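
Here is what that looks like in practice, a minimal sketch in Python that reads the key from an environment variable instead of hard-coding it (assumes the official anthropic SDK, installed with pip install anthropic):

import os
from anthropic import Anthropic

# Read the key from the environment instead of hard-coding it.
# The SDK also picks up ANTHROPIC_API_KEY automatically if it is set.
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])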


Step 2 Send Your First “Hello Claude” Request

Below is the smallest working call. Replace YOUR_KEY and run it with cURL, Postman, or your favorite HTTP client.

POST https://api.anthropic.com/v1/messages
Headers:
  x-api-key: YOUR_KEY
  content-type: application/json
  anthropic-version: 2023-06-01
Body:
{
  "model": "claude-3-haiku-20240307",
  "max_tokens": 100,
  "messages": [
    { "role": "user", "content": "Hello, Claude API!" }
  ]
}

A successful response comes back as JSON along these lines:

content ➜ “Hi there! How can I help?”
usage ➜ input 11 tokens, output 13 tokens
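
The same call takes a few lines through the official Python SDK, a sketch reusing the client object from Step 1:

# The same "Hello Claude" request via the Python SDK
response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=100,
    messages=[{"role": "user", "content": "Hello, Claude API!"}],
)
print(response.content[0].text)  # the reply text
print(response.usage)            # input/output token counts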

Now you’re live.


Step 3 Pick the Right Model in 60 Seconds

Model      When to Use                        Speed        Typical Cost*
Haiku 3.5  chat routing, quick Q&A            ⚡ Fastest    lowest
Sonnet 4   docs Q&A, coding help, smart chat  🚀 Fast      mid
Opus 4     deep research, multi-step agents   🚗 Moderate  highest

*Exact token prices change—check the price guide.

Rule of thumb

Prototype on Haiku → move heavy-reasoning routes to Sonnet → reserve Opus only for tasks that truly need brainpower.


Step 4 Write Prompts Claude Understands Instantly

  1. Start with the outcome
    “Write a 200-word email announcing our new analytics dashboard to non-tech CEOs.”

  2. Give context first, ask second

    <context>
    Audience: busy CEOs, no jargon.
    Tone: confident but friendly.
    </context>
    <task>
    Draft the email in 200 words.
    </task>
    
  3. Use XML-style tags — Claude is fine-tuned to respect them.

  4. Include one high-quality example

    <example>
    User asked: Rewrite “Profits soared.”  
    Claude answered: “Our profits leapt to record levels.”
    </example>
    
  5. Ask for chain-of-thought (optional)

    <thinking>Think step by step, but show it only to yourself.</thinking>
    <answer>...</answer>
    
  6. Guardrail against hallucination
    End with: “If you’re unsure, reply ‘I don’t know’.”
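
Putting those six habits together, here is a sketch of one assembled prompt sent through the Python SDK (the wording and model choice are illustrative, not prescriptive):

prompt = """<context>
Audience: busy CEOs, no jargon.
Tone: confident but friendly.
</context>
<task>
Draft a 200-word email announcing our new analytics dashboard.
</task>
If you're unsure about any fact, reply "I don't know"."""

response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=400,
    messages=[{"role": "user", "content": prompt}],
)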


Step 5 Unlock Advanced Tricks (Tool Use, Extended Thinking, Caching)

5 A Tool Use in Three Lines

Add a tools array:

"tools": [
  {
    "name": "get_weather",
    "description": "Returns local temperature",
    "input_schema": { "type":"object", "properties":{ "city":{ "type":"string" }}, "required":["city"] }
  }
]

Claude will respond with:

{ "type":"tool_use", "name":"get_weather", "input":{ "city":"Madrid" } }

You run the function, send back tool_result, and Claude finishes the answer.
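
A sketch of that full round trip in Python; lookup_weather is a hypothetical local function you implement yourself:

tools = [{
    "name": "get_weather",
    "description": "Returns local temperature",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

messages = [{"role": "user", "content": "How warm is Madrid right now?"}]
first = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=300,
    tools=tools,
    messages=messages,
)

# Find the tool_use block, run the real function, send back a tool_result
tool_use = next(b for b in first.content if b.type == "tool_use")
temperature = lookup_weather(tool_use.input["city"])  # hypothetical function

messages += [
    {"role": "assistant", "content": first.content},
    {"role": "user", "content": [{
        "type": "tool_result",
        "tool_use_id": tool_use.id,
        "content": str(temperature),
    }]},
]
final = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=300,
    tools=tools,
    messages=messages,
)
print(final.content[0].text)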

5 B Extended Thinking

"thinking": { "type":"enabled", "budget_tokens": 800 }

Claude allocates up to 800 tokens for private reasoning, then trims them before the next turn—so they don’t flood the context window.
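
In the Python SDK this is just another request parameter. Note the model must support extended thinking (e.g. Sonnet 4; the exact model ID below is an assumption), and max_tokens must exceed the budget:

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed Sonnet 4 model ID
    max_tokens=2000,                   # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Plan a 3-step data migration."}],
)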

5 C Prompt Caching

  • Mark a long, reused prompt chunk as cacheable once (5-minute or 1-hour TTL).

  • Resend the same prefix in later calls; the API detects the cache hit automatically.

  • Typical token savings: 90 % on repeated context.

  • Typical latency drop: up to 85 %.
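
A sketch in Python, assuming LONG_REFERENCE_DOC holds the big chunk of context you reuse across calls:

response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=200,
    system=[{
        "type": "text",
        "text": LONG_REFERENCE_DOC,  # the long chunk reused across calls
        "cache_control": {"type": "ephemeral"},  # mark this prefix cacheable
    }],
    messages=[{"role": "user", "content": "Summarize section 2."}],
)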


Step 6 Keep It Fast & Affordable

✅ Stream every reply ("stream": true): users see the first text in under 200 ms (sketch below).
✅ Compact or clear history after each topic change.
✅ Use Message Batches for nightly jobs → 50 % token discount.
✅ Monitor input_tokens + output_tokens; alert if spikes exceed 20 %.
✅ Back off on 429s with exponential jitter (the official SDKs already retry twice by default).
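
A minimal streaming sketch with the Python SDK:

# Stream the reply token by token instead of waiting for the full message
with client.messages.stream(
    model="claude-3-haiku-20240307",
    max_tokens=300,
    messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
) as stream:
    for text in stream.text_stream:  # yields text deltas as they arrive
        print(text, end="", flush=True)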


Step 7 Handle Errors Without Sweat

Code  Meaning          Fix in Plain English
400   Bad request      Check JSON keys and role order
401   Wrong key        Use a fresh API key
403   No model access  Enable the model in the Console
413   Body too big     Split the batch or use the Files API
429   Rate limit       Wait, then retry with back-off
529   Service busy     Retry after a few seconds

Always log the request-id response header; it’s gold for support tickets.
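
A sketch of that back-off pattern in Python; the exception class names come from the official anthropic SDK (InternalServerError covers 5xx/529 responses):

import random
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def call_with_backoff(messages, retries=5):
    for attempt in range(retries):
        try:
            return client.messages.create(
                model="claude-3-haiku-20240307",
                max_tokens=100,
                messages=messages,
            )
        except (anthropic.RateLimitError, anthropic.InternalServerError):
            # 429s and 529s: wait exponentially longer, plus random jitter
            time.sleep(min(2 ** attempt + random.random(), 30))
    raise RuntimeError("gave up after repeated 429/529 responses")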


Step 8 Deploy on AWS or GCP (If You Want To)

Path         Pros                                     Small Gotchas
Direct API   Newest models & betas arrive here first  You manage auth headers
AWS Bedrock  IAM, unified AWS billing                 Slightly different request wrapper
GCP Vertex   Fits into Vertex pipelines               Uses Google-style auth tokens

Anthropic’s SDKs ship Bedrock and Vertex clients, so switching back and forth is mostly swapping the import.
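
A sketch of what that swap looks like with the Python SDK’s dedicated clients (the region and project values are placeholders for your own):

from anthropic import Anthropic, AnthropicBedrock, AnthropicVertex

direct = Anthropic()                                # api.anthropic.com
bedrock = AnthropicBedrock(aws_region="us-east-1")  # auth via AWS IAM credentials
vertex = AnthropicVertex(project_id="my-gcp-project",
                         region="us-east5")         # auth via Google tokens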


Quick-Reference Cheat Sheet

Need to Do This         Send or Set This
Stream output           "stream": true
Limit tokens            "max_tokens": 1024
Force a JSON answer     <answer format="json"> in the prompt
Make Claude think hard  "thinking": { "type": "enabled", "budget_tokens": 1024 }
Cut latency             Streaming + prompt caching
Slash costs             Prototype on Haiku → batch offline tasks

Final Word

Mastering the Claude AI API boils down to three habits:

  1. Clear, tagged prompts—tell Claude exactly what you want.

  2. Smart cost levers—streaming, caching, batching.

  3. Solid error hygiene—log request-ids, back-off on 429s.

Get those right and you’ll launch safer, faster, and cheaper AI features than the competition—without wrestling with endless config files. Enjoy building!