The Claude API connects your code to Anthropic’s large language models (LLMs) through a single authenticated HTTPS endpoint. Claude is trained with Constitutional AI, a set of written principles that steer every answer toward being helpful, harmless, and honest. In practice that tends to mean fewer successful jailbreaks and safer user experiences.

Key numbers to remember
Feature | Value (mid-2025) |
---|---|
Context window (max text you can send) | 200 000 tokens |
Max images per request | 20 |
Batch jobs per upload | 100 000 |
Extended-thinking token cap | 64 000 (Sonnet 4) |
Need exact pricing? See our dedicated pricing page.
Step 1 Open an Account & Grab Your Key
- Create a developer account in the Anthropic Console (regular Claude.ai logins won’t work for API calls).
- Generate an API key in Account → API keys.
- Store the key safely: environment variables, secret managers, or a .env file outside source control.
Tip: Create separate keys for dev, staging, and production using Workspaces. Each workspace can have its own spend limit.
Step 2 Send Your First “Hello Claude” Request
Below is the smallest working call. Replace YOUR_KEY and run it with cURL, Postman, or your favorite language.
POST https://api.anthropic.com/v1/messages
Headers:
  x-api-key: YOUR_KEY
  content-type: application/json
  anthropic-version: 2023-06-01
Body:
  {
    "model": "claude-3-haiku-20240307",
    "max_tokens": 100,
    "messages": [
      { "role": "user", "content": "Hello, Claude API!" }
    ]
  }
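If you prefer Python to cURL, the same request can be sketched with only the standard library. This is a minimal illustration, not the official SDK: the helper names `build_request` and `send` are made up for this example, and `send` needs network access plus a real key.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request described above."""
    body = {
        "model": "claude-3-haiku-20240307",
        "max_tokens": 100,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "content-type": "application/json",
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (requires network and a valid key)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A typical call would be `send(build_request(os.environ["ANTHROPIC_API_KEY"], "Hello, Claude API!"))`, where the environment variable name is an assumption about how you stored the key in Step 1.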
Successful JSON comes back like:
content ➜ “Hi there! How can I help?”
usage ➜ input 11 tokens, output 13 tokens
Now you’re live.
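To pull the text and token counts out of the reply in code, something like the sketch below works. The sample payload is hand-written to mirror the values quoted above (a real reply carries more fields), and the helper names are illustrative.

```python
import json

# Hand-written sample that mirrors the reply described above; not captured from a live call.
SAMPLE_REPLY = json.loads("""
{
  "role": "assistant",
  "content": [{"type": "text", "text": "Hi there! How can I help?"}],
  "usage": {"input_tokens": 11, "output_tokens": 13}
}
""")


def extract_text(reply: dict) -> str:
    """Join every text block in the reply's content list."""
    return "".join(
        block["text"] for block in reply.get("content", []) if block.get("type") == "text"
    )


def total_tokens(reply: dict) -> int:
    """Add input and output tokens so you can track spend per call."""
    usage = reply.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
```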
Step 3 Pick the Right Model in 60 Seconds
Model | When to Use | Speed | Typical Cost* |
---|---|---|---|
Haiku 3.5 | chat routing, quick Q&A | ⚡ Fastest | lowest |
Sonnet 4 | docs Q&A, coding help, smart chat | 🚀 Fast | mid |
Opus 4 | deep research, multi-step agents | 🚗 Moderate | highest |
*Exact token prices change—check the price guide.
Rule of thumb
Prototype on Haiku → move heavy-reasoning routes to Sonnet → reserve Opus only for tasks that truly need brainpower.
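One way to encode that rule of thumb is a small routing table. The task labels below are hypothetical, and the model IDs are assumptions based on the mid-2025 lineup; check them against the model list in the Console before relying on them.

```python
# Assumed model IDs; verify against the current model list before use.
MODEL_IDS = {
    "haiku": "claude-3-5-haiku-20241022",
    "sonnet": "claude-sonnet-4-20250514",
    "opus": "claude-opus-4-20250514",
}

# Hypothetical task labels mapped to the tiers from the table above.
ROUTES = {
    "chat_routing": "haiku",
    "quick_qa": "haiku",
    "docs_qa": "sonnet",
    "coding_help": "sonnet",
    "deep_research": "opus",
    "multi_step_agent": "opus",
}


def pick_model(task_type: str) -> str:
    """Return a model ID for the task, defaulting to the cheapest tier."""
    return MODEL_IDS[ROUTES.get(task_type, "haiku")]
```

Defaulting unknown tasks to Haiku follows the "prototype on Haiku" advice: a route only gets a pricier model once you add it to the table on purpose.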
Step 4 Write Prompts Claude Understands Instantly
- Start with the outcome: “Write a 200-word email announcing our new analytics dashboard to non-tech CEOs.”
- Give context first, ask second: <context> Audience: busy CEOs, no jargon. Tone: confident but friendly. </context> <task> Draft the email in 200 words. </task>
- Use XML-style tags. Claude handles them reliably for separating instructions, context, and examples.
- Include one high-quality example: <example> User asked: Rewrite “Profits soared.” Claude answered: “Our profits leapt to record levels.” </example>
- Ask for chain-of-thought (optional): “Think step by step inside <thinking> tags, then give only the final result inside <answer> tags.” You can strip the <thinking> part before showing the reply.
- Guard against hallucination: end with “If you’re unsure, reply ‘I don’t know’.”
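The tips above can be combined in a small prompt builder. This is only a sketch: the function name and parameters are invented for illustration, and the tag names are the ones used in this step rather than anything the API requires.

```python
from typing import Optional


def build_prompt(context: str, task: str, example: Optional[str] = None) -> str:
    """Assemble a tagged prompt: context first, optional example, task, then a guardrail."""
    parts = [f"<context>\n{context}\n</context>"]
    if example:
        parts.append(f"<example>\n{example}\n</example>")
    parts.append(f"<task>\n{task}\n</task>")
    # Guardrail from the last tip: give the model an explicit way out.
    parts.append("If you're unsure, reply 'I don't know'.")
    return "\n\n".join(parts)


prompt = build_prompt(
    context="Audience: busy CEOs, no jargon. Tone: confident but friendly.",
    task="Draft the email in 200 words.",
)
```

The resulting string goes into the "content" field of the user message.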
Step 5 Unlock Advanced Tricks (Tool Use, Extended Thinking, Caching)
5 A Tool Use in Three Lines
Add a tools array:
"tools": [
{
"name": "get_weather",
"description": "Returns local temperature",
"input_schema": { "type":"object", "properties":{ "city":{ "type":"string" }}, "required":["city"] }
}
]
Claude will respond with:
{ "type":"tool_use", "name":"get_weather", "input":{ "city":"Madrid" } }
You run the function, send back a tool_result block (it must echo the id of the tool_use block, which the shortened example above omits), and Claude finishes the answer.
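The "run the function and send back tool_result" step might look like the sketch below. `get_weather` is a stub with a made-up return value, the `toolu_example` id is a placeholder (real ids are generated by the API), and no network call is made here.

```python
def get_weather(city: str) -> str:
    """Hypothetical stub; a real version would query a weather service."""
    return f"21 C in {city}"


# Registry of local functions Claude is allowed to call, keyed by tool name.
LOCAL_TOOLS = {"get_weather": get_weather}


def tool_results_message(assistant_content: list) -> dict:
    """Run each requested tool and wrap the outputs in a user message."""
    results = []
    for block in assistant_content:
        if block.get("type") != "tool_use":
            continue  # skip text blocks
        output = LOCAL_TOOLS[block["name"]](**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must match the id Claude sent
            "content": str(output),
        })
    return {"role": "user", "content": results}


# Placeholder assistant turn shaped like the tool_use reply above.
assistant_turn = [
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather", "input": {"city": "Madrid"}}
]
followup = tool_results_message(assistant_turn)
```

Append the assistant turn and `followup` to the messages list and call the API again to get the final answer.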
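5 B Extended Thinking
"thinking": { "type":"enabled", "budget_tokens": 1024 }
Claude can spend up to 1,024 tokens on private reasoning. That is the smallest budget the API accepts, and the budget must stay below max_tokens. Thinking blocks from earlier turns are stripped before the next turn, so they don’t flood the context window.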
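A small guard can catch invalid budgets before the request leaves your code. The sketch assumes the documented constraints that budget_tokens must be at least 1,024 and smaller than max_tokens; the helper name is illustrative.

```python
MIN_THINKING_BUDGET = 1024  # documented minimum for budget_tokens


def with_thinking(body: dict, budget_tokens: int) -> dict:
    """Return a copy of the request body with extended thinking enabled."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= body["max_tokens"]:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {**body, "thinking": {"type": "enabled", "budget_tokens": budget_tokens}}


base_body = {
    "model": "claude-sonnet-4-20250514",  # assumed model ID; verify before use
    "max_tokens": 4096,
    "messages": [{"role": "user", "content": "Plan a three-step data migration."}],
}
thinking_body = with_thinking(base_body, 2048)
```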
5 C Prompt Caching
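- Mark a long, stable prompt prefix with cache_control; the first call writes it to the cache (5-minute lifetime by default, 1-hour optional).
- Resend the identical prefix on later calls. There is no separate cache handle: the API matches the prefix and reads it from the cache.
- Typical savings: cached input tokens are billed at roughly 10 % of the normal input price, so up to 90 % off repeated context.
- Typical latency drop: up to 85 % on long prompts.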
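In request terms, caching means attaching a cache_control marker to the stable part of the prompt and sending that part unchanged on every call. A minimal sketch, with a hypothetical helper name and placeholder handbook text:

```python
def build_cached_body(long_context: str, question: str) -> dict:
    """Put the reusable context in a cacheable system block; only the question varies."""
    return {
        "model": "claude-3-haiku-20240307",
        "max_tokens": 300,
        "system": [
            {
                "type": "text",
                "text": long_context,
                # Marks the end of the prefix to cache (default short-lived cache).
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }


HANDBOOK = "…full text of a long, rarely changing document…"  # placeholder content
first = build_cached_body(HANDBOOK, "What is the refund policy?")
second = build_cached_body(HANDBOOK, "Who approves travel expenses?")
```

Because `first` and `second` share a byte-identical system block, the second call can be served from the cache as long as it arrives before the entry expires. Very short prefixes are not cached, so this only pays off for long contexts.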
Step 6 Keep It Fast & Affordable
✅ Stream every reply ("stream": true) so users see the first text right away, often within a few hundred milliseconds, instead of waiting for the full answer.
✅ Compact or clear history after each topic change.
✅ Use Message Batches for nightly jobs → 50 % discount on tokens.
✅ Monitor input_tokens + output_tokens; alert if usage jumps more than 20 % above your baseline.
✅ Back off on 429s using exponential delays with jitter (the SDKs already retry twice by default).
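If you call the API without an SDK, the back-off schedule has to be written by hand. Below is one common variant, "full jitter", where each wait is a random fraction of an exponentially growing ceiling. The base, cap, and retry count are arbitrary example values, not recommendations from Anthropic.

```python
import random
from typing import Callable, Iterator


def backoff_delays(
    max_retries: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield one wait time (in seconds) per retry attempt."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)  # 0.5, 1, 2, 4, ... capped at `cap`
        yield rng() * ceiling                    # full jitter: anywhere in [0, ceiling)


# Usage: sleep for each delay between retries of a rate-limited call.
delays = list(backoff_delays())
```

The random factor keeps many clients that hit the limit at the same moment from retrying in lockstep.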
Step 7 Handle Errors Without Sweat
Code | Meaning | Fix in Plain English |
---|---|---|
400 | Bad request | Check JSON keys and role order |
401 | Wrong key | Use a fresh API key |
403 | No model access | Enable model in Console |
413 | Body too big | Split batch or use Files API |
429 | Rate limit | Wait, then retry with back-off |
529 | Service busy | Retry after a few seconds |
Always log the request-id header; it’s gold for support tickets.
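The table translates into a small triage helper. This sketch treats only 429 and 529 as retryable, mirrors the fixes listed above, and pulls out the request-id header for logging; the function name and return shape are illustrative.

```python
RETRYABLE_STATUS = {429, 529}

# Plain-English fixes copied from the table above.
FIXES = {
    400: "Check JSON keys and role order",
    401: "Use a fresh API key",
    403: "Enable model in Console",
    413: "Split batch or use Files API",
    429: "Wait, then retry with back-off",
    529: "Retry after a few seconds",
}


def triage(status: int, headers: dict) -> dict:
    """Decide whether to retry and collect what a support ticket needs."""
    lowered = {name.lower(): value for name, value in headers.items()}  # header names are case-insensitive
    return {
        "retry": status in RETRYABLE_STATUS,
        "fix": FIXES.get(status, "Unlisted status; inspect the response body"),
        "request_id": lowered.get("request-id"),
    }
```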
Step 8 Deploy on AWS or GCP (If You Want To)
Path | Pros | Small Gotchas |
---|---|---|
Direct API | Newest models & betas arrive here first | You manage auth headers |
AWS Bedrock | IAM, unified AWS billing | Slightly different request wrapper |
GCP Vertex | Fits into Vertex pipelines | Uses Google-style auth tokens |
Anthropic’s SDKs ship Bedrock and Vertex clients, so switching back and forth is mostly swapping the import.
Quick-Reference Cheat Sheet
Need to Do This | Send or Set This |
---|---|
Stream output | "stream": true |
Limit tokens | "max_tokens": 1024 |
Get a JSON answer | Ask for JSON only inside <answer> tags in the prompt, then validate the output |
Make Claude think hard | "thinking": { "type":"enabled", "budget_tokens": 1024 } (1024 is the minimum budget) |
Cut latency | Stream + prompt caching |
Slash costs | Prototype on Haiku → batch offline tasks |
Final Word
Mastering the Claude AI API boils down to three habits:
- Clear, tagged prompts: tell Claude exactly what you want.
- Smart cost levers: streaming, caching, batching.
- Solid error hygiene: log request-ids, back off on 429s.
Get those right and you’ll launch safer, faster, and cheaper AI features than the competition—without wrestling with endless config files. Enjoy building!