
Qwen3.7-Max API: How to Call Qwen 3.7 Max
The Qwen3.7-Max API is now documented through the Qwen release materials and Qwen Cloud model card. If you are searching for qwen-3.7 API, qwen3.7 API, or qwen 3.7 API, the important first detail is the model name.
For Model Studio compatible-mode calls, the release example uses:

```
qwen3.7-max
```

The Qwen Cloud model card also lists a dated snapshot:

```
qwen3.7-max-2026-05-20
```

Use the stable alias when you want the current route. Use the dated ID when your provider exposes it and you need reproducibility.
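One way to encode the alias-versus-snapshot choice is a small configuration helper. This is only a sketch: the `QWEN_PIN_SNAPSHOT` environment variable and the `resolve_model_id` function are hypothetical names invented here, not part of any Qwen SDK.

```python
import os

STABLE_ALIAS = "qwen3.7-max"
DATED_SNAPSHOT = "qwen3.7-max-2026-05-20"


def resolve_model_id(env=None):
    """Return the dated snapshot when pinning is requested, else the stable alias."""
    env = os.environ if env is None else env
    # QWEN_PIN_SNAPSHOT is a hypothetical flag made up for this sketch.
    pin = env.get("QWEN_PIN_SNAPSHOT", "").strip().lower() in {"1", "true", "yes"}
    return DATED_SNAPSHOT if pin else STABLE_ALIAS
```

Keeping the decision in one function means a staging environment can follow the alias while production pins the snapshot, without touching call sites.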
Try the model first on the Qwen3.7-Max page.
Official Access Paths
The first-party path is Alibaba Cloud Model Studio. The official Qwen3.7-Max release shows OpenAI-compatible chat completions, responses APIs, and an Anthropic-compatible interface for agent tools.
Common compatible-mode base URLs:
| Region | Base URL |
|---|---|
| Beijing | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| Singapore | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 |
| US Virginia | https://dashscope-us.aliyuncs.com/compatible-mode/v1 |
The Qwen Cloud model card also shows a DashScope SDK example using:

```
https://dashscope-intl.aliyuncs.com/api/v1
```

For most app integrations, the OpenAI-compatible endpoint is the easiest migration path.
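The regional compatible-mode URLs listed in this section can be kept in a single lookup so the region becomes a deployment setting rather than a hard-coded string. The sketch below uses only the three URLs from the table; the region keys and the `base_url_for` helper are hypothetical names chosen for illustration.

```python
COMPATIBLE_MODE_BASE_URLS = {
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
}


def base_url_for(region):
    """Map a region name such as 'US Virginia' to its compatible-mode base URL."""
    key = region.strip().lower().replace(" ", "-")
    try:
        return COMPATIBLE_MODE_BASE_URLS[key]
    except KeyError:
        known = ", ".join(sorted(COMPATIBLE_MODE_BASE_URLS))
        raise ValueError(f"Unknown region {region!r}; expected one of: {known}") from None
```

Failing loudly on an unknown region is a deliberate choice here: silently defaulting to one endpoint could send traffic to a region you did not intend.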
Minimal Python Example
```python
import os

from openai import OpenAI

# The API key comes from the environment; the base URL falls back to the
# Singapore compatible-mode endpoint when DASHSCOPE_BASE_URL is unset.
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url=os.environ.get(
        "DASHSCOPE_BASE_URL",
        "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    ),
)

completion = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function to merge two sorted linked lists.",
        }
    ],
    # extra_body forwards provider-specific fields the OpenAI SDK does not model.
    extra_body={
        "enable_thinking": True,
    },
    stream=True,
)

# Print the answer text as it streams in; chunks without choices are skipped.
for chunk in completion:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            print(delta.content, end="")
```

This is the cleanest qwen 3.7 API shape if your existing code already uses the OpenAI SDK.
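With thinking enabled, a streamed response may carry reasoning text separately from the final answer. The sketch below assumes the reasoning arrives in a `reasoning_content` attribute on each delta, which the example above does not show, so verify the field name against your provider's actual responses. `split_stream` and `fake_chunk` are hypothetical helpers; `fake_chunk` only builds stand-in objects for offline testing.

```python
from types import SimpleNamespace


def split_stream(chunks):
    """Collect reasoning text and answer text from streamed chunks separately."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta
        # ASSUMPTION: reasoning text is exposed as `delta.reasoning_content`.
        reasoning = getattr(delta, "reasoning_content", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)


def fake_chunk(**delta_fields):
    """Build a stand-in chunk for offline tests (not an SDK type)."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

Because the function only reads attributes with `getattr`, it can be passed the `completion` iterator from a real streaming call or a list of `fake_chunk` objects in a unit test.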
Thinking Mode and preserve_thinking
Qwen3.7-Max is positioned for agentic tasks, so thinking mode matters. The official example enables thinking through:
```python
extra_body={"enable_thinking": True}
```

The release also describes preserve_thinking, which keeps thinking content from preceding turns in messages. That is useful for long agent runs where the model needs to keep track of prior reasoning, tool outcomes, and next-step strategy.
Use it carefully. Preserving extra thinking content can improve continuity, but it also increases token usage. For short chat, leave it off. For multi-step qwen3.7 coding agents, test it directly.
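A per-workflow toggle keeps the extra token cost confined to the runs that benefit from it. The sketch below assumes `preserve_thinking` is passed in `extra_body` as a sibling of `enable_thinking`; this section confirms only the `enable_thinking` form, so check the release documentation for the exact placement. `thinking_options` is a hypothetical helper.

```python
def thinking_options(multi_step_agent, thinking=True):
    """Build the extra_body dict for a chat completion request."""
    body = {"enable_thinking": thinking}
    # ASSUMPTION: preserve_thinking is a boolean flag that sits next to
    # enable_thinking in extra_body.
    if thinking and multi_step_agent:
        body["preserve_thinking"] = True
    return body
```

A call site would then pass `extra_body=thinking_options(multi_step_agent=True)` for agent runs and `thinking_options(multi_step_agent=False)` for short chat.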
Claude Code and Other Agent Harnesses
Qwen APIs also support an Anthropic-compatible route. The official release shows this shape for Claude Code:
```bash
export ANTHROPIC_MODEL="qwen3.7-max"
export ANTHROPIC_SMALL_FAST_MODEL="qwen3.7-max"
export ANTHROPIC_BASE_URL=https://dashscope-intl.aliyuncs.com/apps/anthropic
export ANTHROPIC_AUTH_TOKEN=<your_api_key>
```

That is important because Qwen 3.7 Max is meant to run inside coding assistants and agent scaffolds, not only inside direct chat completion calls.
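If you launch the harness from a script rather than an interactive shell, the same four variables can be assembled in Python. This is a sketch under assumptions: `claude_code_env` is a hypothetical helper, and it only builds the environment; it does not verify that the endpoint accepts the key.

```python
import os


def claude_code_env(api_key, model="qwen3.7-max",
                    base_url="https://dashscope-intl.aliyuncs.com/apps/anthropic"):
    """Return a copy of the current environment with the four variables set."""
    env = dict(os.environ)
    env.update({
        "ANTHROPIC_MODEL": model,
        "ANTHROPIC_SMALL_FAST_MODEL": model,
        "ANTHROPIC_BASE_URL": base_url,
        "ANTHROPIC_AUTH_TOKEN": api_key,
    })
    return env
```

Assuming the harness executable is on your `PATH` under the name `claude`, it could then be started with `subprocess.run(["claude"], env=claude_code_env(os.environ["DASHSCOPE_API_KEY"]))`, which leaves the parent shell's environment untouched.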
Pricing and Context
The Qwen Cloud model card lists Qwen3.7-Max with:
| Field | Value |
|---|---|
| Context | 1M tokens |
| Max input | 991.80K tokens |
| Max output | 65.53K tokens |
| Input price | $2.50 per 1M tokens |
| Output price | $7.50 per 1M tokens |
| RPM | 600 |
| TPM | 1M |
Always confirm pricing in your actual provider console before committing production traffic. Providers can change price, quota, and region availability independently.
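For a rough budget check, the listed prices translate into a simple per-request estimate. The sketch below assumes flat per-token rates exactly as shown in the table and does not model tiered, cached, or discounted pricing; `estimate_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal

# Prices copied from the table above; confirm them in your provider console.
INPUT_USD_PER_M = Decimal("2.50")
OUTPUT_USD_PER_M = Decimal("7.50")
MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request in USD at flat per-token rates."""
    total = (Decimal(input_tokens) * INPUT_USD_PER_M
             + Decimal(output_tokens) * OUTPUT_USD_PER_M)
    return total / MILLION
```

For example, a request with 200,000 input tokens and 4,000 output tokens comes to 0.50 + 0.03 = 0.53 USD under these assumptions. `Decimal` is used so that repeated additions across many requests do not accumulate floating-point error.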
Integration Tips
- Start with `qwen3.7-max` in a staging environment.
- Use streaming for coding and agent UX.
- Set `max_tokens` intentionally instead of relying on the maximum output size.
- Log tool calls and final answers separately.
- Test `enable_thinking` and `preserve_thinking` only on workflows where they are likely to help.
- Compare qwen-3.7 against Qwen3.6-Plus on the same prompts before switching all traffic.
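The side-by-side comparison in the last tip can be sketched as a small harness. Everything here is illustrative: `compare_models` is a hypothetical helper, `call_model` is a callable you supply (for instance a thin wrapper around your chat completion call), and the `qwen3.6-plus` model ID is an assumed spelling that you should replace with the ID your provider actually lists.

```python
def compare_models(prompts, call_model, models=("qwen3.7-max", "qwen3.6-plus")):
    """Run each prompt through every model and collect the outputs side by side."""
    results = []
    for prompt in prompts:
        row = {"prompt": prompt}
        for model in models:
            # call_model(model, prompt) is injected so the harness stays
            # independent of any particular SDK.
            row[model] = call_model(model, prompt)
        results.append(row)
    return results
```

Injecting `call_model` also makes the harness testable offline with a stub before any paid traffic is sent.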
Bottom Line
The Qwen3.7-Max API is no longer just a watchlist item. The official materials now give a model alias, regional compatible-mode endpoints, thinking mode, preserve_thinking, and agent harness examples.
For production work, treat a Qwen3.7-Max API integration like any other hosted model migration: pin the model where possible, validate costs, test long-context behavior, and keep fallback routing until your own workloads pass.
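Fallback routing can start as a thin wrapper around whatever function makes the call. The sketch below is a minimal illustration, not a production router: `complete_with_fallback` is a hypothetical helper, `call_model` is a callable you supply, and the broad `except` would normally be narrowed to the specific errors your SDK raises.

```python
import logging

logger = logging.getLogger(__name__)


def complete_with_fallback(prompt, call_model, primary, fallback):
    """Try the primary model; if it raises, retry once on the fallback model.

    Returns a (model_used, result) tuple so callers can log which route served.
    """
    try:
        return primary, call_model(primary, prompt)
    except Exception as exc:  # deliberately broad for a sketch
        logger.warning("primary model %r failed (%r); using %r", primary, exc, fallback)
        return fallback, call_model(fallback, prompt)
```

Returning the model name alongside the result makes it straightforward to measure how often the fallback is used while you validate your own workloads.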
Related: Qwen3.7-Max benchmark and Qwen3.7-Max context window.

