Qwen3.6-Plus
Current Qwen 3.6 hosted release for agentic coding, stronger tool use, sharper multimodal reasoning, and a 1M default context window.
Try the latest Qwen AI models right in your browser. Ask questions, write code, search the web, and think step-by-step — all in one place.
Starter prompts
Try the models in the browser. Each model page summarizes the key specs, benchmarks, and use cases at a glance.
Each Qwen model makes a different trade-off between speed, cost, context length, and reasoning depth. Open a model page to compare benchmarks and use cases, or jump straight into chat.
Current Qwen 3.6 hosted release for agentic coding, stronger tool use, sharper multimodal reasoning, and a 1M default context window.
Fast, lightweight dense model for everyday drafting, QA, and simple coding tasks.
Balanced dense model for longer prompts, analysis, and general-purpose chat.
Compact MoE model for reasoning-heavy conversations and structured output.
Large MoE model for multi-step reasoning, planning, and detailed analysis.
Flagship MoE model for the most demanding reasoning and long-form tasks.
Hosted fast path that Alibaba Cloud maps to the Qwen3.5-35B-A3B base model family.
Hosted all-rounder that Alibaba Cloud maps to the Qwen3.5-397B-A17B base model family.
Public benchmark scores across the Qwen model family. Hosted models (Flash, Plus) are labeled with the open-weight base model they reference.
Light dense model for quick prompts and lightweight coding.
Balanced dense model with better reasoning and coding depth.
Compact MoE model, also the base model behind Qwen3.5-Flash.
Hosted version built on Qwen3.5-35B-A3B with additional tooling and a 1M context window.
Scores reference the Qwen3.5-35B-A3B base model.
Mid-tier MoE model for deeper reasoning and agent tasks.
Flagship open-weight Qwen3.5 model, also the base model behind Qwen3.5-Plus.
Hosted version built on Qwen3.5-397B-A17B with additional tooling and a 1M context window.
Scores reference the Qwen3.5-397B-A17B base model.
Current Qwen 3.6 hosted release with agentic coding, stronger tool use, and multimodal reasoning.
1M default context window with preserve_thinking support.
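For a sense of how that surfaces in code, here is a minimal sketch of a chat request against an OpenAI-compatible endpoint. The base URL, model ID, and the way preserve_thinking is passed are assumptions for illustration; confirm them against the Qwen3.6-Plus API guide.

```python
# Minimal sketch: calling Qwen3.6-Plus through an OpenAI-compatible endpoint.
# The base_url, model ID, and preserve_thinking placement are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="qwen3.6-plus",  # hypothetical model ID
    messages=[{"role": "user", "content": "Plan a refactor of this module step by step."}],
    extra_body={"preserve_thinking": True},  # assumed parameter name, from the card above
)
print(response.choices[0].message.content)
```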
Qwen 3.5 and Qwen 3.6 together cover a wide range of tasks — from lightweight local models to hosted agents with 1M context.
Qwen 3.5 offers dense models (9B, 27B) for low-latency tasks and MoE models (35B-A3B, 122B-A10B, 397B-A17B) for deeper reasoning. Pick the trade-off that fits your workload.
Open Qwen 3.5 models have a native context of 262K tokens. Hosted models like Flash, Plus, and Qwen3.6-Plus extend this to 1M by default.
Enable step-by-step reasoning for complex tasks — debugging, comparisons, multi-step plans — where a quick answer usually falls short.
Qwen3.6-Plus adds agentic coding, stronger tool calling, and multimodal reasoning on top of the Qwen 3.5 foundation. Best for multi-step tasks that need the model to plan and act.
Qwen 3.5 open-weight models use Apache 2.0 and run anywhere. Hosted models (Flash, Plus, Qwen3.6-Plus) add built-in tools and managed infrastructure.
Strong performance across 100+ languages. Qwen3.6-Plus also handles images and documents alongside text in the same conversation.
Common questions about the Qwen model family and how to use them on this site.
Qwen 3.5 is Alibaba Cloud's open-weight model family with dense and MoE variants from 9B to 397B parameters. This site also hosts Flash, Plus, and the newer Qwen3.6-Plus.
For everyday tasks, try Qwen3.5-9B or Flash for speed. For harder reasoning or coding, use Qwen3.5-122B-A10B or 397B-A17B. For the strongest all-rounder, pick Qwen3.5-Plus or Qwen3.6-Plus.
Yes. Every account gets 5 free credits to try the site. When those credits run out, you can top up or subscribe to keep using higher-cost models, web search, and thinking mode. Open-weight Qwen 3.5 releases remain Apache 2.0 if you want to self-host them.
Qwen 3.6 builds on the Qwen 3.5 foundation. Qwen3.6-Plus adds agentic coding, stronger tool calling, multimodal reasoning, and a 1M default context window. Qwen 3.5 has more model sizes and open weights.
Yes. The open-weight Qwen 3.5 models work with Ollama, vLLM, llama.cpp, and Hugging Face Transformers. Hardware requirements depend on the model size and quantization level.
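As a quick illustration, a minimal Transformers setup might look like the sketch below. The model ID is an assumption; substitute the actual repository name from the Hugging Face model card you want to run.

```python
# Minimal local-inference sketch with Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.5-9B"  # hypothetical repo name -- check the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain MoE routing in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```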
Open Qwen 3.5 models have a native context of 262K tokens, extensible to roughly 1M with compatible frameworks. Hosted models (Flash, Plus, Qwen3.6-Plus) ship with a 1M context by default.
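A rough vLLM sketch for serving an open model at its native window is shown below; the model ID is hypothetical, and pushing past 262K depends on your framework's long-context support.

```python
# Minimal sketch: capping the served context length with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.5-35B-A3B",  # hypothetical repo name
    max_model_len=262144,          # cap at the native 262K window
)
params = SamplingParams(max_tokens=256)
outputs = llm.generate(["Summarize the following report: ..."], params)
print(outputs[0].outputs[0].text)
```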
Thinking mode lets the model reason step-by-step before answering. It improves accuracy on complex tasks like debugging, math, and multi-step analysis. You can toggle it on or off in the chat.
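If you self-host instead of using the chat toggle, Qwen 3 chat templates expose an enable_thinking flag; it is a reasonable assumption (not confirmed here) that Qwen 3.5 keeps the same convention. A minimal sketch:

```python
# Minimal sketch: toggling step-by-step thinking in the chat template.
# The repo name is hypothetical; enable_thinking follows the Qwen 3 convention.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B")  # hypothetical repo name

messages = [{"role": "user", "content": "Why does this recursion overflow the stack?"}]

# Thinking on: the template reserves a reasoning block before the final answer.
prompt_thinking = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False, enable_thinking=True
)

# Thinking off: the model is prompted to answer directly.
prompt_direct = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False, enable_thinking=False
)
```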
Dense models (9B, 27B) use all parameters for every token, which makes them simpler and more predictable. MoE models (35B-A3B, 122B-A10B, 397B-A17B) activate only a subset of experts per token; the "A" number is the active parameter count, so 35B-A3B has 35B total parameters but only about 3B active per token. The result is stronger reasoning at lower compute per token.
Yes. All Qwen 3.5 models handle coding tasks. For simple code, Qwen3.5-9B or Flash work well. For complex multi-file projects, Qwen3.5-Plus or Qwen3.6-Plus are better choices.
On this site, you can enable web search in the chat to let the model pull in live information before answering. This is useful for current events, documentation lookups, and fact-checking.
Guides, comparisons, and setup notes for Qwen 3.5, Qwen 3.6, Ollama, Hugging Face, vLLM, GGUF, and more.

How to access Qwen 3.5 through APIs including OpenRouter and Alibaba Cloud DashScope. Covers API keys, Python and curl examples, pricing, and the model IDs you need for integration.

A breakdown of Qwen 3.5 benchmark results across reasoning, coding, math, and multilingual tasks — with comparisons to GPT-4o, Claude, and Llama.

Which Qwen 3.5 model is best for coding? A practical guide to code generation, debugging, and IDE workflows with the Qwen 3.5 family.

How to download and run Qwen 3.5 GGUF files for local inference with llama.cpp. Covers quantization levels, where to find GGUF files, setup instructions, and quality vs. performance trade-offs.

How to find, download, and run Qwen 3.5 models from Hugging Face. Covers model cards, Transformers integration, inference examples, and how the variants compare.

Everything you need to run Qwen 3.5 locally: hardware requirements by model size, setup with Ollama, vLLM, llama.cpp, and Transformers, plus performance optimization tips.

An honest guide to uncensored Qwen 3.5 models: what "uncensored" really means, where community fine-tunes come from, safety considerations, and how to approach them responsibly.

A practical guide to fine-tuning Qwen 3.5 with Unsloth, covering installation, LoRA and QLoRA setup, training configuration, and exporting your fine-tuned model.

A complete guide to running Qwen 3.5 models with vLLM for high-throughput inference. Covers installation, serving, model variants, and performance tuning for Qwen 3.5 vLLM deployments.

A detailed comparison of Qwen 3.5 vs Qwen 3.6 covering key differences, feature upgrades, context window changes, and practical guidance on which version fits your workflow.

How to use the Qwen3.6-Plus API — endpoints, request format, tool calling, and integration tips for developers building with Qwen 3.6.

A practical look at where Qwen3.6-Plus feels better for coding than Qwen3.5-Plus, and where the older model is still enough.

A practical guide to Qwen3.6-Plus's 1M context window, what it helps with, and what long context still does not solve.

What Qwen3.6-Plus brings to the table — agentic coding, 1M context, multimodal reasoning — and when to pick it over Qwen 3.5 models.

A practical first pass on running Qwen 3.5 with Ollama: what people usually mean by it, how to decide between local and hosted use, and which Qwen page to open next.