DeepSeek: DeepSeek V3.2
deepseek/deepseek-v3.2
DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments.
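DSA's exact selection rule isn't detailed on this page; as a rough sketch of what fine-grained sparse attention means, the PyTorch snippet below lets each query attend only to its k_top highest-scoring keys. The k_top value and the dense scoring pass are simplifying assumptions for illustration; a production kernel would use a lightweight indexer so the full score matrix is never materialized.

import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_top=64):
    """q, k, v: (batch, heads, seq, dim). Each query attends to its k_top best keys."""
    scores = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
    k_top = min(k_top, scores.size(-1))
    # Additive mask: 0 at the top-k positions per query, -inf everywhere else.
    idx = scores.topk(k_top, dim=-1).indices
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, 0.0)
    return F.softmax(scores + mask, dim=-1) @ v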
input: $0.32/M
output: $0.48/M
context: 164K
created: Dec 3, 2025
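At these rates, per-request cost is simple arithmetic: a call with 100,000 input tokens and 2,000 output tokens costs 0.1 × $0.32 + 0.002 × $0.48 ≈ $0.033. A minimal helper (the token counts in the example are illustrative assumptions):

def request_cost(input_tokens, output_tokens, in_rate=0.32, out_rate=0.48):
    """Rates are USD per million tokens."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

print(request_cost(100_000, 2_000))  # 0.03296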
Supported API shape
input: text
output: text
tools: Supported
json mode: Supported
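Assuming an OpenAI-compatible chat completions endpoint (the base_url, API key, and tool schema below are placeholders, not confirmed by this page), tool use and JSON mode would look roughly like:

from openai import OpenAI

client = OpenAI(base_url="https://example-router/api/v1", api_key="...")

# Tool calling: declare a function schema; the model decides when to call it.
resp = client.chat.completions.create(
    model="deepseek/deepseek-v3.2",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)

# JSON mode: constrain the reply to valid JSON.
resp = client.chat.completions.create(
    model="deepseek/deepseek-v3.2",
    messages=[{"role": "user", "content": "Return JSON with city and country for Paris."}],
    response_format={"type": "json_object"},
)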
Verification
signature: response ID
attestation: GPU TEE
provider: 1 route
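The page doesn't specify the signature scheme. Purely as an illustration, if the provider signed the response ID with an Ed25519 key published from the GPU TEE, client-side verification could look like the sketch below; the key format and scheme are assumptions.

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey
from cryptography.exceptions import InvalidSignature

def verify_response_id(pubkey_bytes: bytes, response_id: str, signature: bytes) -> bool:
    """Check a detached signature over the response ID (hypothetical scheme)."""
    key = Ed25519PublicKey.from_public_bytes(pubkey_bytes)
    try:
        key.verify(signature, response_id.encode())
        return True
    except InvalidSignature:
        return False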
Providers
chutes (GPU TEE)
input: $0.32/M
output: $0.48/M
context: 164K
More models
Other private inference routes.
DeepSeek V3.1
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. A sketch of the mode toggle follows this card.
context: 164K
input: $1.05/M
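V3.1's mode switch lives in the chat template. Assuming the Hugging Face tokenizer forwards extra kwargs to that template (the thinking flag name is an assumption, not confirmed by this page), toggling would look like:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")
messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]

# Extra kwargs are passed through to the Jinja chat template, which branches on them.
prompt_thinking = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=True)
prompt_plain = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=False)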
DeepSeek: R1 0528
May 28th update to the original DeepSeek R1. Performance is on par with OpenAI o1, but the model is open-sourced and exposes fully open reasoning tokens. It is 671B parameters in size, with 37B active per inference pass.
context: 164K
input: $2.00/M
Qwen: Qwen3.5-27B
Qwen3.5-27B is a dense, natively vision-language model that incorporates a linear attention mechanism, delivering fast responses while balancing inference speed against quality. Its overall capabilities are comparable to those of Qwen3.5-122B-A10B.
context: 262K
input: $0.30/M
Z.AI: GLM 4.7 Flash
A 30B-class SOTA model, GLM-4.7-Flash offers a new option balancing performance and efficiency. It is further optimized for agentic coding, strengthening coding ability, long-horizon task planning, and tool collaboration, and achieves leading results among open-source models of its size on several current public benchmark leaderboards.
context: 203K
input: $0.10/M