Ready · GPU TEE

DeepSeek V3.1

Model ID: deepseek/deepseek-chat-v3.1

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference.
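
Because the mode switch lives in the chat template rather than in a dedicated API parameter, it can be exercised directly through the tokenizer. A minimal sketch using Hugging Face transformers, assuming the published DeepSeek-V3.1 tokenizer forwards a `thinking` keyword to its chat template (verify the exact kwarg name against the model card):

```python
from transformers import AutoTokenizer

# The tokenizer ships with the chat template that encodes the mode switch.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")

messages = [{"role": "user", "content": "What is 7 * 13?"}]

# `thinking=True` renders the template so the model emits reasoning tokens
# before its answer; `thinking=False` requests a direct reply.
# (Assumed kwarg name -- check the model card.)
prompt_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=True
)
prompt_direct = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=False
)

print(prompt_thinking)
print(prompt_direct)
```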

input: $1.05/M
output: $3.10/M
context: 164K
created: Nov 28, 2025

Supported API shape

input: text
output: text
tools: Supported
JSON mode: Supported
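
Tool calling and JSON mode both follow the usual OpenAI-compatible chat completions shape. A hedged sketch with the openai Python client; the base URL, API key variable, and the `get_weather` tool are placeholders for illustration, not this catalog's actual endpoint or schema:

```python
import os
from openai import OpenAI

# Placeholder endpoint -- substitute the provider's real base URL.
client = OpenAI(
    base_url="https://api.example.com/v1",
    api_key=os.environ["API_KEY"],
)

# Tool calling: declare a function schema the model may choose to invoke.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[{"role": "user", "content": "Weather in Lisbon?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)

# JSON mode: constrain the output to a single JSON object.
resp = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[{"role": "user", "content": "List three primes as JSON."}],
    response_format={"type": "json_object"},
)
print(resp.choices[0].message.content)
```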

Verification

signature: response ID
attestation: GPU TEE
provider: 1 route
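
In practice this means each completion can be checked after the fact: the response carries a signature that binds it to its response ID, and the signing key lives inside a GPU TEE whose attestation can be fetched separately. A minimal verification sketch; the signature scheme (Ed25519), the field encoding, and the signed payload layout are all assumptions, so treat this as the shape of the check rather than the provider's actual protocol:

```python
import base64
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey
from cryptography.exceptions import InvalidSignature

def verify_response(public_key_b64: str, response_id: str,
                    body: bytes, signature_b64: str) -> bool:
    """Check that the signature covers this response ID and body.

    Assumed payload layout: the provider signs `response_id || "." || body`
    with an Ed25519 key held inside the GPU TEE. The attestation step
    (proving the key really lives in a TEE) is separate and not shown.
    """
    key = Ed25519PublicKey.from_public_bytes(
        base64.b64decode(public_key_b64))
    payload = response_id.encode() + b"." + body
    try:
        key.verify(base64.b64decode(signature_b64), payload)
        return True
    except InvalidSignature:
        return False
```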

Providers

near-ai (GPU TEE)
input: $1.05/M
output: $3.10/M
context: 164K

More models

Other private inference routes.

View catalog

DeepSeek: DeepSeek V3.2 (encrypted)

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments.

context: 164K
input: $0.32/M

DeepSeek: R1 0528 (encrypted)

May 28th update to the original DeepSeek R1. Performance is on par with OpenAI o1, but the model is open-sourced and ships with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.

context: 164K
input: $2.00/M

Qwen: Qwen3.5-27B (encrypted)

Qwen3.5-27B is a dense, natively vision-language model that incorporates a linear attention mechanism, delivering fast response times while balancing inference speed against quality. Its overall capabilities are comparable to those of Qwen3.5-122B-A10B.

context: 262K
input: $0.30/M

Z.AI: GLM 4.7 Flash (encrypted)

GLM-4.7-Flash is a 30B-class SOTA model that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capability, long-horizon task planning, and tool collaboration, and it achieves leading performance among open-source models of the same size on several current public benchmark leaderboards.

context: 203K
input: $0.10/M