Ready | GPU TEE

DeepSeek: DeepSeek V3.2

Model ID: deepseek/deepseek-v3.2

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments.

input: $0.32/M tokens
output: $0.48/M tokens
context: 164K tokens
created: Dec 3, 2025
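The per-token prices above translate into a per-request cost with simple arithmetic. The sketch below assumes "/M" means per million tokens and that input and output tokens are billed separately at the listed rates; the function name `estimate_cost_usd` is illustrative and not part of any SDK.

```python
# Prices copied from the listing (USD per million tokens).
INPUT_USD_PER_M = 0.32
OUTPUT_USD_PER_M = 0.48


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (illustrative helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example: a 10,000-token prompt with a 2,000-token reply.
print(f"${estimate_cost_usd(10_000, 2_000):.4f}")  # prints $0.0042
```

Actual billing may round or count tokens differently, so treat the result as an estimate.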

Supported API shape

input: text
output: text
tools: supported
JSON mode: supported
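Since tools and JSON mode are listed as supported, requests can presumably carry the usual OpenAI-style `tools` and `response_format` fields. The sketch below only builds the request arguments and makes no network call. It assumes the endpoint accepts these fields in the standard OpenAI chat-completions shape, which the listing does not spell out. The `get_weather` tool and the helper names are hypothetical.

```python
import json

MODEL = "deepseek/deepseek-v3.2"

# Hypothetical tool definition, for illustration only.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def json_mode_request(prompt: str) -> dict:
    """Build chat-completion arguments that ask for a JSON object reply."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": prompt},
        ],
        # Assumption: the endpoint honors OpenAI-style JSON mode.
        "response_format": {"type": "json_object"},
    }


def tool_request(prompt: str) -> dict:
    """Build chat-completion arguments that offer one tool to the model."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
    }


def parse_json_reply(content: str) -> dict:
    """Parse a JSON-mode reply and check that it is a JSON object."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data
```

With an OpenAI-compatible client, the arguments can be passed as `client.chat.completions.create(**json_mode_request("List three colors."))`, and the reply text handed to `parse_json_reply`.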

Verification

signature: response ID
attestation: GPU TEE
provider: Phala

$ pip install openai

import os

from openai import OpenAI

MODEL = "deepseek/deepseek-v3.2"

# OpenAI-compatible client pointed at the Redpill endpoint.
# The API key is read from the REDPILL_API_KEY environment variable.
client = OpenAI(
    base_url="https://api.redpill.ai/v1",
    api_key=os.environ["REDPILL_API_KEY"],
)

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Run this privately."}],
)

# Print the response ID, then the reply text.
print(response.id)
print(response.choices[0].message.content)

$ npm install openai

import OpenAI from "openai"

const MODEL = "deepseek/deepseek-v3.2"

// OpenAI-compatible client pointed at the Redpill endpoint.
const openai = new OpenAI({
  baseURL: "https://api.redpill.ai/v1",
  apiKey: process.env.REDPILL_API_KEY,
})

// Top-level await requires an ES module context.
const response = await openai.chat.completions.create({
  model: MODEL,
  messages: [{ role: "user", content: "Run this privately." }],
})

// Print the response ID, then the reply text.
console.log(response.id)
console.log(response.choices[0].message.content)

curl https://api.redpill.ai/v1/chat/completions \
  -H "Authorization: Bearer $REDPILL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v3.2",
    "messages": [
      { "role": "user", "content": "Run this privately." }
    ]
  }'

More models

Other private inference routes.

encrypted

Phala: Gemma-4 26B-A4B Uncensored (Heretic)

Uncensored "Heretic" variant of google/gemma-4-26B-A4B-it, created using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method and row-norm preservation. Refusals drop from 100/100 to 11/100, with a KL divergence of 0.0499 versus the base model. The base Gemma 4 26B A4B is a Mixture-of-Experts model with 25.2B total / 3.8B active parameters (8 active / 128 total experts) and a 30-layer transformer with hybrid local sliding-window (1024) + global attention, supporting a 256K context window. It is natively multimodal (text + images, variable aspect ratios) and strong on coding, reasoning, and function calling, with native system prompt support across 35+ languages. Served on Phala in a TDX-attested H200 enclave with end-to-end ECDSA response signing; vLLM-compatible FP8-Static quantization by cloud19 (router excluded from quantization).

context: 66K
input: $0.15/M

encrypted

Phala: Qwen3.6 35B-A3B Uncensored (Aggressive)

Uncensored "Aggressive" variant of Qwen3.6-35B-A3B from Alibaba's Qwen team. The fine-tune by HauhauCS removes refusal behaviors (0/465 refusals) without modifying datasets or core capabilities. The base architecture is a 35B-parameter Mixture-of-Experts model with 256 experts, routing 8 per token (~3B active parameters), 40 layers, and a hybrid linear + full-softmax attention mechanism (3:1 ratio). It supports a native 262K context and is natively multimodal across text, images, and video. Served on Phala in a TDX-attested H200 enclave with end-to-end ECDSA response signing; FP8 quantization by lamianlbe.

context: 131K
input: $0.30/M

encrypted

Qwen: Qwen3.5-27B

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.

context: 262K
input: $0.30/M

encrypted

Z.AI: GLM 4.7 Flash

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning, and tool collaboration, and has achieved leading performance among open-source models of the same size on several current public benchmark leaderboards.

context: 203K
input: $0.10/M