Models
Ready · GPU TEE

Qwen: Qwen3.5-122B-A10B

Model ID: qwen/qwen3.5-122b-a10b

Qwen3.5-122B-A10B is a large Mixture-of-Experts model from Alibaba Cloud with 122B total parameters and 10B active parameters per token. It is strong on reasoning, coding, and tool calling, and supports a 262K context window. Served as a text-only TEE deployment via NEAR AI.

Input: $0.46/M
Output: $3.68/M
Context: 262K
Created: May 26, 2026
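
The listed prices can be turned into a rough per-request cost estimate. The sketch below assumes that "/M" means US dollars per one million tokens and that input and output tokens are billed separately at the listed rates; the function name `estimate_cost` is hypothetical and is not part of any provider API.

```python
# Prices copied from the listing; "/M" is assumed to mean USD per 1,000,000 tokens.
INPUT_USD_PER_M = 0.46
OUTPUT_USD_PER_M = 3.68


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply.
    cost = estimate_cost(100_000, 2_000)
    print(f"${cost:.4f}")  # about $0.0534
```

Under these assumptions, output tokens cost eight times as much as input tokens, so long generations can dominate the bill even when the prompt is large.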

Supported API shapes

Input: text
Output: text
Tools: not listed
JSON mode: not listed

Verification

Signature: Response ID
Attestation: GPU TEE

Provider

Phala (GPU TEE): Input $0.46/M, Output $3.68/M, Context 262K

More models

Other private inference routes.

Encrypted

Phala: Gemma-4 26B-A4B Uncensored (Heretic)

Uncensored "Heretic" variant of google/gemma-4-26B-A4B-it, created using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method and row-norm preservation. Refusals drop from 100/100 to 11/100, with a KL divergence of 0.0499 versus the base model. The base Gemma 4 26B A4B is a Mixture-of-Experts model with 25.2B total / 3.8B active parameters (8 active / 128 total experts) and a 30-layer transformer with hybrid local sliding-window (1024) + global attention, supporting a 256K context window. It is natively multimodal (text + images, variable aspect ratios) and strong on coding, reasoning, and function calling, with native system prompt support across 35+ languages. Served on Phala in a TDX-attested H200 enclave with end-to-end ECDSA response signing; vLLM-compatible FP8-Static quantization by cloud19 (router excluded from quantization).

Context: 66K
Input: $0.15/M

Encrypted

Phala: Qwen3.6 35B-A3B Uncensored (Aggressive)

Uncensored "Aggressive" variant of Qwen3.6-35B-A3B from Alibaba's Qwen team. The fine-tune by HauhauCS removes refusal behaviors (0/465 refusals) without modifying datasets or core capabilities. The base architecture is a 35B-parameter Mixture-of-Experts model with 256 experts routing 8 per token (~3B active params), 40 layers, and a hybrid linear+full-softmax attention mechanism (3:1 ratio). It supports a native 262K context and is natively multimodal across text, images, and video. Served on Phala in a TDX-attested H200 enclave with end-to-end ECDSA response signing; FP8 quantization by lamianlbe.

Context: 131K
Input: $0.30/M

Encrypted

Qwen: Qwen3.5-27B

Qwen3.5 27B is a native vision-language dense model that incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of Qwen3.5-122B-A10B.

Context: 262K
Input: $0.30/M

Encrypted

Z.AI: GLM 4.7 Flash

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning, and tool collaboration, and has achieved leading performance among open-source models of the same size on several current public benchmark leaderboards.

Context: 203K
Input: $0.10/M