Anthropic: Claude Opus 4.8
anthropic/claude-opus-4.8
Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family. It supports text, image, and file inputs with text output, includes reasoning support, and has a 1M-token context window.
input
$5.00/M
output
$25.00/M
context
1M
created
May 29, 2026
Supported API shape
input
text · image · file
output
text
tools
Supported
json mode
Supported
Verification
signature
response ID
attestation
Intel TDX
provider
Phala
Provider
Phala
Intel TDX
input
$5.00/M
output
$25.00/M
context
1M
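The per-million-token prices listed above can be turned into a rough per-request estimate. The sketch below is illustrative only: `estimate_cost_usd` is a hypothetical helper, not part of any provider SDK, and it assumes billing is strictly linear in token counts at the listed rates, with no caching discounts, tiered pricing, or minimum charges.

```python
from decimal import Decimal

# Prices copied from the listing above, in USD per one million tokens.
INPUT_USD_PER_M = Decimal("5.00")
OUTPUT_USD_PER_M = Decimal("25.00")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: linear cost estimate from token counts.

    Assumes every token is billed at the flat listed rate; real invoices
    may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Example: a long prompt with a short completion.
    cost = estimate_cost_usd(input_tokens=200_000, output_tokens=4_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Under these assumptions, a request with 200,000 input tokens and 4,000 output tokens comes to $1.00 for input plus $0.10 for output, or $1.10 in total. `Decimal` is used instead of floats so that small per-request amounts add up without rounding drift.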
More models
Other private inference routes.
Z.ai: GLM 5.2
GLM-5.2 is Z.ai's flagship model for the era of long-horizon tasks. With a truly usable 1M-token context window, it can handle project-level engineering context and execute long-running tasks more reliably. Served as a text-only TEE deployment via Phala.
context
1.0M
input
$1.40/M
Phala: Gemma-4 26B-A4B Uncensored (Heretic)
Uncensored "Heretic" variant of google/gemma-4-26B-A4B-it, created using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method and row-norm preservation. Refusals drop from 100/100 to 11/100, with a KL divergence of 0.0499 vs the base model. The base Gemma 4 26B A4B is a Mixture-of-Experts model with 25.2B total / 3.8B active parameters (8 active / 128 total experts): a 30-layer transformer with hybrid local sliding (1024) + global attention, supporting a 256K context window. It is natively multimodal (text + images, variable aspect ratios) and strong on coding, reasoning, and function calling, with native system prompt support across 35+ languages. Served on Phala in a TDX-attested H200 enclave with end-to-end ECDSA response signing; vLLM-compatible FP8-Static quantization by cloud19 (router excluded from quantization).
context
66K
input
$0.15/M
Phala: Qwen3.6 35B-A3B Uncensored (Aggressive)
Uncensored "Aggressive" variant of Qwen3.6-35B-A3B from Alibaba's Qwen team. The fine-tune by HauhauCS removes refusal behaviors (0/465 refusals) without modifying datasets or core capabilities. The base architecture is a 35B-parameter Mixture-of-Experts model with 256 experts, routing 8 per token (~3B active params), 40 layers, and a hybrid linear + full-softmax attention mechanism (3:1 ratio). It supports a native 262K context and is natively multimodal across text, images, and video. Served on Phala in a TDX-attested H200 enclave with end-to-end ECDSA response signing; FP8 quantization by lamianlbe.
context
131K
input
$0.30/M
Qwen: Qwen3.5-27B
Qwen3.5 27B is a dense, native vision-language model that incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of Qwen3.5-122B-A10B.
context
262K
input
$0.30/M