Qwen: Qwen3 VL 30B A3B Instruct
qwen/qwen3-vl-30b-a3b-instructQwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research.
input
$0.20/M
output
$0.70/M
context
128K
created
Nov 28, 2025
Supported API shape
input
text · image
output
text
tools
Supported
json mode
Supported
Verification
signature
response ID
attestation
GPU TEE
provider
Phala
Provider
Phala
GPU TEE
input
$0.20/M
output
$0.70/M
context
128K
More models
Other private inference routes.
Z.ai: GLM 5.2
GLM-5.2 is Z.ai's flagship model for the era of long-horizon tasks. With a truly usable 1M-token context window, it can handle project-level engineering context and execute long-running tasks more reliably. Served as a text-only TEE deployment via Phala.
context
1.0M
input
$1.40/M
Phala: Gemma-4 26B-A4B Uncensored (Heretic)
Uncensored "Heretic" variant of google/gemma-4-26B-A4B-it created using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method and row-norm preservation. Refusals drop from 100/100 to 11/100 with KL divergence 0.0499 vs the base model. The base Gemma 4 26B A4B is a Mixture-of-Experts model with 25.2B total / 3.8B active parameters (8 active / 128 total experts), 30-layer transformer with hybrid local sliding (1024) + global attention, supporting a 256K context window. Natively multimodal (text + images, variable aspect ratios). Strong on coding, reasoning, function calling, with native system prompt support across 35+ languages. Served on Phala in TDX-attested H200 enclave with end-to-end ECDSA response signing; vLLM-compatible FP8-Static quantization by cloud19 (router excluded from quantization).
context
66K
input
$0.15/M
Phala: Qwen3.6 35B-A3B Uncensored (Aggressive)
Uncensored "Aggressive" variant of Qwen3.6-35B-A3B from Alibaba's Qwen team. The fine-tune by HauhauCS removes refusal behaviors (0/465 refusals) without modifying datasets or core capabilities. The base architecture is a 35B-parameter Mixture-of-Experts model with 256 experts routing 8 per token (~3B active params), 40 layers, and a hybrid linear+full-softmax attention mechanism (3:1 ratio). Supports a native 262K context and is natively multimodal across text, images, and video. Served on Phala in TDX-attested H200 enclave with end-to-end ECDSA response signing; FP8 quantization by lamianlbe.
context
131K
input
$0.30/M
Qwen: Qwen3.5-27B
The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.
context
262K
input
$0.30/M