Google: Gemini 3 Pro Preview
google/gemini-3-pro-previewGemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks. It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses. Built for advanced development and agentic workflows, Gemini 3 Pro provides robust tool-calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing.
input
$4.00/M
output
$18.00/M
context
1.0M
created
Dec 13, 2025
Supported API shape
input
text · image · file · audio · video
output
text
tools
Not listed
json mode
Not listed
Verification
receipt
x-receipt-id
attestation
gateway report
session
attested upstream
provider
Phala
Provider
Phala
Intel TDX
input
$4.00/M
output
$18.00/M
context
1.0M
More models
Other private inference routes.
Qwen: Qwen3.6 27B
Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities accepting text and image inputs, a configurable thinking/reasoning mode, and a native 262K context window. Served as a TEE deployment via Chutes.
context
262K
input
$0.32/M
Google: Gemma 4 31B
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense model. Features a 256K token context window, configurable thinking/reasoning mode, native function calling, and strong multilingual performance. Served as a text-only TEE deployment via NEAR AI.
context
262K
input
$0.15/M
Phala: Gemma-4 26B-A4B Uncensored (Heretic)
Uncensored "Heretic" variant of google/gemma-4-26B-A4B-it created using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method and row-norm preservation. Refusals drop from 100/100 to 11/100 with KL divergence 0.0499 vs the base model. The base Gemma 4 26B A4B is a Mixture-of-Experts model with 25.2B total / 3.8B active parameters (8 active / 128 total experts), 30-layer transformer with hybrid local sliding (1024) + global attention, supporting a 256K context window. Natively multimodal (text + images, variable aspect ratios). Strong on coding, reasoning, function calling, with native system prompt support across 35+ languages. Served on Phala in TDX-attested H200 enclave with end-to-end ECDSA response signing; vLLM-compatible FP8-Static quantization by cloud19 (router excluded from quantization).
context
66K
input
$0.15/M
Phala: Qwen3.6 35B-A3B Uncensored (Aggressive)
Uncensored "Aggressive" variant of Qwen3.6-35B-A3B from Alibaba's Qwen team. The fine-tune by HauhauCS removes refusal behaviors (0/465 refusals) without modifying datasets or core capabilities. The base architecture is a 35B-parameter Mixture-of-Experts model with 256 experts routing 8 per token (~3B active params), 40 layers, and a hybrid linear+full-softmax attention mechanism (3:1 ratio). Supports a native 262K context and is natively multimodal across text, images, and video. Served on Phala in TDX-attested H200 enclave with end-to-end ECDSA response signing; FP8 quantization by lamianlbe.
context
131K
input
$0.30/M