Confidential AI Models

Private LLMs.
Verified results.

Frontier inference without exposing prompts, tools, or memory.

OpenAI-compatible APIs run inside hardware-backed TEEs and return proof of the runtime that handled the request.

AI calls carry more than prompt.

TEE boundary

Phala private LLM

same SDK
TEE endpoint
hardware receipt

Private AI calls prompts, keys, tools, and memory stay inside the runtime. providers route the request without becoming the trust boundary. Proof follows the answer. verify GPU, container, model route, and response. same API shape, hardware-backed receipt, private inference. streaming, tool calls, and agent memory keep their normal developer flow. auditors can inspect evidence without reading the prompt. users get answers plus runtime proof, not another black box.

Private LLM catalog

Frontier models with private runtime.

OpenAI-compatible models with hardware-backed privacy and verification. Keep your SDK flow, change the endpoint, and copy the real call when you need it.

encrypted

Phala: Gemma-4 26B-A4B Uncensored (Heretic)

66K context

$0.15/M input

Check detail
encrypted

Phala: Qwen3.6 35B-A3B Uncensored (Aggressive)

131K context

$0.30/M input

Check detail
encrypted

MoonshotAI: Kimi K2.6

262K context

$1.09/M input

Check detail
encrypted

Qwen: Qwen3 Coder Next

262K context

$0.18/M input

Check detail
encrypted

Z.ai: GLM 5.1

203K context

$1.21/M input

Check detail
encrypted

Xiaomi: MiMo-V2-Flash

262K context

$0.10/M input

Check detail
encrypted

Qwen: Qwen3.5-27B

262K context

$0.30/M input

Check detail
encrypted

Qwen: Qwen3.5 397B A17B

262K context

$0.55/M input

Check detail
encrypted

MiniMax: MiniMax M2.5

197K context

$0.20/M input

Check detail
encrypted

Z.AI: GLM 5

203K context

$1.20/M input

Check detail
encrypted

MoonshotAI: Kimi K2.5

262K context

$0.60/M input

Check detail
encrypted

Z.AI: GLM 4.7

131K context

$0.85/M input

Check detail
Model requests are routed through confidential AI providers with TEE support.
Check all

Integrate in minutes

Same SDK, Change Endpoint, Verify E2EE.

Keep your OpenAI-compatible client. Point it at the private endpoint, choose a Phala model slug, and read the proof when the output needs an audit trail.

selected proof

Private LLM Gateway

The OpenAI-compatible endpoint terminates inside the verified gateway boundary.

reporttls_endpointreceiptgateway_app_idstatusverified
app_idlinked
endpointlinked
policylinked
app_certlinked
drag · zoom · click node

AI solution paths

Use private models where AI touches secrets.

The private model endpoint is the first entry point. The same privacy primitive extends to agents, data workflows, and training.

LLM API

Private AI inference

Serve OpenAI-compatible model calls where prompts, outputs, and customer context need encrypted-in-use protection.

Open solution
encrypted

DeepSeek V3.1

128K

$0.27/M input

encrypted

Qwen3 Coder

256K

$0.40/M input

encrypted

Llama 3.3 70B

128K

$0.15/M input

encrypted

GPT OSS 120B

128K

$0.10/M input

encrypted

Claude Sonnet 4.5

200K

$3.00/M input

encrypted

Gemini 2.5 Pro

1M

$1.25/M input

Agents

Private AI agents

Run agents with keys, tools, memory, and actions inside a verified runtime instead of a visible automation cloud.

Open solution
Training

Private model training

Adapt models on proprietary data while keeping datasets, gradients, checkpoints, and evaluation traces inside the boundary.

Open solution

private training run

Observe without exposing weights.

H100 CC

01

dataset

sealed

02

fine-tune

running

03

eval

private

04

checkpoint

verified

loss curve

proof attached

attestation.json

Data

Private AI data

Move models to sensitive records and return approved outputs without exposing raw data to the model operator.

Open solution

source

EHR data

source

Customer records

source

Internal docs

TEE clean room

query without raw access

approved output

aggregate only
no row exportproof linked

Questions

What teams ask before they switch.

Private LLMs are not just another endpoint. They are a deployment choice between SaaS convenience and self-operated AI infrastructure.

1

How is this different from a normal LLM API?

A normal LLM API asks you to trust the provider boundary. Phala runs the model call inside hardware-backed TEEs and can attach runtime proof showing what protected the request.

2

How is this different from running models on-prem?

On-prem gives control, but you operate GPUs, model serving, upgrades, and capacity. Phala keeps the API workflow while adding private execution and verifiable runtime state.

3

How difficult is it to integrate private LLMs into my existing app?

Use the OpenAI-compatible API shape: change the base URL, select a private model slug, and keep your existing SDK or agent framework.

4

What model types are available?

The catalog includes coding, reasoning, general chat, and open-weight model families from providers such as DeepSeek, Qwen, Meta, Mistral, Google, and OpenAI OSS.

5

How can customers verify that data was protected?

The Trust Center turns attestation reports into an inspectable view of hardware, source, runtime, and network verification state.

6

When should I use a dedicated private stack?

Use a dedicated stack when you need custom models, reserved GPUs, customer-specific deployments, or a stronger compliance and audit boundary than shared inference.

Start building

Build AI you can prove.

Deploy private workloads, verify execution, and scale from models to GPU jobs.