Confidential AI Models
Frontier inference without exposing prompts, tools, or memory.
OpenAI-compatible APIs run inside hardware-backed TEEs and return proof of the runtime that handled the request.
TEE boundary
Private AI calls prompts, keys, tools, and memory stay inside the runtime. providers route the request without becoming the trust boundary. Proof follows the answer. verify GPU, container, model route, and response. same API shape, hardware-backed receipt, private inference. streaming, tool calls, and agent memory keep their normal developer flow. auditors can inspect evidence without reading the prompt. users get answers plus runtime proof, not another black box.
Private LLM catalog
OpenAI-compatible models with hardware-backed privacy and verification. Keep your SDK flow, change the endpoint, and copy the real call when you need it.
Integrate in minutes
Keep your OpenAI-compatible client. Point it at the private endpoint, choose a Phala model slug, and read the proof when the output needs an audit trail.
selected proof
The OpenAI-compatible endpoint terminates inside the verified gateway boundary.
AI solution paths
The private model endpoint is the first entry point. The same privacy primitive extends to agents, data workflows, and training.
Serve OpenAI-compatible model calls where prompts, outputs, and customer context need encrypted-in-use protection.
128K
$0.27/M input
256K
$0.40/M input
128K
$0.15/M input
128K
$0.10/M input
200K
$3.00/M input
1M
$1.25/M input
Run agents with keys, tools, memory, and actions inside a verified runtime instead of a visible automation cloud.
Adapt models on proprietary data while keeping datasets, gradients, checkpoints, and evaluation traces inside the boundary.
private training run
01
sealed
02
running
03
private
04
verified
loss curve
proof attached
attestation.json
Move models to sensitive records and return approved outputs without exposing raw data to the model operator.
source
EHR data
source
Customer records
source
Internal docs
TEE clean room
approved output
Questions
Private LLMs are not just another endpoint. They are a deployment choice between SaaS convenience and self-operated AI infrastructure.
A normal LLM API asks you to trust the provider boundary. Phala runs the model call inside hardware-backed TEEs and can attach runtime proof showing what protected the request.
On-prem gives control, but you operate GPUs, model serving, upgrades, and capacity. Phala keeps the API workflow while adding private execution and verifiable runtime state.
Use the OpenAI-compatible API shape: change the base URL, select a private model slug, and keep your existing SDK or agent framework.
The catalog includes coding, reasoning, general chat, and open-weight model families from providers such as DeepSeek, Qwen, Meta, Mistral, Google, and OpenAI OSS.
The Trust Center turns attestation reports into an inspectable view of hardware, source, runtime, and network verification state.
Use a dedicated stack when you need custom models, reserved GPUs, customer-specific deployments, or a stronger compliance and audit boundary than shared inference.
Start building
Deploy private workloads, verify execution, and scale from models to GPU jobs.