隐私推理

在不暴露提示词的情况下进行推理。

兼容 OpenAI。签名收据。天然无日志。

你的想法只属于你

Private LLM catalog

Frontier models with private runtime.

OpenAI-compatible models with hardware-backed privacy and verification. Keep your SDK flow, change the endpoint, and copy the real call when you need it.

加密的

Z.ai: GLM 5.2

1.0M 上下文

$1.40/M input

查看详情

加密的

Qwen: Qwen3.6 27B

262K 上下文

$0.32/M input

查看详情

加密的

DeepSeek: DeepSeek V4 Flash

1.0M 上下文

$0.20/M input

查看详情

加密的

Qwen: Qwen3.5-122B-A10B

262K 上下文

$0.46/M input

查看详情

加密的

Qwen: Qwen3 32B

41K 上下文

$0.12/M input

查看详情

加密的

Google: Gemma 4 31B

262K 上下文

$0.15/M input

查看详情

加密的

Qwen: Qwen3.6 35B A3B

262K 上下文

$0.20/M input

查看详情

加密的

DeepSeek: DeepSeek V4 Pro

800K 上下文

$1.50/M input

查看详情

加密的

Phala: Gemma-4 26B-A4B Uncensored (Heretic)

66K 上下文

$0.15/M input

查看详情

加密的

Phala: Qwen3.6 35B-A3B Uncensored (Aggressive)

131K 上下文

$0.30/M input

查看详情

加密的

MoonshotAI: Kimi K2.6

262K 上下文

$1.09/M input

查看详情

加密的

Z.ai: GLM 5.1

203K 上下文

$1.21/M input

查看详情

Model requests are routed through confidential AI providers with TEE support.

Check all

Private inference, by construction.

What you say to the model stays between your client and an attested CVM. Three primitives — encryption, TEE, no-logs — make that a property of the build, not a promise.

End-to-End Encryption

AES-GCM ciphertext on the wire, both hops
RA-TLS terminates inside the CVM, not at a load balancer
No plaintext intermediary on the host

工作原理

逐步查看单个请求的端到端流程。

关闭 dstack，看看究竟失去了哪项保障。

dstack 上的隐私推理

进入一组已证明的模型 CVM 的双跳 RA-TLS —— 可验证，天然无日志

活跃边缘未激活步骤中新增

步骤 1 of 5 · 随滚动响应

步骤 1 / 5

发送任何字节前先验证构建

客户端 SDK 获取每个候选 CVM 的 TDX quote，并在本地运行 dcap-qvl——确认构建与 DstackApp.sol 中的无日志条目匹配。信任决策在客户端完成；Phala 不需要为自己背书。

With dstack: 用户持有信任根，锚定在 Intel 的 TDX 硬件签名之上。

同一 SDK。同一端点。默认机密。

cURL · 即插即用

Hit inference.phala.com/v1/chat/completions with the OpenAI request shape. Receipt headers come back on every response — even from curl.

cURL

$ curl https://inference.phala.com \/v1/chat/completions \-H "Authorization: …" \-d '{"model":…}'x-receipt-id: rcpt-e0ee..x-aci-keyset-digest: sha256:3eff..

PYTHON

from openai import OpenAIc = OpenAI(base_url="…phala.com/v1",api_key=PHALA_KEY)r = c.chat.completions.create(…)

OpenAI Python SDK

`base_url="https://inference.phala.com/v1"` and you’re done. Existing code keeps working; capture x-receipt-id from the raw response when you need proof.

一个统一的验证器

Whether the model runs on Intel TDX + H100 or AMD SEV + B300, the receipt format is identical. One verification path covers your whole TEE-LLM fleet — even when you mix providers.

UNIFIED PROOF

unified verifierall match

phalaLlama 3.1

near aiDeepSeek V3

tinfoilQwen2.5

chutesMistral

one format · any provider

OPENROUTER

openrouter · phala2026-07-26

2.8Btokens / day

Llama · open$0.40 / M

Llama · phala$0.40 / M

DeepSeek · open$0.27 / M

DeepSeek · phala$0.27 / M

隐私不加价

Confidential routes through Phala on OpenRouter price the same as the open route. Privacy is no longer a procurement line item — just a header you opted into.

two-hop RA-TLS · X.509 with TDX-quote extension

tunneled · no plaintext intermediary

hop 01 · client → gateway

CN=phala-gatewayTDX-quote ext (1.3.6.1.4.1…)

hop 02 · gateway → model CVM

CN=vllm-llama-3.1-70bTDX+H100 quote ext

RA-TLSmTLSX.509tunneled

两跳 RA-TLS，直达模型

The first TLS hop terminates inside the dstack-gateway CVM (whose certificate carries its TDX quote). The second terminates inside the model CVM. There is no plaintext intermediary — just two confidential VMs whose X.509 certificates ARE their attestations.

response · /v1/chat/completions

200

x-receipt-idrcpt-e0eefe…x-aci-identitysha256:3def…x-aci-keyset-digestsha256:3eff…sessionupstream.verified

verify receiptmatches attestation

Signed receipt + attested session, every response

Every response carries x-receipt-id plus the gateway identity headers. Fetch the receipt, match it to a fresh gateway attestation, then follow upstream.verified.session_id when you need deeper audit evidence.

in production today · 3 live partners

Confidential inference, in production.

OpenRouter routes its enterprise tier through Phala. NEAR AI ships verifiable agent inference. OODA AI runs decentralized GPU TEE.

01enterprise · live

OpenRouter

enterprise tier · drop-in

“Drop-in OpenAI-compatible endpoint with verifiable, no-log routing. The receipt is the audit trail.”

18B+ tokens

no-log · verified routing

02web3 · live

NEAR AI

verifiable agent inference

“Verifiable agent inference for autonomous, on-chain workflows. Every model call lands on-chain with proof.”

100% receipts

on-chain verified · zk inference

03public-co · live

OODA AI

NASDAQ-listed · decentralized GPUs

“Decentralized GPUs with hardware attestation guarantees. No host root, no off-band access, no policy promises.”

12M tokens / day

TDX + H100 · hardware-attested

OpenAI-compatible

drop-in /v1 surface

TDX + H100/H200/B300

CPU + GPU TEE

5–15% overhead

vs bare-metal

No host root

compose-hash IS the policy

AI 解决方案路径

在 AI 触及密钥时使用隐私模型。

隐私模型端点是第一个入口点。同样的隐私原语也适用于代理、数据工作流和训练。

Agents

隐私 AI 代理

在可验证的运行时中运行代理的密钥、工具、记忆和操作，而不是放在可见的自动化云中。

打开解决方案

Training

隐私模型训练

在保持数据集、梯度、检查点和评估轨迹处于边界内的同时，基于专有数据调整模型。

打开解决方案

private training run

Observe without exposing weights.

H100 CC

dataset

sealed

fine-tune

running

eval

private

checkpoint

verified

loss curve

proof attached

attestation.json

Data

隐私 AI 数据

将模型移动到敏感记录旁，在不向模型运营方暴露原始数据的情况下返回已批准的输出。

打开解决方案

source

EHR data

source

Customer records

source

Internal docs

TEE clean room

query without raw access

approved output

aggregate only

no row exportproof linked

Deploy private inference

OpenAI-compatible. Attested. Receipt-backed.

Drop in with the OpenAI SDK you already use. Point at inference.phala.com and capture x-receipt-id for per-response proof.

View docs 联系销售

01OpenAI-compatible base URL
02TDX + H100 / H200 / Blackwell
03x-receipt-id per response
04Gateway attestation + attested sessions
055–15% TEE overhead vs bare-metal

隐私执行。可验证结果。

新闻通讯