Private inference

Inference without exposing the prompt.

OpenAI-compatible. Signed receipts. No logs, by design.

your thoughts stay yours

Private LLM catalog

Frontier models with a private runtime.

OpenAI-compatible models with hardware-backed privacy and verification. Keep your SDK flow, change the endpoint, and copy the real call when you need it.

Every model in the catalog runs encrypted:

Z.ai: GLM 5.2 · 1.0M context · $1.40/M input
Qwen: Qwen3.6 27B · 262K context · $0.32/M input
DeepSeek: DeepSeek V4 Flash · 1.0M context · $0.20/M input
Qwen: Qwen3.5-122B-A10B · 262K context · $0.46/M input
Qwen: Qwen3 32B · 41K context · $0.12/M input
Google: Gemma 4 31B · 262K context · $0.15/M input
Qwen: Qwen3.6 35B A3B · 262K context · $0.20/M input
DeepSeek: DeepSeek V4 Pro · 800K context · $1.50/M input
Phala: Gemma-4 26B-A4B Uncensored (Heretic) · 66K context · $0.15/M input
Phala: Qwen3.6 35B-A3B Uncensored (Aggressive) · 131K context · $0.30/M input
MoonshotAI: Kimi K2.6 · 262K context · $1.09/M input
Z.ai: GLM 5.1 · 203K context · $1.21/M input

Model requests are routed through confidential AI providers with TEE support.


Private inference, by construction.

What you say to the model stays between your client and an attested CVM. Three primitives (encryption, TEE, no-logs) make that a property of the build, not a promise.

End-to-End Encryption

AES-GCM ciphertext on the wire, both hops
RA-TLS terminates inside the CVM, not at a load balancer
No plaintext intermediary on the host

How it works

Walk through a single request, step by step, end to end.

Turn dstack off to see exactly which guarantee disappears.

Private inference on dstack

Two-hop RA-TLS to a fleet of attested model CVMs: verifiable, and log-free by design


Step 1 / 5

Verify the build before you send a single byte

The client SDK fetches the TDX quote from every candidate CVM and runs dcap-qvl locally, checking that the build matches a no-log entry in DstackApp.sol. The trust decision is made client-side; Phala is never asked to attest to itself.

With dstack: the user holds the trust root, anchored in Intel TDX's hardware signature.
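The client-side trust decision in this step can be sketched as a filter over candidate CVMs. This is a minimal sketch under assumptions: each TDX quote has already been checked by a verifier such as dcap-qvl and reduced to two hypothetical fields (whether the signature chain verified, and a build hash), and the set `no_log_builds` stands in for the no-log entries registered in DstackApp.sol. Reading those entries on-chain is not shown, and `QuoteCheck` and `trusted_cvms` are illustrative names, not part of any Phala SDK.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class QuoteCheck:
    """Hypothetical summary of one CVM's TDX quote after local verification."""
    signature_valid: bool  # assumed: True if the quote chains to Intel's root of trust
    build_hash: str        # assumed: hex digest identifying the deployed build


def trusted_cvms(checks: Dict[str, QuoteCheck], no_log_builds: Set[str]) -> List[str]:
    """Keep only CVMs whose quote verified AND whose build is on the no-log allowlist.

    The decision uses only data the client verified itself; the operator
    is never asked to vouch for its own build.
    """
    return sorted(
        cvm_id
        for cvm_id, check in checks.items()
        if check.signature_valid and check.build_hash in no_log_builds
    )
```

A client following this pattern would send requests only to the CVMs returned here and refuse to connect at all if the list is empty.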

Same SDK. Same endpoints. Confidential by default.

cURL · ready to use

Send the OpenAI request shape to inference.phala.com/v1/chat/completions. Receipt headers come back on every response, even from curl.

cURL

$ curl -i https://inference.phala.com/v1/chat/completions \
    -H "Authorization: …" \
    -H "Content-Type: application/json" \
    -d '{"model":…}'
x-receipt-id: rcpt-e0ee..
x-aci-keyset-digest: sha256:3eff..

PYTHON

from openai import OpenAI

c = OpenAI(base_url="…phala.com/v1", api_key=PHALA_KEY)
r = c.chat.completions.create(…)

OpenAI Python SDK

Set `base_url="https://inference.phala.com/v1"` and you're done. Existing code keeps working; when you need proof, capture x-receipt-id from the raw response.
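One way to capture the receipt headers with the OpenAI Python SDK is its `with_raw_response` accessor, which returns the HTTP headers alongside a `.parse()` method for the completion. A minimal sketch: the header names come from this page, while `receipt_headers` and `chat_with_receipt` are hypothetical helper names. The client is passed in, so the helper does not depend on how you configure it.

```python
from typing import Any, Dict, List, Mapping, Optional, Tuple

# Header names as listed on this page.
RECEIPT_HEADERS = ("x-receipt-id", "x-aci-identity", "x-aci-keyset-digest")


def receipt_headers(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Pick the receipt and identity headers out of a response, case-insensitively.

    Missing headers map to None so callers can detect an unreceipted response.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    return {name: lowered.get(name) for name in RECEIPT_HEADERS}


def chat_with_receipt(
    client: Any, model: str, messages: List[dict]
) -> Tuple[Any, Dict[str, Optional[str]]]:
    """Call chat.completions through the SDK's raw-response wrapper.

    Returns the parsed completion plus the receipt headers for later verification.
    """
    raw = client.chat.completions.with_raw_response.create(model=model, messages=messages)
    return raw.parse(), receipt_headers(raw.headers)
```

For example, `completion, receipt = chat_with_receipt(OpenAI(base_url="https://inference.phala.com/v1", api_key=PHALA_KEY), model, messages)`, with `model` set to a catalog model; `receipt["x-receipt-id"]` is the value to keep for audit.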

One unified verifier

Whether the model runs on Intel TDX + H100 or AMD SEV + B300, the receipt format is identical. One verification path covers your whole TEE-LLM fleet, even when you mix providers.

[Figure: Unified proof. One unified verifier reports that all receipts match across providers: phala (Llama 3.1), near ai (DeepSeek V3), tinfoil (Qwen2.5), chutes (Mistral). One format, any provider.]

OpenRouter · Phala · 2026-07-25
2.7B tokens / day

Llama: open route $0.40 / M · Phala route $0.40 / M
DeepSeek: open route $0.27 / M · Phala route $0.27 / M

No premium for privacy

Confidential routes through Phala on OpenRouter are priced the same as the open routes. Privacy is no longer a procurement line item, just a header you opt into.

[Figure: Two-hop RA-TLS using X.509 certificates with a TDX-quote extension; tunneled, no plaintext intermediary. Hop 01, client → gateway: CN=phala-gateway, TDX-quote extension (1.3.6.1.4.1…). Hop 02, gateway → model CVM: CN=vllm-llama-3.1-70b, TDX + H100 quote extension. Labels: RA-TLS, mTLS, X.509, tunneled.]

RA-TLS over two hops, all the way to the model

The first TLS hop terminates inside the dstack-gateway CVM, whose certificate carries its TDX quote. The second terminates inside the model CVM. There is no plaintext intermediary, just two confidential VMs whose X.509 certificates are their attestations.
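A certificate can serve as an attestation only if the quote it carries commits to the certificate's own key. The sketch below shows that binding check for both hops. It assumes, as common RA-TLS designs do but this page does not confirm for dstack, that the first 32 bytes of the quote's 64-byte report data hold the SHA-256 of the certificate's DER-encoded public key. Quote signature verification and certificate parsing are out of scope, and the `Hop` type is hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hop:
    """Hypothetical per-hop evidence pulled from an RA-TLS certificate."""
    name: str
    public_key_der: bytes  # certificate's SubjectPublicKeyInfo, DER-encoded
    report_data: bytes     # report data from the embedded TDX quote (64 bytes in TDX)


def key_is_bound(hop: Hop) -> bool:
    """True if the quote's report data commits to this certificate's key (assumed layout)."""
    expected = hashlib.sha256(hop.public_key_der).digest()
    # Constant-time comparison of the first 32 bytes against the key hash.
    return len(hop.report_data) >= 32 and hmac.compare_digest(hop.report_data[:32], expected)


def all_hops_bound(hops: Sequence[Hop]) -> bool:
    """Every hop (client -> gateway, gateway -> model CVM) must pass; an empty chain fails."""
    return len(hops) > 0 and all(key_is_bound(hop) for hop in hops)
```

Requiring every hop to pass is what rules out a plaintext intermediary: a TLS endpoint outside a CVM would present a key that no quote commits to.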

Response · /v1/chat/completions · 200

x-receipt-id: rcpt-e0eefe…
x-aci-identity: sha256:3def…
x-aci-keyset-digest: sha256:3eff…
session: upstream.verified

verify receipt: matches attestation

Signed receipt + attested session, every response

Every response carries x-receipt-id plus the gateway identity headers. Fetch the receipt, match it to a fresh gateway attestation, then follow upstream.verified.session_id when you need deeper audit evidence.
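The audit steps in this paragraph can be sketched as a single check. This assumes the receipt and a fresh gateway attestation have already been fetched and decoded into dicts; the keys `id` and `keyset_digest` are hypothetical, while the `upstream.verified.session_id` path is the one named above. Verifying the receipt's own signature is not shown.

```python
from typing import Any, Mapping, Optional


def audit_response(
    headers: Mapping[str, str],
    receipt: Mapping[str, Any],
    attestation: Mapping[str, Any],
) -> Optional[str]:
    """Check one response against its receipt and a fresh gateway attestation.

    Returns upstream.verified.session_id for deeper audit, or None if the
    receipt names no upstream session. Raises ValueError on any mismatch.
    """
    h = {key.lower(): value for key, value in headers.items()}

    # 1. The fetched receipt must be the one this response pointed at.
    receipt_id = h.get("x-receipt-id")
    if not receipt_id or receipt.get("id") != receipt_id:
        raise ValueError("receipt id does not match x-receipt-id header")

    # 2. The keyset advertised on the response must match the fresh gateway attestation.
    keyset = h.get("x-aci-keyset-digest")
    if not keyset or attestation.get("keyset_digest") != keyset:
        raise ValueError("x-aci-keyset-digest does not match gateway attestation")

    # 3. Follow upstream.verified.session_id when the receipt provides one.
    upstream = receipt.get("upstream") or {}
    verified = upstream.get("verified") or {}
    return verified.get("session_id")
```

Rejecting a missing header outright, instead of comparing None to None, keeps an unreceipted response from passing silently.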

in production today · 3 live partners

Confidential inference, in production.

OpenRouter routes its enterprise tier through Phala. NEAR AI ships verifiable agent inference. OODA AI runs decentralized GPU TEEs.

01 · enterprise · live

OpenRouter

enterprise tier · drop-in

“Drop-in OpenAI-compatible endpoint with verifiable, no-log routing. The receipt is the audit trail.”

18B+ tokens

no-log · verified routing

02 · web3 · live

NEAR AI

verifiable agent inference

“Verifiable agent inference for autonomous, on-chain workflows. Every model call lands on-chain with proof.”

100% receipts

on-chain verified · zk inference

03 · public-co · live

OODA AI

NASDAQ-listed · decentralized GPUs

“Decentralized GPUs with hardware attestation guarantees. No host root, no off-band access, no policy promises.”

12M tokens / day

TDX + H100 · hardware-attested

OpenAI-compatible · drop-in /v1 surface
TDX + H100/H200/B300 · CPU + GPU TEE
5–15% overhead · vs bare-metal
No host root · the compose-hash is the policy

AI solution paths

Use private models wherever AI handles secrets.

The private-model endpoint is the first entry point. The same privacy building block extends to agents, data flows, and training.

Agents

Private AI agents

Run agents with their keys, tools, memory, and actions inside a verified runtime instead of an exposed automation cloud.


Training

Private model training

Fine-tune models on proprietary data while datasets, gradients, checkpoints, and evaluation traces stay inside the boundary.


[Figure: Private training run on H100 CC. Observe without exposing weights. Stages: dataset (sealed) · fine-tune (running) · eval (private) · checkpoint (verified) · loss curve (proof attached) · attestation.json]

Data

Private AI data

Bring models to sensitive records and return approved outputs without exposing raw data to the model operator.
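The "approved output, aggregate only, no row export" policy shown for this path can be illustrated with a small output filter that would run inside the TEE. A minimal sketch under assumptions: records are dicts, the approved query is a grouped count, and groups smaller than a hypothetical threshold `min_group_size` are suppressed so single records cannot be singled out. This is a generic k-threshold aggregation, not Phala's actual policy engine.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def approved_counts(
    records: Iterable[Mapping[str, object]],
    group_by: str,
    min_group_size: int = 10,
) -> Dict[object, int]:
    """Return per-group counts only, never rows.

    Groups with fewer than min_group_size records are dropped so that
    small groups cannot reveal individual records (a simple k-threshold rule).
    """
    counts = Counter(record[group_by] for record in records)
    return {group: n for group, n in counts.items() if n >= min_group_size}
```

Only the returned dictionary would leave the clean room; the raw records never do.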


[Figure: Sources (EHR data, customer records, internal docs) feed a TEE clean room that queries without raw access and emits approved output, aggregates only. No row export; proof linked.]

Deploy private inference

OpenAI-compatible. Attested. Receipt-backed.

Drop in with the OpenAI SDK you already use. Point at inference.phala.com and capture x-receipt-id for per-response proof.


01 OpenAI-compatible base URL
02 TDX + H100 / H200 / Blackwell
03 x-receipt-id per response
04 Gateway attestation + attested sessions
05 5–15% TEE overhead vs bare-metal

Private execution. Verifiable results.
