Private inferentie

Inference zonder de prompt bloot te geven.

OpenAI-compatibel. Ondertekende ontvangstbewijzen. Geen logs, by design.

je gedachten blijven van jou

Private LLM catalog

Frontier models with private runtime.

OpenAI-compatible models with hardware-backed privacy and verification. Keep your SDK flow, change the endpoint, and copy the real call when you need it.

versleuteld

Qwen: Qwen3.5-122B-A10B

262K context

$0.46/M input

Bekijk details
versleuteld

Qwen: Qwen3 32B

41K context

$0.12/M input

Bekijk details
versleuteld

Google: Gemma 4 31B

262K context

$0.15/M input

Bekijk details
versleuteld

Qwen: Qwen3.6 35B A3B

262K context

$0.20/M input

Bekijk details
versleuteld

DeepSeek: DeepSeek V4 Pro

800K context

$1.50/M input

Bekijk details
versleuteld

Phala: Gemma-4 26B-A4B Uncensored (Heretic)

66K context

$0.15/M input

Bekijk details
versleuteld

Phala: Qwen3.6 35B-A3B Uncensored (Aggressive)

131K context

$0.30/M input

Bekijk details
versleuteld

MoonshotAI: Kimi K2.6

262K context

$1.09/M input

Bekijk details
versleuteld

Z.ai: GLM 5.1

203K context

$1.21/M input

Bekijk details
versleuteld

Qwen: Qwen3.5-27B

262K context

$0.30/M input

Bekijk details
versleuteld

Qwen: Qwen3.5 397B A17B

262K context

$0.55/M input

Bekijk details
versleuteld

MiniMax: MiniMax M2.5

197K context

$0.20/M input

Bekijk details
Model requests are routed through confidential AI providers with TEE support.
Check all

Private inference, by construction.

What you say to the model stays between your client and an attested CVM. Three primitives — encryption, TEE, no-logs — make that a property of the build, not a promise.

End-to-End Encryption

  • AES-GCM ciphertext on the wire, both hops
  • RA-TLS terminates inside the CVM, not at a load balancer
  • No plaintext intermediary on the host
Hoe het werkt

Loop stap voor stap door één request, end-to-end.

Schakel dstack uit om precies te zien welke garantie verdwijnt.

Private inference op dstack

Twee-hops RA-TLS naar een vloot van geattesteerde model-CVM’s — verifieerbaar, van nature zonder logs

1
Stap 1 / 5

Verifieer de build vóórdat u één byte verzendt

De client-SDK haalt de TDX-quote van elke kandidaat-CVM op en draait dcap-qvl lokaal — controleert of de build overeenkomt met een no-log entry in DstackApp.sol. De trustbeslissing gebeurt client-side; Phala wordt niet gevraagd om zichzelf te attesteren.

With dstack: De gebruiker houdt de trust root vast, verankerd in de hardware-handtekening van Intel TDX.

Zelfde SDK. Zelfde endpoints. Confidential by default.

cURL · direct inzetbaar

Hit api.redpill.ai/v1/chat/completions with the OpenAI request shape. Receipt headers come back on every response — even from curl.

cURL
$ curl https://api.redpill.ai \/v1/chat/completions \-H "Authorization: …" \-d '{"model":…}'x-phala-receipt-sig: 0x9c..x-phala-compose-hash: 0xa1..
PYTHON
from openai import OpenAIc = OpenAI(base_url="…redpill.ai/v1",api_key=RP_KEY)r = c.chat.completions.create(…)

OpenAI Python SDK

`base_url="https://api.redpill.ai/v1"` and you’re done. Existing code keeps working; receipts attach to the response object.

Eén uniforme verifier

Whether the model runs on Intel TDX + H100 or AMD SEV + B300, the receipt format is identical. One verification path covers your whole TEE-LLM fleet — even when you mix providers.

UNIFIED PROOF
unified verifierall match
phalaLlama 3.1
near aiDeepSeek V3
tinfoilQwen2.5
chutesMistral
one format · any provider
OPENROUTER
openrouter · phala2026-06-01
3Btokens / day
Llama · open$0.40 / M
Llama · phala$0.40 / M
DeepSeek · open$0.27 / M
DeepSeek · phala$0.27 / M

Geen premium voor privacy

Confidential routes through Phala on OpenRouter price the same as the open route. Privacy is no longer a procurement line item — just a header you opted into.

two-hop RA-TLS · X.509 with TDX-quote extension

tunneled · no plaintext intermediary

hop 01 · client → gateway

CN=phala-gatewayTDX-quote ext (1.3.6.1.4.1…)

hop 02 · gateway → model CVM

CN=vllm-llama-3.1-70bTDX+H100 quote ext
RA-TLSmTLSX.509tunneled

RA-TLS over twee hops, helemaal tot aan het model

The first TLS hop terminates inside the dstack-gateway CVM (whose certificate carries its TDX quote). The second terminates inside the model CVM. There is no plaintext intermediary — just two confidential VMs whose X.509 certificates ARE their attestations.

response · /v1/chat/completions

200
x-phala-receipt-sig0x9c1a…f7e2x-phala-compose-hash0xa1b2…d1f3x-phala-app-idvllm-llama-3.1-70bx-phala-no-logtrue · by build
verify offlinechains to DstackApp.sol

Getekend receipt + on-chain compose-hash, elke response

Every response carries x-phala-receipt-sig + x-phala-compose-hash. The signature chains to the TDX root and the on-chain DstackApp.sol entry — verify offline that the build that ran is the build that was registered.

in production today · 3 live partners

Confidential inference, in production.

OpenRouter routes its enterprise tier through Phala. NEAR AI ships verifiable agent inference. OODA AI runs decentralized GPU TEE.

01enterprise · live

OpenRouter

enterprise tier · drop-in

Drop-in OpenAI-compatible endpoint with verifiable, no-log routing. The receipt is the audit trail.

18B+ tokens

no-log · verified routing

02web3 · live

NEAR AI

verifiable agent inference

Verifiable agent inference for autonomous, on-chain workflows. Every model call lands on-chain with proof.

100% receipts

on-chain verified · zk inference

03public-co · live

OODA AI

NASDAQ-listed · decentralized GPUs

Decentralized GPUs with hardware attestation guarantees. No host root, no off-band access, no policy promises.

12M tokens / day

TDX + H100 · hardware-attested

OpenAI-compatible

drop-in /v1 surface

TDX + H100/H200/B300

CPU + GPU TEE

5–15% overhead

vs bare-metal

No host root

compose-hash IS the policy

AI-oplossingspaden

Gebruik privé-modellen waar AI met geheimen werkt.

Het endpoint voor het privé-model is het eerste toegangspunt. Hetzelfde privacy-gebouwblok breidt zich uit naar agents, datastromen en training.

Agents

Privé AI-agents

Laat agents draaien met sleutels, tools, geheugen en acties binnen een geverifieerde runtime in plaats van een zichtbare automation cloud.

Open oplossing
Training

Privémodeltraining

Pas modellen aan op propriëtaire data terwijl datasets, gradients, checkpoints en evaluatietraces binnen de grens blijven.

Open oplossing

private training run

Observe without exposing weights.

H100 CC

01

dataset

sealed

02

fine-tune

running

03

eval

private

04

checkpoint

verified

loss curve

proof attached

attestation.json

Data

Privé AI-data

Verplaats modellen naar gevoelige records en geef goedgekeurde outputs terug zonder ruwe data bloot te stellen aan de modeloperator.

Open oplossing

source

EHR data

source

Customer records

source

Internal docs

TEE clean room

query without raw access

approved output

aggregate only
no row exportproof linked

Deploy private inference

RA-TLS over twee hops. Ondertekende ontvangstbewijzen. On-chain zonder logs.

Plug-and-play met de OpenAI SDK die je al gebruikt. Richt op api.redpill.ai. Ontvang bij elke response een ondertekend ontvangstbewijs.

View docsNeem contact op met sales
  • 01OpenAI-compatible base URL
  • 02TDX + H100 / H200 / Blackwell
  • 03Signed receipt per response
  • 04On-chain compose-hash registry
  • 055–15% TEE overhead vs bare-metal