Private AI Inference

Serve LLMs without exposing prompts or weights

Why Private Inference Matters

Centralized inference can log prompts and leak intellectual property. Phala's enclaves ensure that no operator, whether cloud provider or model vendor, can peek inside.

Data security
Prompts can expose sensitive queries. Traditional cloud infrastructure exposes sensitive information to operators and administrators.

Confidential computing
Model weights are valuable IP. Hardware-enforced isolation prevents unauthorized access while maintaining computational efficiency.

Zero-trust architecture
Inference logs reveal business patterns. End-to-end encryption protects data in transit, at rest, and, critically, during computation.

Attestation
No operator access to runtime memory. Cryptographic verification ensures code integrity and proves execution on genuine TEE hardware.

GPU TEE Protection · Zero-Trust Inference · Confidential Serving

OpenAI-Compatible API with Hardware Encryption

NVIDIA GPU TEEs, combined with CPU-side confidential computing from Intel TDX and AMD SEV, provide hardware-level memory encryption: model weights, user prompts, and inference outputs stay encrypted even while in use, inside attested enclaves. Not even cloud admins or hypervisors can inspect runtime state.

Privacy as a human right, by design. Requests are routed over mTLS directly into the enclave; the service emits usage receipts and never stores plaintext. Endpoints are OpenAI-compatible, with verifiable attestation and zero-logging guarantees; a minimal request sketch follows the feature list below.

GPU memory encryption
OpenAI-compatible API
Zero-logging architecture
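
Because the endpoint speaks the standard OpenAI wire protocol, any HTTP client works and no SDK is required. A minimal sketch, assuming the usual /chat/completions route under the base URL used elsewhere on this page (swap in your own API key):

minimal_request.py
# Hedged sketch: plain HTTPS call to the OpenAI-compatible
# /chat/completions route; no SDK required.
import requests

API_KEY = "<API_KEY>"  # your API key

resp = requests.post(
    "https://api.redpill.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "phala/deepseek-chat-v3-0324",
        "messages": [{"role": "user", "content": "Say hello from inside a TEE."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])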

Available Models

Access the latest frontier AI models with cryptographic privacy protection


OpenAI: GPT OSS 20B

openai/gpt-oss-20b (encrypted)

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.

131K context | $0.10/M input tokens | $0.40/M output tokens
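
The card above lists function calling and tool use among gpt-oss-20b's capabilities. Here is a hedged sketch of a tool call through the same OpenAI-compatible SDK; the get_weather tool is hypothetical, and the exact serving slug may carry a different prefix (Step 1 below uses a phala/ prefix):

tool_call_sketch.py
# Hedged sketch: function calling via the OpenAI-compatible API.
# The get_weather tool is hypothetical, for illustration only.
from openai import OpenAI

client = OpenAI(api_key="<API_KEY>", base_url="https://api.redpill.ai/api/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)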

OpenAI: GPT OSS 120B

openai/gpt-oss-120b (encrypted)

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

131K context | $0.10/M input tokens | $0.49/M output tokens

Google: Gemma 3 27B

google/gemma-3-27b-it (encrypted)

Gemma 3 introduces multimodality, supporting vision-language input and text output. It handles context windows up to 128K tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open model and the successor to [Gemma 2](google/gemma-2-27b-it).

54K context | $0.11/M input tokens | $0.40/M output tokens
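
Since Gemma 3 accepts vision-language input, an image can be passed using the standard OpenAI content-parts format. A hedged sketch; the image URL is a placeholder, and only the model slug comes from the card above:

vision_sketch.py
# Hedged sketch: image + text input via the OpenAI-compatible
# content-parts format; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI(api_key="<API_KEY>", base_url="https://api.redpill.ai/api/v1")

response = client.chat.completions.create(
    model="google/gemma-3-27b-it",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this chart in one sentence."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)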

Qwen: Qwen2.5 VL 72B Instruct

qwen/qwen2.5-vl-72b-instruct (encrypted)

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.

128K context | $0.59/M input tokens | $0.59/M output tokens

DeepSeek: DeepSeek V3 0324

deepseek/deepseek-chat-v3-0324 (encrypted)

DeepSeek V3 is a 685B-parameter mixture-of-experts model, the latest iteration of the flagship chat model family from the DeepSeek team.

164K context | $0.49/M input tokens | $1.14/M output tokens

Qwen2.5 7B Instruct

qwen/qwen-2.5-7b-instruct (encrypted)

Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2:

  • Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
  • Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
  • Long-context support up to 128K tokens, with generation of up to 8K tokens.
  • Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

Usage of this model is subject to the [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

33K context | $0.04/M input tokens | $0.10/M output tokens
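
The card above highlights structured JSON output. A hedged sketch using the OpenAI response_format parameter; support for json_object mode on this endpoint is an assumption based on the card's claims:

json_output_sketch.py
# Hedged sketch: request strict JSON via the OpenAI-compatible
# response_format parameter (assumed supported for this model).
import json
from openai import OpenAI

client = OpenAI(api_key="<API_KEY>", base_url="https://api.redpill.ai/api/v1")

response = client.chat.completions.create(
    model="qwen/qwen-2.5-7b-instruct",
    messages=[{
        "role": "user",
        "content": "Return a JSON object with keys 'city' and 'population' for Tokyo.",
    }],
    response_format={"type": "json_object"},
)

data = json.loads(response.choices[0].message.content)
print(data["city"], data["population"])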

Real-World Success Stories

Discover how leading companies are leveraging Phala's confidential AI to build exceptional digital experiences while maintaining complete data privacy and regulatory compliance.

On-prem Privacy, Cloud Simplicity

Deploy confidential AI inference with the flexibility of cloud and the security of on-premise infrastructure.

Your private request → end-to-end encrypted → Phala Cloud (hardware-attested routing) → DeepSeek V3 by DeepSeek, running in a GPU TEE

Powerful Features, Simple Integration

OpenAI-compatible APIs with advanced capabilities running in TEE

Step 1

Make Secure API Requests

Use OpenAI-compatible SDK to access 200+ models with hardware-enforced privacy. Drop-in replacement with zero code changes.

secure_request.py
from openai import OpenAI

client = OpenAI(
    api_key="<API_KEY>",
    base_url="https://api.redpill.ai/api/v1"
)

response = client.chat.completions.create(
    model="phala/deepseek-chat-v3-0324",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is your model name?"},
    ],
    stream=True,
)

# stream=True yields an iterator of chunks rather than a single
# completion object, so print each delta as it arrives.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
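
For a non-streaming call, drop stream=True and read the full reply from response.choices[0].message.content instead.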
Step 2

Verify TEE Execution

Every response includes cryptographic proof from NVIDIA and Intel TEE hardware. Verify attestation to ensure secure execution.

verify_attestation.py
import requests
import jwt  # PyJWT

api_key = "<API_KEY>"  # same key used for inference requests

# Fetch the attestation report for the model's serving enclave
response = requests.get(
    "https://api.redpill.ai/v1/attestation/report?model=phala/deepseek-v3",
    headers={"Authorization": f"Bearer {api_key}"}
)
report = response.json()

# Forward the NVIDIA payload to NVIDIA's remote attestation service (NRAS)
gpu_response = requests.post(
    "https://nras.attestation.nvidia.com/v3/attest/gpu",
    headers={"Content-Type": "application/json"},
    data=report["nvidia_payload"]
)

# NRAS returns per-GPU JWTs; check each measurement result
gpu_tokens = gpu_response.json()[1]
for gpu_id, token in gpu_tokens.items():
    decoded = jwt.decode(token, options={"verify_signature": False})
    assert decoded.get("measres") == "success"
    print(f"{gpu_id}: Verified ✓")

Solutions for Every User

Choose the perfect privacy-first AI solution tailored to your needs

Personal


Private AI assistants for individuals who value data sovereignty and zero-logging guarantees.

What's included:

  • Private chat with zero data retention
  • Encrypted journal & notes
  • Personal data analysis

Developer


OpenAI-compatible APIs with TEE protection—drop-in replacement with hardware-enforced privacy.

What's included:

  • One-line API integration
  • Same SDKs & libraries
  • Verifiable attestation

Enterprise


Scalable confidential AI infrastructure with compliance, auditability, and flexible deployment options.

What's included:

  • Private RAG & AI copilots
  • Confidential fine-tuning
  • On-prem or cloud deployment
  • HIPAA/SOC2 compliance

Industry-Leading Enterprise Compliance

Meeting the highest compliance requirements for your business

AICPA SOC 2 · ISO 27001 · CCPA · GDPR

Frequently Asked Questions

Everything you need to know about Private AI Inference

Privacy & Security Guarantees

Developer Experience

Industry Use Cases

Start Private AI Inference Today

Deploy confidential LLM endpoints with hardware-enforced encryption and zero-logging guarantees.

Deploy on Phala
  • Intel TDX & AMD SEV support
  • Remote attestation built-in
  • Zero-trust architecture
  • Enterprise-ready compliance
  • 24/7 technical support