Ready · GPU TEE

Sentence Transformers: all-MiniLM-L6-v2

Model ID: sentence-transformers/all-minilm-l6-v2

The all-MiniLM-L6-v2 embedding model maps sentences and short paragraphs into a 384-dimensional dense vector space. The resulting semantic representations are well suited to downstream tasks such as information retrieval, clustering, similarity scoring, and text ranking.
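As a rough illustration of the similarity scoring and ranking mentioned above, the sketch below compares 384-dimensional vectors by cosine similarity. The vectors are placeholders rather than real model output, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, chosen only for this example.

```python
import math
from typing import List, Sequence, Tuple

EMBEDDING_DIM = 384  # output dimensionality stated for all-MiniLM-L6-v2


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors standing in for real embeddings (not model output).
query = [1.0] * EMBEDDING_DIM
same_direction = [2.0] * EMBEDDING_DIM
opposite = [-1.0] * EMBEDDING_DIM
ranking = rank_by_similarity(query, [opposite, same_direction])
```

Cosine similarity ignores vector length, so `same_direction` ranks first even though its magnitude differs from the query's.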

input: $0.0050/M
output: Free
context: 512
created: Nov 25, 2025
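The listed price and context size allow a quick back-of-the-envelope cost estimate. The sketch below assumes "/M" means per million input tokens and that token counts are already known for each text; the function name `estimate_input_cost_usd` is hypothetical. Raising an error for texts over the context size is only one possible handling, since the page does not say how over-length input is treated.

```python
from typing import Sequence

INPUT_PRICE_PER_MILLION_USD = 0.0050  # listed input price; "/M" assumed to mean per million tokens
CONTEXT_LIMIT_TOKENS = 512  # listed context size


def estimate_input_cost_usd(token_counts: Sequence[int]) -> float:
    """Estimate the input cost of a batch, given each text's token count."""
    too_long = [n for n in token_counts if n > CONTEXT_LIMIT_TOKENS]
    if too_long:
        # Assumption: treat over-length texts as an error instead of guessing
        # whether the service would truncate or reject them.
        raise ValueError(f"{len(too_long)} text(s) exceed {CONTEXT_LIMIT_TOKENS} tokens")
    total_tokens = sum(token_counts)
    return total_tokens * INPUT_PRICE_PER_MILLION_USD / 1_000_000


# 10,000 texts of 200 tokens each is 2,000,000 input tokens, about 0.01 USD.
batch_cost = estimate_input_cost_usd([200] * 10_000)
```

Output is listed as free, so under these assumptions the input tokens are the only billed quantity.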

Supported API shape

input: text · embeddings
output: embeddings
tools: Not listed
json mode: Not listed
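The text-in, embeddings-out shape above could be exercised with a request body along these lines. This is only a sketch: the `model` and `input` field names assume an OpenAI-style embeddings schema, which this page does not confirm, and no endpoint URL or authentication is shown because neither is given here. The helpers `build_embedding_request` and `check_embeddings` are hypothetical.

```python
import json
from typing import Dict, Sequence

MODEL_ID = "sentence-transformers/all-minilm-l6-v2"  # Model ID from this page
EXPECTED_DIM = 384  # dimensionality stated in the model description


def build_embedding_request(texts: Sequence[str]) -> Dict[str, object]:
    """Build a request body.

    The "model" and "input" field names are an assumption (OpenAI-style
    embeddings schema), not something this page documents.
    """
    if not texts:
        raise ValueError("at least one input text is required")
    return {"model": MODEL_ID, "input": list(texts)}


def check_embeddings(vectors: Sequence[Sequence[float]]) -> None:
    """Raise if any returned vector is not 384-dimensional."""
    for i, vec in enumerate(vectors):
        if len(vec) != EXPECTED_DIM:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, expected {EXPECTED_DIM}"
            )


payload = build_embedding_request(["first sentence", "second sentence"])
body = json.dumps(payload)  # JSON text that would be sent as the request body
check_embeddings([[0.0] * EXPECTED_DIM])  # placeholder vector, passes the check
```

Checking the vector length on receipt is a cheap guard against accidentally routing a request to a different embedding model.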

Verification

signature: response ID
attestation: GPU TEE
provider: 1 route

Providers

phala · GPU TEE
input: $0.0050/M
output: Free
context: 512

More models

Other private inference routes.

encrypted

Qwen: Qwen3.5-27B

Qwen3.5-27B is a native vision-language dense model that incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of Qwen3.5-122B-A10B.

context: 262K
input: $0.30/M

encrypted

Z.AI: GLM 4.7 Flash

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning, and tool collaboration, and has achieved leading performance among open-source models of the same size on several current public benchmark leaderboards.

context: 203K
input: $0.10/M

encrypted

Qwen: Qwen3 Embedding 8B

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

context: 33K
input: $0.01/M

encrypted

Phala: Venice Uncensored 24B

Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving user control over alignment, system prompts, and behavior. Intended for advanced and unrestricted use cases, Venice Uncensored emphasizes steerability and transparent behavior, removing default safety and alignment layers typically found in mainstream assistant models.

context: 33K
input: $0.20/M