GPU Confidential Computing · arXiv preprint · June 22, 2026

The Serialized Bridge: Understanding and Recovering LLM Serving Performance under Blackwell GPU Confidential Computing

Hang Yin

Kevin Wang

Highlights

Identifies the CVM–GPU "serialized bridge" as the dominant overhead
Throughput loss reduced from 13–27% down to single digits
Worker-thread drain recovers up to 92% of lost performance

Abstract

GPU Confidential Computing preserves local GPU performance, yet LLM serving under Intel TDX plus GPU-CC suffers significant throughput losses (13–27%) and doubled KV-cache restore latency. We identify the confidential VM–GPU bridge — not GPU computation — as the primary bottleneck. GPU-CC turns host–device data movement into a serialized, high-setup-cost channel where secure copies cannot leverage CUDA-stream concurrency, asynchronous transfers block at runtime boundaries, and small crossings pay a fixed overhead. In vLLM dense decode, degradation stems from 44×-slower small allocation and copy operations. A scheduling flag recovers 57% of lost performance, and a worker-thread drain approach recovers up to 92% under high concurrency. The same bridge model explains KV-cache restoration penalties and model-loading slowdowns.
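
The bridge model described in the abstract (serialized crossings, each paying a fixed setup cost) can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the paper's model or measurements: the function names, the setup cost, the bandwidth, and the copy sizes are all hypothetical values chosen for illustration.

```python
def serialized_bridge_time(sizes_bytes, setup_s, bandwidth_bytes_per_s):
    """Total time when crossings run one after another and each pays a fixed setup cost."""
    return sum(setup_s + size / bandwidth_bytes_per_s for size in sizes_bytes)


def batched_bridge_time(sizes_bytes, setup_s, bandwidth_bytes_per_s):
    """Total time when the same bytes move in a single crossing (one setup cost)."""
    return setup_s + sum(sizes_bytes) / bandwidth_bytes_per_s


# Hypothetical parameters, NOT taken from the paper.
small_copies = [4096] * 1000   # 1000 small host-device copies of 4 KiB each
setup = 20e-6                  # assumed fixed cost per crossing, in seconds
bandwidth = 10e9               # assumed channel bandwidth, in bytes per second

many_crossings = serialized_bridge_time(small_copies, setup, bandwidth)
one_crossing = batched_bridge_time(small_copies, setup, bandwidth)

print(f"1000 serialized crossings: {many_crossings * 1e3:.3f} ms")
print(f"one batched crossing:      {one_crossing * 1e3:.3f} ms")
```

Under these assumed numbers the setup cost dominates the per-byte cost, which is the regime in which small crossings are disproportionately expensive; the two totals differ by exactly the setup cost of the 999 crossings that batching avoids.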

arXiv:2606.23969