GPU Confidential Computing · arXiv preprint · September 6, 2024

Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study

Jianwei Zhu, Hang Yin, Peng Deng, Aline Almeida, Shunfan Zhou

Highlights

  • Under 7% overhead for typical LLM inference
  • Near-negligible overhead for large models / long sequences
  • PCIe data transfer — not GPU compute — is the bottleneck

Abstract

We evaluate how enabling Trusted Execution Environments (TEEs) on NVIDIA Hopper GPUs affects performance during large language model (LLM) inference. Benchmarking overhead across multiple LLMs and token lengths, we identify CPU–GPU data transfer over PCIe as the key constraint: computational overhead within the GPU itself remains minimal, and data transfer accounts for most of the performance penalty. For typical LLM queries, overhead stays under 7%, and larger models with longer sequences show nearly negligible overhead, indicating that confidential GPU inference is practical at production scale.
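
To make the overhead figures concrete, here is a minimal sketch of how a percentage overhead could be computed from two throughput measurements. It assumes overhead means relative throughput loss with TEE mode on versus off; the paper may define its metric differently. The function name and the sample numbers are hypothetical and are not measurements from the study.

```python
def tee_overhead_pct(throughput_off: float, throughput_on: float) -> float:
    """Relative throughput loss, in percent, when TEE mode is enabled.

    Assumed definition: (off - on) / off * 100.
    """
    if throughput_off <= 0:
        raise ValueError("baseline throughput must be positive")
    return (throughput_off - throughput_on) / throughput_off * 100.0


# Hypothetical values for illustration only; not results from the paper.
baseline_tps = 100.0  # tokens per second with TEE mode off
tee_tps = 94.0        # tokens per second with TEE mode on

overhead = tee_overhead_pct(baseline_tps, tee_tps)
print(f"TEE overhead: {overhead:.1f}%")
```

Under this definition, a fixed per-request transfer cost shrinks as a share of the total when the GPU spends longer computing, which is consistent with the abstract's observation that larger models and longer sequences show less overhead.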

arXiv:2409.03992
