nvidia_tech_guides

NVIDIA Triton Inference Server is an open-source inference serving platform that lets teams deploy and serve AI models from multiple frameworks — TensorRT, ONNX Runtime, PyTorch, TensorFlow, Python, and more — through consistent HTTP and gRPC APIs. Triton is designed for multi-model, multi-framework production deployments on both NVIDIA GPU and CPU infrastructure.
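As a rough illustration of the HTTP API mentioned above, here is a minimal sketch of how a client might build a request body for Triton's KServe-v2-style inference endpoint (`POST /v2/models/{model}/infer`). The model name `resnet50`, input name `input__0`, and tensor values are hypothetical placeholders, not taken from any real deployment:

```python
import json

def build_infer_request(input_name, shape, datatype, data):
    """Build the JSON body for a v2 inference request.

    datatype is a Triton type string such as "FP32" or "INT64";
    data holds the tensor values flattened in row-major order.
    """
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": shape,
                "datatype": datatype,
                "data": data,
            }
        ]
    }

# Hypothetical example: a 1x3 FP32 input for a model named "resnet50".
body = build_infer_request("input__0", [1, 3], "FP32", [0.1, 0.2, 0.3])
payload = json.dumps(body)
# This payload would be POSTed to, e.g.,
# http://localhost:8000/v2/models/resnet50/infer
```

The same request could equally be sent over Triton's gRPC endpoint; the HTTP/JSON form is shown here only because it is easy to inspect.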


