NVIDIA TensorRT is a high-performance deep learning inference SDK that optimizes trained neural network models for deployment on NVIDIA GPUs. It applies graph optimizations, layer fusion, precision calibration, and hardware-specific tuning to maximize throughput and minimize latency at inference time.
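The optimization flow described above can be sketched with TensorRT's Python API: parse a trained model (here from ONNX), set builder options such as reduced precision, and let the builder apply its graph optimizations and kernel tuning while serializing an engine. This is a minimal sketch, assuming TensorRT 8.x-style Python bindings and an NVIDIA GPU at build time; `model.onnx` is a placeholder path.

```python
# Minimal sketch of building an optimized TensorRT engine from an ONNX model.
# Assumes the `tensorrt` Python package (>= 8.x) and a CUDA-capable GPU.

def build_engine(onnx_path: str, fp16: bool = True):
    import tensorrt as trt  # imported here; requires an installed TensorRT runtime

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )

    # Parse the trained model into TensorRT's network representation.
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(f"ONNX parse failed: {parser.get_error(0)}")

    config = builder.create_builder_config()
    if fp16 and builder.platform_has_fast_fp16:
        # Allow reduced-precision kernels where the hardware supports them.
        config.set_flag(trt.BuilderFlag.FP16)

    # build_serialized_network runs graph optimization, layer fusion, and
    # hardware-specific kernel auto-tuning, returning a serialized engine.
    return builder.build_serialized_network(network, config)
```

The returned serialized engine would then be deserialized with a `trt.Runtime` and executed via an execution context; INT8 precision calibration would additionally require attaching a calibrator or calibration cache to the builder config.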