The NVIDIA API Catalog, accessible at build.nvidia.com, is NVIDIA’s hosted platform for developers to discover, try, and integrate AI models and microservices through standard API endpoints. It is the cloud-hosted counterpart to NVIDIA NIM: where NIM lets you run AI inference microservices on your own GPU infrastructure, the API Catalog lets you call the same models over HTTPS without provisioning any hardware.
For developers building AI-powered applications, the API Catalog provides a fast path from “I want to try this model” to “I have code that calls this model” — in minutes, without a GPU, without container setup, and without infrastructure overhead.
The NVIDIA API Catalog is the fastest way to integrate NVIDIA-hosted AI models into applications. It is designed for prototyping and production API access to hundreds of AI models across multiple modalities.
The catalog is organized into model and service categories spanning the core AI application space.
The catalog hosts a wide range of open and proprietary large language models for text generation, instruction following, code generation, and reasoning.
All LLM endpoints expose an OpenAI-compatible API (/v1/chat/completions, /v1/completions), so any application using the OpenAI Python SDK or HTTP client can switch to NVIDIA-hosted models with a base URL change.
Vision-language models process both images and text, enabling tasks such as image captioning, visual question answering, and document understanding.
Available models include Llama 3.2 Vision, Microsoft Phi-3.5 Vision, Google PaliGemma, NVLM, and others. VLM endpoints accept image inputs as base64-encoded data or URLs alongside text messages.
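As a hedged sketch, the image half of a VLM request can be assembled like this. The data-URI message shape below follows the OpenAI vision-message format; exact field support varies by model, so check the model card on build.nvidia.com:

```python
import base64

def build_vlm_message(image_bytes: bytes, prompt: str,
                      media_type: str = "image/png") -> dict:
    """Build an OpenAI-style chat message carrying an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Inline image as a data URI; a plain HTTPS URL also works here
            {"type": "image_url",
             "image_url": {"url": f"data:{media_type};base64,{b64}"}},
        ],
    }

# Toy bytes stand in for a real image file
msg = build_vlm_message(b"\x89PNG\r\n", "Describe this image.")
```

The resulting dict drops straight into the `messages` array of a chat-completions request.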
Embedding models convert text (and sometimes images) into dense vector representations for semantic search, retrieval, and clustering.
Embedding endpoints follow the OpenAI /v1/embeddings format, making them compatible with LangChain, LlamaIndex, and other retrieval frameworks.
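Once vectors come back from the embeddings endpoint, ranking documents against a query is a few lines of standard Python. The vectors below are toy stand-ins for real embedding output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedding-endpoint output
query = [0.1, 0.9, 0.0]
docs = {"gpu": [0.1, 0.8, 0.1], "cooking": [0.9, 0.0, 0.4]}
best = max(docs, key=lambda k: cosine_similarity(query, docs[k]))
print(best)  # the semantically closer document wins
```

In practice a vector database or a framework like LangChain handles this step, but the underlying ranking is exactly this computation.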
Rerankers take a query and a set of retrieved documents and re-score them for relevance, a crucial step in high-quality RAG pipelines.
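Whichever reranking model you call, the post-processing is the same: take the relevance scores it returns and reorder the candidates. A minimal sketch, with made-up scores standing in for a model's output:

```python
def rerank(documents: list[str], scores: list[float], top_k: int = 3) -> list[str]:
    """Reorder retrieved documents by reranker relevance score, best first."""
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# Scores as a reranking model might return them (higher = more relevant)
docs = ["intro to GPUs", "pasta recipes", "CUDA programming guide"]
scores = [0.62, 0.03, 0.91]
top = rerank(docs, scores, top_k=2)
print(top)  # ['CUDA programming guide', 'intro to GPUs']
```

Only the surviving top-k documents are passed to the LLM, which keeps the prompt short and the context relevant.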
The catalog includes visual generative AI models for creating images from text prompts.
The catalog also includes domain-specific models for scientific applications.
This makes the catalog valuable not just for conversational AI but for computational biology and scientific computing applications.
The NVIDIA API Catalog uses standard REST APIs:
```bash
curl -X POST "https://integrate.api.nvidia.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -d '{
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Explain GPU parallelism in one paragraph."}],
    "max_tokens": 512
  }'
```
An embeddings request follows the same pattern:

```bash
curl -X POST "https://integrate.api.nvidia.com/v1/embeddings" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -d '{
    "input": "NVIDIA GPUs accelerate AI workloads",
    "model": "nvidia/nv-embed-v2",
    "encoding_format": "float"
  }'
```
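The JSON that comes back follows the OpenAI embeddings response shape, so extracting the vector is a one-liner. The response below is a truncated stand-in for a real reply:

```python
# Stand-in for the JSON body returned by /v1/embeddings (OpenAI-compatible shape)
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.012, -0.034, 0.056]}
    ],
    "model": "nvidia/nv-embed-v2",
    "usage": {"prompt_tokens": 6, "total_tokens": 6},
}

# The dense vector lives under data[i].embedding, one entry per input string
vector = sample_response["data"][0]["embedding"]
print(len(vector))
```

Real models return vectors with hundreds or thousands of dimensions; the three-element list here is just for illustration.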
Because the endpoints are OpenAI-compatible, the OpenAI Python SDK works with a simple base URL override:
```python
import os

from openai import OpenAI

# Point the standard OpenAI client at the NVIDIA API Catalog
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],  # read the key from the environment
)

response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "What is RAPIDS?"}],
)
print(response.choices[0].message.content)
```
NVIDIA provides a free tier for the API Catalog. Model pages on build.nvidia.com offer a “Try in Playground” interface, so you can test models in the browser with no API key at all. The free tier is sufficient for prototyping, integration testing, and evaluating whether a particular model meets your application’s requirements before committing to infrastructure.
The API Catalog and NIM serve complementary purposes:
| Aspect | NVIDIA API Catalog | NVIDIA NIM (Self-hosted) |
|---|---|---|
| Infrastructure | NVIDIA-hosted — no GPUs needed | Your GPU infrastructure required |
| Setup time | Minutes (get API key, start calling) | Hours to days (containers, Kubernetes) |
| Data privacy | Data sent to NVIDIA’s servers | Data stays in your environment |
| Cost model | Pay-per-token / subscription | Infrastructure + software licensing |
| Customization | Limited — use models as provided | Full control over model, runtime, config |
| Latency | Dependent on network + queue | Controlled by your hardware |
| Best for | Prototyping, development, variable workloads | Enterprise self-hosting, regulated data, high throughput |
The common pattern is to prototype using the API Catalog, then deploy to self-hosted NIM for production workloads requiring data privacy, consistent latency, or cost efficiency at scale.
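One way to make that migration a non-event in application code is to resolve the base URL from configuration. This sketch assumes a `NIM_BASE_URL` environment variable (a name chosen here for illustration); a self-hosted NIM LLM container typically serves an OpenAI-compatible API at a `/v1` route on your own host:

```python
import os

def inference_base_url() -> str:
    """Return the self-hosted NIM endpoint if configured, else the hosted catalog.

    NIM_BASE_URL is an illustrative variable name, e.g. "http://localhost:8000/v1"
    for a locally running NIM container.
    """
    return os.environ.get("NIM_BASE_URL", "https://integrate.api.nvidia.com/v1")

# During prototyping nothing is set, so calls go to the API Catalog:
print(inference_base_url())
```

Because both targets speak the same OpenAI-compatible protocol, the rest of the application code is unchanged when the variable flips.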
The NVIDIA API Catalog integrates directly with popular AI application frameworks:
```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# LangChain: picks up the NVIDIA_API_KEY environment variable by default
llm = ChatNVIDIA(model="meta/llama-3.3-70b-instruct")
response = llm.invoke("What is cuDNN?")
```
LlamaIndex offers an equivalent integration:

```python
from llama_index.llms.nvidia import NVIDIA

# LlamaIndex: same model identifier, same environment-variable credential
llm = NVIDIA(model="meta/llama-3.3-70b-instruct")
response = llm.complete("Explain TensorRT in simple terms.")
```
The langchain-nvidia-ai-endpoints and llama-index-llms-nvidia packages abstract away the API details and connect directly to the NVIDIA API Catalog backend.
The NVIDIA API Catalog provides developers with fast, standards-compatible API access to hundreds of AI models across language, vision, speech, embeddings, and scientific domains — all without provisioning GPU infrastructure. Its OpenAI-compatible endpoints mean minimal friction for teams already using OpenAI-style tooling. The catalog serves as both a discovery platform for evaluating NVIDIA-hosted models and an integration point for building production applications that can later be migrated to self-hosted NIM deployments for enterprise requirements.