nvidia_tech_guides

NVIDIA API Catalog Explained: Hosted AI Models and APIs for Developers

The NVIDIA API Catalog, accessible at build.nvidia.com, is NVIDIA’s hosted platform for developers to discover, try, and integrate AI models and microservices through standard API endpoints. It is the cloud-hosted counterpart to NVIDIA NIM: where NIM lets you run AI inference microservices on your own GPU infrastructure, the API Catalog lets you call the same models over HTTPS without provisioning any hardware.

For developers building AI-powered applications, the API Catalog provides a fast path from “I want to try this model” to “I have code that calls this model” — in minutes, without a GPU, without container setup, and without infrastructure overhead.

The NVIDIA API Catalog is the fastest way to integrate NVIDIA-hosted AI models into applications. It is designed for prototyping and production API access to hundreds of AI models across multiple modalities.


What the NVIDIA API Catalog Provides

The catalog is organized into model and service categories that span the core AI application space:

Large Language Models (LLMs)

The catalog hosts a wide range of open and proprietary large language models for text generation, instruction following, code generation, and reasoning:

All LLM endpoints expose an OpenAI-compatible API (/v1/chat/completions, /v1/completions), so any application using the OpenAI Python SDK or HTTP client can switch to NVIDIA-hosted models with a base URL change.

Vision Language Models (VLMs)

Vision-language models process both images and text, enabling:

Available models include Llama 3.2 Vision, Microsoft Phi-3.5 Vision, Google PaliGemma, NVLM, and others. VLM endpoints accept image inputs as base64-encoded data or URLs alongside text messages.

Embedding Models

Embedding models convert text (and sometimes images) into dense vector representations for semantic search, retrieval, and clustering:

Embedding endpoints follow the OpenAI /v1/embeddings format, making them compatible with LangChain, LlamaIndex, and other retrieval frameworks.

Reranking Models

Rerankers take a query and a set of retrieved documents and re-score them for relevance — a crucial step in high-quality RAG pipelines:

Image Generation Models

Visual generative AI models for creating images from text prompts:

Speech and Audio Models

Biology and Scientific Models

The catalog also includes domain-specific models for scientific applications:

This makes the catalog valuable not just for conversational AI but for computational biology and scientific computing applications.

Code Generation Models


API Format and Compatibility

The NVIDIA API Catalog uses standard REST APIs:

OpenAI-Compatible Chat Completions

curl -X POST "https://integrate.api.nvidia.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -d '{
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Explain GPU parallelism in one paragraph."}],
    "max_tokens": 512
  }'

Embeddings

curl -X POST "https://integrate.api.nvidia.com/v1/embeddings" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -d '{
    "input": "NVIDIA GPUs accelerate AI workloads",
    "model": "nvidia/nv-embed-v2",
    "encoding_format": "float"
  }'

OpenAI Python SDK

Because the endpoints are OpenAI-compatible, the OpenAI Python SDK works with a simple base URL override:

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="$NVIDIA_API_KEY"
)

response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "What is RAPIDS?"}]
)
print(response.choices[0].message.content)

Free Tier and Rate Limits

NVIDIA provides a free tier for the API Catalog:

The free tier is sufficient for prototyping, integration testing, and evaluating whether a particular model meets your application’s requirements before committing to infrastructure.


API Catalog vs NIM: Choosing the Right Path

The API Catalog and NIM serve complementary purposes:

Aspect NVIDIA API Catalog NVIDIA NIM (Self-hosted)
Infrastructure NVIDIA-hosted — no GPUs needed Your GPU infrastructure required
Setup time Minutes (get API key, start calling) Hours to days (containers, Kubernetes)
Data privacy Data sent to NVIDIA’s servers Data stays in your environment
Cost model Pay-per-token / subscription Infrastructure + software licensing
Customization Limited — use models as provided Full control over model, runtime, config
Latency Dependent on network + queue Controlled by your hardware
Best for Prototyping, development, variable workloads Enterprise self-hosting, regulated data, high throughput

The common pattern is to prototype using the API Catalog, then deploy to self-hosted NIM for production workloads requiring data privacy, consistent latency, or cost efficiency at scale.


Integration with Frameworks

The NVIDIA API Catalog integrates directly with popular AI application frameworks:

LangChain

from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="meta/llama-3.3-70b-instruct")
response = llm.invoke("What is cuDNN?")

LlamaIndex

from llama_index.llms.nvidia import NVIDIA

llm = NVIDIA(model="meta/llama-3.3-70b-instruct")
response = llm.complete("Explain TensorRT in simple terms.")

The langchain-nvidia-ai-endpoints and llama-index-llms-nvidia packages abstract away the API details and connect directly to the NVIDIA API Catalog backend.


Summary

The NVIDIA API Catalog provides developers with fast, standards-compatible API access to hundreds of AI models across language, vision, speech, embeddings, and scientific domains — all without provisioning GPU infrastructure. Its OpenAI-compatible endpoints mean minimal friction for teams already using OpenAI-style tooling. The catalog serves as both a discovery platform for evaluating NVIDIA-hosted models and an integration point for building production applications that can later be migrated to self-hosted NIM deployments for enterprise requirements.


References

  1. NVIDIA API Catalog — build.nvidia.com
  2. NVIDIA API Catalog Documentation
  3. NVIDIA NIM for Developers
  4. LangChain NVIDIA AI Endpoints
  5. LlamaIndex NVIDIA Integration

← Back to NVIDIA API Catalog · ← Back to Home