Introduction

Welcome to the RAG API Core Documentation Center.

RAG API Core consists of two main components:

Data Pipelines: Responsible for ingesting documents and files, parsing, chunking, and indexing them into the knowledge base. This enables efficient retrieval and grounding for LLM-powered applications.
API Service: Exposes endpoints for chat, RAG, and related features. It serves the chat models, handles user queries, and orchestrates retrieval, prompt assembly, and response generation.

These two components are designed to work together but are logically and operationally distinct. The data pipelines prepare and maintain the knowledge base, while the API provides real-time access to LLM and retrieval capabilities.
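
As a rough illustration of this split, the sketch below separates an ingestion side (chunk and index) from a query side (retrieve and assemble a prompt). It is a minimal, standard-library-only sketch and not the project's implementation: the names `Chunk`, `chunk_text`, `InMemoryIndex`, and `build_prompt` are hypothetical, the chunk sizes are arbitrary, and simple keyword overlap stands in for whatever retrieval the real knowledge base performs.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    text: str


# --- Data pipeline side (hypothetical): chunk and index documents ---

def chunk_text(doc_id: str, text: str, size: int = 40, overlap: int = 10) -> list[Chunk]:
    """Split text into overlapping windows of words (sizes are illustrative)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks: list[Chunk] = []
    for start in range(0, len(words), step):
        chunks.append(Chunk(doc_id, " ".join(words[start:start + size])))
        if start + size >= len(words):
            break  # the current window already reaches the end of the text
    return chunks


class InMemoryIndex:
    """Stand-in for the knowledge base; assumed here only for illustration."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def add(self, chunks: list[Chunk]) -> None:
        self.chunks.extend(chunks)

    def search(self, query: str, top_k: int = 2) -> list[Chunk]:
        # Score each chunk by the number of query terms it shares.
        terms = set(query.lower().split())
        scored = [(len(terms & set(c.text.lower().split())), c) for c in self.chunks]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:top_k]]


# --- API service side (hypothetical): retrieve and assemble the prompt ---

def build_prompt(question: str, context: list[Chunk]) -> str:
    sources = "\n".join(f"[{c.doc_id}] {c.text}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{sources}\n\n"
        f"Question: {question}"
    )


index = InMemoryIndex()
index.add(chunk_text("faq", "Password resets are handled from the account settings page."))
index.add(chunk_text("billing", "Invoices are issued on the first day of each month."))

hits = index.search("password resets handled")
prompt = build_prompt("How are password resets handled?", hits)
print(prompt)
```

In this sketch the index is the only thing the two sides share, which mirrors the idea that the pipelines can run and be maintained separately from the real-time API.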

RAG API Core is a Retrieval-Augmented Generation (RAG) API built on FastAPI that exposes RAG and chat endpoints. It provides a backend for LLM-powered applications, with a focus on:

Flexible RAG & Chat Endpoints: Unified endpoints support retrieval, prompt templating, and multi-turn chat.
Modular Architecture: A modular design for endpoints, routers, and app configuration that simplifies extension and maintenance.
A/B Testing: Experiment management and variant testing for iteration.
Health & Status: Health endpoints, probes, and a dashboard for monitoring.
Typing & Validation: Type annotations and models for schema validation.
Documentation: OpenAPI docs, markdown-driven endpoint docs, and a documentation center.

Capabilities

RAG API Core enables building intelligent applications that combine retrieval from a knowledge base with generative AI capabilities. It provides APIs for:

Retrieval-Augmented Generation (RAG): Enhancing LLM responses with relevant information from indexed documents
Chat functionality: Multi-turn conversations with context awareness
Knowledge base management: Indexing and searching documents stored in Azure services
Experimentation: A/B testing different prompts, models, and retrieval strategies
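
One common way to support this kind of experimentation is to assign each user to a variant deterministically, so the same user sees the same prompt, model, or retrieval strategy across requests. The sketch below shows hash-based bucketing under that assumption. The function `assign_variant`, the experiment name, and the weights are hypothetical and are not taken from the RAG API Core codebase.

```python
import hashlib


def assign_variant(experiment: str, user_id: str, weights: dict[str, float]) -> str:
    """Deterministically map a user to a variant, in proportion to the weights."""
    if not weights:
        raise ValueError("at least one variant is required")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Keep 53 bits so the value converts exactly to a float in [0, 1).
    point = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    total = sum(weights.values())
    cumulative = 0.0
    for name, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return name
    return name  # fallback if rounding leaves the cumulative sum just below 1.0


# Hypothetical experiment: an even split between two prompt variants.
weights = {"control": 0.5, "treatment": 0.5}
variant = assign_variant("prompt-template-test", "user-123", weights)
print(variant)
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user placed in "control" for one test is not systematically placed in "control" for another.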

The API integrates with Azure services, including Azure OpenAI, Azure Cognitive Search, Azure Blob Storage, and Azure Key Vault, for secure, scalable operations. Azure was chosen for its enterprise-grade security and compliance posture and for its robust infrastructure, which support data protection and regulatory adherence.

Key Features

Async FastAPI core for inference and retrieval
Factories and routers for extension
Experiment and variant support
Endpoints for retrieval-augmented generation and chat
Health checks, status endpoints, and dashboards
OpenAPI, markdown-driven docs, and a doc center
Models and type annotations throughout

For detailed information, explore the documentation sections.

Last updated: September 2025