Introduction
Welcome to the RAG API Core Documentation Center.
RAG API Core consists of two main components:
- Data Pipelines: Ingest documents and files, then parse, chunk, and index them into the knowledge base. This enables efficient retrieval and grounding for LLM-powered applications.
- API Service: Exposes endpoints for chat, RAG, and related features. It serves the chat models, handles user queries, and orchestrates retrieval, prompt assembly, and response generation.
These two components are designed to work together but are logically and operationally distinct. The data pipelines prepare and maintain the knowledge base, while the API provides real-time access to LLM and retrieval capabilities.
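As a rough illustration of the chunking step in the data pipelines, the sketch below splits text into overlapping fixed-size character windows. The function name `chunk_text` and its `chunk_size` and `overlap` parameters are hypothetical and are not taken from the RAG API Core codebase; the actual pipelines may chunk by tokens, sentences, or document structure instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; not part of RAG API Core.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of indexing some text twice.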
RAG API Core is a Retrieval-Augmented Generation (RAG) API built on FastAPI that serves RAG and chat endpoints. The project provides a backend for LLM-powered applications, with a focus on:
- Flexible RAG & Chat Endpoints: Unified endpoints support retrieval, prompt templating, and multi-turn chat.
- Modular Architecture: Modular design for endpoints, routers, and app configuration, which eases extension and maintenance.
- A/B Testing: Experiment management and variant testing for iterating on prompts, models, and retrieval strategies.
- Health & Status: Health endpoints, probes, and a dashboard for monitoring.
- Typing & Validation: Type annotations and models for schema validation.
- Documentation: OpenAPI docs, markdown-driven endpoint docs, and a documentation center.
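To make the A/B testing item above more concrete, here is a minimal sketch of deterministic variant assignment using a hash of the experiment name and user identifier. The function `assign_variant` and its arguments are assumptions made for illustration; the experiment management in RAG API Core may assign variants differently (for example, with weighted or server-configured splits).

```python
import hashlib


def assign_variant(experiment: str, user_id: str, variants: list[str]) -> str:
    """Pick a variant deterministically for a given experiment and user.

    Hypothetical helper for illustration only; not part of RAG API Core.
    """
    if not variants:
        raise ValueError("variants must not be empty")

    # Hash the experiment and user together so the same user can land in
    # different buckets across different experiments.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()

    # Interpret the first 8 bytes as an integer and map it onto the variants.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    print(assign_variant("prompt-style", "user-123", ["control", "concise"]))
```

Because the assignment depends only on its inputs, a user sees the same variant on every request without any stored state.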
Capabilities
RAG API Core enables building intelligent applications that combine retrieval from a knowledge base with generative AI capabilities. It provides APIs for:
- Retrieval-Augmented Generation (RAG): Enhancing LLM responses with relevant information from indexed documents
- Chat functionality: Multi-turn conversations with context awareness
- Knowledge base management: Indexing and searching documents stored in Azure services
- Experimentation: A/B testing different prompts, models, and retrieval strategies
The API integrates with Azure services including Azure OpenAI, Azure Cognitive Search (since renamed Azure AI Search), Azure Blob Storage, and Azure Key Vault for secure, scalable operations. Azure was chosen for its enterprise security and compliance posture and for infrastructure that supports data protection and regulatory requirements.
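The prompt assembly part of the RAG capability described above can be sketched as follows: retrieved passages are numbered and placed ahead of the user question. The function `build_prompt`, the `max_passages` parameter, and the prompt wording are hypothetical; the templates used by RAG API Core are not shown in this document, and the retrieval and model calls are left out.

```python
def build_prompt(question: str, passages: list[str], max_passages: int = 3) -> str:
    """Assemble a grounded prompt from a question and retrieved passages.

    Hypothetical helper for illustration only; not part of RAG API Core.
    """
    # Keep only the top passages; the caller is assumed to pass them
    # already ranked by relevance.
    selected = passages[:max_passages]

    # Number each passage so the model (and the reader) can cite it.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(selected, start=1))

    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    docs = [
        "Data pipelines index documents into the knowledge base.",
        "The API service handles user queries.",
    ]
    print(build_prompt("What do the data pipelines do?", docs))
```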
Key Features
- Async FastAPI core for inference and retrieval
- Factories and routers for extension
- Experiment and variant support
- Endpoints for retrieval-augmented generation and chat
- Health checks, status endpoints, and dashboards
- OpenAPI, markdown-driven docs, and a doc center
- Models and type annotations throughout
For detailed information, explore the documentation sections.
Last updated: September 2025