Product
RAG API Core provides APIs for Retrieval-Augmented Generation (RAG) and Chat, including:
- Unified, flexible endpoints for RAG and Chat
- Input validation and reliability
- Health, feedback, and observability features
- Integration with enterprise authentication and Azure services
- A clean, navigable documentation center
What You Get
RAG API Core provides a complete API platform for building intelligent, knowledge-aware applications. It enables businesses to:
- Build Q&A systems that provide accurate, sourced answers from your knowledge base
- Build conversational interfaces with context-aware chat capabilities
- Manage and update knowledge bases through secure document upload and indexing
- Monitor system health and performance with comprehensive dashboards and alerts
- Collect user feedback to continuously improve response quality
- Conduct A/B testing of different prompts, models, and retrieval strategies
The API integrates with Azure services for reliability, security, and scalability.
Key Use Cases
- Knowledge-base Q&A: Build systems that answer questions using your organization's documents, with citations and source grounding
- Chatbots: Create chat interfaces with guardrails and templated responses
- Content management: Upload, index, and search through large document collections
- Experimentation platform: Test different AI approaches and measure their effectiveness
Getting Started
- Explore the Introduction and Authentication guides
- Review Setup and Testing for deployment and validation
- See Architecture for technical details
How It Works
- A user or system sends a request to the API (e.g., ask a question, chat, upload knowledge).
- The API validates the request and determines the best way to fulfill it (RAG, chat, etc.).
- If retrieval is needed, the API fetches relevant knowledge from Azure AI Search or other sources.
- The API assembles a prompt and calls Azure OpenAI (or other LLMs) to generate a response.
- The response is returned, optionally with sources, metadata, and feedback options (a client-side sketch follows this list).
- Health, feedback, and monitoring endpoints help ensure reliability and continuous improvement.
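As a minimal client-side sketch of this flow, the snippet below posts a question and reads back the generated answer with its sources. The base URL, endpoint path, and payload/response field names are illustrative assumptions, not the API's published contract.

```python
import requests

# Hypothetical base URL and endpoint path -- substitute your deployment's values.
BASE_URL = "https://api.example.com/rag-api-core"

payload = {
    "query": "What is our refund policy?",  # question answered from the knowledge base
    "mode": "rag",                          # assumed switch between RAG and plain chat
    "top_k": 5,                             # assumed number of passages to retrieve
}

resp = requests.post(f"{BASE_URL}/rag/query", json=payload, timeout=30)
resp.raise_for_status()
data = resp.json()

# Assumed response shape: generated answer plus the sources it was grounded on.
print(data["answer"])
for source in data.get("sources", []):
    print("-", source.get("title"), source.get("url"))

# The response id can later be sent to the feedback endpoint (see Feature Highlights).
response_id = data.get("response_id")
```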
Feature Highlights
- Unified RAG & Chat Endpoints: Consistent, flexible APIs for both retrieval-augmented and conversational AI. Test responses interactively through the /unified-test endpoint.
- Prompt Management: Customize system prompts and response templates. Create versions, revert changes, and test prompts in the interface.
- Feedback Loop: Collect user feedback on responses for ongoing improvement.
- Knowledge Upload & Management: Add, update, and manage knowledge base content. Documents are indexed in Azure AI Search with processing, chunking, and embedding creation.
- Health Monitoring: Live, ready, and deep health checks with JSON and HTML dashboards.
- Testing Interface: Interactive testing through /unified-test for experimenting with prompts, parameters, and real-time responses.
- OpenAPI & Docs Center: Auto-generated OpenAPI JSON and a modern, markdown-driven documentation center.
- Input Validation: Pydantic v2 models ensure data quality and safety.
- A/B Testing Support: Built-in fields for experiment tracking and variant selection (see the request sketch after this list).
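As an illustration of how Pydantic v2 validation and the A/B testing fields might look on the request side, here is a hedged sketch; the model and field names are assumptions for demonstration, not the API's actual schema.

```python
from typing import Literal, Optional
from pydantic import BaseModel, Field

class UnifiedRequest(BaseModel):
    """Illustrative request model -- field names are assumptions, not the real schema."""

    query: str = Field(min_length=1, max_length=4000, description="User question or chat turn")
    mode: Literal["rag", "chat"] = "rag"        # unified endpoint switch
    top_k: int = Field(default=5, ge=1, le=20)  # retrieval depth, bounded for safety

    # A/B-testing-style fields: track which experiment and variant served the request.
    experiment_id: Optional[str] = None
    variant: Optional[Literal["A", "B"]] = None

# Invalid input raises a ValidationError, which the API can surface as a 400-class error.
req = UnifiedRequest(query="How do I rotate my API key?", experiment_id="prompt-tone-test", variant="B")
print(req.model_dump_json(indent=2))
```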
Integrations
- Azure AI Search: For fast, scalable retrieval of knowledge base content.
- Azure Blob/Table Storage: For storing documents, indexes, and feedback.
- Azure OpenAI: For advanced language model responses.
- Azure API Management (APIM) & Entra ID (JWT): For secure, enterprise-grade authentication and API fronting (a client sketch follows this list).
- Templates: Customize templates in rag_api_core/templates/v2
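For calling the API behind APIM with an Entra ID-issued JWT, a minimal client sketch might look like the following. The tenant, client, scope, host, and subscription-key values are placeholders; the exact scope and whether an APIM subscription key is required depend on your deployment.

```python
import requests
from azure.identity import ClientSecretCredential

# Placeholder credentials -- use your own tenant, app registration, and client secret.
credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)

# Acquire an Entra ID access token (JWT) for the API's scope (scope value is an assumption).
token = credential.get_token("api://<api-app-id>/.default")

headers = {
    "Authorization": f"Bearer {token.token}",
    # Many APIM deployments also require a subscription key header (assumption here).
    "Ocp-Apim-Subscription-Key": "<apim-subscription-key>",
}

# Hypothetical health endpoint behind APIM; substitute your gateway host and path.
resp = requests.get("https://<apim-host>/rag-api-core/health", headers=headers, timeout=15)
print(resp.status_code, resp.json())
```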
Limits & Guarantees
- Pydantic validation on inputs and outputs, with clear error responses (400 and other 4xx/5xx codes)
- Multimodal validation: at least 1 image, up to 5; supported types: jpeg/jpg/png/webp; ≤ 5 MB per data URL (see the sketch after this list)
- Feedback idempotency keyed by response_id
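The multimodal limits above could be enforced with a Pydantic v2 validator along these lines; the model and field names are illustrative assumptions rather than the API's actual classes, and the size check here uses the decoded bytes (the real limit may apply to the raw data URL).

```python
import base64
from typing import List
from pydantic import BaseModel, field_validator

ALLOWED_TYPES = {"jpeg", "jpg", "png", "webp"}
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # 5 MB per image

class MultimodalRequest(BaseModel):
    """Illustrative model: 1-5 images of allowed types, each at most 5 MB."""

    query: str
    images: List[str]  # data URLs, e.g. "data:image/png;base64,...."

    @field_validator("images")
    @classmethod
    def check_images(cls, images: List[str]) -> List[str]:
        if not 1 <= len(images) <= 5:
            raise ValueError("between 1 and 5 images are required")
        for url in images:
            header, _, b64_data = url.partition(",")
            if not header.startswith("data:image/"):
                raise ValueError("each image must be supplied as a data URL")
            subtype = header.removeprefix("data:image/").split(";")[0]
            if subtype not in ALLOWED_TYPES:
                raise ValueError(f"unsupported image type: {subtype}")
            if len(base64.b64decode(b64_data)) > MAX_IMAGE_BYTES:
                raise ValueError("each image must be 5 MB or smaller")
        return images
```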
Roadmap
- Streaming responses (SSE/WebSocket) after the orchestrator stage
- Hardened A/B testing manager replacing the current stubs
- Continued docs & examples expansion
Last updated: September 2025