RAG Service

Product

RAG API Core provides APIs for Retrieval-Augmented Generation (RAG) and Chat, including:

  • Unified, flexible endpoints for RAG and Chat
  • Input validation and reliability
  • Health, feedback, and observability features
  • Integration with enterprise authentication and Azure services
  • A clean, navigable documentation center

What You Get

RAG API Core provides a complete API platform for building intelligent, knowledge-aware applications. It enables businesses to:

  • Build Q&A systems that provide accurate, sourced answers from your knowledge base
  • Build conversational interfaces with context-aware chat capabilities
  • Manage and update knowledge bases through secure document upload and indexing
  • Monitor system health and performance with comprehensive dashboards and alerts
  • Collect user feedback to continuously improve response quality
  • Conduct A/B testing of different prompts, models, and retrieval strategies

The API integrates with Azure services for reliability, security, and scalability.

Key Use Cases

  • Knowledge-base Q&A: Build systems that answer questions using your organization's documents, with citations and source grounding
  • Chatbots: Create chat interfaces with guardrails and templated responses
  • Content management: Upload, index, and search through large document collections
  • Experimentation platform: Test different AI approaches and measure their effectiveness

How It Works

  1. A user or system sends a request to the API (for example, to ask a question, chat, or upload knowledge).
  2. The API validates the request and determines how to fulfill it (RAG, chat, etc.).
  3. If retrieval is needed, the API fetches relevant knowledge from Azure AI Search or other sources.
  4. The API assembles a prompt and calls Azure OpenAI (or other LLMs) to generate a response.
  5. The response is returned, optionally with sources, metadata, and feedback options.
  6. Health, feedback, and monitoring endpoints help ensure reliability and continuous improvement.
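The request flow above can be sketched end to end in plain Python. This is a minimal, self-contained illustration under stated assumptions, not the service's implementation: an in-memory dictionary stands in for Azure AI Search, keyword overlap stands in for vector or hybrid ranking, `fake_llm` stands in for an Azure OpenAI call, and the `question`/`mode` payload fields and every function name are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical in-memory knowledge base standing in for an Azure AI Search index.
KNOWLEDGE_BASE = {
    "doc-1": "Refunds are processed within 5 business days.",
    "doc-2": "Support is available Monday to Friday.",
}


@dataclass
class RagResponse:
    answer: str
    sources: list = field(default_factory=list)


def tokenize(text: str) -> set:
    # Lowercase word tokens, ignoring punctuation.
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, top_k: int = 2) -> list:
    # Step 3: rank documents by keyword overlap (a stand-in for vector/hybrid search).
    terms = tokenize(question)
    scored = []
    for doc_id, text in KNOWLEDGE_BASE.items():
        overlap = len(terms & tokenize(text))
        if overlap:
            scored.append((overlap, doc_id, text))
    scored.sort(reverse=True)
    return [(doc_id, text) for _, doc_id, text in scored[:top_k]]


def build_prompt(question: str, passages: list) -> str:
    # Step 4a: ground the model in the retrieved passages, tagged with their IDs.
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def fake_llm(prompt: str) -> str:
    # Step 4b: placeholder for an Azure OpenAI call; echoes the first context passage.
    for line in prompt.splitlines():
        if line.startswith("["):
            return line.split("] ", 1)[1]
    return "I don't know."


def handle_request(payload: dict, llm=fake_llm) -> RagResponse:
    # Step 2: validate the request and pick a mode ("rag" or "chat").
    question = payload.get("question")
    if not isinstance(question, str) or not question.strip():
        raise ValueError("question must be a non-empty string")
    mode = payload.get("mode", "rag")
    if mode not in ("rag", "chat"):
        raise ValueError(f"unsupported mode: {mode}")
    # Step 3: retrieve only when the request needs grounding.
    passages = retrieve(question) if mode == "rag" else []
    # Steps 4-5: generate, then return the answer with its source IDs.
    answer = llm(build_prompt(question, passages))
    return RagResponse(answer=answer, sources=[doc_id for doc_id, _ in passages])
```

For example, `handle_request({"question": "How are refunds processed?"})` answers from the `doc-1` passage with `sources == ["doc-1"]`, while `"mode": "chat"` skips retrieval entirely. Passing a different callable as `llm` is where a real model client would plug in.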

Feature Highlights

  • Unified RAG & Chat Endpoints: Consistent, flexible APIs for both retrieval-augmented and conversational AI. Test responses interactively through the /unified-test endpoint.
  • Prompt Management: Customize system prompts and response templates. Create versions, revert changes, and test prompts interactively through /unified-test.
  • Feedback Loop: Collect user feedback on responses for ongoing improvement.
  • Knowledge Upload & Management: Add, update, and manage knowledge base content. Documents are processed, chunked, embedded, and indexed in Azure AI Search.
  • Health Monitoring: Live, ready, and deep health checks with JSON and HTML dashboards.
  • Testing Interface: Interactive testing through /unified-test for experimenting with prompts, parameters, and real-time responses.
  • OpenAPI & Docs Center: Auto-generated OpenAPI JSON and a modern, markdown-driven documentation center.
  • Input Validation: Pydantic v2 models ensure data quality and safety.
  • A/B Testing Support: Built-in fields for experiment tracking and variant selection.
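Chunking is the step in knowledge upload that most affects retrieval quality. This page does not specify the service's chunking strategy, so the following is a minimal sketch assuming fixed-size character windows with overlap; `chunk_text`, `to_index_records`, the default sizes, and the record fields are all hypothetical. Embedding creation would follow as a separate step on each record's `content`.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list:
    """Split text into fixed-size character windows; consecutive windows share `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so the tail is not emitted twice.
        if start + chunk_size >= len(text):
            break
    return chunks


def to_index_records(doc_id: str, text: str, **chunk_args) -> list:
    # Hypothetical record shape: one search document per chunk, linked to its parent.
    return [
        {"id": f"{doc_id}-{i}", "parent_id": doc_id, "content": chunk}
        for i, chunk in enumerate(chunk_text(text, **chunk_args))
    ]
```

For instance, `to_index_records("policy", "abcdefghij", chunk_size=4, overlap=1)` yields three records with IDs `policy-0` to `policy-2` and contents `abcd`, `defg`, and `ghij`. The overlap keeps text that straddles a boundary retrievable from either neighboring chunk.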
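For A/B testing, variant selection usually needs to be sticky: the same user should see the same prompt or model variant on every request. One common approach, shown here as an assumption rather than the service's documented behavior, is to hash the experiment and user IDs into a point in [0, 1) and map it onto cumulative traffic weights. `assign_variant` and its parameters are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment_id: str, weights: dict) -> str:
    """Deterministically map a user to a variant in proportion to `weights`."""
    total = sum(weights.values())
    if not weights or total <= 0:
        raise ValueError("weights must be non-empty and sum to a positive number")
    # Hash experiment + user so assignments are stable within an experiment
    # but independent across experiments.
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode("utf-8")).hexdigest()
    point = int(digest[:8], 16) / 0x100000000  # roughly uniform point in [0, 1)
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return variant
    # Floating-point rounding can leave point just above the last boundary.
    return variant
```

Because the hash includes `experiment_id`, a user's bucket in one experiment does not determine their bucket in another, and no assignment table needs to be stored.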

Integrations

  • Azure AI Search: For fast, scalable retrieval of knowledge base content.
  • Azure Blob/Table Storage: For storing documents, indexes, and feedback.
  • Azure OpenAI: For advanced language model responses.
  • Azure API Management (APIM) & Entra ID (JWT): For secure, enterprise-grade authentication and API fronting.
  • Templates: Customize response templates in rag_api_core/templates/v2.

Limits & Guarantees

  • Pydantic validation on inputs and outputs, with clear error responses (4xx for client errors such as 400 Bad Request; 5xx for server errors)
  • Multimodal validation: 1 to 5 images per request; supported types: jpeg/jpg/png/webp; ≤ 5 MB per data URL
  • Idempotent feedback submission, keyed by response_id
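The multimodal limits above translate directly into checks. The page states that inputs are validated with Pydantic v2 models; the sketch below restates the same rules in plain Python so it runs without dependencies, and the logic could be moved into a Pydantic `field_validator`. Treating 5 MB as 5 MiB measured on the whole data URL string, the error messages, and the name `validate_images` are assumptions.

```python
import base64
import re

MAX_IMAGES = 5
# Assumption: "5 MB" means 5 MiB, measured on the full data URL string.
MAX_DATA_URL_BYTES = 5 * 1024 * 1024
DATA_URL_RE = re.compile(r"^data:image/(jpeg|jpg|png|webp);base64,(.+)$", re.DOTALL)


def validate_images(images: list) -> list:
    """Return error messages for a list of image data URLs; an empty list means valid."""
    errors = []
    if not 1 <= len(images) <= MAX_IMAGES:
        errors.append(f"expected 1 to {MAX_IMAGES} images, got {len(images)}")
    for i, url in enumerate(images):
        if len(url.encode("utf-8")) > MAX_DATA_URL_BYTES:
            errors.append(f"image {i}: data URL exceeds {MAX_DATA_URL_BYTES} bytes")
            continue
        match = DATA_URL_RE.match(url)
        if match is None:
            errors.append(f"image {i}: not a base64 data URL of type jpeg/jpg/png/webp")
            continue
        try:
            # validate=True rejects characters outside the base64 alphabet.
            base64.b64decode(match.group(2), validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            errors.append(f"image {i}: invalid base64 payload")
    return errors
```

Collecting every error instead of stopping at the first one lets a single error response report all problems at once.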
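Idempotency keyed by `response_id` means a retried or double-clicked feedback submission must not create a second record. Below is a minimal in-memory sketch assuming first-write-wins semantics (the page does not say whether later submissions are ignored or overwrite earlier ones); `Feedback`, `FeedbackStore`, and the `rating` field are hypothetical stand-ins for the real models and Azure Table Storage.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feedback:
    response_id: str
    rating: int  # hypothetical scale: 1 = helpful, -1 = not helpful
    comment: str = ""


class FeedbackStore:
    """In-memory, thread-safe stand-in for a feedback table keyed by response_id."""

    def __init__(self) -> None:
        self._items = {}
        self._lock = threading.Lock()

    def submit(self, feedback: Feedback) -> tuple:
        """Store feedback once per response_id; return (stored_record, created)."""
        with self._lock:
            existing = self._items.get(feedback.response_id)
            if existing is not None:
                # Repeat submission: return the original record and write nothing.
                return existing, False
            self._items[feedback.response_id] = feedback
            return feedback, True

    def get(self, response_id: str) -> Optional[Feedback]:
        return self._items.get(response_id)
```

A second `submit` with the same `response_id` returns the original record with `created=False`, which an API can map to the same success response as the first call. In Table Storage, a comparable effect could come from using `response_id` as the row key so that a duplicate insert conflicts.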

Roadmap

  • Streaming responses (SSE/WebSocket) after the orchestrator stage
  • A hardened A/B testing manager to replace the current stubs
  • Continued docs & examples expansion

Last updated: September 2025