Knowledge Base & Search Endpoints
Access, search, and manage the knowledge base of indexed documents and their metadata. This system powers retrieval-augmented generation (RAG) and semantic search features.
What is the Knowledge Base?
The knowledge base is a collection of ingested documents (PDFs, Office files, text, etc.) that have been chunked, indexed, and made searchable via Azure AI Search. It supports both keyword and semantic search, and is the foundation for RAG workflows.
How is it Configured?
- Indexes: The core of the knowledge base is one or more Azure AI Search indexes. These are configured in the application config (config.app.ai_search).
- Document Storage: Documents are chunked and indexed. Metadata (file name, type, ingestion timestamp, etc.) is stored in the index.
- Table Storage: Azure Table Storage is optionally used to track ingestion timestamps and file-level metadata, providing a fallback if the index is missing this info.
- Index Management: You can view, create, and manage indexes via the Search Admin UI (/api/v2/search-admin) or API endpoints. Indexes can be created from YAML or JSON specs, and you can list all live indexes.
What Documents Are Present?
- The /knowledge UI and /knowledge/stats API provide a live view of all indexed files, their types, sizes, and ingestion times.
- You can filter by file type, search by name, and export metadata as CSV.
- Each file is broken into chunks for efficient retrieval; stats include total chunks, distinct files, and estimated token counts.
- If ingestion timestamps are missing from the index, the system will attempt to retrieve them from Table Storage (if configured).
- New: The /knowledge/stats API now returns both a breakdown by file type and a breakdown by index, allowing you to see how files are distributed across indexes and types.
Business impact
You can now monitor not just what file types are present, but also how your documents are distributed across different Azure AI Search indexes. This helps with compliance, capacity planning, and understanding ingestion patterns.
Developer details
- Endpoint: GET /knowledge/stats
- Changes: Response now includes two new fields: file_type_breakdown and index_breakdown.
- Migration: No breaking changes; new fields are additive. Existing consumers will continue to work, but you can now use the new breakdowns for analytics.
Example response (truncated)
{
  "total_files": 123,
  "total_chunks": 4567,
  "file_type_breakdown": {
    "pdf": 50,
    "docx": 30,
    "txt": 43
  },
  "index_breakdown": {
    "main-index": 100,
    "archive-index": 23
  },
  ...
}
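As a rough illustration of how the two breakdown fields might be used for analytics, the sketch below turns the counts from the example response into percentage shares. The helper name breakdown_shares is hypothetical, and the sample data simply mirrors the truncated example above; the real response contains additional fields that are not shown here.

```python
import json

def breakdown_shares(stats: dict, field: str) -> dict:
    """Return each bucket's share of stats[field] as a percentage (one decimal)."""
    # Older responses may not include the new fields, so default to empty.
    counts = stats.get(field, {})
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * count / total, 1) for name, count in counts.items()}

# Sample data copied from the example response (elided fields omitted).
sample = json.loads("""
{
  "total_files": 123,
  "total_chunks": 4567,
  "file_type_breakdown": {"pdf": 50, "docx": 30, "txt": 43},
  "index_breakdown": {"main-index": 100, "archive-index": 23}
}
""")

index_shares = breakdown_shares(sample, "index_breakdown")
print(index_shares)  # {'main-index': 81.3, 'archive-index': 18.7}
```

Because the new fields are additive, reading them with a default keeps the same code working against responses that predate the change.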
Index & Table Storage Details
- Indexes:
  - Each index defines the searchable schema (fields, types, vectorization, etc.).
  - You can have multiple indexes (e.g., for different document types or business units).
  - Indexes are managed via the Search Admin UI/API, supporting creation, update, and listing.
- Table Storage:
  - Used as a fallback for ingestion timestamps and file-level metadata.
  - If the index lacks a timestamp, the system queries Table Storage for the latest ingestion time per file.
  - This ensures accurate reporting even if the index is missing some metadata.
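The fallback described above can be sketched roughly as follows. This is not the service's actual implementation: the field name ingested_at is an assumption, the use of source_file as the lookup key is borrowed from the chunk endpoint path, and an in-memory mapping stands in for the Table Storage lookup.

```python
from datetime import datetime, timezone
from typing import List, Mapping, Optional

def resolve_ingested_at(
    index_doc: Mapping[str, str],
    table_times: Mapping[str, List[datetime]],
) -> Optional[datetime]:
    """Prefer the timestamp stored in the index; otherwise use the latest
    ingestion time recorded for the file in the (stand-in) table lookup."""
    raw = index_doc.get("ingested_at")  # hypothetical index field name
    if raw:
        return datetime.fromisoformat(raw)
    # Fallback: latest recorded ingestion time for this file, if any.
    candidates = table_times.get(index_doc["source_file"], [])
    return max(candidates) if candidates else None

# Hypothetical stand-in for rows read from Table Storage.
table_times = {
    "report.pdf": [
        datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
        datetime(2024, 6, 3, 9, 30, tzinfo=timezone.utc),
    ]
}

from_index = resolve_ingested_at(
    {"source_file": "notes.txt", "ingested_at": "2024-04-02T08:15:00+00:00"},
    table_times,
)
from_table = resolve_ingested_at({"source_file": "report.pdf"}, table_times)
unknown = resolve_ingested_at({"source_file": "missing.docx"}, table_times)
```

Returning None when neither source has a timestamp leaves it to the caller to decide how to report a file with unknown ingestion time.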
Creating New Indexes
You can create and manage Azure AI Search indexes from the UI or programmatically.
- UI: Open /api/v2/search-admin
  - List existing indexes
  - Create an index from YAML (paste or upload)
  - Create/Update/Upsert from raw JSON
- API:
  - List live indexes: GET /search/indexes/live
  - Create from YAML: POST /search/indexes/from-yaml (multipart upload; the file field is named file)
  - Create/Update/Upsert from JSON: POST /search/indexes/from-json
    - Body: { "mode": "create|update|upsert", "index": { "name": "my-index", "fields": [ ... ] } }
  - Create one by alias: POST /search/indexes/create/{alias}
  - Create all configured: POST /search/indexes/create-all
Notes
- Authentication uses Managed Identity via DefaultAzureCredential when available; otherwise it falls back to the API key in config.app.ai_search.api_key.
- Aliases map to entries in config.app.ai_search.indexes and resolve to concrete index names.
- Minimal schema requires name and a non-empty fields[] list; richer features (vector config, analyzers) are supported via JSON/YAML.
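To make the body shape and the minimal-schema rule concrete, here is a small sketch that assembles a from-json request body and rejects inputs that lack a name or fields before anything is sent. The helper build_index_payload is hypothetical, and the two field definitions are illustrative only (they follow the usual Azure AI Search field shape, not a schema taken from this application).

```python
import json

VALID_MODES = {"create", "update", "upsert"}

def build_index_payload(mode: str, name: str, fields: list) -> dict:
    """Assemble the request body for POST /search/indexes/from-json.

    Applies the minimal checks described in the notes: a known mode,
    a name, and a non-empty fields list.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if not name:
        raise ValueError("index name is required")
    if not fields:
        raise ValueError("fields must be a non-empty list")
    return {"mode": mode, "index": {"name": name, "fields": list(fields)}}

# Illustrative field definitions; a real index will need its own schema.
payload = build_index_payload(
    "upsert",
    "my-index",
    [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
    ],
)
body = json.dumps(payload)  # JSON string to send as the request body
```

Validating locally is only a convenience; the service remains the authority on whether a schema is acceptable.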
Endpoints
- GET /knowledge: Knowledge Base UI (dashboard for files, stats, and filters)
- GET /knowledge/stats: Aggregate stats and file-level metadata (JSON)
- GET /knowledge/stats.csv: Download stats as CSV
- GET /knowledge/files: Paginated list of files in the knowledge base
- GET /knowledge/file/{source_file}/chunks: Retrieve all chunks for a specific file
- POST /knowledge/query: Search/query the knowledge base (see below)
Querying the Knowledge Base
POST /knowledge/query
- Search the knowledge base using keyword or semantic search.
- Supports filters and top-k result limiting.
Request Example
{
  "query": "What is RAG?",
  "filters": {"file_type": "pdf"},
  "top_k": 5
}
Response Example
{
  "results": [
    {"doc_id": "123", "snippet": "..."}
  ],
  "total": 1
}
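A client for this endpoint could look something like the sketch below, which uses only the Python standard library. The base URL is a placeholder, the api-key header name is an assumption (the documentation does not say how credentials are passed), and the helper names are hypothetical. The request is built but not sent, so the snippet runs without a live service.

```python
import json
import urllib.request
from typing import List, Optional

def build_query_request(
    base_url: str,
    query: str,
    filters: Optional[dict] = None,
    top_k: int = 5,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /knowledge/query request."""
    body = {"query": query, "top_k": top_k}
    if filters:
        body["filters"] = filters
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["api-key"] = api_key  # assumed header name, not confirmed by the docs
    return urllib.request.Request(
        base_url.rstrip("/") + "/knowledge/query",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )

def doc_ids(response_payload: dict) -> List[str]:
    """Collect doc_id values from a response shaped like the example."""
    return [hit["doc_id"] for hit in response_payload.get("results", [])]

req = build_query_request(
    "https://example.invalid/", "What is RAG?", {"file_type": "pdf"}, top_k=5
)

# To send it against a real deployment:
# with urllib.request.urlopen(req) as resp:
#     print(doc_ids(json.load(resp)))
```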
Practical Usage
- Business users:
  - See what documents are available, when they were ingested, and their types.
  - Use the UI to filter, search, and export file metadata.
- Developers:
  - Integrate with the API to query, filter, and retrieve document chunks.
  - Manage indexes and monitor ingestion via the admin UI or endpoints.
  - Use Table Storage as a fallback for ingestion metadata if needed.
Authentication & Errors
- All endpoints require a valid API key or Azure AD token.
- Common errors:
  - 400 Bad Request: Invalid input
  - 401 Unauthorized: Missing/invalid credentials
  - 500 Internal Server Error: Search or ingestion failed
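One way a client might act on these status codes is sketched below. The messages restate the list above; the retry flags are an assumption about sensible client behavior (retrying only the server-side failure) and are not something the API specifies.

```python
from typing import Tuple

# (hint, worth_retrying) per documented status code.
# The retry flags are a client-side assumption, not part of the API contract.
ERROR_HINTS = {
    400: ("Invalid input: check the request body", False),
    401: ("Missing/invalid credentials: check the API key or Azure AD token", False),
    500: ("Search or ingestion failed: the request may succeed later", True),
}

def classify_error(status: int) -> Tuple[str, bool]:
    """Return a human-readable hint and whether a retry seems reasonable."""
    return ERROR_HINTS.get(status, (f"Unexpected status {status}", False))

hint, retry = classify_error(500)
```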
Tips
- Use the Search Admin UI to manage indexes.
- Use /knowledge/stats to monitor ingestion and file coverage.
- If you see missing ingestion timestamps, check Table Storage configuration.