Upload Endpoints: Secure Document Ingestion, Processing, and Status Tracking
These endpoints provide a robust, auditable interface for ingesting business documents into your organization's knowledge base. They are designed to:
- Enable secure, compliant upload of files with rich metadata for search, compliance, and downstream automation.
- Support a wide range of business document types (PDF, DOCX, PPTX, XLSX, CSV, TXT, JSON) and custom metadata fields.
- Allow both asynchronous and synchronous processing, with real-time status tracking and error reporting for every upload.
- Integrate with Azure Blob Storage, Table Storage, AI Search, and Function Apps for scalable, automated document pipelines.
- Provide operational transparency and diagnostics for admins, with endpoints for pipeline health, configuration, and storage mapping.
Business Value:
- These endpoints let you upload and organize business documents so they can be found, searched, and used by your team or applications.
- Every upload is tracked, so you can see whether it was processed, whether there were any errors, and when it finished.
- You can add custom metadata to help with search, reporting, or sorting later. This is useful for contracts, policies, reports, or any files you want to keep organized and searchable.
How it works:
- The UI or API receives a file and metadata from the user (or automation).
- The file is stored in Azure Blob Storage, and a status record is created in Table Storage for tracking.
- The system triggers processing (via Azure Function or Event Grid), extracting content, enriching metadata, and indexing for search.
- The status of each upload is updated at every stage, and can be polled or listed for real-time feedback and troubleshooting.
All features described here are available via the v2 upload UI and are designed for both business and developer use cases.
Overview
The upload endpoints support:
- Multiple file types (PDF, DOCX, PPTX, XLSX, CSV, TXT, JSON)
- Rich metadata for search, compliance, and downstream automation
- Asynchronous and synchronous processing modes
- Real-time status tracking and error reporting
- Integration with Azure AI Search, Blob Storage, Table Storage, and Function Apps
- Event Grid-driven and HTTP-triggered processing for scalable, auditable pipelines
Endpoints
GET /upload
Serves the upload UI (HTML). Use this for interactive file uploads and status monitoring.
POST /upload
Upload a file for ingestion and processing.
- Content-Type: multipart/form-data
- Body fields:
  - file (file, required): The document to upload.
  - metadata (string, required): JSON string with document metadata (see below).
  - process_mode (string, optional): async (default), sync, or upload_only.
  - prevent_reprocess (bool, optional): If true and file exists, returns 409 instead of replacing (default: true).
  - replace_existing (bool, optional): If true, deletes existing indexed chunks and re-uploads (default: false).
  - target_index (string, optional): Azure AI Search index name (must exist).
Metadata Example
{
"title": "Q3 Financial Report",
"doc_type": "report",
"tags": ["finance", "quarterly"],
"confidentiality": "internal",
"business_unit": "Finance",
"effective_date": "2025-07-01",
"expiry_date": "2026-07-01"
}
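Because metadata travels as a JSON string inside the form, a client has to serialize it before uploading. The following is a minimal Python sketch that uses the field names from the example above. The helper name build_metadata and its checks (non-empty title, parseable ISO dates, expiry not before effective date) are illustrative assumptions; the server's actual validation rules are not specified here.

```python
import json
from datetime import date


def build_metadata(title, doc_type, tags=(), **extra):
    """Return the metadata form field as a JSON string.

    The checks below are client-side assumptions, not documented
    server behavior.
    """
    if not title.strip():
        raise ValueError("title must not be empty")
    meta = {"title": title, "doc_type": doc_type, "tags": list(tags), **extra}
    # The example uses YYYY-MM-DD dates; confirm that they parse.
    parsed = {}
    for key in ("effective_date", "expiry_date"):
        if key in meta:
            parsed[key] = date.fromisoformat(meta[key])
    if len(parsed) == 2 and parsed["expiry_date"] < parsed["effective_date"]:
        raise ValueError("expiry_date precedes effective_date")
    return json.dumps(meta)


metadata_field = build_metadata(
    "Q3 Financial Report",
    "report",
    tags=["finance", "quarterly"],
    confidentiality="internal",
    effective_date="2025-07-01",
    expiry_date="2026-07-01",
)
```

The resulting string can be passed directly as the metadata form field.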
Request Example (cURL)
curl -X POST \
-F "file=@Q3_Report.pdf" \
-F "metadata={\"title\":\"Q3 Financial Report\",\"doc_type\":\"report\"}" \
-F "process_mode=async" \
-F "target_index=finance-index" \
https://your-api/upload
Response Example
{
"upload_id": "uuid",
"status": "queued",
"message": "File processing queued for Azure Function"
}
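For automation without cURL, the same request can be assembled with the Python standard library. This is a sketch under stated assumptions: encode_multipart and build_upload_request are hypothetical helper names, https://your-api is the placeholder host from the cURL example, and the credential header is omitted because its name is not specified in this document. The request is built but not sent.

```python
import json
import uuid
import urllib.request


def encode_multipart(fields, filename, file_bytes, content_type="application/octet-stream"):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(base_url, filename, file_bytes, metadata,
                         process_mode="async", target_index=None):
    """Build (but do not send) the POST /upload request."""
    fields = {"metadata": json.dumps(metadata), "process_mode": process_mode}
    if target_index:
        fields["target_index"] = target_index
    body, content_type = encode_multipart(fields, filename, file_bytes)
    return urllib.request.Request(
        f"{base_url}/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


request = build_upload_request(
    "https://your-api",
    "Q3_Report.pdf",
    b"%PDF-1.7 placeholder bytes",
    {"title": "Q3 Financial Report", "doc_type": "report"},
    target_index="finance-index",
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request),
# after adding whatever credential header the deployment expects.
```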
GET /upload/status/{upload_id}
Get the current processing status of an uploaded file.
- Returns progress, stage, errors, and metadata.
Example Response
{
"upload_id": "...",
"status": "processing",
"progress": 60,
"processing_stage": "Extracting text",
"message": "Text extraction in progress",
"errors": [],
"metadata": { ... }
}
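A client typically polls this endpoint until the upload reaches a terminal state. The sketch below assumes that complete and error are the terminal status values, as listed elsewhere in this document. wait_for_upload is a hypothetical helper, and fetch_status is any callable that returns the parsed JSON body, so the HTTP layer and authentication stay outside the example.

```python
import time


def wait_for_upload(fetch_status, upload_id, interval=0.5, timeout=300.0, sleep=time.sleep):
    """Poll until the upload reaches a terminal status or the timeout elapses.

    fetch_status takes an upload_id and returns the parsed JSON body of
    GET /upload/status/{upload_id}.
    """
    terminal = {"complete", "error"}  # assumed terminal values
    deadline = time.monotonic() + timeout
    while True:
        record = fetch_status(upload_id)
        if record.get("status") in terminal:
            return record
        if time.monotonic() >= deadline:
            raise TimeoutError(f"upload {upload_id} still {record.get('status')!r}")
        sleep(interval)


# Usage with a stubbed fetcher standing in for the real HTTP call.
_responses = iter([
    {"status": "queued", "progress": 0},
    {"status": "processing", "progress": 60},
    {"status": "complete", "progress": 100},
])
result = wait_for_upload(lambda _id: next(_responses), "demo-id", sleep=lambda s: None)
```

Injecting the sleep function keeps the loop testable without real delays.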
GET /upload/list
List recent uploads and their processing status. Returns an array of recent upload records, each with upload_id, filename, status, progress, timestamps, and any error details. Useful for monitoring all recent ingestion activity and for admins to audit the pipeline.
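For monitoring scripts, the returned array can be reduced to a per-status count and a list of failed uploads. This is a rough sketch that assumes each record is a JSON object carrying the upload_id and status fields named above; summarize_uploads and the sample records are hypothetical.

```python
from collections import Counter


def summarize_uploads(records):
    """Count uploads per status and collect the IDs of failed ones."""
    counts = Counter(r.get("status", "unknown") for r in records)
    failed = [r["upload_id"] for r in records if r.get("status") == "error"]
    return dict(counts), failed


# Sample data shaped like the records described above (values are made up).
sample = [
    {"upload_id": "a1", "filename": "contract.pdf", "status": "complete", "progress": 100},
    {"upload_id": "b2", "filename": "policy.docx", "status": "error", "progress": 40},
    {"upload_id": "c3", "filename": "report.xlsx", "status": "processing", "progress": 60},
]
counts, failed = summarize_uploads(sample)
```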
GET /upload/config
Get current Azure Function configuration and status. Returns details about the configured Function App endpoint, trigger mode (HTTP/Event Grid), timeouts, and debug flags. Used by the UI to display diagnostics and by admins to verify deployment/configuration.
POST /upload/config/toggle
Enable or disable HTTP-based function triggering at runtime (in-memory, admin only). Accepts a JSON body to set the trigger state. Used for operational control—e.g., to temporarily switch to Event Grid-only processing or to test fallback modes without redeploying.
GET /upload/mappings
Get real file type mappings, including the container, directory, example blob URLs, and whether the target container exists. Returns a mapping for each supported file extension. Used by the UI to show admins and users exactly where files will be stored and to surface misconfigurations or missing containers.
GET /upload/event-grid/activity
List recent Event Grid-driven processing activity. Returns a list of recent Event Grid events related to file uploads and processing, including event type, timestamp, upload ID, and status. Used for pipeline health monitoring, troubleshooting, and compliance/audit reporting.
GET /upload/function-app/health
Proxy to Azure Function App /api/health?details=1 for browser-friendly access. Returns the health status of the Function App and its dependencies (e.g., storage, search, AI services). Used by the UI and admins to quickly check the operational health of the processing backend, with badges for each dependency.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | File to upload |
| metadata | string | Yes | JSON metadata for enrichment/search |
| process_mode | string | No | async (default), sync, or upload_only |
| prevent_reprocess | bool | No | If true, 409 on duplicate (default: true) |
| replace_existing | bool | No | If true, deletes existing indexed chunks and re-uploads (default: false) |
| target_index | string | No | Azure AI Search index name (must exist) |
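The parameters in the table can be checked on the client before a request is sent. The sketch below is illustrative only: build_form_fields is a hypothetical helper, and sending booleans as lowercase true/false strings is an assumption about what the server accepts.

```python
PROCESS_MODES = {"async", "sync", "upload_only"}


def build_form_fields(metadata_json, process_mode="async", prevent_reprocess=True,
                      replace_existing=False, target_index=None):
    """Validate the optional parameters and return them as string form fields."""
    if process_mode not in PROCESS_MODES:
        raise ValueError(f"process_mode must be one of {sorted(PROCESS_MODES)}")
    fields = {
        "metadata": metadata_json,
        "process_mode": process_mode,
        # Booleans are serialized as lowercase strings; the exact accepted
        # spellings are an assumption.
        "prevent_reprocess": str(prevent_reprocess).lower(),
        "replace_existing": str(replace_existing).lower(),
    }
    if target_index is not None:
        fields["target_index"] = target_index
    return fields


fields = build_form_fields('{"title": "Q3 Financial Report"}', process_mode="sync")
```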
Processing Flow
- Upload: File and metadata are sent to /upload. A unique upload_id (UUID) is generated for each upload and returned in the response.
- Storage: File is stored in Azure Blob Storage, in a container/directory determined by file extension and config. Debug mode can store locally.
- Status Tracking: Upload status is tracked in Azure Table Storage (or fallback store if unavailable). Each upload gets a status entity keyed by upload_id.
  - The status entity includes: upload_id, status (e.g. queued, processing, complete, error), progress (0-100), processing_stage (text), message, errors (array), and all metadata fields.
  - Status is updated at each pipeline stage (upload, blob, function trigger, processing, indexing, complete, error).
  - The UI polls /upload/status/{upload_id} every few hundred ms to update progress bars, stage, and error messages in real time.
- Processing: File is processed asynchronously (default) by an Azure Function App (HTTP trigger or Event Grid trigger, depending on config). The function endpoint is chosen by file type. For small files and sync mode, processing may occur inline.
  - The Function App performs extraction, enrichment, and indexing. It updates status in Table Storage as it progresses.
  - If Event Grid is used, events are emitted and tracked for pipeline health.
- Status Polling: The UI and API clients poll /upload/status/{upload_id} for up-to-date progress, errors, and completion. The status entity is designed for efficient polling and includes ETag support for caching.
- Completion: On success, the file is indexed and available for search. On error, the status entity includes error details and the UI displays them to the user. Event Grid activity and function health endpoints provide further diagnostics.
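The Storage step routes each file by its extension. A minimal sketch of that idea follows. The container and directory names, the EXTENSION_MAP table, and the blob_location helper are all hypothetical; real values come from the deployment's configuration and can be inspected through GET /upload/mappings.

```python
from pathlib import PurePosixPath

# Hypothetical mapping; real container and directory names come from
# the deployment's configuration.
EXTENSION_MAP = {
    ".pdf": ("documents", "pdf"),
    ".docx": ("documents", "word"),
    ".csv": ("data", "csv"),
}


def blob_location(filename, upload_id):
    """Return (container, blob_path) for a file, based on its extension."""
    name = PurePosixPath(filename).name
    ext = PurePosixPath(filename).suffix.lower()
    if ext not in EXTENSION_MAP:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    container, directory = EXTENSION_MAP[ext]
    # Including the upload_id in the path is an illustrative choice that
    # keeps blob names unique; the real layout may differ.
    return container, f"{directory}/{upload_id}/{name}"


container, path = blob_location("Q3_Report.PDF", "1234")
```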
Status Entity Schema (Table Storage)
Each upload is tracked by a status entity in Azure Table Storage (or fallback store). Example fields:
| Field | Type | Description |
|---|---|---|
| upload_id | string | Unique UUID for the upload (primary key) |
| status | string | queued, processing, complete, error, etc. |
| progress | int | 0-100, percent complete |
| processing_stage | string | Human-readable stage (e.g. "Extracting text") |
| message | string | Status or error message |
| errors | array | List of error objects (if any) |
| metadata | object | All user-supplied metadata fields |
| timestamps | object | Created, updated, completed times |
The backend uses atomic merge/update operations to ensure status is always current and consistent. The UI interprets these fields to show progress bars, stage info, and error badges.
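The merge behavior can be pictured with an in-memory stand-in for the status table. This sketch is not the backend's implementation: StatusStore is a hypothetical class, a lock substitutes for the storage service's atomicity, and appending to errors instead of overwriting is an assumption.

```python
import copy
import threading
from datetime import datetime, timezone


class StatusStore:
    """In-memory stand-in for the status table, keyed by upload_id."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def merge(self, upload_id, **changes):
        """Apply a partial update; fields not mentioned keep their old values."""
        now = datetime.now(timezone.utc).isoformat()
        with self._lock:  # one merge at a time, so updates never interleave
            row = self._rows.setdefault(upload_id, {
                "upload_id": upload_id, "status": "queued", "progress": 0,
                "processing_stage": "", "message": "", "errors": [],
                "metadata": {}, "timestamps": {"created": now},
            })
            # Assumption: errors accumulate across updates.
            row["errors"].extend(changes.pop("errors", []))
            row.update(changes)
            row["timestamps"]["updated"] = now
            if row["status"] in ("complete", "error"):
                row["timestamps"]["completed"] = now
            return copy.deepcopy(row)


store = StatusStore()
store.merge("demo-id", metadata={"title": "Q3 Financial Report"})
store.merge("demo-id", status="processing", progress=60, processing_stage="Extracting text")
final = store.merge("demo-id", status="complete", progress=100)
```

After the last call, final still carries the metadata and processing_stage set by earlier merges, which is the property a merge-style update is meant to preserve.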
Processing Pipeline: Function App & Event Grid
- Function App: The backend triggers an Azure Function (HTTP or Event Grid) based on file type. The function endpoint is determined by extension (e.g. /api/pdf-process for PDFs).
- Event Grid: If enabled, Event Grid events are emitted for uploads and processing stages. The /upload/event-grid/activity endpoint lists recent events for monitoring.
- Sync Mode: For small files and sync mode, processing may occur inline in the web app, with status updated directly.
- Debug Mode: If enabled, files are stored locally and uploads are not sent to Azure.
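The routing rules in this section can be summarized as a small decision function. This is a sketch under several assumptions: only /api/pdf-process is named in this document, so the second route, the size threshold for "small" files, the plan_processing helper, and the treatment of upload_only as "store without processing" are hypothetical.

```python
import os

# Only /api/pdf-process is named in this document; the other route and
# the size threshold are hypothetical placeholders.
FUNCTION_ROUTES = {".pdf": "/api/pdf-process", ".docx": "/api/docx-process"}
INLINE_MAX_BYTES = 1_000_000


def plan_processing(filename, size_bytes, process_mode="async", debug=False):
    """Decide where a file would be processed under the rules described above."""
    if debug:
        # Debug mode keeps files local and sends nothing to Azure.
        return {"where": "local", "endpoint": None}
    if process_mode == "upload_only":
        # Assumption: upload_only stores the file and skips processing.
        return {"where": "none", "endpoint": None}
    if process_mode == "sync" and size_bytes <= INLINE_MAX_BYTES:
        return {"where": "inline", "endpoint": None}
    ext = os.path.splitext(filename)[1].lower()
    endpoint = FUNCTION_ROUTES.get(ext)
    if endpoint is None:
        raise ValueError(f"no function route configured for {ext!r}")
    return {"where": "function", "endpoint": endpoint}


plan = plan_processing("Q3_Report.pdf", 5_000_000, process_mode="sync")
```

Here a large file requested in sync mode falls through to the Function App, matching the "may occur inline" wording for small files only.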
Upload ID Generation
Each upload receives a unique upload_id (UUID, generated server-side) that is used for all status tracking, polling, and diagnostics. This ID is returned in the initial upload response and is required for all status queries.
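For illustration, server-side generation and client-side validation of such an ID might look like the sketch below. The UUID version is not stated in this document, so the use of a random (version 4) UUID is an assumption, and both helper names are hypothetical.

```python
import uuid


def new_upload_id():
    """Generate a random (version 4) UUID string; the version is an assumption."""
    return str(uuid.uuid4())


def is_valid_upload_id(value):
    """Return True if value parses as a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


upload_id = new_upload_id()
```

Validating the ID before calling /upload/status/{upload_id} lets a client reject malformed input without a round trip.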
Error Handling & Status Codes
- All errors (validation, storage, processing, function, etc.) are captured in the status entity's errors array and surfaced in the UI.
- The UI displays error badges and messages, and allows retry or replace as appropriate.
- HTTP status codes are used for API responses (400, 401, 409, 500), but detailed error context is always available in the status entity.
UI Features (as implemented in /upload)
- Drag-and-drop and multi-file selection
- Per-file metadata entry (title, doc_type, tags, confidentiality, business unit, effective/expiry date)
- Processing mode selection (async, sync, upload only)
- Target index selection (auto-populated from live Azure AI Search indexes)
- Prevent reprocess and replace existing toggles
- Real-time upload progress and status polling
- Error and duplicate detection with user feedback
- Event Grid activity and function health diagnostics
- File type mapping and container existence diagnostics
Authentication
All endpoints require a valid API key or Azure AD token.
Error Codes
- 400 Bad Request: Invalid file, metadata, or parameters
- 401 Unauthorized: Missing/invalid credentials
- 409 Conflict: Duplicate file (when prevent_reprocess is true)
- 500 Internal Server Error: Upload or processing failed
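A client can translate these codes into explicit exceptions. The sketch below is illustrative: UploadError, ACTIONS, and check_response are hypothetical names, and the suggested reactions are general guidance, not documented behavior.

```python
class UploadError(Exception):
    """Raised when the API answers with a non-2xx status code."""

    def __init__(self, status_code, action):
        super().__init__(f"HTTP {status_code}: {action}")
        self.status_code = status_code
        self.action = action


# Suggested client reactions; these are illustrative assumptions.
ACTIONS = {
    400: "fix the file, metadata, or parameters before resubmitting",
    401: "supply or refresh credentials",
    409: "duplicate file; review prevent_reprocess and replace_existing",
    500: "retry later and inspect the status entity for details",
}


def check_response(status_code):
    """Raise UploadError for error codes; return None for 2xx responses."""
    if 200 <= status_code < 300:
        return None
    raise UploadError(status_code, ACTIONS.get(status_code, "unexpected status"))
```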
Usage Tips
- Use the upload UI for a guided, user-friendly experience with status and diagnostics.
- For automation, use the API directly with multipart/form-data requests.
- Always check status with /upload/status/{upload_id} after upload.
- Use /upload/list and /upload/event-grid/activity for monitoring pipeline health.