RAG Service

Unified Test UI — End-to-End Example Usage and API Mapping

This guide walks through the Unified Test UI (the page served by unified_rag_test.html) and shows how each control maps to API calls. Use this as a blueprint to replicate the same functionality from your own backend or tooling.

The UI focuses on:

  • Model selection and reasoning effort overrides per request
  • Flexible RAG and Chat flows (with optional knowledge base retrieval)
  • Multimodal variant (images + text)
  • Prompt templating and variable injection
  • Token usage and reasoning metadata in responses
  • Feedback collection with response_id correlation

Where the UI is mounted

  • HTML template: rag_api_core/templates/v2/unified_rag_test.html
  • Route that serves it: create_unified_rag_test_endpoint() in rag_api_core/endpoints/v2/management.py returns the page.

Key backend endpoints used by the UI

  • Model catalog (for the model picker)

    • GET /api/v2/models?validate=true&filter_unhealthy=true[&deep=true]
    • Returns default alias, alias list, model details, and optional validation results.
  • Prompts (for viewing defaults and templates)

    • GET /api/v2/prompts?json=true
    • GET /api/v2/prompts?json=true&full=true (on-demand full content for previews)
  • Indexes (for retrieval controls)

    • GET /api/v2/indexes
  • Flexible APIs (core actions)

    • POST /api/v2/flexible-rag (single-turn, with optional KB retrieval)
    • POST /api/v2/flexible-chat (multi-turn chat)
    • POST /api/v2/flexible-rag-mm (multimodal RAG: images + text)
    • POST /api/v2/flexible-chat-mm (multimodal chat)
  • Feedback

    • POST /api/v2/feedback

The UI uses a single Flexible form and switches the target endpoint at submit time based on:

  • Chat toggle (OFF → RAG; ON → Chat)
  • Multimodal section (OFF → text only; ON → image + text)

There is no separate "Multimodal" tab anymore. Multimodal is an optional section within the Flexible form where you can drag/drop images or paste URLs; when enabled, the UI calls the -mm endpoint variant automatically.
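The submit-time routing can be sketched as a small helper (a sketch of the decision logic above, not the UI's actual code):

```python
def pick_endpoint(chat_mode: bool, multimodal: bool) -> str:
    """Replicate the UI's submit-time routing: the Chat toggle selects RAG
    vs Chat, and the Multimodal section switches to the -mm variant."""
    base = "flexible-chat" if chat_mode else "flexible-rag"
    suffix = "-mm" if multimodal else ""
    return f"/api/v2/{base}{suffix}"
```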

Instant load bootstrap (/unified-test?json=true)

Business impact

The Unified Test page now renders immediately with lightweight skeleton states and then hydrates using a JSON bootstrap. This removes the long blank screen while defaults load and keeps the UI responsive even when prompt storage is slow.

Developer details

  • Endpoint: GET /api/v2/unified-test?json=true
  • Payload changes: Response now includes active_model, llm_params, configured_defaults.system_prompts, and the hydrated system prompt text (current_system_prompt).
  • Usage: Fetch the JSON payload after the HTML shell loads to pre-fill form controls, mirroring the first-party UI behaviour.
  • Migration: No breaking change—existing HTML flow still works. Client apps can optionally call the JSON endpoint to boot quickly or re-use defaults in custom dashboards.

Examples

curl -X GET "https://yourhost/api/v2/unified-test?json=true" \
    -H "Authorization: Bearer $TOKEN"

import requests

resp = requests.get(
    "https://yourhost/api/v2/unified-test",
    params={"json": "true"},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
payload = resp.json()
print(payload["active_model"], payload["configured_defaults"]["system_prompts"], payload["llm_params"])

The JSON structure is cached server-side, so repeated loads avoid re-fetching prompts or blob client handles. Clients should still handle occasional cache misses (cache_status: "miss") and fall back to the full payload.
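A minimal hydration helper might look like this (field names taken from the payload description above; the stale-flag policy is an assumption):

```python
def hydrate_defaults(payload: dict) -> dict:
    """Flatten the /unified-test?json=true bootstrap payload into form
    defaults. Marks the result stale on a cache miss so the caller can
    decide to re-fetch."""
    return {
        "active_model": payload.get("active_model"),
        "system_prompt": payload.get("current_system_prompt", ""),
        "system_prompts": payload.get("configured_defaults", {}).get("system_prompts", []),
        "llm_params": payload.get("llm_params", {}),
        # cache_status: "miss" signals the server had to rebuild the payload;
        # treat the result as potentially partial.
        "stale": payload.get("cache_status") == "miss",
    }
```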

Request payload shapes (what the UI sends)

All Flexible requests share the same shape, with some optional fields.

  • Flexible RAG (text-only):
{
    "question": "What can you tell me about yourself?",
    "skip_knowledge_base": false,
    "fetch_args": {
        "AzureSearchFetcher": {
            "query": "What can you tell me about yourself?",
            "top_k": 5,
            "vector_search": true
            // ... other fetcher options may be included
        }
    },
    "history": [ { "role": "user", "content": "..." } ],
    "template_variables": { "domain": "general knowledge", "response_style": "formal" },
    "metadata": { "user_id": "demo_user" },
    "override_config": {
        "llm": "alias-selected-in-ui",
        "reasoning_effort": "auto",
        "reasoning": { "effort": "auto" }
    }
}
  • Flexible Chat (text-only) uses the same payload fields; history is read/written as you turn on Chat Mode.

  • Flexible RAG (multimodal) adds images under metadata.images:

{
    "question": "Help me understand this image",
    "fetch_args": { /* optional KB fetch args */ },
    "history": [],
    "template_variables": {},
    "metadata": {
        "images": [
            { "url": "https://.../img1.jpg", "detail": "auto" },
            { "data_url": "data:image/png;base64,iVBORw0K...", "detail": "high" }
        ]
    },
    "override_config": { "llm": "alias-selected-in-ui" }
}

The UI also allows image drag/drop to create data_url entries.
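Outside the browser you can build the same data_url entries yourself. A sketch using standard data-URL encoding (the detail field is passed through as in the example above):

```python
import base64

def to_image_entry(image_bytes: bytes, mime: str = "image/png", detail: str = "auto") -> dict:
    """Build one metadata.images entry from raw image bytes, mirroring what
    the UI produces for drag/dropped files."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"data_url": f"data:{mime};base64,{encoded}", "detail": detail}
```

URL-based images skip this step: pass { "url": "https://...", "detail": "auto" } directly.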

Notes:

  • Model override is set via override_config.llm.
  • The reasoning effort selector writes both override_config.reasoning_effort and override_config.reasoning.effort for compatibility. The router sanitizes unsupported values and preserves what was requested vs what was sent in response_metadata.
  • When Knowledge Base is disabled, the UI sends skip_knowledge_base: true and omits fetch_args.

Schemas to reference:

  • Requests: rag_api_core/schemas/v2/requests.py (FlexibleRagRequest, FlexibleChatRequest)

Parameter reflection (optional)

Some providers and router paths return a reflection of the effective parameters used to generate the response under response_metadata.parameters.

  • Business impact: Helps confirm what values were actually applied when defaults, overrides, or provider sanitization are involved.
  • Developer details: When present, expect fields like temperature, top_p, max_tokens, frequency_penalty, presence_penalty, repetition_penalty, and stop. Treat this as optional; not all providers populate it.

Examples

curl -X POST "https://yourhost/api/v2/flexible-rag" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
        "question": "Param reflection?",
        "override_config": {
            "llm": "default",
            "n": 2,
            "params": {"temperature": 0.2, "top_p": 0.9}
        },
        "fetch_args": {"AzureSearchFetcher": {"top_k": 0}}
    }'

import requests
payload = {
    "question": "Param reflection?",
    "override_config": {
        "llm": "default",
        "n": 2,
        "params": {"temperature": 0.2, "top_p": 0.9}
    },
    "fetch_args": {"AzureSearchFetcher": {"top_k": 0}}
}
r = requests.post("https://yourhost/api/v2/flexible-rag", json=payload, headers={"Authorization": f"Bearer {TOKEN}"})
data = r.json()
print(data.get("answers"), data.get("response_metadata", {}).get("parameters"))

Response shape (what the UI expects)

Flexible responses include rich metadata and IDs used by the feedback flow.

Common fields:

  • question — echoed question
  • answer — primary answer; if multiple completions were generated, this equals the first item of answers[]
  • answers[] — optional list of alternative completions when n > 1
  • token_usage — input/prompt and output/completion breakdown and total
  • reasoning_effort — what the client asked for
  • reasoning_tokens — reasoning token usage (if any, v2.1.1+)
  • reasoning_effective — object with requested, sent_effort, provider, and sanitized flags
  • model_used — final provider/model/deployment info (when available)
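A helper to normalize single- vs multi-completion responses (field behaviour as described above):

```python
def completions(resp: dict) -> list:
    """All completions from a Flexible response: answers[] when n > 1,
    otherwise the single answer wrapped in a list."""
    if resp.get("answers"):
        return resp["answers"]
    return [resp["answer"]] if resp.get("answer") else []
```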

Schemas to reference:

  • Responses: rag_api_core/schemas/v2/responses.py

Response examples by endpoint

The UI relies on these endpoints. Below are representative response payloads you can expect and emulate.

GET /api/v2/models?validate=true&filter_unhealthy=true[&deep=true]

Example:

{
    "default_alias": "default",
    "aliases": ["default", "oai_gpt4o", "or_claude", "or_gemini", "or_grok", "or_llama", "or_gpt4o", "or_claude_opus", "or_gemini_pro", "or_gpt5"],
    "models": [
        {
            "alias": "default",
            "provider": "azure",
            "deployment": "ChatModelLM",
            "api_base_url": "https://aichat-test-openai-msba.cognitiveservices.azure.com/",
            "api_version": "2024-12-01-preview",
            "auth": "managed-identity",
            "supports_multimodal": true
        },
        { "alias": "oai_gpt4o", "provider": "openai", "model": "gpt-4o-mini", "supports_multimodal": true },
        { "alias": "or_claude", "provider": "openrouter", "model": "anthropic/claude-3.5-sonnet", "supports_multimodal": true },
        { "alias": "or_gemini", "provider": "openrouter", "model": "google/gemini-2.5-flash", "supports_multimodal": true },
        { "alias": "or_grok", "provider": "openrouter", "model": "x-ai/grok-4", "supports_multimodal": true },
        { "alias": "or_llama", "provider": "openrouter", "model": "meta-llama/llama-3.1-8b-instruct", "supports_multimodal": false },
        { "alias": "or_gpt4o", "provider": "openrouter", "model": "openai/gpt-4o", "supports_multimodal": true },
        { "alias": "or_claude_opus", "provider": "openrouter", "model": "anthropic/claude-3-opus", "supports_multimodal": true },
        { "alias": "or_gemini_pro", "provider": "openrouter", "model": "google/gemini-1.5-pro", "supports_multimodal": true },
        { "alias": "or_gpt5", "provider": "openrouter", "model": "openai/gpt-5", "supports_multimodal": true }
    ],
    "validation": [
        { "alias": "default", "ok": true, "reason": null }
    ]
}

Notes:

  • When validate=false, the "validation" array is omitted.
  • Fields under "models" vary by provider (deployment vs model name, base URLs, etc.).

GET /api/v2/prompts?json=true

Example (listing prompts and effective defaults):

{
    "prompts": {
        "system_prompts": ["assistant_default.j2"],
        "response_templates": ["plain_answer.j2", "markdown_answer.j2"],
        "experiments": []
    },
    "source": "filesystem",
    "storage": { "folder": "resources/prompts" },
    "current_system_prompt": "You are a helpful AI assistant...",
    "selected_prompt_content": "",
    "selected_prompt_info": null,
    "configured_defaults": {
        "system_prompts": ["assistant_default.j2"],
        "response_templates": ["plain_answer.j2"]
    },
    "effective": {
        "system_prompt": "You are a helpful AI assistant...",
        "response_template": "Based on the following information, please answer the question..."
    },
    "versions": {
        "system": { "assistant_default.j2": ["2025-09-12T18:02:00Z"] },
        "response": {}
    }
}

Selected content preview (when a specific prompt is requested):

GET /api/v2/prompts?json=true&prompt=system_prompts/assistant_default.j2
{
    "selected_prompt_info": { "path": "system_prompts/assistant_default.j2", "source": "filesystem" },
    "selected_prompt_content": "You are a helpful AI assistant...",
    "prompts": { "system_prompts": ["assistant_default.j2"], "response_templates": ["plain_answer.j2"] },
    "source": "filesystem",
    "storage": { "folder": "resources/prompts" },
    "configured_defaults": { "system_prompts": ["assistant_default.j2"], "response_templates": ["plain_answer.j2"] },
    "effective": { "system_prompt": "...", "response_template": "..." },
    "versions": { "system": {"assistant_default.j2": ["2025-09-12T18:02:00Z"]}, "response": {} }
}

Tip: The UI sometimes uses &full=true for previews; the response structure stays the same.

GET /api/v2/indexes

Example:

{
    "indexes": ["unified_text_index", "kb_faq_index"],
    "default": "unified_text_index",
    "map": {
        "unified_text_index": { "name": "unified_text_index-prod" },
        "kb_faq_index": { "name": "kb_faq_index" }
    }
}

POST /api/v2/flexible-rag

Example success:

{
    "response_id": "7a7d3f3c-f2e0-4a1d-9c9f-7a4a1b0a3f9d",
    "answer": "The support policy for X includes LTS releases with quarterly updates...",
    "answers": [
        "The support policy for X includes LTS releases with quarterly updates...",
        "X is supported with quarterly updates and a long-term support cadence..."
    ],
    "metadata": [
        { "index": "unified_text_index", "id": "doc-123", "score": 14.2, "text": "...excerpt..." },
        { "index": "unified_text_index", "id": "doc-987", "score": 12.8, "text": "...excerpt..." }
    ],
    "history": [],
    "ab_testing": null,
    "errors": null,
    "data": null,
    "template_info": {
        "system_prompt_source": "file:system_prompts/assistant_default.j2",
        "response_template_source": "file:response_templates/plain_answer.j2",
        "template_variables_used": ["question", "user_id", "metadata"]
    },
    "processing_time": 1.72,
    "tokens_used": 532,
    "response_metadata": {
        "config_overrides_applied": true,
        "custom_templates_used": false,
        "request_metadata": { "user_id": "demo_user" },
        "question": "What is the support policy for X?",
        "system_prompt": "You are a helpful AI assistant...",
        "errors_present": false,
        "queried_indexes": ["unified_text_index-prod"],
        "timings": {
            "prepare_vars": 0.004,
            "load_system_prompt": 0.012,
            "load_response_template": 0.003,
            "build_orchestrator": 0.001,
            "orchestrator_call": 1.56,
            "build_response": 0.002,
            "orchestrator": {
                "fetch_total": 0.16,
                "prompt_build": 0.01,
                "llm_generate": 1.38,
                "metadata_extract": 0.01,
                "history_build": 0.00,
                "total": 1.56,
                "fetchers": { "AzureSearchFetcher": { "search": 0.16 } }
            }
        },
        "token_usage": { "prompt": 412, "completion": 120, "total": 532 },
        "reasoning_effort": "auto",
        "reasoning_effective": {
            "requested": "auto",
            "sent_effort": "medium",
            "provider": "openrouter",
            "sanitized": true
        },
        "model_used": { "alias": "default", "provider": "azure", "deployment": "ChatModelLM", "model": "ChatModelLM" }
    }
}

POST /api/v2/flexible-chat

Example success:

{
    "response_id": "f0a9c8d2-...",
    "answer": "Sure, here's a summary...",
    "metadata": [
        { "index": "unified_text_index", "id": "doc-222", "score": 8.7, "text": "..." }
    ],
    "history": [
        { "role": "user", "content": "Hello", "sources": [] },
        { "role": "assistant", "content": "Hi! How can I help?", "sources": [] },
        { "role": "user", "content": "Summarize the policy.", "sources": [] },
        { "role": "assistant", "content": "Sure, here's a summary...", "sources": [ { "index": "unified_text_index", "id": "doc-222" } ] }
    ],
    "ab_testing": null,
    "errors": null,
    "data": null,
    "template_info": { "system_prompt_source": "default", "response_template_source": "default", "template_variables_used": ["question", "user_id", "metadata", "history"] },
    "processing_time": 1.43,
    "tokens_used": 289,
    "response_metadata": {
        "config_overrides_applied": true,
        "custom_templates_used": false,
        "request_metadata": {},
        "conversation_length": 2,
        "is_multi_turn": true,
        "total_sources_retrieved": 1,
        "conversation_sources_summary": { "total_turns": 4, "assistant_turns_with_sources": 1 },
        "question": "Summarize the policy.",
        "system_prompt": "You are a helpful AI assistant...",
        "errors_present": false,
        "token_usage": { "prompt": 230, "completion": 59, "total": 289 },
        "reasoning_effort": "auto",
        "reasoning_effective": { "requested": "auto", "sent_effort": "low", "provider": "azure", "sanitized": true },
        "model_used": { "alias": "default", "provider": "azure", "deployment": "ChatModelLM", "model": "ChatModelLM" }
    }
}

POST /api/v2/flexible-rag-mm

Example success (triggered when the Flexible form’s Multimodal section is enabled and Chat mode is OFF):

{
    "response_id": "3e2b7b50-...",
    "answer": "From the image, it appears to be a circuit board with ...",
    "metadata": [],
    "history": [
        { "role": "user", "content": "Help me understand this image", "sources": [] },
        { "role": "assistant", "content": "From the image, it appears ...", "sources": [] }
    ],
    "ab_testing": null,
    "errors": null,
    "template_info": {
        "system_prompt_source": "inline",
        "response_template_source": "default",
        "template_variables_used": ["question", "user_id", "metadata"]
    },
    "processing_time": 1.95,
    "tokens_used": 245,
    "response_metadata": {
        "custom_templates_used": true,
        "request_metadata": { "images": [ { "detail": "high" } ] },
        "conversation_length": 0,
        "is_multi_turn": false,
        "total_sources_retrieved": 0,
        "conversation_sources_summary": { "total_turns": 2, "assistant_turns_with_sources": 0 },
        "question": "Help me understand this image",
        "errors_present": false,
        "timings": { "build_orchestrator": 0.02, "orchestrator_call": 1.89, "build_response": 0.01 },
        "retrieval": { "attempted": false, "skipped_reason": "kb_disabled", "fetch_args_present": false },
        "token_usage": { "prompt": 180, "completion": 65, "total": 245 },
        "reasoning_effort": "auto",
        "reasoning_effective": { "requested": "auto", "sent_effort": "medium", "provider": "openrouter", "sanitized": true },
        "model_used": { "alias": "or_claude", "provider": "openrouter", "deployment": null, "model": "anthropic/claude-3.5-sonnet" }
    }
}

POST /api/v2/flexible-chat-mm

Example success (triggered when both the Flexible form’s Multimodal section and Chat mode are ON; shape is identical to flexible-rag-mm):

{
    "response_id": "8c9a1ed0-...",
    "answer": "The chart indicates a steady increase over time...",
    "metadata": [],
    "history": [
        { "role": "user", "content": "Explain this chart", "sources": [] },
        { "role": "assistant", "content": "The chart indicates a steady increase...", "sources": [] }
    ],
    "ab_testing": null,
    "errors": null,
    "template_info": { "system_prompt_source": "default", "response_template_source": "default", "template_variables_used": ["question", "user_id", "metadata"] },
    "processing_time": 1.61,
    "tokens_used": 198,
    "response_metadata": {
        "custom_templates_used": false,
        "request_metadata": { "images": [ { "detail": "auto" } ] },
        "conversation_length": 0,
        "is_multi_turn": false,
        "total_sources_retrieved": 0,
        "conversation_sources_summary": { "total_turns": 2, "assistant_turns_with_sources": 0 },
        "question": "Explain this chart",
        "errors_present": false,
        "timings": { "build_orchestrator": 0.01, "orchestrator_call": 1.57, "build_response": 0.01 },
        "retrieval": { "attempted": false, "skipped_reason": "kb_disabled", "fetch_args_present": false },
        "token_usage": { "prompt": 150, "completion": 48, "total": 198 },
        "reasoning_effort": "auto",
        "reasoning_effective": { "requested": "auto", "sent_effort": "low", "provider": "azure", "sanitized": true },
        "model_used": { "alias": "default", "provider": "azure", "deployment": "ChatModelLM", "model": "ChatModelLM" }
    }
}

POST /api/v2/feedback

Example success:

{ "status": "success", "message": "Feedback recorded" }

Example duplicate click:

{ "status": "duplicate", "message": "Duplicate click ignored" }
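Both outcomes are terminal from the client's perspective; a minimal handler (status values from the examples above):

```python
def feedback_recorded(body: dict) -> bool:
    """True when new feedback was recorded; False when the server ignored a
    duplicate click. Anything else is unexpected and raised."""
    status = body.get("status")
    if status == "success":
        return True
    if status == "duplicate":
        return False
    raise ValueError(f"unexpected feedback status: {status!r}")
```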

Model picker and validation flow

  • UI calls: GET /api/v2/models?validate=true&filter_unhealthy=true (optionally &deep=true)
  • Response:
    • default_alias: the server’s configured default alias
    • aliases: the currently usable aliases
    • models: array with { alias, provider|type, deployment|model, supports_multimodal? }
    • validation: optional entries with { alias, ok, reason }
  • The UI hides failing models by default but can reveal them. Selecting a model writes override_config.llm.
  • The “Reasoning Effort” control is disabled for Azure-provider models (annotated in the UI); non-Azure providers may support it. The router will sanitize unsupported values and expose the actual behavior via response_metadata.reasoning_effective.
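Filtering the catalog the way the UI does can be sketched as (payload fields from the /models example above):

```python
def usable_aliases(models_payload: dict, require_multimodal: bool = False) -> list:
    """Keep aliases that passed validation (when a validation array is
    present) and optionally require multimodal support."""
    failing = {
        v["alias"] for v in models_payload.get("validation", []) if not v.get("ok")
    }
    return [
        m["alias"]
        for m in models_payload.get("models", [])
        if m["alias"] not in failing
        and (not require_multimodal or m.get("supports_multimodal"))
    ]
```

Pick one of the returned aliases and set it as override_config.llm in subsequent requests.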

Provider-specific notes

  • Azure (AOAI):
    • Reasoning effort: not currently supported; the UI disables this control when the selected alias has provider: azure. If a client still sends an effort value, the router sanitizes it and reports the effective value under response_metadata.reasoning_effective.
    • Multimodal: supported when the alias entry sets supports_multimodal: true (e.g., deployment: gpt-4o-mini). Validate with GET /api/v2/models.
    • Auth: prefers Managed Identity when use_managed_identity: true; otherwise falls back to API key if configured.
  • OpenRouter:
    • Reasoning effort: some models support it; others ignore/return errors for invalid values. The router preserves requested vs sent_effort and sets sanitized: true if it had to adjust.
    • Multimodal: check supports_multimodal in GET /api/v2/models. Popular options include anthropic/claude-3.5-sonnet, openai/gpt-4o, and Google Gemini variants.
    • Auth: requires API key per alias (api_key). Ensure the runtime has this key.

Prompts and templates

  • UI calls: GET /api/v2/prompts?json=true to list prompt/template names and defaults. It may call &full=true to preview content.
  • The Flexible endpoints also accept:
    • system_prompt_template or system_prompt_file
    • response_template or response_template_file
    • template_variables — for Jinja2 rendering
  • In the response, template_info describes which prompt/template was applied (and source: default/inline/file/blob).
  • Business impact:
    • Prompt owners can now promote any stored system or response template to be the default directly from the repository table or selected prompt drawer—no more copying content into the default editor. The preview card preserves whitespace so long-form prompts are easier to audit before switching.
  • Developer details:
    • The UI submits POST /api/v2/prompts with action=make_default_prompt, prompt_type (system or response), and prompt_identifier (record id, filename, or blob path). On success the server updates the manifest’s current_id and caches the content so subsequent loads hydrate instantly.
    • Prompt previews render immediately on first selection; clients do not need to re-issue the request if the cache misses.
    • Example: promote a stored response template named assistant_reply.j2 to the default.
curl -X POST "https://yourhost/api/v2/prompts" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data "action=make_default_prompt&prompt_type=response&prompt_identifier=assistant_reply.j2"

import os
import requests

base_url = os.getenv("RAG_API_BASE", "https://yourhost/api/v2")
token = os.getenv("RAG_API_TOKEN")

payload = {
    "action": "make_default_prompt",
    "prompt_type": "system",
    "prompt_identifier": "system_prompts/new_baseline.j2",
}

headers = {"Authorization": f"Bearer {token}"} if token else {}
response = requests.post(f"{base_url}/prompts", data=payload, headers=headers)
response.raise_for_status()
print(response.status_code, response.headers.get("Location"))

Indexes (retrieval controls)

  • UI calls: GET /api/v2/indexes
  • Selected indexes and retrieval settings are passed in fetch_args.AzureSearchFetcher (e.g., query, top_k, vector_search, etc.).
  • When retrieval is disabled, the UI sets skip_knowledge_base: true.
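Building the retrieval block programmatically can be sketched as follows (field names from the request examples and the quick-reference mapping; the indexes key is optional):

```python
def build_fetch_args(query: str, indexes=None, top_k: int = 5, vector_search: bool = True) -> dict:
    """Assemble fetch_args.AzureSearchFetcher the way the UI's index picker
    does; omit the block entirely (and set skip_knowledge_base: true) to
    disable retrieval."""
    args = {"query": query, "top_k": top_k, "vector_search": vector_search}
    if indexes:
        args["indexes"] = indexes
    return {"AzureSearchFetcher": args}
```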

LLM Parameters side panel (auto-save)

  • The Unified Test page includes a side panel for LLM defaults (temperature, top_p, max_tokens, penalties, stop, etc.).
  • Edits are auto-saved to POST /api/v2/settings after ~800ms of inactivity; a manual “Save Parameters” button remains available.
  • If the page URL has subscription-key, it’s forwarded to /api/v2/settings automatically.
  • New defaults apply to subsequent requests without a page reload.

Feedback flow

  • The backend automatically saves a neutral telemetry record (rating=0) for every response returned by Flexible endpoints.
  • Users can submit explicit feedback via: POST /api/v2/feedback

Example payload the UI uses:

{
    "user_id": "demo_user",
    "session_id": "demo_session",
    "experiment_name": null,
    "variant_name": null,
    "response_id": "<the-response-id>",
    "rating": 1, // or -1
    "feedback_text": "",
    "response_time": 2.37,
    "task_completed": true,
    "response_payload": {
        "question": "...",
        "answer": "...",
        "metadata": [ /* retrieved docs */ ]
    }
}

Correlate with the response_id returned by the answer APIs.

Putting it together from your backend

Below are minimal examples showing how to call the same endpoints the UI invokes.

Python (requests)

import os
import json
import requests

BASE_URL = os.getenv("RAG_API_BASE", "http://localhost:8000/api/v2")

# Helper: override model + reasoning effort
OVERRIDES = {
        "llm": "default",            # set to an alias from /api/v2/models
        "reasoning_effort": "auto",
        "reasoning": {"effort": "auto"}
}

# Flexible RAG (text-only)
def flexible_rag(question: str, use_kb: bool = True):
        payload = {
                "question": question,
                "skip_knowledge_base": not use_kb,
                "fetch_args": {"AzureSearchFetcher": {"query": question, "top_k": 5, "vector_search": True}} if use_kb else {},
                "history": [],
                "template_variables": {"domain": "general knowledge", "response_style": "formal"},
                "metadata": {"user_id": "demo_user"},
                "override_config": OVERRIDES,
        }
        r = requests.post(f"{BASE_URL}/flexible-rag", json=payload)
        r.raise_for_status()
        return r.json()

# Flexible Chat (multi-turn)
def flexible_chat(question: str, history=None):
        history = history or []  # list of {role, content}
        payload = {
                "question": question,
                "history": history,
                "fetch_args": {},
                "template_variables": {},
                "metadata": {},
                "override_config": OVERRIDES,
        }
        r = requests.post(f"{BASE_URL}/flexible-chat", json=payload)
        r.raise_for_status()
        return r.json()

# Flexible RAG (multimodal)
def flexible_rag_mm(question: str, images=None):
        images = images or []  # [{"url":"https://..."}] or [{"data_url":"data:image/...","detail":"high"}]
        payload = {
                "question": question,
                "history": [],
                "fetch_args": {"AzureSearchFetcher": {"query": question, "top_k": 5, "vector_search": True}},
                "template_variables": {},
                "metadata": {"images": images},
                "override_config": OVERRIDES,
        }
        r = requests.post(f"{BASE_URL}/flexible-rag-mm", json=payload)
        r.raise_for_status()
        return r.json()

# Feedback
def send_feedback(response_id: str, answer_payload: dict, rating: int):
        body = {
                "user_id": "demo_user",
                "session_id": "demo_session",
                "experiment_name": (answer_payload.get("ab_testing") or {}).get("experiment_name"),
                "variant_name": (answer_payload.get("ab_testing") or {}).get("variant_name"),
                "response_id": response_id,
                "rating": rating,
                "feedback_text": "",
                "response_time": answer_payload.get("processing_time"),
                "task_completed": True,
                "response_payload": {
                        "question": answer_payload.get("response_metadata", {}).get("question"),
                        "answer": answer_payload.get("answer"),
                        "metadata": answer_payload.get("metadata", []),
                },
        }
        r = requests.post(f"{BASE_URL}/feedback", json=body)
        r.raise_for_status()
        return r.json()

if __name__ == "__main__":
        ans = flexible_rag("What is the support policy for X?")
        print(json.dumps(ans, indent=2))
        if ans.get("response_id"):
                fb = send_feedback(ans["response_id"], ans, 1)
                print("feedback:", fb)

Node.js (fetch)

// Requires Node 18+ (built-in fetch). For older versions, use node-fetch or axios.
const BASE_URL = process.env.RAG_API_BASE || 'http://localhost:8000/api/v2';

const OVERRIDES = {
    llm: 'default',
    reasoning_effort: 'auto',
    reasoning: { effort: 'auto' }
};

async function flexibleRag(question, useKB = true){
    const payload = {
        question,
        skip_knowledge_base: !useKB,
        fetch_args: useKB ? { AzureSearchFetcher: { query: question, top_k: 5, vector_search: true } } : {},
        history: [],
        template_variables: { domain: 'general knowledge', response_style: 'formal' },
        metadata: { user_id: 'demo_user' },
        override_config: OVERRIDES,
    };
    const r = await fetch(`${BASE_URL}/flexible-rag`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(payload) });
    if(!r.ok) throw new Error(`HTTP ${r.status}`);
    return r.json();
}

async function flexibleChat(question, history = []){
    const payload = { question, history, fetch_args: {}, template_variables: {}, metadata: {}, override_config: OVERRIDES };
    const r = await fetch(`${BASE_URL}/flexible-chat`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(payload) });
    if(!r.ok) throw new Error(`HTTP ${r.status}`);
    return r.json();
}

async function flexibleRagMM(question, images = []){
    const payload = { question, history: [], fetch_args: { AzureSearchFetcher: { query: question, top_k: 5, vector_search: true } }, template_variables: {}, metadata: { images }, override_config: OVERRIDES };
    const r = await fetch(`${BASE_URL}/flexible-rag-mm`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(payload) });
    if(!r.ok) throw new Error(`HTTP ${r.status}`);
    return r.json();
}

async function sendFeedback(responseId, answerPayload, rating){
    const body = {
        user_id: 'demo_user',
        session_id: 'demo_session',
        experiment_name: answerPayload?.ab_testing?.experiment_name || null,
        variant_name: answerPayload?.ab_testing?.variant_name || null,
        response_id: responseId,
        rating,
        feedback_text: '',
        response_time: answerPayload?.processing_time || null,
        task_completed: true,
        response_payload: {
            question: answerPayload?.response_metadata?.question || null,
            answer: answerPayload?.answer || null,
            metadata: answerPayload?.metadata || []
        }
    };
    const r = await fetch(`${BASE_URL}/feedback`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(body) });
    if(!r.ok) throw new Error(`HTTP ${r.status}`);
    return r.json();
}

(async () => {
    const ans = await flexibleRag('What is the support policy for X?');
    console.log(JSON.stringify(ans, null, 2));
    if(ans.response_id){
        const fb = await sendFeedback(ans.response_id, ans, 1);
        console.log('feedback:', fb);
    }
})();

UI-to-API mapping (quick reference)

  • Model selector → GET /api/v2/models → writes override_config.llm in request
  • Reasoning effort pill → writes override_config.reasoning_effort and override_config.reasoning.effort
  • Knowledge Base toggle → controls skip_knowledge_base and fetch_args.AzureSearchFetcher
  • Index picker → builds fetch_args.AzureSearchFetcher (indexes, top_k, vector_search, etc.)
  • Multimodal section + images → adds metadata.images and switches to *-mm endpoints
  • Prompts section → fetches /api/v2/prompts and optionally sets system_prompt_* / response_template* in request
  • Submit → POST to one of (automatically chosen):
    • /api/v2/flexible-rag
    • /api/v2/flexible-chat
    • /api/v2/flexible-rag-mm
    • /api/v2/flexible-chat-mm
  • Multi-completions (N):
    • Set override_config.n to request multiple completions; alternatively override_config.params.n is supported.
    • Precedence: override_config.n > override_config.params.n > server default (config.llms.defaults.params.n).
    • When answers[] is present, the UI renders each completion and sets answer === answers[0].
  • Feedback buttons → POST /api/v2/feedback with response_id
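The documented precedence for the completion count can be expressed as:

```python
def resolve_n(override_config: dict, server_default: int = 1) -> int:
    """Resolve the completion count using the documented precedence:
    override_config.n > override_config.params.n > server default."""
    if override_config.get("n") is not None:
        return override_config["n"]
    return override_config.get("params", {}).get("n", server_default)
```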

What to log/store downstream (best practices)

  • response_id — use as a stable key for storage, analytics, and feedback
  • processing_time — latency SLOs and regressions
  • response_metadata.token_usage — for cost tracking and capacity planning
  • response_metadata.model_used — which model/deployment actually produced the answer
  • response_metadata.reasoning_effective — requested vs sent, provider, and sanitize status
  • metadata (retrieval docs) — optional, but useful for auditability
  • errors — structured error list, if present
  • Chat-only: timings may be incomplete; multimodal aligns its timing blocks with RAG
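A minimal extractor for the fields above (all paths taken from the response examples in this guide):

```python
def log_record(resp: dict) -> dict:
    """Pull the recommended fields out of a Flexible response for storage
    and analytics."""
    md = resp.get("response_metadata", {})
    return {
        "response_id": resp.get("response_id"),
        "processing_time": resp.get("processing_time"),
        "token_usage": md.get("token_usage"),
        "model_used": md.get("model_used"),
        "reasoning_effective": md.get("reasoning_effective"),
        "errors": resp.get("errors"),
    }
```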

Error handling notes

  • All endpoints can return errors in the response body (an array of structured diagnostics). The UI merges these into the Answer card for visibility.
  • When KB retrieval fails, endpoints attempt to continue with an answer using whatever context is available (best-effort).
  • Reasoning ‘auto’ is accepted on input, but downstream providers may not support it; see response_metadata.reasoning_effective to understand what was actually sent.
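To surface what was actually sent, a small summary helper (field names from reasoning_effective in the examples above):

```python
def reasoning_summary(resp: dict) -> str:
    """One-line summary of requested vs effective reasoning effort."""
    eff = resp.get("response_metadata", {}).get("reasoning_effective") or {}
    if not eff:
        return "no reasoning metadata"
    note = " (sanitized)" if eff.get("sanitized") else ""
    return (
        f"requested={eff.get('requested')} "
        f"sent={eff.get('sent_effort')} "
        f"provider={eff.get('provider')}{note}"
    )
```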

Next steps

  • For more details, explore:
    • rag_api_core/endpoints/v2/flexible_rag.py
    • rag_api_core/endpoints/v2/flexible_multimodal.py
    • rag_api_core/templates/v2/unified_rag_test.html
    • rag_api_core/schemas/v2/requests.py, responses.py
    • Feedback plumbing under telemetry/feedback/