Unified Test UI — End-to-End Example Usage and API Mapping
This guide walks through the Unified Test UI (the page served by unified_rag_test.html) and shows how each control maps to API calls. Use this as a blueprint to replicate the same functionality from your own backend or tooling.
The UI focuses on:
- Model selection and reasoning effort overrides per request
- Flexible RAG and Chat flows (with optional knowledge base retrieval)
- Multimodal variant (images + text)
- Prompt templating and variable injection
- Token usage and reasoning metadata in responses
- Feedback collection with response_id correlation
Where the UI is mounted
- HTML template: rag_api_core/templates/v2/unified_rag_test.html
- Route that serves it: rag_api_core/endpoints/v2/management.py → create_unified_rag_test_endpoint() returns the page.
Key backend endpoints used by the UI
- Model catalog (for the model picker)
  - GET /api/v2/models?validate=true&filter_unhealthy=true[&deep=true]
  - Returns default alias, alias list, model details, and optional validation results.
- Prompts (for viewing defaults and templates)
  - GET /api/v2/prompts?json=true
  - GET /api/v2/prompts?json=true&full=true (on-demand full content for previews)
- Indexes (for retrieval controls)
  - GET /api/v2/indexes
- Flexible APIs (core actions)
  - POST /api/v2/flexible-rag (single-turn, with optional KB retrieval)
  - POST /api/v2/flexible-chat (multi-turn chat)
  - POST /api/v2/flexible-rag-mm (multimodal RAG: images + text)
  - POST /api/v2/flexible-chat-mm (multimodal chat)
- Feedback
  - POST /api/v2/feedback
The UI uses a single Flexible form and switches the target endpoint at submit time based on:
- Chat toggle (OFF → RAG; ON → Chat)
- Multimodal section (OFF → text only; ON → image + text)

There is no separate "Multimodal" tab anymore. Multimodal is an optional section within the Flexible form where you can drag/drop images or paste URLs; when enabled, the UI calls the -mm endpoint variant automatically, as sketched below.
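A minimal sketch of that selection logic, assuming two booleans that mirror the Chat toggle and the Multimodal section state:

def pick_endpoint(chat_mode: bool, multimodal: bool) -> str:
    # Chat toggle decides RAG vs Chat; Multimodal switches to the -mm variant.
    base = "flexible-chat" if chat_mode else "flexible-rag"
    return f"/api/v2/{base}{'-mm' if multimodal else ''}"

assert pick_endpoint(False, False) == "/api/v2/flexible-rag"
assert pick_endpoint(True, True) == "/api/v2/flexible-chat-mm"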
Instant load bootstrap (/unified-test?json=true)
Business impact
The Unified Test page now renders immediately with lightweight skeleton states and then hydrates using a JSON bootstrap. This removes the long blank screen while defaults load and keeps the UI responsive even when prompt storage is slow.
Developer details
- Endpoint: GET /api/v2/unified-test?json=true
- Payload changes: the response now includes active_model, llm_params, configured_defaults.system_prompts, and the hydrated system prompt text (current_system_prompt).
- Usage: fetch the JSON payload after the HTML shell loads to pre-fill form controls, mirroring the first-party UI behaviour.
- Migration: no breaking change; the existing HTML flow still works. Client apps can optionally call the JSON endpoint to boot quickly or reuse defaults in custom dashboards.
Examples
curl -X GET "https://yourhost/api/v2/unified-test?json=true" \
-H "Authorization: Bearer $TOKEN"
import os
import requests

TOKEN = os.getenv("RAG_API_TOKEN")
resp = requests.get(
    "https://yourhost/api/v2/unified-test",
    params={"json": "true"},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
payload = resp.json()
print(payload["active_model"], payload["configured_defaults"]["system_prompts"], payload["llm_params"])
The JSON structure is cached server-side, so repeated loads avoid re-fetching prompts or blob client handles. Clients should still handle occasional cache misses (cache_status: "miss") and fall back to the full payload.
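If you want to honor the cache_status hint programmatically, here is a hedged sketch (the single-retry strategy is an assumption, not prescribed behaviour):

import requests

def load_bootstrap(base_url: str, token: str) -> dict:
    # Retry once on a reported cache miss; the second response is the
    # freshly hydrated (uncached) payload either way.
    headers = {"Authorization": f"Bearer {token}"}
    payload = {}
    for _ in range(2):
        r = requests.get(f"{base_url}/unified-test",
                         params={"json": "true"}, headers=headers)
        r.raise_for_status()
        payload = r.json()
        if payload.get("cache_status") != "miss":
            break
    return payload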
Request payload shapes (what the UI sends)
All Flexible requests share the same shape, with some optional fields.
- Flexible RAG (text-only):
{
"question": "What can you tell me about yourself?",
"skip_knowledge_base": false,
"fetch_args": {
"AzureSearchFetcher": {
"query": "What can you tell me about yourself?",
"top_k": 5,
"vector_search": true
// ... other fetcher options may be included
}
},
"history": [ { "role": "user", "content": "..." } ],
"template_variables": { "domain": "general knowledge", "response_style": "formal" },
"metadata": { "user_id": "demo_user" },
"override_config": {
"llm": "alias-selected-in-ui",
"reasoning_effort": "auto",
"reasoning": { "effort": "auto" }
}
}
- Flexible Chat (text-only) uses the same payload fields; history is read/written as you turn on Chat Mode.
- Flexible RAG (multimodal) adds images under metadata.images:
{
"question": "Help me understand this image",
"fetch_args": { /* optional KB fetch args */ },
"history": [],
"template_variables": {},
"metadata": {
"images": [
{ "url": "https://.../img1.jpg", "detail": "auto" },
{ "data_url": "data:image/png;base64,iVBORw0K...", "detail": "high" }
]
},
"override_config": { "llm": "alias-selected-in-ui" }
}
The UI also allows image drag/drop to create data_url entries.
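To reproduce the drag/drop behaviour from your own code, a data_url entry can be built from a local file with standard base64 encoding (the filename below is hypothetical):

import base64
import mimetypes

def to_image_entry(path: str, detail: str = "auto") -> dict:
    # Produce a metadata.images entry like the ones the UI creates.
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {"data_url": f"data:{mime};base64,{encoded}", "detail": detail}

# images = [to_image_entry("circuit_board.png")]  # hypothetical file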
Notes:
- Model override is set via override_config.llm.
- Reasoning effort selector writes both override_config.reasoning_effort and override_config.reasoning.effort for compatibility. The router sanitizes unsupported values and preserves what was requested vs what was sent in response_metadata.
- When Knowledge Base is disabled, the UI sends skip_knowledge_base: true and omits fetch_args.
Schemas to reference:
- Requests: rag_api_core/schemas/v2/requests.py (FlexibleRagRequest, FlexibleChatRequest)
Parameter reflection (optional)
Some providers and router paths return a reflection of the effective parameters used to generate the response under response_metadata.parameters.
- Business impact: Helps confirm what values were actually applied when defaults, overrides, or provider sanitization are involved.
- Developer details: When present, expect fields like temperature, top_p, max_tokens, frequency_penalty, presence_penalty, repetition_penalty, and stop. Treat this as optional; not all providers populate it.
Examples
curl -X POST "https://yourhost/api/v2/flexible-rag" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"question": "Param reflection?",
"override_config": {
"llm": "default",
"n": 2,
"params": {"temperature": 0.2, "top_p": 0.9}
},
"fetch_args": {"AzureSearchFetcher": {"top_k": 0}}
}'
import os
import requests

TOKEN = os.getenv("RAG_API_TOKEN")
payload = {
    "question": "Param reflection?",
    "override_config": {
        "llm": "default",
        "n": 2,
        "params": {"temperature": 0.2, "top_p": 0.9}
    },
    "fetch_args": {"AzureSearchFetcher": {"top_k": 0}}
}
r = requests.post(
    "https://yourhost/api/v2/flexible-rag",
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
r.raise_for_status()
data = r.json()
print(data.get("answers"), data.get("response_metadata", {}).get("parameters"))
Response shape (what the UI expects)
Flexible responses include rich metadata and IDs used by the feedback flow.
Common fields:
- question — echoed question
- answer — primary answer; if multiple completions were generated, this equals the first item of answers[]
- answers[] — optional list of alternative completions when n > 1
- token_usage — input/prompt and output/completion breakdown, plus the total
- reasoning_effort — what the client asked for
- reasoning_tokens — reasoning token usage (if any, v2.1.1+)
- reasoning_effective — object with requested, sent_effort, provider, and sanitized flags
- model_used — final provider/model/deployment info (when available)
Schemas to reference:
- Responses: rag_api_core/schemas/v2/responses.py
Response examples by endpoint
The UI relies on these endpoints. Below are representative response payloads you can expect and emulate.
GET /api/v2/models?validate=true&filter_unhealthy=true[&deep=true]
Example:
{
"default_alias": "default",
"aliases": ["default", "oai_gpt4o", "or_claude", "or_gemini", "or_grok", "or_llama", "or_gpt4o", "or_claude_opus", "or_gemini_pro", "or_gpt5"],
"models": [
{
"alias": "default",
"provider": "azure",
"deployment": "ChatModelLM",
"api_base_url": "https://aichat-test-openai-msba.cognitiveservices.azure.com/",
"api_version": "2024-12-01-preview",
"auth": "managed-identity",
"supports_multimodal": true
},
{ "alias": "oai_gpt4o", "provider": "openai", "model": "gpt-4o-mini", "supports_multimodal": true },
{ "alias": "or_claude", "provider": "openrouter", "model": "anthropic/claude-3.5-sonnet", "supports_multimodal": true },
{ "alias": "or_gemini", "provider": "openrouter", "model": "google/gemini-2.5-flash", "supports_multimodal": true },
{ "alias": "or_grok", "provider": "openrouter", "model": "x-ai/grok-4", "supports_multimodal": true },
{ "alias": "or_llama", "provider": "openrouter", "model": "meta-llama/llama-3.1-8b-instruct", "supports_multimodal": false },
{ "alias": "or_gpt4o", "provider": "openrouter", "model": "openai/gpt-4o", "supports_multimodal": true },
{ "alias": "or_claude_opus", "provider": "openrouter", "model": "anthropic/claude-3-opus", "supports_multimodal": true },
{ "alias": "or_gemini_pro", "provider": "openrouter", "model": "google/gemini-1.5-pro", "supports_multimodal": true },
{ "alias": "or_gpt5", "provider": "openrouter", "model": "openai/gpt-5", "supports_multimodal": true }
],
"validation": [
{ "alias": "default", "ok": true, "reason": null }
]
}
Notes:
- When validate=false, the "validation" array is omitted.
- Fields under "models" vary by provider (deployment vs model name, base URLs, etc.).
GET /api/v2/prompts?json=true
Example (listing prompts and effective defaults):
{
"prompts": {
"system_prompts": ["assistant_default.j2"],
"response_templates": ["plain_answer.j2", "markdown_answer.j2"],
"experiments": []
},
"source": "filesystem",
"storage": { "folder": "resources/prompts" },
"current_system_prompt": "You are a helpful AI assistant...",
"selected_prompt_content": "",
"selected_prompt_info": null,
"configured_defaults": {
"system_prompts": ["assistant_default.j2"],
"response_templates": ["plain_answer.j2"]
},
"effective": {
"system_prompt": "You are a helpful AI assistant...",
"response_template": "Based on the following information, please answer the question..."
},
"versions": {
"system": { "assistant_default.j2": ["2025-09-12T18:02:00Z"] },
"response": {}
}
}
Selected content preview (when a specific prompt is requested):
GET /api/v2/prompts?json=true&prompt=system_prompts/assistant_default.j2
{
"selected_prompt_info": { "path": "system_prompts/assistant_default.j2", "source": "filesystem" },
"selected_prompt_content": "You are a helpful AI assistant...",
"prompts": { "system_prompts": ["assistant_default.j2"], "response_templates": ["plain_answer.j2"] },
"source": "filesystem",
"storage": { "folder": "resources/prompts" },
"configured_defaults": { "system_prompts": ["assistant_default.j2"], "response_templates": ["plain_answer.j2"] },
"effective": { "system_prompt": "...", "response_template": "..." },
"versions": { "system": {"assistant_default.j2": ["2025-09-12T18:02:00Z"]}, "response": {} }
}
Tip: The UI sometimes uses &full=true for previews; the response structure stays the same.
GET /api/v2/indexes
Example:
{
"indexes": ["unified_text_index", "kb_faq_index"],
"default": "unified_text_index",
"map": {
"unified_text_index": { "name": "unified_text_index-prod" },
"kb_faq_index": { "name": "kb_faq_index" }
}
}
POST /api/v2/flexible-rag
Example success:
{
"response_id": "7a7d3f3c-f2e0-4a1d-9c9f-7a4a1b0a3f9d",
"answer": "The support policy for X includes LTS releases with quarterly updates...",
"answers": [
"The support policy for X includes LTS releases with quarterly updates...",
"X is supported with quarterly updates and a long-term support cadence..."
],
"metadata": [
{ "index": "unified_text_index", "id": "doc-123", "score": 14.2, "text": "...excerpt..." },
{ "index": "unified_text_index", "id": "doc-987", "score": 12.8, "text": "...excerpt..." }
],
"history": [],
"ab_testing": null,
"errors": null,
"data": null,
"template_info": {
"system_prompt_source": "file:system_prompts/assistant_default.j2",
"response_template_source": "file:response_templates/plain_answer.j2",
"template_variables_used": ["question", "user_id", "metadata"]
},
"processing_time": 1.72,
"tokens_used": 532,
"response_metadata": {
"config_overrides_applied": true,
"custom_templates_used": false,
"request_metadata": { "user_id": "demo_user" },
"question": "What is the support policy for X?",
"system_prompt": "You are a helpful AI assistant...",
"errors_present": false,
"queried_indexes": ["unified_text_index-prod"],
"timings": {
"prepare_vars": 0.004,
"load_system_prompt": 0.012,
"load_response_template": 0.003,
"build_orchestrator": 0.001,
"orchestrator_call": 1.56,
"build_response": 0.002,
"orchestrator": {
"fetch_total": 0.16,
"prompt_build": 0.01,
"llm_generate": 1.38,
"metadata_extract": 0.01,
"history_build": 0.00,
"total": 1.56,
"fetchers": { "AzureSearchFetcher": { "search": 0.16 } }
}
},
"token_usage": { "prompt": 412, "completion": 120, "total": 532 },
"reasoning_effort": "auto",
"reasoning_effective": {
"requested": "auto",
"sent_effort": "medium",
"provider": "openrouter",
"sanitized": true
},
"model_used": { "alias": "default", "provider": "azure", "deployment": "ChatModelLM", "model": "ChatModelLM" }
}
}
POST /api/v2/flexible-chat
Example success:
{
"response_id": "f0a9c8d2-...",
"answer": "Sure, here's a summary...",
"metadata": [
{ "index": "unified_text_index", "id": "doc-222", "score": 8.7, "text": "..." }
],
"history": [
{ "role": "user", "content": "Hello", "sources": [] },
{ "role": "assistant", "content": "Hi! How can I help?", "sources": [] },
{ "role": "user", "content": "Summarize the policy.", "sources": [] },
{ "role": "assistant", "content": "Sure, here's a summary...", "sources": [ { "index": "unified_text_index", "id": "doc-222" } ] }
],
"ab_testing": null,
"errors": null,
"data": null,
"template_info": { "system_prompt_source": "default", "response_template_source": "default", "template_variables_used": ["question", "user_id", "metadata", "history"] },
"processing_time": 1.43,
"tokens_used": 289,
"response_metadata": {
"config_overrides_applied": true,
"custom_templates_used": false,
"request_metadata": {},
"conversation_length": 2,
"is_multi_turn": true,
"total_sources_retrieved": 1,
"conversation_sources_summary": { "total_turns": 4, "assistant_turns_with_sources": 1 },
"question": "Summarize the policy.",
"system_prompt": "You are a helpful AI assistant...",
"errors_present": false,
"token_usage": { "prompt": 230, "completion": 59, "total": 289 },
"reasoning_effort": "auto",
"reasoning_effective": { "requested": "auto", "sent_effort": "low", "provider": "azure", "sanitized": true },
"model_used": { "alias": "default", "provider": "azure", "deployment": "ChatModelLM", "model": "ChatModelLM" }
}
}
POST /api/v2/flexible-rag-mm
Example success (triggered when the Flexible form’s Multimodal section is enabled and Chat mode is OFF):
{
"response_id": "3e2b7b50-...",
"answer": "From the image, it appears to be a circuit board with ...",
"metadata": [],
"history": [
{ "role": "user", "content": "Help me understand this image", "sources": [] },
{ "role": "assistant", "content": "From the image, it appears ...", "sources": [] }
],
"ab_testing": null,
"errors": null,
"template_info": {
"system_prompt_source": "inline",
"response_template_source": "default",
"template_variables_used": ["question", "user_id", "metadata"]
},
"processing_time": 1.95,
"tokens_used": 245,
"response_metadata": {
"custom_templates_used": true,
"request_metadata": { "images": [ { "detail": "high" } ] },
"conversation_length": 0,
"is_multi_turn": false,
"total_sources_retrieved": 0,
"conversation_sources_summary": { "total_turns": 2, "assistant_turns_with_sources": 0 },
"question": "Help me understand this image",
"errors_present": false,
"timings": { "build_orchestrator": 0.02, "orchestrator_call": 1.89, "build_response": 0.01 },
"retrieval": { "attempted": false, "skipped_reason": "kb_disabled", "fetch_args_present": false },
"token_usage": { "prompt": 180, "completion": 65, "total": 245 },
"reasoning_effort": "auto",
"reasoning_effective": { "requested": "auto", "sent_effort": "medium", "provider": "openrouter", "sanitized": true },
"model_used": { "alias": "or_claude", "provider": "openrouter", "deployment": null, "model": "anthropic/claude-3.5-sonnet" }
}
}
POST /api/v2/flexible-chat-mm
Example success (triggered when both the Flexible form’s Multimodal section and Chat mode are ON; shape is identical to flexible-rag-mm):
{
"response_id": "8c9a1ed0-...",
"answer": "The chart indicates a steady increase over time...",
"metadata": [],
"history": [
{ "role": "user", "content": "Explain this chart", "sources": [] },
{ "role": "assistant", "content": "The chart indicates a steady increase...", "sources": [] }
],
"ab_testing": null,
"errors": null,
"template_info": { "system_prompt_source": "default", "response_template_source": "default", "template_variables_used": ["question", "user_id", "metadata"] },
"processing_time": 1.61,
"tokens_used": 198,
"response_metadata": {
"custom_templates_used": false,
"request_metadata": { "images": [ { "detail": "auto" } ] },
"conversation_length": 0,
"is_multi_turn": false,
"total_sources_retrieved": 0,
"conversation_sources_summary": { "total_turns": 2, "assistant_turns_with_sources": 0 },
"question": "Explain this chart",
"errors_present": false,
"timings": { "build_orchestrator": 0.01, "orchestrator_call": 1.57, "build_response": 0.01 },
"retrieval": { "attempted": false, "skipped_reason": "kb_disabled", "fetch_args_present": false },
"token_usage": { "prompt": 150, "completion": 48, "total": 198 },
"reasoning_effort": "auto",
"reasoning_effective": { "requested": "auto", "sent_effort": "low", "provider": "azure", "sanitized": true },
"model_used": { "alias": "default", "provider": "azure", "deployment": "ChatModelLM", "model": "ChatModelLM" }
}
}
POST /api/v2/feedback
Example success:
{ "status": "success", "message": "Feedback recorded" }
Example duplicate click:
{ "status": "duplicate", "message": "Duplicate click ignored" }
Model picker and validation flow
- UI calls: GET /api/v2/models?validate=true&filter_unhealthy=true (optionally &deep=true)
- Response:
  - default_alias: the server's configured default alias
  - aliases: the currently usable aliases
  - models: array with { alias, provider|type, deployment|model, supports_multimodal? }
  - validation: optional entries with { alias, ok, reason }
- The UI hides failing models by default but can reveal them. Selecting a model writes override_config.llm.
- The "Reasoning Effort" control is disabled for Azure-provider models (annotated in the UI); non-Azure providers may support it. The router will sanitize unsupported values and expose the actual behavior via response_metadata.reasoning_effective.
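The same flow from code, as a sketch over the documented fields (falls back to the default alias if no candidate matches):

import requests

def pick_model_alias(base_url: str, need_multimodal: bool = False) -> str:
    # Mirror the UI: request validation and filter unhealthy aliases.
    r = requests.get(f"{base_url}/models",
                     params={"validate": "true", "filter_unhealthy": "true"})
    r.raise_for_status()
    catalog = r.json()
    for m in catalog.get("models", []):
        if not need_multimodal or m.get("supports_multimodal"):
            return m["alias"]  # written into override_config.llm
    return catalog["default_alias"]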
Provider-specific notes
- Azure (AOAI):
  - Reasoning effort: not currently supported; the UI disables this control when the selected alias has provider: azure. If a client still sends an effort value, the router sanitizes it and reports the effective value under response_metadata.reasoning_effective.
  - Multimodal: supported when the alias entry sets supports_multimodal: true (e.g., deployment: gpt-4o-mini). Validate with GET /api/v2/models.
  - Auth: prefers Managed Identity when use_managed_identity: true; otherwise falls back to API key if configured.
- OpenRouter:
  - Reasoning effort: some models support it; others ignore or return errors for invalid values. The router preserves requested vs sent_effort and sets sanitized: true if it had to adjust.
  - Multimodal: check supports_multimodal in GET /api/v2/models. Popular options include anthropic/claude-3.5-sonnet, openai/gpt-4o, and Google Gemini variants.
  - Auth: requires an API key per alias (api_key). Ensure the runtime has this key.
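If you drive these endpoints programmatically, you can gate the effort override the same way the UI does; a sketch (the provider string comes from the /models catalog entry for the alias):

def build_override_config(alias: str, provider: str, effort: str = "auto") -> dict:
    # Azure ignores reasoning effort today, so omit it there, mirroring
    # the UI; other providers get both compatibility spellings.
    overrides = {"llm": alias}
    if provider != "azure":
        overrides["reasoning_effort"] = effort
        overrides["reasoning"] = {"effort": effort}
    return overrides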
Prompts and templates
- UI calls: GET /api/v2/prompts?json=true to list prompt/template names and defaults. It may call &full=true to preview content.
- The Flexible endpoints also accept:
  - system_prompt_template or system_prompt_file
  - response_template or response_template_file
  - template_variables — for Jinja2 rendering
- In the response, template_info describes which prompt/template was applied (and source: default/inline/file/blob).
- Business impact:
  - Prompt owners can now promote any stored system or response template to be the default directly from the repository table or selected prompt drawer; no more copying content into the default editor. The preview card preserves whitespace so long-form prompts are easier to audit before switching.
- Developer details:
  - The UI submits POST /api/v2/prompts with action=make_default_prompt, prompt_type (system or response), and prompt_identifier (record id, filename, or blob path). On success the server updates the manifest's current_id and caches the content so subsequent loads hydrate instantly.
  - Prompt previews render immediately on first selection; clients do not need to re-issue the request if the cache misses.
  - Example: promote a stored response template named assistant_reply.j2 to the default.
curl -X POST "https://yourhost/api/v2/prompts" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/x-www-form-urlencoded" \
--data "action=make_default_prompt&prompt_type=response&prompt_identifier=assistant_reply.j2"
import os
import requests
base_url = os.getenv("RAG_API_BASE", "https://yourhost/api/v2")
token = os.getenv("RAG_API_TOKEN")
payload = {
"action": "make_default_prompt",
"prompt_type": "system",
"prompt_identifier": "system_prompts/new_baseline.j2",
}
headers = {"Authorization": f"Bearer {token}"} if token else {}
response = requests.post(f"{base_url}/prompts", data=payload, headers=headers)
response.raise_for_status()
print(response.status_code, response.headers.get("Location"))
Indexes (retrieval controls)
- UI calls: GET /api/v2/indexes
- Selected indexes and retrieval settings are passed in fetch_args.AzureSearchFetcher (e.g., query, top_k, vector_search, etc.).
- When retrieval is disabled, the UI sets skip_knowledge_base: true. A sketch of building these fields follows.
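A sketch of turning the picker state into request fields (the indexes key inside the fetcher args is inferred from the UI-to-API mapping below; check the fetcher schema before relying on it):

def retrieval_fields(question: str, indexes: list, use_kb: bool = True) -> dict:
    # KB off: skip retrieval entirely and omit fetch_args.
    if not use_kb:
        return {"skip_knowledge_base": True}
    return {"fetch_args": {"AzureSearchFetcher": {
        "query": question,
        "indexes": indexes,  # inferred key; see the mapping section
        "top_k": 5,
        "vector_search": True,
    }}}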
LLM Parameters side panel (auto-save)
- The Unified Test page includes a side panel for LLM defaults (temperature, top_p, max_tokens, penalties, stop, etc.).
- Edits are auto-saved to POST /api/v2/settings after ~800ms of inactivity; a manual "Save Parameters" button remains available.
- If the page URL has subscription-key, it's forwarded to /api/v2/settings automatically.
- New defaults apply to subsequent requests without a page reload.
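A hedged sketch of the auto-save call; the exact body /api/v2/settings expects is not documented here, so the flat parameter names below are assumptions:

import requests

def save_llm_defaults(base_url: str, params: dict) -> None:
    # params: assumed flat keys matching the panel, e.g. temperature, top_p.
    r = requests.post(f"{base_url}/settings", json=params)
    r.raise_for_status()

# save_llm_defaults("https://yourhost/api/v2", {"temperature": 0.3, "top_p": 0.95})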
Feedback flow
- The backend automatically saves a neutral telemetry record (rating=0) for every response returned by Flexible endpoints.
- Users can submit explicit feedback via:
POST /api/v2/feedback
Example payload the UI uses:
{
"user_id": "demo_user",
"session_id": "demo_session",
"experiment_name": null,
"variant_name": null,
"response_id": "<the-response-id>",
"rating": 1, // or -1
"feedback_text": "",
"response_time": 2.37,
"task_completed": true,
"response_payload": {
"question": "...",
"answer": "...",
"metadata": [ /* retrieved docs */ ]
}
}
Correlate with the response_id returned by the answer APIs.
Putting it together from your backend
Below are minimal examples showing how to call the same endpoints the UI invokes.
Python (requests)
import os
import json
import requests
BASE_URL = os.getenv("RAG_API_BASE", "http://localhost:8000/api/v2")
# Helper: override model + reasoning effort
OVERRIDES = {
"llm": "default", # set to an alias from /api/v2/models
"reasoning_effort": "auto",
"reasoning": {"effort": "auto"}
}
# Flexible RAG (text-only)
def flexible_rag(question: str, use_kb: bool = True):
payload = {
"question": question,
"skip_knowledge_base": not use_kb,
"fetch_args": {"AzureSearchFetcher": {"query": question, "top_k": 5, "vector_search": True}} if use_kb else {},
"history": [],
"template_variables": {"domain": "general knowledge", "response_style": "formal"},
"metadata": {"user_id": "demo_user"},
"override_config": OVERRIDES,
}
r = requests.post(f"{BASE_URL}/flexible-rag", json=payload)
r.raise_for_status()
return r.json()
# Flexible Chat (multi-turn)
def flexible_chat(question: str, history=None):
history = history or [] # list of {role, content}
payload = {
"question": question,
"history": history,
"fetch_args": {},
"template_variables": {},
"metadata": {},
"override_config": OVERRIDES,
}
r = requests.post(f"{BASE_URL}/flexible-chat", json=payload)
r.raise_for_status()
return r.json()
# Flexible RAG (multimodal)
def flexible_rag_mm(question: str, images=None):
images = images or [] # [{"url":"https://..."}] or [{"data_url":"data:image/...","detail":"high"}]
payload = {
"question": question,
"history": [],
"fetch_args": {"AzureSearchFetcher": {"query": question, "top_k": 5, "vector_search": True}},
"template_variables": {},
"metadata": {"images": images},
"override_config": OVERRIDES,
}
r = requests.post(f"{BASE_URL}/flexible-rag-mm", json=payload)
r.raise_for_status()
return r.json()
# Feedback
def send_feedback(response_id: str, answer_payload: dict, rating: int):
body = {
"user_id": "demo_user",
"session_id": "demo_session",
"experiment_name": answer_payload.get("ab_testing", {}).get("experiment_name"),
"variant_name": answer_payload.get("ab_testing", {}).get("variant_name"),
"response_id": response_id,
"rating": rating,
"feedback_text": "",
"response_time": answer_payload.get("processing_time"),
"task_completed": True,
"response_payload": {
"question": answer_payload.get("response_metadata", {}).get("question"),
"answer": answer_payload.get("answer"),
"metadata": answer_payload.get("metadata", []),
},
}
r = requests.post(f"{BASE_URL}/feedback", json=body)
r.raise_for_status()
return r.json()
if __name__ == "__main__":
ans = flexible_rag("What is the support policy for X?")
print(json.dumps(ans, indent=2))
if ans.get("response_id"):
fb = send_feedback(ans["response_id"], ans, 1)
print("feedback:", fb)
Node.js (fetch)
// Requires Node 18+ (built-in fetch). For older versions, use node-fetch or axios.
const BASE_URL = process.env.RAG_API_BASE || 'http://localhost:8000/api/v2';
const OVERRIDES = {
llm: 'default',
reasoning_effort: 'auto',
reasoning: { effort: 'auto' }
};
async function flexibleRag(question, useKB = true){
const payload = {
question,
skip_knowledge_base: !useKB,
fetch_args: useKB ? { AzureSearchFetcher: { query: question, top_k: 5, vector_search: true } } : {},
history: [],
template_variables: { domain: 'general knowledge', response_style: 'formal' },
metadata: { user_id: 'demo_user' },
override_config: OVERRIDES,
};
const r = await fetch(`${BASE_URL}/flexible-rag`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(payload) });
if(!r.ok) throw new Error(`HTTP ${r.status}`);
return r.json();
}
async function flexibleChat(question, history = []){
const payload = { question, history, fetch_args: {}, template_variables: {}, metadata: {}, override_config: OVERRIDES };
const r = await fetch(`${BASE_URL}/flexible-chat`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(payload) });
if(!r.ok) throw new Error(`HTTP ${r.status}`);
return r.json();
}
async function flexibleRagMM(question, images = []){
const payload = { question, history: [], fetch_args: { AzureSearchFetcher: { query: question, top_k: 5, vector_search: true } }, template_variables: {}, metadata: { images }, override_config: OVERRIDES };
const r = await fetch(`${BASE_URL}/flexible-rag-mm`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(payload) });
if(!r.ok) throw new Error(`HTTP ${r.status}`);
return r.json();
}
async function sendFeedback(responseId, answerPayload, rating){
const body = {
user_id: 'demo_user',
session_id: 'demo_session',
experiment_name: answerPayload?.ab_testing?.experiment_name || null,
variant_name: answerPayload?.ab_testing?.variant_name || null,
response_id: responseId,
rating,
feedback_text: '',
response_time: answerPayload?.processing_time || null,
task_completed: true,
response_payload: {
question: answerPayload?.response_metadata?.question || null,
answer: answerPayload?.answer || null,
metadata: answerPayload?.metadata || []
}
};
const r = await fetch(`${BASE_URL}/feedback`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(body) });
if(!r.ok) throw new Error(`HTTP ${r.status}`);
return r.json();
}
(async () => {
const ans = await flexibleRag('What is the support policy for X?');
console.log(JSON.stringify(ans, null, 2));
if(ans.response_id){
const fb = await sendFeedback(ans.response_id, ans, 1);
console.log('feedback:', fb);
}
})();
UI-to-API mapping (quick reference)
- Model selector → GET
/api/v2/models→ writesoverride_config.llmin request - Reasoning effort pill → writes
override_config.reasoning_effortandoverride_config.reasoning.effort - Knowledge Base toggle → controls
skip_knowledge_baseandfetch_args.AzureSearchFetcher - Index picker → builds
fetch_args.AzureSearchFetcher(indexes, top_k, vector_search, etc.) - Multimodal section + images → adds
metadata.imagesand switches to*-mmendpoints - Prompts section → fetches
/api/v2/promptsand optionally setssystem_prompt_*/response_template*in request - Submit → POST to one of (automatically chosen):
/api/v2/flexible-rag/api/v2/flexible-chat/api/v2/flexible-rag-mm/api/v2/flexible-chat-mm
- Multi-completions (N):
- Set
override_config.nto request multiple completions; alternativelyoverride_config.params.nis supported. - Precedence:
override_config.n>override_config.params.n> server default (config.llms.defaults.params.n). - When
answers[]is present, the UI renders each completion and setsanswer === answers[0].
- Set
- Feedback buttons → POST
/api/v2/feedbackwithresponse_id
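A brief sketch of requesting and reading multiple completions per the precedence above (reusing requests and BASE_URL from the Python section):

payload = {
    "question": "Give me two phrasings of the support policy.",
    "fetch_args": {"AzureSearchFetcher": {"top_k": 0}},
    "override_config": {"llm": "default", "n": 2},  # wins over params.n
}
r = requests.post(f"{BASE_URL}/flexible-rag", json=payload)
r.raise_for_status()
data = r.json()
for i, completion in enumerate(data.get("answers") or [data["answer"]]):
    print(f"[{i}] {completion}")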
What to log/store downstream (best practices)
- response_id — use as a stable key for storage, analytics, and feedback
- processing_time — latency SLOs and regressions
- response_metadata.token_usage — for cost tracking and capacity planning
- response_metadata.model_used — which model/deployment actually produced the answer
- response_metadata.reasoning_effective — requested vs sent, provider, and sanitize status
- metadata (retrieval docs) — optional, but useful for auditability
- errors — structured error list, if present
- Chat-only: timings may be incomplete; multimodal aligns its timing blocks with RAG
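One way to flatten those fields into a single row for storage (a sketch; every field is treated as optional):

def to_log_record(resp: dict) -> dict:
    # Collect the fields listed above into one analytics/storage row.
    meta = resp.get("response_metadata") or {}
    return {
        "response_id": resp.get("response_id"),
        "processing_time": resp.get("processing_time"),
        "token_usage": meta.get("token_usage"),
        "model_used": meta.get("model_used"),
        "reasoning_effective": meta.get("reasoning_effective"),
        "retrieved_doc_ids": [d.get("id") for d in resp.get("metadata") or []],
        "errors": resp.get("errors"),
    }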
Error handling notes
- All endpoints can return errors in the response body (an array of structured diagnostics). The UI merges these into the Answer card for visibility.
- When KB retrieval fails, endpoints attempt to continue with an answer using whatever context is available (best-effort).
- Reasoning 'auto' is accepted on input, but downstream providers may not support it; see response_metadata.reasoning_effective to understand what was actually sent.
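A minimal sketch of surfacing those diagnostics client-side (the per-entry shape is not specified above, so entries are logged whole):

def surface_errors(resp: dict) -> list:
    # errors may be null, absent, or a list of structured diagnostics.
    errors = resp.get("errors") or []
    for err in errors:
        print("diagnostic:", err)
    return errors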
Next steps
- For more details, explore:
  - rag_api_core/endpoints/v2/flexible_rag.py
  - rag_api_core/endpoints/v2/flexible_multimodal.py
  - rag_api_core/templates/v2/unified_rag_test.html
  - rag_api_core/schemas/v2/requests.py, responses.py
  - Feedback plumbing under telemetry/feedback/