Service Status
This page summarizes how to check service status and health, and links to the developer-focused Unified Test UI guide.
Health & Monitoring
- Dashboard (v2): GET `/api/v2/health/check`
- JSON summary (v2): GET `/api/v2/health/service-health?test_services=true|false`
- Liveness: GET `/api/health/live`
- Readiness: GET `/api/health/ready`
- Deep check (v1): GET `/api/health/check`
For details about what’s checked and example payloads, see Endpoints > Health.
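As a rough illustration, the endpoints listed above could be polled from a small script. This is a minimal sketch, not part of the service: the host `http://localhost:8000` and the helper names `health_urls` and `probe` are assumptions, and the sketch treats any HTTP 200 as healthy without inspecting the response body.

```python
import urllib.error
import urllib.request

# Assumed host; replace with your deployment's origin.
HOST = "http://localhost:8000"

# Paths are taken from the endpoint list above.
HEALTH_PATHS = {
    "dashboard_v2": "/api/v2/health/check",
    "json_summary_v2": "/api/v2/health/service-health",
    "liveness": "/api/health/live",
    "readiness": "/api/health/ready",
    "deep_check_v1": "/api/health/check",
}


def health_urls(host: str = HOST, test_services: bool = False) -> dict:
    """Build the full URL for each health endpoint (hypothetical helper)."""
    base = host.rstrip("/")
    urls = {name: base + path for name, path in HEALTH_PATHS.items()}
    # The JSON summary takes a test_services=true|false query parameter.
    flag = "true" if test_services else "false"
    urls["json_summary_v2"] += f"?test_services={flag}"
    return urls


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection failures, timeouts, and non-2xx statuses all count as unhealthy.
        return False
```

Calling `probe(url)` for each value returned by `health_urls()` gives a quick pass/fail view; a real monitor would likely parse the JSON summary instead of relying on status codes alone.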
Unified Test UI
Looking for the end-to-end example usage and API mapping? See the dedicated page:
- Unified Test UI: `/api/v2/docs-center/public/endpoints/unified_test_ui`
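That page covers model selection, reasoning effort overrides, flexible text and multimodal flows, prompts, token usage telemetry, and the feedback API, with complete request/response examples.

### Python (requests)
```python
import json
import os

import requests

# Setup mirrors the Node.js example below; adjust BASE_URL for your deployment.
BASE_URL = os.environ.get("RAG_API_BASE", "http://localhost:8000/api/v2")
OVERRIDES = {
    "llm": "default",
    "reasoning_effort": "auto",
    "reasoning": {"effort": "auto"},
}


# Flexible RAG (reconstructed to mirror flexibleRag in the Node.js example)
def flexible_rag(question: str, use_kb: bool = True):
    fetch_args = (
        {"AzureSearchFetcher": {"query": question, "top_k": 5, "vector_search": True}}
        if use_kb
        else {}
    )
    payload = {
        "question": question,
        "skip_knowledge_base": not use_kb,
        "fetch_args": fetch_args,
        "history": [],
        "template_variables": {"domain": "general knowledge", "response_style": "formal"},
        "metadata": {"user_id": "demo_user"},
        "override_config": OVERRIDES,
    }
    r = requests.post(f"{BASE_URL}/flexible-rag", json=payload)
    r.raise_for_status()
    return r.json()


# Flexible chat (no retrieval)
def flexible_chat(question: str, history=None):
    history = history or []  # list of {role, content}
    payload = {
        "question": question,
        "history": history,
        "fetch_args": {},
        "template_variables": {},
        "metadata": {},
        "override_config": OVERRIDES,
    }
    r = requests.post(f"{BASE_URL}/flexible-chat", json=payload)
    r.raise_for_status()
    return r.json()


# Flexible RAG (multimodal)
def flexible_rag_mm(question: str, images=None):
    # [{"url":"https://..."}] or [{"data_url":"data:image/...","detail":"high"}]
    images = images or []
    payload = {
        "question": question,
        "history": [],
        "fetch_args": {"AzureSearchFetcher": {"query": question, "top_k": 5, "vector_search": True}},
        "template_variables": {},
        "metadata": {"images": images},
        "override_config": OVERRIDES,
    }
    r = requests.post(f"{BASE_URL}/flexible-rag-mm", json=payload)
    r.raise_for_status()
    return r.json()


# Feedback
def send_feedback(response_id: str, answer_payload: dict, rating: int):
    ab_testing = answer_payload.get("ab_testing", {})
    body = {
        "user_id": "demo_user",
        "session_id": "demo_session",
        "experiment_name": ab_testing.get("experiment_name"),
        "variant_name": ab_testing.get("variant_name"),
        "response_id": response_id,
        "rating": rating,
        "feedback_text": "",
        "response_time": answer_payload.get("processing_time"),
        "task_completed": True,
        "response_payload": {
            "question": answer_payload.get("response_metadata", {}).get("question"),
            "answer": answer_payload.get("answer"),
            "metadata": answer_payload.get("metadata", []),
        },
    }
    r = requests.post(f"{BASE_URL}/feedback", json=body)
    r.raise_for_status()
    return r.json()


if __name__ == "__main__":
    ans = flexible_rag("What is the support policy for X?")
    print(json.dumps(ans, indent=2))
    if ans.get("response_id"):
        fb = send_feedback(ans["response_id"], ans, 1)
        print("feedback:", fb)
```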
### Node.js (fetch)
```js
// Requires Node 18+ (built-in fetch). For older versions, use node-fetch or axios.
const BASE_URL = process.env.RAG_API_BASE || 'http://localhost:8000/api/v2';
const OVERRIDES = {
  llm: 'default',
  reasoning_effort: 'auto',
  reasoning: { effort: 'auto' }
};

async function flexibleRag(question, useKB = true) {
  const payload = {
    question,
    skip_knowledge_base: !useKB,
    fetch_args: useKB ? { AzureSearchFetcher: { query: question, top_k: 5, vector_search: true } } : {},
    history: [],
    template_variables: { domain: 'general knowledge', response_style: 'formal' },
    metadata: { user_id: 'demo_user' },
    override_config: OVERRIDES,
  };
  const r = await fetch(`${BASE_URL}/flexible-rag`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
}

async function flexibleChat(question, history = []) {
  const payload = {
    question,
    history,
    fetch_args: {},
    template_variables: {},
    metadata: {},
    override_config: OVERRIDES,
  };
  const r = await fetch(`${BASE_URL}/flexible-chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
}

async function flexibleRagMM(question, images = []) {
  const payload = {
    question,
    history: [],
    fetch_args: { AzureSearchFetcher: { query: question, top_k: 5, vector_search: true } },
    template_variables: {},
    metadata: { images },
    override_config: OVERRIDES,
  };
  const r = await fetch(`${BASE_URL}/flexible-rag-mm`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
}

async function sendFeedback(responseId, answerPayload, rating) {
  const body = {
    user_id: 'demo_user',
    session_id: 'demo_session',
    experiment_name: answerPayload?.ab_testing?.experiment_name || null,
    variant_name: answerPayload?.ab_testing?.variant_name || null,
    response_id: responseId,
    rating,
    feedback_text: '',
    response_time: answerPayload?.processing_time || null,
    task_completed: true,
    response_payload: {
      question: answerPayload?.response_metadata?.question || null,
      answer: answerPayload?.answer || null,
      metadata: answerPayload?.metadata || []
    }
  };
  const r = await fetch(`${BASE_URL}/feedback`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
}

(async () => {
  const ans = await flexibleRag('What is the support policy for X?');
  console.log(JSON.stringify(ans, null, 2));
  if (ans.response_id) {
    const fb = await sendFeedback(ans.response_id, ans, 1);
    console.log('feedback:', fb);
  }
})();
```
UI-to-API mapping (quick reference)
- Model selector → GET `/api/v2/models` → writes `override_config.llm` in request
- Reasoning effort pill → writes `override_config.reasoning_effort` and `override_config.reasoning.effort`
- Knowledge Base toggle → controls `skip_knowledge_base` and `fetch_args.AzureSearchFetcher`
- Index picker → builds `fetch_args.AzureSearchFetcher` (indexes, top_k, vector_search, etc.)
- Multimodal toggle + images → adds `metadata.images`
- Prompts section → fetches `/api/v2/prompts` and optionally sets `system_prompt_*` / `response_template*` in request
- Submit → POST to one of:
  - `/api/v2/flexible-rag`
  - `/api/v2/flexible-chat`
  - `/api/v2/flexible-rag-mm`
  - `/api/v2/flexible-chat-mm`
- Feedback buttons → POST `/api/v2/feedback` with `response_id`
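The mapping above can be sketched as a single function that turns UI state into an endpoint path and a request payload. This is an illustrative sketch only: `build_request` and its parameter names are hypothetical, and the routing rule (KB on selects `flexible-rag`, KB off selects `flexible-chat`, multimodal appends `-mm`) is an assumption, since the mapping only says the UI posts "to one of" the four endpoints.

```python
def build_request(
    question: str,
    *,
    llm: str = "default",
    reasoning_effort: str = "auto",
    use_kb: bool = True,
    multimodal: bool = False,
    images=None,
    top_k: int = 5,
    vector_search: bool = True,
):
    """Return (path, payload) for a given UI state (hypothetical helper)."""
    # Assumed routing: KB toggle picks rag vs chat, multimodal toggle adds "-mm".
    path = "/api/v2/flexible-" + ("rag" if use_kb else "chat")
    if multimodal:
        path += "-mm"

    payload = {
        "question": question,
        "history": [],
        "skip_knowledge_base": not use_kb,  # Knowledge Base toggle
        "fetch_args": {},
        "template_variables": {},
        "metadata": {},
        "override_config": {
            "llm": llm,  # Model selector
            # The reasoning effort pill writes both fields.
            "reasoning_effort": reasoning_effort,
            "reasoning": {"effort": reasoning_effort},
        },
    }
    if use_kb:
        # Index picker settings feed the AzureSearchFetcher arguments.
        payload["fetch_args"]["AzureSearchFetcher"] = {
            "query": question,
            "top_k": top_k,
            "vector_search": vector_search,
        }
    if multimodal:
        payload["metadata"]["images"] = list(images or [])
    return path, payload
```

Keeping payload construction in a pure function like this makes the mapping easy to unit-test without a running server; the HTTP call can then be a thin wrapper around it.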
What to log/store downstream (best practices)
- `response_id`: use as a stable key for storage, analytics, and feedback
- `processing_time`: latency SLOs and regressions
- `response_metadata.token_usage`: for cost tracking and capacity planning
- `response_metadata.model_used`: which model/deployment actually produced the answer
- `response_metadata.reasoning_effective`: requested vs sent, provider, and sanitize status
- `metadata` (retrieval docs): optional, but useful for auditability
- `errors`: structured error list, if present
- Chat-only: timings may be incomplete; multimodal aligns its timing blocks with RAG
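One way to apply the list above is to flatten each API response into a single log record before storing it. The sketch below assumes the response is an already-parsed JSON dict; the function names `extract_log_record` and `to_json_line` and the record's key names are illustrative choices, not part of the API.

```python
import json


def extract_log_record(resp: dict) -> dict:
    """Pick the fields recommended above out of a parsed API response."""
    meta = resp.get("response_metadata") or {}
    return {
        "response_id": resp.get("response_id"),          # stable storage key
        "processing_time": resp.get("processing_time"),  # latency tracking
        "token_usage": meta.get("token_usage"),          # cost and capacity
        "model_used": meta.get("model_used"),
        "reasoning_effective": meta.get("reasoning_effective"),
        "retrieval_docs": resp.get("metadata"),          # optional, for audits
        "errors": resp.get("errors") or [],
    }


def to_json_line(record: dict) -> str:
    """Serialize a record as one JSON line, e.g. for an append-only log file."""
    return json.dumps(record, sort_keys=True)
```

Missing fields come through as `None` (or an empty list for `errors`), so chat-only responses with incomplete timings still produce a well-formed record.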
Error handling notes
- All endpoints can return `errors` in the response body (an array of structured diagnostics). The UI merges these into the Answer card for visibility.
- When KB retrieval fails, endpoints attempt to continue with an answer using whatever context is available (best-effort).
- Reasoning `auto` is accepted on input, but downstream providers may not support it; see `response_metadata.reasoning_effective` to understand what was actually sent.
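Because an answer and an `errors` array can arrive in the same response, client code may want to distinguish a clean answer from a best-effort one. The following is a minimal sketch under stated assumptions: the function name `classify_response` and the three labels are hypothetical, and entries in `errors` are treated as opaque because their schema is not described here.

```python
def classify_response(resp: dict) -> str:
    """Label a parsed response as "ok", "degraded", or "failed".

    - "ok":       an answer is present and no errors were reported
    - "degraded": an answer is present alongside errors (best-effort result,
                  for example after a failed KB retrieval)
    - "failed":   no answer is present
    """
    has_answer = bool(resp.get("answer"))
    has_errors = bool(resp.get("errors"))
    if not has_answer:
        return "failed"
    return "degraded" if has_errors else "ok"


def effective_reasoning(resp: dict):
    """Return response_metadata.reasoning_effective unchanged, or None if absent."""
    return (resp.get("response_metadata") or {}).get("reasoning_effective")
```

A caller could surface "degraded" answers with a warning and log the value of `effective_reasoning(resp)` whenever `auto` was requested, to see what was actually sent to the provider.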
Next steps
- For more details, explore:
  - `rag_api_core/endpoints/v2/flexible_rag.py`
  - `rag_api_core/endpoints/v2/flexible_multimodal.py`
  - `rag_api_core/templates/v2/unified_rag_test.html`
  - `rag_api_core/schemas/v2/requests.py`, `responses.py`
  - Feedback plumbing under `telemetry/feedback/`