# Service Status

This page summarizes how to check the service status and health, and links to the developer-focused Unified Test UI guide.

## Health & Monitoring

- Dashboard (v2): GET /api/v2/health/check
- JSON summary (v2): GET /api/v2/health/service-health?test_services=true|false
- Liveness: GET /api/health/live
- Readiness: GET /api/health/ready
- Deep check (v1): GET /api/health/check

For details about what’s checked and example payloads, see Endpoints > Health.
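For a quick deployment check, the liveness and readiness endpoints above can be probed with the Python standard library alone. This is a minimal sketch under stated assumptions, not part of the service: the service root `http://localhost:8000` is assumed, any 2xx status is treated as healthy, and the `summarize` rollup rule (down if liveness fails, degraded if only readiness fails) is a hypothetical convention. No particular response body format is assumed.

```python
import urllib.error
import urllib.request

# Assumption: local development root; adjust for your deployment.
SERVICE_ROOT = "http://localhost:8000"

# Probe paths from the Health & Monitoring list.
LIVE_PATH = "/api/health/live"
READY_PATH = "/api/health/ready"


def service_health_url(root: str, test_services: bool) -> str:
    """Build the v2 JSON summary URL; test_services is sent as 'true' or 'false'."""
    flag = "true" if test_services else "false"
    return f"{root}/api/v2/health/service-health?test_services={flag}"


def probe(url: str, timeout: float = 5.0) -> int:
    """GET url and return its HTTP status code, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # non-2xx responses still carry a status code
        return err.code
    except (urllib.error.URLError, OSError):  # refused connection, DNS failure, timeout
        return 0


def summarize(live_status: int, ready_status: int) -> str:
    """Hypothetical rollup: 'down' unless liveness is 2xx, 'degraded' if not ready."""
    if not 200 <= live_status < 300:
        return "down"
    if not 200 <= ready_status < 300:
        return "degraded"
    return "up"


if __name__ == "__main__":
    live = probe(SERVICE_ROOT + LIVE_PATH)
    ready = probe(SERVICE_ROOT + READY_PATH)
    print("live:", live, "ready:", ready, "->", summarize(live, ready))
    print("summary URL:", service_health_url(SERVICE_ROOT, test_services=False))
```

HTTPError is caught before URLError because it is a subclass of it; catching it first keeps the real status code (for example a 503 from a not-ready service) instead of collapsing it to 0.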

## Unified Test UI

For end-to-end usage examples and the UI-to-API mapping, see the dedicated page:

Unified Test UI: /api/v2/docs-center/public/endpoints/unified_test_ui

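That page covers model selection, reasoning effort overrides, flexible text and multimodal flows, prompts, token usage telemetry, and the feedback API, with complete request/response examples.

### Python (requests)

```python
# Requires: pip install requests
import json
import os

import requests

BASE_URL = os.environ.get("RAG_API_BASE", "http://localhost:8000/api/v2")

# Model and reasoning-effort overrides, as set by the UI controls.
OVERRIDES = {
    "llm": "default",
    "reasoning_effort": "auto",
    "reasoning": {"effort": "auto"},
}


# Flexible RAG (mirrors flexibleRag in the Node.js example)
def flexible_rag(question: str, use_kb: bool = True):
    payload = {
        "question": question,
        "skip_knowledge_base": not use_kb,
        "fetch_args": (
            {"AzureSearchFetcher": {"query": question, "top_k": 5, "vector_search": True}}
            if use_kb
            else {}
        ),
        "history": [],
        "template_variables": {"domain": "general knowledge", "response_style": "formal"},
        "metadata": {"user_id": "demo_user"},
        "override_config": OVERRIDES,
    }
    r = requests.post(f"{BASE_URL}/flexible-rag", json=payload)
    r.raise_for_status()
    return r.json()


# Flexible chat (no retrieval)
def flexible_chat(question: str, history=None):
    history = history or []  # list of {role, content}
    payload = {
        "question": question,
        "history": history,
        "fetch_args": {},
        "template_variables": {},
        "metadata": {},
        "override_config": OVERRIDES,
    }
    r = requests.post(f"{BASE_URL}/flexible-chat", json=payload)
    r.raise_for_status()
    return r.json()


# Flexible RAG (multimodal)
def flexible_rag_mm(question: str, images=None):
    # [{"url": "https://..."}] or [{"data_url": "data:image/...", "detail": "high"}]
    images = images or []
    payload = {
        "question": question,
        "history": [],
        "fetch_args": {"AzureSearchFetcher": {"query": question, "top_k": 5, "vector_search": True}},
        "template_variables": {},
        "metadata": {"images": images},
        "override_config": OVERRIDES,
    }
    r = requests.post(f"{BASE_URL}/flexible-rag-mm", json=payload)
    r.raise_for_status()
    return r.json()


# Feedback
def send_feedback(response_id: str, answer_payload: dict, rating: int):
    # `or {}` also covers keys that are present but null in the response.
    ab = answer_payload.get("ab_testing") or {}
    meta = answer_payload.get("response_metadata") or {}
    body = {
        "user_id": "demo_user",
        "session_id": "demo_session",
        "experiment_name": ab.get("experiment_name"),
        "variant_name": ab.get("variant_name"),
        "response_id": response_id,
        "rating": rating,
        "feedback_text": "",
        "response_time": answer_payload.get("processing_time"),
        "task_completed": True,
        "response_payload": {
            "question": meta.get("question"),
            "answer": answer_payload.get("answer"),
            "metadata": answer_payload.get("metadata", []),
        },
    }
    r = requests.post(f"{BASE_URL}/feedback", json=body)
    r.raise_for_status()
    return r.json()


if __name__ == "__main__":
    ans = flexible_rag("What is the support policy for X?")
    print(json.dumps(ans, indent=2))
    if ans.get("response_id"):
        fb = send_feedback(ans["response_id"], ans, 1)
        print("feedback:", fb)
```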

### Node.js (fetch)

```js
// Requires Node 18+ (built-in fetch). For older versions, use node-fetch or axios.
const BASE_URL = process.env.RAG_API_BASE || 'http://localhost:8000/api/v2';

const OVERRIDES = {
  llm: 'default',
  reasoning_effort: 'auto',
  reasoning: { effort: 'auto' }
};

async function flexibleRag(question, useKB = true){
  const payload = {
    question,
    skip_knowledge_base: !useKB,
    fetch_args: useKB ? { AzureSearchFetcher: { query: question, top_k: 5, vector_search: true } } : {},
    history: [],
    template_variables: { domain: 'general knowledge', response_style: 'formal' },
    metadata: { user_id: 'demo_user' },
    override_config: OVERRIDES,
  };
  const r = await fetch(`${BASE_URL}/flexible-rag`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(payload) });
  if(!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
}

async function flexibleChat(question, history = []){
  const payload = { question, history, fetch_args: {}, template_variables: {}, metadata: {}, override_config: OVERRIDES };
  const r = await fetch(`${BASE_URL}/flexible-chat`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(payload) });
  if(!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
}

async function flexibleRagMM(question, images = []){
  const payload = { question, history: [], fetch_args: { AzureSearchFetcher: { query: question, top_k: 5, vector_search: true } }, template_variables: {}, metadata: { images }, override_config: OVERRIDES };
  const r = await fetch(`${BASE_URL}/flexible-rag-mm`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(payload) });
  if(!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
}

async function sendFeedback(responseId, answerPayload, rating){
  const body = {
    user_id: 'demo_user',
    session_id: 'demo_session',
    experiment_name: answerPayload?.ab_testing?.experiment_name || null,
    variant_name: answerPayload?.ab_testing?.variant_name || null,
    response_id: responseId,
    rating,
    feedback_text: '',
    response_time: answerPayload?.processing_time || null,
    task_completed: true,
    response_payload: {
      question: answerPayload?.response_metadata?.question || null,
      answer: answerPayload?.answer || null,
      metadata: answerPayload?.metadata || []
    }
  };
  const r = await fetch(`${BASE_URL}/feedback`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(body) });
  if(!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
}

(async () => {
  const ans = await flexibleRag('What is the support policy for X?');
  console.log(JSON.stringify(ans, null, 2));
  if(ans.response_id){
    const fb = await sendFeedback(ans.response_id, ans, 1);
    console.log('feedback:', fb);
  }
})();

## UI-to-API mapping (quick reference)

- Model selector → GET /api/v2/models → sets override_config.llm in the request
- Reasoning effort pill → sets override_config.reasoning_effort and override_config.reasoning.effort
- Knowledge Base toggle → controls skip_knowledge_base and fetch_args.AzureSearchFetcher
- Index picker → builds fetch_args.AzureSearchFetcher (indexes, top_k, vector_search, etc.)
- Multimodal toggle + images → adds metadata.images
- Prompts section → fetches /api/v2/prompts and optionally sets system_prompt_* / response_template* in the request
- Submit → POST to one of:
  - /api/v2/flexible-rag
  - /api/v2/flexible-chat
  - /api/v2/flexible-rag-mm
  - /api/v2/flexible-chat-mm
- Feedback buttons → POST /api/v2/feedback with the response_id
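The mapping above can be sketched as a pure function that turns UI state into an endpoint path and a request body. This is an illustrative sketch, not the UI's actual code: the `build_request` name and its `mode` switch (choosing between the RAG and chat families) are hypothetical, the shape of the `indexes` value is assumed to be a list of index names, and the prompt fields are left out because only their wildcard names (system_prompt_*, response_template*) are documented here.

```python
# Hypothetical (mode, multimodal) -> path table; the four paths are the Submit targets.
ENDPOINTS = {
    ("rag", False): "/api/v2/flexible-rag",
    ("chat", False): "/api/v2/flexible-chat",
    ("rag", True): "/api/v2/flexible-rag-mm",
    ("chat", True): "/api/v2/flexible-chat-mm",
}


def build_request(question, *, mode="rag", model="default", effort="auto",
                  use_kb=True, indexes=None, top_k=5, images=None):
    """Translate UI state into (path, payload) following the UI-to-API mapping."""
    if mode not in ("rag", "chat"):
        raise ValueError(f"unknown mode: {mode!r}")
    multimodal = bool(images)
    payload = {
        "question": question,
        "history": [],
        "fetch_args": {},
        "template_variables": {},
        "metadata": {},
        # Model selector and reasoning effort pill: both effort fields are written.
        "override_config": {
            "llm": model,
            "reasoning_effort": effort,
            "reasoning": {"effort": effort},
        },
    }
    if mode == "rag":
        # Knowledge Base toggle drives skip_knowledge_base and the fetcher block.
        payload["skip_knowledge_base"] = not use_kb
        if use_kb:
            fetcher = {"query": question, "top_k": top_k, "vector_search": True}
            if indexes:
                fetcher["indexes"] = list(indexes)  # assumed: list of index names
            payload["fetch_args"]["AzureSearchFetcher"] = fetcher
    if multimodal:
        # Multimodal toggle + images -> metadata.images
        payload["metadata"]["images"] = list(images)
    return ENDPOINTS[(mode, multimodal)], payload
```

For example, `build_request("What does this chart show?", images=[{"url": "https://example.com/chart.png"}])` returns the path /api/v2/flexible-rag-mm together with a payload whose metadata.images holds the single image entry. Keeping the builder free of I/O makes the mapping easy to unit-test separately from the HTTP calls.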

## What to log/store downstream (best practices)

- response_id: use as a stable key for storage, analytics, and feedback
- processing_time: track latency SLOs and catch regressions
- response_metadata.token_usage: cost tracking and capacity planning
- response_metadata.model_used: which model/deployment actually produced the answer
- response_metadata.reasoning_effective: requested vs. sent reasoning effort, provider, and sanitization status
- metadata (retrieval docs): optional, but useful for auditability
- errors: the structured error list, if present
- Chat-only endpoints: timings may be incomplete; the multimodal endpoints align their timing blocks with the RAG endpoints
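A minimal sketch of collecting these fields into one flat, JSON-serializable log record. It assumes the response layout used by the client examples on this page (response_metadata as an object, metadata as a list of retrieval documents, errors as a list). token_usage and reasoning_effective are passed through unchanged because their inner structure is not documented here; the `to_log_record` name, the `include_docs` flag, and the `retrieved_doc_count` field are illustrative choices.

```python
import json


def to_log_record(resp: dict, include_docs: bool = False) -> dict:
    """Flatten the recommended fields of one API response into a log record."""
    meta = resp.get("response_metadata") or {}
    docs = resp.get("metadata") or []  # retrieval documents, if any
    record = {
        "response_id": resp.get("response_id"),  # stable key for joins with feedback
        "processing_time": resp.get("processing_time"),
        "token_usage": meta.get("token_usage"),  # passed through as-is
        "model_used": meta.get("model_used"),
        "reasoning_effective": meta.get("reasoning_effective"),
        "retrieved_doc_count": len(docs),
        "errors": resp.get("errors") or [],
    }
    if include_docs:
        # Full retrieval docs are optional: useful for audits, but can be large.
        record["retrieval_docs"] = docs
    return record


if __name__ == "__main__":
    # Made-up sample response, for illustration only.
    sample = {"response_id": "r-1", "processing_time": 1.2,
              "response_metadata": {"model_used": "example-model"}, "metadata": []}
    print(json.dumps(to_log_record(sample)))
```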

## Error handling notes

- All endpoints can return errors in the response body as an array of structured diagnostics. The UI merges these into the Answer card for visibility.
- When KB retrieval fails, endpoints still attempt a best-effort answer using whatever context is available.
- A reasoning effort of 'auto' is accepted on input, but downstream providers may not support it; check response_metadata.reasoning_effective to see what was actually sent.
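Because diagnostics can arrive inside an otherwise successful response, a client may prefer to surface them as warnings rather than raise. The sketch below assumes the same response layout as the client examples on this page; the `interpret_response` helper, the `degraded` flag, and the logger name are hypothetical, and reasoning_effective is returned unchanged since its inner keys are not documented here.

```python
import logging

log = logging.getLogger("rag_client")  # hypothetical logger name


def interpret_response(resp: dict) -> dict:
    """Separate the answer from in-body diagnostics without raising.

    A 2xx status does not guarantee a clean answer: after a failed KB
    retrieval, for example, the service may still answer best-effort and
    report the failure in the errors list.
    """
    errors = resp.get("errors") or []
    for err in errors:
        log.warning("service diagnostic: %s", err)
    meta = resp.get("response_metadata") or {}
    return {
        "answer": resp.get("answer"),
        "degraded": bool(errors),  # answer exists but may rest on partial context
        "errors": errors,
        # What the provider actually received; may differ from a requested 'auto'.
        "reasoning_effective": meta.get("reasoning_effective"),
    }
```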

## Next steps

For more details, explore:

- rag_api_core/endpoints/v2/flexible_rag.py
- rag_api_core/endpoints/v2/flexible_multimodal.py
- rag_api_core/templates/v2/unified_rag_test.html
- rag_api_core/schemas/v2/requests.py and responses.py
- Feedback plumbing under telemetry/feedback/