RAG Service

Service Status

This page summarizes how to check the service status and health, and links to the developer-focused Unified Test UI guide.

Health & Monitoring

  • Dashboard (v2): GET /api/v2/health/check
  • JSON summary (v2): GET /api/v2/health/service-health?test_services=true|false
  • Liveness: GET /api/health/live
  • Readiness: GET /api/health/ready
  • Deep check (v1): GET /api/health/check

For details about what’s checked and example payloads, see Endpoints > Health.
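The liveness, readiness, and summary paths listed above lend themselves to a small probing helper. The following is a minimal sketch using only the Python standard library. The helper names and dictionary keys are hypothetical, and the host is whatever address your deployment uses (the Node.js example on this page defaults to http://localhost:8000). It reports only HTTP status codes, because payload shapes are described under Endpoints > Health rather than here.

```python
import urllib.error
import urllib.request

# Paths copied from the list above; the keys are illustrative labels only.
HEALTH_PATHS = {
    "live": "/api/health/live",
    "ready": "/api/health/ready",
    "deep_v1": "/api/health/check",
    "summary_v2": "/api/v2/health/service-health",
}


def health_url(host: str, name: str, test_services: bool = False) -> str:
    """Build the URL for one health endpoint (host is an assumed deployment address)."""
    url = host.rstrip("/") + HEALTH_PATHS[name]
    if name == "summary_v2":
        # The v2 JSON summary accepts test_services=true|false.
        url += "?test_services=" + ("true" if test_services else "false")
    return url


def probe(host: str, name: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for one endpoint, or 0 if it is unreachable."""
    try:
        with urllib.request.urlopen(health_url(host, name), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except OSError:
        return 0  # connection refused, DNS failure, timeout, and similar
```

For example, `probe("http://localhost:8000", "ready")` returns the status code of the readiness endpoint, which a deployment script could poll before sending traffic.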

Unified Test UI

Looking for the end-to-end example usage and API mapping? See the dedicated page:

  • Unified Test UI: /api/v2/docs-center/public/endpoints/unified_test_ui

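That page covers model selection, reasoning effort overrides, flexible text and multimodal flows, prompts, token usage telemetry, and the feedback API, with complete request/response examples.

    import json
    import os

    import requests

    # Shared settings, mirroring the Node.js example on this page.
    BASE_URL = os.environ.get("RAG_API_BASE", "http://localhost:8000/api/v2")
    OVERRIDES = {"llm": "default", "reasoning_effort": "auto", "reasoning": {"effort": "auto"}}

    # Flexible Chat
    def flexible_chat(question: str, history=None):
        history = history or []  # list of {role, content}
        payload = {
            "question": question,
            "history": history,
            "fetch_args": {},
            "template_variables": {},
            "metadata": {},
            "override_config": OVERRIDES,
        }
        r = requests.post(f"{BASE_URL}/flexible-chat", json=payload)
        r.raise_for_status()
        return r.json()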
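The Python entry point later on this page calls flexible_rag, whose definition is not shown here. The following is a minimal sketch of its request body, assuming it mirrors the Node.js flexibleRag function. The helper name build_flexible_rag_payload and the DEFAULT_OVERRIDES constant are hypothetical; the field values are copied from the Node.js example.

```python
from typing import Any, Dict, Optional

# Assumed defaults, copied from the OVERRIDES object in the Node.js example.
DEFAULT_OVERRIDES: Dict[str, Any] = {
    "llm": "default",
    "reasoning_effort": "auto",
    "reasoning": {"effort": "auto"},
}


def build_flexible_rag_payload(
    question: str,
    use_kb: bool = True,
    override_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /flexible-rag (hypothetical helper name)."""
    fetch_args: Dict[str, Any] = {}
    if use_kb:
        # Only query the knowledge base when it is enabled.
        fetch_args = {
            "AzureSearchFetcher": {"query": question, "top_k": 5, "vector_search": True}
        }
    return {
        "question": question,
        "skip_knowledge_base": not use_kb,
        "fetch_args": fetch_args,
        "history": [],
        "template_variables": {"domain": "general knowledge", "response_style": "formal"},
        "metadata": {"user_id": "demo_user"},
        "override_config": override_config or DEFAULT_OVERRIDES,
    }
```

A flexible_rag(question) wrapper would then post this body the same way the other helpers do, for example `requests.post(f"{BASE_URL}/flexible-rag", json=build_flexible_rag_payload(question))`, followed by `raise_for_status()` and `json()`.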

Flexible RAG (multimodal)

    def flexible_rag_mm(question: str, images=None):
        # images: [{"url": "https://..."}] or [{"data_url": "data:image/...", "detail": "high"}]
        images = images or []
        payload = {
            "question": question,
            "history": [],
            "fetch_args": {
                "AzureSearchFetcher": {"query": question, "top_k": 5, "vector_search": True}
            },
            "template_variables": {},
            "metadata": {"images": images},
            "override_config": OVERRIDES,
        }
        r = requests.post(f"{BASE_URL}/flexible-rag-mm", json=payload)
        r.raise_for_status()
        return r.json()

Feedback

    def send_feedback(response_id: str, answer_payload: dict, rating: int):
        # "or {}" guards against fields that are present but null.
        ab_testing = answer_payload.get("ab_testing") or {}
        response_metadata = answer_payload.get("response_metadata") or {}
        body = {
            "user_id": "demo_user",
            "session_id": "demo_session",
            "experiment_name": ab_testing.get("experiment_name"),
            "variant_name": ab_testing.get("variant_name"),
            "response_id": response_id,
            "rating": rating,
            "feedback_text": "",
            "response_time": answer_payload.get("processing_time"),
            "task_completed": True,
            "response_payload": {
                "question": response_metadata.get("question"),
                "answer": answer_payload.get("answer"),
                "metadata": answer_payload.get("metadata", []),
            },
        }
        r = requests.post(f"{BASE_URL}/feedback", json=body)
        r.raise_for_status()
        return r.json()

    if __name__ == "__main__":
        ans = flexible_rag("What is the support policy for X?")
        print(json.dumps(ans, indent=2))
        if ans.get("response_id"):
            fb = send_feedback(ans["response_id"], ans, 1)
            print("feedback:", fb)

### Node.js (fetch)

```js
// Requires Node 18+ (built-in fetch). For older versions, use node-fetch or axios.
const BASE_URL = process.env.RAG_API_BASE || 'http://localhost:8000/api/v2';

const OVERRIDES = {
  llm: 'default',
  reasoning_effort: 'auto',
  reasoning: { effort: 'auto' }
};

async function flexibleRag(question, useKB = true){
  const payload = {
    question,
    skip_knowledge_base: !useKB,
    fetch_args: useKB ? { AzureSearchFetcher: { query: question, top_k: 5, vector_search: true } } : {},
    history: [],
    template_variables: { domain: 'general knowledge', response_style: 'formal' },
    metadata: { user_id: 'demo_user' },
    override_config: OVERRIDES,
  };
  const r = await fetch(`${BASE_URL}/flexible-rag`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(payload) });
  if(!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
}

async function flexibleChat(question, history = []){
  const payload = { question, history, fetch_args: {}, template_variables: {}, metadata: {}, override_config: OVERRIDES };
  const r = await fetch(`${BASE_URL}/flexible-chat`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(payload) });
  if(!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
}

async function flexibleRagMM(question, images = []){
  const payload = { question, history: [], fetch_args: { AzureSearchFetcher: { query: question, top_k: 5, vector_search: true } }, template_variables: {}, metadata: { images }, override_config: OVERRIDES };
  const r = await fetch(`${BASE_URL}/flexible-rag-mm`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(payload) });
  if(!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
}

async function sendFeedback(responseId, answerPayload, rating){
  const body = {
    user_id: 'demo_user',
    session_id: 'demo_session',
    experiment_name: answerPayload?.ab_testing?.experiment_name || null,
    variant_name: answerPayload?.ab_testing?.variant_name || null,
    response_id: responseId,
    rating,
    feedback_text: '',
    response_time: answerPayload?.processing_time || null,
    task_completed: true,
    response_payload: {
      question: answerPayload?.response_metadata?.question || null,
      answer: answerPayload?.answer || null,
      metadata: answerPayload?.metadata || []
    }
  };
  const r = await fetch(`${BASE_URL}/feedback`, { method:'POST', headers:{'Content-Type':'application/json'}, body: JSON.stringify(body) });
  if(!r.ok) throw new Error(`HTTP ${r.status}`);
  return r.json();
}

(async () => {
  const ans = await flexibleRag('What is the support policy for X?');
  console.log(JSON.stringify(ans, null, 2));
  if(ans.response_id){
    const fb = await sendFeedback(ans.response_id, ans, 1);
    console.log('feedback:', fb);
  }
})();
```

UI-to-API mapping (quick reference)

  • Model selector → GET /api/v2/models → writes override_config.llm in request
  • Reasoning effort pill → writes override_config.reasoning_effort and override_config.reasoning.effort
  • Knowledge Base toggle → controls skip_knowledge_base and fetch_args.AzureSearchFetcher
  • Index picker → builds fetch_args.AzureSearchFetcher (indexes, top_k, vector_search, etc.)
  • Multimodal toggle + images → adds metadata.images
  • Prompts section → fetches /api/v2/prompts and optionally sets system_prompt_* / response_template* in request
  • Submit → POST to one of:
      • /api/v2/flexible-rag
      • /api/v2/flexible-chat
      • /api/v2/flexible-rag-mm
      • /api/v2/flexible-chat-mm
  • Feedback buttons → POST /api/v2/feedback with response_id
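The mapping above can be read as a function from UI state to a request target and body. The sketch below is one hedged interpretation, not the Test UI's actual logic: the UiState class, its mode field, and the rule that the multimodal toggle appends "-mm" to the endpoint are assumptions inferred from the endpoint names, while the payload fields follow the Node.js example on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class UiState:
    """Hypothetical snapshot of the Test UI controls."""
    question: str
    mode: str = "rag"  # assumed values: "rag" or "chat"
    model: str = "default"
    reasoning_effort: str = "auto"
    use_kb: bool = True
    multimodal: bool = False
    images: List[Dict[str, str]] = field(default_factory=list)


def build_request(state: UiState) -> Tuple[str, Dict[str, Any]]:
    """Return (endpoint path, JSON payload) for a UI state."""
    # Assumed routing: mode picks rag vs chat, the multimodal toggle adds "-mm".
    path = f"/api/v2/flexible-{state.mode}" + ("-mm" if state.multimodal else "")
    payload: Dict[str, Any] = {
        "question": state.question,
        "history": [],
        "fetch_args": {},
        "template_variables": {},
        "metadata": {},
        # Model selector and reasoning effort pill both write into override_config.
        "override_config": {
            "llm": state.model,
            "reasoning_effort": state.reasoning_effort,
            "reasoning": {"effort": state.reasoning_effort},
        },
    }
    if state.mode == "rag":
        # The Knowledge Base toggle controls skip_knowledge_base and fetch_args.
        payload["skip_knowledge_base"] = not state.use_kb
        if state.use_kb:
            payload["fetch_args"] = {
                "AzureSearchFetcher": {
                    "query": state.question,
                    "top_k": 5,
                    "vector_search": True,
                }
            }
    if state.multimodal:
        payload["metadata"]["images"] = state.images
    return path, payload
```

Keeping the routing decision in a single pure function makes it easy to unit-test the mapping without a running service.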

What to log/store downstream (best practices)

  • response_id — use as a stable key for storage, analytics, and feedback
  • processing_time — latency SLOs and regressions
  • response_metadata.token_usage — for cost tracking and capacity planning
  • response_metadata.model_used — which model/deployment actually produced the answer
  • response_metadata.reasoning_effective — requested vs sent, provider, and sanitize status
  • metadata (retrieval docs) — optional, but useful for auditability
  • errors — structured error list, if present
  • Chat-only responses may carry incomplete timings; multimodal responses align their timing blocks with RAG
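The fields above can be collected into one flat record per response before it is written to a log or analytics store. The following is a minimal sketch: the function names and the record's key names are hypothetical, and every field is read defensively because not all endpoints populate all of them.

```python
import json
from typing import Any, Dict


def to_log_record(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the fields listed above out of a response dict; missing fields become None."""
    meta = response.get("response_metadata") or {}
    return {
        "response_id": response.get("response_id"),
        "processing_time": response.get("processing_time"),
        "token_usage": meta.get("token_usage"),
        "model_used": meta.get("model_used"),
        "reasoning_effective": meta.get("reasoning_effective"),
        "retrieval_metadata": response.get("metadata"),
        "errors": response.get("errors") or [],
    }


def to_log_line(response: Dict[str, Any]) -> str:
    """Serialize the record as one JSON line, for example for an append-only log file."""
    return json.dumps(to_log_record(response), sort_keys=True, default=str)
```

Because response_id is the first-class key, the same record can later be joined with feedback rows that carry the same response_id.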

Error handling notes

  • All endpoints can return an errors field in the response body (an array of structured diagnostics). The UI merges these into the Answer card for visibility.
  • When KB retrieval fails, endpoints attempt to continue with an answer using whatever context is available (best-effort).
  • Reasoning ‘auto’ is accepted on input, but downstream providers may not support it; see response_metadata.reasoning_effective to understand what was actually sent.
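One way to act on these notes is to treat errors as diagnostics rather than failures, so that a best-effort answer is kept while the problems are surfaced. The sketch below assumes nothing about the shape of each diagnostic or of reasoning_effective and simply logs them verbatim; the function and logger names are hypothetical.

```python
import logging
from typing import Any, Dict, List, Tuple

logger = logging.getLogger("rag_client")  # hypothetical logger name


def split_answer_and_errors(response: Dict[str, Any]) -> Tuple[Any, List[Any]]:
    """Return (answer, diagnostics) without discarding a best-effort answer."""
    errors = response.get("errors") or []
    for diag in errors:
        # The diagnostic structure is not specified here, so log it as-is.
        logger.warning("RAG diagnostic: %r", diag)
    effective = (response.get("response_metadata") or {}).get("reasoning_effective")
    if effective is not None:
        # Records what reasoning setting was actually sent downstream.
        logger.info("reasoning_effective: %r", effective)
    return response.get("answer"), list(errors)
```

A caller can then show the answer to the user and attach the diagnostics list to its own telemetry, mirroring how the UI merges errors into the Answer card.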

Next steps

For more details, explore:

  • rag_api_core/endpoints/v2/flexible_rag.py
  • rag_api_core/endpoints/v2/flexible_multimodal.py
  • rag_api_core/templates/v2/unified_rag_test.html
  • rag_api_core/schemas/v2/requests.py and responses.py
  • Feedback plumbing under telemetry/feedback/