
Link Ingestion

This page describes how web links (URLs) are ingested and processed by the data pipelines.

Supported Features

Fetches content from web pages via HTTP
Extracts main text, metadata, and relevant sections
Handles HTML, PDF, and other common formats
Supports scheduled or manual crawling
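As an illustration of how several formats might be handled, the sketch below picks an extractor from a response's Content-Type header. The function name pick_extractor and the labels it returns are hypothetical and are not part of the pipeline's documented API; the actual pipeline may detect formats differently.

```python
def pick_extractor(content_type: str) -> str:
    """Map a Content-Type header value to a (hypothetical) extractor label."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type in ("text/html", "application/xhtml+xml"):
        return "html"
    if media_type == "application/pdf":
        return "pdf"
    if media_type.startswith("text/"):
        return "plain_text"
    # Anything else is left for the caller to skip or log.
    return "unsupported"
```

Dispatching on the media type rather than the URL suffix is one reasonable choice here, since many URLs carry no file extension.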

Typical Workflow

1. A link (URL) is added manually or discovered during a crawl.
2. The pipeline fetches the page and extracts its content.
3. The extracted content is chunked and indexed into the knowledge base.
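The workflow above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the function names (fetch, extract_text, chunk_words, ingest), the 200-word chunk size, and the plain dictionary standing in for the knowledge base are all hypothetical, and only HTML input is handled.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def fetch(url, timeout=10.0):
    """Step 2a: download the page and decode it to text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_text(html):
    """Step 2b: reduce HTML to its visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text, max_words=200):
    """Step 3a: split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def ingest(url, index):
    """Run fetch, extract, chunk, then store the chunks under the URL."""
    chunks = chunk_words(extract_text(fetch(url)))
    index[url] = chunks  # Step 3b: a dict stands in for the knowledge base
    return len(chunks)
```

A call such as ingest("https://example.com/", {}) would exercise the whole path. Fixed-size word chunks are the simplest option; a real deployment might instead split on headings or sentences.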
