Link Ingestion
This page describes how the data pipelines ingest and process web links (URLs).
Supported Features
- Fetches content from web pages via HTTP
- Extracts main text, metadata, and relevant sections
- Handles HTML, PDF, and other common formats
- Supports scheduled or manual crawling
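To make the text and metadata extraction step concrete, here is a minimal sketch for the HTML case only, using Python's standard-library `html.parser`. The names `TextAndTitleExtractor` and `extract_html` are hypothetical and chosen for illustration. The pipeline's actual extractor is not documented here and may work differently, for example by using a dedicated readability or PDF library.

```python
from html.parser import HTMLParser


class TextAndTitleExtractor(HTMLParser):
    """Collects the <title> and visible text, skipping script/style content.

    Hypothetical helper for illustration; not the pipeline's real extractor.
    """

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self._skip_depth = 0      # > 0 while inside a skipped tag
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return                # ignore script/style bodies
        stripped = data.strip()
        if not stripped:
            return
        if self._in_title:
            self.title += stripped
        else:
            self.text_parts.append(stripped)


def extract_html(html: str) -> dict:
    """Return the page title (metadata) and the visible text of an HTML document."""
    parser = TextAndTitleExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "text": " ".join(parser.text_parts)}
```

Calling `extract_html(page_source)` returns a dictionary with a `title` and a `text` key. Other formats such as PDF would need their own extractor, selected for instance by the response's `Content-Type` header.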
Typical Workflow
- A link/URL is added manually or discovered by a crawl
- The pipeline fetches the page and extracts its content
- The extracted content is chunked and indexed into the knowledge base
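The workflow above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the pipeline's implementation: the function names (`fetch`, `chunk_text`, `ingest`), the word-window chunking strategy, the default chunk sizes, and the plain dictionary standing in for the knowledge base are all hypothetical.

```python
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL over HTTP and decode the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def chunk_text(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words.

    Word-window chunking is an assumption; the real pipeline may chunk by
    tokens, sentences, or document sections instead.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def ingest(url, index, fetcher=fetch, extractor=lambda body: body):
    """Fetch a URL, extract its text, chunk it, and store the chunks by URL.

    `index` is a plain dict standing in for the knowledge base. `fetcher` and
    `extractor` are injectable so the steps can be swapped or tested offline.
    """
    text = extractor(fetcher(url))
    chunks = chunk_text(text)
    index[url] = chunks
    return len(chunks)
```

A manual run would look like `ingest("https://example.com/page", index={})`. A scheduled crawl could call the same function from a job scheduler for each known link. Keeping `fetcher` and `extractor` as parameters lets format-specific extractors be plugged in without changing the chunking or indexing steps.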