
Knowledge Base and RAG for Enterprise AI Agents


RAG (Retrieval Augmented Generation) is well understood at this point. Chunk your documents, embed them, retrieve relevant chunks at query time, inject them into the LLM’s context. What’s less understood is how RAG changes when your consumers aren’t humans typing questions into a search bar but AI agents operating autonomously, making decisions and taking actions based on the knowledge they retrieve.

I built the Knowledge Base in AgenticMail Enterprise to handle that shift.

Document Ingestion and Chunking

The system ingests PDFs, Word documents, plain text, Markdown, HTML, and structured data formats like CSV and JSON. Each document goes through a chunking pipeline that splits content into semantically coherent segments. The chunker respects document structure: headings, paragraphs, list items, and table boundaries. It doesn’t blindly split at a character count.

Chunk sizes are configurable, but the defaults are tuned for agent consumption rather than human Q&A. Agents need more context per retrieval than a human who can infer missing details, so the default chunk size is larger with generous overlap to preserve continuity across boundaries.
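To make the structure-respecting split concrete, here is a minimal sketch of the idea: segment at headings and paragraph boundaries rather than raw character counts, then pack segments toward a target size with overlap. The function names, the 4-characters-per-token heuristic, and the parameter values are illustrative, not the shipped pipeline.

```typescript
// Sketch of structure-aware chunking. Splits on headings and blank-line
// paragraph breaks, then packs segments into chunks near a target token
// count, carrying a small tail forward as overlap.
interface Chunk { text: string; }

function approxTokens(s: string): number {
  // Rough heuristic: ~4 characters per token.
  return Math.ceil(s.length / 4);
}

function chunkByStructure(doc: string, target = 512, overlap = 64): Chunk[] {
  // Segment at heading starts and paragraph breaks, not character offsets.
  const segments = doc
    .split(/\n(?=#)|\n\s*\n/)
    .map(s => s.trim())
    .filter(Boolean);

  const chunks: Chunk[] = [];
  let current: string[] = [];
  let size = 0;

  for (const seg of segments) {
    const segTokens = approxTokens(seg);
    if (size + segTokens > target && current.length > 0) {
      chunks.push({ text: current.join("\n\n") });
      // Carry the last segment forward as overlap if it is small enough.
      const tail = current[current.length - 1];
      current = approxTokens(tail) <= overlap ? [tail] : [];
      size = current.reduce((n, s) => n + approxTokens(s), 0);
    }
    current.push(seg);
    size += segTokens;
  }
  if (current.length > 0) chunks.push({ text: current.join("\n\n") });
  return chunks;
}
```

The key property is that a chunk boundary always falls on a structural boundary, so no chunk starts mid-sentence or mid-list-item.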

Bulk import handles the initial migration. Point it at a folder, a SharePoint library, or a Confluence space, and it ingests everything. Incremental updates happen automatically as source documents change.

I went with BM25F for retrieval instead of pure vector search. BM25F is a field-aware variant of BM25 that lets you weight different fields differently. A match in a document title is worth more than a match in the body text. A match in a section heading is worth more than a match in a footnote.

The reason for choosing BM25F over vector embeddings as the primary method is precision. When an agent needs the company’s refund policy, it’s searching for specific terms and phrases. Vector search is great for semantic similarity, but it can return tangentially related content that wastes context window space. BM25F returns documents that actually contain the terms the agent needs.

That said, the system supports hybrid retrieval. You can enable vector search alongside BM25F and configure the weighting. For most enterprise use cases, BM25F alone outperforms pure vector search because the documents are well structured and the queries are specific.
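The field weighting can be sketched as follows: term frequencies from each field are scaled by a per-field weight before the usual BM25 saturation is applied, so a title hit outranks the same hit in body text. The weights, field names, and `k1` value here are examples for illustration, not the system's actual defaults, and IDF and length normalization are omitted for brevity.

```typescript
// Illustrative BM25F-style scoring with per-field weights.
type Fields = { title: string; headings: string; body: string };

const FIELD_WEIGHTS: Record<keyof Fields, number> = {
  title: 3.0,     // A title match counts triple...
  headings: 2.0,  // ...a heading match double...
  body: 1.0,      // ...relative to a body match.
};

function termFreq(text: string, term: string): number {
  return text.toLowerCase().split(/\W+/).filter(t => t === term.toLowerCase()).length;
}

function bm25fScore(doc: Fields, query: string[], k1 = 1.2): number {
  let score = 0;
  for (const term of query) {
    // Weighted pseudo-frequency: field hits scaled by field weight.
    let tf = 0;
    for (const field of Object.keys(FIELD_WEIGHTS) as (keyof Fields)[]) {
      tf += FIELD_WEIGHTS[field] * termFreq(doc[field], term);
    }
    // BM25 term saturation (IDF and length normalization omitted).
    score += tf === 0 ? 0 : ((k1 + 1) * tf) / (k1 + tf);
  }
  return score;
}
```

With these weights, a document titled "Refund Policy" beats a document that only mentions "refund" somewhere in its body, which is exactly the precision behavior an agent searching for a specific policy needs.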

Automatic Context Injection

This is where the agent-specific design matters most. When an agent is about to take an action or formulate a response, the system automatically queries the knowledge base based on the current conversation context and injects relevant chunks into the agent’s prompt.

The injection is smart about token budgets. It ranks retrieved chunks by relevance and fills the available context window without exceeding the limit. If the agent has already used 60% of its context on conversation history, the knowledge injection adapts to fit within the remaining 40%.

Agents don’t have to explicitly search the knowledge base. They just operate with the right information already available to them. This is a critical design choice. An agent that has to decide when to search and what to search for will sometimes fail to search when it should. Automatic injection removes that failure mode.
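The budget-aware packing described above amounts to a greedy fill: rank retrieved chunks by relevance, then take chunks in score order, skipping any that would exceed whatever context remains after conversation history. This is a simplified sketch with hypothetical names, not the actual injection code.

```typescript
// Greedy, budget-aware chunk packing for prompt injection.
interface ScoredChunk { text: string; score: number; tokens: number; }

function packChunks(chunks: ScoredChunk[], contextLimit: number, usedTokens: number): string[] {
  let remaining = contextLimit - usedTokens;  // e.g. the 40% left after history
  const selected: string[] = [];
  // Highest-relevance first; skip anything that would blow the budget.
  for (const c of [...chunks].sort((a, b) => b.score - a.score)) {
    if (c.tokens <= remaining) {
      selected.push(c.text);
      remaining -= c.tokens;
    }
  }
  return selected;
}
```

Note that skipping an oversized chunk and continuing (rather than stopping) lets a smaller but still-relevant chunk fill the remaining space.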

Per-Agent Access Control

Not every agent should see every document. The knowledge base supports access control lists at the document and collection level. An agent handling customer support gets access to the product documentation and FAQ collection. An agent handling internal HR inquiries gets access to the employee handbook and policy documents. Neither can see the other’s corpus.

This maps to the principle of least privilege. The access control system integrates with the same role and permission model used for tool access and action approval.
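A minimal sketch of collection-level ACLs, filtering retrieved chunks before they ever reach ranking or the prompt. The role and collection names are illustrative; in the real system this plugs into the shared role and permission model mentioned above.

```typescript
// Per-agent collection ACLs: each role maps to the collections it may read.
const COLLECTION_ACL: Record<string, Set<string>> = {
  "support-agent": new Set(["product-docs", "faq"]),
  "hr-agent": new Set(["employee-handbook", "hr-policies"]),
};

function canRead(agentRole: string, collection: string): boolean {
  return COLLECTION_ACL[agentRole]?.has(collection) ?? false;
}

function filterByAcl<T extends { collection: string }>(agentRole: string, chunks: T[]): T[] {
  // Enforce least privilege before chunks reach ranking or the prompt.
  return chunks.filter(c => canRead(agentRole, c.collection));
}
```

Filtering at retrieval time, rather than trusting the agent to ignore out-of-scope results, is what makes the "neither can see the other’s corpus" guarantee hold.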

Agent-Contributed Knowledge

Here’s something that doesn’t exist in traditional RAG systems: agents can write back to the knowledge base. When an agent resolves a novel issue, it can contribute the resolution as a new knowledge article. When it discovers that a particular approach works better than what’s documented, it can flag the article for review or submit an update.

This creates a feedback loop. The knowledge base improves over time because the agents using it are also contributing to it. Human reviewers moderate the contributions, but the raw material comes from actual operations.

RAG for Agents Is Different

The fundamental shift is that human RAG is about answering questions. Agent RAG is about informing actions. When an agent retrieves your company’s escalation policy, it’s not summarizing it for someone to read. It’s using it to decide whether to escalate the current case. The stakes are higher because the retrieval quality directly affects the action quality.

That’s why precision matters more than recall, why automatic injection beats manual search, and why access control is non-negotiable. The knowledge base isn’t a nice-to-have chatbot feature. It’s the foundation that determines whether your agents make good decisions or bad ones.

Source Code

The KBConfig interface controls chunking, embedding, retrieval thresholds, and automatic URL refresh for the knowledge base:

export interface KBConfig {
  chunkSize: number;              // Target tokens per chunk (default: 512)
  chunkOverlap: number;           // Overlap tokens between chunks
  embeddingModel: string;         // e.g. "text-embedding-3-small"
  embeddingProvider: 'openai' | 'local' | 'none';
  maxResultsPerQuery: number;     // Default: 5
  minSimilarityScore: number;     // Default: 0.7
  autoRefreshUrls: boolean;
  refreshIntervalHours: number;   // Default: 24
}
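A concrete configuration matching the defaults noted in the interface comments might look like the following. The `chunkOverlap` value and the boolean choices are assumptions for illustration; the other values mirror the documented defaults.

```typescript
// Example config object matching the KBConfig shape above.
const defaultConfig = {
  chunkSize: 512,               // Documented default
  chunkOverlap: 64,             // Assumed value; not specified in the interface
  embeddingModel: "text-embedding-3-small",
  embeddingProvider: "openai" as const,
  maxResultsPerQuery: 5,        // Documented default
  minSimilarityScore: 0.7,      // Documented default
  autoRefreshUrls: true,        // Assumed
  refreshIntervalHours: 24,     // Documented default
};
```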

View the full source on GitHub
