Skip to content
cd ..

Full Text Search in SQLite: Using FTS5 for Agent Email

// · 5 min read

When your AI agents are processing hundreds of emails, they need to search through them fast. Not just by sender or date, but by content. “Find all emails where someone mentioned the Q3 budget proposal.” That’s a full text search problem, and SQLite’s FTS5 extension handles it remarkably well.

The email_search Virtual Table

AgenticMail creates an FTS5 virtual table called email_search that indexes the fields agents most commonly query against:

The table definition uses FTS5’s content option to make it a content table backed by the main emails table. This means we don’t store the text twice. The FTS5 index points back to the original rows, keeping the database size reasonable even with thousands of emails per agent.

Building the index is straightforward. Every time an email is ingested (whether via IMAP fetch or direct API push), a trigger populates the corresponding FTS5 row. Deletions and updates are handled through FTS5’s delete command to keep the index in sync.

Why FTS5 Over LIKE Queries

The naive approach to email search is WHERE body LIKE '%budget%'. This works fine with 50 emails. It falls apart at 5,000. The LIKE operator does a full table scan on every query, and with email bodies that can be tens of kilobytes each, that gets slow fast.

FTS5 builds an inverted index. Instead of scanning every row, it looks up the search term in its index and jumps directly to the matching rows. The performance difference is dramatic. Searches that took 200ms with LIKE complete in under 5ms with FTS5 on the same dataset. For an agent that might search its mailbox multiple times per conversation turn, that difference adds up.

FTS5 also gives you ranking out of the box. The bm25() function scores results by relevance, so when an agent searches for “project update deadline,” the emails that mention all three terms prominently come back first. Agents make better decisions when the most relevant emails are at the top.

The Query Sanitization Problem

Here’s the part that bit me. FTS5 has its own query syntax. Users (or in this case, agents) can type things like subject:meeting AND body:friday and FTS5 will parse that as a structured query. Sounds useful, right?

The problem is that agents don’t know FTS5 syntax, and they’ll inevitably generate queries that break the parser. An agent might search for something like re: follow up (urgent) and FTS5 will choke on the parentheses and colon because those are query operators. Even worse, a malicious input could exploit the syntax to craft queries that return unexpected results or cause errors.

The fix is deceptively simple: wrap the entire query in double quotes. When FTS5 sees a double quoted string, it treats the contents as a literal phrase search. No operator parsing, no syntax interpretation. The query "re: follow up (urgent)" searches for that exact phrase, colons and parentheses and all.

But you have to handle one more edge case: what if the query itself contains double quotes? The sanitization function escapes any existing double quotes by doubling them (FTS5’s escape convention), then wraps the whole thing. This prevents what is essentially FTS5 syntax injection, which is a real category of bugs that doesn’t get enough attention.

The sanitization function is small:

function sanitizeFtsQuery(raw: string): string {
  const escaped = raw.replace(/"/g, '""');
  return `"${escaped}"`;
}

Every search query passes through this before hitting FTS5. It’s not fancy, but it means agents can throw whatever natural language search terms they want at the system and it won’t break.

Scoping by Agent

Since multiple agents share the same database, every FTS5 query includes an agent_id filter. This ensures Agent A can’t search through Agent B’s emails. The filter happens at the SQL level:

SELECT * FROM email_search
WHERE email_search MATCH ? AND agent_id = ?
ORDER BY bm25(email_search)

The agent_id column being part of the FTS5 index means this filter is efficient. FTS5 can use the index to satisfy both the text match and the agent scope in a single pass.

The Pragmatic Choice

I looked at Elasticsearch and Meilisearch before settling on FTS5. Both are excellent search engines, but they’re external dependencies that need their own processes, configuration, and monitoring. FTS5 lives inside SQLite, which AgenticMail already depends on. No extra services, no network hops, no operational overhead. For the scale AgenticMail operates at (thousands to tens of thousands of emails per agent, not millions), FTS5 is more than sufficient, and the simplicity of keeping everything in one database file is a real operational win.

Source Code

The EmailSearchIndex class wraps FTS5 queries with agent scoping, sanitization, and ranked results in a single method call:

export class EmailSearchIndex {
  search(agentId: string, query: string, limit = 20): SearchableEmail[] {
    const sanitized = '"' + query.replace(/"/g, '""') + '"';
    const stmt = this.db.prepare(`
      SELECT agent_id as agentId, message_id as messageId, subject,
             from_address as fromAddress, to_address as toAddress,
             body_text as bodyText, received_at as receivedAt
      FROM email_search
      WHERE agent_id = ? AND email_search MATCH ?
      ORDER BY rank LIMIT ?
    `);
    return stmt.all(agentId, sanitized, limit) as SearchableEmail[];
  }
}

View the full source on GitHub

// share

// subscribe

New posts and updates straight to your inbox. No noise.

cd ..