The Outbound Guard: Preventing AI Agents from Leaking Sensitive Data

Here’s a scenario that keeps me up at night: you give an AI agent the ability to send email, it gets a cleverly worded prompt injection in an incoming message, and it replies with your AWS credentials, your home address, or the contents of your .env file.

This is why I built outbound-guard.ts in AgenticMail. Every outgoing email passes through this module before it leaves the server. If the message contains something that should never be in an email, the guard blocks it.

Five categories of detection

The outbound guard runs 30+ detection rules organized into five categories, each with its own severity level:

PII Detection catches social security numbers, credit card numbers (with Luhn validation), passport numbers, and date of birth patterns. These are things an agent should never include in an outgoing email, period. An SSN match is an instant block.
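
To make the Luhn step concrete, here is a minimal sketch of a checksum check that a credit card rule could run on a candidate digit string before flagging it. The name passesLuhn and the 13 to 19 digit length bounds are my assumptions for illustration, not the guard's actual code.

```typescript
// Hypothetical helper: true when `candidate` is 13-19 digits (spaces and
// dashes ignored) and passes the Luhn checksum.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, '');
  if (!/^\d{13,19}$/.test(digits)) return false;

  let sum = 0;
  let doubleNext = false; // every second digit from the right gets doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9; // same as adding the two digits of the product
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}
```

Running the checksum after the regex match is what keeps order numbers and tracking IDs that merely look like card numbers from triggering the rule.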

Credential Detection looks for API keys, bearer tokens, JWTs, private keys (RSA, EC, PGP), database connection strings, and password patterns. The regex patterns here are deliberately broad. I’d rather have a false positive on “my password is ‘correct horse battery staple’” than let a real AWS secret key slip through.
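
As an illustration of what deliberately broad patterns can look like, here is a small sketch of a credential rule table. The TextRule shape mirrors the fields the scanner reads from each rule in the source excerpt further down; the rule IDs, the regexes, and the severities assigned to the JWT and password rules are assumptions of mine, not the guard's real rule set.

```typescript
// Hypothetical rule shape, modeled on the fields the scanner reads from each rule.
interface TextRule {
  id: string;
  category: string;
  severity: 'critical' | 'high' | 'medium' | 'low';
  description: string;
  // Returns the matched text, or null when the rule does not fire.
  test(text: string): string | null;
}

// Wrap a non-global regex as a `test` function.
const fromRegex = (re: RegExp) => (text: string): string | null => {
  const m = re.exec(text);
  return m ? m[0] : null;
};

// Illustrative patterns only; a real rule set would be larger.
const CREDENTIAL_RULES: TextRule[] = [
  { id: 'aws-access-key-id', category: 'credential', severity: 'high',
    description: 'AWS access key ID',
    test: fromRegex(/\bAKIA[0-9A-Z]{16}\b/) },
  { id: 'private-key-header', category: 'credential', severity: 'critical',
    description: 'PEM or PGP private key header',
    test: fromRegex(/-----BEGIN (?:RSA |EC |PGP )?PRIVATE KEY(?: BLOCK)?-----/) },
  { id: 'jwt', category: 'credential', severity: 'high',
    description: 'JSON Web Token',
    test: fromRegex(/\beyJ[\w-]+\.[\w-]+\.[\w-]+/) },
  { id: 'password-phrase', category: 'credential', severity: 'medium',
    description: 'Password stated in prose or as key=value',
    test: fromRegex(/\bpassword\s*(?:is|[:=])\s*\S+/i) },
];

// Returns the IDs of every credential rule that fires on `text`.
function matchedCredentialRules(text: string): string[] {
  return CREDENTIAL_RULES.filter(rule => rule.test(text) !== null).map(rule => rule.id);
}
```

The password rule is the broad one: it fires on the passphrase example above even though no real secret is involved, which is the trade-off I'm accepting.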

System Internals catches file paths, environment variables, IP addresses with ports, stack traces, and SQL queries. If an agent is dumping /home/user/.ssh/id_rsa or DATABASE_URL=postgres://... into an email body, something has gone very wrong.
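
A rough sketch of how a few of these system-internals checks might be expressed as patterns follows. The labels and regexes are hypothetical and much narrower than a production rule set would be.

```typescript
// Hypothetical patterns for a handful of system-internals signals.
const SYSTEM_INTERNAL_PATTERNS: Array<[string, RegExp]> = [
  // Files under a user's .ssh directory, e.g. /home/user/.ssh/id_rsa
  ['ssh-path', /(?:\/home\/[^\s/]+|~)\/\.ssh\/\S+/],
  // Upper-case NAME=value pairs, e.g. DATABASE_URL=postgres://...
  ['env-assignment', /\b[A-Z][A-Z0-9_]{2,}=\S+/],
  // IPv4 address followed by a port, e.g. 10.0.0.5:5432
  ['ip-with-port', /\b(?:\d{1,3}\.){3}\d{1,3}:\d{2,5}\b/],
  // A JavaScript-style stack frame: "at fn (file:line:col)"
  ['stack-trace', /^\s*at .+\(.+:\d+:\d+\)/m],
];

// Returns the label of every pattern that matches `text`.
function systemInternalsIn(text: string): string[] {
  return SYSTEM_INTERNAL_PATTERNS
    .filter(([, pattern]) => pattern.test(text))
    .map(([label]) => label);
}
```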

Owner Privacy protects personal information about the agent’s owner. Home addresses, phone numbers, financial details, and medical information all get flagged. The patterns here are tuned to catch the kinds of information that a social engineering attack might try to extract.

Attachment Risk examines attachment filenames and flags executable files, scripts, and archives that might contain sensitive data. An agent shouldn’t be sending .exe files or .sql dumps through email.
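
The filename check can be sketched as a simple extension lookup. The extension list and the attachmentRisk name below are assumptions for illustration; the real rule set may cover more types or inspect more than the final extension.

```typescript
// Hypothetical mapping from file extension to the reason it is risky.
const RISKY_EXTENSIONS = new Map<string, string>([
  ['exe', 'executable'],
  ['bat', 'script'],
  ['sh', 'script'],
  ['ps1', 'script'],
  ['sql', 'database dump'],
  ['zip', 'archive'],
  ['tar', 'archive'],
]);

// Returns the risk label for an attachment filename, or null if it looks safe.
function attachmentRisk(filename: string): string | null {
  const name = filename.trim().toLowerCase();
  const dot = name.lastIndexOf('.');
  if (dot < 0) return null; // no extension at all
  // Only the final extension counts, so "report.pdf.exe" is still flagged.
  return RISKY_EXTENSIONS.get(name.slice(dot + 1)) ?? null;
}
```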

HTML stripping with concatenation

One of the trickier problems is evasion through HTML formatting. Imagine an email that looks like this in raw HTML:

<span>AKIA</span><span style="display:none">garbage</span><span>IOSFODNN7EXAMPLE</span>

A naive scanner that just strips HTML tags would see AKIAgarbageIOSFODNN7EXAMPLE and miss the AWS key. A scanner that only checks visible text might also miss it depending on how it handles hidden elements.

The outbound guard removes hidden elements first, then strips the remaining HTML tags and concatenates the text that is left. Detection runs on both the original HTML and the stripped version, so the stripped version correctly yields AKIAIOSFODNN7EXAMPLE and triggers the credential detection rule.
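
Here is a minimal sketch of that order of operations applied to the example above. It is a regex-based stand-in that I'm calling stripHtmlForScan, not the guard's own stripHtmlTags, and it only recognizes inline display:none styles on elements that do not nest a tag of the same name. A production version would more likely use a real HTML parser.

```typescript
// Hypothetical sketch: drop elements hidden via inline `display:none`,
// then strip every remaining tag so adjacent text fragments join up.
function stripHtmlForScan(html: string): string {
  const hiddenElement =
    /<([a-z][a-z0-9]*)\b[^>]*\bstyle\s*=\s*["'][^"']*display\s*:\s*none[^"']*["'][^>]*>[\s\S]*?<\/\1>/gi;
  const withoutHidden = html.replace(hiddenElement, '');
  return withoutHidden.replace(/<[^>]*>/g, '');
}

const evasive =
  '<span>AKIA</span><span style="display:none">garbage</span><span>IOSFODNN7EXAMPLE</span>';

// Tag stripping alone keeps the hidden filler and breaks up the key.
const naive = evasive.replace(/<[^>]*>/g, '');
// Removing hidden elements first reassembles the key.
const cleaned = stripHtmlForScan(evasive);
```

With this input, naive is AKIAgarbageIOSFODNN7EXAMPLE and cleaned is AKIAIOSFODNN7EXAMPLE, which is the string a credential rule needs to see.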

Severity levels and scoring

Not every detection carries the same weight. The guard assigns severity levels:

Critical: Instant block. SSNs, private keys, database connection strings.
High: Blocked unless the cumulative score stays under the blocking threshold. API keys, credit card numbers.
Medium: Contributes to the cumulative score. IP addresses, file paths.
Low: Logged but rarely blocked alone. Generic patterns that might be false positives.

The total score across all detections determines the final verdict: clean, suspicious, or blocked.
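
One way to picture how severities roll up into a verdict is the sketch below. The weights and both thresholds are numbers I made up for illustration; the only parts taken from the description above are that critical findings block outright, that the other severities add to a cumulative score, and that the result is one of three verdicts.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low';
type Verdict = 'clean' | 'suspicious' | 'blocked';

// Hypothetical weights and thresholds, chosen only to make the example work.
const WEIGHTS: Record<Severity, number> = { critical: 100, high: 40, medium: 15, low: 5 };
const BLOCK_THRESHOLD = 50;
const SUSPICIOUS_THRESHOLD = 10;

// Turns the severities of all detections on one email into a verdict.
function verdictFor(severities: Severity[]): Verdict {
  // A single critical finding blocks regardless of score.
  if (severities.some(s => s === 'critical')) return 'blocked';
  const score = severities.reduce((sum, s) => sum + WEIGHTS[s], 0);
  if (score >= BLOCK_THRESHOLD) return 'blocked';
  if (score >= SUSPICIOUS_THRESHOLD) return 'suspicious';
  return 'clean';
}
```

Under these assumed numbers, a lone high-severity match is only suspicious, a high plus a medium crosses the blocking threshold, and a single low-severity match stays clean.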

Internal email bypass

There’s one important exception. Emails sent between agents on the same AgenticMail instance bypass the outbound guard entirely. If Agent A needs to pass a database URL to Agent B as part of a task, that’s a legitimate internal operation. The guard only activates for messages leaving the system.

This distinction matters because inter-agent communication often legitimately includes the kind of data that would be dangerous in an external email. Configuration values, credentials for shared services, and system paths are all normal in internal coordination but catastrophic if leaked externally.

Why this matters

The outbound guard is the last line of defense. The spam filter catches malicious inbound messages. The sanitizer strips dangerous content. But if both of those fail and a prompt injection convinces the agent to exfiltrate data, the outbound guard is what prevents the actual leak.

Source Code

The scanOutboundEmail function is the entry point for all outbound scanning. It first checks whether the recipients are all internal (same instance), and if so, skips scanning entirely. For external recipients, it strips the HTML, joins the subject, plain-text body, and stripped HTML into one string, and runs every detection rule against that combined body. In this excerpt, any high-severity warning blocks the message.

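export function scanOutboundEmail(input: OutboundScanInput): OutboundScanResult {
  // Normalize `to` so a single address and a list are handled the same way.
  const recipients = Array.isArray(input.to) ? input.to : [input.to];

  // Internal bypass: skip scanning only when every recipient is on this instance.
  const allInternal = recipients.every(r => {
    const domain = r.split('@').pop()?.toLowerCase();
    return domain === 'localhost';
  });
  if (allInternal) {
    return { warnings: [], hasHighSeverity: false, blocked: false, summary: '' };
  }

  const warnings: OutboundWarning[] = [];
  // Scan the subject, the plain-text body, and the tag-stripped HTML as one string.
  const strippedHtml = input.html ? stripHtmlTags(input.html) : '';
  const combined = [input.subject ?? '', input.text ?? '', strippedHtml].join('\n');

  for (const rule of OUTBOUND_TEXT_RULES) {
    const match = rule.test(combined);
    if (match) {
      warnings.push({
        category: rule.category, severity: rule.severity,
        ruleId: rule.id, description: rule.description,
        // Cap the stored match at 80 characters.
        match: match.length > 80 ? match.slice(0, 80) + '...' : match,
      });
    }
  }
  const hasHigh = warnings.some(w => w.severity === 'high');
  // `summary` was not defined in this excerpt; this format is a placeholder assumption.
  const summary = warnings.length > 0
    ? `${warnings.length} outbound warning(s): ${warnings.map(w => w.ruleId).join(', ')}`
    : '';
  return { warnings, hasHighSeverity: hasHigh, blocked: hasHigh, summary };
}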

View the full source on GitHub

I sleep a little better knowing it’s there.