Teaching AI Agents to Read SMS and Verification Codes

There’s a gap in every AI agent’s capability set: SMS. An agent can browse the web, write code, send emails, but when a service says “we just sent you a verification code,” the agent hits a wall. It can’t read text messages.

Unless it has AgenticMail’s SmsManager.

The Google Voice bridge

Google Voice can forward incoming SMS messages to your Gmail inbox as email once message forwarding is enabled in its settings. This is the bridge that makes everything work. Instead of building a direct SMS integration (which would require an account with a provider such as Twilio, phone number provisioning, and a whole telephony stack), AgenticMail treats SMS as a special case of email.

The SmsManager class orchestrates the whole flow, and SmsPoller handles the mechanics of watching for new messages.
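Treating SMS as email means the first job is telling forwarded texts apart from ordinary mail. The sketch below is one way to do that and is not taken from the AgenticMail source: InboxEmail, bareAddress, and isForwardedSms are hypothetical names, and the sender domain is an assumption you should check against the notifications in your own inbox.

```typescript
// Hypothetical shape of a message pulled from the inbox; not from the AgenticMail source.
interface InboxEmail {
  from: string; // raw From header, e.g. "Name <user@example.com>"
  body: string;
}

// Assumption: forwarded texts arrive from addresses under this domain.
const ASSUMED_VOICE_DOMAIN = "txt.voice.google.com";

// Pull the bare address out of a From header, with or without a display name.
function bareAddress(from: string): string {
  const angle = from.match(/<([^>]+)>/);
  return (angle?.[1] ?? from).trim().toLowerCase();
}

// True when the email looks like a forwarded SMS rather than ordinary mail.
function isForwardedSms(
  email: InboxEmail,
  domain: string = ASSUMED_VOICE_DOMAIN,
): boolean {
  return bareAddress(email.from).endsWith("@" + domain);
}
```

Keeping the domain as a parameter makes it easy to adjust if the notification address turns out to differ.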

Two polling modes

SmsPoller supports two modes of operation:

Continuous polling runs in a loop, checking the inbox at a configurable interval (default: every 10 seconds). This is what you use when an agent is actively waiting for a verification code. The poller watches for new emails from Google Voice’s notification address, parses them, and delivers the SMS content to the agent.

One shot polling waits for a single matching SMS and returns it. You call it when you’ve just triggered a verification code send and you know a message is coming. It polls until it finds the message or hits a timeout.

The two modes exist because the usage patterns are different. An agent that’s monitoring an ongoing SMS conversation needs continuous polling. An agent that just needs one OTP code needs the one shot mode so it can move on as soon as the code arrives.
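A minimal sketch of the two modes, assuming a hypothetical fetchNew callback that returns the SMS bodies that arrived since the previous check. The function names, the signatures, and the 120 second timeout are illustrative assumptions, not SmsPoller's actual API; only the 10 second default interval comes from the description above.

```typescript
// Resolves after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical inbox check: SMS bodies that arrived since the last call.
type FetchNewSms = () => Promise<string[]>;

// One shot mode: poll until a message satisfies `matches`, or give up at the timeout.
async function pollOnce(
  fetchNew: FetchNewSms,
  matches: (body: string) => boolean,
  intervalMs = 10000,
  timeoutMs = 120000, // assumed default, not from the source
): Promise<string | null> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const hit = (await fetchNew()).find(matches);
    if (hit !== undefined) return hit;
    // Stop if the next check would land past the deadline.
    if (Date.now() + intervalMs > deadline) return null;
    await sleep(intervalMs);
  }
}

// Continuous mode: deliver every new message until the returned stop function is called.
function startPolling(
  fetchNew: FetchNewSms,
  onSms: (body: string) => void,
  intervalMs = 10000,
): () => void {
  let running = true;
  const loop = async (): Promise<void> => {
    while (running) {
      try {
        for (const body of await fetchNew()) onSms(body);
      } catch {
        // Transient inbox error: try again on the next tick.
      }
      await sleep(intervalMs);
    }
  };
  void loop();
  return () => {
    running = false;
  };
}
```

The one shot variant returns as soon as a match appears, while the continuous variant keeps running until the caller stops it, which mirrors the two usage patterns described above.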

Parsing Google Voice email formats

This turned out to be the messiest part of the implementation. Google Voice has changed its email notification format multiple times over the years, and there’s no documentation for any of them. The parser handles multiple known formats:

Plain text notifications where the SMS body is directly in the email body, with a header line identifying the sender’s phone number.

HTML notifications with the SMS content wrapped in Google’s template markup. The parser extracts the message from the HTML structure, which varies depending on whether it’s a single message or part of a conversation thread.

Conversation style notifications where Google groups multiple messages from the same sender into a single email. The parser splits these into individual messages with their timestamps.

Each format requires its own extraction logic. I discovered most of these formats empirically by sending test messages to a Google Voice number and examining what arrived in Gmail. There’s no spec; you just have to handle what you see.
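One way to organize per-format extraction logic is a chain of small parsers, each returning null when the email is not in its format. The sketch below illustrates that structure only. The sample layout ("New text message from ...") is invented for the example and is not Google Voice's real notification format, and every name here is hypothetical.

```typescript
// Hypothetical normalized result of parsing one notification email.
interface ParsedSms {
  sender: string; // phone number as it appeared in the email
  text: string; // SMS body
}

// A format parser returns the messages it found, or null if the format does not apply.
type FormatParser = (emailBody: string) => ParsedSms[] | null;

// Invented plain text layout: a header line, a blank line, then the SMS body.
const parsePlainText: FormatParser = (emailBody) => {
  const m = emailBody.match(
    /^New text message from (\+?\d[\d ()-]*)\r?\n\r?\n([\s\S]+)$/,
  );
  if (!m) return null;
  const [, sender = "", text = ""] = m;
  return [{ sender: sender.trim(), text: text.trim() }];
};

// HTML variant: turn block boundaries into newlines, strip tags, reuse the text parser.
const parseHtml: FormatParser = (emailBody) => {
  if (!/<[a-z][\s\S]*>/i.test(emailBody)) return null;
  const text = emailBody
    .replace(/<br\s*\/?>|<\/(?:p|div)>/gi, "\n")
    .replace(/<[^>]+>/g, "")
    .trim();
  return parsePlainText(text);
};

// Try each parser in order; an empty array means no known format matched.
function parseNotification(
  emailBody: string,
  parsers: FormatParser[] = [parseHtml, parsePlainText],
): ParsedSms[] {
  for (const parse of parsers) {
    const result = parse(emailBody);
    if (result) return result;
  }
  return [];
}
```

Supporting a newly observed format then means adding one parser to the list rather than touching the existing ones.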

Seven OTP patterns

Once the SMS text is extracted, the next step is often pulling out a verification code. The extractVerificationCode() function supports seven distinct OTP patterns:

  1. Numeric codes with labels: “Your code is 123456” or “Verification code: 7890”
  2. Alphanumeric codes: “Your code is A1B2C3” for services that use mixed character codes
  3. Google G codes: The format Google uses for its own 2FA, which looks like “G-123456” with that distinctive G prefix
  4. Dash or space separated codes: “Your code is 123-456” or “Code: 12 34 56” where the digits are grouped with dashes or spaces
  5. Parenthetical codes: “(123456) is your verification code”
  6. Standalone numeric sequences: When the entire message is just a code with minimal surrounding text
  7. URL embedded codes: Verification links that contain the code as a query parameter

The function tries each pattern in order and returns the first match. The ordering matters because some patterns are more specific than others. You want to try “Your Google verification code is G-123456” before falling back to “any six digit number in the message.”
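To make the ordering concrete, here is a sketch of an ordered pattern list covering several of the shapes listed above, including grouped digits, alphanumeric codes, and URL parameters. The regexes are illustrative guesses rather than the ones AgenticMail ships, and extractCodeSketch is a hypothetical name chosen to avoid confusion with the real function.

```typescript
// Ordered from most specific to most generic; capture group 1 holds the code.
// These regexes are illustrative, not the ones from the AgenticMail source.
const OTP_PATTERNS: RegExp[] = [
  // Google style prefix: "G-123456"
  /\bG-(\d{4,8})\b/,
  // Code carried as a URL query parameter: "...?code=837261"
  /https?:\/\/\S*[?&](?:code|otp|token)=([A-Za-z0-9]{4,12})/i,
  // Labeled, digits grouped by spaces or dashes: "Code: 12 34 56"
  /(?:code|pin|otp)\s*(?:is|:)\s*(\d{2,4}(?:[ -]\d{2,4}){1,3})\b/i,
  // Labeled, numeric or alphanumeric; the lookahead requires at least one digit
  // so that "code is valid" is not mistaken for a code.
  /(?:code|pin|otp)\s*(?:is|:)\s*((?=[A-Z0-9]*\d)[A-Z0-9]{4,8})\b/i,
  // Parenthetical: "(123456) is your verification code"
  /\((\d{4,8})\)/,
  // A line that is nothing but digits
  /^\s*(\d{4,8})\s*$/m,
];

function extractCodeSketch(sms: string): string | null {
  for (const pattern of OTP_PATTERNS) {
    const raw = sms.match(pattern)?.[1];
    // Drop grouping characters so "123-456" comes back as "123456".
    if (raw) return raw.replace(/[ -]/g, "");
  }
  return null;
}

extractCodeSketch("Your Google verification code is G-123456"); // "123456"
extractCodeSketch("Code: 12 34 56"); // "123456"
extractCodeSketch("Your code is A1B2C3"); // "A1B2C3"
extractCodeSketch("Your code is valid for 10 minutes"); // null
```

The grouped pattern sits ahead of the plain labeled one because the labeled pattern would otherwise stop at the first group and return a truncated code.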

Putting it together

A typical flow looks like this: the agent navigates to a signup page, enters a phone number (the Google Voice number), submits the form, then calls SmsManager.waitForCode(). Under the hood, this starts a one shot poll, watches for a new Google Voice email, parses it, extracts the verification code, and returns it to the agent. The agent types the code into the form and continues.

The whole thing takes a few seconds. The agent never knows it’s going through Gmail. It just asks for the latest SMS code and gets it.
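From the agent's side, the flow might look like the sketch below. SignupForm and SmsCodeSource are hypothetical interfaces written for this example. In particular, the options object passed to waitForCode and its null result on timeout are assumptions, since the post does not show SmsManager.waitForCode()'s real signature.

```typescript
// Hypothetical browser automation surface; not part of AgenticMail.
interface SignupForm {
  enterPhone(phone: string): Promise<void>;
  submit(): Promise<void>;
  enterCode(code: string): Promise<void>;
}

// Stands in for SmsManager; the option shape and null-on-timeout are assumptions.
interface SmsCodeSource {
  waitForCode(opts: { timeoutMs: number }): Promise<string | null>;
}

// Runs the signup steps in order and reports whether a code was entered.
async function verifyPhone(
  form: SignupForm,
  sms: SmsCodeSource,
  voiceNumber: string,
  timeoutMs = 60000, // assumed default
): Promise<boolean> {
  await form.enterPhone(voiceNumber);
  await form.submit(); // the service sends its SMS at this point
  const code = await sms.waitForCode({ timeoutMs });
  if (code === null) return false; // nothing arrived in time
  await form.enterCode(code);
  return true;
}
```

The agent code never mentions email at all, which is the point of hiding the Gmail hop behind a single call.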

Source Code

The extractVerificationCode function is the heart of OTP extraction. It runs through a prioritized list of regex patterns, from the most specific (labeled codes with keywords) to the most generic (standalone six digit sequences). The first match wins, which is why ordering matters. The excerpt below shows four of the patterns.

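export function extractVerificationCode(smsBody: string): string | null {
  // Ordered from most specific to most generic; capture group 1 holds the code.
  const patterns = [
    // Labeled: "Your code is 123456", "PIN: 7890"
    /(?:code|pin|otp|token|password)\s*(?:is|:)\s*(\d{4,8})/i,
    // Code first: "123456 is your verification code"
    /(\d{4,8})\s+is\s+your\s+(?:code|pin|otp|verification)/i,
    // Google style prefix: "G-123456"
    /[Gg]-(\d{4,8})/,
    // A line containing nothing but six digits
    /^\s*(\d{6})\s*$/m,
  ];
  for (const pattern of patterns) {
    const match = smsBody.match(pattern);
    // Return the captured digits from the first pattern that matches.
    if (match?.[1]) return match[1];
  }
  return null;
}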

View the full source on GitHub

It’s a hack, honestly. A proper SMS integration would be cleaner. But it works with zero additional infrastructure, zero monthly fees, and zero phone number provisioning. For agents that occasionally need to handle verification codes, that tradeoff is worth it.
