Architecture
Cron Schedule
The mailbox ingest runs on the */15 * * * * cron schedule (every 15 minutes). It is a no-op when:
- MAILBOX_INGEST_ENABLED is not set to "true"
- The Microsoft Graph credentials are not configured (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET)
- GRAPH_SHARED_MAILBOX is not set
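The no-op conditions amount to a guard that runs before any Graph call. A minimal sketch, assuming the variables arrive as optional strings on an env object; the interface and function names here are hypothetical and not taken from the codebase:

```typescript
// Hypothetical env shape; only the variable names come from the documentation.
interface MailboxIngestEnv {
  MAILBOX_INGEST_ENABLED?: string;
  AZURE_TENANT_ID?: string;
  AZURE_CLIENT_ID?: string;
  AZURE_CLIENT_SECRET?: string;
  GRAPH_SHARED_MAILBOX?: string;
}

// Returns true only when none of the no-op conditions applies.
function shouldRunMailboxIngest(env: MailboxIngestEnv): boolean {
  // The flag must be the exact string "true".
  if (env.MAILBOX_INGEST_ENABLED !== "true") return false;

  // All three Graph credentials must be present and non-empty.
  const hasGraphCredentials =
    Boolean(env.AZURE_TENANT_ID) &&
    Boolean(env.AZURE_CLIENT_ID) &&
    Boolean(env.AZURE_CLIENT_SECRET);
  if (!hasGraphCredentials) return false;

  // Finally, a shared mailbox must be configured.
  return Boolean(env.GRAPH_SHARED_MAILBOX);
}
```

Returning early on the first failed condition keeps the cron invocation cheap when the feature is switched off.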
Configuration
The following environment variables control the email intake pipeline:
| Variable | Description |
|---|---|
| MAILBOX_INGEST_ENABLED | Set to "true" to enable polling |
| GRAPH_SHARED_MAILBOX | Email address of the shared mailbox (e.g., submissions@) |
| AZURE_TENANT_ID | Microsoft Entra (Azure AD) tenant ID |
| AZURE_CLIENT_ID | App registration client ID |
| AZURE_CLIENT_SECRET | App registration client secret |
| GRAPH_MAILBOX_DEFAULT_ORG_ID | Fallback org ID when no per-mailbox mapping exists |
The organization ID is resolved from the per-mailbox mapping (mailbox-org:{email}), with fallback to GRAPH_MAILBOX_DEFAULT_ORG_ID and then to the literal string "default".
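The fallback chain can be illustrated with a small helper. This is a sketch only: the function name is hypothetical, and the mapping store is modeled as an in-memory Map because this section does not say where the mailbox-org:{email} entries live.

```typescript
// Hypothetical helper; the mapping store is a Map purely for illustration.
function resolveOrgId(
  mailbox: string,
  mappings: Map<string, string>,
  defaultOrgId?: string,
): string {
  // 1. Per-mailbox mapping, keyed as mailbox-org:{email}.
  const mapped = mappings.get(`mailbox-org:${mailbox}`);
  if (mapped) return mapped;

  // 2. GRAPH_MAILBOX_DEFAULT_ORG_ID, when configured.
  if (defaultOrgId) return defaultOrgId;

  // 3. The literal string "default".
  return "default";
}
```

For example, `resolveOrgId("submissions@example.com", new Map(), undefined)` falls all the way through and returns `"default"`.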
Processing Pipeline
1. Fetch Unread Messages
The cron job uses @openinsure/notify’s createGraphMailboxReader to authenticate with Microsoft Graph and fetch up to 25 unread messages from the configured shared mailbox.
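The internals of createGraphMailboxReader are not shown here, so the following is only an assumption about what an equivalent Microsoft Graph request might look like. It builds a "list messages" URL for the shared mailbox's inbox, filtered to unread items and capped at 25; the function name and the choice of the inbox folder are illustrative.

```typescript
const GRAPH_BASE = "https://graph.microsoft.com/v1.0";

// Builds a Graph "list messages" URL for unread inbox items.
// Assumption: the reader targets the inbox folder; the real reader may differ.
function buildUnreadMessagesUrl(mailbox: string, limit = 25): string {
  const filter = encodeURIComponent("isRead eq false");
  return (
    `${GRAPH_BASE}/users/${encodeURIComponent(mailbox)}` +
    `/mailFolders/inbox/messages?$filter=${filter}&$top=${limit}`
  );
}
```

The request would be sent with a bearer token obtained through the app registration's client credentials; token acquisition is omitted from this sketch.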
2. Deduplication
Each message is deduplicated using its RFC 2822 internetMessageId (the Message-ID header), which is stable across folder moves. The Exchange internal message ID changes when messages are moved between folders, so the internet message ID is the reliable dedup key.
Dedup keys are stored in Cloudflare KV with a 30-day TTL.
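A minimal sketch of the dedup check, assuming a Workers KV binding with the standard get and put(key, value, { expirationTtl }) methods. The key prefix and function names are hypothetical, since the actual key format is not reproduced in this section.

```typescript
// Minimal subset of the Workers KV binding used by this sketch.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(
    key: string,
    value: string,
    options?: { expirationTtl?: number },
  ): Promise<void>;
}

// 30 days expressed in seconds, the unit expirationTtl expects.
const DEDUP_TTL_SECONDS = 30 * 24 * 60 * 60;

// Hypothetical key prefix; the real key format is an assumption.
function dedupKey(internetMessageId: string): string {
  return `mailbox-dedup:${internetMessageId.trim()}`;
}

// Returns true if the message is new (and records it), false if already seen.
async function claimMessage(
  kv: KVLike,
  internetMessageId: string,
): Promise<boolean> {
  const key = dedupKey(internetMessageId);
  if ((await kv.get(key)) !== null) return false;
  await kv.put(key, new Date().toISOString(), {
    expirationTtl: DEDUP_TTL_SECONDS,
  });
  return true;
}
```

The get-then-put sequence is not atomic, so two overlapping runs could in principle both claim the same message. With a single cron invocation every 15 minutes that window is small, but it is worth keeping in mind.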
3. Attachment Download
Attachments up to 25 MB are downloaded from Graph and uploaded to Cloudflare R2 under a structured path.
4. Email Body Extraction
HTML email bodies are stripped of tags and normalized to plain text. The system constructs a synthetic RFC 2822 message containing:
- From header (sender address)
- Subject header
- Date header (received timestamp)
- Message-ID header
- Plain text body
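The normalization step could be sketched as below. The regex-based tag stripping is a deliberately naive stand-in (the real implementation is not shown), and all function and field names are hypothetical.

```typescript
// Naive HTML-to-text conversion: drops script/style blocks, turns common
// block-level closers into newlines, strips remaining tags, and decodes a
// handful of entities. Not a full HTML parser.
function htmlToPlainText(html: string): string {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, "")
    .replace(/<br\s*\/?>|<\/(p|div|li|tr|h[1-6])>/gi, "\n")
    .replace(/<[^>]+>/g, "")
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&")
    .replace(/[ \t]+/g, " ")
    .replace(/ *\n */g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

interface GraphMessageLike {
  from: string;       // sender address
  subject: string;
  receivedAt: string; // ISO 8601 timestamp
  messageId: string;  // internetMessageId
  html: string;
}

// Collapse CR/LF so a header value cannot inject extra header lines.
function headerValue(value: string): string {
  return value.replace(/[\r\n]+/g, " ").trim();
}

// Builds the synthetic message: four headers, a blank line, then the body.
function buildSyntheticMessage(msg: GraphMessageLike): string {
  const headers = [
    `From: ${headerValue(msg.from)}`,
    `Subject: ${headerValue(msg.subject)}`,
    `Date: ${new Date(msg.receivedAt).toUTCString()}`,
    `Message-ID: ${headerValue(msg.messageId)}`,
  ];
  const body = htmlToPlainText(msg.html).replace(/\n/g, "\r\n");
  return headers.join("\r\n") + "\r\n\r\n" + body;
}
```

Collapsing line breaks in header values is a small safety measure: a subject containing a newline would otherwise be read as an additional header by whatever parses the synthetic message.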
5. EmailIntakeAgent Processing
The normalized email is POSTed to the EmailIntakeAgent Durable Object, which is instantiated per organization (intake-{orgId}). The agent performs two functions:
Heuristic field extraction parses the subject and body to identify:
| Field | Detection Method |
|---|---|
| Insured name | Pattern matching for “Named Insured:”, “Account:”, “Client:”, etc. |
| Line of business | Keyword matching against a map of insurance terms |
| Risk state | US state abbreviation detection |
The line-of-business keyword map includes the following phrases:
| Input Phrase | Mapped LOB |
|---|---|
| general liability, commercial general | GL |
| workers compensation, workers’ comp | WC |
| cyber liability, cyber risk | Cyber |
| directors and officers, D&O | D&O |
| errors and omissions, professional liability | E&O |
| medical stop loss, stop loss | MedStopLoss |
| commercial property, property insurance | Property |
| umbrella | Umbrella |
priority: "high".
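The heuristic extraction in this step could be approximated as follows. The phrase-to-LOB pairs come from the table above; everything else is an assumption for illustration: the function name, the exact regular expressions, and the label list (the source lists "Named Insured:", "Account:", "Client:" and then "etc.", so the real list is longer).

```typescript
// Phrase -> LOB pairs from the keyword table; the first matching phrase wins.
const LOB_KEYWORDS: ReadonlyArray<[string, string]> = [
  ["general liability", "GL"],
  ["commercial general", "GL"],
  ["workers compensation", "WC"],
  ["workers' comp", "WC"],
  ["cyber liability", "Cyber"],
  ["cyber risk", "Cyber"],
  ["directors and officers", "D&O"],
  ["d&o", "D&O"],
  ["errors and omissions", "E&O"],
  ["professional liability", "E&O"],
  ["medical stop loss", "MedStopLoss"],
  ["stop loss", "MedStopLoss"],
  ["commercial property", "Property"],
  ["property insurance", "Property"],
  ["umbrella", "Umbrella"],
];

// Assumed patterns; the real detection rules are not shown in this section.
const INSURED_PATTERN = /(?:Named Insured|Account|Client)\s*:\s*(.+)/i;
const US_STATES =
  "AL|AK|AZ|AR|CA|CO|CT|DE|FL|GA|HI|ID|IL|IN|IA|KS|KY|LA|ME|MD|MA|MI|MN|MS|" +
  "MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VT|VA|WA|WV|WI|WY";
// Case-sensitive on purpose, so ordinary words like "in" or "or" do not match.
const STATE_PATTERN = new RegExp(`\\b(${US_STATES})\\b`);

interface HeuristicFields {
  insuredName?: string;
  lineOfBusiness?: string;
  riskState?: string;
}

function extractFields(subject: string, body: string): HeuristicFields {
  const text = `${subject}\n${body}`;
  // Normalize curly apostrophes so "workers’ comp" matches "workers' comp".
  const lower = text.toLowerCase().replace(/\u2019/g, "'");

  const lob = LOB_KEYWORDS.find(([phrase]) => lower.includes(phrase));
  const insured = INSURED_PATTERN.exec(text);
  const state = STATE_PATTERN.exec(text);

  return {
    insuredName: insured ? insured[1].trim() : undefined,
    lineOfBusiness: lob ? lob[1] : undefined,
    riskState: state ? state[1] : undefined,
  };
}
```

Heuristics like these are cheap but imprecise: an uppercase "OR" or "IN" in a subject line would be read as a state. Gaps of that kind are what the later LLM-based extraction is there to fill.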
6. Submission Creation
The agent creates a new submission record with status received.
The missingItems array tracks which required fields could not be extracted: insured_name, line_of_business, risk_state.
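The relationship between extracted fields and missingItems can be shown with a short sketch. Only the status value and the three missingItems identifiers come from the text; the record shape and all type and function names are hypothetical.

```typescript
type RequiredField = "insured_name" | "line_of_business" | "risk_state";

interface ExtractedFields {
  insuredName?: string;
  lineOfBusiness?: string;
  riskState?: string;
}

// Hypothetical record shape; the real submission schema is not shown here.
interface SubmissionDraft {
  orgId: string;
  status: "received";
  fields: ExtractedFields;
  missingItems: RequiredField[];
}

function buildSubmissionDraft(
  orgId: string,
  fields: ExtractedFields,
): SubmissionDraft {
  // Record each required field the heuristics failed to extract.
  const missingItems: RequiredField[] = [];
  if (!fields.insuredName) missingItems.push("insured_name");
  if (!fields.lineOfBusiness) missingItems.push("line_of_business");
  if (!fields.riskState) missingItems.push("risk_state");

  return { orgId, status: "received", fields, missingItems };
}
```

An email where only the insured name was detected would yield `missingItems` of `["line_of_business", "risk_state"]`.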
7. SubmissionAgent Handoff
After creating the submission, the EmailIntakeAgent spawns a SubmissionAgent Durable Object for AI-powered deep extraction. The SubmissionAgent processes the raw email text and any uploaded attachments (ACORD forms, loss runs, dec pages) using LLM-based extraction to fill in missing fields and enrich the submission data.
8. Event Emission
A submission.mailbox_ingested event is sent to the Cloudflare Queue.
Dual Intake Paths
The EmailIntakeAgent supports two intake paths:
- Cloudflare Email Routing (preferred): When configured, inbound emails are routed directly to the agent’s onEmail handler via the Agent SDK. This path uses PostalMime to parse the raw email bytes and checks for auto-reply headers to skip bounce-backs and out-of-office responses.
- Cron-based Graph polling (legacy): The sweepMailboxIngest cron polls the Microsoft 365 shared mailbox and POSTs normalized emails to the agent’s /email REST endpoint. This path is used when Cloudflare Email Routing is not configured or when the mailbox is hosted on Microsoft 365.
Both paths converge on the same extraction logic and submission creation code.
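The auto-reply check on the Email Routing path might look roughly like the following. The specific headers inspected by the agent are not listed in this document, so this sketch assumes a common set: Auto-Submitted (RFC 3834), Precedence, a multipart/report content type for delivery status notifications, and an empty Return-Path for bounces. The function name is hypothetical.

```typescript
// Returns true when the headers suggest an automatically generated message.
// Assumption: headers have already been parsed into a name -> value record.
function isAutoReply(headers: Record<string, string>): boolean {
  // Normalize names and values so comparisons are case-insensitive.
  const h: Record<string, string> = {};
  Object.keys(headers).forEach((name) => {
    h[name.toLowerCase()] = headers[name].trim().toLowerCase();
  });

  // RFC 3834: any Auto-Submitted value other than "no" marks an automatic message.
  const autoSubmitted = h["auto-submitted"];
  if (autoSubmitted !== undefined && autoSubmitted !== "no") return true;

  // Widely used (non-standard) Precedence values for bulk and auto responses.
  const precedence = h["precedence"];
  if (precedence !== undefined && ["bulk", "junk", "auto_reply"].indexOf(precedence) !== -1) {
    return true;
  }

  // Delivery status notifications are sent as multipart/report.
  const contentType = h["content-type"];
  if (contentType !== undefined && contentType.indexOf("multipart/report") === 0) {
    return true;
  }

  // Bounces conventionally carry an empty return path.
  return h["return-path"] === "<>";
}
```

For instance, `isAutoReply({ "Auto-Submitted": "auto-replied" })` returns `true`, while a message carrying only ordinary headers such as Subject returns `false`.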
Auto-Reply Filtering
The Cloudflare Email Routing path automatically filters out auto-replies by inspecting email headers. Messages matching auto-reply patterns (out-of-office, delivery status notifications, bounce messages) are silently discarded.
Error Handling
The cron job tracks three counters per run:
| Counter | Description |
|---|---|
| processed | Messages successfully processed through the pipeline |
| skipped | Messages skipped due to deduplication |
| errors | Messages that failed (logged via logNonFatal) |
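A per-run loop that maintains these counters could be sketched as follows. The function name, the callback parameters, and the assumed logNonFatal signature (a scope string plus the error) are illustrative rather than taken from the codebase. The key idea is that a failure on one message is counted and logged without aborting the rest of the batch.

```typescript
interface RunCounters {
  processed: number;
  skipped: number;
  errors: number;
}

// Hypothetical sweep loop: dedup check, then processing, with per-message
// error isolation.
async function runSweep<T>(
  messages: T[],
  isDuplicate: (message: T) => Promise<boolean>,
  processMessage: (message: T) => Promise<void>,
  logNonFatal: (scope: string, err: unknown) => void,
): Promise<RunCounters> {
  const counters: RunCounters = { processed: 0, skipped: 0, errors: 0 };

  for (const message of messages) {
    try {
      if (await isDuplicate(message)) {
        counters.skipped++;
        continue;
      }
      await processMessage(message);
      counters.processed++;
    } catch (err) {
      // One bad message must not stop the remaining messages.
      counters.errors++;
      logNonFatal("cron.mailbox_ingest.message", err);
    }
  }

  return counters;
}
```

Processing messages sequentially keeps the counters simple. A concurrent variant would need the same try/catch around each message so that one rejection does not fail the whole run.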
Monitoring
The mailbox ingest cron is monitored through the standard Axiom observability pipeline. Key signals to watch:
- cron.mailbox_ingest: top-level cron failures
- cron.mailbox_ingest.message: per-message processing failures
- The submission.mailbox_ingested event in the queue provides a real-time feed of successfully ingested submissions