Bulk email validation: processing 100k addresses in about a minute

Engineering·Mar 14, 2026·6 min read

Our async bulk pipeline was built from scratch to handle enterprise-scale validation without blocking your application. Here's the architecture — job queues, worker distribution, and result pagination.

When we launched single-address validation, the architecture was straightforward: receive the request, probe SMTP, return the result. Bulk validation introduced a different class of problem. Processing 50,000 addresses synchronously in a single HTTP request isn't feasible: the request would run into timeouts and memory limits, and it would put sustained load on the server. So we built an async pipeline instead.

How the pipeline works

When you POST to /api/v1/validate/bulk, we immediately deduplicate the address list, create a job record in the database, enqueue every unique address as an individual task, and return a jobId along with the queued count, usually within 50–100 ms. Your application doesn't block waiting for results.
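
The intake steps above could look roughly like the following in-memory sketch. The Job record, the submit_bulk function, and the use of queue.Queue as the task queue are illustrative assumptions rather than the service's actual implementation, and deduplication here is by exact string match.

```python
import queue
import uuid
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; the field names are assumptions."""
    job_id: str
    queued: int               # number of unique addresses enqueued
    status: str = "PENDING"
    processed: int = 0


def submit_bulk(addresses, jobs, tasks):
    """Deduplicate, record the job, and enqueue one task per address."""
    unique = list(dict.fromkeys(addresses))   # drops repeats, keeps first-seen order
    job = Job(job_id=uuid.uuid4().hex, queued=len(unique))
    jobs[job.job_id] = job                    # stand-in for the database insert
    for address in unique:
        tasks.put((job.job_id, address))      # one task per unique address
    # Return right away; validation happens later in the workers.
    return {"jobId": job.job_id, "queued": job.queued}


jobs = {}
tasks = queue.Queue()
response = submit_bulk(
    ["a@example.com", "b@example.com", "a@example.com"], jobs, tasks
)
```

Nothing in submit_bulk touches the network, which is what keeps the response time independent of how long validation takes.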

Behind the scenes, a fleet of worker processes picks tasks off the queue. Each worker handles a batch of addresses, performing MX lookups and SMTP probes in parallel. Concurrency limits are tuned per domain so that we don't overload receiving mail servers or trip their rate limits. Workers report progress back to the job record in real time.
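
One common way to cap parallel probes per domain is a semaphore keyed by domain. The sketch below uses Python's asyncio for this; the limit of 2, the validate_batch function, and the fake_probe placeholder (which does no real MX or SMTP work) are all assumptions made for illustration.

```python
import asyncio
from collections import defaultdict

PER_DOMAIN_LIMIT = 2   # assumed value; real limits would be tuned per domain


async def fake_probe(address):
    """Placeholder for the MX lookup and SMTP probe; does no network I/O."""
    await asyncio.sleep(0.01)
    return "valid"


async def validate_batch(addresses, probe=fake_probe, limit=PER_DOMAIN_LIMIT):
    """Probe a batch in parallel, capping in-flight probes per domain."""
    semaphores = defaultdict(lambda: asyncio.Semaphore(limit))

    async def run_one(address):
        domain = address.rsplit("@", 1)[-1].lower()
        async with semaphores[domain]:        # wait for a free slot on this domain
            return address, await probe(address)

    pairs = await asyncio.gather(*(run_one(a) for a in addresses))
    return dict(pairs)


batch = ["a@example.com", "b@example.com", "c@example.com", "d@example.org"]
results = asyncio.run(validate_batch(batch))
```

Here the three example.com probes share two slots while the example.org probe runs independently, so a batch dominated by one provider cannot flood that provider's servers.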

Polling for results

Poll GET /api/v1/jobs/:jobId to check progress. The response includes the job status (PENDING, PROCESSING, COMPLETED, or FAILED), counts by result type, and, once the job is complete, a paginated results array. Each page returns 50 records. For a job with 100,000 unique addresses, that's 2,000 pages of results, which you can fetch as fast as your application needs them.
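
A client-side polling loop might look like the sketch below. The get_job callable is a hypothetical wrapper around GET /api/v1/jobs/:jobId, and the page parameter, the response keys (status, results), and the record fields are assumptions, since the post does not spell out the request or response shape. A stub replaces the HTTP call so the example runs offline.

```python
import math
import time

PAGE_SIZE = 50   # from the post: each page returns 50 records


def page_count(total_records, page_size=PAGE_SIZE):
    """Number of pages needed to hold total_records."""
    return math.ceil(total_records / page_size)


def wait_for_job(get_job, job_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the job is COMPLETED or FAILED, then return that response."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id, page=1)
        if job["status"] in ("COMPLETED", "FAILED"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} is still {job['status']}")
        sleep(interval)


def iter_results(get_job, job_id, total_records):
    """Yield every result record, fetching one page at a time."""
    for page in range(1, page_count(total_records) + 1):
        yield from get_job(job_id, page=page)["results"]


# Stub standing in for the HTTP call: reports COMPLETED from the third call on.
_calls = {"n": 0}
_records = [{"email": f"user{i}@example.com", "result": "valid"} for i in range(120)]


def fake_get_job(job_id, page=1):
    _calls["n"] += 1
    if _calls["n"] < 3:
        return {"status": "PROCESSING", "results": []}
    start = (page - 1) * PAGE_SIZE
    return {"status": "COMPLETED", "results": _records[start:start + PAGE_SIZE]}


final = wait_for_job(fake_get_job, "job-123", interval=0, sleep=lambda s: None)
records = list(iter_results(fake_get_job, "job-123", total_records=120))
```

The timeout guards against polling forever if a job stalls; the default interval and timeout shown are placeholders, not recommended values.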

For jobs under 1,000 addresses, typical completion time is under 30 seconds. For 100k addresses, expect 45–90 seconds depending on domain mix.

Credits and failure handling

Credits are charged per processed address, not per job. If a job fails partway through, only the addresses that were actually validated are billed, and the unprocessed addresses can be resubmitted in a new job. Because billing follows processed addresses rather than submissions, a network outage or infrastructure issue on our end never results in double-billing.
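
To make the billing rule concrete, here is a small sketch of how a partially failed job could be settled and retried. The settle_partial_job function and the rate of one credit per address are assumptions for illustration; the post states only that credits are charged per processed address.

```python
def settle_partial_job(submitted, processed, credits_per_address=1):
    """Return (credits billed, addresses still to resubmit) for one job.

    credits_per_address=1 is an assumed rate, not a documented one.
    """
    done = set(processed)
    billed = len(done) * credits_per_address
    remaining = [a for a in submitted if a not in done]
    return billed, remaining


submitted = ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]

# First job fails after validating two of the four addresses.
billed_first, retry = settle_partial_job(submitted, processed=submitted[:2])

# Second job contains only the leftovers and completes.
billed_second, leftover = settle_partial_job(retry, processed=retry)

total_billed = billed_first + billed_second
```

Across both jobs the total billed equals the number of distinct addresses validated, so no address is paid for twice.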

What about deduplication?

We deduplicate your input list before processing begins. If you submit 50,000 addresses with 3,000 duplicates, you'll be charged for 47,000 validations. The deduplicated count is returned in the queued field of the initial response so you always know exactly what you're being billed for.
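
If you want to sanity-check the queued value before trusting it for billing, you can compute the expected count locally. This sketch assumes duplicates are matched by exact string comparison; the post does not say whether the service also normalizes case or whitespace, so treat any mismatch as a prompt to investigate rather than as an error. The function names are hypothetical.

```python
def expected_queued(addresses):
    """Count unique addresses, assuming exact-match deduplication."""
    return len(set(addresses))


def check_queued(addresses, response):
    """Compare the local count with the queued field of the initial response."""
    return response["queued"] == expected_queued(addresses)


# Rebuild the example from the text: 50,000 submitted, 3,000 of them repeats.
unique_part = [f"user{i}@example.com" for i in range(47_000)]
submitted = unique_part + unique_part[:3_000]

submitted_count = len(submitted)
billable = expected_queued(submitted)
matches = check_queued(submitted, {"queued": 47_000})
```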