How to implement a modular passthrough system within your Merge implementation

Last updated: June 9, 2026

Overview

Merge's passthrough feature lets you reach provider-specific endpoints that Merge's Unified API doesn't normally call. This guide walks through a repeatable architecture for pulling that data and writing it into your application database, structured so that adding a new provider means filling in a well-defined pattern rather than reinventing the wheel.

Who this guide is for

This is primarily written for engineers implementing passthrough ingestion, but technical PMs will find the architecture overview and pitfalls sections useful for scoping work and reviewing implementations that use passthrough.

If you're already integrated with Merge's Unified API, you've likely built retries, backoff, pagination handling, and batch writes as part of that implementation. None of that needs to be rebuilt to add passthrough. The incremental work is the provider-specific module: finding the right endpoints, generating the request plan, and converting responses into your existing update format.

The core implementation pattern

Every implementation of passthrough, regardless of provider, does four things:

1. Find: identify the endpoints and field IDs you need for this provider
2. Plan: generate the list of passthrough requests to make (accounting for any pagination and fan-out)
3. Convert: parse responses into a clean, internal set of update records
4. Write: apply those update records to your database in batches

Steps 1–3 are provider-specific. Step 4 (the write layer), along with the execution layer that runs the request plan, can and should be shared infrastructure.

Key concepts

Provider: the third-party HR, payroll, or other system you're calling via Merge passthrough.
Request plan: the ordered list of passthrough calls needed to fetch all required data, including pagination. Think of it as a queue of HTTP calls your shared execution layer will process.
Update record: a normalized, replayable unit of work: Employee X → field Y = value Z, observed at time T, from request R. Update records are your handoff from provider-specific code to shared write code.
Field definition: the single source of truth for how a passthrough-derived field is validated, typed, and stored. Defined once, referenced by all providers that produce that field.
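
To make these concepts concrete, here is a minimal sketch of how they might be modeled. The class names, the t-shirt-size field, the storage column, and the endpoint path are all hypothetical; only the field ID abc123 and the idea of "Employee X → field Y = value Z, observed at time T, from request R" come from this guide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class PassthroughRequest:
    """One entry in a request plan (hypothetical shape)."""
    method: str
    path: str


@dataclass(frozen=True)
class FieldDefinition:
    """Single source of truth for one passthrough-derived field."""
    name: str
    column: str                  # storage target (assumed naming)
    parse: Callable[[Any], Any]  # validates and types the raw provider value


@dataclass(frozen=True)
class UpdateRecord:
    """Employee X -> field Y = value Z, observed at time T, from request R."""
    employee_id: str
    field_name: str
    value: Any
    observed_at: datetime
    source: PassthroughRequest


def parse_tshirt_size(raw: Any) -> str:
    """Normalize a raw value and reject anything outside the allowed set."""
    size = str(raw).strip().upper()
    if size not in {"S", "M", "L", "XL"}:
        raise ValueError(f"unexpected t-shirt size: {raw!r}")
    return size


# Defined once; every provider that produces this field references it.
TSHIRT_SIZE = FieldDefinition(
    name="tshirt_size",
    column="employees.tshirt_size",
    parse=parse_tshirt_size,
)

request = PassthroughRequest("GET", "/custom-fields/abc123/values")
record = UpdateRecord(
    employee_id="emp_42",
    field_name=TSHIRT_SIZE.name,
    value=TSHIRT_SIZE.parse(" m "),
    observed_at=datetime(2026, 6, 9, tzinfo=timezone.utc),
    source=request,
)
```

Because the field definition owns validation, a provider module never decides on its own what a valid value looks like; it only supplies raw values.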

Architecture overview

The shared infrastructure on the right is what you already have. Passthrough ingestion only adds the left side.

Provider-specific code               Shared infrastructure
─────────────────────                ──────────────────────────────────────
  Find fields/endpoints       →      Execution layer
  Generate request plan       →        (retries, backoff, rate limiting,
  Convert responses           →         pagination, tracing)
    into update records
                                     Write layer
                              →        (batch DB writes, entity
                                        upserts, idempotency)

Keeping this boundary clean is what makes the system modular. When you add a new provider, you only touch the left side.
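
One way to enforce that boundary in code is an interface that every provider module implements, plus a single orchestration function that never changes. This is a sketch under assumptions: ProviderModule, run_sync, and the dict-based request and record shapes are illustrative names, not part of Merge's SDK.

```python
from typing import Any, Callable, Protocol


class ProviderModule(Protocol):
    """Left side of the diagram: the only code a new provider adds."""

    def find(self) -> list[str]: ...
    def plan(self, field_ids: list[str]) -> list[dict]: ...
    def convert(self, request: dict, response: Any) -> list[dict]: ...


def run_sync(
    provider: ProviderModule,
    execute: Callable[[dict], Any],              # shared execution layer
    write_batch: Callable[[list[dict]], None],   # shared write layer
    batch_size: int = 500,
) -> int:
    """Shared orchestration: identical for every provider."""
    records: list[dict] = []
    for request in provider.plan(provider.find()):
        records.extend(provider.convert(request, execute(request)))
    for start in range(0, len(records), batch_size):
        write_batch(records[start:start + batch_size])
    return len(records)


class ExampleProvider:
    """Hypothetical provider with a single custom field."""

    def find(self) -> list[str]:
        return ["abc123"]

    def plan(self, field_ids: list[str]) -> list[dict]:
        return [
            {"method": "GET", "path": f"/custom-fields/{fid}/values"}
            for fid in field_ids
        ]

    def convert(self, request: dict, response: Any) -> list[dict]:
        return [
            {"employee_id": item["id"], "field": "tshirt_size", "value": item["value"]}
            for item in response["items"]
        ]


def fake_execute(request: dict) -> dict:
    """Stand-in for the real execution layer; returns a canned response."""
    return {"items": [{"id": "emp_1", "value": "M"}, {"id": "emp_2", "value": "L"}]}


written: list[list[dict]] = []
count = run_sync(ExampleProvider(), fake_execute, written.append, batch_size=1)
```

Adding a second provider means writing another class with the same three methods; run_sync, execute, and write_batch stay untouched.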

Implementing a new provider: step-by-step

Step 1: Define the outcome

Before writing any code, answer:

What field(s) are you ingesting?
What entity are you updating (e.g., Employee)?
What remote identifier will you join on, and is that identifier consistent across all the endpoints you'll hit?

The last question is the most likely source of early bugs. Different endpoints on the same provider sometimes use different employee identifiers (internal ID vs. SSN vs. email). Confirm your join key before you go further.
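
A quick way to confirm the join key is to sample identifiers from each endpoint and measure how many match the remote IDs you already store. The sketch below assumes you can collect such samples; the function name and the endpoint paths are hypothetical.

```python
def join_key_match_rates(known_remote_ids, endpoint_samples):
    """Return, per endpoint, the fraction of sampled IDs that match known IDs."""
    known = set(known_remote_ids)
    rates = {}
    for endpoint, sample in endpoint_samples.items():
        ids = list(sample)
        matched = sum(1 for remote_id in ids if remote_id in known)
        rates[endpoint] = matched / len(ids) if ids else 0.0
    return rates


rates = join_key_match_rates(
    known_remote_ids=["1001", "1002", "1003", "1004"],
    endpoint_samples={
        "/employees": ["1001", "1002", "1003", "1004"],
        # This endpoint keys on email, so nothing matches.
        "/custom-fields/abc123/values": ["a@example.com", "b@example.com"],
    },
)
```

A match rate far below 1.0 for any endpoint suggests that endpoint uses a different identifier and needs a mapping step before its records can be joined.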

Step 2: Identify provider surfaces

Which endpoint(s) expose the data you need?
Is there a discovery step required first — for example, listing available custom field IDs before fetching values?
What pagination scheme does this provider use? Common patterns: cursor-based, offset/limit, or a next URL in the response.

If there's a discovery step, treat it as the first entry in your request plan and consider caching the result (see Schema caching below).
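
The three pagination patterns above can usually be reduced to one function that, given the last request and response, returns the next request or signals completion. This is a sketch only: the response keys next_cursor, items, and next are assumptions, and real providers name them differently.

```python
from typing import Callable, Optional


def next_page(scheme: str, request: dict, body: dict) -> Optional[dict]:
    """Return the request for the next page, or None when pagination is done."""
    params = request.get("params", {})
    if scheme == "cursor":
        cursor = body.get("next_cursor")
        if not cursor:
            return None
        return {**request, "params": {**params, "cursor": cursor}}
    if scheme == "offset":
        limit = params["limit"]
        if len(body.get("items", [])) < limit:
            return None  # a short page means we reached the end
        return {**request, "params": {**params, "offset": params.get("offset", 0) + limit}}
    if scheme == "next_url":
        url = body.get("next")
        return {**request, "path": url, "params": {}} if url else None
    raise ValueError(f"unknown pagination scheme: {scheme}")


def fetch_all(send: Callable[[dict], dict], first: dict, scheme: str, max_pages: int = 1000) -> list[dict]:
    """Follow pages until exhausted, with a guard against endless loops."""
    pages: list[dict] = []
    request: Optional[dict] = first
    while request is not None:
        if len(pages) >= max_pages:
            raise RuntimeError("pagination did not terminate")
        body = send(request)
        pages.append(body)
        request = next_page(scheme, request, body)
    return pages


DATA = ["a", "b", "c", "d", "e"]


def fake_send(request: dict) -> dict:
    """Stand-in for a passthrough call against an offset/limit endpoint."""
    params = request["params"]
    start = params.get("offset", 0)
    return {"items": DATA[start:start + params["limit"]]}


pages = fetch_all(fake_send, {"method": "GET", "path": "/employees", "params": {"limit": 2}}, "offset")
```

Recording the scheme per provider as data, rather than writing a bespoke loop each time, keeps pagination inside the shared execution layer.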

Step 3: Implement provider-specific logic

This is the core of what you're writing. It has three sub-parts:

Find: locate the field IDs and endpoint paths for this provider. For providers with dynamic custom fields, this usually means making a discovery call and storing the field schema.

Plan: produce the list of passthrough requests to make. This is where you handle:

Fan-out: one request per field ID, if the provider requires it
Pagination: for each paginated endpoint, your plan or execution layer needs to follow next links until exhausted

Convert: transform the raw responses into update records. Each update record should carry:

The subject's identity (e.g., employee ID in your system)
The field and value
An observedAt timestamp
Source metadata (method, path, request/response IDs, sync run ID)

Source metadata is what lets you trace a database value back to the exact API call that produced it — invaluable when debugging data discrepancies.
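
The Plan and Convert sub-parts might look like the following for a provider that needs one request per custom field. Everything provider-shaped here is an assumption for illustration: the path template, the response keys (body, items, employee_remote_id, request_id, response_id), and the function signatures.

```python
from datetime import datetime, timezone


def plan(field_ids: list[str]) -> list[dict]:
    """Fan-out: one passthrough request per custom field ID."""
    return [
        {"method": "GET", "path": f"/custom-fields/{fid}/values", "field_id": fid}
        for fid in field_ids
    ]


def convert(request, response, *, field_names, employee_ids, sync_run_id, observed_at):
    """Turn one raw response into update records that carry source metadata."""
    records = []
    for item in response["body"]["items"]:
        employee_id = employee_ids.get(item["employee_remote_id"])
        if employee_id is None:
            continue  # unknown remote ID; a real implementation should log this
        records.append({
            "employee_id": employee_id,
            "field": field_names[request["field_id"]],
            "value": item["value"],
            "observed_at": observed_at.isoformat(),
            "source": {
                "method": request["method"],
                "path": request["path"],
                "request_id": response["request_id"],
                "response_id": response["response_id"],
                "sync_run_id": sync_run_id,
            },
        })
    return records


requests_to_make = plan(["abc123", "def456"])
fake_response = {
    "request_id": "req_1",
    "response_id": "res_1",
    "body": {"items": [
        {"employee_remote_id": "1001", "value": "M"},
        {"employee_remote_id": "9999", "value": "L"},  # not in our system
    ]},
}
records = convert(
    requests_to_make[0],
    fake_response,
    field_names={"abc123": "tshirt_size", "def456": "desk_location"},
    employee_ids={"1001": "emp_42"},
    sync_run_id="run_7",
    observed_at=datetime(2026, 6, 9, tzinfo=timezone.utc),
)
```

Note that convert performs no I/O: it takes a request and a response and returns records, which is what makes it easy to test against saved fixtures.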

Step 4: Hook into shared execution and write layers

This step connects your provider-specific code to the execution and write infrastructure you already use for your Merge Unified API implementation, so no new infrastructure is required. The shared execution layer:

Runs each request with standard retries, exponential backoff, and rate-limit handling
Handles pagination (following next links, detecting completion)
Emits trace data for observability
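
If you want to check your existing execution layer against these expectations, the retry behavior is roughly the following. This is a generic sketch of exponential backoff with jitter, not Merge's implementation; RetryableError and the send callable are hypothetical stand-ins for your own HTTP client and error classification.

```python
import random
import time


class RetryableError(Exception):
    """Raised by `send` for failures worth retrying (e.g. rate limits, timeouts)."""


def execute_with_retries(send, request, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(request), retrying retryable failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(request)
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts; surface the failure to the caller
            delay = base_delay * 2 ** (attempt - 1)      # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries


calls = {"count": 0}


def flaky_send(request):
    """Fails twice, then succeeds, to exercise the retry path."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RetryableError("rate limited")
    return {"status": 200, "path": request["path"]}


delays = []  # capture sleeps instead of actually waiting
result = execute_with_retries(flaky_send, {"method": "GET", "path": "/employees"}, sleep=delays.append)
```

Passing sleep as a parameter keeps the function testable without real waiting, which matters once fixtures and contract tests enter the picture.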

The shared write layer then:

Applies update records in batches
Upserts related entities where needed
Guarantees idempotency so repeated runs don't duplicate data
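
One common way to get batched, idempotent writes is an upsert keyed on the entity and field, guarded so that an older observation never overwrites a newer one. The sketch below uses SQLite purely so that it runs anywhere; the table name, columns, and the choice of ISO-8601 strings for timestamps are assumptions, and your own database and schema will differ.

```python
import sqlite3

UPSERT_SQL = """
    INSERT INTO employee_field_values (employee_id, field_name, value, observed_at)
    VALUES (:employee_id, :field_name, :value, :observed_at)
    ON CONFLICT (employee_id, field_name) DO UPDATE SET
        value = excluded.value,
        observed_at = excluded.observed_at
    WHERE excluded.observed_at >= employee_field_values.observed_at
"""


def write_batches(conn, records, batch_size=500):
    """Apply update records in batches; each batch is one transaction."""
    for start in range(0, len(records), batch_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany(UPSERT_SQL, records[start:start + batch_size])


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_field_values (
        employee_id TEXT NOT NULL,
        field_name  TEXT NOT NULL,
        value       TEXT,
        observed_at TEXT NOT NULL,
        PRIMARY KEY (employee_id, field_name)
    )
""")

records = [
    {"employee_id": "emp_1", "field_name": "tshirt_size", "value": "M",
     "observed_at": "2026-06-09T00:00:00+00:00"},
    {"employee_id": "emp_2", "field_name": "tshirt_size", "value": "L",
     "observed_at": "2026-06-09T00:00:00+00:00"},
]
write_batches(conn, records, batch_size=1)
write_batches(conn, records, batch_size=1)  # replaying the same run changes nothing

# An observation older than what is stored is ignored.
stale = [{"employee_id": "emp_1", "field_name": "tshirt_size", "value": "S",
          "observed_at": "2026-06-01T00:00:00+00:00"}]
write_batches(conn, stale)

row_count = conn.execute("SELECT COUNT(*) FROM employee_field_values").fetchone()[0]
emp_1_value = conn.execute(
    "SELECT value FROM employee_field_values WHERE employee_id = 'emp_1'"
).fetchone()[0]
```

The primary key is what makes replays safe, and the observed_at guard is what makes out-of-order replays safe. Comparing timestamps as strings only works if every record uses the same format and time zone.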

Step 5: Add fixtures and tests

Do this before iterating further. Save representative passthrough responses as fixture files and write contract tests that assert the expected update records for each fixture. This is much faster than running against a live provider during development, and it protects you from regressions when providers change their response format.

At minimum, add:

A happy-path fixture with multiple employees
A pagination fixture (at least two pages)
A partial-failure fixture if your provider returns per-item errors
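
A contract test for the three fixtures above could be as small as the following. The fixtures are inlined as JSON strings so the sketch is self-contained; in practice they would live in fixture files. The response shape (items, employee_remote_id, error) and the convert function are hypothetical.

```python
import json


def convert(page: dict) -> tuple[list[dict], list[dict]]:
    """Split one response page into update records and per-item errors."""
    records, errors = [], []
    for item in page["items"]:
        if "error" in item:
            errors.append(item)
            continue
        records.append({
            "employee_remote_id": item["employee_remote_id"],
            "field": "tshirt_size",
            "value": item["value"],
        })
    return records, errors


HAPPY_PATH = json.loads("""
{"items": [{"employee_remote_id": "1001", "value": "M"},
           {"employee_remote_id": "1002", "value": "L"}]}
""")

PAGINATED = [
    json.loads('{"items": [{"employee_remote_id": "1001", "value": "M"}], "next": "/values?page=2"}'),
    json.loads('{"items": [{"employee_remote_id": "1002", "value": "L"}], "next": null}'),
]

PARTIAL_FAILURE = json.loads("""
{"items": [{"employee_remote_id": "1001", "value": "M"},
           {"employee_remote_id": "1002", "error": "permission_denied"}]}
""")


def test_happy_path():
    records, errors = convert(HAPPY_PATH)
    assert [r["value"] for r in records] == ["M", "L"]
    assert errors == []


def test_pagination():
    records = [r for page in PAGINATED for r in convert(page)[0]]
    assert [r["employee_remote_id"] for r in records] == ["1001", "1002"]


def test_partial_failure():
    records, errors = convert(PARTIAL_FAILURE)
    assert len(records) == 1
    assert errors[0]["error"] == "permission_denied"
```

The functions follow the test_ naming convention, so a runner such as pytest would discover them without extra configuration.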

Step 6: Operationalize

Log request counts, pagination depth, and error rates per provider
Link log entries to sync run IDs so you can correlate a bad write to the specific API call
If your discovery step is expensive, cache the field schema and version it — see schema caching below
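
The per-provider counters can be kept in a small object that lives for the duration of one sync run and is logged at the end. This is a sketch using Python's standard logging module; the class name, logger name, and field names are assumptions.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("passthrough.sync")


@dataclass
class SyncRunStats:
    """Counters for one provider within one sync run."""
    provider: str
    sync_run_id: str
    requests: int = 0
    errors: int = 0
    max_page_depth: int = 0

    def record(self, *, page_depth: int, ok: bool) -> None:
        """Register the outcome of one passthrough request."""
        self.requests += 1
        if not ok:
            self.errors += 1
        self.max_page_depth = max(self.max_page_depth, page_depth)

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

    def log_summary(self) -> None:
        # `extra` attaches structured fields so logs can be joined on sync_run_id.
        logger.info(
            "passthrough sync finished",
            extra={
                "provider": self.provider,
                "sync_run_id": self.sync_run_id,
                "request_count": self.requests,
                "error_rate": self.error_rate,
                "max_page_depth": self.max_page_depth,
            },
        )


stats = SyncRunStats(provider="example_hris", sync_run_id="run_7")
stats.record(page_depth=1, ok=True)
stats.record(page_depth=2, ok=True)
stats.record(page_depth=3, ok=False)
stats.record(page_depth=1, ok=True)
stats.log_summary()
```

Using the same sync_run_id here as in each update record's source metadata is what lets you go from a bad database value to the log lines for the run that wrote it.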

Schema caching

If a provider requires a discovery call to find field IDs (e.g., "list all custom fields, then fetch values for field abc123"), that schema rarely changes. Cache it, store a version/hash, and refresh on a cadence or when a downstream call fails with an unexpected field error. When schema drift occurs, fail with a clear error rather than silently writing null values.
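
A minimal sketch of that policy follows: cache the discovered schema, version it with a hash, refresh on a time-to-live or on a miss, and raise a clear error if a required field is still absent after a refresh. The class and function names are hypothetical, and the cache is in-memory only; a real one would likely persist between runs.

```python
import hashlib
import json
import time


class SchemaDriftError(Exception):
    """A required field is missing even from a freshly discovered schema."""


def schema_version(schema: dict) -> str:
    """Stable short hash of the schema, independent of key order."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class SchemaCache:
    def __init__(self, discover, ttl_seconds=24 * 3600, clock=time.monotonic):
        self._discover = discover  # callable that performs the discovery call
        self._ttl = ttl_seconds
        self._clock = clock
        self._schema = None
        self._fetched_at = 0.0
        self.version = None

    def get(self, force_refresh=False) -> dict:
        expired = self._schema is None or self._clock() - self._fetched_at > self._ttl
        if force_refresh or expired:
            self._schema = self._discover()
            self._fetched_at = self._clock()
            self.version = schema_version(self._schema)
        return self._schema

    def require_field(self, field_id: str) -> dict:
        """Return the field's schema entry, refreshing once before giving up."""
        if field_id not in self.get() and field_id not in self.get(force_refresh=True):
            raise SchemaDriftError(
                f"field {field_id!r} not found in schema version {self.version}"
            )
        return self._schema[field_id]


discovery_calls = {"count": 0}


def fake_discover() -> dict:
    """Stand-in for the provider's 'list all custom fields' call."""
    discovery_calls["count"] += 1
    return {"abc123": {"label": "T-shirt size", "type": "string"}}


cache = SchemaCache(fake_discover)
field = cache.require_field("abc123")  # first use triggers discovery
field_again = cache.require_field("abc123")  # served from cache
```

Storing the version alongside each sync run makes it possible to tell afterwards which schema a given set of update records was produced under.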

Suggested implementation order

If you're adding passthroughs from scratch:

Field definitions: define validation and storage targets for each field you intend to ingest
First provider implementation: implement one provider's endpoint end-to-end to validate the pattern

The execution and write layers (retries, backoff, pagination, batch DB writes) are infrastructure you've already built for your Merge Unified API integration.

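Each subsequent provider only requires repeating the second item above (a provider implementation); field definitions are reused wherever providers produce the same field.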

Quick reference checklist (new provider)

Confirm join key is consistent across all endpoints you'll hit
Document which endpoint(s) hold the data and what pagination scheme they use
Identify whether a discovery step is required; if so, plan for schema caching
Implement Find → Plan → Convert for this provider
Connect to shared execution and write layers
Add happy-path, pagination, and partial-failure fixtures
Write contract tests asserting expected update records for each fixture
Add the provider to the capability flag registry
Verify idempotency by running the sync twice against test data