Guide to Merge File Storage Integrations

What to know before you get started with Merge's File Storage integrations

Updated this week

Overview

Merge lets you connect to any supported File Storage system through a single, unified API. While we work to standardize API-specific differences, some limitations and nuances remain—these are outlined below.

OneDrive has specific nuances that are documented in depth here. We recommend reading that guide before onboarding customers onto OneDrive.

Authentication

See our guide to understand why Merge OAuth apps require specific scopes.

Google Drive

For Group User relationships, admin credentials are required.

SharePoint

For write access, non-admins may need admin consent.

Follow this guide to request consent from your SharePoint admin, or have them link using their admin credentials.

OneDrive

Personal accounts cannot be linked. OneDrive accounts must be tied to a school or work organization.

Dropbox

Requires you to host your own OAuth app. Merge provides a demo OAuth app for testing purposes only.

Box

For Group access, admin credentials are required.

Merge offers three categories of authentication. Support varies by integration:

  1. Super admin

  2. Admin

  3. Individual

Merge recommends admin or super admin authentication for File Storage integrations. With admin authentication, Merge pulls metadata for all files that the admin has access to. For select integrations, Merge also supports super admin access, which includes access to private files created by other users. The benefits include:

  • Lower third-party rate limit usage: Merge automatically handles third-party rate limits and backs off when a Linked Account is nearing these rate limits. Connecting one admin-level Linked Account is a more efficient use of these rate limits.

  • Faster syncs: Backing off when nearing third-party rate limits slows down syncs. Multiple individual-user Linked Accounts unnecessarily slow down how quickly you can sync all the data you need from a particular end-user organization.

  • Access Control Lists (ACLs): Merge’s permissions models allow you to track who should have access to each file and build accurate ACLs. These are especially valuable when admin authentication is used, as they provide precise data on permissions across the end user’s organization. Permissions models will still be mapped for Linked Accounts using individual authentication. However, they will not include permissions data for files that the individual does not have access to.

While Merge supports on-demand downloads, third-party APIs still need to be polled to pull the metadata that supports ACLs and file downloads. If your end users opt for individual instead of admin authentication, they run the risk of delayed syncs. In that scenario, some drives, files, and folders will be retrieved from third-party APIs across multiple Linked Accounts, and the repeated API calls for the same files will consume that third-party instance's rate limits at a higher rate.
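To make the cost of overlapping individual Linked Accounts concrete, here is a minimal sketch in Python. It assumes, as a simplification, that each Linked Account triggers one metadata fetch per file it can access. The function name, user names, and file names are hypothetical and are not part of Merge's API.

```python
from typing import Dict, Set


def count_metadata_fetches(access_by_account: Dict[str, Set[str]]) -> Dict[str, int]:
    """Compare per-account fetches with the number of distinct files.

    Assumption: one metadata fetch per file per Linked Account.
    """
    total_fetches = sum(len(files) for files in access_by_account.values())
    unique_files = len(set().union(*access_by_account.values()))
    return {
        "individual_fetches": total_fetches,
        "unique_files": unique_files,
        "redundant_fetches": total_fetches - unique_files,
    }


# Hypothetical organization: three users linked individually, with overlapping access.
access = {
    "alice": {"handbook.pdf", "roadmap.docx", "budget.xlsx"},
    "bob": {"handbook.pdf", "roadmap.docx"},
    "carol": {"handbook.pdf", "budget.xlsx", "notes.txt"},
}
result = count_metadata_fetches(access)
```

In this toy organization, eight fetches are spent on four distinct files. Under the same assumption, a single admin-level Linked Account with access to all four files would need four.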

Access control lists

An Access Control List is a security feature used in systems (e.g., networking, file systems, databases) to define which users or systems can access specific resources and what actions they can perform on them.

See our guide to learn about File Storage ACL best practices.

Downloading files

To learn how to download files, see the Direct File Download guide.

Merge includes a checksum with each file object to help verify file integrity and authenticity. This is provided as a string that includes the hash type (as supported by the third-party API) and the content_hash. In some cases—such as when a file is moved from a personal to a business account—the hash type may change. When this happens, Merge automatically updates the check_sum to reflect the new hash type and value.

check_sum: {
type: sha256
content_hash: 149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
}
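As an illustration of how the checksum could be used, the sketch below recomputes a hash over downloaded bytes and compares it with the check_sum object. It assumes that content_hash is a plain hexadecimal digest of the full file contents and that the type value names a standard hash algorithm. Some third-party APIs compute content hashes differently, so treat this as a starting point rather than a guaranteed check. The function name and the allowlist are hypothetical.

```python
import hashlib
import hmac

# Assumption: only these standard algorithms are handled in this sketch.
SUPPORTED_HASH_TYPES = {"md5", "sha1", "sha256"}


def verify_checksum(data: bytes, check_sum: dict) -> bool:
    """Return True if the digest of `data` matches check_sum["content_hash"].

    The hash type is read from the object on every call, because it can
    change (for example, when a file moves between account types).
    """
    hash_type = check_sum["type"].lower()
    if hash_type not in SUPPORTED_HASH_TYPES:
        raise ValueError(f"unsupported hash type: {hash_type}")
    digest = hashlib.new(hash_type, data).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(digest, check_sum["content_hash"].lower())
```

Reading the type from each file object, instead of caching it per account, keeps the check valid after Merge updates check_sum with a new hash type.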

Google Drive

To specify the MIME type for a file download, use the mime_types query parameter.

mime_types (String, optional): A comma-separated list of preferred MIME types in priority order. If provided and supported by the third party, the file(s) will be returned in the first supported MIME type from the list. The default MIME type is PDF. For information on supported export formats, please refer to our export format help center article.
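The priority-order behavior can be sketched as follows. This is an illustrative model, not Merge's implementation: build_download_query only formats the mime_types query string, and pick_mime_type mimics the "first supported type wins" rule. Falling back to PDF when nothing in the list is supported is an assumption of this sketch, and both function names are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlencode

DEFAULT_MIME_TYPE = "application/pdf"


def build_download_query(mime_types: List[str]) -> str:
    """Format the mime_types query string, preserving priority order."""
    # Keep "/" and "," unescaped so the value stays readable.
    return urlencode({"mime_types": ",".join(mime_types)}, safe="/,")


def pick_mime_type(preferred: List[str], supported: Iterable[str]) -> str:
    """Return the first preferred type that the third party supports."""
    supported_set = set(supported)
    for mime_type in preferred:
        if mime_type in supported_set:
            return mime_type
    # Assumption: fall back to the documented default when nothing matches.
    return DEFAULT_MIME_TYPE


query = build_download_query(["text/plain", "application/pdf"])
chosen = pick_mime_type(
    ["text/markdown", "text/plain"],
    supported=["application/pdf", "text/plain"],
)
```

Here text/markdown is skipped because the hypothetical third party does not support it, so the file would come back as text/plain.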

SharePoint, OneDrive

To specify the MIME type for a file download, use the mime_types query parameter.

mime_types (String, optional): A comma-separated list of preferred MIME types in priority order. If provided and supported by the third party, the file(s) will be returned in the first supported MIME type from the list. The default MIME type is PDF. For information on supported export formats, please refer to our export format help center article.

Note that Merge only supports the mime_types parameter on our direct download endpoints for SharePoint and OneDrive. The proxy download endpoint does not support this parameter.

Dropbox

Files are downloaded in their original MIME type. Merge does not currently support the ability to specify a different MIME type for downloads.

Box

To download files, users must select “Read and write for admins” in Merge Link.

Files are downloaded in their original MIME type. It is not possible to specify a different MIME type for downloads. See Box’s documentation.

Optimized data transfer

Selective Sync

Selective Sync lets your users select which folders and drives to grant access to. It also speeds up initial syncs by ensuring only relevant data is synced. This can be surfaced to your users as part of the Merge linking flow and be edited at any point.

See our guide to learn more about File Picker Selective Sync.

Syncing data

Merge uses two mechanisms to efficiently sync data from third-party APIs:

  1. Polling third-party APIs on a cadence to capture recurring updates

  2. Receiving near real-time updates via third-party webhooks

These are optimized to sync data as efficiently as possible within the constraints of the third-party system involved. The highest sync frequencies provided by Merge are optimized based on third-party rate limits and performance at scale.

File storage systems are highly dynamic, with constant changes to both content and permissions. Users frequently add, move, edit and delete files, while permissions evolve as employees join organizations and content gets reorganized across folders and drives.

For applications that require near real-time delivery of accurate information, having access to the latest data is critical. This means keeping up with newly created files, excluding recently deleted files, and updating who should no longer have access to a file.

For AI and other near real-time use cases, Merge recommends using the highest sync frequency. For some models and integrations, recurring syncs run every few minutes to poll the latest data, ensuring that your system keeps up with the frequency of updates made by your end users. Consistently polling the third-party API provides the most comprehensive way of capturing all relevant changes. Third-party webhooks, while complementary, may have coverage gaps or be affected by third-party system outages, leading to missed updates.
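The toy model below illustrates why polling complements webhooks. It is not Merge's implementation: the event shape, field names, and function names are all hypothetical. Webhook events update a local index as they arrive, and a later full poll replaces the index with a fresh snapshot, which repairs any state left stale by a missed event.

```python
from typing import Dict, Iterable

FileIndex = Dict[str, dict]


def apply_webhook_event(index: FileIndex, event: dict) -> None:
    """Apply one near real-time change (hypothetical event shape)."""
    if event["action"] == "deleted":
        index.pop(event["file_id"], None)
    else:  # created or updated
        index[event["file_id"]] = event["file"]


def reconcile_with_poll(index: FileIndex, polled_files: Iterable[dict]) -> None:
    """Replace local state with a full polled snapshot."""
    snapshot = {f["id"]: f for f in polled_files}
    index.clear()
    index.update(snapshot)


index: FileIndex = {}
apply_webhook_event(
    index,
    {"action": "created", "file_id": "1", "file": {"id": "1", "name": "a.txt"}},
)
# Suppose the "deleted" webhook for file 1 and the "created" webhook for
# file 2 were never delivered. The next poll still brings the index up to date.
reconcile_with_poll(index, [{"id": "2", "name": "b.txt"}])
```

After the poll, the index contains only file 2: the missed deletion and the missed creation are both reflected without any webhook having been received.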

For most integrations, any creations, edits, or deletions to a file or folder will trigger updates in Merge. Some exceptions are:

Google Drive, SharePoint, OneDrive

  • Most updates occur in near real-time

  • Otherwise, they are captured during periodic syncs

  • Providers batch updates every ~1-2 minutes, so webhooks may not be sent immediately

Dropbox

  • Webhook receiver support is not yet available

  • Changes are captured during periodic syncs

  • Deleted files/folders are detected via a deletion detection process, which runs every other day

Box

  • Root directory (“All Files” folder): Synced every hour (not in real-time).

  • All other folders: Synced in near real-time if webhook receivers are enabled.

  • Reason: Box does not allow webhooks in the root “All Files” level, so real-time updates for root-level changes are not possible. Webhooks in Box are scoped to specific folders or files.

Rate limits

Merge accounts for third-party rate limits when syncing data. These are different from the standardized Merge rate limits (number of API calls per minute that you can request to Merge's API).

Google Drive

Queries: Up to 12,000 queries per 60 seconds per user

SharePoint

Daily Limits: Vary from 1.2 million to 6 million requests, depending on the number of licenses in the plan.

OneDrive

General Limits: Subject to Microsoft Graph’s global limits of 130,000 requests per 10 seconds across all tenants.

Dropbox

Specific rate limits: Not publicly disclosed. Developers are advised to handle rate limiting by monitoring API responses and implementing appropriate backoff strategies.

Box

Monthly Limits: Range from 50,000 to 200,000 API calls, depending on the plan.
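For readers who call rate-limited APIs directly, a backoff strategy like the one recommended for Dropbox above can be sketched as follows. This is a generic exponential backoff and is not how Merge handles rate limits internally. It assumes the API signals throttling with HTTP status 429 and may supply a retry delay in seconds. The request callable, its return shape, and all parameter names are hypothetical.

```python
import random
import time
from typing import Callable, Optional, Tuple

# A request returns (status_code, retry_after_seconds_or_None, body).
Response = Tuple[int, Optional[float], object]


def call_with_backoff(
    request: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Retry `request` while it reports HTTP 429, waiting longer each time."""
    for attempt in range(max_attempts):
        status, retry_after, body = request()
        if status != 429:
            return body
        if attempt < max_attempts - 1:
            # Prefer the server-provided delay; otherwise double each attempt.
            if retry_after is not None:
                delay = retry_after
            else:
                delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

The sleep function is injectable so the retry logic can be exercised in tests without actually waiting.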

Supported and unsupported data nuances

Our integrations aim to sync all file storage data, although some limitations exist due to third-party restrictions.

Google Drive

What Merge syncs:

  • All files and folders the linked user has access to, including My Drive and Shared Drives.

Data that may be missing:

  • Group users: Only accessible to Google Workspace admins.

  • Files shared with a Group: The Google Drive API prevents access beyond the user’s own role, so we surface the “READ” permission.

Box

What Merge syncs:

  • All files and folders the linked user has access to.

  • Box does not use Drives, only Files and Folders.

Data that may be missing:

  • Collections: Not synced, since they are just groupings of existing files under “All Files.”

  • Groups: Returned only if the linked account has admin-level permissions.

Dropbox

What Merge syncs:

  • All files and folders the linked user has access to, whether on a Personal or Business account.

  • Dropbox does not use Drives, only Files and Folders.

Data that may be missing:

  • Group users: Returned only if the linked account has admin-level permissions.

SharePoint

What Merge syncs:

  • All files and folders the linked user has access to within the document libraries of each accessible site.

  • Document libraries are normalized as Drives, named based on the site name + document library name.

Data that may be missing:

  • Pages: Not synced, as they are unrelated to file storage. Site pages are web pages, and Merge already fetches all of the files, folders, and drives surfaced on the site page.

  • Group users: Only email-associated groups return member data.

  • Site groups: Merge does not fetch users from SharePoint site groups. We support Azure AD groups and Microsoft 365 Groups, which are the modern standard for managing SharePoint access. Microsoft is phasing out site groups as they are part of a legacy permission model rarely used in current deployments.

  • Owners/Members/Visitors groups: These are legacy group types for SharePoint and may appear in Merge's datasets without a remote_id for newer SharePoint accounts. Older SharePoint accounts may still use them.

OneDrive

What Merge syncs:

  • All files and folders the linked user has access to in the My Files, Home, and Shared tabs.

  • The OneDrive Drive is the same as the “My Files” tab.

  • Files in the Home tab (recently accessed, even outside of OneDrive) are synced via OneDrive’s /recent endpoint.

Data that may be missing:

  • Quick Access files: The OneDrive API does not provide an endpoint for this.

  • If SharePoint files appear in Quick Access but are missing, users should link their SharePoint account separately.

  • Shared files do not support webhooks or checksums.
