File Storage Expectations

An outline of our File Storage expectations per integration

Written by Amy Leung
Updated over a week ago

We will keep this guide up to date weekly as we continue building towards the GA release of File Storage.

Expectations and Nuances

There are certain nuances with File Storage integrations that we would like to point out for transparency and clarification. We are actively working on abstracting these nuances behind our unified interface.

Authenticating Permissions

  • Google Drive

    • All users should be able to link.

  • Box

    • All users should be able to link, but only Box admins will have access to Group information.

  • Dropbox

    • All users should be able to link, including Dropbox Personal and Dropbox Business accounts.

  • SharePoint

    • Non-admins may require admin consent in order to utilize full capabilities of the integration, such as write permissions. The linking flow provides 3 ways to authenticate. Here is what your customers see when they go through the linking flow for SharePoint:

      • Read-only permissions for non-admin users allow the integration to read only the Files, Folders, and Drives that you have access to. They do not allow the integration to access Groups, upload Files, or create Folders.

        • Note your administrator may still restrict you from consenting. Please contact your SharePoint administrator according to the guide here.

      • Read-only permissions allow the integration to read all the Files, Folders, and Drives in your Sites. They do not allow the integration to upload Files or create Folders.

      • Read & Write permissions allow the integration to read all the Files, Folders, and Drives in your Sites, as well as upload Files and create Folders.

    • If admin consent is necessary, please read the guide here and follow instructions here to ask your SharePoint administrator to grant user consent. Alternatively, you can ask your administrator to link their SharePoint account.

  • OneDrive

    • All users except personal OneDrive accounts should be able to link.
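
The three SharePoint linking options above can be summarized as a small lookup table. This is only an illustrative sketch: the option keys, the `LinkOption` class, and the helper name are hypothetical and are not part of any Merge API. Group access for the two Site-wide options is not stated in this guide, so it is recorded as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LinkOption:
    """Capabilities of one SharePoint linking option, as described in this guide."""
    read_scope: str
    can_upload_files: bool
    can_create_folders: bool
    can_access_groups: Optional[bool]  # None = not stated in this guide


# Hypothetical keys; the descriptions mirror the bullets above.
SHAREPOINT_LINK_OPTIONS = {
    "read_only_non_admin": LinkOption(
        read_scope="Files, Folders, and Drives the linking user has access to",
        can_upload_files=False,
        can_create_folders=False,
        can_access_groups=False,
    ),
    "read_only": LinkOption(
        read_scope="all Files, Folders, and Drives in the user's Sites",
        can_upload_files=False,
        can_create_folders=False,
        can_access_groups=None,
    ),
    "read_write": LinkOption(
        read_scope="all Files, Folders, and Drives in the user's Sites",
        can_upload_files=True,
        can_create_folders=True,
        can_access_groups=None,
    ),
}


def options_supporting_upload() -> List[str]:
    """Return the option keys that allow uploading Files."""
    return [
        name
        for name, option in SHAREPOINT_LINK_OPTIONS.items()
        if option.can_upload_files
    ]
```

A table like this can help decide which option to recommend to an end user before they start the linking flow, for example when your product needs to upload Files.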

Downloading Files

  • Google Drive

    • Google Drive does not allow downloading Google Forms.

    • Google Drive has a 10MB limit on export size. In our experience, this corresponds to roughly a ~200KB Google Spreadsheet file or a ~15MB Google Presentation file.

    • All exportable files (Google Sheets, Google Presentation, Google Docs, Google Drawings) will be exported as PDF. Note that PDF is the export mime type with the smallest export size among the available export types. See Google Drive’s documentation here on export types.

      • We’re actively investigating a way for customers to define what mime type to download the file as.

    • All other files should be downloaded as their original mime type format.

  • Box

    • N/A — Files will be downloaded as their original mime type format. There is no way to define the mime type to download a Box file as. See Box’s documentation here.

    • "Read and Write" authentication in Merge Link is required to be able to download files from Box.

  • Dropbox

    • Dropbox returns most files as binary data. You may need to convert the binary data back into a file of the File’s mime type yourself.

      • We’re investigating a way to always return the data as the File’s mime type, instead of simply passing through Dropbox’s API response.

      • We’re actively investigating a way for customers to define what mime type to download the file as.

  • SharePoint

    • N/A — Files will be downloaded as their original mime type format.

      • We’re actively investigating a way for customers to define what mime type to download the file as.

  • OneDrive

    • N/A — Files will be downloaded as their original mime type format.

      • We’re actively investigating a way for customers to define what mime type to download the file as.
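
The download rules above can be sketched in client code as follows. This is a minimal illustration under stated assumptions, not Merge's implementation: the integration slug `"google_drive"` and both function names are hypothetical. The `application/vnd.google-apps.*` values are Google Drive's own mime types for its native file formats, and `mimetypes` is Python's standard library module.

```python
import mimetypes
from pathlib import Path
from typing import Optional

# Google-native file types that this guide says are exported as PDF.
GOOGLE_EXPORTED_AS_PDF = {
    "application/vnd.google-apps.document",
    "application/vnd.google-apps.spreadsheet",
    "application/vnd.google-apps.presentation",
    "application/vnd.google-apps.drawing",
}

# Google Forms cannot be downloaded.
GOOGLE_NOT_DOWNLOADABLE = {"application/vnd.google-apps.form"}


def expected_download_mime_type(integration: str, file_mime_type: str) -> Optional[str]:
    """Return the mime type a download should arrive as, or None if not downloadable.

    `integration` is a hypothetical slug such as "google_drive" or "box".
    """
    if integration == "google_drive":
        if file_mime_type in GOOGLE_NOT_DOWNLOADABLE:
            return None
        if file_mime_type in GOOGLE_EXPORTED_AS_PDF:
            return "application/pdf"
    # Every other file is downloaded in its original mime type format.
    return file_mime_type


def save_download(data: bytes, mime_type: str, stem: str, directory: Path) -> Path:
    """Write raw downloaded bytes to disk with an extension matching the mime type."""
    # Fall back to ".bin" when the mime type has no known extension.
    extension = mimetypes.guess_extension(mime_type) or ".bin"
    target = directory / f"{stem}{extension}"
    target.write_bytes(data)
    return target
```

For a Dropbox download, for example, the raw bytes from the response could be passed to `save_download` together with the File's mime type, so that the saved file opens in the expected application.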

Sync Times

  • Periodic syncs — Accounts can take hours, if not days, to fully sync. We’ve seen it range from a couple of minutes to 2-4 days.

  • Capturing real-time updates via webhook receivers — For most integrations, any creations, edits, or deletions to a file or folder will trigger updates in Merge. Some exceptions are:

    • Box

      • Not all updates will be in near real-time. Changes in the root directory (aka “All Files” folder) will be synced every hour, while other changes will be synced in near real-time if webhook receivers are enabled.

      • This is because Box does not allow developers to create webhooks on the root “All Files” folder, so we cannot capture real-time changes when someone adds, changes, or deletes a folder or file there.

    • Dropbox

      • Updates will not be in near real-time. We do not have webhook receiver support for Dropbox yet, so changes will be captured during periodic syncs.

      • Deleted files/folders will be detected via the deletion detection feature, which runs every other day.

    • OneDrive/SharePoint/Google Drive

      • Most updates should be in near real-time. Otherwise, they will be captured by periodic syncs. This spreadsheet here shows sync times for edge cases.

      • Note that these 3rd-party providers typically batch their updates every ~1-2 minutes, so they may not emit a webhook receiver payload instantaneously after the user makes a change.

  • In Q1 2024, to speed up syncs, we’re actively working on:

    • Introducing selective sync, so that time is not wasted syncing irrelevant data.

    • Skipping the 3rd-party API calls that retrieve a File’s Permissions when that scope is turned off, which we hope will speed up syncs by 2-3x.
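
The per-integration sync behavior above can be condensed into a small decision helper, which may be useful when setting end-user expectations about how fresh the data is. This is a simplified sketch: the integration slugs, the `UpdateChannel` enum, and the function name are hypothetical, and the edge cases tracked in the spreadsheet mentioned above are not modeled.

```python
from enum import Enum


class UpdateChannel(Enum):
    NEAR_REAL_TIME = "near real-time via webhook receivers"
    HOURLY_SYNC = "hourly sync"
    PERIODIC_SYNC = "next periodic sync"
    DELETION_DETECTION = "deletion detection, every other day"


def expected_update_channel(
    integration: str,
    *,
    in_box_root: bool = False,
    is_deletion: bool = False,
    webhooks_enabled: bool = True,
) -> UpdateChannel:
    """Return how a change is expected to reach Merge, per the rules in this guide."""
    if integration == "dropbox":
        # No webhook receiver support yet; deletions rely on deletion detection.
        if is_deletion:
            return UpdateChannel.DELETION_DETECTION
        return UpdateChannel.PERIODIC_SYNC
    if integration == "box" and in_box_root:
        # Box does not allow webhooks on the root "All Files" folder.
        return UpdateChannel.HOURLY_SYNC
    if not webhooks_enabled:
        return UpdateChannel.PERIODIC_SYNC
    return UpdateChannel.NEAR_REAL_TIME
```

Note that even on the near real-time path, the 3rd-party providers typically batch updates every ~1-2 minutes, so a change may not arrive instantly.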

Nuances in missing data

Our integrations aim to sync all File Storage data. There are some caveats due to 3rd-party limitations.

  • Google Drive

    • What we sync:

      • Our Google Drive integration should sync all file & folder data that the linked user has access to. This includes folders & files in “My Drive” and “Shared Drives”.

    • Data that may be missing:

      • Group users — This requires a different set of permissions that only Google Workspace admins can access. We’re actively working on this in Q1 2024.

      • File permissions shared with a Group that the linked user belongs to, when that Group has view permissions only — The Google Drive API omits these to respect permissions. For such files, we still surface the linked end user’s Permission with a role of “READ”.

      • File file_thumbnail_url: We’ve backlogged this, so please reach out if this is important!

  • Dropbox

    • What we sync:

      • Our Dropbox integration should sync all file & folder data that the linked user has access to, regardless of whether it’s a Dropbox Personal or Dropbox Business account.

      • Note that Dropbox does not have the concept of Drives, just Files & Folders.

    • Data that may be missing:

      • Group users — This requires a different set of permissions that only admins have access to. We’re actively working on this in Q1 2024.

      • File file_thumbnail_url: We’ve backlogged this, so please reach out if this is important!

  • Box

    • What we sync:

      • Our Box integration should sync all file & folder data that the linked user has access to.

      • Note that Box does not have the concept of Drives, just Files & Folders.

      • The integration does not sync collections. These are simply groups of files that already exist under “All Files”.

    • Data that may be missing:

      • Group users — You may see this populated for some linked accounts, but not all. This is because only Box admins are allowed to retrieve this info.

      • File file_thumbnail_url: We’ve backlogged this, so please reach out if this is important!

  • SharePoint

    • What we sync:

      • Our SharePoint integration should sync all file & folder data that the linked user has access to. Specifically, we sync all document library data in each site that the user has access to.

      • Note that we normalize document libraries into Drives, since that is how SharePoint denotes their drives. The name of the Drive is an appended string of the site name and the document library name.

      • Our SharePoint integration does NOT sync pages, because this seems to be unrelated to File Storage.

    • Data that may be missing:

      • Group users — You may see this populated for some Groups, but not all. This is because:

        • SharePoint has 2 types of groups:

          • SharePoint siteGroups (e.g., "My Communicate Site Owners")

          • Groups associated with the site's email (e.g., "[email protected]").

        • SharePoint’s Graph API only returns member info on the latter type of groups, not SharePoint siteGroups.

        • Fortunately, during QA testing, we found that the latter type of group represents all the users who have access to the site, regardless of whether they're a "Visitor", "Member", or "Owner".

      • File file_thumbnail_url: We’ve backlogged this, so please reach out if this is important!

  • OneDrive

    • What we sync:

      • Our OneDrive integration should sync all file & folder data that the linked user has access to. Specifically, we sync data in the “My Files”, “Home”, and “Shared” tabs.

      • Note that the Drive named “OneDrive” is the same as the “My Files” tab. This name is mapped from the name that the OneDrive API returns to us.

      • Files in the “Home” tab are files that the user recently accessed. These may exist outside of OneDrive completely, but we still sync these via OneDrive’s /recent endpoint here.

      • The integration does not sync files under the “Quick Access” tab, since OneDrive API does not have an endpoint that returns those items. End users may ask why SharePoint files under the “Quick Access” tab are not synced. We recommend linking their SharePoint account via the SharePoint authentication flow to sync such files.

    • Data that may be missing:

      • File file_thumbnail_url: We’ve backlogged this, so please reach out if this is important!
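
Because `file_thumbnail_url` may be missing for every integration listed above, client code should not assume it is populated. The sketch below shows one defensive way to read it. It assumes the File record is available as a dictionary, and the helper name is hypothetical; only the `file_thumbnail_url` field name comes from this guide.

```python
from typing import Any, Mapping, Optional


def thumbnail_url_or(
    file_record: Mapping[str, Any], fallback: Optional[str] = None
) -> Optional[str]:
    """Return the File's thumbnail URL, or `fallback` when it is absent or empty."""
    url = file_record.get("file_thumbnail_url")
    return url if url else fallback
```

A caller could pass the URL of a generic placeholder icon as `fallback`, so that the UI renders consistently whether or not a thumbnail was synced.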
