Data structure
Files and Permissions
Google Drive organizes files into Drives, which contain folders and files. Files and folders are accessible to users and groups through permissions (sometimes called Access Control Lists or ACLs).
Files belonging to shared drives are expected to have the "drive_id" field populated with the drive the file originated from. This is not the case for files shared directly with the authenticating user. For those files, drive_id is expected to be null, because the user does not have access to the drive the shared file originated from.
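A minimal sketch of how a consumer might act on this rule. It assumes each file record is a dict that exposes the originating drive under a "drive_id" key, as the text above describes. The helper name partition_by_drive is hypothetical, and the key name in your actual API response may differ.

```python
from typing import Any, Dict, List, Tuple

FileRecord = Dict[str, Any]


def partition_by_drive(files: List[FileRecord]) -> Tuple[List[FileRecord], List[FileRecord]]:
    """Split file records into (with_drive, without_drive).

    Per the rule above, files from shared drives carry a drive_id, while
    files shared directly with the authenticating user are expected to have
    a null drive_id. The "drive_id" key name is an assumption.
    """
    with_drive: List[FileRecord] = []
    without_drive: List[FileRecord] = []
    for record in files:
        if record.get("drive_id") is None:
            without_drive.append(record)
        else:
            with_drive.append(record)
    return with_drive, without_drive
```

Keeping the two groups apart lets an application avoid looking up a drive it will never be able to fetch.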
Checksums
The file checksum object stores a checksum value generated by Google Drive, providing a way to confirm whether a file's contents have changed. This field is mapped from the Google API, not generated by Merge.
This field is only expected to be populated for non-native Google Drive files, such as Excel spreadsheets, Word documents, PDFs, and other file types imported into Google Drive. Google-native files (i.e., Docs, Sheets, Slides, etc.) will not have this field populated, because Google does not generate checksum values for those file types.
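A minimal sketch of change detection with the checksum field, assuming two snapshots of the same file are available as dicts with a "checksum" key. The key name and the {"type", "value"} shape used in the usage example are assumptions for illustration. The function returns None rather than guessing when either snapshot lacks a checksum, which is the expected case for Google-native files.

```python
from typing import Any, Dict, Optional


def contents_changed(previous: Dict[str, Any], current: Dict[str, Any]) -> Optional[bool]:
    """Compare two snapshots of one file by their checksum objects.

    Returns True or False when both snapshots carry a checksum, and None
    when either is missing one (e.g. Google Docs, Sheets, Slides), in
    which case a different signal is needed to detect changes.
    """
    old = previous.get("checksum")
    new = current.get("checksum")
    if not old or not new:
        return None  # no checksum to compare; fall back to another signal
    return old != new
```

For example, two PDF snapshots with different checksum values yield True, while a Google Doc snapshot with a null checksum yields None.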
Instead, Google uses its revision model to store a running history of each file. In effect, the "id" of the most recent revision object can serve as a stand-in for a generated checksum value. Merge does not currently integrate with this endpoint. If capturing the revision model is important to your use case, we recommend using a passthrough request to fetch this data.
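A sketch of the passthrough approach, under several assumptions. The request body uses "method", "path", and "request_format" fields. The path assumes the passthrough base URL resolves to https://www.googleapis.com, so the Drive v3 prefix is a parameter you may need to change. The response handling assumes Google's revisions.list shape ({"revisions": [{"id", "modifiedTime"}, ...]}). The helper names are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, Optional
from urllib.parse import quote


def build_revisions_passthrough(file_id: str, api_prefix: str = "/drive/v3") -> Dict[str, Any]:
    """Body for a passthrough request that lists a file's revisions.

    api_prefix is an assumption about how the Google base URL is
    configured for passthrough; adjust it if the request returns 404.
    """
    return {
        "method": "GET",
        "path": f"{api_prefix}/files/{quote(file_id, safe='')}/revisions",
        "request_format": "JSON",
    }


def latest_revision_id(revision_list: Dict[str, Any]) -> Optional[str]:
    """Return the id of the most recently modified revision, or None.

    Only inspects the revisions on this page; follow nextPageToken to
    see every revision of a long-lived file.
    """
    revisions = revision_list.get("revisions") or []
    if not revisions:
        return None

    def modified(rev: Dict[str, Any]) -> datetime:
        # Drive returns RFC 3339 timestamps like "2024-05-01T12:00:00.000Z";
        # swap the trailing Z so fromisoformat also works before Python 3.11.
        return datetime.fromisoformat(rev["modifiedTime"].replace("Z", "+00:00"))

    return max(revisions, key=modified)["id"]
```

Storing the returned id next to each native file and comparing it on the next sync gives the checksum-like signal described above.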
Users
Users are individuals with access to the Google Drive instance. They can be associated through Groups and granted access to files, folders, and drives at the group level.
Users can also be associated with a domain (e.g. "@merge.dev"). Google Drive supports whitelisting external domains, so an instance can grant domain-level access to more than one organization. For example, the "Alphabet" domain may choose to whitelist the "@youtube.com" domain in its Google Drive instance.
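To illustrate how user, group, and domain permissions combine, here is a minimal access check. It uses the field names of the Google Drive API's permission resource (type of "user", "group", "domain", or "anyone", plus emailAddress and domain). Merge's normalized permission model may name these differently. The sketch ignores roles and nested groups, and has_access is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List


def email_domain(email: str) -> str:
    """Domain part of an email address, lowercased."""
    return email.rsplit("@", 1)[-1].lower()


def has_access(user_email: str, group_emails: Iterable[str], permissions: List[Dict[str, Any]]) -> bool:
    """True if any permission reaches the user directly, via one of their
    groups, via their email domain, or via an "anyone" grant."""
    user = user_email.lower()
    groups = {g.lower() for g in group_emails}
    domain = email_domain(user)
    for perm in permissions:
        kind = perm.get("type")
        target_email = (perm.get("emailAddress") or "").lower()
        if kind == "anyone":
            return True
        if kind == "user" and target_email == user:
            return True
        if kind == "group" and target_email in groups:
            return True
        if kind == "domain" and (perm.get("domain") or "").lower() == domain:
            return True
    return False
```

With a file shared to the alphabet.com domain and to a partners group, an alphabet.com user passes through the domain grant, and an external youtube.com user passes only if they belong to that group.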
Ingestion
Sync cadence
Merge polls the Google Drive API for updates to all common models at regular intervals, as defined in this table. The sync interval is constrained by the rate limits Google enforces on its APIs. Merge always uses timestamp filtering to keep polling as efficient as possible.
If you believe the highest sync frequency listed in the table above is insufficient for your use case, please request a call with our Sales Team to discuss a custom sync frequency.
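Timestamp filtering can be mirrored on the consumer side when you pull from a list endpoint. The sketch below fetches only records modified since the last sync and follows cursor pagination. fetch_page is a hypothetical callable that wraps your HTTP call. The "modified_after", "cursor", "results", and "next" names are assumptions modeled on common cursor-paginated list APIs.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional, Tuple

Page = Dict[str, Any]  # assumed shape: {"results": [...], "next": cursor or None}


def poll_since(
    fetch_page: Callable[[Dict[str, str]], Page],
    last_synced_at: Optional[datetime],
) -> Tuple[List[Dict[str, Any]], datetime]:
    """Fetch every record modified after last_synced_at, following cursors.

    Returns the records plus the watermark to pass on the next poll.
    """
    # Take the next watermark before fetching so anything modified while
    # this poll runs is picked up again next time (store upserts idempotently).
    started_at = datetime.now(timezone.utc)
    params: Dict[str, str] = {}
    if last_synced_at is not None:
        params["modified_after"] = last_synced_at.isoformat()
    records: List[Dict[str, Any]] = []
    while True:
        page = fetch_page(dict(params))
        records.extend(page.get("results", []))
        cursor = page.get("next")
        if not cursor:
            return records, started_at
        params["cursor"] = cursor
```

The watermark trades a few duplicate records for never missing an update that lands mid-poll.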
Webhooks
In addition to regular polling, Merge leverages webhooks to capture common model updates from Google Drive. These webhooks, where supported, allow for real-time updates to files, folders, drives, and permissions.
For most use cases, webhooks alone are not broad or reliable enough to guarantee data accuracy. Polling should always set the minimum data freshness benchmark for your application.
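One way to combine the two sources is to route both webhook payloads and polled records through a single "newest wins" upsert, so a late or duplicate delivery cannot roll a record back. This is a minimal sketch over an in-memory dict. The "id" and "modified_at" field names are assumptions, and upsert_if_newer is a hypothetical helper.

```python
from datetime import datetime
from typing import Any, Dict

Store = Dict[str, Dict[str, Any]]


def _parse(ts: str) -> datetime:
    # Accept a trailing "Z" on Python versions before 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def upsert_if_newer(store: Store, record: Dict[str, Any]) -> bool:
    """Insert or replace a record unless the stored copy is at least as new.

    Call this from both the webhook handler and the poller. Returns True
    if the store changed.
    """
    existing = store.get(record["id"])
    if existing is not None and _parse(existing["modified_at"]) >= _parse(record["modified_at"]):
        return False  # stale or duplicate delivery; keep what we have
    store[record["id"]] = record
    return True
```

Because the poller eventually delivers every change, a missed webhook only delays an update instead of losing it.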
MIME Types
Google supports specifying a MIME type parameter when downloading file contents. Merge allows passing this parameter on both our /download and /direct-download endpoints. For supported mime_type values, see this guide.
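A sketch of building a download URL with a mime_type query parameter. The base URL and the /files/{id}/download layout are assumptions, and the layout of the /direct-download endpoint is not shown here, so check the API reference before relying on either. The export mapping lists Office formats Google can convert its native types to. Whether Merge accepts each value is governed by the supported mime_type guide.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Office export targets for Google-native types (Google Drive export formats).
EXPORT_TO_OFFICE = {
    "application/vnd.google-apps.document":
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.google-apps.spreadsheet":
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.google-apps.presentation":
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}

BASE_URL = "https://api.merge.dev/api/filestorage/v1"  # assumed base URL


def download_url(file_id: str, mime_type: Optional[str] = None) -> str:
    """URL for the /download endpoint, optionally requesting a MIME type."""
    url = f"{BASE_URL}/files/{quote(file_id, safe='')}/download"
    if mime_type:
        # urlencode percent-encodes the "/" inside the MIME type.
        url += "?" + urlencode({"mime_type": mime_type})
    return url
```

For a Google Doc, download_url(file_id, EXPORT_TO_OFFICE["application/vnd.google-apps.document"]) would request a .docx export.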
Authentication
Supported authentication types
Merge supports four authentication types for Google Drive. For a description of the scopes requested by each type, refer to this guide.
Admin read & write
Merge will launch a Google OAuth flow, which will prompt your customer for their username and password.
The authenticating user must be a Google Drive administrator to successfully link using this method.
Merge will have access to all shared drives and folders.
This authentication method is only recommended if your application needs to POST files back to Google Drive.
Admin read only
Merge will launch a Google OAuth flow, which will prompt your customer for their username and password.
The authenticating user must be a Google Drive administrator to successfully link using this method.
This is the most commonly used authentication method, as it grants Merge a single set of credentials with access to all shared drives and folders.
Non-admin read only
Merge will launch a Google OAuth flow, which will prompt your customer for their username and password.
The authenticating user can be anyone with valid Google Drive login credentials.
Merge will only have access to the authenticated user's personal drive, as well as shared drives and files explicitly shared with them.
Super admin
Your customer will create a service account with the Super Admin role and generate a client ID and client secret to provision access for Merge.
This method grants access to all shared and personal drives.
This is only recommended when your application requires access to all files in the Google Drive instance, including private files.
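The recommendations above can be condensed into a small decision helper. This is an illustrative sketch, not an official rule. The source does not say whether Super admin supports writes, so the sketch does not check that case. The enum and function names are hypothetical.

```python
from enum import Enum


class AuthType(Enum):
    ADMIN_READ_WRITE = "Admin read & write"
    ADMIN_READ_ONLY = "Admin read only"
    NON_ADMIN_READ_ONLY = "Non-admin read only"
    SUPER_ADMIN = "Super admin"


def recommend_auth_type(needs_all_files: bool, needs_write: bool, user_is_admin: bool) -> AuthType:
    """Pick an authentication type following the recommendations above."""
    if needs_all_files:
        # Only the service-account flow reaches every drive, including private files.
        return AuthType.SUPER_ADMIN
    if needs_write:
        if not user_is_admin:
            raise ValueError("Admin read & write requires a Google Drive administrator")
        return AuthType.ADMIN_READ_WRITE
    if user_is_admin:
        # Most common choice: one credential covering all shared drives and folders.
        return AuthType.ADMIN_READ_ONLY
    # Personal drive plus shared drives and files explicitly shared with the user.
    return AuthType.NON_ADMIN_READ_ONLY
```

Ordering the checks from broadest to narrowest access keeps each branch aligned with one bullet list above.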
Authentication errors
When authenticating, Merge validates that the provided credentials have the requested access level. The possible failure modes are listed below.
403 on GET /files: the provided credentials are not valid, and the user is blocked from connecting.
403 on GET /users, GET /groups, or both: expected for a non-admin connection. If the requested access type is Admin read only, Admin read & write, or Super admin, the connection will be blocked.
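The two failure modes above can be expressed as a small check over the HTTP status codes of the validation calls. This sketch only encodes the rules listed above and treats any other outcome as passing. The access-type strings and the function name are hypothetical.

```python
from typing import Optional, Tuple

# Access types that require admin-level visibility into users and groups.
ADMIN_TYPES = {"admin_read_write", "admin_read_only", "super_admin"}


def check_connection(
    requested: str, files_status: int, users_status: int, groups_status: int
) -> Tuple[bool, Optional[str]]:
    """Return (allowed, reason) for a connection attempt."""
    if files_status == 403:
        return False, "403 on GET /files: the provided credentials are not valid"
    if 403 in (users_status, groups_status):
        if requested in ADMIN_TYPES:
            return False, "403 on GET /users or /groups: admin access was requested"
        return True, None  # expected for a non-admin connection
    return True, None  # outcomes outside the rules above are not checked here
```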