Throughout this guide, the term 'Files' will be used as a catch-all for all file-related Salesforce objects that GRAX supports. Salesforce itself uses a variety of different, and sometimes confusing, names, such as Chatter Files, Content, Attachment, etc. For now, we'll stick with 'Files' to represent the overarching category, and any specific objects within this category will be called out.
The complexity with Files is that GRAX not only needs to back up the data (meta information such as Name, CreatedDate, SystemModstamp, custom fields, etc.) but must also download the actual binary and upload it to your storage provider.
GRAX currently supports 3 main "File" objects:
Content: this is not an object name, but instead represents 3 specific objects that are typically dealt with together:
The information here applies to GRAX Versions 3.50 and above. If you are still using a version below 3.50, please reach out to GRAX Support to upgrade or to understand how behavior may differ in these prior versions.
Backing up Files with GRAX requires you to create a separate job that only includes one or more File-related objects. Use the Backup Type dropdown selector for this.
Where are the rest of my "Content" objects?
For Content objects, please use the provided ContentDocument object, which will allow you to filter on ContentDocument and will automatically back up the latest ContentVersion (represents the binary) as well as all related ContentDocumentLinks.
ContentNote is automatically included as part of your ContentDocument query. These are what Salesforce refers to as 'enhanced notes'. You may or may not have them enabled in your environment.
It is critical to read through the recommendations here before kicking off any File backups, as there are various Salesforce gotchas and considerations you will want to confirm.
File Backup Considerations and Recommendations
GRAX will use 1 REST API call per File in order to download the binary.
File backups will be slower than data object backups because, in addition to the "data" portion of the record, GRAX must individually download and upload each binary to storage.
Given the above considerations on API usage and speed, GRAX recommends limiting the number of File records you back up per job. Chunk up your backup jobs in ranges by an indexed audit date field such as SystemModstamp. One easy rule of thumb is to start by running 1 job at a time that includes a max of 500,000 records across the selected File objects. Depending on your API credit availability, the right number may differ for each customer, but 500,000 records is the recommended max per job.
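As a rough illustration of the chunking recommendation, the Python sketch below splits a date range into non-overlapping SystemModstamp windows and builds a SOQL-style date filter for each one. The function names, the 30-day step, and the sample dates are assumptions made for this example; GRAX jobs are configured in the GRAX UI, and you would still need to confirm that each window stays under the 500,000-record guideline for your org.

```python
from datetime import datetime, timedelta, timezone

MAX_RECORDS_PER_JOB = 500_000  # recommended ceiling from the guidance above


def modstamp_windows(start, end, step_days):
    """Yield (window_start, window_end) pairs covering [start, end) with no overlap."""
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=step_days), end)
        yield cursor, upper
        cursor = upper


def soql_filter(window_start, window_end):
    """Build a SOQL-style date filter on SystemModstamp for one window (illustrative)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (
        f"SystemModstamp >= {window_start.strftime(fmt)} "
        f"AND SystemModstamp < {window_end.strftime(fmt)}"
    )


def binary_download_calls(file_count):
    """One REST API call per file binary, per the consideration above."""
    return file_count


# Hypothetical range: first quarter of 2023, split into 30-day jobs.
start = datetime(2023, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 4, 1, tzinfo=timezone.utc)
windows = list(modstamp_windows(start, end, step_days=30))
filters = [soql_filter(lo, hi) for lo, hi in windows]

for f in filters:
    print(f)
```

If a single window turns out to hold more records than the guideline, shrinking `step_days` for that range is the simplest adjustment.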
Ensure all relevant GRAX users have the Query All Files Salesforce permission; otherwise only files related to that user will be backed up, which could result in data integrity issues. Click here for more on GRAX permissions.
When clicking on the Summary Link for a Files type backup, you will notice some additional columns. The first three columns represent the same information as always: how many records were provided by Salesforce versus how many were inserted/updated into GRAX. With Files, however, you need to remember that there is also an actual file binary that must be downloaded from Salesforce and uploaded to your specified storage provider. The last two columns represent this information. Additionally, you will see information about how many REST API calls were used.
Binaries Processed shows how many file binaries were successfully downloaded and subsequently uploaded to storage. Note that this number may not always exactly match the number that were queried from Salesforce as GRAX will only process binaries if it does not already have the latest version.
Unable to Process represents files that GRAX was not able to successfully process, likely due to either the file having been deleted out of Salesforce before GRAX could download, or some other issue such as a corrupted file or another error in the download/upload process that could not be resolved after GRAX retries.
When you select ContentDocument in a backup, as mentioned above, GRAX will automatically query all related ContentDocumentLinks as well as the latest ContentVersion. On the Summary page, you will only see a number for Binaries Processed in the row for
When you upload files, ContentVersion is the object that contains the actual binary that GRAX will download and upload to your storage bucket. Salesforce determines the file type: CSV, EXCEL, LINK, PDF, PNG, TEXT, WORD, ZIP, or UNKNOWN. The FileType can be LINK when you contribute a web link via the Libraries feature. In this case, there is nothing to actually download: it is a 0 byte file whose title is simply the URL, and Salesforce does not offer anything to download. The implications of this:
- You may see errors in the backup summary CSV with the error message "Skipping file with 0 bytes". These web links and any other 0 byte files will indeed be skipped. The field information will be captured, but there is no binary to download, so that piece is skipped.
- GRAX does not currently support restoring these 0 byte files.
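The skip behavior described above can be pictured with a small Python sketch. This is not GRAX's implementation; it only assumes that each record is a dictionary carrying the standard ContentVersion fields FileType and ContentSize, and the sample records are made up for illustration.

```python
def has_downloadable_binary(content_version):
    """Return False for web links and other 0-byte files, which have no binary."""
    is_link = content_version.get("FileType") == "LINK"
    is_empty = (content_version.get("ContentSize") or 0) == 0
    return not (is_link or is_empty)


# Hypothetical ContentVersion records.
records = [
    {"Title": "Pricing sheet", "FileType": "PDF", "ContentSize": 48213},
    {"Title": "https://example.com", "FileType": "LINK", "ContentSize": 0},
    {"Title": "Empty note", "FileType": "TEXT", "ContentSize": 0},
]

to_download = [r["Title"] for r in records if has_downloadable_binary(r)]
skipped = [r["Title"] for r in records if not has_downloadable_binary(r)]

print("download:", to_download)
print("skipped (field data only):", skipped)
```

In this sketch the field data for skipped records would still be captured; only the binary step is bypassed.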
Much of the same information above applies whenever you include any File objects within a GRAX Archive. After all, GRAX does still need to back up all the objects first, and then can start deleting the data.
However, with Archives, given that there is a single root/parent object that must be selected, as well as children within the hierarchy where File objects could appear many times, let's take a look at some of the different scenarios.
If you select a standard "non-file" root/parent object such as Account, there are now 2 key objects that show up in the hierarchy: ContentDocumentLink. You won't see ContentVersion in the hierarchy, because ContentDocumentLink in the hierarchy view serves as a proxy for the relevant 'content' objects.
Note on ContentDocumentLinks
Once GRAX knows all the ContentDocuments that are related to the parent object, GRAX will additionally back up all related ContentDocumentLinks. Even though some of these ContentDocumentLinks may not be related to the parent object selected, GRAX will back them up because, when the ContentDocument is deleted, Salesforce will automatically delete all related ContentDocumentLinks.
Note: this may not be true in all GRAX versions, so if this is an important use case for you please confirm with GRAX Support.
Another option you have is to select ContentDocument as the root element if you are only interested in archiving Content and children, rather than an object such as Account along with children (that includes many objects).
GRAX recommends always using ContentDocument as the root element, even if your intention is to only archive ContentVersion, due to the complex relationship between these objects. GRAX will only back up the most recent ContentVersion, so be aware that you can lose previous versions.
If you select Attachment as the root object in a hierarchy process, you will notice that the hierarchy view is a bit different. The Attachment object doesn't have any children, as you probably know, but GRAX will expose a list of objects to allow for more effective querying. So you could query all attachments that are linked to any of the objects that you check off. This can be a flexible tool, and would allow you to run an archive saying something like "give me all attachments modified in the past month that are linked to Accounts or Cases".
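To make the "attachments modified in the past month that are linked to Accounts or Cases" example concrete, the sketch below builds a roughly equivalent SOQL query string in Python. This is only an illustration of the idea, not the query GRAX generates; the helper name is hypothetical, and "past month" is approximated here as the last 30 days.

```python
def attachment_query(parent_types, days):
    """Build an illustrative SOQL query for Attachments by parent type and recency."""
    quoted = ", ".join(f"'{t}'" for t in parent_types)
    return (
        "SELECT Id, Name, ParentId FROM Attachment "
        f"WHERE LastModifiedDate = LAST_N_DAYS:{days} "
        f"AND Parent.Type IN ({quoted})"
    )


query = attachment_query(["Account", "Case"], 30)
print(query)
```

In GRAX itself you would express the same intent by checking off the parent objects in the hierarchy view and setting a date filter.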
Let's take a look at some different ways to restore Attachments and Content. Content, behind the scenes, represents a much more complex set of interconnected objects, so you will need to ensure this is done by an Admin who understands the relationships and business use case.
Attachments can be restored similarly to other objects, and the process is simpler given that each Attachment can only relate to a single parent record.
Okay, so attachments sound straightforward. Let’s try the same thing but for the files object. In the developer console, search for the Files or Content object. You’ll notice that it doesn’t exist. The files/content object is a virtualized object - it is created in the user interface on demand when the user needs it. But the data must exist somewhere - let’s look at the data structure that the files object references. There are 3 core objects:
- ContentDocument - This is the core object that mimics the attachment object but it only stores the metadata about the document object
- ContentDocumentLink - This object stores the link between the Content Document object itself and the Salesforce Record it is attached to
- ContentVersion - This object stores the Base64 Content. Files can be updated, so you can store multiple versions of a document and retrieve the correct version while maintaining the history. This is very useful for documents you share with partners or customers such as Customer Support Resources or Marketing Assets. By default GRAX will back up the latest version.
After inspecting the objects above we can infer the following data model:
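As a rough approximation of those relationships, the Python sketch below models the three objects with a handful of their standard fields and shows how a parent record reaches its files only indirectly, through ContentDocumentLink. The Id values and the helper function are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContentDocument:
    Id: str
    Title: str
    LatestPublishedVersionId: str  # points at the newest ContentVersion


@dataclass
class ContentVersion:
    Id: str
    ContentDocumentId: str  # many versions belong to one document
    VersionNumber: str
    IsLatest: bool


@dataclass
class ContentDocumentLink:
    Id: str
    ContentDocumentId: str  # the document being shared
    LinkedEntityId: str     # the record (Account, Case, User, ...) it is shared with


def latest_versions_for_record(record_id, links, versions):
    """Follow record -> ContentDocumentLink -> ContentDocument -> latest ContentVersion."""
    doc_ids = {l.ContentDocumentId for l in links if l.LinkedEntityId == record_id}
    return [v for v in versions if v.ContentDocumentId in doc_ids and v.IsLatest]


# Hypothetical sample data: one document with two versions, linked to an Account and a Case.
doc = ContentDocument(Id="doc-1", Title="Contract", LatestPublishedVersionId="ver-2")
versions = [
    ContentVersion(Id="ver-1", ContentDocumentId="doc-1", VersionNumber="1", IsLatest=False),
    ContentVersion(Id="ver-2", ContentDocumentId="doc-1", VersionNumber="2", IsLatest=True),
]
links = [
    ContentDocumentLink(Id="link-1", ContentDocumentId="doc-1", LinkedEntityId="account-1"),
    ContentDocumentLink(Id="link-2", ContentDocumentId="doc-1", LinkedEntityId="case-1"),
]

found = latest_versions_for_record("account-1", links, versions)
print([v.Id for v in found])
```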
Given the complex relationship structure, the easiest way to restore a "file" is to search for the parent record and ensure you are restoring children along with it. So I could search for a specific Account and restore children, ensuring that everything just works and the ContentVersion, ContentDocumentLink, and ContentDocument are created.
Salesforce automatically creates the ContentDocument
You cannot manually create a ContentDocument via the API; Salesforce creates it automatically. Thus, in certain scenarios, you may notice an error in the restore logs for the ContentDocument object. As long as the ContentVersion and ContentDocumentLink succeeded and everything is linked to the parent record, you can rest assured the file is back in Salesforce.
You can search for a particular ContentDocument or ContentVersion as well.
If you restore using the option to restore children, then the general sequence of events for the restore will be:
- Restore ContentVersion
- Salesforce auto-creates the ContentDocument
- Create relevant ContentDocumentLinks (could point to users or other objects)
- If one of the ContentDocumentLinks pointed to the File along with a Case, for example, attempt to insert the Case, and then we can create the ContentDocumentLink that links the Case with the ContentDocument.
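The ordering in the list above can be walked through with a small in-memory simulation. `FakeOrg` and `restore_file` are hypothetical names invented for this sketch; nothing here calls Salesforce or reflects GRAX internals. It only shows why the ContentVersion comes first, why the ContentDocument appears without being inserted directly, and why a missing parent such as a Case has to exist before its ContentDocumentLink.

```python
import itertools


class FakeOrg:
    """In-memory stand-in for a Salesforce org; not a real API client."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.records = {}  # generated id -> (object name, fields)
        self.log = []      # object names in the order they were created

    def insert(self, object_name, fields):
        new_id = f"{object_name}-{next(self._ids)}"
        self.records[new_id] = (object_name, dict(fields))
        self.log.append(object_name)
        return new_id

    def insert_content_version(self, title):
        """Insert a ContentVersion; the ContentDocument is created as a side effect."""
        version_id = self.insert("ContentVersion", {"Title": title})
        doc_id = self.insert("ContentDocument", {"Title": title})
        self.records[version_id][1]["ContentDocumentId"] = doc_id
        return version_id, doc_id


def restore_file(org, title, linked_parents):
    """linked_parents: list of (object name, fields, existing id or None)."""
    _, doc_id = org.insert_content_version(title)
    for object_name, fields, existing_id in linked_parents:
        # A parent that no longer exists must be restored before it can be linked.
        parent_id = existing_id or org.insert(object_name, fields)
        org.insert(
            "ContentDocumentLink",
            {"ContentDocumentId": doc_id, "LinkedEntityId": parent_id},
        )
    return doc_id


org = FakeOrg()
restore_file(
    org,
    "Contract.pdf",
    [
        ("User", {}, "user-existing"),            # owner still exists in the org
        ("Case", {"Subject": "Renewal"}, None),   # Case was deleted and must be re-inserted
    ],
)
print(org.log)
```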
Fields on ContentVersion
The only field GRAX actively restores on ContentVersion is Title. Most other fields, such as FileType, are auto-generated by Salesforce. This means the Description field and any custom fields on ContentVersion are not restored.
If you know a specific ContentDocumentLink you can individually restore this as well via the Search tab, but note this is not directly related to the ContentVersion, so we don't recommend this unless you already have the ContentVersion restored.
We recommend using the GRAX Lightning Connect if you want to visualize ContentVersions that are related to a particular parent record. Even though this relationship is indirect, GRAX does the heavy lifting and will still allow you to view related "files" through the lightning component. When you restore, the algorithm will trace back through the relationships and create the ContentVersion and the ContentDocumentLink for the specific record you are viewing, so that the effect is that you have restored a file and related it to the current record.
- Click on the GRAX tab, followed by the
- Select either the
- Optionally enter filter criteria and then click
- You will notice a file preview icon in the action bar that will allow you to preview the record.
Content Type Warning
If your Attachment does not have a Content Type set, the GRAX preview functionality within the browser will not work (you would just need to download the attachment to view). This Salesforce help article will explain why Content Type may not get set properly by the browser.
You can add the Content Type field to your GRAX view to double check.
Only the Following File Types Can be Previewed
You will need to download other file types to view.