Archive Best Practices

Let's take a look at the recommendations, best practices, and preparation required before kicking off a GRAX archive.

Introduction

GRAX archive processes take just a few clicks to set up, but your Salesforce Admin and team will need to prepare the environment and ensure all considerations are taken into account. Unlike backing up data, archives modify data in your Salesforce org (deletes), which can have unintended consequences if the necessary preparation and validation are not done. GRAX leverages standard Salesforce deletions, but depending on how your schema is constructed, several rules and cascading impacts can affect your data. This article outlines what to expect and the best practices we have seen from customers who successfully archive large amounts of data with complex logic.

Permissions

The most important piece to understand before starting an archive is the permissions assigned to the relevant users. This is crucial to ensure that all records are backed up and that you don't inadvertently cause records to be cascade deleted. Please read through the GRAX permission guide and ensure the Integration User and Apex User have the proper permissions. Query All Files is a common one customers forget, as it does not come standard with a System Administrator profile.

📘

We recommend that the Apex User and Integration User be the same user. This helps prevent permission mismatches that could result in data loss.
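One way to spot-check the Query All Files permission is to query which permission sets grant it and who holds them. This is a sketch; PermissionsQueryAllFiles is, to our knowledge, the API field backing the Query All Files user permission, so verify the field name in your org before relying on it:

```sql
-- Users holding a permission set that grants 'Query All Files'
SELECT Assignee.Username, PermissionSet.Name
FROM PermissionSetAssignment
WHERE PermissionSet.PermissionsQueryAllFiles = true
```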

Full-Org Backup

GRAX recommends you first take a backup snapshot of all your data, or at the very least of all objects involved in the archive. By setting up a full-org, incremental backup first, you can be sure that all relevant data is backed up. If you then create an archive job with incorrect settings or inadvertent hierarchy selections that cause cascade deletes, you will still have the data from the latest backup jobs.

Soft vs. Hard Delete

GRAX effectively performs a "hard" delete. Records are deleted into the Salesforce Recycle Bin and then immediately purged from it so the Recycle Bin does not run out of space. Be aware of this: archive deletes are essentially final.

Record Volume

GRAX recommends capping the total number of records in an archive process at 2 million, both for optimal performance and to avoid long-running (>24 hours) jobs that hog Salesforce resources such as API credits and Apex credits/queue positions. Given the complex, hierarchical nature of the queries and the number of objects involved, you will not know ahead of time exactly how many records an archive will touch. The following suggestions can help achieve a successful archive:

  • If child objects are selected in the hierarchy, limit the parent object to 100-200K total records. Child records across the various objects can easily total 10-20x that number, which brings the overall archive size to roughly 2 million records.
  • You can find the exact number of parent records your date range filters will pick up by issuing a SOQL query in the Salesforce developer console or a similar tool; see the example query after this list.
  • If you are not checking off any child objects (because the selected root element has no children, or because you overrode the hierarchy and don't need to back up the children since you are okay with them being deleted), feel free to include up to 2 million records for the selected parent object.
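For example, a count query like the following returns the number of parent-level records a date filter would pick up. This is a sketch; Account and LastModifiedDate are placeholders for your root object and filter field:

```sql
-- Number of parent-level records the archive's date filter would select
SELECT COUNT()
FROM Account
WHERE LastModifiedDate < 2020-01-01T00:00:00Z
```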

📘

Estimate Button

Starting with GRAX 3.71.2, you will see an Estimate button in the archive interface, which estimates the number of parent-level records based on the date filter, report, or SOQL query you entered. This gives you a better idea of how many top-level records the archive will pick up, though the total may be much higher depending on how many children were selected.

Dry Run

If you want a more exact summary of all the records that an archive will pick up, without actually deleting anything, you can perform a "dry run" archive job. If you disable the Delete Data toggle, then when the job runs, GRAX performs every step of the archive (query + backup) except the actual delete. You will be able to see a summary and search across all records. This gives you a much better idea of whether you are following the best practices around record volumes. For potentially larger one-time archives, we recommend doing a dry run.

You can see here that the "Delete Data" toggle is disabled, and thus this will be a "dry run" archive.

No Parallel Jobs

We recommend executing only one GRAX job (backup or archive) at a time to prevent data integrity issues. For archives specifically, we strongly recommend staggering and spacing out jobs so only one is running at a time; see the Record Locking section below for the risks of overlapping jobs that touch related objects.

Records Deleted During an Archive

Once an archive process is kicked off, it takes time to first query all the records in the hierarchy, then back them up, and finally begin the delete. If a user manually deletes records involved in the archive during this window, GRAX may not be able to back them up or may fail when trying to delete them. For example, say you start an archive process that finds 1 million Cases, but GRAX is only able to back up 999,999 of them because one was manually deleted by a user. GRAX will then throw an error when trying to delete that record because it was not backed up. We recommend ensuring, where possible, that users are not interacting with records that are part of an ongoing archive.

Archive Object Restrictions

Certain objects cannot be archived (deleted); for example, the CaseStatus object does not support the delete() call. Please review the Salesforce objects list to ensure the objects you are archiving can be deleted. If an object is not supported, your GRAX archive process may be blocked.
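A quick way to verify this up front is a describe check in Apex. This is a minimal sketch; note that isDeletable reflects both object support and the running user's permissions:

```apex
// Returns false for objects that don't support delete(), such as CaseStatus
Schema.DescribeSObjectResult dsr = CaseStatus.SObjectType.getDescribe();
System.debug('CaseStatus deletable: ' + dsr.isDeletable());
```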

Validation

Once you have an idea of which objects you will want to archive and what the rules will be for various archive processes, always conduct initial validation in a dev environment, and then follow that up by validating fully in a staging/full sandbox. Expect to run into a few of the 'gotchas' outlined in this article. As a Salesforce Admin, you would never blindly mass delete records across various objects without understanding the implications to your org, and the same is true with GRAX. While GRAX facilitates the archive process and allows you to create the necessary logic, each org is different and you must rely on your Admin and/or SI to conduct the proper validation.

Custom Code

By far the biggest area to consider is custom Apex triggers and other custom code specific to your environment. Document and list out any Apex triggers that could fire on delete events and either prevent the delete during an archive job or lead to other unintended DML operations.

Deleting in Salesforce is an intensive operation that can fire off chains of non-GRAX code and run into limits such as the Apex CPU time limit, so the delete is not always guaranteed to succeed. GRAX has an option to set the delete batch size, which comes in handy if the default batch size (500) leads to CPU timeouts caused by non-GRAX code.

🚧

Be sure to research all triggers and other custom code, flows, processes, etc., that could fire upon the deletes that occur at the end of a GRAX archive job. We often see customers add exceptions to their triggers that skip this logic when deletes are performed by a designated GRAX user.
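A sketch of that bypass pattern follows. Every name here is hypothetical (the object, the custom setting, and its fields); in practice the bypass condition is often a custom setting, custom permission, or username check:

```apex
// Hypothetical trigger on a custom object that skips its delete-time
// validation when the delete is performed by the designated GRAX user.
trigger InvoiceLinePreventDelete on InvoiceLine__c (before delete) {
    // Archive_Settings__c: an assumed hierarchy custom setting storing
    // the GRAX integration user's Id in a text field.
    Archive_Settings__c cfg = Archive_Settings__c.getOrgDefaults();
    if (cfg != null && UserInfo.getUserId() == cfg.GRAX_User_Id__c) {
        return; // GRAX archive delete: skip the custom blocking logic
    }
    for (InvoiceLine__c line : Trigger.old) {
        if (line.Is_Locked__c == true) { // assumed flag that normally blocks deletes
            line.addError('Locked invoice lines cannot be deleted.');
        }
    }
}
```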

Lookups and Master-Detail Relationships

Without diving into the intricacies of lookup vs master-detail relationships in Salesforce, suffice it to say that there are serious implications when deleting records that are linked to each other via these Salesforce relationship structures. In the simplest use case, you would be archiving a custom object that has zero relationships to anything else and can be sure the deletion of these records would not cascade or roll down/up to any other objects.
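Apex describe information can help inventory these relationships up front. Below is a sketch that lists, for a given parent (Account here), every child relationship whose delete cascades; isCascadeDelete covers master-detail fields as well as lookups configured to cascade:

```apex
// List child relationships of Account that are deleted along with it
for (Schema.ChildRelationship cr :
        Account.SObjectType.getDescribe().getChildRelationships()) {
    if (cr.isCascadeDelete()) {
        System.debug(String.valueOf(cr.getChildSObject()) + ' via field ' +
                     String.valueOf(cr.getField()));
    }
}
```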

🚧

Ensure you understand all Salesforce considerations for relationships, especially as they relate to deletions.

Master-Detail

Be very careful when choosing your hierarchy objects for an archive. The system will force you to select all master-detail relationships by default, in order to prevent an accidental delete of children (as any Admin will tell you, deleting a parent in a master-detail relationship will automatically delete the children).

Lookups

Even if you stick with the system-defaulted selection of all detail objects, you also need to understand the implications of choosing children related to the parent via lookup relationships. Salesforce lets you choose, at the lookup field level, how deletes of the related parent are handled. For example, say you have a custom object InvoiceHeader and a child object InvoiceLine related via a lookup relationship. If you run an archive job on InvoiceHeader and fail to also include InvoiceLine, the job will delete the parent and leave the InvoiceLine records with a blank value in the lookup field that pointed to InvoiceHeader. Again, it depends on how each lookup field is set up.
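After such a job runs, a query like the following would reveal the orphans. This is a sketch reusing the hypothetical objects above, with InvoiceHeader__c assumed as the lookup field's API name:

```sql
-- InvoiceLines whose parent lookup was blanked by the archive delete
SELECT Id, Name
FROM InvoiceLine__c
WHERE InvoiceHeader__c = null
```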

🚧

If you are archiving data, we recommend always having a unique external ID on the objects, separate from the lookup relationship fields. In the above scenario, once you run the archive you would have 'orphan' InvoiceLines in Salesforce with no linkage to the InvoiceHeader. If you later archive these InvoiceLines, they would exist in GRAX without a link to the InvoiceHeader, so maintaining a link via some other external identifier field is always a best practice.

Some customers enable cascade deletes on lookup relationships (similar to how Salesforce treats master-detail out of the box), so this is another consideration. Your Salesforce Admin team or SI will want to inventory all the considerations unique to your organization.

Speed vs. Number of Relationships

GRAX supports archiving a parent and all children below it (up to 3 levels), but because the architecture operates atop Salesforce and its query limits, the speed of an archive job is drastically reduced by including multiple levels down the tree and multiple children. You will want to balance speed against the necessity of archiving certain objects (and the implications of not archiving them, as discussed in this article).

GRAX does provide the ability to override the default hierarchy objects in an archive to improve speed. For example, if your team has done the proper research on all considerations, you may know many of the objects in the hierarchy tree are not actually being used or can be ignored.

Maximum Hierarchy Levels

GRAX supports hierarchy processes up to 3 levels deep (parent, child, and grandchild). If you have objects that fall outside this hierarchy (e.g. great-grandchild objects and lower), you will need to account for them, as they could lead to data loss without the proper setup and understanding. For each object you select in the hierarchy, especially those at the lowest (3rd) level, consider whether there are other child objects that are not visible and whether you need to archive those first. For example, with Account selected as the root/parent element you could imagine this hierarchy:

Account --> Case --> EmailMessage --> Attachment

You may only be able to check off Case (Level 2) and EmailMessage (Level 3). However, you know there are Attachments beneath the EmailMessage records, so you would not run this type of archive and risk deleting the EmailMessages, which could cascade delete the related Attachments. Instead, you could run a Case-based or EmailMessage-based archive so the Attachments fall within the 3-level hierarchy.
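Before committing to an Account-rooted job, a quick count can confirm whether such a hidden fourth level exists. This is a sketch that assumes filtering on the polymorphic Parent lookup of Attachment works in your API version; verify in the developer console:

```sql
-- Attachments hanging off EmailMessages, i.e. a level below the 3-level cap
SELECT COUNT()
FROM Attachment
WHERE Parent.Type = 'EmailMessage'
```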

🚧

We recommend creating separate jobs if you have objects further down the chain in this scenario. Be sure to reference your org's full schema to ensure you are not missing any objects that you want covered.

Delete Sequence

GRAX will delete records from the 'bottom up', meaning that it will first delete all records that are on the lowest level (i.e. grandchildren in the scenario where there are 3 total levels), and then delete all records on Level 2, and finally try to delete the Root records (main parent object selected in the hierarchy process).

If there are any errors deleting records from Level 3, for example, the failures can trickle up to Level 2 and Level 1. For instance, say we try to delete InvoiceLine records but get errors because a trigger prevents the deletes; the subsequent delete of InvoiceHeader (parent to InvoiceLine) could then fail if a rule prevents deleting InvoiceHeaders that still have related InvoiceLines.

GRAX attempts each delete up to 3 times, so even if records from Level 3 error out the first time, they may succeed in a subsequent attempt depending on why they initially failed. There is no specific order in which objects on the same level are deleted, so if rules or relationships require those objects to be deleted in a certain sequence, some deletes may fail on the first attempt and succeed on a later one.

Record Locking

We strongly recommend you stagger and space out any archive jobs that involve the same objects. For example, if an Account archive hierarchy process will delete related contacts, do not run another archive process that involves contacts, as this could lead to Salesforce record lock issues. If you do want to run multiple archive jobs in parallel, be sure they involve objects that are not related to each other.

🚧

Be sure you understand the potential for record locking and the associated risk. GRAX will, however, re-attempt deletes up to 3 times if failures occur.

Competing DML Operations

When running an initial archive job (or other very large archive jobs) we recommend understanding the other DML operations that may be scheduled in your environment. Given Salesforce's multitenant structure, you want to be sure there isn't another event such as a manual mass delete or clean operation that could impact the same objects being archived.

Salesforce Out of Box Delete Blockers

Salesforce has some built-in failsafes and rules regarding deletion of certain parent-child relationships. For example, you cannot delete a Contact that has related Cases. GRAX will sometimes run into these Salesforce rules but will retry the delete. Given that GRAX deletes from the bottom up, this should be rather rare, but it can occur depending on your schema.
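For the Contact example, you can surface the affected records ahead of time with a standard SOQL semi-join, run in the developer console:

```sql
-- Contacts that Salesforce will refuse to delete while related Cases exist
SELECT Id, Name
FROM Contact
WHERE Id IN (SELECT ContactId FROM Case)
```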

Third-Party Packages

❗️

WARNING

GRAX does not recommend archiving (deleting) data for third-party package customizations/objects such as FinancialForce, Veeva, etc., unless fully validated by the customer.

Customers often have third-party managed packages installed, which come with a host of custom objects, fields, Apex triggers, and other configurations that cannot be modified. The CloudCraze e-commerce application is one example; Veeva CRM is another, as it is essentially a customized Salesforce environment similar to a managed package. While GRAX can archive data in customized orgs, it is critical that the customer's SME for the application understands the implications and potential blockers for deletes. GRAX can only attempt to delete the data; it cannot override Salesforce or package-imposed restrictions such as triggers. For example, Veeva CRM customizations can include several built-in failsafes that prevent deletion of data so the application keeps working as intended. On top of this, some customers add their own triggers, so validating the specific use case in a sandbox is imperative to understand which objects you would like to archive and the implications of doing so.

Other Settings

There are a host of other considerations, but this is intended to help you understand the necessary preparation around archiving data.

For example, if you have the "Contacts to Multiple Accounts" feature enabled, there may be additional options for how you want to handle deletes. Always revisit settings like this to ensure a smooth, scalable deployment.

Email Message Archiving Consideration

Orgs that have set up Enhanced Email may need to perform archives against the EmailMessage object, and there are considerations when attempting to archive these records. If a record is considered "orphaned", meaning ActivityId = NULL and ParentId = NULL, only the record owner can delete it; therefore, the standard archive process will not be able to purge these records from Salesforce. However, they may still get deleted in a cascade operation.

If you experience this challenge, the best practice is to perform a backup and then request that Salesforce perform a mass purge against those records.
For specific details, please reference Considerations when deleting EmailMessage records.
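To gauge how many records fall into this bucket before engaging Salesforce, a simple count works. This is a sketch; run it as a user with broad read access:

```sql
-- 'Orphaned' EmailMessages: no related activity and no parent record
SELECT COUNT()
FROM EmailMessage
WHERE ActivityId = null AND ParentId = null
```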

Private Draft Email Messages

You may have enabled email drafts in Salesforce Classic or Lightning, and this can cause unexpected behavior in various scenarios. The EmailMessage object has several limitations and specific guidelines for updating and deleting. For example, if the IsPrivateDraft field is true, only the CreatedById user can view or delete the EmailMessage. You can read more about specific EmailMessage fields here.

Let's see how this could play out in a GRAX archive scenario: you kick off an archive and the GRAX user is unable to query any of the private draft messages (accessible only by the CreatedById user). Yet when the delete occurs, GRAX would delete the parent Case, which would cascade delete the EmailMessage and its children. Thus, be aware that private draft email messages are beyond GRAX's control in many scenarios. Typically, customers don't create draft email messages against old Cases that are due for archiving, but this is one scenario to make sure you understand.
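If you want to check for private drafts ahead of an archive, a query like this can help. It is a sketch; note that each user sees only their own private drafts, so the results depend on who runs it:

```sql
-- Private draft emails, visible and deletable only by their creator
SELECT Id, ParentId, CreatedById
FROM EmailMessage
WHERE IsPrivateDraft = true
```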

Validation and Go-Live Recommendations

In larger, more complex Salesforce environments, GRAX recommends setting up policies and procedures specifically for the archive process to ensure:

  • Validation is done for the environment(s) where you want to include archiving
  • Go-Live phases are implemented for archiving, where the business builds up the level and complexity of archives in their production environment rather than trying to do everything at once

When backing up data, it is much easier to "set and forget" because you are not altering any existing data. With archives, however, you need to be very thorough and careful about the process. Among customers that have archived successfully with GRAX, the consensus is that prioritizing raw speed and trying to archive as many objects as possible in the shortest time is often counterproductive. We recommend a more methodical approach, working through the following steps in any environment you plan on archiving:

  • Initial archive criteria: Obtain criteria for each object you plan on archiving. This should ideally come from someone familiar with GRAX and the considerations involved with deleting SFDC data.
  • Create GRAX Scheduled Processes: Decide whether you will create one archive job with parents and lots of children, or split the work into several more manageable archive jobs.
  • Technical Checklist: For each scheduled process, have someone familiar with all customizations in the environment go through each object checked off within the hierarchy and list potential customizations that could interfere with deletes. Examples can be found in this article (triggers, validations, SFDC standard rules, lookup relationships, etc.).
  • Business Checklist: For each scheduled process, have someone familiar with the environment's business process go through the objects checked off and provide feedback on how the business process could be affected if certain objects (and their relationships to other objects) are deleted.
  • Review Checklists with GRAX Sponsors: Before running any archives, even in sandboxes, review the initial technical and business checklists with the GRAX sponsors; this exercise helps the various teams get on the same page about a specific org's archive needs and how to balance them with SFDC restrictions and the impacts of deleting that data. Note each archive rule/job and its complexity, as this will come in handy when prioritizing which jobs to complete first once you move to production.
  • Run Single, Small Archive: When validating in sandboxes, always start with just one archive job, and keep the dataset small to begin with. You don't want to wait days for the process to finish before gaining any insight; a small, representative data set can teach you a lot about your specific archive job.
  • Review Error Report: There is an error report link accessible via the action dropdown on the Schedule tab. After running the job, document the object, record, and reason each record could not be deleted, as you'll use this inventory later on.
  • Regression Testing of Key Processes: After each archive finishes, it may make sense to have someone familiar with the environment's functionality verify that the deletes have not affected any key business processes.
  • Repeat Archive Cycle: Repeat this archive cycle (kick off, monitor errors, regression test) with some of the other archive rules, gradually expanding to larger datasets to build a more representative inventory of potential issues on certain objects.
  • Address Findings: With specific delete issues identified, either resolve the root cause (which could involve editing a trigger or validation rule) or reconsider whether the reason for blocking deletes is legitimate and those records should not actually be archived.
  • Final Validations: If you end up making tweaks or changes to custom functionality to allow for more liberal delete policies, be sure to run archives after the changes to confirm everything now works as expected.

Production Archive Phases

We recommend starting production archives on 'cleaner' objects as the initial phase and gradually moving on to the more complex rules and objects. For example, customers have had good experiences first running archives on transactional objects that do not have many fields or complex relationships to other objects; jobs run faster on objects without children. One example we've seen is integration jobs that push thousands of error logs or other isolated custom records into Salesforce each day, which can easily be archived after a certain number of days.

Ongoing Change Management

Even after you have kicked off production archive jobs, we recommend building a step into your company's change management practices: if new triggers or validations are built, analyze them for potential impacts on existing archive jobs before deployment.