Legacy Archive Best Practices

Recommendations, best practices, and necessary prep for GRAX archives

FEATURE RETIREMENT ALERT - EOL Dec 31, 2022

One or more features described here are part of a retirement. Please see the official GRAX Feature Retirement page for more information on this and other feature retirements.


GRAX archive processes take just a few clicks to set up, but your Salesforce administrator and team need to prepare the environment and ensure all considerations are taken into account. Unlike backing up data, archiving modifies (deletes) data in your Salesforce org, which can have unintended consequences if the necessary preparation and validation aren't done. GRAX uses standard Salesforce deletions, but depending on how your schema is constructed, several rules and cascading effects can affect your data. This article describes what to expect, along with best practices we've seen from customers who successfully archive large amounts of data with complex logic.

CRITICAL: Complete Full Backup + Nightly Incremental BEFORE Archiving ANY Data

It's absolutely critical that you COMPLETE a Full Backup and schedule/enable a full-org nightly incremental backup. Due to Salesforce cascading deletes, delete triggers, and/or custom processes, you can NEVER assume a hierarchy archive secures all data. The only way to protect all data is with full-org nightly incremental backups.

Permissions

The most important thing to understand before starting an archive is which permissions you have assigned to the relevant users. This is crucial to ensure that all records are backed up and that you don't inadvertently cause some records to be cascade deleted. Please read through the GRAX permission guide and make sure the Integration User and Apex User have the proper permissions. Query All Files is one that customers commonly forget, because it doesn't come standard with the System Administrator profile.

User Consistency

We recommend making the Apex user and the Integration User the same user. This helps prevent permission mismatches that could result in data loss.

Full-Org Backup

GRAX recommends that you first take a backup snapshot of all your data, or at the very least of all objects involved in the archive. By setting up a full-org incremental backup first, you can be sure that all relevant data is backed up. If you then create an archive job with incorrect settings or inadvertent hierarchy selections that cause cascade deletes, you still have the data from the latest backup jobs.

Soft vs. Hard Delete

GRAX effectively performs a "hard" delete: records are deleted into the Salesforce Recycle Bin and then immediately purged from it to ensure there is enough space. It's important to be aware of this, because deletes are essentially final.

Record Volume

GRAX recommends capping the total number of records in a single archive process at 2 million, both for optimal performance and to avoid long-running (>24 hours) jobs that tie up Salesforce resources such as API credits and Apex credits/queue positions. Because archive queries are hierarchical and span many different objects, you won't know the total number of records involved ahead of time. The following suggestions can help you achieve a successful archive:

  • If child objects are selected in the hierarchy, limit the parent object to 100-200K total records. This assumes there could be 10-20x that number of child records across the selected objects, which brings the archive to roughly 2 million total records.
  • You can find the exact number of parent records matched by your date range filters by issuing a SOQL query in the Salesforce Developer Console or a similar tool.
  • If you aren't selecting any child objects (because the root object has no children, or because you chose to override the hierarchy since you're comfortable with the children being deleted without a backup), you can include up to 2 million records for the parent object.
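The sizing arithmetic in these guidelines can be sketched in a few lines of Python. This is a planning aid, not part of GRAX: the helper names are hypothetical, the child-to-parent ratio is an assumption you would estimate for your own org, and the query builder simply produces a SOQL `COUNT()` string you could paste into the Developer Console's Query Editor.

```python
from datetime import date

# Ceiling recommended in this article for a single archive process.
MAX_TOTAL_RECORDS = 2_000_000


def count_query(sobject: str, date_field: str, cutoff: date) -> str:
    """Build a SOQL COUNT() query for parent records older than the cutoff.

    Assumes date_field is a datetime field (e.g. CreatedDate), so the literal
    is written as a full UTC timestamp.
    """
    return (
        f"SELECT COUNT() FROM {sobject} "
        f"WHERE {date_field} < {cutoff.isoformat()}T00:00:00Z"
    )


def estimated_total(parent_count: int, children_per_parent: float) -> int:
    """Parents plus the child records they are assumed to pull into the archive."""
    return round(parent_count * (1 + children_per_parent))


def within_budget(parent_count: int, children_per_parent: float) -> bool:
    """True if the estimated archive size stays at or under the 2M guideline."""
    return estimated_total(parent_count, children_per_parent) <= MAX_TOTAL_RECORDS


if __name__ == "__main__":
    print(count_query("Case", "CreatedDate", date(2018, 1, 1)))
    print(estimated_total(150_000, 10), within_budget(150_000, 10))
```

For example, 150,000 parent Cases at an assumed 10 children each comes to about 1.65 million records, inside the guideline; 200,000 parents at 20 children each (about 4.2 million) would not be.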

Estimate Button

Starting with GRAX 3.71.2, the archive interface includes an Estimate button, which estimates the number of parent-level records matched by the date filter, report, or SOQL query you entered. This gives you a better idea of how many top-level records the archive picks up, though the total number may be much higher depending on how many children are selected.

Dry Run

If you want a more exact summary of all the records an archive picks up, without actually deleting any records, you can perform a "dry run" archive job. If you deselect the Delete Data toggle, GRAX performs every step of the archive (query and backup) except the actual delete when the job runs. You can then see a summary of, and search across, all the records. This gives you a much better idea of whether you are following best practices on record volumes and similar limits. For potentially large one-time archives, we recommend doing a dry run first.

You can see here that the "Delete Data" toggle is disabled, and thus this is a "dry run" archive.

No Parallel Jobs

We recommend always running only one GRAX job (backup or archive) at a time to prevent data integrity issues. For archives specifically, we strongly recommend staggering and spacing out archive jobs so only one runs at a time. For example, if an Account archive hierarchy process plans to delete related Contacts, don't run another archive process that involves Contacts, as this could lead to Salesforce record lock issues.

Records Deleted During an Archive

Once an archive process is kicked off, it can take time to query all the records in the hierarchy, back them up, and begin the delete. If a user manually deletes records involved in the archive during this period, GRAX may not be able to back them up, or may fail when trying to delete them. For example, say you start an archive process that finds 1 million Cases. While backing up these Cases, GRAX is only able to back up 999,999 because a user manually deleted one. If this happens, GRAX throws an error when trying to delete that record, because it was not backed up. When possible, make sure users aren't interacting with records that are part of an ongoing archive.
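The principle at work here, that only records with a confirmed backup are deleted, can be modeled with two set operations. This is a simplified sketch under that assumption, not GRAX's actual implementation, and `split_for_delete` is a hypothetical name.

```python
def split_for_delete(
    queried_ids: set[str], backed_up_ids: set[str]
) -> tuple[set[str], set[str]]:
    """Return (safe_to_delete, missing_backup).

    Records that were queried but never backed up (for example, because a user
    deleted them while the job was running) are reported, never deleted.
    """
    safe_to_delete = queried_ids & backed_up_ids
    missing_backup = queried_ids - backed_up_ids
    return safe_to_delete, missing_backup
```

In the 1 million Case example, `missing_backup` would hold the single Case a user deleted mid-run, which is the record GRAX reports an error for.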

Archive Object Restrictions

Certain objects cannot be archived (deleted); for example, the CaseStatus object doesn't support the delete() call. Please review the Salesforce object reference to make sure the objects you are archiving can be deleted. If an object isn't supported, your GRAX archive process may be blocked.
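One way to run this check programmatically is against the `deletable` flag that the Salesforce REST API returns for each object in a Describe Global response (`/services/data/vXX.X/sobjects/`). The sketch below uses a trimmed, hypothetical sample of that response rather than a live call, and the helper name is hypothetical.

```python
import json

# Trimmed, hypothetical excerpt of a Describe Global response.
# Real responses include many more objects and fields per object.
SAMPLE_DESCRIBE = json.loads("""
{"sobjects": [
  {"name": "Case", "deletable": true},
  {"name": "CaseStatus", "deletable": false},
  {"name": "Invoice_Line__c", "deletable": true}
]}
""")


def non_deletable(describe: dict, planned_objects: list[str]) -> list[str]:
    """Return planned archive objects that can't be deleted (or weren't found)."""
    deletable = {o["name"]: o["deletable"] for o in describe["sobjects"]}
    # Objects missing from the describe are flagged too, so someone reviews them.
    return [name for name in planned_objects if not deletable.get(name, False)]
```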

Validation

Once you have an idea of which objects you want to archive and what the rules will be for the various archive processes, always conduct initial validation in a development environment, then follow up with full validation in a staging/full sandbox. Expect to run into a few of the 'gotchas' outlined in this article. As a Salesforce administrator, you would never blindly mass delete records across various objects without understanding the implications for your org, and the same is true with GRAX. While GRAX facilitates the archive process and lets you create the necessary logic, each org is different, and you must rely on your administrator and/or SI to conduct the proper validation.

Custom Code

By far the biggest area to consider is Salesforce custom Apex triggers and any other custom code specific to your environment. Document any Apex triggers that could fire on delete events and either prevent the delete during an archive job or lead to other unintended DML operations.

Deleting in Salesforce is an intensive operation that can fire off chains of non-GRAX code and hit limits such as the Apex CPU time limit, so a delete isn't guaranteed to succeed. GRAX has an option to set the delete batch size, which can come in handy if the default batch size (500) leads to CPU timeouts caused by non-GRAX code.
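The effect of the batch size setting is easiest to see as plain chunking: the same records split into more, smaller delete calls. This sketch assumes each batch is deleted in its own transaction, so any trigger logic it fires shares governor limits (including Apex CPU time) only with the other records in that batch; the function name is hypothetical.

```python
from typing import Iterator, Sequence

DEFAULT_DELETE_BATCH_SIZE = 500  # GRAX default per this article


def delete_batches(
    record_ids: Sequence[str], batch_size: int = DEFAULT_DELETE_BATCH_SIZE
) -> Iterator[list[str]]:
    """Yield record IDs in consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(record_ids), batch_size):
        yield list(record_ids[start:start + batch_size])
```

Lowering the batch size from 500 to 200 turns 1,201 records into 7 delete calls instead of 3, trading throughput for headroom against CPU time limits.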

Delete Triggers

Be sure to research all triggers and other custom code, flows, processes, etc. that could fire on the deletes at the end of a GRAX archive job. We often see customers add exceptions to their triggers that skip this logic when the delete is performed by a designated GRAX user.

Lookup and Master-Detail Relationships

Without diving into the intricacies of lookup vs master-detail relationships in Salesforce, suffice it to say that there are serious implications when deleting records that are linked to each other via these Salesforce relationship structures. In the simplest use case, you would be archiving a custom object that has zero relationships to anything else and can be sure the deletion of these records would not cascade or roll down/up to any other objects.

Relationships

Ensure you understand all Salesforce considerations for relationships especially as it relates to deletions.

Master-Detail

Be very careful when choosing your hierarchy objects for an archive. The system selects all master-detail relationships by default to prevent accidental deletion of children. As any administrator can tell you, deleting a parent in a master-detail relationship automatically deletes its children.

Lookup

Even if you stick with the system-default selection of all detail objects, you also need to understand the implications of choosing children related to the parent via lookup relationships. Salesforce lets you choose, at the lookup field level, how to handle deletes of the related parent. For example, say you have a custom object InvoiceHeader and a child object InvoiceLine related via a lookup relationship. If you run an archive job on InvoiceHeader and fail to include InvoiceLine, the job deletes the parent and leaves the InvoiceLine records with a blank value in the lookup field pointing to InvoiceHeader. Again, the outcome depends on how each lookup field is set up.
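The possible outcomes for the InvoiceHeader/InvoiceLine example can be simulated in a few lines. The option names mirror the choices Salesforce offers when defining a lookup field (clear the value, don't allow deletion, or delete the child record also), while the `InvoiceHeader__c` field name and the function are hypothetical.

```python
from enum import Enum


class OnParentDelete(Enum):
    CLEAR = "Clear the value of this field"
    RESTRICT = "Don't allow deletion of the lookup record"
    CASCADE = "Delete this record also"


def delete_parent(
    parent_id: str, children: list[dict], rule: OnParentDelete
) -> tuple[bool, list[dict]]:
    """Simulate deleting one InvoiceHeader; return (deleted?, remaining InvoiceLines)."""
    linked = [c for c in children if c["InvoiceHeader__c"] == parent_id]
    if rule is OnParentDelete.RESTRICT and linked:
        return False, children  # the parent delete is blocked outright
    remaining = []
    for child in children:
        if child["InvoiceHeader__c"] != parent_id:
            remaining.append(child)  # unrelated line, untouched
        elif rule is OnParentDelete.CLEAR:
            # The line survives as an orphan with a blank lookup.
            remaining.append({**child, "InvoiceHeader__c": None})
        # CASCADE: the linked line is deleted along with its parent.
    return True, remaining
```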

Protecting Cross-References

When archiving data, we recommend always having a unique external ID on the objects in addition to the lookup relationship fields. In the scenario above, once you run the archive you would have 'orphan' InvoiceLines in Salesforce with no link to the InvoiceHeader. If you later archive these InvoiceLines, they would exist in GRAX without a link to the InvoiceHeader, so a link via a separate external identifier field is always a best practice.
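As a sketch of why the external ID matters: once the lookup is blank, an archived InvoiceLine can still be matched to its InvoiceHeader by joining on the shared identifier. `Invoice_Number__c` is a hypothetical external ID field assumed to exist on both objects, and the function name is hypothetical.

```python
from typing import Optional


def relink_by_external_id(
    lines: list[dict], headers: list[dict]
) -> dict[str, Optional[str]]:
    """Map each InvoiceLine Id to its InvoiceHeader Id via Invoice_Number__c."""
    header_by_number = {h["Invoice_Number__c"]: h["Id"] for h in headers}
    # Lines whose number has no matching header map to None.
    return {
        line["Id"]: header_by_number.get(line["Invoice_Number__c"])
        for line in lines
    }
```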

Some customers enable cascade deletes on lookup relationships (similar to how Salesforce treats master-detail relationships out of the box), so this is another consideration. Your Salesforce administrator team or SI can inventory all considerations unique to your organization.

Speed vs. Number of Relationships

GRAX supports archiving a parent and all children below (up to 3 levels), but given the architecture is operating atop Salesforce and the query limits imposed, the speed of an archive job is drastically reduced by including multiple levels down the tree and multiple children. You should balance this with the necessity to archive certain objects (and implications if you don't archive them as discussed in this article).

GRAX does provide the ability to override the default hierarchy objects in an archive to improve speed. For example, if your team has done the proper research on all considerations, you may know many of the objects in the hierarchy tree aren't actually being used or can be ignored.

Maximum Hierarchy Levels

GRAX supports hierarchy processes up to 3 levels deep (parent, child, and grandchild). If you have objects that fall outside this hierarchy (for example, great-grandchildren and below), you need to account for them in your archives, as they could lead to data loss without the proper setup and understanding. For each object you select in the hierarchy, especially objects at the lowest (third) level, think about whether there are other child objects that aren't visible and whether you need to archive those first. For example, with Account selected as the root/parent element, you could imagine this hierarchy:

Account --> Case --> EmailMessage --> Attachment

You may only be able to select Case (Level 2) and EmailMessage (Level 3). However, you know there are Attachments beneath EmailMessage, so you should not run this type of archive and risk deleting EmailMessages, which could cascade delete the related Attachments. Instead, you could run a Case-based or EmailMessage-based archive.
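A quick way to spot paths like this is to walk your schema from the root object and flag any chain longer than three levels. The parent-to-child map below is a hypothetical, hand-written excerpt; in practice you would build it from your org's schema, and `deep_paths` is a hypothetical helper.

```python
SCHEMA = {
    "Account": ["Case", "Contact"],
    "Case": ["EmailMessage"],
    "EmailMessage": ["Attachment"],
}


def deep_paths(schema: dict[str, list[str]], root: str, max_levels: int = 3) -> list[str]:
    """Return every root-to-descendant path that exceeds max_levels objects."""
    found: list[str] = []

    def walk(path: list[str]) -> None:
        if len(path) > max_levels:
            found.append(" --> ".join(path))
            return  # everything below this point is also out of reach
        for child in schema.get(path[-1], []):
            if child not in path:  # skip self-references and cycles
                walk(path + [child])

    walk([root])
    return found
```

Running it with Case as the root returns nothing, which is consistent with the Case-based archive suggested above staying within the three-level limit.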

Covering Deep Hierarchies

We recommend creating separate jobs if you have objects further down the chain in this scenario. Be sure to reference your org's full schema to ensure you aren't missing any objects that you want covered.

Delete Sequence

GRAX deletes records from the 'bottom up': it first deletes all records on the lowest level (that is, grandchildren when there are 3 levels in total), then deletes all records on Level 2, and finally tries to delete the root records (the main parent object selected in the hierarchy process).

If there are any errors deleting records from Level 3, for example, this could trickle up to Level 2 and Level 1. For instance, let's say we try to delete InvoiceLine records, but get errors due to a trigger preventing deletes --> now when trying to delete InvoiceHeader (parent to InvoiceLine) this could fail if there were a rule preventing deletes of InvoiceHeaders that still had related InvoiceLines.

GRAX attempts each delete up to 3 times, so even if records from Level 3 error out the first time, they may succeed on a subsequent attempt, depending on why they initially failed. Objects on the same level aren't deleted in any specific order, so if rules or relationships require some of those objects to be deleted in a certain sequence, they could fail on the first attempt and succeed on a later one.
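The delete sequence and retry behavior described above can be modeled as repeated bottom-up passes over whatever is still undeleted. This is a sketch under assumptions: it treats each of the 3 attempts as a full pass from the lowest level to the root, and `SimulatedOrg` is a hypothetical toy stand-in for delete rules (a rule blocking deletion of a parent that still has children, as in the InvoiceHeader example; transient failures such as record locks; and triggers that always block), not GRAX or Salesforce code.

```python
from typing import Callable


class SimulatedOrg:
    """Toy model of delete rules (hypothetical, for illustration only)."""

    def __init__(self, children_of, transient_failures=None, blocked=()):
        self.children_of = children_of  # parent Id -> set of child Ids
        self.transient = dict(transient_failures or {})  # Id -> failures before success
        self.blocked = set(blocked)  # Ids a trigger always refuses to delete
        self.deleted: set[str] = set()

    def try_delete(self, record_id: str) -> bool:
        if record_id in self.blocked:
            return False
        if self.transient.get(record_id, 0) > 0:
            self.transient[record_id] -= 1  # e.g. a record lock that clears later
            return False
        if self.children_of.get(record_id, set()) - self.deleted:
            return False  # rule: parent still has undeleted children
        self.deleted.add(record_id)
        return True


def bottom_up_delete(
    levels: dict[int, list[str]],
    try_delete: Callable[[str], bool],
    max_attempts: int = 3,
) -> list[str]:
    """Delete deepest level first, retrying failures; return Ids never deleted."""
    pending = {level: list(ids) for level, ids in levels.items()}
    for _ in range(max_attempts):
        for level in sorted(pending, reverse=True):  # e.g. 3, then 2, then 1 (root)
            pending[level] = [rid for rid in pending[level] if not try_delete(rid)]
        if not any(pending.values()):
            break
    return [rid for level in sorted(pending, reverse=True) for rid in pending[level]]
```

For example, with `levels = {1: ["H1"], 2: ["L1", "L2"]}` and `SimulatedOrg({"H1": {"L1", "L2"}}, blocked={"L2"})`, the function returns `["L2", "H1"]`: the blocked InvoiceLine also keeps its InvoiceHeader from being deleted on all three attempts.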

Record Locking

We strongly recommend you stagger and space out any archive jobs that involve the same objects. For example, if an Account archive hierarchy process plans to delete related contacts, don't run another archive process that involves contacts, as this could lead to Salesforce record lock issues. If you do want to run multiple archive jobs in parallel, be sure they involve objects that aren't related to each other.
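Before scheduling jobs side by side, a simple first check is whether any two of them touch the same object. This is a hypothetical planning helper; it only catches directly shared objects, so objects related through lookups or master-detail relationships still need the schema review described in this article.

```python
from itertools import combinations


def conflicting_jobs(jobs: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return (job_a, job_b, shared_objects) for every pair of jobs that overlap."""
    conflicts = []
    for (name_a, objs_a), (name_b, objs_b) in combinations(sorted(jobs.items()), 2):
        shared = objs_a & objs_b
        if shared:
            conflicts.append((name_a, name_b, shared))
    return conflicts
```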

Locking Risks

Be sure you understand the likelihood and risk of record locks. GRAX does, however, retry deletes up to 3 times if failures occur.

Competing DML Operations

When running an initial archive job (or other very large archive jobs) we recommend understanding the other DML operations that may be scheduled in your environment. Given Salesforce's multi-tenant structure, you want to be sure there isn't another event such as a manual mass delete or clean operation that could impact the same objects being archived.

Salesforce Out of Box Delete Blockers

Salesforce has some built-in fail-safes and rules regarding deletions of certain parent-child relationships. For example, you cannot delete a Contact that has related Cases. GRAX sometimes runs into these Salesforce rules, but retries the delete. Given that GRAX deletes from the 'ground up' this should be rather rare, but can occur depending on your schema.

Third-Party Packages

Third Party Support

GRAX doesn't recommend archiving (deleting) data for third-party package customizations/objects such as FinancialForce, Veeva, etc., unless fully validated by the customer.

Customers often have third-party managed packages installed that come with a host of custom objects, fields, Apex triggers, and other configurations that cannot be modified. The CloudCraze e-commerce app is one example. Veeva CRM is another, as it is essentially a customized Salesforce environment similar to a managed package. While GRAX can archive data in customized orgs, it's critical that the customer's subject-matter expert for the app understands the implications and potential blockers for deletes. GRAX can only attempt to delete the data; it cannot override Salesforce or package-imposed restrictions such as triggers. For example, Veeva CRM customizations can include several built-in fail-safes that prevent deletion of data to ensure the app works as intended. On top of this, some customers further customize with their own triggers, so validation in a sandbox for the specific use case is imperative to understand which objects you would like to archive and the implications of doing so.

Other Settings

There are a host of other considerations, but this is intended to help you understand the necessary preparation around archiving data.

For example, if you have the "Contacts to Multiple Accounts" feature enabled, there could be other options for how you want to handle deletes. Always revisit settings like this to ensure a smooth, scalable deployment.


Email Message Archiving Consideration

Orgs that have set up Enhanced Email may need to archive EmailMessage records, and there are considerations when doing so. If these records are "orphaned", meaning ActivityId = NULL and ParentId = NULL, only the record owner can delete them, so the standard archive process won't be able to purge them from Salesforce. However, they may still get deleted in a cascade operation.

If you experience this challenge, the best practice is to perform a backup and then ask Salesforce to perform a mass purge of those records.
For specific details, please reference these considerations when deleting EmailMessage records.
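To size this before contacting Salesforce, you could first find the orphaned records. The SOQL string below filters on the two fields named above, and the Python helper applies the same test to records you have already exported (for example, from a GRAX backup). The helper names and sample records are hypothetical.

```python
# SOQL to list orphaned EmailMessage records (no ActivityId and no ParentId).
ORPHAN_EMAIL_SOQL = (
    "SELECT Id, CreatedById FROM EmailMessage "
    "WHERE ActivityId = null AND ParentId = null"
)


def is_orphan(email: dict) -> bool:
    return email.get("ActivityId") is None and email.get("ParentId") is None


def split_orphans(emails: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (orphans, others); orphans likely need a Salesforce-side purge."""
    orphans, others = [], []
    for email in emails:
        (orphans if is_orphan(email) else others).append(email)
    return orphans, others
```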

Private Draft Email Messages

You may have enabled email drafts in Salesforce Classic or Lightning, and this can cause some unexpected behavior in various scenarios. The EmailMessage object has several limitations and specific guidelines on updating and deleting. For example, if IsPrivateDraft field is true, then only the CreatedById user would be able to view or delete the EmailMessage. You can read more about specific EmailMessage fields here.

Here's how this could play out in a GRAX archive: you kick off an archive, and the GRAX user is unable to query any of the private draft email messages (they are only accessible to the CreatedById user). Yet when the delete occurs, GRAX would delete the parent Case, which would cause a cascade delete of the EmailMessage and its children. Be aware that private draft email messages are beyond GRAX's control in many scenarios. Typically, customers don't create draft email messages against old Cases that are due for archiving, but this is one scenario to make sure you understand.
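A simplified sketch of this gap: given EmailMessage records as the org holds them (not as the GRAX user can query them), flag private drafts on Cases slated for archiving that the GRAX user didn't create. It models only the IsPrivateDraft/CreatedById rule described above, ignores other sharing and visibility settings, and uses hypothetical IDs and a hypothetical function name.

```python
def invisible_drafts(
    emails: list[dict], archived_case_ids: set[str], grax_user_id: str
) -> list[str]:
    """Ids of private drafts on archived Cases that the GRAX user can't see.

    These would not be backed up, yet could be cascade-deleted with their Case.
    """
    return [
        e["Id"]
        for e in emails
        if e["ParentId"] in archived_case_ids
        and e["IsPrivateDraft"]
        and e["CreatedById"] != grax_user_id
    ]
```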

Validation and Go-Live Recommendations

In larger, more complex Salesforce environments, GRAX recommends setting up policies and procedures specifically for the archive process to ensure:

  • Validation is done for the environments where you want to include archiving
  • Go-Live phases are implemented for archiving, where the business builds up the level and complexity of archives in their production environment rather than trying to do everything at once

When backing up data, it's much easier to "set and forget" as you aren't altering any of your existing data. However, with archives you need to be very thorough and careful about the process. We have seen general patterns used by customers that have been successful archiving with GRAX, and the consensus is that prioritizing speed of the archive job and trying to get as many objects archived in the smallest amount of time can often be counterproductive. We recommend a more methodical approach of going through the following steps in any environment you plan on archiving:

  • Initial archive criteria: Obtain the archive criteria for each object you plan to archive. Ideally, this comes from someone familiar with GRAX and the considerations involved in deleting Salesforce data.
  • Create GRAX scheduled processes: Decide whether to create one archive job with a parent and many children, or to split the work into several more manageable archive jobs.
  • Technical checklist: For each scheduled process, have someone familiar with all customizations in the environment go through each object selected in the process's hierarchy and list any customizations that could interfere with deletes. Examples can be found in this article (triggers, validation rules, Salesforce standard rules, lookup relationships, etc.).
  • Business checklist: For each scheduled process, have someone familiar with the environment's business processes review the selected objects and describe how those processes could be affected if certain objects (and their relationships to other objects) are deleted.
  • Review checklists with GRAX sponsors: Before running any archives, even in sandboxes, review the initial technical and business checklists with the GRAX sponsors. This exercise gets the various teams on the same page about the org's archive needs, balanced against Salesforce restrictions and the impact of deleting that data. Note how complex each archive rule/job is, as this comes in handy when prioritizing which jobs to run first once you move to production.
  • Run a single, small archive: When validating in sandboxes, always start with just one archive job and a small dataset. You don't want to wait days for the process to finish before gaining any insight; a small, representative dataset typically teaches you a lot about a specific archive job.
  • Review the error report: Access the error report link via the action dropdown on the Schedule tab. After the job runs, document the object, record, and reason for each record that couldn't be deleted, as you'll use this inventory later.
  • Regression testing of key processes: After each archive finishes, it may make sense to have someone familiar with the environment's operations confirm that the deletes haven't affected any key business processes.
  • Repeat the archive cycle: Repeat this cycle (kick off, monitor errors, regression test) with some of the other archive rules, gradually expanding to larger datasets to build a more representative inventory of potential issues on certain objects.
  • Address findings: For each delete issue identified, either resolve the root cause (which could involve editing a trigger or validation rule) or reconsider whether the reason for blocking the delete is legitimate and those records should not actually be archived.
  • Final validations: If you make changes to custom features to allow more liberal delete policies, run archives again after the changes to confirm everything works as expected.
  • Production archive phases: Start archiving in production with 'cleaner' objects as the initial phase, then gradually move on to the more complex rules and objects. For example, customers have had good experiences first archiving transactional objects that don't have many fields or complex relationships to other objects, since jobs run faster on objects without children. One example we've seen is integration jobs that push thousands of error logs or other isolated custom records into Salesforce each day; these can easily be archived after a certain number of days.
  • Ongoing change management: Even after production archive jobs are running, build a process for them into your company's change management practices. For instance, analyze new triggers or validation rules for potential impacts on existing archive jobs before deploying them.