Marketing Cloud

Learn about GRAX Marketing Cloud Connect, which allows you to schedule and sync data from your GRAX environment to the Salesforce Marketing Cloud.

Summary and Capabilities

GRAX Marketing Cloud Connect leverages Salesforce Marketing Cloud's (SFMC) FTP site capability. You can create an SFMC 'Import Activity' that automatically takes a CSV dropped into the Marketing Cloud FTP site and imports it into an SFMC Data Extension. Some of the capabilities include:

  • Use GRAX queries to send a CSV of filtered data to a defined Marketing Cloud FTP endpoint
  • Define a schedule (e.g., every 10 minutes, hourly, daily) to run the sync
  • Specify queries across multiple objects
  • Pick which fields you would like to include in your CSV output
  • Use date/time fields such as LastModifiedDate to process deltas
  • Receive an automated email alert when the process ends

Syncing to Other Platforms

Although this guide specifically walks through the process of syncing data from GRAX to SFMC, the exact same JSON structure and architecture can be used to sync to other platforms such as Einstein Analytics, or to GRAX Storage only.

Walkthrough

Setting up Marketing Cloud

Please ensure you have a folder named GRAX underneath your FTP site's Import directory, as the GRAX sync will send all CSVs to /Import/GRAX.

Please refer to your SI (systems integrator) for any other information on setting up file drop automations and imports to data extensions.

Build Your JSON Query File

To sync data from the GRAX Data Lake to SFMC, you first need to create a JSON file that houses the indices you want to sync, the fields on those indices, filters on the records, and other configuration parameters. When the sync process kicks off, it reads this JSON, builds a CSV based on the query for each index, and sends a CSV for each index to the specified FTP site. Let's take a look at some example JSON snippets.

Elasticsearch Query DSL

Since the GRAX Data Lake records are housed within Elasticsearch, you need to use Elastic's Query DSL, which is a powerful way to specify exactly what you need to sync.

Here are some sample JSON files; feel free to use them as a starting point and modify them for your GRAX environment:

[
    {
        "objectType": "Account",
        "destination": ["MC"],
        "elasticQuery": {
            "index": "graxaccountindex",
            "body": {
                "query": {
                    "match_all": {}
                }
            }
        }
    }
]

[
    {
        "destination": ["MC"],
        "objectType": "Account",
        "elasticQuery": {
            "index": "graxaccountindex",
            "body": {
                "_source": {
                    "includes": [
                        "doc.Id",
                        "doc.Name",
                        "doc.graxid"
                    ]
                },
                "query": {
                    "range": {
                        "doc.LastModifiedDate": {
                            "gte": "now-100d/d",
                            "lt": "now/d"
                        }
                    }
                }
            }
        }
    },
    {
        "destination": ["MC"],
        "objectType": "Contact",
        "elasticQuery": {
            "index": "graxcontactindex",
            "body": {
                "_source": {
                    "includes": [
                        "doc.Id",
                        "doc.Name",
                        "doc.OwnerId",
                        "doc.Phone",
                        "doc.FirstName",
                        "doc.LastName",
                        "doc.graxid"
                    ]
                },
                "query": {
                    "range": {
                        "doc.LastModifiedDate": {
                            "gte": "now-10d/d",
                            "lt": "now/d"
                        }
                    }
                }
            }
        }
    }
]

JSON Parameters

objectType: SFDC Object API Name

destination: MC = Marketing Cloud; EA = Einstein Analytics; GRAXSTORAGE = GRAX Storage (S3 bucket, Azure, etc.)

index: Elasticsearch index name (all index names can be listed with the _cat/indices API)

includes: A list of Elasticsearch field names to include as columns in the CSV sent to SFMC. If you don't specify this, all fields are included by default, but be aware this adds significantly to the file size, so we recommend including only the fields you need synced.

query: Elasticsearch Query DSL specifying filters on the index
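
The samples above use match_all and range queries, but any valid Query DSL can go here. As a sketch only (doc.Industry and the 30-day window are hypothetical; substitute fields that actually exist on your index), a bool query combining a field match with a date range would slot into the body in place of the query key shown above:

"query": {
    "bool": {
        "must": [
            { "match": { "doc.Industry": "Technology" } },
            { "range": { "doc.LastModifiedDate": { "gte": "now-30d/d", "lt": "now/d" } } }
        ]
    }
}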

Use Elasticsearch API Commands

You can use Kibana's Dev Tools Console to test out various Elasticsearch queries. Be careful: the console is very powerful, and if used incorrectly it can permanently alter or delete your GRAX data.
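
For example, the following read-only search (using the index and field names from the samples above) lets you preview what a filter will return before adding it to your query file; search requests like this do not modify any data:

# Preview the Account range filter from the samples above (read-only)
GET graxaccountindex/_search
{
    "query": {
        "range": {
            "doc.LastModifiedDate": { "gte": "now-100d/d", "lt": "now/d" }
        }
    }
}

The hits.total value in the response gives a rough sense of how many records the corresponding sync would pick up.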

Upload Your JSON File to S3

Once you create your JSON file, you need to store it in your S3 Bucket, in the following folder path:

Amazon S3 > Vandelay Industries Prod Bucket > grax > audittrail > salesforce > [Org ID of your Salesforce environment]

Store the JSON file in this Org ID folder (the same folder also contains subfolders for all objects backed up).
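
If you prefer the AWS CLI over the S3 console, an upload along these lines should work (the bucket name, Org ID, and file name below are placeholders; substitute your own values):

# Placeholder bucket, Org ID, and file name - substitute your own
aws s3 cp samplecustomersyncquery.json s3://your-grax-bucket/grax/audittrail/salesforce/YOUR_SALESFORCE_ORG_ID/samplecustomersyncquery.json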

Configuration Variables

You can manage high-level settings related to the sync in your Heroku app under Settings > Config Vars:

ANALYTICS_QUERY_FILE: The name of the JSON file that resides in the S3 folder, e.g. samplecustomersyncquery.json

ANALYTICS_BATCH_SIZE: Up to 10,000 records

MC_HOST: FTP site URL/host

MC_PASSWORD: FTP password

MC_USERNAME: FTP username
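
These values can be set in the Heroku dashboard as described above, or from the Heroku CLI if you prefer. Here is a sketch (the app name and values below are placeholders):

# Placeholder app name and values - substitute your own
heroku config:set ANALYTICS_QUERY_FILE=samplecustomersyncquery.json ANALYTICS_BATCH_SIZE=10000 -a your-grax-app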

Schedule the Sync

Let's go ahead and create a new sync schedule. We'll use the Heroku Scheduler add-on that comes with your GRAX purchase (you'll see it as one of the add-ons in your Heroku app).

You will paste the following run command when creating the scheduler job.

./grax analytics-integration

Click the Heroku Scheduler add-on and create a new job using the run command from above. You can have multiple jobs depending on the schedule you want, but note that each job will reference the same single JSON file specified in the config vars.

Dyno Size / Memory Limits

It is important to select the correct dyno size (Private-S, Private-M, Private-L) when creating the scheduler job, because each dyno size has a different memory limit. Given that all the data is packaged and sent as a single CSV to the FTP site, you need to make sure the size of the index you are sending is not greater than the memory limit of the scheduler's dyno. Here are the memory limits:

Private-S: 1024MB
Private-M: 2.5GB
Private-L: 14GB

Compare the dyno's maximum memory with the total size of the index you are syncing, which you can find by running this command. The pri.store.size column shows the size of your index; as long as this is less than the dyno's memory limit, you will be okay.
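
The command referenced above is not reproduced here, but if you have access to the Kibana Dev Tools Console mentioned earlier, one way to check an index's size is the _cat/indices API:

# Read-only: shows the primary store size of the Account index
GET _cat/indices/graxaccountindex?v&h=index,pri.store.size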

Conduct a One-Off Sync

Let's say you have a daily sync scheduled and running to send Account data to SFMC. However, you want to do a one-time sync of certain Contacts. To do this, you would:

  • Create a new JSON file for this Contact query
  • Upload it to the S3 folder (you can have as many JSON files in the folder as you want; just remember that only one file at a time can be referenced in the config vars)
  • Temporarily change the ANALYTICS_QUERY_FILE Heroku config var to reference this new file name (see the sketch after this list)
  • Create a new Scheduler job to run this Contact sync immediately (or at a certain time)
  • Once it finishes, remember to delete the Scheduler job so it doesn't run again, since you only wanted a one-time sync
  • Go back to the ANALYTICS_QUERY_FILE Heroku config var and set it back to the Account JSON that you want syncing daily
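
Here is a sketch of the config var swap using the Heroku CLI (file names and the app name are placeholders; the same changes can be made in the Heroku dashboard):

# Point the sync at the one-off Contact query file (placeholder names)
heroku config:set ANALYTICS_QUERY_FILE=contactonetimesync.json -a your-grax-app

# ...run the one-off Scheduler job, then delete it...

# Restore the daily Account query file
heroku config:set ANALYTICS_QUERY_FILE=samplecustomersyncquery.json -a your-grax-app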

Last Run Time

One other important feature to understand is the lastRunTime parameter in the JSON. This parameter is automatically set to the current sync time for each object as it is synced. The section below shows how lastRunTime is used and why it is important. After a sync runs, the JSON might look like this:

[
    {
        "destination": ["MC"],
        "objectType": "Account",
        "elasticQuery": {
            "index": "graxaccountindex",
            "body": {
                "_source": {
                    "includes": [
                        "doc.Id",
                        "doc.Name",
                        "doc.graxid"
                    ]
                },
                "query": {
                    "range": {
                        "doc.LastModifiedDate": {
                            "gte": "now-100d/d",
                            "lt": "now/d"
                        }
                    }
                }
            }
        },
        "lastRunTime": "2019-04-16T13:50:55.000+0000"
    },
    {
        "destination": ["MC"],
        "objectType": "Contact",
        "elasticQuery": {
            "index": "graxcontactindex",
            "body": {
                "_source": {
                    "includes": [
                        "doc.Id",
                        "doc.Name",
                        "doc.OwnerId",
                        "doc.Phone",
                        "doc.FirstName",
                        "doc.LastName",
                        "doc.graxid"
                    ]
                },
                "query": {
                    "range": {
                        "doc.LastModifiedDate": {
                            "gte": "now-10d/d",
                            "lt": "now/d"
                        }
                    }
                }
            }
        },
        "lastRunTime": "2019-04-16T13:50:55.000+0000"
    }
]

Incremental/Delta Sync

Now that you understand how to use JSON to sync various indices, what if you want to do an 'incremental' sync? For example, rather than syncing the same Accounts to SFMC each day, you want to sync only the accounts that were added to GRAX since the last sync.

To handle this scenario, modify the JSON by adding a new key-value parameter for each object that you want to be incremental. In the JSON below, notice that we have added the runType key with a value of incremental only for the Account object, so the Contact object will continue to sync normally (not incrementally).

[
    {
        "destination": ["MC"],
        "objectType": "Account",
        "runType": "incremental",
        "elasticQuery": {
            "index": "graxaccountindex",
            "body": {
                "_source": {
                    "includes": [
                        "doc.Id",
                        "doc.Name",
                        "doc.graxid"
                    ]
                },
                "query": {
                    "range": {
                        "doc.LastModifiedDate": {
                            "gte": "now-100d/d",
                            "lt": "now/d"
                        }
                    }
                }
            }
        }
    },
    {
        "destination": ["MC"],
        "objectType": "Contact",
        "elasticQuery": {
            "index": "graxcontactindex",
            "body": {
                "_source": {
                    "includes": [
                        "doc.Id",
                        "doc.Name",
                        "doc.OwnerId",
                        "doc.Phone",
                        "doc.FirstName",
                        "doc.LastName",
                        "doc.graxid"
                    ]
                },
                "query": {
                    "range": {
                        "doc.LastModifiedDate": {
                            "gte": "now-10d/d",
                            "lt": "now/d"
                        }
                    }
                }
            }
        }
    }
]

The first time this sync runs, each object gets stamped with a lastRunTime, which will be used for the next incremental sync. When the next daily sync runs, we grab all Accounts that have a graxModifiedTime greater than the lastRunTime. For Contacts, we just run the same normal query, since Contact has not been tagged as incremental.

graxModifiedTime

Each record in Elastic has a field called graxModifiedTime, which indicates the exact time the record was added or updated in GRAX. Any time you do a backup/archive out of Salesforce, all of those records are stamped with graxModifiedTime. This is what makes incremental syncs possible: graxModifiedTime is compared against lastRunTime to determine which records are 'incremental'.
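
Conceptually, the incremental comparison behaves like the range query below. This is a sketch for illustration only, not something you add to your query file; the sync applies the comparison internally, and the doc.* field path and timestamp format shown are assumptions based on the field naming and lastRunTime value in the samples above:

# Illustration only - records added/updated in GRAX after the last run time
GET graxaccountindex/_search
{
    "query": {
        "range": {
            "doc.graxModifiedTime": { "gt": "2019-04-16T13:50:55Z" }
        }
    }
}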

CSV Details

The CSV file name is the object name followed by the date and time (in UTC) that the file was processed.

Example File Name: Account_2019-05-22T232441.csv

Also, a single CSV is sent to the FTP site regardless of the batch size specified, so the CSV file can get quite large depending on how much data the query returns.

Speed/Benchmarks

As a very rough benchmark, we have typically seen about 1 million records sent per hour. Of course, this depends on the number of fields (columns) you are sending in the CSV, how many of those are long text fields, and so on.

Considerations

Marketing Cloud Automations

Please refer to your marketing cloud SI/vendor for more information on how your file drop automations will be set up to pick up and process the CSVs sent to the FTP site.

Sending Data to GRAX During a Marketing Cloud Sync

If you are doing a backup/archive out of Salesforce (and thus sending data to GRAX) while a Marketing Cloud sync is running on that same index, the sync may fail. GRAX calculates record counts at the start and end of the sync to confirm nothing was skipped, so adding records to GRAX during the sync will result in mismatches and potential sync errors. Be sure to run syncs for a particular index only when you are certain there will be no changes or additions to that index.
