GRAX Documentation

The GRAX Documentation Hub

Welcome to the GRAX Documentation hub. You'll find comprehensive guides and walkthroughs to help you start working with GRAX as quickly as possible, as well as support if you get stuck. Let's jump right in!


Einstein Analytics

Learn about GRAX Einstein Analytics Connect, which allows you to schedule and sync data from your GRAX Data Lake to Einstein Analytics.

❗️

In GRAX versions 3.30 and later, use the configuration interface within Salesforce to control settings for the Einstein sync.

Summary and Capabilities

GRAX Einstein Analytics Connect is a simple data migration tool that lets you migrate data from your GRAX Data Lake to your Einstein Analytics instances. You can include all data or apply filters, specify objects and fields, and control how frequently the data is updated. Other capabilities include:

  • Define a schedule (e.g., every 10 minutes, hourly, daily) to run the sync
  • Specify queries across multiple objects
  • Use date/time fields such as LastModifiedDate to process deltas
  • Automated email alert when the process ends
  • Sends data to Einstein in batches rather than as one large payload (batch size configurable via Heroku config)

🚧

Einstein Analytics App

Before sending data from GRAX to EA, be sure you have an Einstein Analytics app named "GRAX" in your environment. The EA dataset will be created within this app with a default name of GRAX_{ObjectName}; see the code snippets below to learn how to specify your own dataset name.

Walkthrough

Build Your JSON Query File

The first step in sending data from the GRAX Data Lake to Einstein Analytics is to create a JSON file containing the objects, fields, filters, and configuration parameters for the data to be sent. When the sync process kicks off, it reads this JSON file, builds a CSV file from the query for each index, and sends each index's CSV file to Einstein Analytics. Let's take a look at some example JSON snippets.

📘

Elasticsearch Query DSL

Since GRAX Data Lake records are stored in Elasticsearch, you use Elastic's Query DSL, a powerful way to specify exactly which records you need to sync.

Here are some sample JSON files; feel free to use them as a starting point and adapt them to your GRAX environment:

🚧

Be sure the JSON query uses proper formatting, including the [ ] square brackets at the beginning and end of the file.

[
    {
        "objectType": "Account",
        "destination": ["EA"],
        "elasticQuery": {
            "index": "graxaccountindex",
            "body": {
                "query": {
                    "match_all": {}
                }
            }
        }
    }
]
[
    {
        "objectType": "Account",
        "SyncEADatasetName": "MyGraxAccountDataSet",
        "destination": ["EA"],
        "elasticQuery": {
            "index": "graxaccountindex",
            "body": {
                "query": {
                    "match_all": {}
                }
            }
        }
    },
    {
        "objectType": "Lead",
        "SyncEADatasetName": "MyGRAXLeadDataSet",
        "destination": ["EA"],
        "elasticQuery": {
            "index": "graxleadindex",
            "body": {
                "query": {
                    "match_all": {}
                }
            }
        }
    }
]
[
    {
        "destination": ["EA"],
        "objectType": "Account",
        "elasticQuery": {
            "index": "graxaccountindex",
            "body": {
                "_source": {
                    "includes": [
                        "doc.Id",
                        "doc.Name",
                        "doc.graxid"
                    ]
                },
                "query": {
                    "range": {
                        "doc.LastModifiedDate": {
                            "gte": "now-100d/d",
                            "lt": "now/d"
                        }
                    }
                }
            }
        }
    },
    {
        "destination": ["EA"],
        "objectType": "Contact",
        "elasticQuery": {
            "index": "graxcontactindex",
            "body": {
                "_source": {
                    "includes": [
                        "doc.Id",
                        "doc.Name",
                        "doc.OwnerId",
                        "doc.Phone",
                        "doc.FirstName",
                        "doc.LastName",
                        "doc.graxid"
                    ]
                },
                "query": {
                    "range": {
                        "doc.LastModifiedDate": {
                            "gte": "now-10d/d",
                            "lt": "now/d"
                        }
                    }
                }
            }
        }
    }
]

JSON Parameters:

  • objectType — SFDC Object API Name
  • destination — "EA" = Einstein Analytics
  • index — Elasticsearch index name. Get all index names here
  • includes — A list of Elastic index field names that will appear as columns in the CSV sent to Einstein Analytics. If not specified, the default is all fields, but be aware this adds significantly to the file size, so we recommend including only the fields you need synced.
  • query — Use Elasticsearch Query DSL to specify filters on the index.
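As an illustration of the query parameter, here is a hedged sketch of an entry that filters on a specific field value with a term query instead of a date range. The field name doc.Industry.keyword is hypothetical; check your index's mapping for the actual field names in your GRAX environment:

```json
[
    {
        "objectType": "Account",
        "destination": ["EA"],
        "elasticQuery": {
            "index": "graxaccountindex",
            "body": {
                "_source": {
                    "includes": ["doc.Id", "doc.Name", "doc.graxid"]
                },
                "query": {
                    "term": { "doc.Industry.keyword": "Banking" }
                }
            }
        }
    }
]
```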

📘

Use Elasticsearch API Commands

You can use Kibana's Dev Tools Console to test out Elasticsearch queries. Be careful: the console is powerful, and if used incorrectly it can permanently alter or delete your GRAX data.
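For example, here is a read-only _search request you might paste into the Dev Tools Console to preview what a range query returns before putting it into your JSON file. This request only reads data; the index name matches the samples above:

```
GET graxaccountindex/_search
{
  "query": {
    "range": {
      "doc.LastModifiedDate": {
        "gte": "now-10d/d",
        "lt": "now/d"
      }
    }
  }
}
```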

Upload Your JSON File to S3

Once you create your JSON file, you need to store it in your S3 Bucket, in the following folder path:

Amazon S3 > Vandelay Industries Prod Bucket > grax > audittrail > salesforce > Org ID of Salesforce Environment > (store the JSON file in this folder, which also contains folders for all backed-up objects)

Configuration Variables

You can manage high-level settings for the sync in your Heroku app under Settings → Config Vars:

Config Vars:

  • ANALYTICS_QUERY_FILE — The name of the JSON file that resides in the S3 folder, e.g. samplecustomersyncquery.json
  • GRAX_SFDC_URL — SFDC Production (login.salesforce.com) or Sandbox (test.salesforce.com) login URL
  • GRAX_SFDC_USERNAME — SFDC Username
  • GRAX_SFDC_PASSWORD — SFDC Password
  • GRAX_SFDC_TOKEN — SFDC Token

Schedule the Sync

Let's create a new sync schedule using the Heroku Scheduler add-on that comes with your GRAX purchase (you'll see it among the add-ons in your Heroku app).

You will paste the following run command when creating the scheduler job.

./grax analytics-integration

Click the Heroku Scheduler add-on and create a new job using the run command from above. You can have multiple jobs depending on the schedule you want, but note that each job will reference the same single JSON file specified in the config vars.

Conduct a One-Off Sync

Let's say you have a daily sync scheduled and running to send Account data to Einstein Analytics. However, you want to do a one-time sync of certain Contacts. To do this, you would:

  • Create a new JSON file for this Contact query
  • Upload it to the S3 folder (you can keep as many JSON files in the folder as you like; only one file at a time can be referenced in the config vars)
  • Temporarily change the ANALYTICS_QUERY_FILE Heroku config var to reference the new file name
  • Create a new Scheduler job to run this Contact sync immediately (or at a certain time)
  • Once it finishes, delete the Scheduler job so it doesn't run again, since this was a one-time sync
  • Set the ANALYTICS_QUERY_FILE config var back to the Account JSON you want syncing daily

Last Run Time

One other important feature to understand is the lastRunTime parameter in the JSON. It is automatically set to the current sync time for each object as it is synced. The Incremental/Delta Sync section below shows how lastRunTime is used and why it is important. After a sync, the JSON might look like this:

[
    {
        "destination": ["EA"],
        "objectType": "Account",
        "elasticQuery": {
            "index": "graxaccountindex",
            "body": {
                "_source": {
                    "includes": [
                        "doc.Id",
                        "doc.Name",
                        "doc.graxid"
                    ]
                },
                "query": {
                    "range": {
                        "doc.LastModifiedDate": {
                            "gte": "now-100d/d",
                            "lt": "now/d"
                        }
                    }
                }
            }
        },
        "lastRunTime": "2019-04-16T13:50:55.000+0000"
    },
    {
        "destination": ["EA"],
        "objectType": "Contact",
        "elasticQuery": {
            "index": "graxcontactindex",
            "body": {
                "_source": {
                    "includes": [
                        "doc.Id",
                        "doc.Name",
                        "doc.OwnerId",
                        "doc.Phone",
                        "doc.FirstName",
                        "doc.LastName",
                        "doc.graxid"
                    ]
                },
                "query": {
                    "range": {
                        "doc.LastModifiedDate": {
                            "gte": "now-10d/d",
                            "lt": "now/d"
                        }
                    }
                }
            }
        },
        "lastRunTime": "2019-04-16T13:50:55.000+0000"
    }
]

Incremental/Delta Sync

Now that you understand how to use JSON to sync various indices, what if you wanted to do an 'incremental' sync? For example, rather than syncing the same Accounts to Einstein Analytics each day, you only want to sync any new accounts that were added to GRAX since the last sync.

To handle this scenario, modify the JSON by adding a new key-value pair for each object that you want to sync incrementally. In the JSON below, notice that we have added a runType key with a value of incremental for the Account object only, so the Contact object will continue to sync normally (non-incrementally).

[
    {
        "destination": ["EA"],
        "objectType": "Account",
        "runType": "incremental",
        "elasticQuery": {
            "index": "graxaccountindex",
            "body": {
                "_source": {
                    "includes": [
                        "doc.Id",
                        "doc.Name",
                        "doc.graxid"
                    ]
                },
                "query": {
                    "range": {
                        "doc.LastModifiedDate": {
                            "gte": "now-100d/d",
                            "lt": "now/d"
                        }
                    }
                }
            }
        }
    },
    {
        "destination": ["EA"],
        "objectType": "Contact",
        "elasticQuery": {
            "index": "graxcontactindex",
            "body": {
                "_source": {
                    "includes": [
                        "doc.Id",
                        "doc.Name",
                        "doc.OwnerId",
                        "doc.Phone",
                        "doc.FirstName",
                        "doc.LastName",
                        "doc.graxid"
                    ]
                },
                "query": {
                    "range": {
                        "doc.LastModifiedDate": {
                            "gte": "now-10d/d",
                            "lt": "now/d"
                        }
                    }
                }
            }
        }
    }
]

The first time this sync runs, each object is stamped with a lastRunTime, which is used for the next incremental sync. On the next daily run, all Accounts with a graxModifiedTime greater than the lastRunTime are synced. Contacts continue to run the same normal query, as they have not been tagged as incremental.

📘

graxModifiedTime

Each record in Elastic has a field called graxModifiedTime, which indicates the exact time the record was added or updated in GRAX. Any time you back up or archive data out of Salesforce, those records are stamped with graxModifiedTime. Incremental syncs compare graxModifiedTime against lastRunTime to determine which records are 'incremental'.
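To illustrate the comparison, here is a minimal, hypothetical Python sketch (not the actual GRAX implementation) of selecting records whose graxModifiedTime falls after lastRunTime:

```python
from datetime import datetime

def parse_grax_time(value):
    # Parse timestamps like "2019-04-16T13:50:55.000+0000" into aware datetimes.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")

def incremental_records(records, last_run_time):
    # Keep only records modified in GRAX strictly after the previous sync.
    cutoff = parse_grax_time(last_run_time)
    return [r for r in records if parse_grax_time(r["graxModifiedTime"]) > cutoff]

records = [
    {"Id": "001A", "graxModifiedTime": "2019-04-16T14:00:00.000+0000"},
    {"Id": "001B", "graxModifiedTime": "2019-04-15T09:30:00.000+0000"},
]

# Only the record modified after lastRunTime is selected for the delta sync.
print(incremental_records(records, "2019-04-16T13:50:55.000+0000"))
```
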
