Troubleshooting FAQ

Common questions and steps for troubleshooting GRAX

This document lists the most important questions for troubleshooting and evaluating issues that block general operation of GRAX, including networking failures, boot failures, and restart behaviors. This guide includes commands specific to the Amazon Linux 2 AMI maintained by AWS, but attempts to remain otherwise infrastructure-agnostic where possible. Some steps may not work as intended if you have heavily customized your networking, image, or environment.

Where are GRAX files located?

First, let's note the locations of the GRAX binary and environment file. Keeping track of their paths will help validate service configurations in later steps. In a typical installation, GRAX is stored under /home/ec2-user/graxinc/grax and the .env file is stored under /home/ec2-user. Thus, we'll use the following paths:

  • GRAX binary: /home/ec2-user/graxinc/grax/grax
  • GRAX command line tools: /home/ec2-user/graxinc/grax/graxctl
  • Environment file: /home/ec2-user/.env

Is GRAX executable?

To ensure that Linux treats GRAX as an executable, we can check the permissions on the files as follows:

[[email protected] grax]# ls -la
total 278108
drwxr-xr-x 2 root root      137 Aug 20 10:09 .
drwxr-xr-x 3 root root       18 Aug 18 12:38 ..
-rwxr-xr-x 1 root root 71471248 Aug 20 10:09 grax
-rwxr-xr-x 1 root root 56687264 Aug 19 15:48 graxctl
-rw-r--r-- 1 root root 52411443 Aug 18 12:39 master.zip

The "x" characters in the permission string at the beginning of each line mark a file as executable. If the grax and graxctl files are not executable, we can mark them as such:

[[email protected] grax]# chmod +x grax graxctl
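
The same check can be scripted instead of read off the ls output. The following is a minimal sketch, not part of GRAX itself; the function name check_executable is hypothetical, and it relies only on the standard shell file tests -f and -x.

```shell
# Report whether each given path is a regular file that the current
# user is allowed to execute. Returns non-zero if any path fails.
check_executable() {
  rc=0
  for f in "$@"; do
    if [ -f "$f" ] && [ -x "$f" ]; then
      echo "ok: $f"
    else
      echo "not executable: $f"
      rc=1
    fi
  done
  return "$rc"
}
```

With the typical installation paths, this would be called as: check_executable /home/ec2-user/graxinc/grax/grax /home/ec2-user/graxinc/grax/graxctl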

How should the Environment file be formatted?

There are several important rules to remember for .env files:

  1. Only one key/value pair per line
  2. Only = is supported as a key/value separator
  3. Comments are not supported

Comments are the most commonly seen issue, as teams often attempt to label values for later reference. Unfortunately, a comment causes most .env parsers to stop reading immediately (sometimes non-fatally). This can leave the configuration only partially loaded and cause unpredictable symptoms.

For a complete example of a valid .env file, see our Linux Install Guide.
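
The formatting rules above can be checked mechanically before restarting the service. The sketch below is an illustration under assumptions: the function name lint_env_file is hypothetical, keys are assumed to look like shell variable names, and empty lines are tolerated even though the rules above do not mention them. It cannot detect a comment appended after a value, because "#" may legitimately appear inside a value.

```shell
# Print every line of an .env file that is neither empty nor a single
# KEY=value pair, prefixed with its line number.
# Returns 0 if the file is clean, 1 if offending lines were found,
# and 2 if the file cannot be read.
lint_env_file() {
  [ -r "$1" ] || { echo "cannot read: $1" >&2; return 2; }
  # -v inverts the match, so only non-conforming lines are printed.
  if grep -n -v -E '^([A-Za-z_][A-Za-z0-9_]*=.*)?$' "$1"; then
    return 1
  fi
  return 0
}
```

For a typical installation this would be run as: lint_env_file /home/ec2-user/.env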

Is the service (systemd) working properly?

This guide assumes you're operating GRAX as a permanent service on the instance with systemd. The most common issues with systemd are configuration issues in the service file. In a typical installation, the GRAX service file is at the path /lib/systemd/system/grax.service.

Validate Configuration

We can see the contents of the service configuration by using cat:

[[email protected] grax]# cat /lib/systemd/system/grax.service
[Install]
WantedBy=multi-user.target
[Service]
EnvironmentFile=/home/ec2-user/.env
ExecStart=/home/ec2-user/graxinc/grax/grax
Restart=always
Type=simple
[Unit]
Description=grax daemon

Check the following:

  1. EnvironmentFile is a valid absolute path that points to your GRAX .env file.
  2. ExecStart is a valid absolute path that points to your GRAX executable.
  3. Restart is "always", to ensure GRAX is restarted regardless of how it exited.
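
The three checks above can be sketched as a small script. This is a hedged example rather than an official tool: the function names unit_value and check_grax_unit are hypothetical, and the script assumes the directives are written as plain KEY=value lines, as in the service file shown above (no leading "-" on EnvironmentFile, no line continuations).

```shell
# Print the value of a KEY=value directive from a unit file.
# $1 = directive name, $2 = path to the unit file
unit_value() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Verify the three settings listed above for the given unit file.
check_grax_unit() {
  unit=$1
  rc=0
  env_file=$(unit_value EnvironmentFile "$unit")
  exec_start=$(unit_value ExecStart "$unit")
  restart=$(unit_value Restart "$unit")

  [ -r "$env_file" ] || { echo "EnvironmentFile missing or unreadable: $env_file"; rc=1; }
  # ExecStart may carry arguments; only the first word is the binary.
  [ -x "${exec_start%% *}" ] || { echo "ExecStart not executable: $exec_start"; rc=1; }
  [ "$restart" = "always" ] || { echo "Restart is '$restart', expected 'always'"; rc=1; }
  return "$rc"
}
```

For a typical installation this would be run as: check_grax_unit /lib/systemd/system/grax.service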

Service Status

Services run by systemd are managed with the systemctl command. To see the current status of the GRAX service, we can use the status subcommand:

[[email protected] grax]# systemctl status grax.service
● grax.service - grax daemon
   Loaded: loaded (/usr/lib/systemd/system/grax.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2022-08-19 17:35:29 UTC; 5 days ago
 Main PID: 13125 (grax)
   CGroup: /system.slice/grax.service
           └─13125 /home/ec2-user/graxinc/grax/grax

(Log Lines omitted for brevity)

Hint: Some lines were ellipsized, use -l to show in full.

We expect the GRAX service to be "active" at all times unless it is under maintenance or was intentionally taken offline. If the service is not "active" (e.g. "failed" or "inactive"), you can restart GRAX at any time by running the restart subcommand:

[[email protected] grax]# systemctl restart grax.service

If the GRAX service is entirely disabled, you can enable it with the enable subcommand, and then start it immediately with start:

[[email protected] grax]# systemctl enable grax.service; systemctl start grax.service

A successful start of the service (and thus the application) will write logs to the application log file. If you have a regular health check configured, you'll also see a log entry for each health check call while the application is active.
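
For scripted checks, the one-word answers from systemctl's is-enabled and is-active subcommands are easier to parse than the full status output. The following is a minimal sketch; the function name service_summary is hypothetical.

```shell
# Summarize whether a systemd unit is enabled at boot and currently
# running. Returns 0 only when the unit is active.
service_summary() {
  unit=$1
  # Both subcommands print a one-word state and exit non-zero when the
  # answer is negative, so failures are tolerated here.
  enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
  active=$(systemctl is-active "$unit" 2>/dev/null) || true
  echo "$unit: enabled=${enabled:-unknown} active=${active:-unknown}"
  [ "$active" = "active" ]
}
```

Running service_summary grax.service prints one line and sets the exit status, which makes it usable in a cron job or a wrapper script.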

Are Health Checks succeeding?

Health check endpoints are used for automated detection of application downtime and to enforce replacement rules. The health check endpoint for GRAX is an HTTP/1.1 HTTPS-only GET handler on /health. In a typical installation, GRAX runs on port 8000.

We can manually check the status of the application from the instance by curling the local route:

[[email protected] grax]# curl -k https://localhost:8000/health
ok

The expected value from the endpoint is "ok", and signifies a healthy service. A failed call, either via timeout or rejection, is a sign of a failed service/application.
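
The manual check above can be wrapped so that a hung application is reported rather than leaving curl waiting. This is a sketch under assumptions: the function name check_health and the 5-second limit are illustrative choices, not GRAX defaults.

```shell
# Call the health endpoint and report whether the body is exactly "ok".
check_health() {
  url=${1:-https://localhost:8000/health}
  # -k: skip certificate verification, as in the manual test above
  # -s: no progress output; -f: non-zero exit on HTTP error statuses
  # --max-time 5: give up after 5 seconds instead of hanging
  body=$(curl -k -s -f --max-time 5 "$url") || {
    echo "unhealthy: request failed"
    return 1
  }
  if [ "$body" = "ok" ]; then
    echo "healthy"
  else
    echo "unhealthy: unexpected response"
    return 1
  fi
}
```

Calling check_health with no argument tests the typical local route; pass a different URL if GRAX listens on another port.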

Is Connectivity intact?

The GRAX application is a data-processor at its core. To process data, it must be able to retrieve that data, write it to storage, and read it back. When you add application maintenance, licensing, and telemetry to the equation, connectivity is critical to ensure proper operation.

Only some of the overall connectivity requirements can be tested from the instance. These are the egress connections used to push or pull data to and from remote resources. Ingress connections, which originate from other sources, are harder to test.

Timeouts, rejections, or broken connections during the following tests are considered failures. All failures should be investigated.

GRAX HQ

Communication to GRAX HQ is egress-only, and can be tested relatively simply. To start, we can verify connectivity to the GRAX packaging API, which allows downloading of the application in the first place:

[[email protected] grax]# curl -L -o testgrax https://hq.grax.com/api/v2/download/graxinc/grax/master
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    68  100    68    0     0   4270      0 --:--:-- --:--:-- --:--:--  4533
100 48.6M    0 48.6M    0     0  6344k      0 --:--:--  0:00:07 --:--:-- 9270k

The -L flag is utilized to follow the ALB redirect that points to HQ. -o is used to avoid printing the binary data to the terminal. A successful download of several dozen MB can be considered a passing test.

Next, let's confirm POST requests to HQ succeed:

[[email protected] grax]# curl -L -X POST https://hq.grax.com/api/v2/dd/logs/api/v2/logs
{"cause":"","status":401,"message":"Unauthorized"}

This may seem an unusual result, but the 401 response is enough to confirm that your POST request reached HQ, without requiring you to construct a valid set of credentials for a simple test. If you do not get a JSON response in line with the above, consider the test a failure.
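
When this test is scripted, it can be simpler to compare the HTTP status code than to read the JSON by eye. The sketch below uses curl's -w '%{http_code}' output option; the function names hq_post_status and check_hq_post and the 10-second limit are hypothetical. Note that it checks only the status code, which is a weaker test than inspecting the JSON body described above.

```shell
# Print only the HTTP status code of an empty POST to HQ.
hq_post_status() {
  url=${1:-https://hq.grax.com/api/v2/dd/logs/api/v2/logs}
  # -o /dev/null discards the body; -w prints the final status code.
  # curl prints 000 when no HTTP response was received at all.
  curl -L -s -o /dev/null -w '%{http_code}' --max-time 10 -X POST "$url"
}

# Pass when HQ answers the unauthenticated POST with 401.
check_hq_post() {
  code=$(hq_post_status "$@") || true
  if [ "$code" = "401" ]; then
    echo "pass: HQ reachable (401)"
  else
    echo "fail: got '$code'"
    return 1
  fi
}
```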

Salesforce

To read and write Salesforce data via the Salesforce APIs, GRAX must first be able to connect. We can test that connectivity in much the same way as above:

[[email protected] grax]# curl https://test.salesforce.com

The response to the above should be an HTML document, too large to post here. Repeat that test for the following:

  1. https://login.salesforce.com
  2. Any custom/my-domain paths configured in your organization
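
Repeating the test for several hosts is easier with a loop. A minimal sketch follows; the function name check_urls and the 10-second limit are hypothetical, and the check assumes each URL answers a plain GET with a non-error status after redirects.

```shell
# Try each URL in turn and report pass or fail per URL.
# Returns non-zero if any URL fails.
check_urls() {
  rc=0
  for url in "$@"; do
    # -f: fail on HTTP error statuses; -L: follow redirects
    # -o /dev/null: discard the HTML document
    if curl -s -f -L -o /dev/null --max-time 10 "$url"; then
      echo "pass: $url"
    else
      echo "fail: $url"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: check_urls https://test.salesforce.com https://login.salesforce.com, followed by any custom or my-domain URLs configured in your organization.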

Postgres

To test connectivity to your DB instance, we will use psql, a Linux command line tool that allows direct interaction with Postgres clusters. It is provided by the postgresql package. Installing the package may be unnecessary depending on your image, but can easily be done as follows:

[[email protected] grax]# yum install postgresql
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
amzn2-core                                                                                                                                                                           | 3.7 kB  00:00:00
amzn2extra-docker                                                                                                                                                                    | 3.0 kB  00:00:00
Package postgresql-9.2.24-6.amzn2.x86_64 already installed and latest version
Nothing to do

As you can see, our typical installation already includes the right tooling. We can connect in two ways:

  1. Copy the DB_URL value from your .env file, and run psql [db_url]
  2. Use the graxctl psql subcommand

A valid connection will result in an interactive psql shell, which can be exited with \q:

[[email protected] grax]# ./graxctl psql
2022/08/25 16:25:54 trace C9tATg        VBlnGV start main mainWithCode:152 e=0s
2022/08/25 16:25:54 pprof addr: [::]:46569
2022/08/25 16:25:54 trace C9tATg        VBlnGV info  config setTemplateDefaults:427 msg="loading general template v1.0.0 defaults" template=virtual-appliance e=0s
2022/08/25 16:25:54 trace C9tATg        VBlnGV info  config/secrets New:175 secretStore=database e=11ms

psql (9.2.24, server 13.5)
WARNING: psql version 9.2, server version 13.0.
         Some psql features might not work.
SSL connection (cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.

grax=> \q

If a connection cannot be made, graxctl will try again every 5 seconds for a few minutes. This is usually a sign of one of the following:

  1. DB_URL is not set
  2. DB_URL is not properly formatted
  3. DB_URL contains an invalid cluster name
  4. DB_URL is not exported to current environment
  5. Route tables are forcing DB traffic outside of the VPC
  6. Security groups are not allowing traffic from the Instance into the DB

If a connection can be made but you receive a Postgres error about DB existence, credentials, etc., then you likely have an issue with correctness in your DB_URL value (i.e. username, password, or DB name).
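
The first connection method (psql with the DB_URL value) can be run non-interactively to separate "DB_URL is missing" from "the connection failed". The following is a sketch under assumptions: the function names env_db_url and check_db are hypothetical, DB_URL is assumed to be an unquoted connection URI on its own line, and the 5-second timeout is an illustrative value.

```shell
# Print the DB_URL value from an .env file (everything after "DB_URL=").
env_db_url() {
  sed -n 's/^DB_URL=//p' "$1" | tail -n 1
}

# Run a trivial query against the database named by DB_URL.
check_db() {
  db_url=$(env_db_url "${1:-/home/ec2-user/.env}")
  [ -n "$db_url" ] || { echo "DB_URL is not set"; return 1; }
  # psql accepts a connection URI in place of a database name.
  # -w: never prompt for a password; -c: run one statement and exit.
  # PGCONNECT_TIMEOUT limits how long the connection attempt may take.
  if PGCONNECT_TIMEOUT=5 psql -w "$db_url" -c 'SELECT 1' >/dev/null 2>&1; then
    echo "connected"
  else
    echo "connection failed"
    return 1
  fi
}
```

A "connection failed" result does not distinguish a network problem from bad credentials; rerun psql interactively, as shown above, to see the underlying Postgres error.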

Is additional assistance available?

If you have exhausted the steps here and require further assistance (or have recommendations for improving the quality or completeness of this guide), send an email to [email protected].