Troubleshooting FAQ

Common questions and steps for troubleshooting GRAX

This document is a list of the most important questions for troubleshooting and evaluating issues blocking general operations of GRAX including networking failures, boot failures, restart behaviors, etc. This guide includes commands specific to the Amazon Linux 2 AMI maintained by AWS, but attempts to remain otherwise infrastructure-agnostic where possible. Some steps may not work as intended if you have made heavy customizations to networking, image, or environment.

Where are GRAX files located?

First, let's note the locations of the GRAX binary and environment file. Keeping track of their paths helps validate service configurations in later steps. In a typical installation, GRAX is stored under /home/ec2-user/graxinc/grax and the .env file is stored under /home/ec2-user. Thus, we'll use the following paths:

  • GRAX binary: /home/ec2-user/graxinc/grax/grax
  • GRAX command-line tools: /home/ec2-user/graxinc/grax/graxctl
  • Environment file: /home/ec2-user/.env

Is GRAX executable?

To ensure that Linux knows GRAX is an executable, we can check permissions on the file as follows:

[root@grax-test-runtime grax]# ls -la
total 278108
drwxr-xr-x 2 root root      137 Aug 20 10:09 .
drwxr-xr-x 3 root root       18 Aug 18 12:38 ..
-rwxr-xr-x 1 root root 71471248 Aug 20 10:09 grax
-rwxr-xr-x 1 root root 56687264 Aug 19 15:48 graxctl
-rw-r--r-- 1 root root 52411443 Aug 18 12:39 master.zip

The "x" in the permissions strings at the beginning of each line denotes an executable file. If the grax and graxctl files aren't executable, we can mark them as such:

[root@grax-test-runtime grax]# chmod +x grax graxctl
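The same check can be scripted, which is handy when validating several instances. The following is a minimal sketch, not a GRAX tool: the function name check_executable is hypothetical, and it only reports problems rather than changing permissions.

```shell
# Report whether each given path is a regular file with the executable bit set.
# check_executable is an illustrative helper name, not part of GRAX.
check_executable() {
  status=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      status=1
    elif [ ! -x "$f" ]; then
      echo "not executable: $f (fix with: chmod +x $f)"
      status=1
    else
      echo "ok: $f"
    fi
  done
  return "$status"
}
```

With the typical installation paths, it could be called as: check_executable /home/ec2-user/graxinc/grax/grax /home/ec2-user/graxinc/grax/graxctl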

How should the Environment file be formatted?

There are several important rules to remember for .env files:

  1. Only one key-value pair per line
  2. Only = is supported as a key-value separator
  3. Comments aren't supported

Comments are the most common issue, as teams often attempt to label values for later reference. Unfortunately, a comment causes most .env parsers to stop reading immediately (sometimes without raising a fatal error). This can leave GRAX with a partial configuration and cause unpredictable symptoms.

For a complete example of a valid .env file, see our Linux Install Guide.
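The formatting rules above can be roughly checked with a short script. This is a sketch under stated assumptions: lint_env_file is a hypothetical helper, it assumes keys consist of letters, digits, and underscores, and it assumes blank lines are harmless. It flags lines that begin with # and lines that are not in KEY=value form. It cannot reliably detect a trailing comment after a value, because # may legitimately appear inside a value.

```shell
# Print each suspicious line number in a .env file; return non-zero if any found.
# lint_env_file is an illustrative name. Key naming and blank-line handling
# are assumptions of this sketch, not documented GRAX behavior.
lint_env_file() {
  awk '
    /^[ \t]*$/ { next }                                   # skip blank lines (assumed harmless)
    /^[ \t]*#/ { printf "line %d: comment\n", NR; bad = 1; next }
    !/^[A-Za-z_][A-Za-z0-9_]*=/ { printf "line %d: not in KEY=value form\n", NR; bad = 1 }
    END { exit bad }
  ' "$1"
}
```

For a typical installation, it could be run as: lint_env_file /home/ec2-user/.env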

Is the service (systemd) working properly?

This guide assumes you're operating GRAX as a permanent service on the instance with systemd. The most common issues with systemd are configuration issues in the service file. In a typical installation, the GRAX service file is at the path /lib/systemd/system/grax.service.

Validate Configuration

We can see the contents of the service configuration by using cat:

[root@grax-test-runtime grax]# cat /lib/systemd/system/grax.service
[Install]
WantedBy=multi-user.target
[Service]
EnvironmentFile=/home/ec2-user/.env
ExecStart=/home/ec2-user/graxinc/grax/grax
Restart=always
Type=simple
[Unit]
Description=grax daemon

Check the following:

  1. EnvironmentFile is a valid absolute path that points to your GRAX .env file.
  2. ExecStart is a valid absolute path that points to your GRAX executable.
  3. Restart is "always" to ensure GRAX is always running regardless of exit signaling.
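The three checks above can also be applied mechanically. The sketch below is illustrative only: unit_value and check_unit are hypothetical helper names, and the parsing assumes simple KEY=value lines like those in the sample service file (it does not handle every systemd syntax variation).

```shell
# Print the value of the first "KEY=value" line for KEY ($1) in a unit file ($2).
unit_value() {
  sed -n "s/^$1=//p" "$2" | head -n 1
}

# Apply the three checks to a unit file; returns non-zero if any check fails.
check_unit() {
  unit=$1
  status=0
  env_file=$(unit_value EnvironmentFile "$unit")
  exec_start=$(unit_value ExecStart "$unit")
  exec_path=${exec_start%% *}   # drop any arguments after the binary path
  restart=$(unit_value Restart "$unit")

  # 1. EnvironmentFile must be an absolute path to an existing file.
  case $env_file in /*) [ -f "$env_file" ] ;; *) false ;; esac ||
    { echo "EnvironmentFile invalid: $env_file"; status=1; }
  # 2. ExecStart must be an absolute path to an executable file.
  case $exec_path in /*) [ -x "$exec_path" ] ;; *) false ;; esac ||
    { echo "ExecStart invalid: $exec_path"; status=1; }
  # 3. Restart must be "always".
  [ "$restart" = "always" ] ||
    { echo "Restart is '$restart', expected 'always'"; status=1; }
  return "$status"
}
```

For a typical installation, it could be run as: check_unit /lib/systemd/system/grax.service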

Service Status

The services run via systemd are managed and interacted with via the systemctl command. To see the current status of the GRAX service, we can use the status subcommand:

[root@grax-test-runtime grax]# systemctl status grax.service
● grax.service - grax daemon
   Loaded: loaded (/usr/lib/systemd/system/grax.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2022-08-19 17:35:29 UTC; 5 days ago
 Main PID: 13125 (grax)
   CGroup: /system.slice/grax.service
           └─13125 /home/ec2-user/graxinc/grax/grax

(Log Lines omitted for brevity)

Hint: Some lines were ellipsized, use -l to show in full.

We always expect the GRAX service to be "active" unless it is under maintenance or GRAX was intentionally taken offline. If the service isn't "active" (for example, it is "failed" or "inactive"), you can restart GRAX at any time by running the restart subcommand:

[root@grax-test-runtime grax]# systemctl restart grax.service

If the GRAX service is entirely disabled, you can enable it with the enable subcommand, and then enforce a start immediately with start:

[root@grax-test-runtime grax]# systemctl enable grax.service; systemctl start grax.service

A successful start of the service (and thus the app) writes logs to the app log file. If you have a regular health check configured, you'll also see log entries for those calls while the app is active.
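The decision between restart and enable/start described above can be sketched as a small helper. This is an illustration under assumptions: suggest_action is a hypothetical name, and it relies on the systemctl is-active and is-enabled subcommands printing the state as a single word. It only prints a suggestion and never changes the service itself.

```shell
# Print a suggested next step based on the service state.
# Nothing is restarted or enabled here; the commands are only echoed.
suggest_action() {
  svc=$1
  # is-active / is-enabled exit non-zero for states other than active/enabled,
  # so the exit status is ignored and only the printed state is used.
  active=$(systemctl is-active "$svc" 2>/dev/null) || true
  enabled=$(systemctl is-enabled "$svc" 2>/dev/null) || true
  if [ "$active" = "active" ]; then
    echo "$svc is active; no action needed"
  elif [ "$enabled" = "disabled" ]; then
    echo "$svc is disabled; run: systemctl enable $svc; systemctl start $svc"
  else
    echo "$svc is ${active:-unknown}; run: systemctl restart $svc"
  fi
}
```

For example: suggest_action grax.service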

Is the Web Server Serving Requests?

GRAX is a web server and API. It offers an endpoint for an external health check to see if the app is available. The health check endpoint for GRAX is an HTTP/1.1 HTTPS-only GET handler on /health. In a typical installation, GRAX runs on port 8000.

We can manually check the status of the app from the instance by curling the local route:

[root@grax-test-runtime grax]# curl -k https://localhost:8000/health
ok

The expected response from the endpoint is HTTP status 200, which signifies a healthy service. A failed call, whether by timeout, rejection, or a different status, is a sign of a failed service/app. This endpoint is designed for load balancer registration and de-registration, not for instance replacement.
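Because the health check is judged by its HTTP status rather than its body, a scripted check can ask curl for the status code directly. The sketch below assumes the typical port 8000; http_status and is_healthy are hypothetical helper names, and the 5-second timeout is an arbitrary choice.

```shell
# Print the HTTP status code returned by a URL ("000" if no response arrived).
# -k skips certificate verification, as in the manual check above;
# -m 5 gives up after 5 seconds (an arbitrary choice for this sketch).
http_status() {
  curl -k -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# Succeed only when the endpoint answers with HTTP 200.
is_healthy() {
  code=$(http_status "$1") || true
  [ "$code" = "200" ]
}
```

For example: is_healthy https://localhost:8000/health && echo healthy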

Is Connectivity intact?

The GRAX app is a data-processor at its core. To process data, it must be able to retrieve that data, write it to storage, and read it back. When you add app maintenance, licensing, and telemetry to the equation, connectivity is critical to ensure proper operation.

Only some of the overall connectivity requirements can be tested from the instance: the egress connections used to push or pull data to and from remote resources. Ingress communications, because they start from other sources, are harder to test.

Timeouts, rejections, or broken connections during the following tests are considered failures. All failures should be investigated.

GRAX HQ

Communication to GRAX HQ is egress-only, and can be tested relatively simply. To start, we can verify connectivity to the GRAX packaging API, which allows downloading of the app in the first place:

[root@grax-test-runtime grax]# curl -L -o testgrax https://hq.grax.com/api/v2/download/graxinc/grax/master/linux/amd64
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    68  100    68    0     0   4270      0 --:--:-- --:--:-- --:--:--  4533
100 48.6M    0 48.6M    0     0  6344k      0 --:--:--  0:00:07 --:--:-- 9270k

The -L flag is utilized to follow the ALB redirect that points to HQ. -o is used to avoid printing the binary data to the terminal. A successful download of several dozen MB can be considered a passing test.

Next, let's confirm POST requests to HQ succeed:

[root@grax-test-runtime grax]# curl -L -X POST https://hq.grax.com/api/v2/dd/logs/api/v2/logs
{"cause":"","status":401,"message":"Unauthorized"}

This may seem an unusual result, but the 401 return is a good enough response to know that your POST request made it to HQ without requiring you to construct a valid set of credentials for a simple test. If you don't get a JSON response in line with the above, consider the test a failure.
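The pass/fail rule for the POST test can be expressed as a short script. This is a sketch only: hq_post_check is a hypothetical helper, the 10-second timeout is arbitrary, and the match on "status":401 assumes the response body keeps the shape shown above.

```shell
# POST with no credentials and decide pass/fail from the JSON body.
# A body containing "status":401 shows the request reached HQ.
hq_post_check() {
  body=$(curl -s -L -m 10 -X POST "$1") || true
  case $body in
    *'"status":401'*) echo "pass: HQ returned the expected 401 JSON" ;;
    *) echo "fail: unexpected or empty response"; return 1 ;;
  esac
}
```

For example: hq_post_check https://hq.grax.com/api/v2/dd/logs/api/v2/logs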

Salesforce

To read and write Salesforce data via the Salesforce API, GRAX must first be able to connect. We can test that connectivity much the same as above:

[root@grax-test-runtime grax]# curl https://test.salesforce.com

The response to the above should be an HTML document, which is too large to reproduce here. Repeat that test for the following:

  1. https://login.salesforce.com
  2. Any custom/my-domain paths configured in your organization
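When several endpoints need testing, a loop saves retyping. The sketch below uses a hypothetical helper, check_endpoints, with an arbitrary 10-second timeout. It checks only that curl could connect and receive a response; it does not inspect the returned HTML.

```shell
# Print one line per URL and return non-zero if any URL could not be reached.
# curl exits non-zero on timeouts, rejections, and broken connections.
check_endpoints() {
  failed=0
  for url in "$@"; do
    if curl -s -o /dev/null -m 10 "$url"; then
      echo "ok: $url"
    else
      echo "FAILED: $url"
      failed=1
    fi
  done
  return "$failed"
}
```

For example: check_endpoints https://test.salesforce.com https://login.salesforce.com (append any custom or My Domain URLs used by your organization).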

Postgres

To test connectivity to your DB instance, we use psql, the Postgres command-line client, which allows direct interaction with Postgres clusters. It is provided by the postgresql package. Installing the package may be unnecessary depending on your image, but can easily be done as follows:

[root@grax-test-runtime grax]# yum install postgresql
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
amzn2-core                                                                                                                                                                           | 3.7 kB  00:00:00
amzn2extra-docker                                                                                                                                                                    | 3.0 kB  00:00:00
Package postgresql-9.2.24-6.amzn2.x86_64 already installed and latest version
Nothing to do

As you can see, our typical installation already includes the right tooling. We can connect in two ways:

  1. Copy the DB_URL value from your .env file, and run psql [db_url]
  2. Use the graxctl psql subcommand

A valid connection results in an interactive psql shell, which can be exited with \q:

[root@grax-test-runtime grax]# ./graxctl psql
2022/08/25 16:25:54 trace C9tATg        VBlnGV start main mainWithCode:152 e=0s
2022/08/25 16:25:54 pprof addr: [::]:46569
2022/08/25 16:25:54 trace C9tATg        VBlnGV info  config setTemplateDefaults:427 msg="loading general template v1.0.0 defaults" template=virtual-appliance e=0s
2022/08/25 16:25:54 trace C9tATg        VBlnGV info  config/secrets New:175 secretStore=database e=11ms

psql (9.2.24, server 14.5)
WARNING: psql version 9.2, server version 14.0.
         Some psql features might not work.
SSL connection (cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.

grax=> \q
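For the first option (running psql with the DB_URL value), the value can be read from the .env file instead of being copied by hand. The following is a sketch with hypothetical helper names, env_value and load_db_url; it assumes plain KEY=value lines with no quoting, in line with the .env formatting rules described earlier.

```shell
# Print the value of KEY ($1) from a .env file ($2), first match only.
env_value() {
  sed -n "s/^$1=//p" "$2" | head -n 1
}

# Export DB_URL into the current shell, or report that it is not set.
load_db_url() {
  DB_URL=$(env_value DB_URL "$1")
  if [ -z "$DB_URL" ]; then
    echo "DB_URL is not set in $1"
    return 1
  fi
  export DB_URL
}

# Example (typical installation path):
#   load_db_url /home/ec2-user/.env && psql "$DB_URL"
```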

If a connection cannot be made, graxctl tries again every 5 seconds for a few minutes. A failure to connect is usually a sign of one of the following:

  1. DB_URL isn't set
  2. DB_URL isn't properly formatted
  3. DB_URL contains an invalid cluster name
  4. DB_URL contains a password with special characters that need to be escaped
  5. DB_URL isn't exported to current environment
  6. Route tables are forcing DB traffic outside of the VPC
  7. Security groups aren't allowing traffic from the Instance into the DB

If a connection can be made but you receive a Postgres error about DB existence, credentials, etc., then you likely have an incorrect value in your DB_URL (that is, the username, password, or DB name).
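For passwords with special characters, the usual way to escape them inside a connection URL is percent-encoding. The sketch below is a hypothetical helper, urlencode, written for ASCII input only; it leaves letters, digits, and the characters . _ ~ - untouched and encodes everything else.

```shell
# Percent-encode a string for use inside a URL (ASCII input only).
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}   # first character of the remaining string
    s=${s#?}          # drop that character
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # "'c" yields the character code
    esac
  done
  printf '%s\n' "$out"
}
```

For example, urlencode 'p@ss/w:rd' prints p%40ss%2Fw%3Ard, which can then be placed in the password position of DB_URL.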

Is additional assistance available?

If you have exhausted the steps here and require further assistance (or have recommendations for quality/completeness of this guide), send an email to [email protected].