Monitoring

Ensuring major resources remain operational

Whether you design and implement the infrastructure for GRAX yourself or use GRAX-provided designs/templates, installing self-managed GRAX means running cloud infrastructure in your own environment. For safety, security, and compliance reasons, the GRAX team cannot access or manage this infrastructure. Automated monitoring policies help ensure that problems with it are noticed quickly and that application downtime stays low.

Application Support Policy

For more information about what is covered within the scope of GRAX support obligations, review our support documentation.

What can be monitored?

The exact observability and monitoring tools and configurations vary with the cloud provider or environment where GRAX is installed, but at a high level you should monitor these major components:

  1. Load Balancer (if applicable)
  2. Instance Utilization
  3. GRAX Service
  4. Postgres Database

Cost-based alerts for budget thresholds and forecasts can be configured separately from the GRAX infrastructure if required; see your cloud provider's documentation for details. Note that automatically restricting resources based on cost thresholds or budgets may interrupt your GRAX service.

Global services (such as AWS S3 or IAM) can be monitored at the per-service level and do not require additional custom monitoring specific to your account.

Load Balancer

If your infrastructure includes a load balancer for reachability, it must be reachable on a given domain name with a valid certificate and must have healthy targets behind it. The monitoring criteria are therefore:

  1. Application domain is registered
  2. Application domain is non-expired
  3. Domain certificate is non-expired
  4. Domain certificate is assigned to the load balancer (the ALB on AWS)
  5. Certificate is installed on instance (if applicable)
  6. Certificate installed on instance is non-expired (if applicable)
  7. Load Balancer is reachable from intended network segment
  8. Targets are healthy (see below for health checks)
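Several of the certificate criteria above can be automated. The following is a minimal Python sketch, not a GRAX-provided tool, that connects to the load balancer's domain, verifies the certificate chain and hostname against the system trust store, and reports how many days remain before the certificate expires. Run from the intended network segment, it also exercises reachability. The 30-day warning threshold and the domain in the usage example are hypothetical; adjust them to your renewal process.

```python
import socket
import ssl
import time
from typing import Optional

# Hypothetical alert threshold: warn when fewer than this many days remain.
WARN_DAYS = 30


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a certificate's notAfter string (e.g. "Jun  1 12:00:00 2030 GMT")
    into days remaining, measured from `now` in seconds since the epoch."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a TLS connection that verifies the chain and hostname against the
    system trust store, then return the certificate's notAfter field.
    An invalid, untrusted, or already-expired certificate raises ssl.SSLError."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check_certificate(host: str, port: int = 443) -> bool:
    """Return True if the certificate served on host:port has more than
    WARN_DAYS days of validity left."""
    remaining = days_until_expiry(fetch_not_after(host, port))
    print(f"{host}:{port} certificate expires in {remaining:.1f} days")
    return remaining > WARN_DAYS
```

For example, check_certificate("grax.example.com") (a placeholder domain) returns False when 30 or fewer days remain. A failed connection or certificate verification raises an exception, which your alerting should treat as a failure as well.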

Instance Utilization

The GRAX workload is bursty and varies with Salesforce usage, so expect periods of heavy load alongside periods of almost no utilization. For this reason, the utilization criteria below are averages over a rollup window rather than instantaneous values. We recommend the following monitoring criteria:

  1. CPU utilization should remain below 80% on average (4-8hr rollup)
  2. RAM utilization should remain below 80% on average (4-8hr rollup)
  3. Temp directory total size should be at least 500GB
  4. Temp directory free space should be at least 15% of total size
  5. Network utilization should remain below 80% on average (4-8hr rollup)
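The following minimal Python sketch shows one way to evaluate these thresholds. It assumes utilization samples (in percent) have already been collected over a 4-8 hour window by whatever agent or cloud metrics service you use; the helper names are hypothetical. Averaging over the window means a short burst above 80% does not trip the check, which suits the bursty workload described above. The 500GB minimum is interpreted here as decimal gigabytes, which is an assumption.

```python
import shutil
from statistics import mean
from typing import List, Sequence

# Thresholds taken from the monitoring criteria above.
MAX_AVG_UTILIZATION_PCT = 80.0
MIN_TEMP_TOTAL_BYTES = 500 * 10**9  # 500GB, assumed to mean decimal gigabytes
MIN_TEMP_FREE_FRACTION = 0.15


def rollup_ok(samples_pct: Sequence[float],
              limit_pct: float = MAX_AVG_UTILIZATION_PCT) -> bool:
    """Return True if the average of the samples in the rollup window
    (CPU, RAM, or network utilization in percent) stays below the limit.
    Individual samples above the limit are tolerated."""
    if not samples_pct:
        raise ValueError("no samples in rollup window")
    return mean(samples_pct) < limit_pct


def temp_dir_problems(total_bytes: int, free_bytes: int) -> List[str]:
    """Return human-readable problems; an empty list means the temp directory is healthy."""
    problems = []
    if total_bytes < MIN_TEMP_TOTAL_BYTES:
        problems.append(f"temp directory is {total_bytes / 10**9:.0f}GB; at least 500GB is recommended")
    if free_bytes < MIN_TEMP_FREE_FRACTION * total_bytes:
        problems.append(f"temp directory has {free_bytes / total_bytes:.0%} free; at least 15% is recommended")
    return problems


def check_temp_dir(path: str) -> List[str]:
    """Measure the filesystem that holds `path` and evaluate it."""
    usage = shutil.disk_usage(path)
    return temp_dir_problems(usage.total, usage.free)
```

Pass check_temp_dir the path GRAX uses for temporary files on your instance; where that lives depends on your installation.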

For the required GRAX hardware specifications, review the technical specifications document. If you use AWS, see the AWS documentation on EC2 instance and Auto Scaling metrics.

GRAX Service

Keeping the GRAX application running on the instance is foundational. We highly recommend running GRAX as a service with automatic restart configured, so that the application starts again after a fatal error.

Health Checks

The GRAX application exposes an endpoint for external health checks, such as those performed by AWS target groups. Send a request like the following to check whether the application is available:

Port: 8000 (default)
Path: /health
Method: GET
Protocol: HTTPS (HTTP1 Only)

If GRAX is properly configured, this GET request returns a 200 status with a body of ok.
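An external monitor or cron job can poll the same endpoint. The minimal Python sketch below uses only the standard library and treats the application as healthy only when the response is status 200 with the body ok; any connection error, timeout, TLS failure, or non-2xx status counts as unhealthy. urllib speaks HTTP/1.1, which fits the HTTP/1-only requirement. The function name and the hostname in the usage example are placeholders.

```python
import http.client
import ssl
import urllib.request
from typing import Optional


def grax_is_healthy(url: str, timeout: float = 5.0,
                    context: Optional[ssl.SSLContext] = None) -> bool:
    """GET the health endpoint and return True only for status 200 with body "ok".

    `context` optionally supplies an SSLContext for HTTPS URLs, e.g. one that
    trusts a private CA. Any error (refused connection, timeout, TLS failure,
    or a non-2xx status, which urllib raises as HTTPError) means unhealthy."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
            # Strip surrounding whitespace so a trailing newline is tolerated.
            body = resp.read().decode("utf-8", errors="replace").strip()
            return resp.status == 200 and body == "ok"
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, socket timeouts, and ssl.SSLError all subclass OSError.
        return False
```

For example, grax_is_healthy("https://grax.internal.example:8000/health"). If the instance presents a certificate signed by a private CA, pass context=ssl.create_default_context(cafile=<path to your CA bundle>) rather than disabling verification.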

Postgres Database

A valid connection to the application database is required for GRAX to boot and operate. Monitoring the GRAX database is similar to monitoring any other application database and should cover the following:

  1. CPU utilization should remain below 80% on average (4-8hr rollup)
  2. RAM utilization should remain below 80% on average (4-8hr rollup)
  3. Total disk utilization should remain below 80% (if applicable)
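Resource metrics like these usually come from your database platform's own monitoring. Because GRAX cannot boot without a database connection, it can also help to check reachability from the GRAX instance itself. The following minimal Python sketch opens a TCP connection and sends the 8-byte SSLRequest message defined by the PostgreSQL frontend/backend protocol; a PostgreSQL server answers with a single byte, S or N, without requiring credentials. It does not authenticate or run a query, so for a deeper check use a tool such as pg_isready or run SELECT 1 through your driver. The function name is hypothetical, and port 5432 is only the PostgreSQL default.

```python
import socket
import struct

# SSLRequest code from the PostgreSQL frontend/backend protocol:
# 1234 in the high 16 bits and 5679 in the low 16 bits.
SSL_REQUEST_CODE = 80877103


def postgres_responds(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if host:port accepts a TCP connection and answers a
    PostgreSQL SSLRequest with b"S" (TLS available) or b"N" (TLS not offered).

    This confirms a PostgreSQL server is listening; it does not check
    credentials, database permissions, or query latency."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Int32 message length (8, including itself) followed by the Int32
            # request code, both in network byte order.
            sock.sendall(struct.pack("!ii", 8, SSL_REQUEST_CODE))
            reply = sock.recv(1)
    except OSError:
        # Refused connections, DNS failures, and timeouts all land here.
        return False
    return reply in (b"S", b"N")
```

For example, run postgres_responds("grax-db.example.internal") (a placeholder hostname) from the GRAX instance so the check follows the same network path the application uses.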

Depending on the platform or vendor, more Postgres monitoring options are available, including queue depth, IOPS statistics, and network throughput. For more information on monitoring these metrics on Amazon RDS, check out:

For similar information pertaining to Azure Postgres, check out: