Monitoring
Ensuring major resources remain operational
Whether you design and implement the infrastructure for GRAX yourself or use GRAX-provided designs/templates, installing self-managed GRAX means running cloud infrastructure within your own environment. For safety, security, and compliance reasons, this infrastructure isn't accessible to or manageable by the GRAX team. Automated monitoring policies help ensure that issues with this infrastructure are noticed quickly and downtime of your app remains low.
Application Support Policy
For more information about what's covered within the scope of GRAX support obligations, review our support documentation.
What can be monitored?
The exact observability/monitoring tools and configurations vary based on the cloud provider or environment used to install GRAX, but at a high level the same major components should be monitored:
- Load Balancer (if applicable)
- Instance Usage
- GRAX Service
- Postgres Database
Cost-based alerts for budgeting thresholds and forecasts can be configured separately from the GRAX infrastructure if required. See the documentation for your cloud provider of choice for more information. Automatic restriction of resources based on cost thresholds or budgets may cause interruption to your GRAX service.
Global services (like AWS's S3 or IAM) can be monitored at a per-service level but don't require additional monitoring specific to your account.
Load Balancer
If your infrastructure deployment contains a load balancer for stable connectivity, it must be reachable on a given domain name with a valid certificate and have healthy targets behind it. The monitoring criteria are:
- Application domain is registered
- Application domain is non-expired
- Domain certificate is non-expired
- Domain certificate is assigned to ALB
- Certificate is installed on instance (if applicable)
- Certificate installed on instance is non-expired (if applicable)
- Load Balancer is reachable from intended network segment
- Targets are healthy (see below for health checks)
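The certificate-expiry checks above can be scripted. The sketch below assumes GNU date and defines a helper that converts a certificate end date (in the format printed by openssl x509 -enddate) into days remaining; the domain name in the commented usage is a placeholder for your application domain.

```shell
#!/bin/sh
# days_until ENDDATE: days from now until ENDDATE.
# ENDDATE is the value printed by `openssl x509 -noout -enddate`,
# e.g. "Dec 31 23:59:59 2099 GMT". Requires GNU date (-d).
days_until() {
  end_s=$(date -d "$1" +%s)
  now_s=$(date +%s)
  echo $(( (end_s - now_s) / 86400 ))
}

# Fetch the end date of the certificate presented by the load balancer
# (DOMAIN is a placeholder -- replace with your application domain):
#   enddate=$(echo | openssl s_client -servername "$DOMAIN" -connect "$DOMAIN:443" 2>/dev/null \
#     | openssl x509 -noout -enddate | cut -d= -f2)
#   days_until "$enddate"
```

Alerting when the result drops below roughly 30 days leaves time to renew before the certificate expires.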
Instance Usage
The GRAX app workload can be varied and inconsistent based on Salesforce usage. As such, occasional heavy-load periods and periods with almost no usage are normal. We recommend the following monitoring criteria:
- CPU usage should remain below 80% on average (4-8hr roll up)
- RAM usage should remain below 80% on average (4-8hr roll up)
- Temp directory total size should be at least 500GB
- Temp directory free space should be at least 15% of total size
- Network usage should remain below 80% on average (4-8hr roll up)
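The temp-directory free-space criterion can be checked with a small script like the sketch below. TEMP_DIR is an assumption; point it at wherever the GRAX temp directory is mounted on your instance.

```shell
#!/bin/sh
# Warn when the temp directory has less than 15% free space.
# TEMP_DIR is a placeholder -- set it to the GRAX temp directory.
TEMP_DIR="${TEMP_DIR:-/tmp}"

# Fifth column of POSIX `df -P` output is the used-capacity percentage.
used_pct=$(df -P "$TEMP_DIR" | awk 'NR==2 { sub("%", "", $5); print $5 }')
free_pct=$((100 - used_pct))

if [ "$free_pct" -lt 15 ]; then
  echo "WARN: only ${free_pct}% free in $TEMP_DIR"
else
  echo "OK: ${free_pct}% free in $TEMP_DIR"
fi
```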
For more information about the required specifications of GRAX hardware, please review the technical specifications document. If utilizing AWS, more documentation on instance and auto-scaling metrics is available here.
GRAX Service
Ensuring that the GRAX app remains running on the instance is foundational to success. It's highly recommended to run GRAX as a service with an auto-restart configuration so that the app boots again after a fatal error.
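If your instance uses systemd, an auto-restart configuration can look like the sketch below. The binary path, user, and unit name are assumptions; adjust them to match your installation.

```ini
# /etc/systemd/system/grax.service (unit name and paths are hypothetical)
[Unit]
Description=GRAX application
After=network-online.target

[Service]
# Replace with the actual path to your GRAX binary and service account.
ExecStart=/usr/local/bin/grax
User=grax
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

With Restart=always, systemd relaunches the app a few seconds after any exit, including a fatal error.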
Health Checks
The GRAX app maintains an endpoint for external health checks like those performed by AWS's Target Groups. Send a request like the following to check whether the app is available:
- Port: 8000 (default)
- Path: /health
- Method: GET
- Protocol: HTTPS (HTTP1 Only)
If properly configured, the GET request above returns a body of ok and a status of 200.
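A manual probe of the health endpoint might look like the sketch below. The hostname is a placeholder, and -k skips certificate verification for illustration only; drop it once a valid certificate is in place.

```shell
#!/bin/sh
# Probe the GRAX health endpoint and report the HTTP status.
# GRAX_HOST is a placeholder -- set it to your GRAX hostname.
GRAX_HOST="${GRAX_HOST:-grax.example.com}"

# -w '%{http_code}' prints the status code (000 on connection failure).
status=$(curl -sk --max-time 5 -o /dev/null -w '%{http_code}' \
  "https://${GRAX_HOST}:8000/health")

if [ "$status" = "200" ]; then
  echo "healthy"
else
  echo "unhealthy (HTTP $status)"
fi
```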
Postgres Database
A valid connection to the app database is required for boot and operation of GRAX. Monitoring the GRAX database is much like monitoring any other app database, and should cover the following:
- CPU usage should remain below 80% on average (4-8hr roll up)
- RAM usage should remain below 80% on average (4-8hr roll up)
- Total disk usage should remain below 80% (if applicable)
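Beyond resource metrics, a basic connectivity probe confirms the app can actually reach its database. The sketch below assumes psql is installed; the connection string is a placeholder, so point GRAX_DB_URL at your actual database.

```shell
#!/bin/sh
# Verify the GRAX app database accepts connections.
# GRAX_DB_URL is a placeholder -- set it to your real connection string.
DB_URL="${GRAX_DB_URL:-postgres://grax:secret@db.example.com:5432/grax}"

# `SELECT 1` succeeds only if the server accepts the connection
# and credentials; all output is suppressed, only the exit code matters.
if psql "$DB_URL" -Atc 'SELECT 1' >/dev/null 2>&1; then
  echo "db reachable"
else
  echo "db unreachable"
fi
```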
More options for monitoring Postgres are available depending on platform/vendor, including queue depth, IOPS statistics, and network throughput. For more information, see the monitoring documentation for AWS's RDS or Azure Postgres.