Agoric monitoring by Prometheus + Grafana [EN]
Introduction
Hello. In this article, we will start learning how to monitor servers for the parameters that interest us, using Prometheus and Grafana.
In our case, the task is to monitor CPU and memory usage while running a node in the Agoric test network. The topic is large, so to make the material easier to digest I will split the monitoring setup across several articles. We will start with general information to understand what this is all about! :)
Both Prometheus and Grafana are open-source monitoring tools that are widely used in production environments.
1. Let’s talk about Prometheus
What is Prometheus?
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project and maintained independently of any company. To emphasize this, and to clarify the project’s governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.
For more elaborate overviews of Prometheus, see the resources linked from the media section of the Prometheus documentation.
Features
Prometheus’s main features are:
- a multi-dimensional data model with time series data identified by metric name and key/value pairs
- PromQL, a flexible query language to leverage this dimensionality
- no reliance on distributed storage; single server nodes are autonomous
- time series collection happens via a pull model over HTTP
- pushing time series is supported via an intermediary gateway
- targets are discovered via service discovery or static configuration
- multiple modes of graphing and dashboarding support
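To make the data model a bit more tangible, here is a minimal sketch in shell. The sample text is a hypothetical scrape result in the Prometheus text exposition format, and the helper names `sample_metrics` and `series_for` are invented for this illustration. Each line is one time series, identified by a metric name plus optional key/value labels.

```shell
# Hypothetical scrape output in the Prometheus text exposition format.
sample_metrics() {
  cat <<'EOF'
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 67.8
node_memory_MemAvailable_bytes 2147483648
EOF
}

# Print every series (name{labels} value) that belongs to one metric name.
series_for() {
  sample_metrics | awk -v name="$1" '
    /^#/ { next }                       # skip HELP/TYPE comment lines
    { n = $1; sub(/[{].*/, "", n)       # strip the label part, keep the name
      if (n == name) print }'
}
```

Calling `series_for node_cpu_seconds_total` prints the two CPU series, which differ only in their `mode` label. In PromQL the same selection would be written as `node_cpu_seconds_total`, and narrowed to one series with `node_cpu_seconds_total{mode="idle"}`.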
Components
The Prometheus ecosystem consists of multiple components, many of which are optional:
- the main Prometheus server which scrapes and stores time series data
- client libraries for instrumenting application code
- a push gateway for supporting short-lived jobs
- special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
- an alertmanager to handle alerts
- various support tools
Most Prometheus components are written in Go, making them easy to build and deploy as static binaries.
Architecture
This diagram illustrates the architecture of Prometheus and some of its ecosystem components:
Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. Grafana or other API consumers can be used to visualize the collected data.
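A scrape is nothing more than an HTTP GET against the target's metrics endpoint. The sketch below imitates a single scrape by hand. It assumes the default `http` scheme and `/metrics` path, and the function names are made up for this example. It is an illustration of the pull model, not of how the Prometheus server is implemented.

```shell
# Build the default scrape URL for a target given as host:port.
metrics_url() {
  printf 'http://%s/metrics\n' "$1"
}

# Count the samples in exposition-format text read from stdin
# (every non-empty line that is not a # comment is one sample).
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# One manual "scrape": fetch the target's metrics and count the samples.
scrape_once() {
  curl -fsS "$(metrics_url "$1")" | count_samples
}
```

For example, `scrape_once localhost:9100` would report how many samples an exporter listening on that port currently exposes, provided such an exporter is running.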
2. Let’s talk about Grafana
You can use the product for free!
Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources. A licensed Grafana Enterprise version with additional capabilities is also available as a self-hosted installation or an account on the Grafana Labs cloud service. It is expandable through a plug-in system. End users can create complex monitoring dashboards using interactive query builders. Grafana is divided into a front end and back end, written in TypeScript and Go, respectively.
As a visualization tool, Grafana is a popular component in monitoring stacks, often used in combination with time series databases such as Prometheus and monitoring platforms such as Zabbix.
3. Let’s try to set up Agoric monitoring.
When running a validator, you need to keep an eye on many factors, such as:
✅ Is the validator jailed?
✅ Is the latest version of the app installed?
✅ Is the load on the cores evenly distributed?
✅ Is there enough free space on the hard disk?
✅ Are there any problems when reading and writing blocks?
✅ How heavily is your RAM loaded?
✅ Is your internet connection stable?
✅ And many other factors.
To monitor such metrics, you can use the Grafana analytics and interactive visualization platform.
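As a rough idea of what such checks look like once Prometheus is collecting data, here is a sketch for the two quantities this article focuses on, CPU and memory. It assumes the metrics come from node_exporter (hence the `node_...` metric names) and that Prometheus listens on port 9091 as configured later in this article. The variable and function names are chosen only for this example.

```shell
# PromQL expressions (node_exporter metric names) for CPU and memory usage,
# both expressed as a percentage.
CPU_QUERY='100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
MEM_QUERY='(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100'

# Same arithmetic as MEM_QUERY, done locally: percent of memory in use.
mem_used_percent() {
  awk -v total="$1" -v avail="$2" \
    'BEGIN { printf "%.1f\n", (1 - avail / total) * 100 }'
}

# Send a PromQL expression to the Prometheus HTTP API (assumes port 9091).
run_query() {
  curl -sG --data-urlencode "query=$1" "http://localhost:9091/api/v1/query"
}
```

With a running server, `run_query "$CPU_QUERY"` returns the result as JSON. The same expressions can later be pasted into a Grafana panel.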
Instructions from the Agoric team.
Verify available versions
sudo apt-cache policy prometheus
Create Prometheus user
sudo useradd -M -r -s /bin/false prometheus
Create Prometheus directories
sudo mkdir /etc/prometheus /var/lib/prometheus
Download binaries
wget https://github.com/prometheus/prometheus/releases/download/v2.18.1/prometheus-2.18.1.linux-amd64.tar.gz
Extract and Install
tar xzf prometheus-2.18.1.linux-amd64.tar.gz
sudo cp prometheus-2.18.1.linux-amd64/{prometheus,promtool} /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/{prometheus,promtool}
sudo cp -r prometheus-2.18.1.linux-amd64/{consoles,console_libraries} /etc/prometheus/
Setup Prometheus configuration
sudo cp prometheus-2.18.1.linux-amd64/prometheus.yml /etc/prometheus/prometheus.yml
sudo nano /etc/prometheus/prometheus.yml
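Configuration section for scraping. In the default file, the first job under scrape_configs is Prometheus itself; point it at port 9091 and add the two jobs below it:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9091']
  - job_name: 'Agoric-Validator'
    static_configs:
      - targets: ['localhost:26660']
        labels:
          group: 'Validator'
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']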
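YAML is sensitive to indentation, so it is worth checking the file before starting the service. The sketch below lists the job names found in a config file and then validates it with `promtool check config`, which ships in the same archive as Prometheus. The function names are invented for this example. The config itself presumes that the validator exposes its Tendermint metrics on port 26660 and that node_exporter is running on port 9100; neither is set up by the commands in this article.

```shell
# List the job names defined in a Prometheus config file.
list_jobs() {
  grep -E '^[[:space:]]*-?[[:space:]]*job_name:' "$1" \
    | sed -E "s/.*job_name:[[:space:]]*//; s/['\"]//g"
}

# Validate the file with promtool if it is installed.
check_config() {
  if command -v promtool >/dev/null 2>&1; then
    promtool check config "$1"
  else
    echo "promtool not found in PATH" >&2
    return 127
  fi
}
```

Running `list_jobs /etc/prometheus/prometheus.yml` should print the three jobs, and `check_config /etc/prometheus/prometheus.yml` reports syntax errors such as a misplaced `labels:` block.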
Set the user and group ownership on the configuration file
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
Create a systemd unit file for Prometheus. In this scenario Prometheus listens on port 9091.
sudo nano /etc/systemd/system/prometheus.service
Now paste the section below into the file:
[Unit]
Description=Prometheus Time Series Collection and Processing Server
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address="0.0.0.0:9091"
[Install]
WantedBy=multi-user.target
Enable Prometheus at startup
sudo systemctl enable prometheus
Start Prometheus service
sudo systemctl start prometheus
Check the status of the Prometheus service
sudo systemctl status prometheus
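Besides systemd's view of the service, you can ask Prometheus itself whether it is healthy and whether its targets are reachable. The sketch below uses the `/-/healthy` endpoint and the `/api/v1/targets` API of the Prometheus server. The function names and the `PROM_URL` variable are made up for this example, and the port assumes the unit file above.

```shell
# Base URL of the Prometheus server; 9091 matches the unit file above.
PROM_URL="${PROM_URL:-http://localhost:9091}"

# Succeeds (exit code 0) if the server answers its health endpoint.
prometheus_healthy() {
  curl -fsS "$PROM_URL/-/healthy" >/dev/null
}

# Count targets reported as healthy in a /api/v1/targets response
# read from stdin. A crude text match; a JSON parser would be more robust.
count_up_targets() {
  grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Ask the running server how many scrape targets are currently up.
targets_up() {
  curl -fsS "$PROM_URL/api/v1/targets" | count_up_targets
}
```

With all three jobs from the configuration reachable, `targets_up` should print 3. A lower number usually means that an exporter is not running or that its port is wrong in prometheus.yml.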