Agoric monitoring by Prometheus + Grafana [EN]

Introduction

AlexOnGoods
4 min read · Apr 21, 2021

Hello. In this article, we finally start learning how to monitor a server against the metrics that matter to us. We will use Prometheus and Grafana for this.

In our case, the task is to monitor the CPU and memory usage of a node running on the Agoric test network. The topic is large, so to make the material easier to digest I will split the monitoring setup across several articles. We start with some general background so that it is clear what we are dealing with! :)


Both Prometheus and Grafana are open-source, enterprise-grade monitoring tools: Prometheus collects and stores metrics, and Grafana visualizes them.

1. Let’s talk about Prometheus

What is Prometheus?

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project and maintained independently of any company. To emphasize this, and to clarify the project’s governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.

For more detailed overviews of Prometheus, see the resources linked from the Media section of the official Prometheus documentation.

Features

Prometheus’s main features are:

  • a multi-dimensional data model with time series data identified by metric name and key/value pairs
  • PromQL, a flexible query language to leverage this dimensionality
  • no reliance on distributed storage; single server nodes are autonomous
  • time series collection happens via a pull model over HTTP
  • pushing time series is supported via an intermediary gateway
  • targets are discovered via service discovery or static configuration
  • multiple modes of graphing and dashboarding support
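To make the first and fourth points concrete: a scrape is an HTTP GET of a target's /metrics page, which returns plain text in which each line is one time series, identified by a metric name plus a set of label key/value pairs, followed by its current value. The sketch below uses a made-up sample of that text format (node_cpu_seconds_total is the metric node_exporter exposes for CPU time; the values are invented) and a hypothetical awk helper, sum_by_label, that sums every series of one metric carrying a given label. For a single scrape, that is roughly what the PromQL query sum(node_cpu_seconds_total{mode="idle"}) returns.

```shell
#!/usr/bin/env bash
# Made-up sample of a /metrics response in the Prometheus text exposition
# format. The values are invented for illustration only.
metrics_sample() {
  cat <<'EOF'
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1000
node_cpu_seconds_total{cpu="0",mode="user"} 200
node_cpu_seconds_total{cpu="1",mode="idle"} 900
node_cpu_seconds_total{cpu="1",mode="user"} 300
EOF
}

# sum_by_label METRIC 'label="value"'   (hypothetical helper)
# Reads exposition-format text on stdin and sums the values of every series
# of METRIC whose label string contains the given label pair.
# Comment lines (# HELP, # TYPE) are skipped.
sum_by_label() {
  awk -v m="$1" -v l="$2" '
    /^#/ { next }
    index($1, m "{") == 1 && index($1, l) > 0 { s += $2 }
    END { print s + 0 }'
}

metrics_sample | sum_by_label node_cpu_seconds_total 'mode="idle"'   # prints 1900
```

Each distinct label combination is its own time series, which is why the same metric name appears four times in the sample.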

Components

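The Prometheus ecosystem consists of multiple components, many of which are optional:

  • the main Prometheus server, which scrapes and stores time series data
  • client libraries for instrumenting application code
  • a push gateway for supporting short-lived jobs
  • special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
  • an Alertmanager to handle alerts
  • various support tools

Most Prometheus components are written in Go, making them easy to build and deploy as static binaries.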

Architecture

This diagram illustrates the architecture of Prometheus and some of its ecosystem components:

Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. Grafana or other API consumers can be used to visualize the collected data.
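Much of what those rules and queries compute from counters is a rate. A counter such as node_cpu_seconds_total only ever grows, so the useful signal is how fast it grows between samples. PromQL's rate() function does this over a time window and also extrapolates to the window edges, which the sketch below does not attempt. It shows only the core arithmetic, assuming two samples of one counter taken at known Unix timestamps; rate_per_second and cpu_busy_percent are hypothetical helpers, not part of Prometheus.

```shell
#!/usr/bin/env bash
# rate_per_second V0 T0 V1 T1   (hypothetical helper)
# Per-second increase of a counter between two samples (value, unix time).
# If the counter went down, assume it was reset to zero and counted up again.
# This is a simplified version of how rate() treats resets, not its exact algorithm.
rate_per_second() {
  awk -v v0="$1" -v t0="$2" -v v1="$3" -v t1="$4" 'BEGIN {
    dv = v1 - v0
    if (dv < 0) dv = v1
    printf "%.3f\n", dv / (t1 - t0)
  }'
}

# cpu_busy_percent IDLE0 IDLE1 SECONDS CORES   (hypothetical helper)
# Share of CPU time not spent idle, from two readings of the summed
# node_cpu_seconds_total{mode="idle"} counter taken SECONDS apart
# on a machine with CORES cores.
cpu_busy_percent() {
  awk -v i0="$1" -v i1="$2" -v dt="$3" -v n="$4" 'BEGIN {
    idle_rate = (i1 - i0) / dt / n   # idle fraction of each core-second
    printf "%.1f\n", 100 * (1 - idle_rate)
  }'
}

rate_per_second 1000 0 1030 15    # prints 2.000
cpu_busy_percent 1900 1924 15 2   # prints 20.0
```

With node_exporter metrics, the same idea is commonly written in PromQL as 100 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100.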

2. Let’s talk about Grafana

The open-source edition of Grafana is free to use!

Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources. A licensed Grafana Enterprise version with additional capabilities is also available as a self-hosted installation or an account on the Grafana Labs cloud service. It is expandable through a plug-in system. End users can create complex monitoring dashboards using interactive query builders. Grafana is divided into a front end and back end, written in TypeScript and Go, respectively.

As a visualization tool, Grafana is a popular component in monitoring stacks, often used in combination with time series databases such as Prometheus, InfluxDB, or Graphite, and with monitoring platforms such as Zabbix.
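One way to connect Grafana to the Prometheus server set up later in this article is a data source provisioning file, which Grafana reads at startup. The sketch below assumes Grafana was installed from the official packages, which read provisioning files from /etc/grafana/provisioning/datasources. write_grafana_datasource is a hypothetical helper, and the URL uses port 9091, the Prometheus listen port chosen in this article.

```shell
#!/usr/bin/env bash
# write_grafana_datasource DIR PROMETHEUS_URL   (hypothetical helper)
# Writes a Grafana data source provisioning file that points at Prometheus.
# Assumption: on a package install DIR is /etc/grafana/provisioning/datasources
# (writing there needs root), and Grafana must be restarted to pick it up.
write_grafana_datasource() {
  local dir="$1" url="$2"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}
```

For example, from a root shell: write_grafana_datasource /etc/grafana/provisioning/datasources http://localhost:9091, followed by systemctl restart grafana-server. The same data source can also be added by hand in the Grafana web UI.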

3. Let’s try to set up Agoric monitoring

When running a validator, you need to keep track of factors such as:
✅Is the validator jailed?
✅Is the latest version of the app installed?
✅Is the load evenly distributed across the CPU cores?
✅Is there enough free space on the disk?
✅Are there any problems reading and writing blocks?
✅Is RAM usage too high?
✅Is your internet connection stable?
✅and many other factors.
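Prometheus with node_exporter tracks items such as RAM and disk usage continuously. For a quick manual spot check of those two items without any monitoring stack, here is a sketch using only /proc/meminfo and df. mem_used_percent and disk_used_percent are hypothetical helper names, and the memory formula assumes a Linux kernel that reports MemAvailable.

```shell
#!/usr/bin/env bash
# mem_used_percent [FILE]   (hypothetical helper)
# RAM in use as a percentage: (MemTotal - MemAvailable) / MemTotal,
# read from /proc/meminfo or from a file in the same format.
mem_used_percent() {
  awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
       END { printf "%.1f\n", 100 * (t - a) / t }' "${1:-/proc/meminfo}"
}

# disk_used_percent [PATH]   (hypothetical helper)
# Used space on the filesystem holding PATH, taken from the "Capacity"
# column of POSIX-format df output, without the % sign.
disk_used_percent() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

echo "RAM used:  $(mem_used_percent)%"
echo "Disk used: $(disk_used_percent /)%"
```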

To monitor such metrics, you can collect them with Prometheus and view them in Grafana, the analytics and interactive visualization platform.

The instructions below are from the Agoric team.

Verify available versions

apt-cache policy prometheus

Create Prometheus user

sudo useradd -M -r -s /bin/false prometheus

Create Prometheus directories

sudo mkdir /etc/prometheus /var/lib/prometheus

Download binaries

wget https://github.com/prometheus/prometheus/releases/download/v2.18.1/prometheus-2.18.1.linux-amd64.tar.gz
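Before installing, it is worth checking the archive's integrity. Prometheus releases publish a sha256sums.txt file alongside the archives on the same release page. Below is a minimal sketch of a checker, assuming GNU coreutils sha256sum is available; verify_sha256 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE SUMS_FILE   (hypothetical helper)
# Checks FILE against its entry in a "<sha256>  <filename>" list such as
# the sha256sums.txt published next to Prometheus release archives.
# Returns non-zero if the entry is missing or the hash does not match.
# Assumes GNU coreutils sha256sum.
verify_sha256() {
  local file="$1" sums="$2" expected actual
  expected=$(awk -v f="$(basename "$file")" '$2 == f { print $1 }' "$sums")
  [ -n "$expected" ] || { echo "no checksum listed for $file" >&2; return 1; }
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  [ "$expected" = "$actual" ]
}
```

Usage, after downloading sha256sums.txt from the same v2.18.1 release: verify_sha256 prometheus-2.18.1.linux-amd64.tar.gz sha256sums.txt && echo OK. With a recent GNU coreutils, sha256sum -c --ignore-missing sha256sums.txt performs the same check directly.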

Extract and Install

tar xzf prometheus-2.18.1.linux-amd64.tar.gz  
sudo cp prometheus-2.18.1.linux-amd64/{prometheus,promtool} /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/{prometheus,promtool}
sudo cp -r prometheus-2.18.1.linux-amd64/{consoles,console_libraries} /etc/prometheus/

Set up Prometheus configuration

sudo cp prometheus-2.18.1.linux-amd64/prometheus.yml /etc/prometheus/prometheus.yml  
sudo nano /etc/prometheus/prometheus.yml

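Edit the scrape_configs section so that it contains the following three jobs. The default file already has the 'prometheus' job, whose target must change from localhost:9090 to localhost:9091, the listen port used in this setup. The 'Agoric-Validator' job scrapes the Agoric node's metrics endpoint on port 26660, and the 'node' job scrapes node_exporter on port 9100. YAML is indentation-sensitive, so keep the nesting exactly as shown:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9091']

  - job_name: 'Agoric-Validator'
    static_configs:
      - targets: ['localhost:26660']
        labels:
          group: 'Validator'

  - job_name: node
    static_configs:
      - targets: ['localhost:9100']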
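The Agoric-Validator job only returns data if the node itself exposes metrics on port 26660. Agoric's chain runs on Tendermint, whose config.toml has an [instrumentation] section where prometheus = false by default and prometheus_listen_addr = ":26660". Below is a sketch that switches the flag on, assuming the node's home directory is ~/.ag-chain-cosmos (adjust to your setup). enable_tendermint_prometheus is a hypothetical helper, and the node must be restarted afterwards for the change to take effect. The node job likewise assumes node_exporter is already installed and listening on port 9100.

```shell
#!/usr/bin/env bash
# enable_tendermint_prometheus CONFIG_TOML   (hypothetical helper)
# Sets "prometheus = true" inside the [instrumentation] section of a
# Tendermint config.toml and leaves every other line untouched.
# The original file is kept as CONFIG_TOML.bak.
enable_tendermint_prometheus() {
  local cfg="$1"
  cp "$cfg" "$cfg.bak"
  awk '
    /^\[/ { in_instr = ($0 == "[instrumentation]") }
    in_instr && /^prometheus[[:space:]]*=/ { $0 = "prometheus = true" }
    { print }
  ' "$cfg.bak" > "$cfg"
}

# Assumed location of the node config; adjust to your node's home directory:
# enable_tendermint_prometheus "$HOME/.ag-chain-cosmos/config/config.toml"
```

The section check keeps the edit scoped to [instrumentation], and the pattern requires whitespace or "=" right after "prometheus", so prometheus_listen_addr is not rewritten.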

Set the user and group ownership on the configuration and data directories

sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

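Create a systemd unit file for Prometheus. In this scenario Prometheus listens on port 9091 instead of its default 9090, presumably because the Cosmos SDK gRPC server on a validator uses 9090 by default.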

sudo nano /etc/systemd/system/prometheus.service

Now copy the section below:

[Unit]  
Description=Prometheus Time Series Collection and Processing Server
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9091

[Install]
WantedBy=multi-user.target

Reload systemd so it picks up the new unit file, then enable Prometheus at startup

sudo systemctl daemon-reload
sudo systemctl enable prometheus

Start Prometheus service

sudo systemctl start prometheus

Check the status of the Prometheus service

sudo systemctl status prometheus
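Once the service is active, Prometheus's HTTP API can confirm that all three jobs are actually being scraped: GET /api/v1/targets on the listen port returns JSON in which every active target carries a "health" field of "up", "down", or "unknown". Here is a small sketch that counts them without jq, assuming the compact JSON Prometheus returns (no spaces around the colons); count_target_health is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# count_target_health   (hypothetical helper)
# Reads the JSON from Prometheus's /api/v1/targets endpoint on stdin and
# prints how many targets report "health":"up" out of all targets that
# report a health value. Assumes compact JSON without spaces around colons.
count_target_health() {
  awk '{
         up    += gsub(/"health":"up"/, "&")
         total += gsub(/"health":"[a-z]+"/, "&")
       }
       END { printf "up=%d total=%d\n", up, total }'
}

# Example against the server configured above:
# curl -s http://localhost:9091/api/v1/targets | count_target_health
```

gsub() with "&" as the replacement leaves the text unchanged and only returns the number of matches. For a plain liveness check, curl -s http://localhost:9091/-/healthy should return a short success message, and the web UI at port 9091 lists the same targets under Status > Targets.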
