Introduction

The Operations Central Monitoring setup collects monitoring information from across the instrument and provides monitoring dashboards and an alarm-management system on top of it. It provides you with the following user services:

The following backing services support the setup:

  • A Prometheus database that collects monitoring information from across the instrument, exposed on http://localhost:9091,

  • A Node Exporter that collects monitoring information about the host running this software stack, exposed on http://localhost:9100.

Hint

The URLs assume you’re running this software on localhost. If you access the software on a server, replace localhost with the hostname of the hosting system.
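
As a minimal sketch of that substitution, the Python snippet below builds the two backing-service URLs for a given hostname and optionally probes them. The ports are the ones listed above. The helper names (service_urls, is_reachable, report) are assumptions made for illustration and are not part of this software package. The probe relies on Prometheus' /-/healthy endpoint and on the Node Exporter's /metrics page.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Ports as listed in this section.
PORTS = {"prometheus": 9091, "node_exporter": 9100}

# Path to probe per service: Prometheus' health endpoint and the
# Node Exporter's metrics page.
PROBE_PATHS = {"prometheus": "/-/healthy", "node_exporter": "/metrics"}


def service_urls(host: str = "localhost") -> dict[str, str]:
    """Return the base URL of each backing service on the given host."""
    return {name: f"http://{host}:{port}" for name, port in PORTS.items()}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET on url answers with status 200."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Covers refused connections, timeouts and HTTP error statuses.
        return False


def report(host: str = "localhost") -> None:
    """Print one line per backing service with its URL and probe result."""
    for name, base in service_urls(host).items():
        state = "up" if is_reachable(base + PROBE_PATHS[name]) else "unreachable"
        print(f"{name}: {base} ({state})")
```

Calling report() checks the local setup. Passing the hostname of the hosting system as the argument checks a remote one.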

The services are connected as follows. The green components are part of this software package; the gray components are external:

digraph monitoring_setup {
    layout=dot;
    nodesep=1.2;

    fontname="Helvetica,Arial,sans-serif"
    node [fontname="Helvetica,Arial,sans-serif" fontsize="20pt" style=filled fixedsize=true]
    edge [fontname="Helvetica,Arial,sans-serif" fontsize="20pt"]
    rankdir=TB;

    node [shape=ellipse height=1 width=2 color=gray];
    slack;

    node [shape=rectangle width=1 color=gray];
    user;

    subgraph cluster_operational_central_management {
        color=black;
        label="Operational Central Management";

        node [shape=ellipse height=1 width=2 color=aquamarine];
        prometheus; grafana; alerta; node_exporter;

        prometheus -> grafana [label="query results"];
        grafana -> alerta [label="alerts"];
        node_exporter -> prometheus [label="metrics"];
        grafana -> prometheus [label="metrics"];
        prometheus -> prometheus [label="metrics"];
    }

    subgraph cluster_station {
        label="LOFAR2.0 Station";
        node [shape=ellipse height=1 width=2 color=gray];
        station_prometheus [label="prometheus"];
        station_grafana [label="grafana"];
        station_node_exporter [label="node_exporter"];
        hardware;
        tango_devices;
        exporter;
        jupyter;

        station_node_exporter -> station_prometheus [label="metrics"];
        station_prometheus -> station_grafana [label="query results"];
        station_grafana -> station_prometheus [label="metrics"];
        hardware -> tango_devices [label="M&C"];
        tango_devices -> exporter [label="metrics"];
        exporter -> station_prometheus [label="metrics"];
        station_prometheus -> jupyter [label="metrics"];
        tango_devices -> jupyter [label="M&C"];
    }

    station_prometheus -> prometheus [label="metrics" minlen=1];
    station_grafana -> user [label="dashboards"];
    jupyter -> user [label="M&C"];
    alerta -> slack [label="notifications"];

    grafana -> user [label="dashboards"];
    alerta -> user [label="notifications"];
    slack -> user [label="notifications"];
}
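
The "metrics" edges that end at the central prometheus node can be inspected through Prometheus' HTTP API. The sketch below assumes the central Prometheus is reachable on the port listed earlier and runs the instant query up, which Prometheus sets to 1 for every target whose last scrape succeeded and to 0 otherwise. The function names are hypothetical, and the job and instance labels in SAMPLE are invented for illustration; the real labels depend on the scrape configuration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query on Prometheus' HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def scrape_status(payload: dict) -> dict[str, bool]:
    """Map each target (job/instance) in an `up` query result to its scrape state."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    status = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        target = f"{labels.get('job', '?')}/{labels.get('instance', '?')}"
        # Each sample value is a [timestamp, "string value"] pair.
        status[target] = sample["value"][1] == "1"
    return status


def fetch_scrape_status(base_url: str = "http://localhost:9091") -> dict[str, bool]:
    """Run the `up` query against a live Prometheus (needs network access)."""
    with urlopen(query_url(base_url, "up"), timeout=5) as response:
        return scrape_status(json.load(response))


# Hypothetical response, shaped like an instant-vector result of the API.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "node", "instance": "localhost:9100"},
             "value": [1700000000.0, "1"]},
            {"metric": {"job": "station", "instance": "station.example:9090"},
             "value": [1700000000.0, "0"]},
        ],
    },
}
```

With the sample payload, scrape_status(SAMPLE) reports the first target as scraped successfully and the second as failing. fetch_scrape_status() does the same against a running setup.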