Monitoring

Monitoring FiFo involves several checks, from general host metrics to the state of the individual services.

General

The most basic things to monitor are the same as for any application: CPU, memory, IO, network saturation, and disk usage.

Services

FiFo-Zone

In the FiFo zone the following SMF services can be checked in every installation:

  • sniffle - main management service
  • snarl - AAA service
  • howl - API endpoint
  • epmd - Supporting Erlang service

In addition, if other services are installed, the following services might exist:

  • fifo_dns - DNS endpoint
  • kennel - Docker endpoint
  • dalmatiner/db - DalmatinerDB
  • dalmatiner/fe - Dalmatiner Frontend
  • tachyon - Tachyon aggregator
  • nsqadmin - NSQ Admin node
  • nsqd - NSQ daemon

Global Zone

In the global zone the following SMF services can be checked in every installation:

  • fifo/zlogin - FiFo zlogin service
  • chunter - Zone manager

In addition, if other services are installed, the following services might exist:

  • tachyon-meter - Tachyon
  • nsqadmin - NSQ Admin node
  • nsqd - NSQ daemon
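
The service lists above can be turned into an automated check. The following is a minimal sketch, assuming Python is available in the zone and that the short service names appear as substrings of the SMF FMRIs (the exact FMRIs depend on the installation and are not given here). The function names parse_svcs, not_online, and check_zone are made up for this example; svcs with -H, -a, and -o state,fmri is the standard SMF listing command.

```python
import subprocess


def parse_svcs(output):
    """Map each FMRI to its state, given `svcs -H -a -o state,fmri` output."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            state, fmri = parts
            states[fmri] = state
    return states


def not_online(states, wanted):
    """Return the wanted service names that have no instance in state 'online'.

    Matching is a plain substring test on the FMRI, which is an assumption
    made for this sketch; adjust it to the FMRIs of the installation.
    """
    missing = []
    for name in wanted:
        online = any(
            name in fmri and state == "online" for fmri, state in states.items()
        )
        if not online:
            missing.append(name)
    return missing


def check_zone(wanted):
    """Run svcs in the current zone and report services that are not online."""
    result = subprocess.run(
        ["svcs", "-H", "-a", "-o", "state,fmri"],
        capture_output=True,
        text=True,
        check=True,
    )
    return not_online(parse_svcs(result.stdout), wanted)
```

In the FiFo zone this could be called as check_zone(["sniffle", "snarl", "howl", "epmd"]), and in the global zone as check_zone(["fifo/zlogin", "chunter"]). An empty list means every listed service has an online instance.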

Services

RiakCore based services (sniffle, snarl, howl)

All of these services share common commands for querying their status. The commands are all named <service>-admin: sniffle-admin, snarl-admin, and howl-admin. In the following sections, snarl-admin stands in as the default example.

Status

This is a very generic command that gives a short summary of whether the cluster is in a good state. Run it as snarl-admin status.

[root@75f04bc5-a457-4351-ce32-e3a5f2ace850 ~]# snarl-admin status
The cluster is fine!
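
For automated monitoring, the output can be compared against the healthy message shown above. This is a minimal sketch; it assumes that any output not containing "The cluster is fine!" should be treated as a problem, since the unhealthy output is not shown here. The names cluster_is_fine and check_status are made up for this example.

```python
import subprocess

# The healthy message as printed in the transcript above.
HEALTHY_MARKER = "The cluster is fine!"


def cluster_is_fine(output):
    """Return True only if the status output contains the healthy message."""
    return HEALTHY_MARKER in output


def check_status(admin_cmd="snarl-admin"):
    """Run '<admin_cmd> status' and interpret its standard output."""
    result = subprocess.run([admin_cmd, "status"], capture_output=True, text=True)
    return cluster_is_fine(result.stdout)
```

The same check should work for sniffle-admin and howl-admin by passing a different admin_cmd, assuming they print the same message.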

Handoffs

A handoff means that data is transferred from one node to another. This happens after communication between nodes was lost, a node went down or was restarted, or nodes were added or removed.

Handoffs are generally part of normal operations and not a reason for concern. However, frequent handoffs can indicate underlying problems such as crashes, network partitions, or very high load.

[root@75f04bc5-a457-4351-ce32-e3a5f2ace850 ~]# snarl-admin handoff summary
Each cell indicates active transfers and, in parenthesis, the number of all known transfers.
The 'Total' column is the sum of the active transfers.
+----------------------+-----+---------+------+------+------+
|         Node         |Total|Ownership|Resize|Hinted|Repair|
+----------------------+-----+---------+------+------+------+
|  snarl@192.168.1.43  |  0  |         |      |      |      |
|  snarl@192.168.1.44  |  0  |         |      |      |      |
|  snarl@192.168.1.45  |  0  |         |      |      |      |
|  snarl@192.168.1.46  |  0  |         |      |      |      |
+----------------------+-----+---------+------+------+------+

Handoff status can be checked by running snarl-admin handoff status.

Cluster membership

It is possible to check the cluster membership of all nodes with the snarl-admin member-status command. It gives an idea of the data distribution between nodes and indicates whether all nodes agree on the cluster layout. Leaving and joining nodes can also be seen here.

[root@75f04bc5-a457-4351-ce32-e3a5f2ace850 ~]# snarl-admin member-status
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      25.0%      --      'snarl@192.168.1.43'
valid      25.0%      --      'snarl@192.168.1.44'
valid      25.0%      --      'snarl@192.168.1.45'
valid      25.0%      --      'snarl@192.168.1.46'
-------------------------------------------------------------------------------
Valid:4 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Ring Status

It is possible to gather general information about the distribution mechanism with the snarl-admin ring-status command.

[root@75f04bc5-a457-4351-ce32-e3a5f2ace850 ~]# snarl-admin ring-status
================================== Claimant ===================================
Claimant:  'snarl@192.168.1.43'
Status:     up
Ring Ready: true

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
All nodes are up and reachable
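
The ring-status output can be reduced to a few flags for alerting. This sketch looks only for the exact lines shown above and is deliberately pessimistic: anything it does not recognise is reported as not ready, pending, or unreachable. The name parse_ring_status and the keys of the returned dictionary are made up for this example.

```python
def parse_ring_status(output):
    """Summarise ring-status output as a small dictionary of flags."""
    info = {
        "claimant": None,
        "ring_ready": None,
        "pending_changes": True,   # cleared only by 'No pending changes.'
        "all_reachable": False,    # set only by 'All nodes are up and reachable'
    }
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Claimant:"):
            info["claimant"] = line.split(":", 1)[1].strip().strip("'")
        elif line.startswith("Ring Ready:"):
            info["ring_ready"] = line.split(":", 1)[1].strip() == "true"
        elif line == "No pending changes.":
            info["pending_changes"] = False
        elif line == "All nodes are up and reachable":
            info["all_reachable"] = True
    return info
```

For the output above this gives the claimant snarl@192.168.1.43, a ready ring, no pending changes, and all nodes reachable.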

Ports

All ports are configurable; the following table lists only the defaults. Additional ports are allocated dynamically via the epmd service.

Service   Ports
Snarl     8099, 4200, 4201
Sniffle   8199, 4210
Howl      80, 443, 4240, 8499
Chunter   4200
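
A simple reachability check for the default ports can be sketched as follows. It assumes the listed ports are TCP and that the defaults have not been changed in the configuration. The names DEFAULT_PORTS, port_open, and closed_ports are made up for this example.

```python
import socket

# Defaults from the table above; adjust if the configuration was changed.
DEFAULT_PORTS = {
    "snarl": [8099, 4200, 4201],
    "sniffle": [8199, 4210],
    "howl": [80, 443, 4240, 8499],
    "chunter": [4200],
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def closed_ports(host, ports, timeout=2.0):
    """Return the ports from the list that do not accept connections."""
    return [port for port in ports if not port_open(host, port, timeout)]
```

For example, closed_ports("192.168.1.43", DEFAULT_PORTS["snarl"]) returns the Snarl ports that cannot be reached on that node. A successful connection only shows that something is listening; it does not confirm that the service behind the port is healthy.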

LeoFS

LeoFS can be checked by running the status command on one of the LeoFS managers.