Monitoring
FiFo can be monitored at several levels, from general host metrics down to the state of individual services.
General
The most basic things to monitor are the same as for any application: CPU, memory, I/O, network saturation, and disk usage.
Services
FiFo-Zone
In the FiFo zone the following SMF services can be checked in every installation:
- sniffle: main management service
- snarl: AAA service
- howl: API endpoint
- epmd: supporting Erlang service
In addition, if other components are installed, the following services might exist:
- fifo_dns: DNS endpoint
- kennel: Docker endpoint
- dalmatiner/db: DalmatinerDB
- dalmatiner/fe: Dalmatiner Frontend
- tachyon: Tachyon aggregator
- nsqadmin: NSQ Admin node
- nsqd: NSQ daemon
Global Zone
In the global zone the following SMF services can be checked in every installation:
- fifo/zlogin: FiFo zlogin service
- chunter: Zone manager
In addition, if other components are installed, the following services might exist:
- tachyon-meter: Tachyon
- nsqadmin: NSQ Admin node
- nsqd: NSQ daemon
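The service lists above can be checked in one pass. The following is a minimal sketch, assuming the services are queried with the standard SMF `svcs` command and that the short names from the lists match the installed service FMRIs; the helper name `not_online` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads "STATE FMRI" pairs on stdin, as printed by
# `svcs -H -o state,fmri`, and reports every service that is not online.
# Exits non-zero if at least one such service was found.
not_online() {
    awk '$1 != "online" { print $2 " is " $1; bad = 1 }
         END { exit bad + 0 }'
}

# Possible usage (service names taken from the lists above):
#   FiFo zone:   svcs -H -o state,fmri sniffle snarl howl epmd | not_online
#   Global zone: svcs -H -o state,fmri fifo/zlogin chunter | not_online
```

Optional services that are installed but intentionally disabled would also be reported, so the list of names passed to `svcs` should match what is expected to run.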
Services
RiakCore based services (sniffle, snarl, howl)
All of these services share some common commands for reporting their status. Each command is named after its service with an -admin suffix: sniffle-admin, snarl-admin, and howl-admin. In the following sections snarl-admin stands in as the default example.
Status
This is a very generic command that gives a short summary of whether the cluster is in a good state. It is invoked by running snarl-admin status.
[root@75f04bc5-a457-4351-ce32-e3a5f2ace850 ~]# snarl-admin status
The cluster is fine!
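For automated checks, the status output can be matched against the healthy message shown above. This is a rough sketch: it assumes that the text "The cluster is fine!" is the only healthy response and does not rely on the exit code of the admin command, which is not documented here. The function names are hypothetical.

```shell
#!/bin/sh
# Succeeds only if the text on stdin contains the healthy status message.
cluster_fine() {
    grep -q 'The cluster is fine!'
}

# Runs the status command of each RiakCore based service and prints a
# one-line verdict per service. Not called automatically.
check_all_clusters() {
    for svc in sniffle snarl howl; do
        if "${svc}-admin" status 2>&1 | cluster_fine; then
            echo "${svc}: ok"
        else
            echo "${svc}: needs attention"
        fi
    done
}

# Possible usage inside the FiFo zone:
#   check_all_clusters
```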
Handoffs
A handoff means that data is transferred from one node to another. This happens after communication between nodes was lost, after a node went down or was restarted, or after nodes were added or removed.
Handoffs are generally part of normal operations and not a reason for concern. However, frequent handoffs can indicate underlying problems such as crashes, network partitions, or very high load.
[root@75f04bc5-a457-4351-ce32-e3a5f2ace850 ~]# snarl-admin handoff summary
Each cell indicates active transfers and, in parenthesis, the number of all known transfers.
The 'Total' column is the sum of the active transfers.
+----------------------+-----+---------+------+------+------+
| Node |Total|Ownership|Resize|Hinted|Repair|
+----------------------+-----+---------+------+------+------+
| snarl@192.168.1.43 | 0 | | | | |
| snarl@192.168.1.44 | 0 | | | | |
| snarl@192.168.1.45 | 0 | | | | |
| snarl@192.168.1.46 | 0 | | | | |
+----------------------+-----+---------+------+------+------+
Handoff status can be checked by running snarl-admin handoff status
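To alert on handoffs that keep recurring, the 'Total' column of the summary can be added up across nodes. The sketch below assumes the table layout shown above (pipe-separated cells, node names containing an `@`); the function name is made up for illustration.

```shell
#!/bin/sh
# Sums the 'Total' column of `snarl-admin handoff summary` output read
# from stdin. Only rows whose first cell looks like a node name
# (contains "@") are counted; headers and borders are skipped.
active_handoffs() {
    awk -F'|' '$2 ~ /@/ { sum += $3 } END { print sum + 0 }'
}

# Possible usage:
#   snarl-admin handoff summary | active_handoffs
```

A value that stays above zero across several consecutive checks is more interesting than a single non-zero reading, since short-lived handoffs are expected.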
Cluster membership
The cluster membership of all nodes can be checked with the snarl-admin member-status command. It gives an idea of how data is distributed between nodes and indicates whether all nodes agree on the cluster layout. Leaving and joining nodes are also visible here.
[root@75f04bc5-a457-4351-ce32-e3a5f2ace850 ~]# snarl-admin member-status
================================= Membership ==================================
Status Ring Pending Node
-------------------------------------------------------------------------------
valid 25.0% -- 'snarl@192.168.1.43'
valid 25.0% -- 'snarl@192.168.1.44'
valid 25.0% -- 'snarl@192.168.1.45'
valid 25.0% -- 'snarl@192.168.1.46'
-------------------------------------------------------------------------------
Valid:4 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
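The summary line at the end of the membership output is convenient for scripted checks. The following sketch assumes the summary keeps the `Valid:N / Leaving:N / ...` format shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Reads `snarl-admin member-status` output on stdin and succeeds only if
# a summary line was found and it reports no leaving, exiting, joining,
# or down nodes.
members_all_valid() {
    awk '$1 ~ /^Valid:/ {
             seen = 1
             for (i = 1; i <= NF; i++) {
                 if ($i ~ /^(Leaving|Exiting|Joining|Down):/) {
                     split($i, kv, ":")
                     if (kv[2] + 0 > 0) bad = 1
                 }
             }
         }
         END { if (seen && !bad) exit 0; exit 1 }'
}

# Possible usage:
#   snarl-admin member-status | members_all_valid || echo "membership changed"
```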
Ring Status
General information about the distribution mechanism can be gathered with the snarl-admin ring-status command.
[root@75f04bc5-a457-4351-ce32-e3a5f2ace850 ~]# snarl-admin ring-status
================================== Claimant ===================================
Claimant: 'snarl@192.168.1.43'
Status: up
Ring Ready: true
============================== Ownership Handoff ==============================
No pending changes.
============================== Unreachable Nodes ==============================
All nodes are up and reachable
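The ring status output can be checked in the same way as the other commands. This sketch treats the ring as healthy only when both "Ring Ready: true" and "All nodes are up and reachable" appear, which is an assumption based on the sample output above; the function name is made up.

```shell
#!/bin/sh
# Reads `snarl-admin ring-status` output on stdin and succeeds only if
# the ring is ready and no node is reported as unreachable.
ring_healthy() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q 'Ring Ready: *true' &&
        printf '%s\n' "$out" | grep -q 'All nodes are up and reachable'
}

# Possible usage:
#   snarl-admin ring-status | ring_healthy || echo "ring needs attention"
```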
Ports
All ports are configurable; the following table lists only the defaults. Additional ports are allocated dynamically via the epmd service.
Service | Ports |
---|---|
Snarl | 8099, 4200, 4201 |
Sniffle | 8199, 4210 |
Howl | 80, 443, 4240, 8499 |
Chunter | 4200 |
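The default ports from the table can be probed from a monitoring host. This is only a sketch: it assumes the ports are TCP, that the defaults have not been changed, and that an `nc` supporting `-z` and `-w` is available. Both function names are hypothetical.

```shell
#!/bin/sh
# Prints the default ports of a service, as listed in the table above.
# Returns non-zero for unknown service names.
default_ports() {
    case "$1" in
        snarl)   echo "8099 4200 4201" ;;
        sniffle) echo "8199 4210" ;;
        howl)    echo "80 443 4240 8499" ;;
        chunter) echo "4200" ;;
        *)       return 1 ;;
    esac
}

# Tries a TCP connection to every default port of a service on a host
# and prints one line per port. Not called automatically.
probe_ports() {
    host=$1
    service=$2
    for port in $(default_ports "$service"); do
        if nc -z -w 2 "$host" "$port" >/dev/null 2>&1; then
            echo "$service $host:$port open"
        else
            echo "$service $host:$port closed"
        fi
    done
}

# Possible usage (the address is only an example):
#   probe_ports 192.168.1.43 snarl
```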
LeoFS
LeoFS can be checked by running the status command on one of the LeoFS managers.