Tachyon
Tachyon is a monitoring system for SmartOS (and, in theory, other Illumos/Solaris derivatives) built to monitor large server farms and cope with the amount of data generated there. It integrates with FiFo, but it is perfectly capable of running on its own.
Features
- Minimal impact by using kstat directly.
- Resilient to network or component failure.
- Low and predictable space usage: about 12 bits per metric per second.
- Fast query times (single- or double-digit milliseconds).
- High throughput and near linear scaling.
- No modification of non-global zones (NGZs) required; new zones are picked up automatically.
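To make the space figure above concrete, here is a rough back-of-the-envelope sketch. It assumes one sample per metric per second and takes the 12 bit figure at face value; the function name is only illustrative, and real usage will depend on the data.

```python
BITS_PER_METRIC_SECOND = 12  # approximate figure quoted in the feature list


def bytes_per_day(metrics: int, bits: int = BITS_PER_METRIC_SECOND) -> float:
    """Estimated bytes stored per day for `metrics` series sampled once per second."""
    seconds_per_day = 24 * 60 * 60
    return metrics * bits * seconds_per_day / 8


if __name__ == "__main__":
    for count in (100, 1_000, 10_000):
        print(f"{count} metrics: {bytes_per_day(count) / 1e6:.1f} MB/day")
```

Under these assumptions, 1,000 metrics come to roughly 129.6 MB per day.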
Architecture
Tachyon has a multi-layered architecture that balances data correctness and availability. The general premise is that overall availability is of greater importance than absolute correctness. Aside from Tachyon itself, three additional components form the stack:
- The nsq message queue, which buffers messages in the case of outages. Buffered messages are not visible in the database while they wait, but the missing data can be filled in later.
- The DalmatinerDB database, to handle the throughput and storage requirements.
- The Grafana dashboard, to visualize the metrics and create dashboards. (plugin required!)
At the lowest level the tachyon-meter, a slightly modified kstat, gathers statistics directly from the kernel and sends them to an nsqd process. kstat is a wonderful tool for this: it can easily collect hundreds if not thousands of metrics from a system with minimal impact on that system. Timestamps are generated locally (at send time), so delayed delivery does not affect precision and metrics can easily be correlated with other metrics generated on the same system.
The Tachyon Aggregator is the second station metrics pass through. It decodes the kstat package and translates it into an insert statement that DalmatinerDB can understand. These services are stateless; if the connection to DalmatinerDB is interrupted, they re-queue packages to NSQ for later processing.
The data is persisted in DalmatinerDB and can be queried from there by FiFo, the DalmatinerDB-Frontend, or Grafana.
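The re-queue behaviour of the aggregator can be illustrated with a small in-memory sketch. This is not Tachyon's actual code: the queue is a plain deque standing in for NSQ, and `DatabaseUnavailable`, `drain`, and `write` are hypothetical names used only for illustration.

```python
from collections import deque


class DatabaseUnavailable(Exception):
    """Raised by the (hypothetical) writer when the database cannot be reached."""


def drain(queue, write):
    """Try each currently queued message once; re-queue the ones that fail.

    Returns the number of messages written successfully.
    """
    written = 0
    for _ in range(len(queue)):
        message = queue.popleft()
        try:
            write(message)
            written += 1
        except DatabaseUnavailable:
            queue.append(message)  # keep it for a later attempt
    return written
```

Because the consumer holds no state of its own, a failed write only delays a message; it stays in the queue until a later pass succeeds.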
NSQ
Installation
Installing NSQ in a zone is rather simple: Project-FiFo provides a SmartOS package that can be installed with the following command. The package includes all nsq components, the nsqd queue as well as the nsqadmin admin interface and nsqlookupd, and all of them can be configured via SMF configuration parameters.
pkg_add http://release.project-fifo.net/pkg/rel/nsq
Another option is to install nsqd in the GZ; however, that comes with its own issues and is not recommended at this point.
Configuration
NSQ is configured via the SMF configuration interface. Changing the configuration works like this:
svccfg -s svc:/network/nsqd:default
svc:/network/nsqd:default> addpg application application
svc:/network/nsqd:default> setprop application/lookupd-tcp-address="127.0.0.1:4160"
svc:/network/nsqd:default> refresh
The same applies to nsqadmin and nsqlookupd instances. The available configuration parameters can be listed via: svccfg export nsqd | grep propval
Tachyon meter
Installation
The tachyon meter can be downloaded from http://release.project-fifo.net/gz/dev/ and installed the same way as chunter. There is a config file in which the IP of the nsqd process has to be specified.
cd /opt
curl -O http://release.project-fifo.net/gz/dev/tachyon-meter-latest.gz
gunzip tachyon-meter-latest.gz
sh tachyon-meter-latest
Configuration
The file /opt/tachyon-meter/etc/tachyon.conf needs to be edited:
# The NSQd host to send data to
host=192.168.1.41 # Needs to be changed to the IP of the zone hosting the NSQd daemon
# The port NSQd listens on for HTTP messages
port=4151 # Does not need to be changed
# The NSQ topic to send to
topic=tachyon # Does not need to be changed
# The interval, in seconds, at which data is sent to NSQ
interval=1 # Does not need to be changed
# The hostname to identify the server with
## Will try to pick up chunter's host_id file if it exists, otherwise
## simply use the hostname.
if [ -f /opt/chunter/etc/host_id ]
then
    hostname="$(cat /opt/chunter/etc/host_id)"
else
    hostname="$(hostname)"
fi
is_smf=yes # Does not need to be changed, required for backgrounding in the SMF
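To see how the host, port and topic settings fit together, the sketch below builds a publish request against nsqd's HTTP API (POST /pub?topic=...). It only constructs the request and does not send it. The helper name and the payload are placeholders; the actual wire format tachyon-meter sends is not described here, so treat this purely as a way to check that the configured endpoint is the one you expect.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_request(host: str, port: int, topic: str, body: bytes) -> Request:
    """Build (but do not send) an HTTP publish request for an nsqd instance."""
    query = urlencode({"topic": topic})
    return Request(f"http://{host}:{port}/pub?{query}", data=body, method="POST")


if __name__ == "__main__":
    # Values taken from the example config above; the body is a placeholder.
    req = build_publish_request("192.168.1.41", 4151, "tachyon", b"example payload")
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually publish the placeholder message, so only do that against a test topic.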
Tachyon Aggregator
Installation
The aggregator can be installed from the Project FiFo package repository, either via pkg_add or via pkgin install if the repository was added to the pkgin repository list. The package name is tachyon.
Configuration
The file /opt/local/tachyon/etc/tachyon.conf needs to be edited. Most options are explained in the file; the two most important ones are the following:
## The DalmatinerDB backend (if used).
##
## Default: 127.0.0.1:5555
##
## Acceptable values:
## - an IP/port pair, e.g. 127.0.0.1:10011
ddb = 192.168.1.42:5555 # Needs to be changed to point to one dalmatinerdb host
## One or more nsqlookupd http interfaces for tachyon to discover
## the channels.
##
## Default: 127.0.0.1:4161
##
## Acceptable values:
## - an IP/port pair, e.g. 127.0.0.1:10011
nsqlookupd.name = 127.0.0.1:4161 # Needs to be pointed to an nsq lookup daemon,
# more than one of these can be used with
# different names
Rules
The tachyon meter features a powerful rule engine that defines the routing (kstat package to ddb metric) and decides whether metrics are kept or discarded. The syntax roughly resembles Erlang (or Prolog, for that matter); however, the effects are different.
Packages are passed through the rules from top to bottom; once a rule matches, no further rules are checked.
In general a rule is written in the form <bucket>(<condition>) -> [<target metric>]. There is a special case, ignore(<condition>), which means that a metric matching this condition is discarded and not sent on.
Each rule can have one or more conditions. Conditions have two forms:
- Matches: take the form <key> = <value>, where key can be one of the following, most of which correspond to the kstat field with the same name:
  - host - the id or hostname of the host the metric was sent from (specified in tachyon meter)
  - uuid - the uuid/id of an object; for zones this is the zone-uuid, for other values it will be picked if config/uuid is present
  - name
  - module
  - class
  - key
- Keywords: such as gz, a shortcut for uuid = "global"
The target metric is defined as an array; each element can be either a field, as explained for matches, or a string constant.
An example set to route sd (disk) related metrics to servers/<hostname>.'disk'.<instance> (and below) would look like this:
%%
%% Disks
%%
server(gz, module = "sd") ->
    [host, "disk", instance, "metrics", key].
server(gz, module = "sderr", key = "Hard Errors") ->
    [host, "disk", instance, "errors", "hard"].
server(gz, module = "sderr", key = "Soft Errors") ->
    [host, "disk", instance, "errors", "soft"].
server(gz, module = "sderr", key = "Transport Errors") ->
    [host, "disk", instance, "errors", "transport"].
server(gz, module = "sderr", key = "Predictive Failure Analysis") ->
    [host, "disk", instance, "errors", "predicted_failures"].
server(gz, module = "sderr", key = "Illegal Request") ->
    [host, "disk", instance, "errors", "illegal"].
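The first-match-wins evaluation described in this section can be modelled in a few lines of Python. This is a sketch of the semantics only, not the real rule engine: the `Field`, `rule` and `route` names, the dictionary representation of a package, and the choice to return None for unmatched packages are all assumptions made for illustration.

```python
class Field(str):
    """Marks a target element that is looked up in the package."""


def rule(bucket, target, gz=False, **conditions):
    """Build a rule; the gz keyword is a shortcut for uuid = "global"."""
    if gz:
        conditions["uuid"] = "global"
    return (bucket, conditions, target)


RULES = [
    rule("server", [Field("host"), "disk", Field("instance"), "errors", "hard"],
         gz=True, module="sderr", key="Hard Errors"),
    rule("ignore", [], module="sderr"),  # drop every other sderr metric
    rule("server", [Field("host"), "disk", Field("instance"), "metrics", Field("key")],
         gz=True, module="sd"),
]


def route(package, rules=RULES):
    """Return (bucket, metric path) for the first matching rule.

    Returns None if the first match is an ignore rule or nothing matches.
    """
    for bucket, conditions, target in rules:
        if all(package.get(k) == v for k, v in conditions.items()):
            if bucket == "ignore":
                return None
            path = [package[e] if isinstance(e, Field) else e for e in target]
            return bucket, path
    return None
```

Ordering matters in this model: the specific "Hard Errors" rule sits above the broad ignore rule, so hard errors are routed while all other sderr metrics are dropped.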
Grafana
Installation
There is currently no SmartOS package for Grafana2; it requires manual compilation, for which you can follow the installation guide. Since DalmatinerDB does not ship as a native datasource, we maintain a fork.
Once installed, you can add DalmatinerDB to the datasources; the default port of the Dalmatiner Frontend server is 8080.
For the time being, a precompiled binary of grafana with our changes can be downloaded here.
Configuration
It is mostly configured via the web interface; otherwise see the official documentation.