Tachyon is a monitoring system for SmartOS (and in theory other Illumos/Solaris derivatives), built to monitor large server farms and deal with the amount of data generated there. It integrates with FiFo, but it is perfectly capable of running on its own.
- Minimal impact by using kstat directly.
- Resilient to network or component failure.
- Low and predictable space usage ~12bit / metric / second.
- Fast query times (single- or double-digit milliseconds).
- High throughput and near linear scaling.
- No modification of NGZs required; new zones are picked up automatically.
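To get a feeling for the space figure in the list above, the following sketch turns the ~12 bit / metric / second number into a rough daily estimate. The host and metric counts are made-up inputs, not measurements, and the function assumes that the quoted figure applies per stored data point and ignores any additional overhead.

```python
BITS_PER_POINT = 12  # approximate figure quoted above (~12 bit / metric / second)
SECONDS_PER_DAY = 86_400


def bytes_per_day(metrics: int, hosts: int = 1, interval_s: int = 1) -> float:
    """Rough number of bytes stored per day.

    metrics:    metrics collected per host (hypothetical input)
    hosts:      number of hosts reporting (hypothetical input)
    interval_s: seconds between two samples of the same metric
    """
    points_per_day = SECONDS_PER_DAY // interval_s
    return metrics * hosts * points_per_day * BITS_PER_POINT / 8


if __name__ == "__main__":
    # Hypothetical farm: 100 hosts with 1000 metrics each, sampled every second.
    total = bytes_per_day(metrics=1000, hosts=100)
    print(f"{total / 1e9:.2f} GB per day")
```

Because the per-point cost is roughly constant, the estimate scales linearly with the number of hosts and metrics, which is what makes the space usage predictable.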
Tachyon has a multi-layered architecture that balances data correctness and availability. The general premise is that overall availability is more important than absolute correctness. Aside from Tachyon itself, the following additional components form the stack:
- The nsq message queue, which buffers messages in case of outages. Buffered messages are not visible in the database while they are queued, but their data can be filled in later.
- The DalmatinerDB database, to handle the throughput and storage requirements.
- The Grafana dashboard, to visualize the metrics and create dashboards. (plugin required!)
At the lowest level, the tachyon-meter, a slightly modified kstat, gathers statistics directly from the kernel and sends them to an nsqd process. kstat is a wonderful tool for this: we can easily collect hundreds if not thousands of metrics from a system with virtually no (or minimal) impact on the system. Timestamps are generated locally (at send time), so delayed delivery does not affect precision, and the data can easily be correlated with other metrics generated on the system.
The Tachyon Aggregator is the second station metrics pass through. It decodes the kstat package and translates it into an insert statement that DalmatinerDB can understand. These services are stateless; if the connection to DalmatinerDB is interrupted, they re-queue packages to NSQ for later processing.
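The re-queue behaviour can be pictured with a small model. This is not Tachyon's code: the `drain` function, the `insert` callback and the dictionary-shaped messages are hypothetical, and an in-memory deque stands in for NSQ. It only shows why a stateless worker plus a queue loses no data when the database is unreachable, and why timestamps taken at send time survive the delay.

```python
from collections import deque
from typing import Callable, Deque, Dict

Message = Dict[str, int]  # hypothetical decoded package: {"time": ..., "value": ...}


def drain(queue: Deque[Message], insert: Callable[[Message], None]) -> int:
    """Try to store every message currently in the queue exactly once.

    Messages whose insert raises ConnectionError are appended to the queue
    again, so a later call can retry them. Returns the number stored.
    """
    stored = 0
    for _ in range(len(queue)):  # length is read once, so re-queued items wait
        msg = queue.popleft()
        try:
            insert(msg)
            stored += 1
        except ConnectionError:
            queue.append(msg)  # re-queue for later processing
    return stored


if __name__ == "__main__":
    pending = deque([{"time": 1, "value": 10}, {"time": 2, "value": 20}])

    def database_down(msg: Message) -> None:
        raise ConnectionError("database unreachable")

    database: list = []
    print("stored while down:", drain(pending, database_down))
    print("stored after recovery:", drain(pending, database.append))
    print("timestamps kept:", [m["time"] for m in database])
```

The worker keeps no state of its own between calls; everything that still needs processing lives in the queue.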
The data is persisted in DalmatinerDB and can be queried from there by FiFo, the DalmatinerDB-Frontend, or Grafana.
NSQ requires dataset version 15.4.
Installing NSQ in a zone is rather simple: Project-FiFo provides a SmartOS package that can be installed with the commands below. The package includes all NSQ components (the nsqd queue as well as the admin interface and the nsqadmind), all of which can be configured through SMF configuration parameters.
Before you proceed please make sure to have the FiFo GPG key installed as described in Installing FiFo.
    cd /data
    curl -O https://project-fifo.net/fifo.gpg
    gpg --primary-keyring /opt/local/etc/gnupg/pkgsrc.gpg --import < fifo.gpg
    gpg --keyring /opt/local/etc/gnupg/pkgsrc.gpg --fingerprint
    #echo "http://release.project-fifo.net/pkg/15.4.1/rel" >> /opt/local/etc/pkgin/repositories.conf
    echo "http://release.project-fifo.net/pkg/rel" >> /opt/local/etc/pkgin/repositories.conf
    pkgin -fy up
    pkgin install nsq
Another option is to install nsqd in the GZ; however, that comes with its own issues and is not recommended at this point.
NSQ is configured via the SMF configuration interface. Changing the configuration works like this:
    svccfg -s svc:/network/nsqd:default
    svc:/network/nsqd:default> addpg application application
    svc:/network/nsqd:default> setprop application/broadcast-address=astring: "<yourip>"
    svc:/network/nsqd:default> setprop application/lookupd-tcp-address="127.0.0.1:4160"
    svc:/network/nsqd:default> validate
    svc:/network/nsqd:default> refresh
The same applies for nsqlookupd instances. The available configuration parameters can be read via:

    svccfg export nsqd | grep propval
The tachyon meter can be downloaded from http://release.project-fifo.net/gz/rel/ and installed the same way as chunter. There is a config file in which the IP of the nsqd process has to be specified.
    cd /opt
    curl -O http://release.project-fifo.net/gz/rel/tachyon-meter-latest.gz
    gunzip tachyon-meter-latest.gz
    sh tachyon-meter-latest
/opt/tachyon-meter/etc/tachyon.conf needs to be edited:

    # The NSQd host to send data to
    host=192.168.1.41 # Needs to be changed to the IP of the zone hosting the NSQd daemon
    # The port NSQd listens on for HTTP messages
    port=4151 # Does not need to be changed
    # The NSQ topic to send to
    topic=tachyon # Does not need to be changed
    # The interval, in seconds, at which data is sent to NSQ
    interval=1 # Does not need to be changed
    # The hostname to identify the server with
    ## Will try to pick up chunter's host_id file if it exists, otherwise
    ## simply use the hostname.
    if [ -f /opt/chunter/etc/host_id ]
    then
      hostname="$(cat /opt/chunter/etc/host_id)"
    else
      hostname="$(hostname)"
    fi
    is_smf=yes # Does not need to be changed, required for backgrounding in the SMF
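As a rough illustration of how the host, port and topic settings fit together, the sketch below builds an HTTP publish request against nsqd's `/pub` endpoint. It is not what tachyon-meter does internally: the JSON payload, the function name and the `tachyon_test` scratch topic are assumptions made for this example, and the real wire format of the meter is not shown here. A scratch topic is used so that the aggregator does not receive a message it cannot decode.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Optional


def build_publish_request(host: str, port: int, topic: str,
                          name: str, value: int,
                          now: Optional[float] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for nsqd's HTTP /pub endpoint.

    The timestamp is taken locally when the request is built, mirroring the
    idea that timestamps are generated at send time. The payload layout is
    purely hypothetical.
    """
    timestamp = int(time.time() if now is None else now)
    body = json.dumps({"time": timestamp, "name": name, "value": value}).encode()
    query = urllib.parse.urlencode({"topic": topic})
    url = f"http://{host}:{port}/pub?{query}"
    return urllib.request.Request(url, data=body, method="POST")


if __name__ == "__main__":
    # Host and port as in the config above; the topic is a scratch topic.
    req = build_publish_request("192.168.1.41", 4151, "tachyon_test", "demo", 1)
    print(req.get_method(), req.full_url)
    # To actually send it, one could call:
    #   urllib.request.urlopen(req, timeout=5)
```

If the request fails to connect when sent, the host or port in the config most likely does not point at a reachable nsqd.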
The aggregator can be installed from the Project FiFo package repository, either via pkg_add or via pkgin install if the repository was added to pkgin's repositories.conf. The package name is
/data/tachyon/etc/tachyon.conf needs to be edited. Most options are explained in the file; the two most important ones are the following:

    ## The DalmatinerDB backend (if used).
    ##
    ## Default: 127.0.0.1:5555
    ##
    ## Acceptable values:
    ##   - an IP/port pair, e.g. 127.0.0.1:10011
    ddb = 192.168.1.42:5555 # Needs to be changed to point to one dalmatinerdb host

    ## One or more nsqlookupd http interfaces for tachyon to discover
    ## the channels.
    ##
    ## Default: 127.0.0.1:4161
    ##
    ## Acceptable values:
    ##   - an IP/port pair, e.g. 127.0.0.1:10011
    nsqlookupd.name = 127.0.0.1:4161 # Needs to be pointed to an nsq lookup daemon,
                                     # more than one of these can be used with
                                     # different names
The tachyon meter features a powerful rule engine that defines the routing (kstat package to ddb metric) and decides whether a metric is kept or discarded. The syntax roughly resembles Erlang (or Prolog, for that matter); however, the effects are different.
Packages are passed through the rules from top to bottom; once a rule matches, no further rules are checked.
In general, a rule is written in the form <bucket>(<condition>) -> [<target metric>]. There is a special case, ignore(<condition>), which means that a metric matching this condition is discarded and not sent on.
Each rule can have one or more conditions. Conditions have two forms:
- Matches: take the form <key> = <value>, where key can be one of the following, most of which correspond to the kstat field with the same name:
  - host: the id or hostname of the host the metric was sent from (specified in the tachyon meter)
  - uuid: the uuid/id of an object; for a zone this is the zone uuid, for other values it is picked up if config/uuid is present
- Keywords: such as gz, a shortcut for uuid = "global"
The target metric is defined as an array; each element can be either a field, as explained for the matcher, or a string constant.
An example set to route sd (disk) related metrics to servers/<hostname>.'disk'.<instance> (and below) would look like this:
    %%
    %% Disks
    %%
    server(gz, module = "sd") ->
        [host, "disk", instance, "metrics", key].
    server(gz, module = "sderr", key = "Hard Errors") ->
        [host, "disk", instance, "errors", "hard"].
    server(gz, module = "sderr", key = "Soft Errors") ->
        [host, "disk", instance, "errors", "soft"].
    server(gz, module = "sderr", key = "Transport Errors") ->
        [host, "disk", instance, "errors", "transport"].
    server(gz, module = "sderr", key = "Predictive Failure Analysis") ->
        [host, "disk", instance, "errors", "predicted_failures"].
    server(gz, module = "sderr", key = "Illegal Request") ->
        [host, "disk", instance, "errors", "illegal"].
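The evaluation semantics described in this section (top-to-bottom order, first match wins, ignore discards, gz as a shortcut for uuid = "global") can be modelled in a few lines of Python. This is only an illustration of the behaviour, not the rule engine's implementation. The sample package values, the exact-equality matching, the `ignore` rule on a `cpu_stat` module, and the choice to return `None` for packages that match no rule are all assumptions made for the sketch.

```python
from typing import Dict, List, Optional, Tuple, Union

Package = Dict[str, str]


class Field(str):
    """Target element that is looked up in the package (e.g. host, instance)."""


Target = List[Union[Field, str]]          # plain str elements are constants
Rule = Tuple[str, Dict[str, str], Target]  # (bucket, conditions, target)

GZ = {"uuid": "global"}  # the gz keyword is a shortcut for uuid = "global"

RULES: List[Rule] = [
    ("ignore", {"module": "cpu_stat"}, []),  # hypothetical ignore rule
    ("server", {**GZ, "module": "sd"},
     [Field("host"), "disk", Field("instance"), "metrics", Field("key")]),
    ("server", {**GZ, "module": "sderr", "key": "Hard Errors"},
     [Field("host"), "disk", Field("instance"), "errors", "hard"]),
]


def route(pkg: Package, rules: List[Rule] = RULES) -> Optional[Tuple[str, List[str]]]:
    """Return (bucket, metric path) for the first matching rule.

    Returns None when the first matching rule is an ignore rule, or (as an
    assumption of this sketch) when no rule matches at all.
    """
    for bucket, conditions, target in rules:
        if all(pkg.get(k) == v for k, v in conditions.items()):
            if bucket == "ignore":
                return None  # discarded, not sent on
            path = [pkg[e] if isinstance(e, Field) else e for e in target]
            return bucket, path  # first match wins, stop checking
    return None


if __name__ == "__main__":
    sample = {"host": "node1", "uuid": "global", "module": "sderr",
              "instance": "0", "key": "Hard Errors"}
    print(route(sample))
```

Because evaluation stops at the first match, more specific rules and ignore rules need to appear before more general ones that would also match the same package.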
There is currently no SmartOS package for Grafana2; it requires manual compilation, for which you can follow the installation guide. Since DalmatinerDB does not ship as a native datasource, we maintain a fork.
Once installed, you can add DalmatinerDB as a datasource; the default port of the Dalmatiner Frontend server is 8080.
For the time being, a precompiled binary of grafana with our changes can be downloaded here.
It is mostly configured through the web interface; otherwise see the official documentation.