Clustering FiFo
Application Types
FiFo is built to maintain the highest possible uptime. To that end, every component falls into one of two categories:
Stateless
Maintaining availability in stateless applications is simple; it is 'the cloud way'. You can start multiple instances, and since they don't need to share state, a failing instance can simply be replaced by a new one or discarded altogether.
The following components are stateless:
To maintain availability, you will have to spawn at least two of each and put them behind a load balancer. In the current version this task has to be performed by the user.
Cluster/Distribution
Some components can't be stateless since they need to maintain some kind of data. To achieve a high level of availability, these components run in a clustered/distributed mode. FiFo uses riak core as its distribution layer. This means that it follows an eventually consistent model and utilises all nodes that are present in the cluster, which allows FiFo to survive the failure of individual nodes as long as a minimum number of nodes remains present.
The following components are based on riak core:
Because these services are based on riak core, it is recommended to run at least five instances of each service on different hardware machines to guarantee continuous uptime. In riak terms, this cluster is called a ring.
For those interested, the background for this recommendation is in the Dynamo paper.
Chunter is a special case: it is directly tied to a hardware node, which makes high availability irrelevant.
Sniffle / Snarl / Howl
Changing the cluster
If the cluster is set up before the system goes into production, there is not much to worry about. However, there are situations where the cluster has to be extended or shrunk during production. Nothing prevents this, but it requires the user to perform an additional step.
To prevent interruptions, it is necessary to disable mDNS on the nodes that you want to add or remove. Otherwise, other services will think these nodes are valid parts of the cluster and will try to access them. To disable mDNS, the configuration option mdns.server = disabled needs to be added to the Sniffle Configuration File, Snarl Configuration File or Howl Configuration File, depending on the service being added or removed. This should be done before the service is first started (when adding a node) or before it leaves the cluster (when removing a node).
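The configuration change described above can be sketched as a small shell helper. This is only an illustration: the helper name disable_mdns is made up for this example, and the demonstration runs against a throwaway file because the real location of the configuration file is not given here. Only the option line mdns.server = disabled comes from the text above.

```shell
#!/bin/sh
# Sketch: make sure "mdns.server = disabled" appears exactly once in a
# service configuration file. disable_mdns is a hypothetical helper name.

disable_mdns() {
    conf="$1"
    tmp="${conf}.tmp.$$"
    # Drop any existing mdns.server line, then append the disabled setting.
    # grep exits non-zero when it prints nothing, hence the "|| true".
    grep -v '^[[:space:]]*mdns\.server[[:space:]]*=' "$conf" > "$tmp" || true
    echo 'mdns.server = disabled' >> "$tmp"
    mv "$tmp" "$conf"
}

# Demonstration on a temporary file, not a live configuration.
CONF="$(mktemp)"
printf '# existing settings (placeholder content)\n' > "$CONF"
disable_mdns "$CONF"
disable_mdns "$CONF"   # running it twice does not duplicate the line
cat "$CONF"
```

On a real node, the argument would be the path of the Sniffle, Snarl or Howl configuration file for the service being added or removed.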
In the following section the existing server will be named sniffle@10.0.0.1 and the new server will be named sniffle@10.0.0.2. If multiple servers exist it does not matter which one is picked. Also keep in mind that for working with a Howl or Snarl cluster the section before the @ needs to be replaced with howl or snarl respectively.
New Cluster Node Preparation
Prepare a new FiFo Node as described in the Installation section.
The first FiFo node or existing FiFo cluster will likely already contain an admin account, so the last step of the Installation section is not needed. Creating additional admin accounts will not cause the join operation to fail. Redundant accounts can be cleaned up within the UI or using the fifoadm command.
Adding a server
To add a server the following steps need to be taken:
- on the existing server, check the server status with ring-status and member-status.
- on the new server, join the existing server with the cluster join command.
- when all new servers are in the joining stage, execute cluster commit.
- on the existing server, re-check the server status with the ring-status and member-status commands.
Exact descriptions of all commands can be found in the Sniffle Administration Guide, Snarl Administration Guide and Howl Administration Guide.
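The steps for adding a server can be sketched as a dry run that only prints the commands in order. The subcommand names (ring-status, member-status, cluster join, cluster commit) and the node names come from the text above; the admin tool name sniffle-admin is an assumption made for this sketch, as are the helper function names, so confirm the actual binary in the relevant Administration Guide before running anything.

```shell
#!/bin/sh
# Dry-run sketch of the join sequence: it prints commands, it does not run them.
# ADMIN is an ASSUMED name for the admin tool; for Snarl or Howl it would
# need to be replaced accordingly.
ADMIN="${ADMIN:-sniffle-admin}"

# Build a node name such as sniffle@10.0.0.1 from a service name and an IP.
node_name() {
    printf '%s@%s\n' "$1" "$2"
}

# Status checks, run on the existing server before and after the join.
status_commands() {
    printf '%s ring-status\n' "$ADMIN"
    printf '%s member-status\n' "$ADMIN"
}

# Join command, run on the new server and pointed at the existing one.
join_command() {
    printf '%s cluster join %s\n' "$ADMIN" "$1"
}

# Commit command, run once all new servers are in the joining stage.
commit_command() {
    printf '%s cluster commit\n' "$ADMIN"
}

existing="$(node_name sniffle 10.0.0.1)"

echo "# on the existing server"
status_commands
echo "# on the new server"
join_command "$existing"
echo "# once all new servers are in the joining stage"
commit_command
echo "# on the existing server"
status_commands
```

Printing instead of executing keeps the sketch safe to run anywhere; the output can be reviewed and the commands issued by hand on the right machines.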
Removing a server
Removing a server is as simple as adding one. To ensure the server is not picked up again after it is removed, the line mdns.server = disabled should be added to the configuration file of the server you want to remove.
- on the existing server, check the server status with ring-status and member-status.
- on the existing server, re-check the server status with the ring-status and member-status commands.
- once the server is removed, it is restarted and can be shut down using the svcadm disable sniffle command.
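The status checks and the final shutdown can be sketched as a dry run that prints the commands rather than executing them. The admin tool name sniffle-admin and the helper function names are assumptions for this sketch. The command that performs the removal itself is not named in the steps above, so it is left as a comment pointing to the Administration Guide instead of being guessed.

```shell
#!/bin/sh
# Dry-run sketch of the removal sequence: it prints commands only.
# ADMIN is an ASSUMED admin tool name; SERVICE is the name passed to svcadm.
ADMIN="${ADMIN:-sniffle-admin}"
SERVICE="${SERVICE:-sniffle}"

# Status checks, run on the existing server before and after the removal.
status_commands() {
    printf '%s ring-status\n' "$ADMIN"
    printf '%s member-status\n' "$ADMIN"
}

# Shutdown command for the removed server.
shutdown_command() {
    printf 'svcadm disable %s\n' "$1"
}

echo "# on the existing server, before the removal"
status_commands
echo "# removal step: not named above, see the Administration Guide"
echo "# on the existing server, after the removal"
status_commands
echo "# on the removed server, once it has left the cluster"
shutdown_command "$SERVICE"
```

For a Snarl or Howl node, SERVICE and ADMIN would presumably be set to the matching names; that mapping is an assumption and should be checked against the installed services.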
