Troubleshooting

Problem Checklist

We often encounter users reporting similar kinds of problems. Below is a checklist to help you track them down more easily.

First some general information you’ll need:

Information        Command
Hypervisor IP      ifconfig
Zone IP            ifconfig
Chunter Version    cat /opt/chunter/etc/chunter.version
Sniffle Version    pkgin list | grep sniffle
Howl Version       pkgin list | grep howl
Snarl Version      pkgin list | grep snarl
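
If you collect this information often, the commands can be wrapped in a small script. The following is only a sketch: the function names report and collect_general_info are made up for this example, and ifconfig -a is used here as an assumption so that all interfaces are listed.

```shell
# Hypothetical helper: prints a labelled section and runs the given command,
# or notes that the command is not installed in this zone.
report() {
    label=$1
    shift
    printf '== %s ==\n' "$label"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "($1 not available here)"
    fi
}

# Prints the general information from the list of commands above.
# Run it in the global zone and in the FiFo zone; whatever is missing
# in one of them shows up as an error message in that section.
collect_general_info() {
    report "IP addresses"    ifconfig -a
    report "Chunter version" cat /opt/chunter/etc/chunter.version
    report "FiFo packages"   sh -c 'pkgin list | grep -E "sniffle|howl|snarl"'
}
```

Calling collect_general_info > /var/tmp/fifo-info.txt (an example path) gives you a single file to attach to a bug report.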

Each troubleshooting item below is marked with an abbreviation that tells you in which environment or zone to run the commands:

Abbreviation    Run location
GZ              Global Zone (Hypervisor / Server itself)
FZ              FiFo Zone
LZ              LeoFS Zone
DZ              DalmatinerDB Zone
TZ              Tachyon Zone
CC              Client Command (any client)

📘

Zone Abbreviations

In certain circumstances you may be running multiple services in the same zone. If so, the abbreviation simply means: run this command in the zone where the service is running.

🚧

User Interface - problems

If you have UI-related problems, first try clearing your web browser cookies. This resolves a lot of UI problems.

General Checks

(GZ) Is your FiFo Zone running?

$ vmadm list | grep fifo

cae70926-cdc8-430f-9138-e06ac596c3ad  OS    512      running           fifo

(FZ) Are all your processes running?

$ ps -afe | grep run_erl | grep -v grep

0000104 99374 1 0 08:39:13 ? 0:00 /opt/local/sniffle/erts-5.9.1/bin/run_erl -daemon /tmp/sniffle/ /var/log/sniffl
0000103 42238 1 0 06:55:02 ? 0:00 /opt/local/snarl/erts-5.9.1/bin/run_erl -daemon /tmp/snarl/ /var/log/snarl exec
0000106 56504 1 0 07:20:50 ? 0:00 /opt/local/howl/erts-5.9.1/bin/run_erl -daemon /tmp/howl /var/log/howl exec /op
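
The check above can be scripted so that a missing process stands out. This is a minimal sketch, assuming the three service names from the sample output (sniffle, snarl, howl); the function name missing_run_erl is hypothetical.

```shell
# Hypothetical helper: reads `ps -afe` output on stdin and prints the name of
# every expected service that has no run_erl process.
missing_run_erl() {
    ps_output=$(cat)
    for svc in sniffle snarl howl; do
        # The daemon path looks like /opt/local/<service>/erts-<version>/bin/run_erl
        if ! printf '%s\n' "$ps_output" | grep "/$svc/erts-.*run_erl" >/dev/null; then
            echo "$svc"
        fi
    done
}

# Usage inside the FiFo zone:
#   ps -afe | missing_run_erl
```

No output means all three processes were found.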

(FZ) Services are going to maintenance
If you are using the release build, make sure your CPU supports AVX; consult the manufacturer's spec sheet to confirm.

For additional information you can also run the service manually with <service> console, for example sniffle console. The output printed to the shell shows possible configuration errors and other debug information.

(GZ) Are all your processes running?

$ ps -afe | grep run_erl | grep -v grep

root 48037 1 0 Dec 27 ? 0:00 /opt/chunter/erts-5.9.1/bin/run_erl -daemon /tmp/chunter /var/log/chunter exec
root 48033 1 0 Dec 27 ? 0:00 /opt/chunter/erts-5.9.1/bin/run_erl -daemon /tmp/fifo_zlogin /var/log/fifo_zlogin exec

(GZ) Are your Chunter services running?

$ svcs chunter fifo/zlogin

STATE  STIME  FMRI
online Jan_27 svc:/network/chunter:default
online Jan_27 svc:/fifo/zlogin:default

(GZ) Is there communication between your Global Zone and FiFo?

$ ping $FIFOZONEIP

192.168.0.126 is alive

(GZ) Does DNS reach your hypervisor?

This might take about 2 minutes.

$ snoop inet 224.0.0.251

172.16.234.10 -> 224.0.0.251 MDNS R _snarl._tcp.local. Internet PTR build._snarl._tcp.local.
172.16.234.10 -> 224.0.0.251 MDNS R _howl._tcp.local. Internet PTR build._howl._tcp.local.
172.16.234.10 -> 224.0.0.251 MDNS R _sniffle._tcp.local. Internet PTR build._sniffle._tcp.local.

(GZ) Is Chunter listening on the correct IP Address?

$ grep ip /opt/chunter/etc/chunter.conf

ip = <ip in the same network as the fifo zone>:4200

This should be the IP address of your hypervisor within the same network as your FiFo zone.
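
To confirm that the configured address really belongs to the hypervisor, you can compare it with the interface list. The following is a sketch only: the helper names conf_listen_ip and has_inet_addr are hypothetical, and the ifconfig output is assumed to use the illumos layout (inet <address> netmask ...).

```shell
# Hypothetical helper: prints the address part of the first
# "ip = <address>:<port>" line read from stdin. Commented-out lines are ignored.
conf_listen_ip() {
    sed -n 's/^[[:space:]]*ip[[:space:]]*=[[:space:]]*\([0-9.]*\):[0-9]*.*$/\1/p' | head -n 1
}

# Hypothetical helper: succeeds if the address given as $1 appears as an
# inet address in the ifconfig output read from stdin.
has_inet_addr() {
    awk -v ip="$1" '$1 == "inet" && $2 == ip { found = 1 } END { exit !found }'
}

# Usage in the global zone:
#   ip=$(conf_listen_ip < /opt/chunter/etc/chunter.conf)
#   ifconfig -a | has_inet_addr "$ip" && echo "ok: $ip" || echo "not configured: $ip"
```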

(FZ) Does your FiFo zone have network connectivity?

$ ping project-fifo.net

project-fifo.net is alive

(FZ) Are all your FiFo Services running?

$ svcs sniffle snarl howl

STATE  STIME  FMRI
online Jan_25 svc:/network/snarl:default
online Jan_25 svc:/network/howl:default
online Jan_25 svc:/network/sniffle:default
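
If you only want to see the services that are not healthy (for example ones that went to maintenance), the svcs output can be filtered. A small sketch, assuming the three-column layout shown above; the function name not_online is hypothetical.

```shell
# Hypothetical helper: reads `svcs` output on stdin and prints the FMRI of
# every service whose STATE column is not "online".
not_online() {
    awk 'NF >= 3 && $1 != "STATE" && $1 != "online" { print $3 }'
}

# Usage inside the FiFo zone:
#   svcs sniffle snarl howl | not_online
```

The same filter works for the Chunter services in the global zone, since it only looks at the STATE and FMRI columns.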

(GZ) Is your Memory scaled correctly?

$ prstat

 PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 27833 howl      109M   54M sleep   56    0   2:06:24 0.3% beam.smp/164
 27916 wiggle     91M   54M sleep   54    0   1:38:29 0.2% beam.smp/77
 27846 snarl     358M   61M sleep   57    0   0:29:30 0.1% beam.smp/172
 27841 sniffle  1112M  771M sleep   56    0   1:39:02 0.1% beam.smp/184

Note that SIZE is much bigger than RSS. This is caused by mmapped files for the database and can cause problems if it grows too big!
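
To keep an eye on how far SIZE and RSS drift apart, the ratio can be computed from the prstat output. This is a sketch under assumptions: the function name size_rss_ratio is hypothetical, the column order is the one shown above, and no particular ratio is claimed to be a problem.

```shell
# Hypothetical helper: reads prstat output on stdin and prints
# "<username> <SIZE/RSS ratio>" for each beam.smp process.
size_rss_ratio() {
    awk '
    # Converts a prstat size such as 771M or 2G to megabytes.
    function to_mb(v,   n, u) {
        u = substr(v, length(v), 1)           # unit suffix: K, M, G or T
        n = substr(v, 1, length(v) - 1) + 0
        if (u == "K") return n / 1024
        if (u == "G") return n * 1024
        if (u == "T") return n * 1024 * 1024
        return n                              # M
    }
    $NF ~ /^beam\.smp/ {
        rss = to_mb($4)
        if (rss > 0) printf "%s %.1f\n", $2, to_mb($3) / rss
    }'
}

# Usage: take one prstat sample and compute the ratios.
#   prstat 1 1 | size_rss_ratio
```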

(CC) Is the API working, can you manually login via curl?

$ curl -v "http://<IP>/api/0.1.0/sessions" -H "Content-Type: application/json" -H "Accept: application/json" --data-binary '{"user":"admin","password":"admin"}'

* About to connect() to 192.168.0.204 port 80 (#0)
* Trying 192.168.0.204...
* connected
* Connected to 192.168.0.204 (192.168.0.204) port 80 (#0)
> POST /api/0.1.0/sessions HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
> Host: 192.168.0.204
> Content-Type: application/json
> Accept: application/json
> Content-Length: 33
>
* upload completely sent off: 33 out of 33 bytes
< HTTP/1.1 303 See Other
< Server: nginx/1.3.13
< Date: Wed, 29 May 2013 10:37:59 GMT
< Content-Type: application/json
< Content-Length: 0
< Location: http://<IP>/api/0.1.0/sessions/64bbb7cb-7505-4b01-adf7-c7daf5b5186a
< Connection: keep-alive
< access-control-allow-origin: *
< access-control-allow-headers: content-type, x-snarl-token
< access-control-expose-headers: x-snarl-token
< allow-access-control-credentials: true
< vary: accept
< set-cookie: x-snarl-token=64bbb7cb-7505-4b01-adf7-c7daf5b5186a; Version=1; Expires=Wed, 28-May-2014 10:37:59 GMT; Max-Age=31449600
< x-snarl-token: 64bbb7cb-7505-4b01-adf7-c7daf5b5186a
<
* Connection #0 to host 192.168.0.204 left intact
* Closing connection #0
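
A successful login returns the session token in the x-snarl-token header. If you want to reuse it in follow-up requests, it can be pulled out of the response headers. A minimal sketch; the function name extract_snarl_token is hypothetical, and the curl options -s, -o and -D are used here instead of -v so that only the headers reach the pipe.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and prints the
# value of the x-snarl-token header (empty output if the login failed).
extract_snarl_token() {
    tr -d '\r' | awk 'tolower($1) == "x-snarl-token:" { print $2; exit }'
}

# Usage from any client:
#   curl -s -o /dev/null -D - "http://<IP>/api/0.1.0/sessions" \
#        -H "Content-Type: application/json" -H "Accept: application/json" \
#        --data-binary '{"user":"admin","password":"admin"}' | extract_snarl_token
```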

(FZ) Can Howl connect to Sniffle and Snarl?

$ howl-admin connections
Snarl endpoints.
Hostname                                 Node                Errors
------------------------------ -------------------- ---------------
                  fifo01.local 192.168.1.41:4200                0
Sniffle endpoints.
Hostname                                 Node                Errors
------------------------------ -------------------- ---------------
                  fifo01.local 192.168.1.41:4210                0
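
On a healthy system the Errors column is 0 for every endpoint. With many nodes it can help to print only the rows that report errors. A sketch, assuming the three-column layout from the sample output above; the function name endpoints_with_errors is hypothetical.

```shell
# Hypothetical helper: reads `howl-admin connections` output on stdin and
# prints "<hostname> <node> <errors>" for every endpoint with errors > 0.
endpoints_with_errors() {
    awk 'NF == 3 && $3 ~ /^[0-9]+$/ && $3 + 0 > 0 { print $1, $2, $3 }'
}

# Usage inside the FiFo zone:
#   howl-admin connections | endpoints_with_errors
```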

(FZ) Where do I find log files for a particular service?

$ ls -Fd /data/*/log
/data/howl/log/  /data/snarl/log/  /data/sniffle/log/

Note: You may see mentions of crashes in the log files. This can happen even on a healthy system that is running fine. Once you have found a log file you want to examine, cat or tail it accordingly.

Hypervisor & VM Checks

(FZ) Can you see your hypervisors being listed?

$ fifoadm hypervisors list

Hypervisor         IP               Memory          State
------------------ ---------------- --------------- -------------
00-15-17-b8-16-fc  172.16.0.4       25064/32699     ok

(FZ) Can you see your machines being listed?

$ fifoadm vms list

UUID                                 Hypervisor        Name            State
------------------------------------ ----------------- --------------- ----------
7df22c41-bade-4b26-b20e-ee2b45e81bf8 00-15-17-b8-16-fc fifo            running
21e0bc5d-af4e-4a44-8137-c7d50870dcbd 00-15-17-b8-16-fc ngnix           running
bf57045f-42ee-42b5-8dc5-201250b7b6f4 00-15-17-b8-16-fc confluence      running
39cd0a98-5087-472c-b89c-e75aef378a22 00-15-17-b8-16-fc dev             stopped
49314fda-fef0-42fa-b974-77d27b097aa1 00-15-17-b8-16-fc korny           running
2362ebf6-4988-4cfd-89ec-004dcc61a63b 00-15-17-b8-16-fc zotonic         stopped
87cc64b1-3990-4cf6-a54d-dbc2e66adddc 00-15-17-b8-16-fc -               installing
1df09840-f2bb-48fb-a3b3-5fe679849baf 00-15-17-b8-16-fc mail            running
6d4a35a6-41d8-4a44-9977-e010b3ed307a 00-15-17-b8-16-fc test            running
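
When the list is long, it is often quicker to look only at the machines that are not running. A small sketch, assuming the State value is the last column as in the listing above; the function name vms_not_running is hypothetical.

```shell
# Hypothetical helper: reads `fifoadm vms list` output on stdin and prints
# "<uuid> <state>" for every VM whose state is not "running".
vms_not_running() {
    awk '$1 ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/ && $NF != "running" { print $1, $NF }'
}

# Usage inside the FiFo zone:
#   fifoadm vms list | vms_not_running
```

The UUIDs it prints can be passed to fifoadm vms get -j <uuid> for a closer look.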

(FZ) How do I fetch information for a misbehaving VM?

$ fifoadm vms get -j <uuid>

{
 "hypervisor": "00-15-17-b8-16-fc",
 "state": "installing_dataset"
}

📘

Additional fifoadm commands

There are a lot more calls for fifoadm that can help depending on where things lead.

Other errors

DNS

When using a custom DNS server or hosts files instead of xip.io, many people get the configuration of their systems wrong. These problems are easy to detect by looking for nxdomain errors in the log files, e.g. grep nxdomain /var/log/chunter/error.log.

To verify the DNS configuration the tool dig can be used in both the fifo zone and the global zone.

$ dig 10.0.0.100.xip.io

; <<>> DiG 9.8.3-P1 <<>> 10.0.0.100.xip.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47415
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;10.0.0.100.xip.io.		IN	A

;; ANSWER SECTION:
10.0.0.100.xip.io.	300	IN	A	10.0.0.100

;; Query time: 279 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Thu Jul  7 01:44:49 2016
;; MSG SIZE  rcvd: 51

$ dig all-subdomains.10.0.0.100.xip.io

; <<>> DiG 9.8.3-P1 <<>> all-subdomains.10.0.0.100.xip.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32813
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;all-subdomains.10.0.0.100.xip.io. IN	A

;; ANSWER SECTION:
all-subdomains.10.0.0.100.xip.io. 300 IN A	10.0.0.100

;; Query time: 147 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Thu Jul  7 01:45:20 2016
;; MSG SIZE  rcvd: 66

To clarify, all-subdomains in this example means that any valid DNS hostname put in its place needs to resolve to the same IP.
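
The two dig checks above can be combined into one pass/fail test. This is only a sketch: the function names xip_embedded_ip and check_xip are hypothetical, the hostname is assumed to follow the <anything>.<ip>.xip.io pattern, and dig +short is used to print just the answer.

```shell
# Hypothetical helper: prints the IPv4 address embedded in an xip.io
# hostname (the four labels in front of "xip.io"). It does not validate
# that the labels are numeric.
xip_embedded_ip() {
    printf '%s\n' "$1" | awk -F. '{
        n = NF
        if ($n == "") n--                     # tolerate a trailing dot
        if (n >= 6 && $(n-1) == "xip" && $n == "io")
            print $(n-5) "." $(n-4) "." $(n-3) "." $(n-2)
    }'
}

# Hypothetical helper: resolves the hostname and compares the answer with
# the embedded address.
check_xip() {
    expected=$(xip_embedded_ip "$1")
    actual=$(dig +short "$1" | tail -n 1)
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK   $1 -> $actual"
    else
        echo "FAIL $1 -> ${actual:-no answer} (expected ${expected:-unknown})"
    fi
}

# Usage in the FiFo zone and in the global zone:
#   check_xip 10.0.0.100.xip.io
#   check_xip all-subdomains.10.0.0.100.xip.io
```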

Reporting an issue

JIRA is the best place to file a report. When you do, it is often helpful to attach some logs. They can be found in /data/{sniffle,snarl,howl}/log (in the FiFo Zone) and /var/log/chunter (in the GZ).

In addition, there is a built-in command, fifoadm diag, that collects all your log files and puts them in a /var/tmp/[$fifoservice]-diag directory. This aids log collection and submission, and encourages everyone to attach logs when filing bug reports.

In the global zone run /opt/chunter/share/chunter-diag to grab diagnostics data.