Troubleshooting
Problem Checklist
We often encounter users reporting similar kinds of problems. Below is a checklist to help you track them down more easily.
First some general information you’ll need:
Information | Command |
---|---|
Hypervisor IP | ifconfig |
Zone IP | ifconfig |
Chunter Version | cat /opt/chunter/etc/chunter.version |
Sniffle Version | pkgin list | grep sniffle |
Howl Version | pkgin list | grep howl |
Snarl Version | pkgin list | grep snarl |
Each troubleshooting item below is marked with an abbreviation that tells you which environment or zone to run the commands in:
Abbreviation | Run location |
---|---|
GZ | Global Zone (Hypervisor / Server itself) |
FZ | FiFo Zone |
LZ | LeoFS Zone |
DZ | DalmatinerDB Zone |
TZ | Tachyon Zone |
CC | Client Command (any client) |
Zone Abbreviations
In certain circumstances you may be running multiple services in the same zone. In that case the abbreviation simply means: run the command in the zone where that service is running.
User Interface - problems
If you have UI-related problems, first try clearing your web browser cookies. This resolves many UI problems.
General Checks
(GZ) Is your FiFo Zone running?
$ vmadm list | grep fifo
cae70926-cdc8-430f-9138-e06ac596c3ad OS 512 running fifo
(FZ) Are all your processes running?
$ ps -afe | grep run_erl | grep -v grep
0000104 99374 1 0 08:39:13 ? 0:00 /opt/local/sniffle/erts-5.9.1/bin/run_erl -daemon /tmp/sniffle/ /var/log/sniffl
0000103 42238 1 0 06:55:02 ? 0:00 /opt/local/snarl/erts-5.9.1/bin/run_erl -daemon /tmp/snarl/ /var/log/snarl exec
0000106 56504 1 0 07:20:50 ? 0:00 /opt/local/howl/erts-5.9.1/bin/run_erl -daemon /tmp/howl /var/log/howl exec /op
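To avoid reading the listing by eye, a small filter can report which services have no run_erl supervisor. This is only a sketch: the function name missing_run_erl is hypothetical, and it assumes the -daemon /tmp/<service> argument seen in the sample output identifies each service.

```shell
# Print each expected service that has no run_erl supervisor in the
# `ps -afe` listing supplied on stdin. Prints nothing if all are present.
missing_run_erl() {
    listing=$(cat)
    for svc in "$@"; do
        # Assumption: run_erl is started as "run_erl -daemon /tmp/<service>",
        # as in the sample output above.
        if ! printf '%s\n' "$listing" | grep "run_erl -daemon /tmp/$svc[/ ]" >/dev/null; then
            printf '%s\n' "$svc"
        fi
    done
}

# Usage in the FiFo zone:
#   ps -afe | missing_run_erl sniffle snarl howl
# Usage in the Global Zone:
#   ps -afe | missing_run_erl chunter fifo_zlogin
```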
(FZ) Services are going to maintenance
If you are using the release build, make sure your CPU supports AVX; consult the manufacturer's spec sheet to confirm.
To get additional information, you can also run the service manually with <service> console (for example, sniffle console) and look at the output, which prints possible configuration errors and other debug information to the shell.
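Besides the spec sheet, the AVX requirement can usually be checked on the machine itself. The sketch below assumes the illumos isainfo utility is available (it is not mentioned in this guide) and that isainfo -v lists the supported instruction set extensions as space-separated words; has_avx is a hypothetical helper name.

```shell
# Succeed when the word "avx" appears in `isainfo -v` output read from
# stdin. -w keeps "avx2" or "avx512f" from counting as plain AVX.
has_avx() {
    grep -qw avx
}

# Usage (assumes isainfo is installed, as on illumos/SmartOS):
#   isainfo -v | has_avx && echo "AVX supported" || echo "AVX missing"
```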
(GZ) Are all your processes running?
$ ps -afe | grep run_erl | grep -v grep
root 48037 1 0 Dec 27 ? 0:00 /opt/chunter/erts-5.9.1/bin/run_erl -daemon /tmp/chunter /var/log/chunter exec
root 48033 1 0 Dec 27 ? 0:00 /opt/chunter/erts-5.9.1/bin/run_erl -daemon /tmp/fifo_zlogin /var/log/fifo_zlogin exec
(GZ) Are your Chunter services running?
$ svcs chunter fifo/zlogin
STATE STIME FMRI
online Jan_27 svc:/network/chunter:default
online Jan_27 svc:/fifo/zlogin:default
(GZ) Is there communication between your Global Zone and FiFo?
$ ping $FIFOZONEIP
192.168.0.126 is alive
(GZ) Does DNS reach your hypervisor?
The FiFo services announce themselves via mDNS, so it might take ~2 minutes before packets show up.
$ snoop inet 224.0.0.251
172.16.234.10 -> 224.0.0.251 MDNS R _snarl._tcp.local. Internet PTR build._snarl._tcp.local.
172.16.234.10 -> 224.0.0.251 MDNS R _howl._tcp.local. Internet PTR build._howl._tcp.local.
172.16.234.10 -> 224.0.0.251 MDNS R _sniffle._tcp.local. Internet PTR build._sniffle._tcp.local.
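Because the announcements arrive over a couple of minutes, it can be easier to save the snoop output and check it afterwards. A minimal sketch, assuming the three service names shown in the sample capture (snarl, howl, sniffle); the helper name missing_mdns and the /tmp/mdns.txt path are hypothetical.

```shell
# Read saved snoop output on stdin and print each FiFo service whose
# _<service>._tcp.local. announcement never appeared.
missing_mdns() {
    capture=$(cat)
    for svc in snarl howl sniffle; do
        if ! printf '%s\n' "$capture" | grep -F "_$svc._tcp.local." >/dev/null; then
            printf '%s\n' "$svc"
        fi
    done
}

# Usage in the Global Zone:
#   snoop inet 224.0.0.251 > /tmp/mdns.txt   # interrupt after ~2 minutes
#   missing_mdns < /tmp/mdns.txt
```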
(GZ) Is Chunter listening on the correct IP Address?
$ grep ip /opt/chunter/etc/chunter.conf
ip = <ip in the same network as the fifo zone>:4200
This should be the IP address of your hypervisor within the same network as your FiFo zone.
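To compare the configured address against the hypervisor's interfaces, the address can be pulled out of the ip line. This is a sketch that assumes the ip = <address>:<port> layout shown above; chunter_listen_ip is a hypothetical helper name.

```shell
# Print the address part of each uncommented "ip = <address>:<port>"
# line from a chunter.conf-style file read on stdin.
chunter_listen_ip() {
    sed -n 's/^[[:space:]]*ip[[:space:]]*=[[:space:]]*\([0-9.]*\):[0-9]*[[:space:]]*$/\1/p'
}

# Usage in the Global Zone: confirm the address is bound to an interface.
#   ifconfig -a | grep -Fw "$(chunter_listen_ip < /opt/chunter/etc/chunter.conf)"
```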
(FZ) Does your FiFo zone have network connectivity?
$ ping project-fifo.net
project-fifo.net is alive
(FZ) Are all your FiFo Services running?
$ svcs sniffle snarl howl
STATE STIME FMRI
online Jan_25 svc:/network/snarl:default
online Jan_25 svc:/network/howl:default
online Jan_25 svc:/network/sniffle:default
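With more services the svcs listing gets harder to scan, so a filter that prints only the unhealthy entries can help. A minimal sketch, assuming the three-column STATE STIME FMRI layout shown above; not_online is a hypothetical helper name.

```shell
# Read `svcs` output on stdin and print "STATE FMRI" for every service
# whose state is not exactly "online". Prints nothing when all are healthy.
not_online() {
    awk 'NR > 1 && $1 != "online" { print $1, $3 }'
}

# Usage in the FiFo zone:
#   svcs sniffle snarl howl | not_online
# Usage in the Global Zone:
#   svcs chunter fifo/zlogin | not_online
```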
(GZ) Is your Memory scaled correctly?
$ prstat
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
27833 howl 109M 54M sleep 56 0 2:06:24 0.3% beam.smp/164
27916 wiggle 91M 54M sleep 54 0 1:38:29 0.2% beam.smp/77
27846 snarl 358M 61M sleep 57 0 0:29:30 0.1% beam.smp/172
27841 sniffle 1112M 771M sleep 56 0 1:39:02 0.1% beam.smp/184
Note that SIZE is much bigger than RSS. This is caused by mmapped files for the database and can cause problems if it grows too big!
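To watch how SIZE relates to RSS over time, the ratio can be computed from the prstat listing. This is a sketch only: size_rss_ratio is a hypothetical helper, it assumes the K/M/G/T size suffixes seen in prstat output, and it does not define what ratio counts as too big. On older illumos releases you may need nawk instead of awk.

```shell
# Read prstat output on stdin and print "USERNAME SIZE RSS RATIO" for
# each process line, where RATIO is SIZE divided by RSS.
size_rss_ratio() {
    awk '
        function kb(s,    n, u) {
            n = s + 0                     # numeric prefix, e.g. 1112
            u = substr(s, length(s))      # unit suffix: K, M, G or T
            if (u == "M") return n * 1024
            if (u == "G") return n * 1024 * 1024
            if (u == "T") return n * 1024 * 1024 * 1024
            return n                      # already kilobytes
        }
        $1 ~ /^[0-9]+$/ && $3 ~ /^[0-9.]+[KMGT]$/ && $4 ~ /^[0-9.]+[KMGT]$/ && kb($4) > 0 {
            printf "%s %s %s %.1f\n", $2, $3, $4, kb($3) / kb($4)
        }'
}

# Usage in the Global Zone (one non-interactive report):
#   prstat -c 1 1 | size_rss_ratio
```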
(CC) Is the API working, can you manually login via curl?
$ curl -v "http://<IP>/api/0.1.0/sessions" -H "Content-Type: application/json" -H "Accept: application/json" --data-binary '{"user":"admin","password":"admin"}'
* About to connect() to 192.168.0.204 port 80 (#0)
* Trying 192.168.0.204...
* connected
* Connected to 192.168.0.204 (192.168.0.204) port 80 (#0)
> POST /api/0.1.0/sessions HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
> Host: 192.168.0.204
> Content-Type: application/json
> Accept: application/json
> Content-Length: 33
>
* upload completely sent off: 33 out of 33 bytes
< HTTP/1.1 303 See Other
< Server: nginx/1.3.13
< Date: Wed, 29 May 2013 10:37:59 GMT
< Content-Type: application/json
< Content-Length: 0
< Location: http://<IP>/api/0.1.0/sessions/64bbb7cb-7505-4b01-adf7-c7daf5b5186a
< Connection: keep-alive
< access-control-allow-origin: *
< access-control-allow-headers: content-type, x-snarl-token
< access-control-expose-headers: x-snarl-token
< allow-access-control-credentials: true
< vary: accept
< set-cookie: x-snarl-token=64bbb7cb-7505-4b01-adf7-c7daf5b5186a; Version=1; Expires=Wed, 28-May-2014 10:37:59 GMT; Max-Age=31449600
< x-snarl-token: 64bbb7cb-7505-4b01-adf7-c7daf5b5186a
<
* Connection #0 to host 192.168.0.204 left intact
* Closing connection #0
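A successful login is the one that returns an x-snarl-token header, as in the response above. A small filter can extract it for scripting; snarl_token is a hypothetical helper name, and the sketch accepts either plain headers (curl -i) or the "< " prefixed form that curl -v prints.

```shell
# Print the value of the first x-snarl-token response header found in
# the HTTP headers on stdin. Prints nothing if the header is absent.
snarl_token() {
    tr -d '\r' |
        sed -n 's/^\(< \)\{0,1\}[Xx]-[Ss]narl-[Tt]oken:[[:space:]]*//p' |
        head -n 1
}

# Usage from any client (same request as above, headers on stdout):
#   curl -s -i "http://<IP>/api/0.1.0/sessions" \
#        -H "Content-Type: application/json" -H "Accept: application/json" \
#        --data-binary '{"user":"admin","password":"admin"}' | snarl_token
```

An empty result means the login did not succeed or the response never reached the API.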
(FZ) Can Howl connect to Sniffle and Snarl?
$ howl-admin connections
Snarl endpoints.
Hostname Node Errors
------------------------------ -------------------- ---------------
fifo01.local 192.168.1.41:4200 0
Sniffle endpoints.
Hostname Node Errors
------------------------------ -------------------- ---------------
fifo01.local 192.168.1.41:4210 0
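The Errors column is the part worth watching. A minimal sketch that prints only endpoints with a non-zero error count, assuming the three-column Hostname / Node / Errors layout shown above; endpoint_errors is a hypothetical helper name.

```shell
# Read `howl-admin connections` output on stdin and print every endpoint
# row whose Errors column is greater than zero.
endpoint_errors() {
    awk 'NF == 3 && $3 ~ /^[0-9]+$/ && $3 + 0 > 0 { print $1, $2, $3 }'
}

# Usage in the FiFo zone:
#   howl-admin connections | endpoint_errors
```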
(FZ) Where do I find log files for a particular service?
$ ls -Fd /data/*/log
/data/howl/log/ /data/snarl/log/ /data/sniffle/log/
$
Note: You may see mentions of crashes in the log files. This can happen even on a healthy system that is running fine. Once you have found a log file you want to examine, cat or tail it accordingly.
Hypervisor & VM Checks
(FZ) Can you see your hypervisors being listed?
$ fifoadm hypervisors list
Hypervisor IP Memory State
------------------ ---------------- --------------- -------------
00-15-17-b8-16-fc 172.16.0.4 25064/32699 ok
(FZ) Can you see your machines being listed?
$ fifoadm vms list
UUID Hypervisor Name State
------------------------------------ ----------------- --------------- ----------
7df22c41-bade-4b26-b20e-ee2b45e81bf8 00-15-17-b8-16-fc fifo running
21e0bc5d-af4e-4a44-8137-c7d50870dcbd 00-15-17-b8-16-fc ngnix running
bf57045f-42ee-42b5-8dc5-201250b7b6f4 00-15-17-b8-16-fc confluence running
39cd0a98-5087-472c-b89c-e75aef378a22 00-15-17-b8-16-fc dev stopped
49314fda-fef0-42fa-b974-77d27b097aa1 00-15-17-b8-16-fc korny running
2362ebf6-4988-4cfd-89ec-004dcc61a63b 00-15-17-b8-16-fc zotonic stopped
87cc64b1-3990-4cf6-a54d-dbc2e66adddc 00-15-17-b8-16-fc - installing
1df09840-f2bb-48fb-a3b3-5fe679849baf 00-15-17-b8-16-fc mail running
6d4a35a6-41d8-4a44-9977-e010b3ed307a 00-15-17-b8-16-fc test running
(FZ) How do I fetch information for a misbehaving VM?
$ fifoadm vms get -j <uuid>
{
"hypervisor": "00-15-17-b8-16-fc",
"state": "installing_dataset"
}
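The two commands above combine naturally: list the VMs, pick out those that are not running, and fetch details for each. This is a sketch that assumes the listing layout shown above (UUID first, State last); vms_not_running is a hypothetical helper name.

```shell
# Read `fifoadm vms list` output on stdin and print "UUID STATE" for
# every VM whose state is not "running".
vms_not_running() {
    awk '$1 ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/ && $NF != "running" { print $1, $NF }'
}

# Usage in the FiFo zone:
#   fifoadm vms list | vms_not_running | while read -r uuid state; do
#       echo "== $uuid ($state)"
#       fifoadm vms get -j "$uuid"
#   done
```

Stopped VMs show up too, so expect some entries on a healthy system.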
Additional fifoadm commands
fifoadm has many more subcommands that can help, depending on where the investigation leads.
Other errors
DNS
When using a custom DNS server or hosts files instead of xip.io, many people get the configuration of their systems wrong. These problems are easy to detect by looking for nxdomain errors in the log files, e.g. grep nxdomain /var/log/chunter/error.log.
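When several log files are involved, counting the matches per file shows quickly where the DNS failures come from. A minimal sketch; nxdomain_counts is a hypothetical helper name, and any log path other than /var/log/chunter/error.log is an assumption.

```shell
# For each readable file given as an argument, print "<count> <file>"
# where <count> is the number of lines mentioning nxdomain.
nxdomain_counts() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        printf '%s %s\n' "$(grep -c nxdomain "$f")" "$f"
    done
}

# Usage in the Global Zone:
#   nxdomain_counts /var/log/chunter/error.log
# Usage in the FiFo zone (assumes plain-text files in the log directories):
#   nxdomain_counts /data/*/log/*
```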
To verify the DNS configuration, the tool dig can be used in both the FiFo zone and the global zone.
$ dig 10.0.0.100.xip.io
; <<>> DiG 9.8.3-P1 <<>> 10.0.0.100.xip.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47415
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;10.0.0.100.xip.io. IN A
;; ANSWER SECTION:
10.0.0.100.xip.io. 300 IN A 10.0.0.100
;; Query time: 279 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Thu Jul 7 01:44:49 2016
;; MSG SIZE rcvd: 51
$ dig all-subdomains.10.0.0.100.xip.io
; <<>> DiG 9.8.3-P1 <<>> all-subdomains.10.0.0.100.xip.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32813
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;all-subdomains.10.0.0.100.xip.io. IN A
;; ANSWER SECTION:
all-subdomains.10.0.0.100.xip.io. 300 IN A 10.0.0.100
;; Query time: 147 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Thu Jul 7 01:45:20 2016
;; MSG SIZE rcvd: 66
To clarify, all-subdomains in this example means that any valid DNS hostname put in its place needs to resolve to the same IP.
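Both lookups can be scripted with dig +short, which prints only the answer records. The sketch below checks the base name and one subdomain against the expected IP; resolves_to and check_wildcard_dns are hypothetical helper names, and the literal all-subdomains label is just the placeholder used above.

```shell
# Succeed if any line on stdin is exactly the expected IP address.
resolves_to() {
    grep -Fx -- "$1" >/dev/null
}

# Check that <ip>.<domain> and a subdomain of it both resolve to <ip>.
# Prints one "ok" or "FAIL" line per name; returns non-zero on any failure.
check_wildcard_dns() {
    ip=$1
    domain=$2
    status=0
    for host in "$ip.$domain" "all-subdomains.$ip.$domain"; do
        if dig +short "$host" | resolves_to "$ip"; then
            printf 'ok %s\n' "$host"
        else
            printf 'FAIL %s\n' "$host"
            status=1
        fi
    done
    return "$status"
}

# Usage in the FiFo zone and in the Global Zone:
#   check_wildcard_dns 10.0.0.100 xip.io
```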
Reporting an issue
JIRA is the best place to file a report. When you do, it is often helpful to attach some logs. They can be found in /data/{sniffle,snarl,howl}/log in the FiFo Zone and in /var/log/chunter in the GZ.
In addition, the built-in command fifoadm diag prepares all your log files and puts them in a /var/tmp/[$fifoservice]-diag directory. This simplifies log collection and submission, and we encourage you to always attach logs when filing bug reports.
In the global zone, run /opt/chunter/share/chunter-diag to grab diagnostics data.