Sysdig: A Linux Diagnostics Tool

Original post date: May 13, 2014

We’ve mentioned sysdig several times before, but we haven’t published anything on our English-language blog about the program itself. Today, we’ll be pulling an article out of our archives and looking at our original review of sysdig.

Linux systems use a myriad of utilities to collect and analyze data. Each component requires its own specialized tools for diagnosing errors. The most widespread of these diagnostic utilities and their applications are shown in the following graphic:

We recently learned about Sysdig, a utility developed by Draios. It collects information on absolutely everything:

  • incoming network connections and related processes
  • I/O-intensive files
  • process-related traffic
  • files and directories accessed by users
  • system calls, files, and network connections that return errors

Sysdig positions itself as a tool for facilitating the work of system administrators. After reading an article about it on the developer’s site, we decided to test it out.

DTrace, SystemTap, and Sysdig

Sysdig is far from the first attempt to create a tool with extensive data collecting capabilities on a Linux system.
Among tools with similar functionality, we should first mention DTrace, a dynamic tracing framework developed by Sun Microsystems. It monitors the amount of memory, processor time, and network resources used by a system’s active processes.

DTrace runs scripts written in D (a language similar to C, but which includes specialized functions and modifications for tracers). Scripts include a list of probes, which are responsible for specific actions. Probes activate when a given condition is met (for example, when opening a file or starting a process), and then execute a corresponding action. Information can be transferred from one probe to another.

DTrace is a powerful but complicated tool that requires extensive technical knowledge on the part of the user. Writing and debugging D scripts is also tedious and time-consuming, especially for those who aren’t skilled programmers.

SystemTap is very similar to DTrace in terms of its underlying principles and available functions. It consists of a command-line interface and a scripting language. It monitors system events and, when an event occurs, runs the handler assigned to it.

An event can be, for example, the beginning or end of a SystemTap session or the expiry of a timer. A handler is a sequence of script statements that run when its event occurs. Handlers usually extract information from the event’s context or print it to the console.

A major downside to SystemTap is the terribly complicated syntax of its scripting language. Writing and debugging scripts demands a hefty amount of the user’s time and energy.

In contrast to the aforementioned tools, Sysdig has a fairly different structure. In terms of architecture, it more closely resembles programs like libpcap, tcpdump, and Wireshark. A special driver, sysdig-probe, captures events at the kernel level through the kernel’s tracepoints facility, which invokes a handler for each event. Handlers store information on the event in a shared buffer. This information can then be displayed on the screen or saved to a file.
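
The flow described above (handlers write events into a shared buffer, and a reader picks them up later) can be pictured with a toy model. This is only an illustration in Python, not sysdig’s driver code: the class name, the capacity, and the policy of dropping new events when the buffer is full are all assumptions made for the sketch.

```python
from collections import deque


class EventBuffer:
    """Toy stand-in for the shared buffer between the capture side and the reader."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0

    def push(self, event):
        # Producer side: called once per captured event.
        # Assumption for this sketch: a full buffer drops the new event.
        if len(self.events) >= self.capacity:
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def drain(self):
        # Consumer side: hand over everything collected so far, oldest first.
        while self.events:
            yield self.events.popleft()


buf = EventBuffer(capacity=2)
for name in ("open", "read", "close"):
    buf.push({"type": name})

consumed = [event["type"] for event in buf.drain()]
print(consumed, "dropped:", buf.dropped)
```

The point of the model is that the capture side only appends to a buffer, while formatting and display happen on the reader’s side.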

Because of this architecture, sysdig has little impact on system performance. Detailed information on system events can be retrieved using simple commands. Several operations can even be executed with the use of pre-made scripts written in Lua (we’ll discuss this in more detail below).

Installation

Sysdig is not currently included in official repositories. To launch Sysdig’s automatic installation, we run the following command:

curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash

For information on manually installing sysdig on different Linux distributions, see the official documentation.

Update: Sysdig is now included in the official Debian and Ubuntu repositories, but as the program is constantly being updated, we recommend following the standard installation instructions to ensure you have the latest version.

Initial Use

Once sysdig has been installed, we run the following command:

# sysdig

All of the active system events will be displayed in the standard output:

63889 15:25:12.908695644 3 notify-osd (7209) > poll fds=3:u5 timeout=4294967295 
63890 15:25:12.908698249 3 notify-osd (7209)  writev fd=3(<u>) size=4 
63893 15:25:12.908704065 2 gnome-terminal (18260) > lseek fd=24(/tmp/vteIVHGFX (deleted)) offset=0 whence=2(SEEK_END) 
63894 15:25:12.908704595 2 gnome-terminal (18260)  lseek fd=24(/tmp/vteIVHGFX (deleted)) offset=0 whence=2(SEEK_END) 
63896 15:25:12.908709655 2 gnome-terminal (18260)  write fd=24(/tmp/vteIVHGFX (deleted)) size=80 
63899 15:25:12.908710722 3 notify-osd (7209) > writev res=4 data=+... 
63900 15:25:12.908713828 3 notify-osd (7209) < poll fds=3:u1 timeout=4294967295 
63901 15:25:12.908714531 2 gnome-terminal (18260) < write res=80 data=1275 15:25:12.596942000 1 rs:main (941) < open fd=-2(ENOENT) name=/dev/xconsole

Each line contains information on one event. This is reflected in the following format:

%evt.num %evt.time %evt.cpu %proc.name (%thread.tid) %evt.dir %evt.type %evt.args

The printout consists of the following fields:

  • evt.num: event number
  • evt.time: time of the event
  • evt.cpu: number of the CPU where the event was captured
  • proc.name: process name
  • thread.tid: thread ID (for single-threaded processes, this matches the process ID)
  • evt.dir: direction of the event (> for enter events, when a call begins; < for exit events, when it returns)
  • evt.type: event type
  • evt.args: event arguments
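
To make the field layout concrete, here is a minimal Python sketch that splits one line of the default printout into the fields listed above. The regular expression and the function name parse_event are our own illustration and not part of sysdig; the sample line is taken from the output shown earlier.

```python
import re

# One group per field of the default format:
# %evt.num %evt.time %evt.cpu %proc.name (%thread.tid) %evt.dir %evt.type %evt.args
EVENT_RE = re.compile(
    r"^(?P<num>\d+) (?P<time>\S+) (?P<cpu>\d+) (?P<proc>.+?) \((?P<tid>\d+)\) "
    r"(?P<dir>[<>]) (?P<type>\S+) ?(?P<args>.*)$"
)


def parse_event(line):
    """Return the fields of one default-format line as a dict, or None if it does not match."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("num", "cpu", "tid"):
        fields[key] = int(fields[key])
    return fields


sample = "63889 15:25:12.908695644 3 notify-osd (7209) > poll fds=3:u5 timeout=4294967295"
event = parse_event(sample)
print(event["proc"], event["tid"], event["dir"], event["type"], event["args"])
```

Lines that lack a direction marker are not matched and yield None, so a caller can skip them.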

Saving Information to Files

Event information that sysdig gathers can be saved to separate files. This is done using a command like:

# sysdig -w myfile.scap

If you don’t need to write information about all system events to a file, but only a limited number of them (let’s say 100 events), use the -n option:

# sysdig -n 100 -w myfile.scap

You can print out information that was previously saved to a file using the -r option:

# sysdig -r myfile.scap

Sysdig also saves a snapshot of the operating system’s state (running processes, open files, users, etc.) to each file.
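
When captures are started from a script rather than typed by hand, the same options can be assembled programmatically. The helper below is a sketch: the function name sysdig_argv and its parameters are hypothetical, and it only builds the argument list for the -n, -w, and -r options covered above without running anything.

```python
import shlex


def sysdig_argv(write_to=None, read_from=None, count=None):
    """Assemble a sysdig command line from the -n, -w and -r options."""
    argv = ["sysdig"]
    if count is not None:
        argv += ["-n", str(count)]
    if write_to is not None:
        argv += ["-w", write_to]
    if read_from is not None:
        argv += ["-r", read_from]
    return argv


capture = sysdig_argv(write_to="myfile.scap", count=100)
replay = sysdig_argv(read_from="myfile.scap")
print(shlex.join(capture))
print(shlex.join(replay))
```

The resulting lists can be handed to subprocess.run(capture, check=True) on a machine where sysdig is installed and the script has the necessary privileges.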

Filters

As we saw in the previous examples, sysdig writes all event information to standard output. We can make it so that the console displays only the information we need. This is done using filters.

Filters are given at the end of the command line (like in tcpdump). They can be applied both to a live capture and to a previously saved trace file. We’ll try to trace the work of the cat command:

# sysdig proc.name = cat 

21368 13:10:15.384878134 1 cat (8298) < execve res=0 exe=cat args=index.html. tid=8298(cat) pid=8298(cat) ptid=1978(bash) cwd=/root fdlimit=1024
21371 13:10:15.384948635 1 cat (8298) > brk size=0
21372 13:10:15.384949909 1 cat (8298) < brk res=10665984
21373 13:10:15.384976208 1 cat (8298) > mmap
21374 13:10:15.384979452 1 cat (8298) < mmap
21375 13:10:15.384990980 1 cat (8298) > access
21376 13:10:15.384999211 1 cat (8298) < access
21377 13:10:15.385008602 1 cat (8298) > open
21378 13:10:15.385014374 1 cat (8298) < open fd=3(/etc/ld.so.cache) name=/etc/ld.so.cache flags=0(O_NONE) mode=0
21379 13:10:15.385015508 1 cat (8298) > fstat fd=3(/etc/ld.so.cache)
21380 13:10:15.385016588 1 cat (8298) < fstat res=0
21381 13:10:15.385017033 1 cat (8298) > mmap
21382 13:10:15.385019763 1 cat (8298) < mmap
21383 13:10:15.385020047 1 cat (8298) > close fd=3(/etc/ld.so.cache)
21384 13:10:15.385020556 1 cat (8298) < close res=0

Filters can be combined. This is done using standard comparison operators (=, !=, <, <=, >, >=, contains), boolean operators (or, and, not), and parentheses.

We’ll enter the following command:

# sysdig proc.name = cat or proc.name = vi

This will trace all of the activities of cat and vi:

56239 12:14:01.449463618 0 BrowserBlocking (2587) > open 
56240 12:14:01.449467018 0 BrowserBlocking (2587) < open fd=142(/proc/16213/statm) name=/proc/16213/statm flags=1(O_RDONLY) mode=0 
63158 12:14:01.493237287 3 gnome-terminal (3910) > open 
63177 12:14:01.493281181 3 gnome-terminal (3910) < open fd=18(/tmp/vteHGSYFX) name=/tmp/vteHGSYFX flags=39(O_EXCL|O_CREAT|O_RDWR) mode=0 
63200 12:14:01.493309748 3 gnome-terminal (3910) > open 
63205 12:14:01.493319526 3 gnome-terminal (3910) < open fd=18(/tmp/vteHESYFX) name=/tmp/vteHESYFX flags=39(O_EXCL|O_CREAT|O_RDWR) mode=0

The command

# sysdig proc.name!=cat and evt.type=open

will print information about the open events for all processes except cat:

2111 12:15:47.656367409 1 rs:main (914) > open 
2112 12:15:47.656368926 1 rs:main (914)  open 
2114 12:15:47.656371170 1 rs:main (914)  open 
2116 12:15:47.656374373 1 rs:main (914)  open 
2118 12:15:47.656376563 1 rs:main (914)  open 
2120 12:15:47.656378615 1 rs:main (914)  open

The full list of fields available in filters can be viewed using the command

# sysdig -l

(further explanation and commentary can be found here).

Using filters, we can easily retrieve critical information. For example, we can view information on incoming network connections received by all processes except apache using a simple command:

# sysdig evt.type=accept and proc.name!=apache
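
Filter expressions like the ones above can also be assembled in code, which helps when they grow beyond two or three conditions. The following Python sketch uses helper names of our own choosing (cond, all_of, any_of, negate); it only produces filter strings and wraps every sub-expression in parentheses so that operator precedence is never in question.

```python
def cond(field, op, value):
    """One comparison, e.g. cond("proc.name", "!=", "apache") gives "proc.name!=apache"."""
    if op == "contains":
        return f"{field} contains {value}"
    return f"{field}{op}{value}"


def all_of(*parts):
    """Join sub-expressions with 'and', wrapping each one in parentheses."""
    return " and ".join(f"({part})" for part in parts)


def any_of(*parts):
    """Join sub-expressions with 'or', wrapping each one in parentheses."""
    return " or ".join(f"({part})" for part in parts)


def negate(part):
    """Negate a sub-expression."""
    return f"not ({part})"


readers = any_of(cond("proc.name", "=", "cat"), cond("proc.name", "=", "vi"))
expr = all_of(cond("evt.type", "=", "open"), readers)
print(expr)
```

Because the result contains parentheses, it should be passed to sysdig as a single quoted argument when typed in a shell.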

As shown above, each line of the sysdig printout ends with the event’s arguments (evt.args). In filters, individual arguments are accessed through the evt.arg and evt.rawarg fields, which deserve a separate look. Every event registered by sysdig has a specific type (such as open, read, etc.) and a specific set of parameters (fd, name, etc.), which are encoded in a particular way. We’re not going to break down all of this (anyone interested can look at the official documentation); instead, we’ll focus on how these arguments can be used when creating filters.

Let’s look at the following command:

# sysdig evt.type=execve and evt.arg.ptid=bash

This displays a list of processes launched by interactive users in the console. This filter catches ‘execve’ system calls (which are used for running programs) only if the parent process is bash.

The difference between evt.arg and evt.rawarg is that the latter doesn’t decode process IDs, error codes, etc., leaving all arguments in raw numerical form.
For example, we can view a list of all the events that returned errors with the following command:

# sysdig "evt.rawarg.res<0 or evt.rawarg.fd<0"

257727 15:57:35.398754060 3 chrome (17326) < futex res=-110(ETIMEDOUT) 
257737 15:57:35.399218996 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL 
257749 15:57:35.399362914 1 Xorg (1153) < read res=-11(EAGAIN) data= 
257834 15:57:35.401067094 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL 
257836 15:57:35.401106092 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL 
257849 15:57:35.402594284 2 chrome (4446) < futex res=-110(ETIMEDOUT) 
257882 15:57:35.407348870 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL 
257884 15:57:35.407358705 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL 
257888 15:57:35.407373908 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL 
257922 15:57:35.407757377 1 Xorg (1153) < read res=-11(EAGAIN) data=
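
The printout shows both forms of an argument side by side: in res=-11(EAGAIN), the number is the raw value and the name in parentheses is the decoded one. The Python sketch below separates the two and tallies errors by name; the regular expression and the helper split_arg are our own illustration, and the sample lines are copied from the output above.

```python
import re
from collections import Counter

# Matches "name=raw(decoded)" or "name=raw", e.g. "res=-11(EAGAIN)".
ARG_RE = re.compile(r"(?P<name>\w+)=(?P<raw>-?\d+)(?:\((?P<decoded>[^)]*)\))?")


def split_arg(text, name):
    """Return (raw_int, decoded_or_None) for the numeric argument `name`, or None if absent."""
    for match in ARG_RE.finditer(text):
        if match.group("name") == name:
            return int(match.group("raw")), match.group("decoded")
    return None


lines = [
    "257727 15:57:35.398754060 3 chrome (17326) < futex res=-110(ETIMEDOUT)",
    "257737 15:57:35.399218996 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL",
    "257749 15:57:35.399362914 1 Xorg (1153) < read res=-11(EAGAIN) data=",
]

errors = Counter()
for line in lines:
    parsed = split_arg(line, "res")
    if parsed is not None and parsed[0] < 0:  # same test as evt.rawarg.res<0
        errors[parsed[1]] += 1

print(errors.most_common())
```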

The full list of events and parameters supported by filters can be viewed using the following command:

# sysdig -L

Formatting Printouts

We can customize the format of a sysdig printout using the -p option and indicating the desired output fields:

# sysdig -p"user:%user.name dir:%evt.arg.path" evt.type=chdir

user:ubuntu dir:/root
user:ubuntu dir:/root/tmp
user:ubuntu dir:/root/Download

Entering the command above collects information on ‘chdir’ system calls (these occur each time the cd command is executed) and displays the name of the user executing the cd command and the directory they switch to in the console.

The -p option uses the following syntax:

  • the percent sign (%) is placed before the name of each field
  • any text can be added to the line (similar to the printf function in C)
  • by default, a line is only printed to the console when all of the fields named after the -p option are present in the event. An asterisk (*) at the beginning of the format string means incomplete lines are printed as well; missing fields will be shown as N/A.
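
The rules in this list can be mimicked in a few lines of Python. This is a sketch of the behaviour as described here, not sysdig’s own formatter: the function render, the dictionary representation of an event, and the N/A placeholder (taken from the description above) are assumptions made for the example.

```python
import re

# A field reference is '%' followed by a dotted name such as evt.arg.name.
FIELD_RE = re.compile(r"%([a-z]+(?:\.[a-z_]+)+)")


def render(fmt, event, missing="N/A"):
    """Render `fmt` for one event, given as a dict of field name to value.

    Returns None when a referenced field is absent, unless `fmt` starts
    with '*', in which case the absent field is replaced by `missing`.
    """
    lenient = fmt.startswith("*")
    if lenient:
        fmt = fmt[1:]
    absent = False

    def substitute(match):
        nonlocal absent
        value = event.get(match.group(1))
        if value is None:
            absent = True
            return missing
        return str(value)

    line = FIELD_RE.sub(substitute, fmt)
    if absent and not lenient:
        return None
    return line


fmt = "%evt.type %evt.dir %evt.arg.name"
enter = {"evt.type": "open", "evt.dir": ">"}
exit_ = {"evt.type": "open", "evt.dir": "<", "evt.arg.name": "/dev/urandom"}

print(render(fmt, enter))
print(render(fmt, exit_))
print(render("*" + fmt, enter))
```

The first call returns None because the event has no name argument, while the starred format prints the same event with a placeholder.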

By entering the command

# sysdig -p"%evt.type %evt.dir %evt.arg.name" evt.type=open

we get a printout of information on open exit events (marked <):

open < /proc/23533/task/23533/stat
open < /proc/23533/task/23535/stat
open < /proc/23533/task/23536/stat
open < /proc/23533/task/23539/stat
open < /proc/23533/task/23540/stat
open < /proc/23533/task/23541/stat
open < /proc/23533/task/23542/stat
open < /proc/23533/task/23543/stat
open < /proc/23533/task/23544/stat

Enter events (marked >) don’t have a name argument, which is why there is no information on them in the output.

If we enter the command

# sysdig -p "*%evt.type %evt.dir %evt.arg.name" evt.type=open

then the printout will also include enter events (marked >):

open < /proc/22832/task/22838/stat
open > 
open < /proc/22832/task/22839/stat
open > 
open < /proc/22832/task/22840/stat
open > 
open < /proc/22832/task/22841/stat
open > 
open < /proc/22832/task/22842/stat
open > 
open < /proc/22832/task/22843/stat
open > 
open < /dev/urandom

Chisels

Sysdig uses small scripts, written in Lua, for analyzing event lists. Developers refer to these as chisels.
A list of available chisels can be displayed in the console using the command:

# sysdig -cl

To view a specific chisel’s description and a list of arguments it can use, we use the -i option:

# sysdig -i fileslower

Category: Performance
---------------------
fileslower Trace slow file I/O
Use the -i flag to get detailed information about a specific chisel
Trace file I/O slower than a threshold, or all file I/O

Args:
[int] min_ms — minimum millisecond threshold for showing file I/O

Chisels are launched using the -c option. We’ll launch the chisel topfiles_bytes (this displays the files with the most bytes read and written on the local machine):

# sysdig -c topfiles_bytes

Bytes     Filename  
------------------------------
3.21KB    /dev/input/event4
2.93KB    /tmp/vte7IZWFX (deleted)
864B      /dev/urandom
800B      /tmp/vteL7ZWFX (deleted)
498B      /dev/ptmx
224B      /dev/dri/card0
219B      /proc/16213/task/16221/stat
217B      /proc/16213/task/16229/stat
217B      /proc/16213/task/16219/stat
215B      /proc/16213/task/16225/sta

Filters can be used with chisels. If we aren’t interested in files in the /dev directory, we apply the following filter:

# sysdig -c topfiles_bytes "not fd.name contains /dev"
Bytes     Filename  
------------------------------
1.90KB    /tmp/vte7IZWFX (deleted)
438B      /proc/16139/task/16145/stat
438B      /proc/16139/task/16141/stat
434B      /proc/16139/task/16150/stat
430B      /proc/16139/task/16146/stat
430B      /proc/16139/task/16147/stat
430B      /proc/16139/task/16149/stat
430B      /proc/16139/task/16148/stat
428B      /proc/16139/task/16139/stat
420B      /proc/16139/task/16142/stat

With filters, we can also view information on files accessed in a specific directory:

# sysdig -c topfiles_bytes "fd.name contains /var/log/" 

Bytes     Filename
------------------------------
596B      /var/log/kern.log
596B      /var/log/syslog
596B      /var/log/messages
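
Conceptually, a chisel like this sums a value per key over the event stream and prints the largest totals, and a filter simply narrows the events that reach it. The Python sketch below illustrates that idea only; the real chisel is a Lua script, and the function name, the keep predicate, and the sample events are hypothetical.

```python
from collections import Counter


def top_files_by_bytes(io_events, keep=None, limit=10):
    """Sum the bytes of each (filename, byte_count) pair, largest totals first.

    `keep` is an optional predicate on the file name, playing the role of a filter.
    """
    totals = Counter()
    for filename, nbytes in io_events:
        if keep is None or keep(filename):
            totals[filename] += nbytes
    return totals.most_common(limit)


# Hypothetical I/O events: (file name, bytes transferred by one read or write).
io_events = [
    ("/dev/urandom", 16),
    ("/dev/ptmx", 80),
    ("/dev/urandom", 32),
    ("/var/log/syslog", 596),
    ("/dev/ptmx", 1),
]

for filename, total in top_files_by_bytes(io_events, limit=3):
    print(f"{total:<10}{filename}")

# Analogous to the filter "not fd.name contains /dev".
print(top_files_by_bytes(io_events, keep=lambda name: "/dev" not in name))
```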

Another filter lets us see which files a particular process accessed:

# sysdig -c topfiles_bytes "proc.name=vi"

We can also see which files a user accessed:

# sysdig -c topfiles_bytes "user.name=username"

Bytes     Filename  
------------------------------
1.90KB    /tmp/vte7IZWFX (deleted)
576B      /dev/urandom
384B      /tmp/vteL7ZWFX (deleted)
355B      /dev/ptmx

We can launch multiple chisels simultaneously:

# sysdig -c stdin -c stdout proc.name=cat

As we’ve already noted, chisels are written in Lua, so existing chisels can be edited and new ones can be written fairly easily.

A manual on writing scripts can be found here.

Practical Examples

Let’s look at a few examples of the standard diagnostic procedures we can perform with sysdig.

Network

To view a list of connections not served by Apache:

# sysdig -p "%proc.name %fd.name" "evt.type=accept and proc.name!=httpd"

To see what data has been exchanged with the host 192.168.0.1:
in binary:

# sysdig -s2000 -X -c echo_fds fd.cip=192.168.0.1

in ASCII:

# sysdig -s2000 -A -c echo_fds fd.cip=192.168.0.1

To retrieve information on the processes consuming the most bandwidth:

# sysdig -c topprocs_net
Bytes     Process   
------------------------------
885B      avahi daemon
6.44KB    Chrome

To view statistics on server ports:
on the number of established connections:

# sysdig -c fdcount_by fd.sport "evt.type=accept"

on the amount of information sent in bytes:

# sysdig -c fdbytes_by fd.sport

To view information on client IPs:
on the number of established connections:

# sysdig -c fdcount_by fd.cip "evt.type=accept"

on the amount of information sent in bytes:

# sysdig -c fdbytes_by fd.cip

Bytes     fd.cip    
------------------------------
375B      192.168.40.99
250B      192.168.40.255
226B      192.168.40.101
133B      192.168.30.88
125B      255.255.255.255

To view information on requests sent by Apache to external MySQL servers:

# sysdig -A -c echo_fds fd.sip=192.168.30.5 and proc.name=apache2 and evt.buffer contains SELECT

Disk Subsystem

To view the processes that generate the most disk I/O:

# sysdig -c topprocs_file
Bytes     Process   
------------------------------
12.61KB   BrowserBlocking
3.89KB    Xorg
3.79KB    Chrome_IOThread
3.09KB    gnome-terminal

To view the number of files used by each process:

# sysdig -c fdcount_by proc.name "fd.type=file"

BrowserBlocking	365
Chrome_IOThread	44
irqbalance	12
upowerd	7
dropbox	5
Xorg	3
alsa-sink	2
rs:main	2
compiz	1
rsyslogd	1
gnome-terminal	1

To view the files with the most bytes read and written:

# sysdig -c topfiles_bytes

Bytes     Filename  
------------------------------
5.41KB    /dev/input/event4
1.90KB    /tmp/vteHGSYFX (deleted)
576B      /dev/urandom
554B      /dev/ptmx
384B      /tmp/vteHESYFX (deleted)
219B      /proc/16139/task/16145/stat
219B      /proc/15857/task/15865/stat
219B      /proc/16139/task/16141/sta

To view a list of files that Apache runs the most read/write operations for:

# sysdig -c topfiles_bytes proc.name=httpd

To trace file opens in real time:

# sysdig -p "%12user.name %6proc.pid %12proc.name %3fd.num %fd.typechar %fd.name" evt.type=open

root         1143   irqbalance   3   f /proc/interrupts
root         1143   irqbalance   3   f /proc/stat
root         1143   irqbalance   3   f /proc/irq/42/smp_affinity
root         1143   irqbalance   3   f /proc/irq/41/smp_affinity
root         1143   irqbalance   3   f /proc/irq/16/smp_affinity
root         1143   irqbalance   3   f /proc/irq/43/smp_affinity
root         1143   irqbalance   3   f /proc/irq/17/smp_affinity
root         1143   irqbalance   3   f /proc/irq/23/smp_affinity
root         1143   irqbalance   3   f /proc/irq/40/smp_affinity
root         1143   irqbalance   3   f /proc/irq/10/smp_affinity
root         1143   irqbalance   3   f /proc/irq/18/smp_affinity

Processor Usage

To view statistics on processor usage:

# sysdig -c topprocs_cpu

CPU%      Process
------------------------------
0.31%     sysdig
0.09%     sshd
0.03%     mysqld
0.01%     nginx
0.01%     php5-fpm

To view CPU0 statistics:

# sysdig -c topprocs_cpu evt.cpu=0

To view the standard output for a process:

# sysdig -s4096 -A -c stdout proc.name=cat

Performance and Errors

To view information on httpd file open errors:

# sysdig "proc.name=httpd and evt.type=open and evt.failed=true"

To view statistics on the most time-consuming files:

# sysdig -c topfiles_time 

Time      Filename  
------------------------------
403us     /dev/urandom
267us     /dev/input/event4
84us      /dev/dri/card0
63us      /tmp/vte7IZWFX (deleted)
34us      /tmp/vteL7ZWFX (deleted)
20us      /proc/3467/task/3467/stat
13us      /dev/ptmx
11us      /proc/16010/task/16010/st

To view information on the files Apache spends the most time on:

# sysdig -c topfiles_time proc.name=httpd

To view information on processes in terms of I/O errors:

# sysdig -c topprocs_errors

------------------------------
2363      notify-osd
1327      Xorg
688       compiz
349       chrome
82        pulseaudio
76        gtk-window-deco
62        gnome-terminal
50        alsa-sink
30        Chrome_ChildIOT
20        gnome-screensav
20        nautilus
14        Chrome_IOThread
10        syndaemon
10        gnome-settings-
7         soffice.bin
6         nm-applet
6         dbus-daemon
4         AudioThread
3         pidgin
2         NetworkManager
2         mission-control
1         gdbus

To view information on files in terms of I/O errors:

# sysdig -c topfiles_errors

#Errors   Filename  
------------------------------
43        /dev/input/event4
2         /dev/ptmx

To view information on system calls that return errors:

# sysdig -c topscalls "evt.failed=true"

# Calls   System Call
------------------------------
384       recvfrom
273       futex
169       read
133       sendto
41        select
3         recvmsg

To trace file open errors as they occur:

# sysdig -p "%12user.name %6proc.pid %12proc.name %3fd.num %fd.typechar %fd.name" evt.type=open and evt.failed=true

root         1607   upowerd      -1  f /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0e/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_now
root         1607   upowerd      -1  f /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0e/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_avg
root         1607   upowerd      -1  f /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0e/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/voltage_max_design
root         1607   upowerd      -1  f /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0e/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/power_now

To display a list of I/O operations with a latency above 1 millisecond:

# sysdig -c fileslower 1

TIME                    PROCESS      TYPE     LAT(ms) FILE
2014-05-13 12:46:57.190 rsyslogd     read        3524 /proc/kmsg
2014-05-13 12:46:57.197 rsyslogd     read           7 /proc/kmsg
2014-05-13 12:46:57.205 rsyslogd     read           7 /proc/kmsg
2014-05-13 12:46:57.209 rsyslogd     read           4 /proc/kmsg
2014-05-13 12:46:57.221 rsyslogd     read          11 /proc/kmsg
2014-05-13 12:46:57.225 rsyslogd     read           3 /proc/kmsg
2014-05-13 12:46:57.233 rsyslogd     read           7 /proc/kmsg
2014-05-13 12:46:57.241 rsyslogd     read           7 /proc/kmsg
2014-05-13 12:46:58.362 upowerd      read         220 /sys/devices/LNXSYSTM:00/LN
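
The latency of a call is the time between its enter event and its exit event. The Python sketch below shows one way to compute it from event timestamps and apply a threshold. It is not the chisel’s actual implementation: the function names are our own, and the sample tuples are taken from the cat trace shown in the Filters section.

```python
def to_ns(timestamp):
    """Convert 'HH:MM:SS.nnnnnnnnn' to nanoseconds since midnight."""
    clock, frac = timestamp.split(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    whole = ((hours * 60 + minutes) * 60 + seconds) * 1_000_000_000
    return whole + int(frac.ljust(9, "0"))


def slow_calls(events, min_ns):
    """Pair each enter event ('>') with the next exit event ('<') of the same
    thread and event type; yield (tid, type, latency_ns) when latency >= min_ns."""
    pending = {}
    for time, tid, direction, evt_type in events:
        key = (tid, evt_type)
        if direction == ">":
            pending[key] = to_ns(time)
        elif key in pending:
            latency = to_ns(time) - pending.pop(key)
            if latency >= min_ns:
                yield tid, evt_type, latency


# (evt.time, thread.tid, evt.dir, evt.type) for two calls made by cat.
events = [
    ("13:10:15.385008602", 8298, ">", "open"),
    ("13:10:15.385014374", 8298, "<", "open"),
    ("13:10:15.385015508", 8298, ">", "fstat"),
    ("13:10:15.385016588", 8298, "<", "fstat"),
]

print(list(slow_calls(events, min_ns=5000)))
```

With a threshold of 5000 ns only the open call is reported; the fstat call completes faster and is skipped.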

Security

To view information on directories visited by the root user:

# sysdig -p "%evt.arg.path" "evt.type=chdir and user.name=root"

To trace SSH activity:

# sysdig -A -c echo_fds fd.name=/dev/ptmx and proc.name=sshd

To display all file open events from the /etc directory:

# sysdig evt.type=open and fd.name contains /etc
97367 12:50:02.164137993 0 unity-panel-ser (2193) < open fd=13(/etc/timezone) name=/etc/timezone flags=1(O_RDONLY) mode=0 
97385 12:50:02.164419642 0 unity-panel-ser (2193) < open fd=13(/etc/localtime) name=/etc/localtime flags=1(O_RDONLY) mode=0 
97405 12:50:02.164642935 0 unity-panel-ser (2193) < open fd=13(/etc/localtime) name=/etc/localtime flags=1(O_RDONLY) mode=0

Conclusions

Sysdig is still a fairly young project. Among its undeniable advantages, we should name the simplicity of its commands. In many cases, the information sysdig returns on system events is more detailed than what DTrace or SystemTap provide, and it is presented in a more user-friendly format. Another big plus is that system activity can be analyzed after the data has been collected, and not only at the moment errors occur or in emergency situations.

Sysdig, without a doubt, has a lot of potential. We hope the project gets the fine-tuning it needs and the recognition it deserves among the other Linux system diagnostic tools.