( NetBSD 8.99.38 or newer required )
( these tools are experimental and can kill a traced process )
I wrote an experimental version of NetBSD truss(1), modeled after
FreeBSD truss(1).
https://github.com/krytarowski/picotrace
It was started as a fork of the FreeBSD one, but quickly became a
full-rewrite as the decoding part was specific to the FreeBSD kernel and
the ptrace(2) usage was FreeBSD-only as well. There was not much for
code sharing.
I've picked picotrace, forked and extended to mimic FreeBSD truss(1).
NetBSD truss(1) indents to implement all the features of FreeBSD
truss(1) and extend it whenever useful with additional features. At this
point of time all the features should be implemented and committed to
the tree, but not all of them are reliable. I've extended the tool to
handle SIGINFO for the counting mode (-c) and printing AUXV for tracing
& printing events mode (-x).
Example output:
1. ./truss -o /tmp/build.sh-2019-05-24.txt -axe -f -H -d -D
/usr/src/build.sh
http://netbsd.org/~kamil/truss/build.sh-2019-05-24.txt.bz2
$ bzcat build.sh-2019-05-24.txt.bz2 |head
9267 1 0.000715337 0.000715337 mmap(0x0, 32768,
0|PROT_READ|PROT_WRITE, 0|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)= 0xf7ef7000
9267 1 0.000897229 0.000120463 open("/etc/ld.so.conf", O_RDONLY)
= 0 Err#2 ENOENT
9267 1 0.000980901 0.000081141 open("/usr/local/lib/libc.so.12",
O_RDONLY) = 0 Err#2 ENOENT
9267 1 0.001057840 0.000074579 open("/usr/lib/libc.so.12",
O_RDONLY) = 0x3
9267 1 0.001135284 0.000075166 __fstat50(3, { dev=0,
mode="-r--r--r-- ", ino=12401082, nlink=1, uid=0, gid=0, rdev=-1,
atime=1558660527.138286790, mtime=1558420316.000000000,
ctime=1558425703.025156636, blksize=32768, blocks=4544})= 0
9267 1 0.001161701 0.000024142 mmap(0x0, 4096, 0|PROT_READ,
0|MAP_SHARED, 3, 0) = 0xf7ef6000
9267 1 0.001192443 0.000028834 munmap(0x7f7ff7ef6000, 4096)
= 0
9267 1 0.001215649 0.000021266 mmap(0x0, 6373376,
0|PROT_EXEC|PROT_READ, 0|MAP_PRIVATE|MAP_ALIGNED(0x15), 3, 0)= 0xf7400000
9267 1 0.001244366 0.000026852 mmap(0x7f7ff77df000, 69632,
0|PROT_READ|PROT_WRITE, 0|MAP_PRIVATE|MAP_FIXED, 3, 1961984)= 0xf77df000
9267 1 0.001271844 0.000025604 mmap(0x7f7ff77f0000, 2244608,
0|PROT_READ|PROT_WRITE, 0|MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0)=
0xf77f0000
2. ./truss -c `pgrep firefox`
http://netbsd.org/~kamil/truss/firefox-count-2019-05-24.txt
truss[9968] running for 61.849759335 seconds
truss[9968] attached to child=10769 ('firefox') for 61.848690315 seconds
syscall seconds calls errors missed-sce
missed-scx
read 0.254737948 1947 0 790
716
write 0.811738104 1418 0 643
934
open 0.137601862 134 26 77
54
close 0.005132063 89 0 42
43
unlink 0.047976024 29 0 30
5
getpid 3.165382267 7359 0 514
8200
getuid 0.000451950 7 0 5
41
Observations:
1. We can observe in both examples that there is a performance issue in
build.sh, as something attempts to call 5000 times read(2) for 1 byte only.
I plan to extend NetBSD truss(1) to print program name next to PID number.
2. Firefox and Thunderbird and certainly other similar software calls
excessively gettimeofday() and clock_gettime(). At least around 100k
times per 1 minute, and the program spends around 30sec (cumulative time
from all LWPs in a process) in the kernel space prompting for the
current time.
__gettimeofday50 2.158735930 17989 0 963
4141
__clock_gettime50 21.415106767 73068 0 7484
22300
There is also a bug in the NetBSD kernel that whenever we catch more
than 1 event in the same time, one of them is registered and others
lost. This affects multi-threading tracing. This affects all debuggers
with support debugging of programs with multiple threads (and likely
multiple processes).
I've discussed with Jason Thorpe how to optimize this and it shall be
put into a dedicated thread.
I have noted few other ptrace(2) and kernel behavior issues and I put
them on my TODO list and I will work on fixing them.
The bottlenek
NetBSD coredumper(1)
Some applications mask crash signals and implement homegrown fallback
mechanisms for breakage (vim, firefox, Xorg and whatever tries to be smart).
In the past, I used to use this core(5) dumper for firefox, removing
signal handlers for interesting signals:
http://netbsd.org/~kamil/firefox-dumper/custom_sigaction.c
Now, with coredumper(1), this workaround is no longer needed. With
recent kernel changes crash signals are correctly reported to debuggers
always. regardless of signal masking in processes. I learnt during
BSDCan that people in some other OSs keep removing signal handlers
manually in order to debug bugs..
NAME
coredumper – crash detector and core dumper
SYNOPSIS
coredumper [-c CORENAME] [-i] [-p | command [arg ...]]
Whenever program triggers crash, coredumper(5) catches this and creates
a core file.
I plan to extend this program with functionality to print backtraces
instead of dumping core files that can be too large in some circumstances.
My recent motivation to get this program done was investigating X
crashes, but they are extra tricky as X uses privileged processes that
are harder to debug.
Full example:
http://netbsd.org/~kamil/coredumper/example-2019-05-24.txt
In practice a similar behavior to truss(1) and coredumper(1) could be
achieved with an existing debugger (especially with Python/similar
bindings), however the purpose of these new programs is to perform one
single task in an easier way to setup.
ktruss(1) is similar to truss(1), but uses kernel-based tracing.
truss(1) uses ptrace(2) and is more flexible in terms of prompting the
traced process for whatever information we are interested in a child
process.
NetBSD truss(1) and NetBSD coredumper(1) are intended to be completely
MI (except LP64 check).
For now, I'm switching back to fixing kernel ptrace(2) behavior in order
to make such tools (and consequently debuggers like GDB/LLDB) more
reliable. truss(1) and picotrace(1) exposed some new issues and helped
me to understand why tracing programs with multiple LWPs is still not
reliable.
Attachment:
signature.asc
Description: OpenPGP digital signature