Subject: sockstat(1), kern.file2 and net.*.*.pcblist for sysctl
To: None <tech-kern@NetBSD.org>
From: Andrew Brown <atatat@atatdot.net>
List: tech-kern
Date: 02/27/2005 14:15:04

amongst many distractions, i have written a version of sockstat(1) that
seems to work quite well.

freebsd's sockstat program works quite simply: it retrieves (via
kern.file) the list of all processes' open files, then retrieves the
lists of pcbs in use (from the net.<af>.<protocol>.pcblist nodes in
sysctl), and then prints the socket information for every file whose
f_type is DTYPE_SOCKET and whose f_data matches the socket address
recorded in a pcb.
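a minimal sketch of that matching step, in c. the record types and the
SKETCH_DTYPE_* values below are hypothetical stand-ins for whatever
kern.file and the pcblist nodes actually export; only the idea (filter on
the socket type, then compare f_data against each pcb's socket address)
comes from the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* hypothetical constants and records, NOT the kernel's own definitions */
#define SKETCH_DTYPE_VNODE  1
#define SKETCH_DTYPE_SOCKET 2

struct file_rec {
    int32_t  pid;       /* owning process */
    int32_t  fd;        /* descriptor number in that process */
    int32_t  type;      /* f_type */
    uint64_t data;      /* f_data, as an opaque kernel address */
};

struct pcb_rec {
    uint64_t sockaddr;  /* address of the struct socket this pcb belongs to */
    const char *proto;  /* e.g. "tcp4" */
};

/*
 * For every socket file, find the pcb whose socket address equals
 * f_data and print one line for it.  Returns the number of matches.
 */
size_t
match_sockets(const struct file_rec *files, size_t nfiles,
    const struct pcb_rec *pcbs, size_t npcbs)
{
    size_t i, j, matched = 0;

    for (i = 0; i < nfiles; i++) {
        if (files[i].type != SKETCH_DTYPE_SOCKET)
            continue;           /* vnodes, pipes, etc. are skipped */
        for (j = 0; j < npcbs; j++) {
            if (pcbs[j].sockaddr == files[i].data) {
                printf("pid %d fd %d %s\n", (int)files[i].pid,
                    (int)files[i].fd, pcbs[j].proto);
                matched++;
                break;
            }
        }
    }
    return matched;
}
```

a real tool would probably index the pcbs by socket address (sort plus
bsearch, or a hash table) instead of using the nested loop, but the loop
keeps the join obvious.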

our kern.file is (imho) rather crude.  first, it copies out the
filehead (why?) and then copies out each file from the list verbatim.
there are pointers in here (netbsd32 issues), locks (LOCKDEBUG issues),
and implicit abi issues (change the size or layout of struct file and
you lose).  second, it does nothing to allow a file to be associated
with a process.

so i invented kern.file2, which (a) uses a fixed-order structure (no
abi issues) with (b) fixed sizes for its fields (no netbsd32 issues),
(c) can be pulled in two ways (by walking the main list of open files or
by walking each process's open files, thereby solving the process
issue), and (d) contains a few things from the struct proc and struct
vnode (where applicable) to make things easier.
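to illustrate what "fixed order" and "fixed sizes" buy, here is a
hypothetical record in that spirit.  it is NOT the actual kern.file2
layout; the struct name, the field names and ptr2u64() are made up for
the example.

```c
#include <stdint.h>

/*
 * hypothetical illustration only, not the real kern.file2 record.
 * every field has an explicit width, kernel pointers are widened to
 * uint64_t, the 64-bit fields come first, and no locks are exported.
 */
struct file2_sketch {
    uint64_t f_fileaddr;   /* kernel address of the struct file */
    uint64_t f_data;       /* f_data (socket, vnode, ...) */
    uint64_t f_offset;     /* current file offset */
    uint64_t v_addr;       /* vnode address, 0 if not a vnode */
    uint32_t f_type;       /* DTYPE_* */
    uint32_t f_flag;
    uint32_t f_count;
    int32_t  p_pid;        /* owning process, when pulled per-process */
    int32_t  fd;           /* descriptor number, -1 if not known */
    uint32_t v_type;       /* vnode type, when applicable */
};

/* 4*8 + 6*4 = 56 bytes with no padding; a layout change fails to build */
_Static_assert(sizeof(struct file2_sketch) == 56, "layout must not vary");

/* widen a kernel pointer so 32- and 64-bit consumers agree on its size */
static inline uint64_t
ptr2u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}
```

since every field is naturally aligned and there is no compiler-inserted
padding, a 32-bit consumer (e.g. under netbsd32) and a 64-bit kernel
should agree on sizeof and on every offset.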

i suspect that none of the information therein is "precious" by any
means or needs to be "protected from random prying eyes", but if
anyone would like to see the layout, please ask.

then i wrote a program to dump kernel memory in a nice readable
layout by walking either the process list, the open file list, or the
pcb lists, recursing into and printing most of the interesting
structures (vnodes, mounts, sockets, specinfos, domains, protosws,
etc.) that it encounters along the way.  yes, you can get this from
gdb, but it's much easier this way.

then we need pcb lists, which are simple enough.  for the sysctl
version of the pcblist i ended up with eight lists: tcp4 and udp4, tcp6
and udp6, stream and dgram unix domain (local) pcbs, and two for the
raw v4 and v6 sockets.
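for reference, the eight lists spelled out with the
net.<af>.<protocol>.pcblist pattern.  the af/protocol components below
(inet, inet6, local, tcp6, raw6, ...) are my assumptions about the
spelling, not confirmed node names, and pcblist_name() is a hypothetical
helper.

```c
#include <stddef.h>
#include <stdio.h>

/* assumed <af>.<protocol> spellings for the eight pcb lists */
static const struct {
    const char *af;
    const char *proto;
} pcblists[] = {
    { "inet",  "tcp"    },
    { "inet",  "udp"    },
    { "inet6", "tcp6"   },
    { "inet6", "udp6"   },
    { "local", "stream" },
    { "local", "dgram"  },
    { "inet",  "raw"    },
    { "inet6", "raw6"   },
};

#define NPCBLISTS (sizeof(pcblists) / sizeof(pcblists[0]))

/* format the sysctl name for list i into buf; -1 if i is out of range */
int
pcblist_name(size_t i, char *buf, size_t len)
{
    if (i >= NPCBLISTS)
        return -1;
    return snprintf(buf, len, "net.%s.%s.pcblist",
        pcblists[i].af, pcblists[i].proto);
}
```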

the information in the meta-pcb is cobbled together from the pcb, the
ppcb (if applicable), and the struct socket.  it holds a few pointers
(one to each of those structures), a few state and flags fields, the
local and remote addresses, the send and receive queue sizes, and a few
other pointers from the struct unpcb.
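a hypothetical sketch of such a meta-pcb record, again with fixed-width
fields.  the struct, its field names, METAPCB_ADDRLEN and
metapcb_setaddr() are all assumptions; 128 bytes is chosen because it
matches struct sockaddr_storage on common systems and holds a
sockaddr_in6 or a unix domain sockaddr_un.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>

#define METAPCB_ADDRLEN 128  /* assumed fixed slot size for a sockaddr */

/* hypothetical record, NOT the format the kernel actually exports */
struct metapcb_sketch {
    uint64_t mp_socket;   /* address of the struct socket */
    uint64_t mp_pcb;      /* address of the inpcb/in6pcb/unpcb */
    uint64_t mp_ppcb;     /* address of the ppcb (e.g. tcpcb), 0 if none */
    int32_t  mp_state;    /* protocol state (e.g. tcp state) */
    int32_t  mp_flags;    /* socket state/flags */
    uint32_t mp_sndq;     /* bytes queued in the send buffer */
    uint32_t mp_rcvq;     /* bytes queued in the receive buffer */
    uint8_t  mp_laddr[METAPCB_ADDRLEN];  /* local address */
    uint8_t  mp_raddr[METAPCB_ADDRLEN];  /* remote address */
};

_Static_assert(sizeof(struct sockaddr_in6) <= METAPCB_ADDRLEN, "in6 fits");
_Static_assert(sizeof(struct sockaddr_un) <= METAPCB_ADDRLEN, "un fits");

/*
 * copy salen bytes of sa into a fixed slot, zero-filling the rest so
 * no stale memory is copied out.  returns -1 (slot untouched) if too big.
 */
int
metapcb_setaddr(uint8_t slot[METAPCB_ADDRLEN], const struct sockaddr *sa,
    size_t salen)
{
    if (salen > METAPCB_ADDRLEN)
        return -1;
    memset(slot, 0, METAPCB_ADDRLEN);
    memcpy(slot, sa, salen);
    return 0;
}
```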

clearly i had more in mind than writing sockstat(1), so now i also
have a version of netstat(1) that can print all the network tables
without kmem privileges (and did you know that netstat doesn't list
open raw sockets?  neither did i).

i also found a few bugs.

(1) i've already fixed.  netstat was printing the pcb address of a udp
socket for udp6 sockets.

(2) i'm uncertain about.  the netstat man page states:

     -A    With the default display, show the address of any protocol control
           blocks associated with sockets; used for debugging.

but the address printed for unix domain sockets is that of the struct
socket, not the struct unpcb.  it's a simple one-line change, but is it
worth fixing?  does anyone care?  i think i should change it simply
because it's not right.

(3) i've not dug into too deeply (maybe it's not even a bug?), but
it's a little more alarming.  basically, it appears that i have a lot
of uninitialized simplelocks in my kernel.  picking a process (mostly)
at random and running gdb against the running kernel shows:

    (gdb) print allproc.lh_first->p_list.le_next->p_pid
    $36 = 327
    (gdb) print allproc.lh_first->p_list.le_next->p_cwdi->cwdi_slock
    $37 = {lock_data = -904051956}
    (gdb) print &allproc.lh_first->p_list.le_next->p_cwdi->cwdi_slock
    $40 = (struct simplelock *) 0xca1c52dc
    (gdb) x/x &allproc.lh_first->p_list.le_next->p_cwdi->cwdi_slock
    0xca1c52dc:     0xca1d430c

i don't have LOCKDEBUG in my kernel and i'm not a lock expert, but i
would have expected a simplelock to be either 0 or 1.  the number of
simplelocks i have that are in neither state is currently 65.
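one small sanity check on the gdb output above: print and x/x are
showing the same 32-bit word, once as a signed int and once in hex
(-904051956 + 2^32 = 3390915340 = 0xca1d430c).  the hypothetical helper
below just does that reinterpretation.  so the lock word really does
hold 0xca1d430c, which happens to fall in the same 0xca1... range as the
lock's own address (0xca1c52dc).

```c
#include <stdint.h>

/*
 * reinterpret a 32-bit signed value (what gdb's print showed for
 * lock_data) as unsigned (what x/x showed).  converting a negative
 * value to uint32_t is defined as reduction modulo 2^32.
 */
uint32_t
lockword_as_hex(int32_t v)
{
    return (uint32_t)v;
}
```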

anyway, i have to go write the man page.  if you have any comments or
questions, please let me know.

-- 
|-----< "CODE WARRIOR" >-----|
codewarrior@daemon.org             * "ah!  i see you have the internet
twofsonet@graffiti.com (Andrew Brown)                that goes *ping*!"
werdna@squooshy.com       * "information is power -- share the wealth."