Subject: make -j3 stalled with vnlock on amd64
To: None <current-users@netbsd.org>
From: Kurt Schreiner <ks@ub.uni-mainz.de>
List: current-users
Date: 12/19/2005 19:03:43
Hi,

I just tried to compile a -current userland (CVS-updated a few hours ago)
on a dual Opteron machine (Opteron 246, 6G RAM, SCSI disk(s)).
I first compiled and installed a new kernel from the "fresh" sources:
NetBSD sunopti 3.99.14 NetBSD 3.99.14 (SUNOPTI_MP) #23: Mon Dec 19 17:55:47 MET 2005  ks@sunopti:/u/NetBSD/arch/amd64/obj/sys/arch/amd64/compile/SUNOPTI_MP amd64

Then I started ./build.sh -N 1 -u -j 3 -U -m amd64 -O /u/NetBSD/arch/amd64/obj
which stopped with these "famous last words":

install ===> gnu/usr.sbin/postfix/spawn
--- /u/NetBSD/arch/amd64/dest/usr/libexec/postfix/spawn ---
    install  /u/NetBSD/arch/amd64/dest/usr/libexec/postfix/spawn
--- install-libexec ---
--- /u/NetBSD/arch/amd64/dest/usr/bin/uucp ---
    install  /u/NetBSD/arch/amd64/dest/usr/bin/uucp
--- install-usr.sbin ---
--- install-trivial-rewrite ---
install ===> gnu/usr.sbin/postfix/trivial-rewrite
--- install-libexec ---
--- install-uulog ---
install ===> gnu/libexec/uucp/uulog


ps al shows:

UID   PID  PPID   CPU PRI NI  VSZ  RSS WCHAN  STAT TTY      TIME COMMAND
 77   300 26737 66127  10  0  184  984 wait   I    ttyp0 0:00.00 sh -c cd /u/NetBSD/src/gnu/usr.bin
 77   357   359 35149  10  0  316 1160 wait   I+   ttyp0 0:00.02 sh ./build.sh -N 1 -u -j 3 -U -m a
 77   358  1022  1820   2  0   40  656 piperd I+   ttyp0 0:00.44 tee -a /var/tmp/mkamd64-051219.180
 77   359  1022  3581  10  0  188  528 wait   I+   ttyp0 0:00.00 sh NBscripts/build-netbsd -j3 -d 
 77   885   243   149  18  0 2288 1576 pause  Is   ttyp0 0:00.02 -tcsh 
 77   991   885  3581  10  0  188 1020 wait   I+   ttyp0 0:00.00 sh NBscripts/build-netbsd -j3 -d 
 77  1022   991  3581  10  0  188  808 wait   I+   ttyp0 0:00.00 sh NBscripts/build-netbsd -j3 -d 
 77  1947  2468 62985  10  0  856 1676 wait   I    ttyp0 0:00.02 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77  2156   357     0   2  0  856 1676 poll   S+   ttyp0 0:00.36 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77  2468  2156 35149  10  0  180 1012 wait   I    ttyp0 0:00.01 sh 
 77  8441 12615     0   2  0  516 1336 poll   S    ttyp0 0:00.02 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77  8690   300 66127  -2  0  440 1236 vnlock D    ttyp0 0:00.02 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77  9933 12476 66175  10  0  184 1016 wait   I    ttyp0 0:00.00 sh 
 77 12476 27362     0   2  0  508 1324 poll   S    ttyp0 0:00.02 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77 12615 12476 66175  10  0  184 1016 wait   I    ttyp0 0:00.00 sh 
 77 13141 22628 66127  -2  0  204  940 vnlock D    ttyp0 0:00.00 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77 15800  1947 62985  10  0  184 1016 wait   I    ttyp0 0:00.00 /bin/sh -c _makedirtarget() {  dir
 77 18227 20174     0   2  0  548 1360 poll   S    ttyp0 0:00.02 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77 18378  9933     0   2  0  444 1272 poll   S    ttyp0 0:00.02 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77 18989 15800     0   2  0  856 1672 poll   S    ttyp0 0:00.22 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77 19252 20274 66127  10  0  184 1016 wait   I    ttyp0 0:00.00 sh 
 77 20174 24829 66127  10  0  184 1016 wait   I    ttyp0 0:00.00 sh 
 77 20274 20656     0   2  0 1476 2324 poll   S    ttyp0 0:00.07 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77 20656  8441 66175  10  0  184 1016 wait   I    ttyp0 0:00.01 sh 
 77 20815 12476 66175  10  0  184 1016 wait   I    ttyp0 0:00.00 sh 
 77 22628 26391 66127  10  0  184 1016 wait   I    ttyp0 0:00.00 sh 
 77 23035 18989 66175  10  0  180 1012 wait   I    ttyp0 0:00.01 sh 
 77 24151 18378 66175  10  0  184 1016 wait   I    ttyp0 0:00.00 sh 
 77 24232 23035 66175  10  0  812 1632 wait   I    ttyp0 0:00.02 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77 24829 20815     0   2  0  696 1524 poll   S    ttyp0 0:00.03 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77 25808 18227 66127  10  0  184 1016 wait   I    ttyp0 0:00.01 sh 
 77 26391 24151     0   2  0  796 1612 poll   S    ttyp0 0:00.02 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77 26737 25808 66127   2  0  488 1272 piperd I    ttyp0 0:00.01 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77 27362 24232 66175  10  0  188 1020 wait   I    ttyp0 0:00.00 /bin/sh -c _makedirtarget() {  dir
 77 27888 19252 66127  -2  0  208  944 vnlock D    ttyp0 0:00.01 /u/NetBSD/arch/amd64/TOOLS/bin/nbm
 77  3294  5057     0  18  0 2264 1588 pause  Ss   ttyp1 0:00.02 -tcsh 
 77 10268  3294     0  28  0  120  840 -      R+   ttyp1 0:00.00 ps al 
  0   227     1 12340   3  0   52  972 ttyin  Is+  ttyE0 0:00.00 /usr/libexec/getty Pc console 
  0   233     1 12340   3  0   52  972 ttyin  Is+  ttyE1 0:00.00 /usr/libexec/getty Pc ttyE1 
  0   234     1 12340   3  0   52  972 ttyin  Is+  ttyE2 0:00.00 /usr/libexec/getty Pc ttyE2 
  0   235     1 12340   3  0   52  972 ttyin  Is+  ttyE3 0:00.00 /usr/libexec/getty Pc ttyE3 
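
The processes stuck on vnlock can be picked out of a listing like the one
above with a small filter. This is only a sketch: blocked_on is a made-up
helper name, and it assumes WCHAN is the ninth whitespace-separated field,
as it is in the "ps al" output shown here.

```shell
# Hypothetical helper: print PID, PPID and WCHAN for every process that is
# sleeping on the given wait channel. Reads "ps al"-style output on stdin
# and assumes WCHAN is field 9 (UID PID PPID CPU PRI NI VSZ RSS WCHAN ...).
blocked_on() {
    awk -v chan="$1" '$9 == chan { print $2, $3, $9 }'
}

# Usage (not run here): ps al | blocked_on vnlock
```

Applied to the listing above, this selects PIDs 8690, 13141 and 27888, all
of them nbmake children in state D.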

The machine is still responding to the network (no local console), only
the build.sh process and its children hang.

This is the 3rd or 4th time I've run into this, but I haven't found a way
to trigger it at will.
What to do now? Should I send a PR? Is there anything I can do to help with
debugging? (I've compiled the kernel with symbols.)

Kurt