Subject: kern/14463: Kernel freeze or panic with union and null fs
To: None <gnats-bugs@gnats.netbsd.org>
From: None <rslr@free.fr>
List: netbsd-bugs
Date: 11/05/2001 01:13:52
>Number:         14463
>Category:       kern
>Synopsis:       Kernel freeze or panic with union and null fs
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Nov 05 01:15:00 PST 2001
>Closed-Date:
>Last-Modified:
>Originator:     de SAINT LEGER Rodolphe
>Release:        1.5.x on sparc/i386/amiga
>Organization:
none
>Environment:
NetBSD gally 1.5.1 NetBSD 1.5.1 (TOSHIBA 480 CDT) #0: Mon Jul 30 16:12:36 CEST 2001     root@gally:/usr/src/sys/arch/i386/compile/TOSHIBA480CDT i386
NetBSD Iria 1.5.3_ALPHA NetBSD 1.5.3_ALPHA (IRIA2) #0: Mon Oct 22 12:29:55 CEST 2001     root@Iria:/usr/src/sys/arch/amiga/compile/IRIA2 amiga
NetBSD gladys 1.5.2 NetBSD 1.5.2 (SST5) #0: Thu Oct 25 16:13:52 CEST 2001     root@gladys:/root/sys/arch/sparc/compile/SST5 sparc

>Description:
I have a fairly complex fstab that keeps the original source repository intact
and shares compiled objects across the network, so that I can do parallel builds with tools like cook or pvm.

In this fstab I share distfiles, packages, obj and sandbox in the standard way.
These directories contain the pkgsrc distfiles, the precompiled packages (built with make bulk-install), the working obj directory for compiling, and a sandbox for each platform.

Also, for the union mounts I keep the upper layer visible in a special directory under /usr/layers.


Here is my fstab:
/dev/wd0a / ffs rw 1 1
/dev/wd0b none swap sw 0 0
/dev/wd0e /usr ffs rw 1 2
/kern /kern kernfs rw
#
# mount sharesrc
#
192.168.31.14:/usr/sharesrc             /usr/sharesrc           nfs rw 0 0 
192.168.31.14:/usr/sharesrc/pkgsrc      /usr/sharesrc/pkgsrc    nfs ro 0 0 
192.168.31.14:/usr/sharesrc/src         /usr/sharesrc/src       nfs ro 0 0 
192.168.31.14:/usr/sharesrc/xsrc        /usr/sharesrc/xsrc      nfs ro 0 0 
#
# mount union layers
#
/usr/pkgsrc     /usr/layers/pkgsrc      null rw 0 0
/usr/src        /usr/layers/src         null rw 0 0
/usr/xsrc       /usr/layers/xsrc        null rw 0 0
/usr/sharesrc/distfiles /usr/pkgsrc/distfiles   null rw 0 0
/usr/sharesrc/packages  /usr/pkgsrc/packages    null rw 0 0
#
# mount pkgsrc, src and xsrc
#
/usr/sharesrc/pkgsrc    /usr/pkgsrc     union rw,-b 0,0
/usr/sharesrc/src       /usr/src        union rw,-b 0,0
/usr/sharesrc/xsrc      /usr/xsrc       union rw,-b 0,0

The first problem is that if I null mount /usr/pkgsrc/distfiles and /usr/pkgsrc/packages
after the union mount, I can't unmount them anymore, and the distfiles are downloaded into /usr/layers/distfiles instead of /usr/sharesrc/distfiles.
(The newly mounted directories stay in the lower layer, while they should be in the upper layer.)
A workaround is to null mount them before the /usr/pkgsrc union mount.
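
The ordering constraint behind this workaround can be checked mechanically. What follows is only a minimal sketch: check_fstab_order is a hypothetical helper name (it is not part of NetBSD), and it assumes plain whitespace-separated fstab fields with the mount point in the second field and the file system type in the third.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetBSD): report null mounts that are listed
# in an fstab after a union mount whose mount point contains them.
check_fstab_order() {
    awk '
        /^[ \t]*#/ || NF < 3 { next }            # skip comments and short lines
        $3 == "union" { seen[$2] = 1; next }     # remember union mount points
        $3 == "null" {
            for (u in seen)
                if (index($2, u "/") == 1)       # mount point lies below union u
                    printf "line %d: %s comes after union mount %s\n", NR, $2, u
        }
    ' "$1"
}
```

Called as check_fstab_order /etc/fstab, it prints nothing for the fstab shown above, and one line per misplaced entry if the distfiles and packages null mounts are moved below the union mounts.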

The second problem is that the machine won't reboot:
it freezes while syncing the disks (i386) or panics (amiga).
I tried to reproduce the panic, but it seems to appear randomly (I just crashed my amiga trying to reproduce it... :(
It used to panic many times, but now that I want to note down the debugger output, it just won't.)

The third problem is a pkgsrc problem (I know it's not kern, but I am reporting all the problems together).
If I run ls in /usr/sharesrc/pkgsrc/archivers/lha, I get a pkg subdirectory, while the DESCR and PLIST files are in the top-level package directory, which causes an error during bulk-install.
This directory can't be modified by a build script, even on the machine that shares it.
(The machine that shares this directory has pkgsrc in /usr/update. It null mounts it on /usr/sharesrc/pkgsrc read-only before using it, so that its fstab looks like those of the other machines.)

The fourth problem is that pkgsrc becomes unusable on these mounts.
The first time there is a cd error (a getcwd() system call fails); the second time, the fs freezes until the next reboot (which won't work).
One strange thing: the fstab looks the same on the main machine, but if I do the same thing there, the make process freezes and NFS directories can't be mounted anymore, BUT NFS directories that are already mounted still work...
As I write this PR, the main machine (the sparc) is running sup (the pkgsrc sup is finished, and a make bulk-install process is frozen), the i386 works fine (I last rebooted it yesterday evening), and the amiga can't mount its fstab (RPC timed out).
Note also that after the first make, if you go to /usr/layers/pkgsrc to clean the layer with rm -rf (without touching the distfiles, packages and mk subdirectories) and then run ls -al, it freezes forever.

I include the fstab and the exports of the machine that shares the directories, in case they help:
/etc/fstab
/dev/sd0a / ffs rw 1 1
/dev/sd0e /usr ffs rw 1 2
/dev/sd0f /var ffs rw 1 2
/dev/sd0g /var/mail ffs rw 1 2
/dev/sd0h /home ffs rw 1 2
/dev/sd0b none swap sw 0 0
#
# mount sharesrc
#
/usr/update/pkgsrc      /usr/sharesrc/pkgsrc    null ro 0 0
/usr/update/src         /usr/sharesrc/src       null ro 0 0
/usr/update/xsrc        /usr/sharesrc/xsrc      null ro 0 0
#
# mount union layers
#
/usr/pkgsrc     /usr/layers/pkgsrc      null rw 0 0
/usr/src        /usr/layers/src         null rw 0 0
/usr/xsrc       /usr/layers/xsrc        null rw 0 0
/usr/sharesrc/distfiles /usr/pkgsrc/distfiles   null rw 0 0
/usr/sharesrc/packages  /usr/pkgsrc/packages    null rw 0 0
#
# mount pkgsrc, src and xsrc
#
/usr/sharesrc/pkgsrc    /usr/pkgsrc     union rw,-b 0,0
/usr/sharesrc/src       /usr/src        union rw,-b 0,0
/usr/sharesrc/xsrc      /usr/xsrc       union rw,-b 0,0
#
# mount /altroot
#
/usr/sharesrc/sandbox/sparc     /altroot        null    rw 0 0

/etc/exports
/home -network 192.168.31.0/28
/home -network 192.168.31.32/28
/usr/sharesrc -network 192.168.31.0/28 -maproot=0:9
/usr/sharesrc/pkgsrc -ro -network 192.168.31.0/28
/usr/sharesrc/src -ro -network 192.168.31.0/28
/usr/sharesrc/xsrc -ro -network 192.168.31.0/28

>How-To-Repeat:
For the first problem, take this fstab and move the /usr/pkgsrc/distfiles and /usr/pkgsrc/packages entries after the union mounts.

For the second problem, run make bulk-install in whatever package you want, then run it a second time.
When the make freezes, reboot the machine (from another session); the reboot should not complete.

For the third problem, erase your pkgsrc repository (don't forget the sup entries),
do a sup -s -v, and waiiiiiit :)
cd /usr/pkgsrc/archivers/lha
make bulk-install


For the fourth problem, after a reboot (don't forget to edit build.conf and mk.conf):
cd /usr/pkgsrc/<package>
make bulk-install

Here is an output for /usr/pkgsrc/shells

[root@Iria]:/usr/pkgsrc/shells# make bulk-install
===> bash2
BULK> Package bash-2.05nb1 not built yet, packaging...
make bulk-package PRECLEAN=no
BULK> Package bash-2.05nb1 not built yet, packaging...
BULK> Full rebuild  in progress...
BULK> Cleaning package and its depends
make clean CLEANDEPENDS=YES
===> Cleaning for gettext-lib-0.10.35nb1
===> Cleaning for libtool-base-1.4.20010614nb4
===> Cleaning for bash-2.05nb1
make package (bash-2.05nb1)
===> Validating dependencies for bash-2.05nb1
cd: getcwd() failed: No such file or directory
*** Error code 2

Stop.
*** Error code 1

Stop.
*** Error code 1

Stop.
BULK> bash-2.05nb1 was marked as broken:
-rw-r--r--  1 root  wsrc  414 Nov  5 07:30 .broken.amiga
make deinstall
===> Deinstalling for bash-2.05nb1
BULK> Cleaning packages and its depends
make clean CLEANDEPENDS=YES
===> Cleaning for gettext-lib-0.10.35nb1
===> Cleaning for libtool-base-1.4.20010614nb4
===> Cleaning for bash-2.05nb1
BULK> Build for bash-2.05nb1 was not successful, aborting.
*** Error code 1

Stop.
*** Error code 1

Stop.
===> es
BULK> Package es-0.9a1 not built yet, packaging...
make bulk-package PRECLEAN=no
BULK> Package es-0.9a1 not built yet, packaging...
BULK> Full rebuild  in progress...
BULK> Cleaning package and its depends
make clean CLEANDEPENDS=YES
===> Cleaning for es-0.9a1
make package (es-0.9a1)
^C[root@Iria]:/usr/pkgsrc/shells# 

Here is an example of modifying the /usr/layers directory
after a reboot and a bulk error (retyped from a frozen session on the sparc):
# cd /usr/layers
# ls
pkgsrc src    xsrc
# cd pkgsrc
# ls
.broken.sparc   distfiles       packages
devel           mk              shells
# rm -rf .broken.sparc devel shells
# ls -al
^C^C

>Fix:
For the second problem, I think a shell script should do the trick; sorry for not writing it
(perhaps in feedback).

for the union/null fs, I have no idea
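
As a starting point for the shell script mentioned for the second problem, here is a minimal, untested sketch. It only prints umount commands for the union and null mounts, last mounted first. The helper name layered_umount_cmds is hypothetical, and it assumes that mount prints lines of the form "<source> on <dir> type <fstype> (<options>)" and that mount points contain no spaces. Whether unmounting the layers by hand before shutdown actually avoids the freeze has not been verified.

```shell
#!/bin/sh
# Hypothetical helper: read `mount` output on stdin and print one umount
# command per union or null mount, in reverse mount order.
# Assumed input format: "<source> on <dir> type <fstype> (<options>)".
layered_umount_cmds() {
    awk '
        $2 == "on" && $4 == "type" && ($5 == "union" || $5 == "null") {
            dirs[++n] = $3                       # collect mount points in order
        }
        END {
            for (i = n; i >= 1; i--)             # emit them last-mounted first
                print "umount " dirs[i]
        }
    '
}
```

To review the commands, run mount | layered_umount_cmds; to execute them before a shutdown, pipe the result to sh.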

>Release-Note:
>Audit-Trail:
>Unformatted: