Subject: kern/8383: LFS can reproducibly be wedged in post-1.4.1 release branch
To: None <gnats-bugs@gnats.netbsd.org>
From: None <Havard.Eidnes@runit.sintef.no>
List: netbsd-bugs
Date: 09/12/1999 07:44:38
>Number:         8383
>Category:       kern
>Synopsis:       LFS can reproducibly be wedged in post-1.4.1 release branch
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Sep 12 07:20:01 1999
>Last-Modified:
>Originator:     Havard Eidnes
>Organization:
	RUNIT AS
>Release:        netbsd-1-4 branch as of Sat Sep 11 1999
>Environment:
	AlphaPC164, netbsd-1-4 branch (i.e. post-1.4.1)
System: NetBSD albatross.runit.sintef.no 1.4.1 NetBSD 1.4.1 (ALBATROSS) #3: Sat Sep 11 20:41:18 MEST 1999     he@albatross.runit.sintef.no:/usr/src/sys/arch/alpha/compile/ALBATROSS alpha

>Description:
	Relatively simple use of LFS can cause the system to wedge
	solidly.  In my test case repeated use of "bonnie" on a
	freshly constructed LFS partition causes this problem after
	a few runs.

	In one of my earlier tests, the I/O on the LFS file system
	stopped, but I could still execute other commands on the other
	ssh sessions I had open towards the target host, and when that
	happened the output of "ps axlww" looked like this:

albatross% ps axlww
  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TT       TIME COMMAND
    0     0     0   0 -18  0     0 31104 schedu DLs  ??    0:00.00 (swapper)
    0     1     0   0  10  0   592  328 wait   Is   ??    0:00.01 init 
    0     2     0   0 -18  0     0 31104 daemon DL   ??    0:00.00 (pagedaemon)
    0     3     0   0 -18  0     0 31104 reaper DL   ??    0:00.00 (reaper)
    0   118     1   0   2  0   176  648 select Ss   ??    0:00.10 syslogd 
    0   122     1   0   2  0   104  688 select Ss   ??    0:00.23 portmap -l 
    0   126     1   0  10  0 100664  336 mfsidl Is   ??    0:00.02 mount_mfs -s 200000 -o nosuid -o nodev /dev/wd0b /tmp 
    0   159     1   0  -5  0    32  504 lfs se Ds   ??    0:02.94 update 30 
    0   161     1   0  10  0   392  808 nanosl Is   ??    0:00.02 cron 
    0   164     1   0   2  0   176  800 select Ss   ??    0:00.16 inetd -l 
13841   184     1  36 104 20 14896 15568 -      RN   ??   38:49.93 ./setiathome
    0   186     1   0   2  0   640 1040 select Is   ??    0:00.37 /usr/pkg/sbin/sshd (sshd1)
    0   193   186   0   2  0   848 1352 select I    ??    0:00.24 /usr/pkg/sbin/sshd (sshd1)
    0   199   186   0   2  0   848 1352 select S    ??    0:00.52 /usr/pkg/sbin/sshd (sshd1)
    0   213     1   0  10  0    88  584 wait   Is   ??    0:00.00 /usr/libexec/lfs_cleanerd /spare3 
    0   214   213   0  -5  0   376  872 getblk D    ??    0:06.97 /usr/libexec/lfs_cleanerd /spare3 
    0   236   186   0   2  0   848 1352 select S    ??    0:00.15 /usr/pkg/sbin/sshd (sshd1)
13841   195   193   0  18  0  1056 1472 pause  Is   p0    0:00.08 -tcsh 
    0   205   195   0  18  0   888  448 pause  I    p0    0:00.08 -csh 
    0   367   205   0  -5  0    80  696 lfs th D+   p0    1:24.17 bonnie -s 1000 
13841   201   199   0  10  0  1056 1504 ppwait Ds   p1    0:00.06 -tcsh 
13841   443   201   0  28  0   736  352 -      R+   p1    0:00.00 ps -axlww 
13841   444   201   0  28  0  1056 1504 -      RV   p1    0:00.00 -tcsh 
13841   238   236   4  18  0  1056 1496 pause  Is   p2    0:00.05 -tcsh 
13841   242   238   0   3  0   656 1328 ttyin  S+   p2    0:01.08 systat vm 
    0   188     1   7   3  0   112  824 ttyin  Is+  C0    0:00.02 /usr/libexec/getty std.9600 console 
albatross%

	During the above test the file system under test was on an
	ncr860 controller, and eventually the disk driver got sick and
	started spewing

assertion "cp" failed: file "../../../../dev/pci/ncr.c", line 6704
assertion "cp" failed: file "../../../../dev/pci/ncr.c", line 6704
sd1(ncr0:1:0): COMMAND FAILED (4 28) @0xfffffe00000b3000.
sd1(ncr0:1:0): COMMAND FAILED (4 28) @0xfffffe00000b3800.

	messages all over the console.

	In a repeat test where I used a Qlogic ISP1040B controller and
	another disk on this same system, I did not get to the period
	of "no I/O on LFS file system but rest of system still works",
	but rather the whole system just wedged.  However, the driver
	did not stuff up as above.

	I took a crash dump in both instances, and after the second
	crash dump I did this digging (as partly directed by Konrad
	Schröder, partly doing some trial/error):

Script started on Sun Sep 12 14:30:07 1999
snylteveps% ssh albatross
Last login: Sun Sep 12 14:32:35 1999 from 129.241.196.66
NetBSD 1.4.1 (ALBATROSS) #3: Sat Sep 11 20:41:18 MEST 1999

Welcome to NetBSD!

No mail.
albatross% su
albatross# cd /var/crash
albatross# ps axlww -Mnetbsd.1.core -N netbsd.1
  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TT       TIME COMMAND
    0     0     0   0 -18  0     0    0 schedu DLs  ??    0:00.01 (swapper)
    0     1     0   0  10  0   608    0 wait   Is   ??    0:00.02 (init)
    0     2     0   0 -18  0     0    0 daemon DL   ??    0:00.00 (pagedaemon)
    0     3     0   0 -18  0     0    0 reaper DL   ??    0:00.01 (reaper)
    0   118     0   0   2  0   176    0 -      Rs   ??    0:00.85 (syslogd)
    0   122     0   0 -14  0   104    0 inode  Ds   ??    0:04.35 (portmap)
    0   126     0   0  10  0 100664    0 mfsidl Is   ??    0:00.08 (mount_mfs)
    0   158     0   0  -5  0    32    0 getnew Ds   ??    0:15.68 (update)
    0   160     0   0 -14  0   392    0 inode  Ds   ??    0:00.16 (cron)
    0   163     0   0   2  0   176    0 select Is   ??    0:03.16 (inetd)
13841   184     0  27  -5 20 14912    0 getnew DN   ??  948:44.79 (setiathome)
    0   185     0   3   2  0   640    0 select Is   ??    0:01.74 (sshd1)
    0   188     0   0   2  0   848    0 select I    ??    0:00.11 (sshd1)
    0   239     0   0   2  0   848    0 select I    ??    0:01.49 (sshd1)
    0   246     0   0   2  0   848    0 select I    ??    0:00.22 (sshd1)
    0  1151     0   0   2  0   848    0 select I    ??    0:00.08 (sshd1)
    0  5298     0   0   2  0   848    0 select I    ??    0:00.26 (sshd1)
    0  5307     0   0   2  0   848    0 select I    ??    0:00.22 (sshd1)
    0  5331     0   0  10  0    88    0 wait   Is   ??    0:00.00 (lfs_cleanerd)
    0  5332     0   0  28  0  1032    0 segmen I    ??    0:21.74 (lfs_cleanerd)
    0  5361     0   0   2  0   848    0 select I    ??    0:00.49 (sshd1)
    0  5675     0   0  -5  0   128    0 getnew D    ??    0:00.00 (portmap)
    0  5676     0   0  -5  0   176    0 getnew D    ??    0:00.00 (inetd)
    0  5678     0   0 -14  0   112    0 inode  D    ??    0:00.00 (portmap)
13841   190     0   0  18  0  1056    0 pause  Is   p0    0:00.07 (tcsh)
    0   252     0   6   3  0   888    0 ttyin  I+   p0    0:00.05 (csh)
13841   241     0   0  18  0  1056    0 pause  Is   p1    0:00.05 (tcsh)
13841   245     0   0  -5  0   656    0 getnew D+   p1    0:22.37 (systat)
13841   248     0   0  18  0  1072    0 pause  Is   p2    0:00.08 (tcsh)
    0   288     0   7   3  0   936    0 ttyin  I+   p2    0:00.04 (csh)
13841  1153     0   2   3  0  1056    0 ttyin  Is+  p3    0:00.06 (tcsh)
13841  5309     0   0  18  0  1056    0 pause  Is   p4    0:00.05 (tcsh)
    0  5313     0   0  18  0   888    0 pause  I    p4    0:00.07 (csh)
    0  5623     0   8  -5  0    80    0 getnew D+   p4    3:11.84 (bonnie)
13841  5300     0   1  18  0  1056    0 pause  Is   p5    0:00.05 (tcsh)
13841  5304     0   0 -14  0   656    0 inode  D+   p5    0:01.63 (systat)
13841  5363     0   0  10  0  1056    0 ppwait Ds   p6    0:00.10 (tcsh)
13841  5677     0   0  -5  0  1056    0 getnew DV+  p6    0:00.00 (tcsh)
    0  5360     0   0   3  0   112    0 ttyin  Is+  C0    0:00.01 (getty)
albatross# gdb -q netbsd.1
(no debugging symbols found)...(gdb) 
(gdb) target kcore netbsd.1.core
#0  0xfffffc0000473958 in dumpsys ()
(gdb) where
#0  0xfffffc0000473958 in dumpsys ()
#1  0xfffffc00004735a0 in cpu_reboot ()
#2  0xfffffc0000311b9c in db_reboot_cmd ()
#3  0xfffffc0000311614 in db_command ()
#4  0xfffffc00003118cc in db_command_loop ()
#5  0xfffffc0000315ad4 in db_trap ()
#6  0xfffffc000048091c in ddb_trap ()
#7  0xfffffc000047b5fc in trap ()
#8  0xfffffc0000300484 in XentIF ()
#9  0xfffffc000030c9c8 in comintr ()
warning: Hit beginning of text section without finding
warning: enclosing function for address 0x4
This warning occurs if you are debugging a function without any symbols
(for example, in a stripped executable).  In that case, you may wish to
increase the size of the search with the `set heuristic-fence-post' command.

Otherwise, you told GDB there was a function where there isn't one, or
(more likely) you have encountered a bug in GDB.
(gdb) p curproc
$1 = 0
(gdb) help proc
Set current context from proc address
(gdb) ^Z
Suspended
albatross# locate gdbscri
/usr/src/sys/gdbscripts
/usr/src/sys/gdbscripts/bdump
/usr/src/sys/gdbscripts/CVS
/usr/src/sys/gdbscripts/CVS/Entries
/usr/src/sys/gdbscripts/CVS/Repository
/usr/src/sys/gdbscripts/CVS/Root
/usr/src/sys/gdbscripts/CVS/Tag
/usr/src/sys/gdbscripts/kdump
/usr/src/sys/gdbscripts/pgrpdump
/usr/src/sys/gdbscripts/stats
/usr/src/sys/gdbscripts/vchain
/usr/src/sys/gdbscripts/vdump
/usr/src/sys/gdbscripts/xps
albatross# fg
gdb /var/crash/netbsd.1
(gdb) 
(gdb) source xps
(gdb) xps
Attempt to extract a component of a value that is not a structure.
(gdb) p allproc
$1 = 133653960
(gdb) p allproc.lh_first
Attempt to extract a component of a value that is not a structure.
(gdb) quit
albatross# pwd
/usr/src/sys/gdbscripts
albatross# cd conf
/sys/arch/alpha/conf 
albatross# ed ALBATROSS
9373

...


a
makeoptions DEBUG="-g"
.
w
9396
q
albatross# config ALBATROSS
Don't forget to run "make depend"
albatross# cd compile/!$
cd compile/ALBATROSS
/sys/arch/alpha/compile/ALBATROSS 
albatross# ls -l list*
-rw-r--r--  1 root  wheel  84186 Aug 10 10:42 list.1
-rw-r--r--  1 root  wheel   3268 Aug 22 15:58 list.2
-rw-r--r--  1 root  wheel  14632 Sep 11 19:25 list.3
-rw-r--r--  1 root  wheel  89736 Sep 11 20:41 list.4
albatross# (nice make clean all; time) >& list.5 & tail -f list.5
[2] 309

...

ld -N -Ttext fffffc0000300000 -e __transfer -G 4 -X -o netbsd ${SYSTEM_OBJ} vers.o
text    data    bss     dec     hex     filename
1854492 142960  223936  2221388 21e54c  netbsd
cp netbsd netbsd.gdb
strip -g -X -x netbsd
453.7u 62.9s 9:39.00 89.2% 0+0k 4433+17199io 544pf+0w
^C
[2]  - Done                   ( nice make clean all; time ) >& list.5
albatross# pwd
/usr/src/sys/arch/alpha/compile/ALBATROSS
albatross# ls -l netbsd*
-rwxr-xr-x  1 root  wheel   2181479 Sep 12 14:56 netbsd*
-rwxr-xr-x  1 root  wheel  12640465 Sep 12 14:56 netbsd.gdb*
albatross# cd /usr/src/sys/gdbscripts
albatross# gdb -q /sys/arch/alpha/compile/ALBATROSS/netbsd.gdb
(gdb) target kcore /var/crash/netbsd.1.core
#0  0xfffffc0000473958 in dumpsys ()
    at ../../../../arch/alpha/alpha/machdep.c:1444
1444            savectx(&dumppcb);
(gdb) source xps
(gdb) xps
              proc   pid     flag st              wchan comm
0xfffffc0007f765c8  5678        4  3 0xfffffc0006379c68 portmap (inode)
0xfffffc0007f76010  5677       16  3 0xfffffc000050b8e8 tcsh (getnewbuf)
0xfffffc0007f76998  5676        4  3 0xfffffc000050b8e8 inetd (getnewbuf)
0xfffffc0007f767b0  5675        4  3 0xfffffc000050b8e8 portmap (getnewbuf)
0xfffffc00064e5ca0  5623     4006  3 0xfffffc000050b8e8 bonnie (getnewbuf)
0xfffffc0007f761f8  5363     4006  3 0xfffffc0007f761f8 tcsh (ppwait)
0xfffffc0007f763e0  5361       84  3 0xfffffc000050a7c8 sshd1 (select)
0xfffffc000237e5b8  5360     4086  3 0xfffffc000235e010 getty (ttyin)
0xfffffc00064e56e8  5332       84  3 0xfffffe0000128840 lfs_cleanerd (segment)
0xfffffc00064e5ab8  5331       84  3 0xfffffc00064e5ab8 lfs_cleanerd (wait)
0xfffffc00064e5318  5313     4086  3 0xfffffe00107761a0 csh (pause)
0xfffffc00064e47a8  5309     4086  3 0xfffffe00107661a0 tcsh (pause)
0xfffffc00064e5500  5307       84  3 0xfffffc000050a7c8 sshd1 (select)
0xfffffc00064e4d60  5304     4106  3 0xfffffc00049cdb08 systat (inode)
0xfffffc00064e58d0  5300     4086  3 0xfffffe001076e1a0 tcsh (pause)
0xfffffc00064e4b78  5298       84  3 0xfffffc000050a7c8 sshd1 (select)
0xfffffc00064e4f48  1153     4086  3 0xfffffc000235e6c8 tcsh (ttyin)
0xfffffc00064e5130  1151       84  3 0xfffffc000050a7c8 sshd1 (select)
0xfffffc00064e4990   288     4086  3 0xfffffc000235e570 csh (ttyin)
0xfffffc00064e45c0   252     4086  3 0xfffffc000235e2c0 csh (ttyin)
0xfffffc00064e43d8   248     4086  3 0xfffffe001074e1a0 tcsh (pause)
0xfffffc00064e4008   246       84  3 0xfffffc000050a7c8 sshd1 (select)
---Type <return> to continue, or q <return> to quit---
0xfffffc00064e41f0   245     4106  3 0xfffffc000050b8e8 systat (getnewbuf)
0xfffffc000237f6e0   241     4086  3 0xfffffe00107421a0 tcsh (pause)
0xfffffc000237fc98   239       84  3 0xfffffc000050a7c8 sshd1 (select)
0xfffffc000237fab0   190     4086  3 0xfffffe00107321a0 tcsh (pause)
0xfffffc000237ed58   188       84  3 0xfffffc000050a7c8 sshd1 (select)
0xfffffc000237f8c8   185       84  3 0xfffffc000050a7c8 sshd1 (select)
0xfffffc000237f4f8   184     4005  3 0xfffffc000050b8e8 setiathome (getnewbuf)
0xfffffc000237f310   163       84  3 0xfffffc000050a7c8 inetd (select)
0xfffffc000237f128   160      104  3 0xfffffc000237c058 cron (inode)
0xfffffc000237e988   158        4  3 0xfffffc000050b8e8 update (getnewbuf)
0xfffffc000237ef40   126       84  3 0xfffffc00056f2388 mount_mfs (mfsidl)
0xfffffc000237e7a0   122        4  3 0xfffffc0005f38648 portmap (inode)
0xfffffc000237eb70   118       84  2                  0 syslogd (select)
0xfffffc000237e3d0     3    20204  3 0xfffffc0000509a38 reaper (reaper)
0xfffffc000237e1e8     2    20204  3 0xfffffc000051c5e8 pagedaemon (daemon_slp)
0xfffffc000237e000     1     4084  3 0xfffffc000237e000 init (wait)
0xfffffc00005074f8     0    20204  3 0xfffffc00005074f8 swapper (scheduler)
              proc   pid     flag st              wchan comm
(gdb) proc 0xfffffc00064e5ab8
(gdb) where
#0  0xfffffc0000336948 in mi_switch () at ../../../../kern/kern_synch.c:632
#1  0xfffffc0000335fc0 in tsleep (ident=0x0, priority=288, 
    wmesg=0xfffffc000048d0e5 "wait", timo=0)
    at ../../../../kern/kern_synch.c:370
#2  0xfffffc000032b45c in sys_wait4 (q=0xfffffc00064e5ab8, 
    v=0xfffffe0010785e88, retval=0xfffffe0010785ed8)
    at ../../../../kern/kern_exit.c:508
#3  0xfffffc000047bbcc in syscall (code=7, framep=0xfffffe0010785ef8)
    at ../../../../arch/alpha/alpha/trap.c:664
#4  0xfffffc0000300540 in XentSys ()
    at ../../../../arch/alpha/alpha/locore.s:581
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0x16024bd88
This warning occurs if you are debugging a function without any symbols
(for example, in a stripped executable).  In that case, you may wish to
increase the size of the search with the `set heuristic-fence-post' command.

Otherwise, you told GDB there was a function where there isn't one, or
(more likely) you have encountered a bug in GDB.
(gdb) proc 0xfffffc00064e56e8
(gdb) where
#0  0xfffffc0000336948 in mi_switch () at ../../../../kern/kern_synch.c:632
#1  0xfffffc0000335fc0 in tsleep (ident=0x0, priority=306, 
    wmesg=0xfffffc0000494209 "segment", timo=307377)
    at ../../../../kern/kern_synch.c:370
#2  0xfffffc000040dcc0 in lfs_segwait (p=0x0, v=0x0, retval=0x0)
    at ../../../../ufs/lfs/lfs_syscalls.c:862
#3  0xfffffc000047bbcc in syscall (code=187, framep=0xfffffe001077def8)
    at ../../../../arch/alpha/alpha/trap.c:664
#4  0xfffffc0000300540 in XentSys ()
    at ../../../../arch/alpha/alpha/locore.s:581
warning: Hit heuristic-fence-post without finding
warning: enclosing function for address 0x120001c18
(gdb) up
#1  0xfffffc0000335fc0 in tsleep (ident=0x0, priority=306, 
    wmesg=0xfffffc0000494209 "segment", timo=307377)
    at ../../../../kern/kern_synch.c:370
370             mi_switch();
(gdb) up
#2  0xfffffc000040dcc0 in lfs_segwait (p=0x0, v=0x0, retval=0x0)
    at ../../../../ufs/lfs/lfs_syscalls.c:862
862             error = tsleep(addr, PCATCH | PUSER, "segment", timeout);
(gdb) l
857                     timeout = hzto(&atv);
858                     splx(s);
859             } else
860                     timeout = 0;
861             
862             error = tsleep(addr, PCATCH | PUSER, "segment", timeout);
863             return (error == ERESTART ? EINTR : 0);
864     }
865     
866     /*
(gdb) up
#3  0xfffffc000047bbcc in syscall (code=187, framep=0xfffffe001077def8)
    at ../../../../arch/alpha/alpha/trap.c:664
664                     error = (*callp->sy_call)(p, args + hidden, rval);
(gdb) l
659             scdebug_call(p, code, args + hidden);
660     #endif
661             if (error == 0) {
662                     rval[0] = 0;
663                     rval[1] = 0;
664                     error = (*callp->sy_call)(p, args + hidden, rval);
665             }
666     
667             switch (error) {
668             case 0:
(gdb) up
#4  0xfffffc0000300540 in XentSys ()
    at ../../../../arch/alpha/alpha/locore.s:581
../../../../arch/alpha/alpha/locore.s:581: No such file or directory.
Current language:  auto; currently asm
(gdb) quit
albatross# ^Dexit
albatross% ^Dlogout
Connection to albatross closed.
snylteveps% ^Dexit

Script done on Sun Sep 12 15:10:06 1999

	The session doing the test follows:

Script started on Sun Sep 12 13:11:21 1999
snylteveps% ssh albatross
Last login: Sun Sep 12 13:14:41 1999 from 129.241.196.66
NetBSD 1.4.1 (ALBATROSS) #3: Sat Sep 11 20:41:18 MEST 1999

Welcome to NetBSD!

No mail.
albatross% su
albatross# df
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/wd0a       49711    27202    20023    57%    /
/dev/wd0d     1485749   926050   485411    65%    /usr
/dev/wd0f      507750     2932   479430     0%    /var
mfs:126         96655        2    91820     0%    /tmp
/dev/wd0g     6059635  1169479  4587174    20%    /local
kernfs              1        1        0   100%    /kern
/dev/sd0g     6122066        1  5815961     0%    /spare1
/dev/sd2c     4058244        1  3855330     0%    /spare2
/dev/sd0a       49840    20543    26805    43%    /altroot
/dev/sd0d     1488395   146002  1267973    10%    /altroot/usr
/dev/sd0f      507928      200   482331     0%    /altroot/var
albatross# ed /etc/fstab
361
/spare1/
/dev/sd0g /spare1 ffs rw 1 2
s/^/# /p
# /dev/sd0g /spare1 ffs rw 1 2
w
363
q
albatross# umount /spare1
albatross# newfs_lfs /dev/rsd0g
Using 1 single indirect block(s) for inode 1
albatross# mount -t lfs /dev/sd0g /spare1
albatross# cd /spare1
albatross# df .
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/sd0g     6317056      233  5685117     0%    /spare1
albatross# mount | egrep spare1
/dev/sd0g on /spare1 type lfs (local)
albatross# bonnie -s 1000
File './Bonnie.5338', size: 1048576000
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         1000  4267 34.7  4354 20.1  3234 17.1  9895 53.8 10035 13.5 111.1  4.2
albatross# df .
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/sd0g     6317056     9662  5675688     0%    /spare1
albatross# ls -l
total 129
-rw-------  1 root  wheel  122880 Sep 12 13:35 ifile
drwx------  2 root  wheel     512 Sep 12 13:16 lost+found/
albatross# du .
1       ./lost+found
130     .
albatross# bonnie -s 2000
File './Bonnie.5450', size: 2097152000
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         2000  3819 30.9  4402 20.6  2861 15.4 10369 58.2 10405 14.4 103.3  3.4
albatross# df .
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/sd0g     6317056     8052  5677298     0%    /spare1
albatross# du .
1       ./lost+found
130     .
albatross# ls -ls
total 129
128 -rw-------  1 root  wheel  122880 Sep 12 14:11 ifile
  1 drwx------  2 root  wheel     512 Sep 12 13:16 lost+found/
albatross# bonnie -s 3000
File './Bonnie.5619', size: -1149239296
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...^C
albatross# bonnie -s 2000
File './Bonnie.5623', size: 2097152000
Writing with putc()...done
Rewriting...^C^C^CRead from remote host albatross: Connection reset by peer
Connection to albatross closed.
snylteveps% ^Dexit

Script done on Sun Sep 12 14:27:20 1999

	and the console output was

Script started on Sun Sep 12 13:10:06 1999
snylteveps% ssh junk1
Last login: Fri Sep 10 18:23:02 1999 from vader.runit.sint
NetBSD 1.3H (GENERIC) #4: Tue Oct  6 23:45:26 MEST 1998

Welcome to NetBSD!

No mail.
% tip tty1
can't open log file /var/log/aculog.
connected


NetBSD/alpha (albatross.runit.sintef.no) (console)

login: Sep 12 13:15:14 albatross su: he to root on /dev/ttyp4
root
Last login: Thu Aug 12 22:46:44 1999 on console
Sep 12 13:20:55 albatross login: ROOT LOGIN (root) ON console
Copyright (c) 1996, 1997, 1998, 1999
        The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
        The Regents of the University of California.  All rights reserved.

NetBSD 1.4.1 (ALBATROSS) #3: Sat Sep 11 20:41:18 MEST 1999

Welcome to NetBSD!

You have new mail.
Sep 12 13:20:55 albatross login: ROOT LOGIN (root) ON console
Terminal type? [vt100] 
Don't login as root, use the su command.
albatross# Sep 12 13:21:06 albatross portmap[5358]: connect from 129.241.100.134 to getport(rstatd)

albatross# exit
albatross# logout

NetBSD/alpha (albatross.runit.sintef.no) (console)

login: Sep 12 13:30:54 albatross su: he to root on /dev/ttyp6


NetBSD/alpha (albatross.runit.sintef.no) (console)

login: de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
~Stopped at      Debugger+0x4:   ret     zero,(ra)
db> trace
db> reboot 104

dumping to dev 4,1 offset 525039
dump 256 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 succeeded


rebooting...


halted CPU 0

halt code = 5
HALT instruction executed
PC = fffffc0000300140

CPU 0 booting

(boot dqa0.0.0.11.0 -flags a)
block 0 of dqa0.0.0.11.0 is a valid boot block
reading 14 blocks from dqa0.0.0.11.0
bootstrap code read in
base = 180000, image_start = 0, image_bytes = 1c00
initializing HWRPB at 2000
initializing page table at 172000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code

NetBSD/Alpha 1.4 FFS Primary Bootstrap
Jumping to entry point...

NetBSD/alpha 1.4 Secondary Bootstrap, Revision 1.10
(ross@netbsd.org, May 8 11:34:00 PDT 1999)

VMS PAL rev: 0x1000900010114
OSF PAL rev: 0x1000800020116
Switch to OSF PAL code succeeded.

Boot flags: a

Loading netbsd...
1997456+223936+125472+[66833]

Entering netbsd at 0xfffffc0000301100...
[ preserving 193296 bytes of netbsd ELF symbol table ]
Copyright (c) 1996, 1997, 1998, 1999
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 1.4.1 (ALBATROSS) #3: Sat Sep 11 20:41:18 MEST 1999
    he@albatross.runit.sintef.no:/usr/src/sys/arch/alpha/compile/ALBATROSS
Digital AlphaPC 164 500 MHz
8192 byte page size, 1 processor.
real mem = 268435456 (2531328 reserved for PROM, 265904128 used by NetBSD)
avail mem = 230752256
using 3245 buffers containing 26583040 bytes of memory
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21164A-2 (pass 2)
cia0 at mainbus0: DECchip 2117x Core Logic Chipset (ALCOR/ALCOR2), pass 3
cia0: extended capabilities: 21<DWEN,BWEN>
cia0: using BWX for PCI config access
pci0 at cia0 bus 0
pci0: i/o enabled, memory enabled
isp0 at pci0 dev 5 function 0
isp0: interrupting at eb164 irq 2
isp0: Ultra Mode Capable
isp0: Board Revision 1040B, loaded F/W Revision 7.55
isp0: Last F/W revision was 7.55
isp0: driver initiated bus reset
scsibus0 at isp0: 16 targets, 8 luns per target
isp0: Target 0 Async Mode
sd0 at scsibus0 targ 0 lun 0: <IBM, DGHS09U, 0350> SCSI3 0/direct fixed
sd0: 8748MB, 8152 cyl, 10 head, 219 sec, 512 bytes/sect x 17916240 sectors
ncr0 at pci0 dev 7 function 0: ncr 53c860 fast20 scsi
ncr0: interrupting at eb164 irq 1
ncr0: minsync=12, maxsync=137, maxoffs=8, 16 dwords burst, normal dma fifo
ncr0: single-ended, open drain IRQ driver
ncr0: restart (scsi reset).
scsibus1 at ncr0: 8 targets, 8 luns per target
sd1 at scsibus1 targ 1 lun 0: <DEC, RZ29B    (C) DEC, 0007> SCSI2 0/direct fixed
sd1(ncr0:1:0): 10.0 MB/s (100 ns, offset 8)
sd1: 4091MB, 3708 cyl, 20 head, 113 sec, 512 bytes/sect x 8380080 sectors
sd2 at scsibus1 targ 3 lun 0: <DEC, RZ29B    (C) DEC, 0007> SCSI2 0/direct fixed
sd2(ncr0:3:0): 10.0 MB/s (100 ns, offset 8)
sd2: 4091MB, 3708 cyl, 20 head, 113 sec, 512 bytes/sect x 8380080 sectors
sio0 at pci0 dev 8 function 0: Intel 82378ZB System I/O (SIO) (rev. 0x43)
de0 at pci0 dev 9 function 0
de0: interrupting at eb164 irq 3
de0: 21140 [10-100Mb/s] pass 1.2
de0: address 00:c0:f0:16:11:ca
de0: enabling 10baseT port
pciide0 at pci0 dev 11 function 0: CMD Technology PCI0646
pciide0: bus-master DMA support present
pciide0: primary channel wired to compatibility mode
wd0 at pciide0 channel 0 drive 0: <IBM-DJNA-370910>
wd0: drive supports 16-sector pio transfers, lba addressing
wd0: 8693MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 17803440 sectors
wd0: 32-bits data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4
pciide0: secondary channel wired to compatibility mode
pciide0: secondary channel ignored (disabled)
wd0(pciide0:0:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
isa0 at sio0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
lpt0 at isa0 port 0x3bc-0x3bf irq 7
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
isabeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
root on wd0a dumps on wd0b
root file system type: ffs
swapctl: adding /dev/wd0b as swap device at priority 0
Automatic boot in progress: starting file system checks.
/dev/rwd0a: 884 files, 27202 used, 22509 free (245 frags, 2783 blocks, 0.5% fragmentation)
/dev/rwd0a: MARKING FILE SYSTEM CLEAN
/dev/rsd0a: 861 files, 20543 used, 29297 free (57 frags, 3655 blocks, 0.1% fragmentation)
/dev/rsd0a: MARKING FILE SYSTEM CLEAN
isp0: Target 0 at 20MHz Max Offset 8, 16 bit wide, Tagged Queueing Enabled
/dev/rsd0d: 8986 files, 146002 used, 1342393 free (6489 frags, 166988 blocks, 0.4% fragmentation)
/dev/rsd0d: MARKING FILE SYSTEM CLEAN
/dev/rsd0f: 87 files, 200 used, 507728 free (184 frags, 63443 blocks, 0.0% fragmentation)
/dev/rsd0f: MARKING FILE SYSTEM CLEAN
/dev/rsd2c: 1 files, 1 used, 4058243 free (11 frags, 507279 blocks, 0.0% fragmentation)
/dev/rsd2c: MARKING FILE SYSTEM CLEAN
/dev/rwd0d: 95090 files, 926050 used, 559699 free (35587 frags, 65514 blocks, 2.4% fragmentation)
/dev/rwd0d: MARKING FILE SYSTEM CLEAN
/dev/rwd0f: 295 files, 2913 used, 504837 free (349 frags, 63061 blocks, 0.1% fragmentation)
/dev/rwd0f: MARKING FILE SYSTEM CLEAN
/dev/rwd0g: UNREF FILE I=7944  OWNER=he MODE=100644
/dev/rwd0g: SIZE=406 MTIME=Sep 12 05:35 1999  (CLEARED)
/dev/rwd0g: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/rwd0g: BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/rwd0g: SUMMARY INFORMATION BAD (SALVAGED)
/dev/rwd0g: 64 files, 1169478 used, 4890157 free (69 frags, 611261 blocks, 0.0% fragmentation)
/dev/rwd0g: MARKING FILE SYSTEM CLEAN
setting tty flags
starting network
hostname: albatross.runit.sintef.no
configuring network interfaces:de0: enabling Full Duplex 100baseTX port
 de0.
add net default: gateway 129.241.249.7
adding interface aliases:
starting system logger
checking for core dump...
savecore: reboot
savecore: system went down at Sun Sep 12 14:27:07 1999
savecore: writing core to /var/crash/netbsd.1.core
Sep 12 14:29:49 albatross savecore: reboot
savecore: writing kernel to /var/crash/netbsd.1
starting rpc daemons: portmap.
starting nfs daemons:.
setting securelevel: kern.securelevel: 0 -> 1
checking quotas: done.
building databases...
clearing /tmp
updating motd.
standard daemons: update cron.
starting network daemons: inetd.
starting local daemons: sshd.
Sun Sep 12 14:30:36 MEST 1999

NetBSD/alpha (albatross.runit.sintef.no) (console)

login: ~.
Connection to junk1 closed.
snylteveps% ^Dexit

Script done on Sun Sep 12 14:28:33 1999


>How-To-Repeat:
	See above.	

>Fix:
	Sorry, too much "Black Magic" for me.  However, I'll be happy
	to assist in digging out any additional information from the
	crash dumps and the file systems (which are still untouched
	after the tests), and to test kernel modifications to fix this
	problem.
>Audit-Trail:
>Unformatted: