Subject: kern/21039: panic: ffs_alloccg: map corrupted after UFS2 upgrade
To: None <gnats-bugs@gnats.netbsd.org>
From: None <stephenm@employees.org>
List: netbsd-bugs
Date: 04/06/2003 07:12:55
>Number:         21039
>Category:       kern
>Synopsis:       panic: ffs_alloccg: map corrupted after UFS2 upgrade
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Apr 06 08:22:01 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     Stephen Ma
>Release:        NetBSD 1.6Q 2003-04-02
>Organization:
	People's Front for the correct spelling of the word "Organisation"
>Environment:
System: NetBSD whitewater.local 1.6Q NetBSD 1.6Q (WHITEWATER) #7: Wed Apr  2 18:48:35 PST 2003 stephenm@whitewater.local:/v1/netbsd/obj/src/sys/arch/i386/compile/WHITEWATER i386
Architecture: i386
Machine: i386
>Description:
The kernel panics with the message "panic: ffs_alloccg: map corrupted"
soon after booting a kernel that includes the new UFS2 support. The
panic occurs when writing to a partition that works fine with a kernel
built prior to the UFS2 support (around 2003-03-27). The partition is
at 80% capacity and has been in regular use (including many full
NetBSD release builds) with the pre-UFS2 kernel for a long time.
Softdep is enabled on the partition.

The panic seems to happen on the first write (or possibly the first
inode allocation) on that partition after booting with the
UFS2-enabled kernel. However, the same UFS2-enabled kernel does not
seem to panic when writing to other (smaller) partitions on the same
box, so the panic appears to depend on which partition is being
written.

The panic appears to be reliably reproducible: it has happened several
times, each time apparently on the first write to the partition after
a reboot.

The partition is on an IDE drive that probes as:

pciide0 at pci0 dev 7 function 1: Intel 82371AB IDE controller (PIIX4) (rev. 0x01)
pciide0: bus-master DMA support present
pciide0: primary channel wired to compatibility mode
wd0 at pciide0 channel 0 drive 0: <TOSHIBA MK1214GAP>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 11513 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 23579136 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
pciide0: primary channel interrupting at irq 14
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA data transfers)
pciide0: secondary channel wired to compatibility mode

A transcript of the panic is included below. A full copy of the dumpfs
output is available on request.
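
As a sanity check on the superblock values in the transcript below,
here is a minimal Python sketch that recomputes the cylinder-group
geometry from the dumpfs header. It assumes (this is not confirmed
anywhere in this report) that "start" and "len" in the diagnostic
printed just before the panic are a byte offset and a byte count within
a cylinder group's free-fragment bitmap. The function names are
illustrative only and are not kernel interfaces.

```python
# Sanity check of the dumpfs geometry quoted in this report.
# All numbers are copied from the dumpfs header and the panic diagnostic.

NBBY = 8  # bits per byte


def howmany(x, y):
    """Round-up integer division, in the style of the howmany() macro."""
    return (x + y - 1) // y


def frags_per_cylinder(spc, nspf):
    """Fragments per cylinder: sectors per cylinder / sectors per fragment."""
    return spc // nspf


def map_search_len(fpg, start):
    """ASSUMPTION: bytes of free-fragment bitmap left to scan from `start`."""
    return howmany(fpg, NBBY) - start


def last_cg_frags(size, fpg, ncg):
    """Fragments that fall in the final (possibly partial) cylinder group."""
    return size - (ncg - 1) * fpg


if __name__ == "__main__":
    fs = dict(ncg=6, ncyl=392, size=740880, cpg=76, bpg=17955,
              fpg=143640, frag=8, spc=15120, nspf=8)

    per_cyl = frags_per_cylinder(fs["spc"], fs["nspf"])
    print("frags per cylinder:", per_cyl)
    print("cpg * frags/cyl == fpg:", per_cyl * fs["cpg"] == fs["fpg"])
    print("ncyl * frags/cyl == size:", per_cyl * fs["ncyl"] == fs["size"])
    print("fpg / frag == bpg:", fs["fpg"] // fs["frag"] == fs["bpg"])
    # The diagnostic reported start = 1, len = 17954.
    print("expected len for start=1:", map_search_len(fs["fpg"], 1))
    print("frags in last cg:", last_cg_frags(fs["size"], fs["fpg"], fs["ncg"]))
```

Under that assumption, the reported len of 17954 equals
howmany(fpg, 8) - 1, i.e. the length of a full-size group's map minus
the starting byte. The header values are also mutually consistent, and
the last of the 6 cylinder groups is a partial one. This is only an
observation about the numbers, not a diagnosis.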
>How-To-Repeat:
23:18:21 whitewater:tmp# mount
/dev/wd0a on / type ffs (noatime, nodevmtime, local)
/dev/wd0e on /usr type ffs (noatime, soft dependencies, local)
/dev/wd0f on /v1 type ffs (noatime, soft dependencies, local)
mfs:187 on /tmp type mfs (synchronous, local)
23:18:25 whitewater:tmp# dumpfs /v1 >v1.dump
23:18:31 whitewater:tmp# head -22 v1.dump
file system: /dev/rwd0f
endian  little-endian
magic   11954   time    Fri Apr  4 21:00:14 2003
id      [ 0 0 ]
cylgrp  dynamic inodes  4.4BSD  fslevel 3       softdep disabled
nbfree  18649   ndir    28503   nifree  303643  nffree  20571
ncg     6       ncyl    392     size    740880  blocks  725607
bsize   32768   shift   15      mask    0xffff8000
fsize   4096    shift   12      mask    0xfffff000
frag    8       shift   3       fsbtodb 3
cpg     76      bpg     17955   fpg     143640  ipg     80896
minfree 5%      optim   time    maxcontig 2     maxbpg  8192
rotdelay 0ms    rps     60
ntrak   240     nsect   63      npsect  63      spc     15120
symlinklen 60   trackskew 0     interleave 1    contigsumsize 2
maxfilesize 0x004002001005ffff
nindir  8192    inopb   256     nspf    8
avgfilesize 16384       avgfpdir 64
sblkno  8       cblkno  16      iblkno  24      dblkno  2552
sbsize  4096    cgsize  32768   offset  8       mask    0xffffff00
csaddr  2552    cssize  4096    shift   11      mask    0xfffff800
cgrotor 0       fmod    0       ronly   0       clean   0x02
23:18:38 whitewater:tmp# cd /v1
23:18:41 whitewater:/v1# ls -tlr
total 48
drwxr-xr-x  3 root  wheel    512 Nov 19  2000 export
drwxr-xr-x  2 root  wheel    512 Jul 16  2002 tmp
drwxr-xr-x  7 root  wheel    512 Nov 30 02:47 netbsd
drwx-----T  2 root  wheel  33280 Apr  2 19:23 lost+found
23:18:48 whitewater:/v1# cp /usr/bin/vi .
start = 1, len = 17954, fs = /v1
offset=10736 10736
panic: ffs_alloccg: map corrupted
Stopped in pid 597.1 (cp) at    cpu_Debugger+0x4:       leave
db> show registers
ds          0x10
es          0x10
fs          0x30
gs          0x10
edi         0xc02aabce  fifo_nfsv2nodeop_opv_desc+0x72e
esi         0x100
ebp         0xcf546920  end+0xf212478
ebx         0xcf54694c  end+0xf2124a4
edx         0
ecx         0x16e084
eax         0x18e1
eip         0xc021e1c8  cpu_Debugger+0x4
cs          0x8
eflags      0x202
esp         0xcf546920  end+0xf212478
ss          0x10
cpu_Debugger+0x4:       leave
db> bt
cpu_Debugger(2,0,4622,c0179f1e,c02aabbf) at cpu_Debugger+0x4
panic(c02aabce,1,4622,c07c60d4,8) at panic+0xb8
ffs_mapsearch(c07c6000,ccf82000,8,0,8) at ffs_mapsearch+0x132
ffs_alloccgblk(cf5378f4,c4838120,8,0,0) at ffs_alloccgblk+0xb8
ffs_alloccg(cf5378f4,0,8,0,8000) at ffs_alloccg+0x12b
ffs_hashalloc(cf5378f4,0,8,0,8000) at ffs_hashalloc+0x2e
ffs_alloc(cf5378f4,0,0,8,0) at ffs_alloc+0x1bf
ffs_balloc_ufs1(cf546c90,cf2cf000,cf546c78,2,cf546ddc) at ffs_balloc_ufs1+0x68b
ffs_balloc(cf546c90,2,cf520b08,cf54a0a0,0) at ffs_balloc+0x2a
VOP_BALLOC(cf54a0a0,0,0,8000,c079b800) at VOP_BALLOC+0x4f
ufs_gop_alloc(cf54a0a0,0,0,8000,0) at ufs_gop_alloc+0xab
ffs_write(cf546e4c,30002,cf507b68,0,4213c) at ffs_write+0x5ec
VOP_WRITE(cf54a0a0,cf546ee0,1,c079b800,cf546f80) at VOP_WRITE+0x3b
vn_write(cf2d3620,cf2d3648,cf546ee0,c079b800,1) at vn_write+0x9f
dofilewrite(cf507b68,7,cf2d3620,48106000,4213c) at dofilewrite+0x87
sys_write(cf29ec80,cf546f80,cf546f78,c02285a4,0) at sys_write+0x6b
syscall_plain(1f,1f,1f,1f,805f150) at syscall_plain+0xab
db> ps
 PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
>597            542      597          0 2  0x4002    1               cp
 542            529      542          0 2  0x4002    1             bash    wait
 529              1      529       1000 2  0x4003    1             bash    wait
 534            378      534          0 2  0x4002    1             bash   ttyin
 377              1      377          0 2  0x4002    1            getty   ttyin
 378              1      378       1000 2  0x4003    1             bash    wait
 381              1      381          0 2       0    1             cron nanosle
 348              1      348          0 2 0x20000    1            inetd  kqread
 311              1      311          0 2       0    1             sshd  select
 187              1      187          0 2       0    1        mount_mfs  mfsidl
 152              1      152          0 2       0    1          syslogd
 12               0        0          0 2 0x20200    1         aiodoned aiodone
 11               0        0          0 2 0x20200    1          ioflush  syncer
 10               0        0          0 2 0x20200    1           reaper  reaper
 9                0        0          0 2 0x20200    1       pagedaemon pgdaemo
 8                0        0          0 2 0x20200    1        pcic0,0,1  pcicev
 7                0        0          0 2 0x20200    1        pcic0,0,0  pcicev
 6                0        0          0 2 0x20200    1             pms0 pmsrese
 5                0        0          0 2 0x20200    1          usbtask  usbtsk
 4                0        0          0 2 0x20200    1             usb0  usbevt
 3                0        0          0 2 0x20200    1        atapibus0  sccomp
 2                0        0          0 2 0x20200    1       acpi sched acpisch
 1                0        1          0 2  0x4000    1             init    wait
 0               -1        0          0 2 0x20200    1          swapper schedul
db> reboot
syncing disks... 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 giving up
panic: wdc_exec_command: polled command not done
Stopped in pid 597.1 (cp) at    cpu_Debugger+0x4:       leave
db> reboot
rebooting...
>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted: