Subject: restore panicked... (long, beware ;-)
To: None <port-amd64@netbsd.org, current@netbsd.org>
From: Kurt Schreiner <ks@ub.uni-mainz.de>
List: port-amd64
Date: 08/04/2006 18:01:46
Hi,

While copying one LFS partition to another, I got the following panic:

panic: pool_get: lfsinopl: page empty
Stopped in pid 495.1 (restore) at       netbsd:cpu_Debugger+0x5:        leave
db{1}> bt
cpu_Debugger() at netbsd:cpu_Debugger+0x5
panic() at netbsd:panic+0x1f5
pool_get() at netbsd:pool_get+0x2e6
lfs_vcreate() at netbsd:lfs_vcreate+0x72
lfs_ialloc() at netbsd:lfs_ialloc+0x76
lfs_valloc() at netbsd:lfs_valloc+0xdf
ufs_makeinode() at netbsd:ufs_makeinode+0x60
ufs_create() at netbsd:ufs_create+0x2c
lfs_create() at netbsd:lfs_create+0x37
VOP_CREATE() at netbsd:VOP_CREATE+0x2b
vn_open() at netbsd:vn_open+0x27c
sys_open() at netbsd:sys_open+0xd9
syscall_plain() at netbsd:syscall_plain+0x132
--- syscall (number -9007885349996724221) ---
0:
db{1}> 
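
For context: the panic string comes from the kernel's pool(9) allocator.
pool_get() in sys/kern/subr_pool.c panics with "page empty" when the pool's
free-item counter claims items are available but the page it picked has an
empty free list, i.e. the pool's bookkeeping has become inconsistent
("lfsinopl" is the pool lfs_vcreate() allocates struct inode from). Here is
a minimal user-space model of that consistency check -- simplified, with
made-up field names only loosely based on subr_pool.c, not the real kernel
code -- just to illustrate what the message implies:

#include <stdio.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Simplified stand-ins for the pool(9) structures. */
struct pool_item {
	LIST_ENTRY(pool_item) pi_list;
};

struct pool_page {
	LIST_HEAD(, pool_item) ph_itemlist;	/* free items on this page */
};

struct pool {
	const char *pr_wchan;	/* e.g. "lfsinopl" */
	int pr_nitems;		/* free items the pool *thinks* it has */
	struct pool_page *pr_curpage;
};

/*
 * The shape of the check that fired: the counter says an item is
 * available, but the chosen page's free list turns out to be empty.
 */
static void *
pool_get_model(struct pool *pp)
{
	struct pool_item *pi;

	if (pp->pr_nitems == 0)
		return NULL;	/* the real pool_get would grow the pool */

	pi = LIST_FIRST(&pp->pr_curpage->ph_itemlist);
	if (pi == NULL) {
		/* counter and free list disagree -> corrupted state */
		fprintf(stderr, "panic: pool_get: %s: page empty\n",
		    pp->pr_wchan);
		abort();
	}
	LIST_REMOVE(pi, pi_list);
	pp->pr_nitems--;
	return pi;
}

int
main(void)
{
	struct pool_page page;
	struct pool pp = { "lfsinopl", 1, &page };

	LIST_INIT(&page.ph_itemlist);	/* empty page, but pr_nitems == 1 */
	pool_get_model(&pp);		/* hits the "page empty" path */
	return 0;
}

So the panic is the detector, not the bug itself: something earlier
presumably left the lfsinopl pool's counters and free lists out of sync.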

top(1) was running when the machine panicked:

load averages:  4.63,  4.76,  4.12                                      up 0 days,  1:26   17:16:39
57 processes:  1 runnable, 52 sleeping, 4 on processor
CPU0 states:  3.5% user,  0.0% nice, 21.4% system,  3.5% interrupt, 71.6% idle
CPU1 states:  2.5% user,  0.0% nice, 10.4% system,  0.0% interrupt, 87.1% idle
CPU2 states:  2.0% user,  0.0% nice, 11.4% system,  0.0% interrupt, 86.6% idle
CPU3 states:  5.0% user,  0.0% nice,  8.0% system,  0.0% interrupt, 87.1% idle
Memory: 3746M Act, 1244K Wired, 5172K Exec, 3626M File, 3874M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
  495 root      40    0    58M   59M RUN/0      0:34  3.81%  3.81% restore
  807 root      18    0   672K   17M pause/1    0:12  3.27%  3.27% dump_lfs
 1212 root      39    0   672K   17M CPU/0      0:13  3.12%  3.12% dump_lfs
  804 root      37    0   672K   17M CPU/2      0:13  2.49%  2.49% dump_lfs
 1494 root      29    0   736K   17M CPU/3      0:12  1.56%  1.56% dump_lfs
   88 root      -6    0     0K   18M physio/0   0:22  1.42%  1.42% [physiod]
   19 root      18    0     0K   18M syncer/0   0:02  0.20%  0.20% [ioflush]
  500 root      10    0   672K   17M wait/0     0:31  0.00%  0.00% dump_lfs
   20 root     -18    0     0K   18M aiodon/0   0:06  0.00%  0.00% [aiodoned]
  511 root      -2    0  8532K 5256K segmen/0   0:03  0.00%  0.00% lfs_cleanerd
 1015 ks         2    0   456K 2844K select/3   0:02  0.00%  0.00% sshd
 1309 ks        28    0   252K 1320K CPU/1      0:00  0.00%  0.00% top


Some output from "iostat -x wd1 1" (per device, read statistics then write
statistics: KB per transfer, transfers per second, time busy, MB per second):

wd1          0.00      0   1.00     0.00      28.28    115   1.00     3.17
wd1          0.00      0   1.00     0.00      31.39    113   1.00     3.46
wd1          0.00      0   1.05     0.00      30.42    119   1.05     3.54
wd1          0.00      0   1.00     0.00      29.14    114   1.00     3.24
wd1          0.00      0   1.00     0.00      29.97    114   1.00     3.33
wd1          0.00      0   1.00     0.00      24.77    115   1.00     2.78
wd1          0.00      0   1.00     0.00      28.97    115   1.00     3.25
wd1         32.00     61   0.99     1.92      26.46    106   0.99     2.74
wd1         32.00    799   0.76    24.97       0.00      0   0.76     0.00
wd1         32.00   1179   0.71    36.85       0.00      0   0.71     0.00
wd1         32.00    915   0.71    28.59       0.00      0   0.71     0.00
wd1         32.00   1229   0.60    38.40       0.00      0   0.60     0.00
wd1         32.00   1259   0.64    39.36       0.00      0   0.64     0.00
wd1         32.00   1080   0.64    33.76       0.00      0   0.64     0.00
wd1         32.00    846   0.78    26.42       0.00      0   0.78     0.00


And finally, what the file systems are:
>-503: df
Filesystem  1K-blocks      Used     Avail Capacity  Mounted on
/dev/wd1a     1032486    459880    520982    46%    /
/dev/wd1e     8258022    153158   7691964     1%    /var
/dev/wd1f     8258022      4456   7840666     0%    /opt
/dev/wd1h   103062405  13497610  79257047    14%    /u
/dev/wd1g    97517041   3565685  84199219     4%    /v
kernfs              1         1         0   100%    /kern
tmpfs        16346060         4  16346056     0%    /tmp
>-504: mount
/dev/wd1a on / type ffs (local)
/dev/wd1e on /var type ffs (local)
/dev/wd1f on /opt type ffs (noatime, local)
/dev/wd1h on /u type lfs (noatime, local)
/dev/wd1g on /v type lfs (noatime, local)
kernfs on /kern type kernfs (local)
tmpfs on /tmp type tmpfs (nosuid, nodev, local)


Sources were freshly cvs update'd; the kernel is (slightly) customized by
commenting out unneeded drivers etc. and by adding:

options         MULTIPROCESSOR
options         CONSDEVNAME="\"com\"",CONADDR=0x3f8,CONSPEED=19200
options         CONS_OVERRIDE   # Always use above! independent of boot info
#options        LOCKDEBUG
options         REALBASEMEM=631		# That's what the BIOS says
options         REALEXTMEM=8387584	# That's what the BIOS says


This was the 2nd panic while trying to dump_lfs|restore ~13GB from one
~100GB file system to the other (both LFS, on the same physical disk). The
first panic was with a kernel with "options LOCKDEBUG" set and happened
much earlier (after ~60MB or so; sorry, no console output was saved).

Does anyone have ideas on how I can help debug this? It would be nice to
have this Sun Ultra40 reliably running a real operating
system^W^W^W^WNetBSD ;-)


Kurt