NetBSD-Users archive
Re: Substantial COMPAT_LINUX changes in netbsd-5?
Hi,
a slight update to this one: it appears that we are now in a situation
where we have these two backup processes shown here -- this is with the
suggested change (LK_CANRECURSE in nfs_root()):
mail-server% ps axl | egrep bpbkar
0 5421 1 1332 85 0 18048 7260 uvn_fp2 D ? 2:28.47 bpbkar -r 1
0 17026 1 129 117 0 17860 4876 nfsrcv D ? 0:00.03 bpbkar -r 1
mail-server%
The lock which causes subsequent attempts at running "df" to hang is
already held at this point -- most probably by the process which is now
stuck in uvn_fp2 and which is not making any progress. Running
Linux-emulated "df" results in these hanging in "tstile" (a native "df"
probably would as well, but I did not test that):
mail-server% ps axlww | egrep "bpbkar|df"
0 5421 1 1332 85 0 18048 7260 uvn_fp2 D ? 2:28.47 bpbkar -r
1209600 -ru root -dt 1428531 -to 0 -clnt mail-server.nordu.net -class NetBSD
-sched Cumulative-Inc -st CINC -bpstart_to 300 -bpend_to 300 -read_to 1800
-ckpt_time 900 -blks_per_buffer 2048 -use_otm -nfsok -b
mail-server.nordu.net_1264048175 -kl 28 -use_ofb
0 17026 1 129 117 0 17860 4876 nfsrcv D ? 0:00.03 bpbkar -r
1209600 -ru root -dt 1432799 -to 0 -clnt mail-server.nordu.net -class NetBSD
-sched Cumulative-Inc -st CINC -bpstart_to 300 -bpend_to 300 -read_to 1800
-ckpt_time 900 -blks_per_buffer 2048 -use_otm -nfsok -b
mail-server.nordu.net_1264048175 -kl 28 -use_ofb
1612 4586 3404 0 43 0 152 32 - R+ ttyp2 0:00.00 egrep
bpbkar|df
1042 2519 1 0 127 0 1504 828 tstile D ttyp3- 0:00.00
/emul/linux/bin/df
1042 3006 1 0 127 0 1504 824 tstile D ttyp3- 0:00.01
/emul/linux/bin/df
mail-server%
The previous message contained a stack trace for the process which was
then stuck in uvn_fp2; I repeat the relevant part here:
---
I also did a backtrace of the other bpbkar processes which in "ps axl"
output had these wait channels:
0 5177 1 302 117 0 17860 4876 nfsrcv D ? 0:00.03 bpbkar -r
0 7179 1 915 85 0 18048 7260 uvn_fp2 D ? 2:52.22 bpbkar -r
db{0}> trace/t 0t7179
trace: pid 7179 lid 1 at 0xdb7ae3cc
sleepq_block(0,0,c0aaba51,c0b27c80,0,c150add8,62,c3ede230,de64667c,0) at
netbsd:sleepq_block+0xeb
mtsleep(c3ede230,204,c0aaba51,0,de64667c,de64667c,10,6,0,0) at
netbsd:mtsleep+0x12d
uvn_findpage(db7ae5ac,0,db7ae4ac,c05343fa,0,0,2,0,994000,db7ae5cc) at
netbsd:uvn_findpage+0x92
uvn_findpages(de64667c,97c40000,3,db7ae5ec,db7ae5ac,0,994000,20,2,0) at
netbsd:uvn_findpages+0x73
genfs_getpages(db7ae6b0,0,0,0,0,97cb0000,0,0,2,db7ae65c) at
netbsd:genfs_getpages+0x743
nfs_getpages(db7ae6b0,4,97c42000,3,0,10000,97cc0000,c089d600,de64667c,97c40000)
at netbsd:nfs_getpages+0xbb
VOP_GETPAGES(de64667c,97c40000,3,db7ae750,db7ae7c8,0,1,0,1802,0) at
netbsd:VOP_GETPAGES+0x65
uvn_get(de64667c,97c40000,3,db7ae750,db7ae7c8,0,1,0,1802,e41be780) at
netbsd:uvn_get+0x117
ubc_fault(db7ae8e0,d3a75000,db7ae8a0,1,0,1,42,c085d206,cee38540,ce3a4d00) at
netbsd:ubc_fault+0x170
uvm_fault_internal(c0bc21c0,d3a75000,1,0,c4ec6482,c0000,0,c05a6cfa,6,6) at
netbsd:uvm_fault_internal+0x3a9
trap() at netbsd:trap+0x797
--- trap (number 6) ---
copyout(e390a0e4,d3a75000,8249400,2000,e390a0e4,0,d3a75000,97c40000,3,d3a75000)
at netbsd:copyout+0x33
uiomove(d3a75000,2000,db7aec8c,db7aeadc,0,101,deaddead,0,1829b58,0) at
netbsd:uiomove+0x62
ubc_uiomove(de64667c,db7aec8c,10000,0,101,eee4221c,db7aeb2c,c085d206,de615800,de64671c)
at netbsd:ubc_uiomove+0xeb
nfs_bioread(de64667c,db7aec8c,0,ce3a6f00,0,de64667c,db7aec2c,c053d6f4,db7aec14,de64667c)
at netbsd:nfs_bioread+0x312
nfs_read(db7aec14,de64667c,c089d3c0,de64667c,1,20001,db7aec2c,c0534d58,c089ce80,de64667c)
at netbsd:nfs_read+0x43
VOP_READ(de64667c,db7aec8c,0,ce3a6f00,d4728580,0,7aec6c,16,10000,8249400) at
netbsd:VOP_READ+0x44
vn_read(e4408600,e4408600,db7aec8c,ce3a6f00,1,0,0,0,e41be780,db7aed48) at
netbsd:vn_read+0x93
dofileread(9,e4408600,8249400,10000,e4408600,1,db7aed28,db7aed48,db7aed48,e41be780)
at netbsd:dofileread+0x75
sys_read(e41be780,db7aed10,db7aed28,7aed20,96,10,c0b4a744,9,8249400,10000) at
netbsd:sys_read+0x6f
linux_syscall(db7aed48,2b,2b,2b,2b,610,8259300,bfbeec08,9,10000) at
netbsd:linux_syscall+0x9b
db{0}>
---
Now, why this process appears to be stuck in uvn_fp2 and does not make
any progress from that point I do not know. My gut feeling is that
this process is probably holding a lock, and that this is what makes
those other processes get stuck in "tstile" waits.
The part in uvn_findpage() which waits on uvn_fp2 appears to be this
section of code:
	/* page is there, see if we need to wait on it */
	if ((pg->flags & PG_BUSY) != 0) {
		if (flags & UFP_NOWAIT) {
			UVMHIST_LOG(ubchist, "nowait",0,0,0,0);
			return 0;
		}
		pg->flags |= PG_WANTED;
		UVMHIST_LOG(ubchist, "wait %p", pg,0,0,0);
		UVM_UNLOCK_AND_WAIT(pg, &uobj->vmobjlock, 0,
		    "uvn_fp2", 0);
		mutex_enter(&uobj->vmobjlock);
		continue;
	}
However, as stated, it seems that the process never wakes up from
waiting here. Race condition on pg->flags PG_WANTED setting/testing?
Or is that supposed to be covered by &uobj->vmobjlock?
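
To make the question concrete, here is a userland model of the
busy/wanted handshake as I understand it. This is only a sketch and not
the kernel code: struct model_obj, model_init(), model_wait_unbusy() and
model_unbusy() are names made up for illustration, and a pthread mutex
and condition variable stand in for uobj->vmobjlock and the sleep on pg.
The assumption it illustrates is that both sides touch the flags only
while holding the lock; if that holds, a wakeup cannot be lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define PG_BUSY   0x1	/* page is owned by an I/O or fault in progress */
#define PG_WANTED 0x2	/* somebody sleeps waiting for PG_BUSY to clear */

/* Hypothetical stand-in for an object with a single page. */
struct model_obj {
	pthread_mutex_t lock;	/* plays the role of uobj->vmobjlock */
	pthread_cond_t  cv;	/* plays the role of the sleep on pg */
	int             flags;	/* plays the role of pg->flags */
};

void
model_init(struct model_obj *o, int flags)
{
	pthread_mutex_init(&o->lock, NULL);
	pthread_cond_init(&o->cv, NULL);
	o->flags = flags;
}

/* Waiter side: same shape as the loop in uvn_findpage(). */
void
model_wait_unbusy(struct model_obj *o)
{
	pthread_mutex_lock(&o->lock);
	while ((o->flags & PG_BUSY) != 0) {
		o->flags |= PG_WANTED;
		/* Drops the lock and sleeps atomically, relocks on wakeup. */
		pthread_cond_wait(&o->cv, &o->lock);
	}
	pthread_mutex_unlock(&o->lock);
}

/* Owner side: clear PG_BUSY and wake waiters.  Returns true if woken. */
bool
model_unbusy(struct model_obj *o)
{
	bool wanted;

	pthread_mutex_lock(&o->lock);
	assert((o->flags & PG_BUSY) != 0);	/* only a busy page is unbusied */
	wanted = (o->flags & PG_WANTED) != 0;
	o->flags &= ~(PG_BUSY | PG_WANTED);
	if (wanted)
		pthread_cond_broadcast(&o->cv);
	pthread_mutex_unlock(&o->lock);
	return wanted;
}
```

In this model the waiter can only hang forever if the owner never calls
model_unbusy(), or if one side changes the flags without taking the
lock. Those would be the two cases to look for in the real code.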
I see the comment of uvn_findpages() says uobj must be locked, but
there's no diag-assert to verify that's actually the case. How would
such a diag-assert look?
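
My guess is that in the kernel it would be a single line at the top of
uvn_findpages(), something like KASSERT(mutex_owned(&uobj->vmobjlock)),
but I have not tried that. To show the idea in a form that can be
compiled and run outside the kernel, below is a userland sketch. All
names in it (struct owned_mutex, om_init(), om_enter(), om_exit(),
om_owned(), struct model_uobj, model_findpages()) are made up for
illustration, and assert() stands in for the kernel's assertion macro.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* The address of this variable differs per thread: a cheap thread id. */
static _Thread_local char self_token;

/* Hypothetical mutex wrapper which remembers who holds it. */
struct owned_mutex {
	pthread_mutex_t   mtx;
	_Atomic uintptr_t owner;	/* 0 while unlocked */
};

void
om_init(struct owned_mutex *m)
{
	pthread_mutex_init(&m->mtx, NULL);
	atomic_init(&m->owner, 0);
}

void
om_enter(struct owned_mutex *m)
{
	pthread_mutex_lock(&m->mtx);
	atomic_store(&m->owner, (uintptr_t)&self_token);
}

void
om_exit(struct owned_mutex *m)
{
	atomic_store(&m->owner, 0);
	pthread_mutex_unlock(&m->mtx);
}

/* True only if the calling thread currently holds the mutex. */
bool
om_owned(struct owned_mutex *m)
{
	return atomic_load(&m->owner) == (uintptr_t)&self_token;
}

/* Hypothetical object whose page count must be read under its lock. */
struct model_uobj {
	struct owned_mutex vmobjlock;
	int                npages;
};

int
model_findpages(struct model_uobj *uobj)
{
	/* The diagnostic assertion: the caller must hold the object lock. */
	assert(om_owned(&uobj->vmobjlock));
	return uobj->npages;
}
```

A caller which does om_enter(&uobj->vmobjlock) first passes the check,
while one which forgets the lock trips the assertion right away instead
of silently racing on the object.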
If you still want us to dig out which lock the "tstile"d processes are
hanging on, I think we still need some instructions to do that.
Regards,
- Håvard