NetBSD-Bugs archive
kern/59412: uvmpdpol_pagerealize() queue index out of bound
>Number: 59412
>Category: kern
>Synopsis: uvmpdpol_pagerealize() queue index out of bound
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri May 09 15:40:00 +0000 2025
>Originator: Manuel Bouyer
>Release: NetBSD 10.1_STABLE
>Organization:
LIP6
>Environment:
System: NetBSD ftp.lip6.fr 10.1_STABLE NetBSD 10.1_STABLE (FTP10) #8: Wed May 7 13:24:58 CEST 2025 bouyer%armandeche.soc.lip6.fr@localhost:/local/armandeche1/tmp/build/amd64/obj/local/armandeche2/netbsd-10/src/sys/arch/amd64/compile/FTP10 amd64
Architecture: x86_64
Machine: amd64
>Description:
On this heavily-loaded web server I got this panic:
login: [ 74250.5367339] uvm_fault(0xffff8d781f092b50, 0xffff8d78b6230000, 2) ->e
[ 74250.5515087] fatal page faultfatal page fault in supervisor mode
[ 74250.5592053] trap type 6 code 0x2 rip 0xffffffff8066f407 cs 0x8 rflags 0x100
[ 74250.5776047] curlwp 0xffff8d782835d8c0 pid 0.4 lowest kstack 0xffffc01b20d80
kernel: page fault trap, code=0
Stopped in pid 0.4 (system) at netbsd:uvmpdpol_pagerealize+0x3d: movq %r12,0(%rdx,%rax,8)
uvmpdpol_pagerealize() at netbsd:uvmpdpol_pagerealize+0x3d
uvm_aio_aiodone_pages() at netbsd:uvm_aio_aiodone_pages+0x1a6
uvm_aio_aiodone() at netbsd:uvm_aio_aiodone+0xb9
dkiodone() at netbsd:dkiodone+0xb9
biointr() at netbsd:biointr+0x61
softint_dispatch() at netbsd:softint_dispatch+0x11c
DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xffffc01b20d8b0f0
Xsoftintr() at netbsd:Xsoftintr+0x4c
--- interrupt ---
0:
ds 80
es 1080
fs 2523
gs c95f
rdi ffffc000002ad980
rsi 0
rbp ffffc01b20d8ae70
rbx ffffffff80e416c0 boot_cpu.2
rdx ffff8d70b6231000
rcx ffff8d782835d8c0
rax fffffffe
r8 300000000000000
r9 8
r10 80
r11 ffffffffffff
r12 ffffc000002ad980
r13 0
r14 ffff8d7371b09500
r15 0
rip ffffffff8066f407 uvmpdpol_pagerealize+0x3d
cs 8
rflags 10282
rsp ffffc01b20d8ae50
ss 10
netbsd:uvmpdpol_pagerealize+0x3d: movq %r12,0(%rdx,%rax,8)
gdb shows that this movq corresponds to:
ucpu->pdq[--(ucpu->pdqhead)] = pg;
and at this point, pdqhead is 0xfffffffe
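For what it's worth, the register dump is consistent with that reading:
with %rdx presumably &ucpu->pdq[0], %rax the decremented pdqhead and
%r12 the pg argument, %rdx + %rax*8 = 0xffff8d70b6231000 +
0xfffffffe * 8 = 0xffff8d78b6230ff0, which falls in the page reported
by the initial uvm_fault() message (0xffff8d78b6230000).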
This server uses a mix of old-style partitions (where biodone()
is called from hard interrupt context) and wedges (where biodone()
is called from soft interrupt context).
My guess is that the soft interrupt thread may have been
hard interrupted between the ucpu->pdqhead == 0 check and the actual
use of ucpu->pdqhead, with the interrupting handler presumably
reaching uvmpdpol_pagerealize() itself via biodone().
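To illustrate, here is the rev 1.40 code path (quoted in the patch in
the Fix section below) annotated with where a hard interrupt would
have to land for pdqhead to underflow; the annotations are a
reconstruction of the suspected interleaving, not taken from the
crash dump:

	kpreempt_disable();		/* pins us to this CPU, but does
					 * not block hard interrupts */
	ucpu = curcpu()->ci_data.cpu_uvm;
	if (__predict_false(ucpu->pdqhead == 0)) {
		/* soft interrupt thread sees e.g. pdqhead == 1,
		 * so it skips the flush... */
		ucpu = uvmpdpol_flush();
	}
	/*
	 * ...but a hard interrupt lands here, and its biodone() path
	 * re-enters uvmpdpol_pagerealize(), consuming the last slot
	 * and leaving pdqhead == 0.
	 */
	ucpu->pdq[--(ucpu->pdqhead)] = pg;	/* stale check: pdqhead
						 * wraps 0 -> 0xffffffff */
	kpreempt_enable();

Once pdqhead has wrapped past zero, the == 0 test can never fire
again, so later calls just keep decrementing and storing outside
pdq[]; that would explain why the value caught in %rax is 0xfffffffe
rather than 0xffffffff.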
>How-To-Repeat:
If my guess is correct, reproducing this needs a heavily-loaded
server with a setup where biodone() is called from both software and
hardware interrupt context on the same CPU (such as a RAID controller
with both old-style partitions and wedges).
>Fix:
Running uvmpdpol_pagerealize() at splvm() seems to fix the issue for me,
as below:
Index: uvm/uvm_pdpolicy_clock.c
===================================================================
RCS file: /cvsroot/src/sys/uvm/uvm_pdpolicy_clock.c,v
retrieving revision 1.40
diff -u -p -u -r1.40 uvm_pdpolicy_clock.c
--- uvm/uvm_pdpolicy_clock.c 12 Apr 2022 20:27:56 -0000 1.40
+++ uvm/uvm_pdpolicy_clock.c 9 May 2025 14:47:31 -0000
@@ -770,16 +770,19 @@ void
 uvmpdpol_pagerealize(struct vm_page *pg)
 {
 	struct uvm_cpu *ucpu;
+	int s;
 
 	/*
 	 * drain the per per-CPU queue if full, then enter the page.
 	 */
 
 	kpreempt_disable();
+	s = splvm();
 	ucpu = curcpu()->ci_data.cpu_uvm;
-	if (__predict_false(ucpu->pdqhead == 0)) {
+	while (__predict_false(ucpu->pdqhead == 0)) {
 		ucpu = uvmpdpol_flush();
 	}
 	ucpu->pdq[--(ucpu->pdqhead)] = pg;
+	splx(s);
 	kpreempt_enable();
 }
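If I read the x86 IPLs right, splvm() raises the IPL to IPL_VM, above
both the disk hard interrupts (IPL_BIO) and the biointr soft
interrupt, so the pdqhead check and the store can no longer be
interleaved with another uvmpdpol_pagerealize() on the same CPU;
kpreempt_disable() alone only prevents migration to another CPU. The
if -> while change is belt and braces: it re-checks the queue after
uvmpdpol_flush() returns, in case it comes back still full.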