Subject: Re: request for testers of PDPOLICY_CLOCKPRO
To: None <current-users@NetBSD.org>
From: Brian de Alwis <bsd@cs.ubc.ca>
List: current-users
Date: 03/02/2007 11:07:50
I've continued to use PDPOLICY_CLOCKPRO for several weeks, and
noticed some other anomalous behaviour in the last couple of days.
I should note that my kernel is from the day before the newlock2
commits (I couldn't afford for this machine to be out of whack).
I can only describe the symptoms of my first issue: I haven't been
able to trace it to any cause. I've been doing some heavy memory-use
Java work using Eclipse and lots of database activity using Apache
Derby (a Java DB). I recently noticed that killing/exiting a
high-mem-consumption Java process (e.g., 100-200MB of memory) will
sometimes make the system appear to freeze. There is no disk
activity. It usually recovers after a short time (sometimes seconds,
sometimes tens of seconds), and I've sometimes noticed a pms0 reset
message on /dev/console shortly after it revives:
pms0: resetting mouse interface
I also experienced some *very* weird issues under X11 last night,
possibly related to this mouse problem, where the system was no
longer properly tracking focus. I could use the keyboard, but the
mouse wouldn't seem to work.
I have vm.coldtargetpct set to 40, and the file cache seemed fine
(about 140MB).
As for the second issue: I've also had about 5 panics since my last
report. Unfortunately I haven't been able to trace them, as something
is mucking up the stack:
(gdb) bt
#0 0xc04eb04c in cpu_reboot (howto=0, bootstr=0x0)
    at /usr/src/sys/arch/i386/i386/machdep.c:910
#1 0xc0454d58 in panic (fmt=0xc08f6cac "trap")
    at /usr/src/sys/kern/subr_prf.c:246
#2 0xc04f70de in trap (frame=0xccc955a8)
    at /usr/src/sys/arch/i386/i386/trap.c:336
#3 0xc010c4b2 in calltrap ()
#4 0xc04e7300 in db_read_bytes (addr=6, size=4,
    data=0xccc95614 "\bWÉÌ\bWÉÌ\004")
    at /usr/src/sys/arch/i386/i386/db_memrw.c:98
#5 0xc0198e23 in db_get_value (addr=6, size=4, is_signed=0)
    at /usr/src/sys/ddb/db_access.c:62
#6 0xc04e7afd in db_stack_trace_print (addr=-859220216, have_addr=1,
    count=65535, modif=0xc0918ce9 "", pr=0xc0454b40 <printf>)
    at /usr/src/sys/arch/i386/i386/db_trace.c:467
#7 0xc0454d2f in panic (fmt=0xc08f6cac "trap")
    at /usr/src/sys/kern/subr_prf.c:235
#8 0xc04f70de in trap (frame=0xccc957ac)
    at /usr/src/sys/arch/i386/i386/trap.c:336
#9 0xc010c4b2 in calltrap ()
#10 0xc07568f4 in memset ()
Previous frame inner to this frame (corrupt stack?)
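One detail in the trace can be checked mechanically: gdb prints the addr
argument of frame #6 as a signed 32-bit integer. A minimal Python sketch
(the helper name to_u32 is made up for illustration) converts it back to
the unsigned form used for the other addresses in the trace:

```python
import struct


def to_u32(value: int) -> int:
    """Reinterpret a signed 32-bit integer, as gdb prints it, as unsigned."""
    return value & 0xFFFFFFFF


# addr argument of db_stack_trace_print() in frame #6
addr = to_u32(-859220216)
print(hex(addr))

# The data string in frame #4 starts with "\bW\xc9\xcc", i.e. the bytes
# 08 57 c9 cc. Read as a little-endian 32-bit word they give the same value.
word = struct.unpack("<I", b"\x08\x57\xc9\xcc")[0]
print(hex(word))
```

Both values come out as 0xccc95708, which sits in the same 0xccc95xxx
range as the two trap frames. So it looks as though ddb was handed a
stack address to walk from, and the second trap happened while it was
reading addr=6 during that walk.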
I was in X, so I wasn't able to see the panic message, and it wasn't
in /var/log/messages. But reconstructing the details from the
trap() call in the core dump, it would have been something like:
fatal page fault in supervisor mode
trap type 6 code 2 eip c07568f4 cs 8 eflags 10293
I unfortunately can't get the cr2 and ilevel values.
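For anyone reading along, the numbers in that message can be decoded by
hand. The sketch below is only an illustration: it assumes the values are
printed in hex (as the eip clearly is), that trap type 6 is the i386 page
fault trap (T_PAGEFLT), and it uses the standard x86 bit layouts for the
page-fault error code and for eflags. The function names are my own.

```python
# Assumption: NetBSD/i386 numbers the page-fault trap as 6 (T_PAGEFLT).
T_PAGEFLT = 6

# x86 eflags bit positions (reserved bits such as bit 1 are left out).
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF",
    9: "IF", 10: "DF", 11: "OF", 16: "RF", 17: "VM",
}


def decode_pf_code(code: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "supervisor",
    }


def decode_eflags(eflags: int) -> list:
    """Return the names of the eflags bits that are set, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items())
            if eflags & (1 << bit)]


print(decode_pf_code(0x2))
print(decode_eflags(0x10293))
```

Under those assumptions, code 2 reads as a supervisor-mode write to a
page that was not present, which fits a fault inside memset, and the IF
bit in eflags says interrupts were enabled at the time.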
Going to the memset frame:
(gdb) info fr
Stack level 10, frame at 0xccc957ac:
eip = 0xc07568f4 in memset; saved eip 0xccc957ac
caller of frame at 0xccc957b0
Arglist at 0xccc957a4, args:
Locals at 0xccc957a4, Previous frame's sp is 0xccc957ac
Saved registers:
eip at 0xccc957a8
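The "saved eip" above stands out when set against the other addresses in
the backtrace. A rough sketch of that comparison follows; the
looks_like_text heuristic is made up for illustration and only compares
the top 12 bits of an address with code addresses already seen in the
trace.

```python
TRAP_FRAME = 0xccc957ac  # frame= argument of trap() in frame #8
SAVED_EIP = 0xccc957ac   # "saved eip" gdb reports for the memset frame

# Code addresses taken from frames #0, #1, #2, #3 and #10 of the backtrace.
TEXT_ADDRS = [0xc04eb04c, 0xc0454d58, 0xc04f70de, 0xc010c4b2, 0xc07568f4]


def top_bits(addr: int, shift: int = 20) -> int:
    """Keep only the top 12 bits of a 32-bit address."""
    return addr >> shift


def looks_like_text(addr: int) -> bool:
    """Heuristic only: does addr share its top bits with a known code address?"""
    return any(top_bits(addr) == top_bits(t) for t in TEXT_ADDRS)


print(SAVED_EIP == TRAP_FRAME)      # True
print(looks_like_text(SAVED_EIP))   # False
print(looks_like_text(0xc07568f4))  # True
```

So the value gdb takes for memset's return address is the trap frame
address itself rather than anything in kernel text, which is consistent
with the "corrupt stack?" complaint. It says nothing about who actually
called memset.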
Unfortunately the first behaviour is too frustrating, so I've
reverted to a kernel with the normal paging policy. I'll see if
that exhibits the same problems.
Brian.
--
Brian de Alwis | Software Practices Lab | UBC | http://www.cs.ubc.ca/~bsd/
"Amusement to an observing mind is study." - Benjamin Disraeli