tech-kern archive

Re: NetBSD-10.0/i386 spurious SIGSEGV



On Sat, Jun 08, 2024 at 10:10:58PM -0400, Mouse wrote:
> First thing I'd look at is the userland instruction(s) around the crash
> point, maybe look at instructions starting at 0xbb610480 or something
> and then disassemble forwards looking for 0xbb610579.  In particular,
> I'd be interested in whether it's a store instruction that failed or
> whether this happened during a syscall trap.

   0xbb610570 <__gettimeofday50>:	mov    $0x1a2,%eax
   0xbb610575 <__gettimeofday50+5>:	int    $0x80
   0xbb610577 <__gettimeofday50+7>:	jb     0xbb61057a <__gettimeofday50+10>
=> 0xbb610579 <__gettimeofday50+9>:	ret  

> Are all the failures in __gettimeofday50?  All in trap-to-the-kernel
> calls?

I have seen many crashes on system call return. Here is another one in
__gettimeofday50:

   0xbb610570 <__gettimeofday50>:	mov    $0x1a2,%eax
   0xbb610575 <__gettimeofday50+5>:	int    $0x80
   0xbb610577 <__gettimeofday50+7>:	jb     0xbb61057a <__gettimeofday50+10>
   0xbb610579 <__gettimeofday50+9>:	ret    
=> 0xbb61057a <__gettimeofday50+10>:	push   %ebx

Another one:
   0xbb610570 <__gettimeofday50>:	mov    $0x1a2,%eax
   0xbb610575 <__gettimeofday50+5>:	int    $0x80
=> 0xbb610577 <__gettimeofday50+7>:	jb     0xbb61057a <__gettimeofday50+10>
   0xbb610579 <__gettimeofday50+9>:	ret  

At first I suspected a stack problem, but I think the last crash proves
this is not the case: the faulting jb instruction performs no data
memory access.
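
The reasoning about the three crash sites can be sketched as a small script. It picks out the instruction gdb marks with `=>` in each listing and looks up what data memory access that instruction performs. The helper name `faulting_instruction` and the `MEMORY_ACCESS` table are hypothetical, written only for this illustration, and the table covers only the three mnemonics seen at the crash points.

```python
import re

# What each faulting mnemonic does to data memory (illustrative table,
# limited to the instructions marked "=>" in the listings).
MEMORY_ACCESS = {
    "ret": "reads the return address from the stack",
    "push": "writes to the stack",
    "jb": "none (only tests the carry flag)",
}

# Matches one gdb disassembly line: optional "=>" marker, address,
# <symbol+offset>, then the mnemonic.
LINE_RE = re.compile(r"^(=>)?\s*(0x[0-9a-f]+)\s+<([^>]+)>:\s+(\S+)")

# The marked line from each of the three crashes, copied from the listings.
CRASHES = [
    "   0xbb610577 <__gettimeofday50+7>: jb     0xbb61057a\n"
    "=> 0xbb610579 <__gettimeofday50+9>: ret",
    "   0xbb610579 <__gettimeofday50+9>: ret\n"
    "=> 0xbb61057a <__gettimeofday50+10>: push   %ebx",
    "   0xbb610575 <__gettimeofday50+5>: int    $0x80\n"
    "=> 0xbb610577 <__gettimeofday50+7>: jb     0xbb61057a",
]


def faulting_instruction(listing):
    """Return (address, mnemonic) of the line marked '=>', or None."""
    for line in listing.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(1):
            return m.group(2), m.group(4)
    return None


if __name__ == "__main__":
    for listing in CRASHES:
        addr, mnemonic = faulting_instruction(listing)
        print(addr, mnemonic, "->", MEMORY_ACCESS[mnemonic])
```

Only the first two crash sites touch the stack; the third faults on a conditional jump, which is why a corrupted stack pointer alone would not explain all three.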

> You say "multiple machines"; are those multiple domUs on a single dom0,
> or are they spread across multiple underlying hardware machines? 

It happens on multiple hardware machines, and it starts when the domU
is upgraded. I even tested moving a domU from one machine to another,
and the bug followed. Other netbsd-9 domUs on the same dom0 have
no problem, or at least it is rare enough that I have not noticed it
in years.

> If the latter, how similar are those underlying machines? 

Same model:
vcpu3: Intel(R) Xeon(R) CPU E3-1220 v6 @ 3.00GHz, id 0x906e9


-- 
Emmanuel Dreyfus
manu%netbsd.org@localhost

