Port-sparc64 archive


Re: Netbsd v 8.0 fails to boot on spare ultra 45



On Sun, 23 Dec 2018, Michael L. Hitch wrote:

On Sat, 22 Dec 2018, Jerome Ibanes wrote:

Could you try a GENERIC.UP kernel and/or disable the audio driver?

Thank you,
Jerome

I haven't tried GENERIC.MP, but I have effectively disabled the audio driver (forced the code to never match, which is quicker than removing it from the config and rebuilding the entire kernel). It is able to boot after that, although I didn't run that kernel very long.

  Oops - I meant GENERIC.UP.  None of those worked either.

 I was able to track down the specific commit of audio.c at which the crashes start.

/*      $NetBSD: audio.c,v 1.335 2017/05/06 00:13:25 nat Exp $  */
 This one boots.

/*      $NetBSD: audio.c,v 1.337 2017/05/08 07:31:34 martin Exp $       */
 This one crashes with SIR.

I can't see any reason why that change would cause this. I'm starting to suspect the problem is actually elsewhere and that this one change just causes the real problem to occur consistently.

I was trying to see how far the audio attach got on a current netbsd-8 tree, and got to a point where kmem_zalloc() is called but never returns. With the tree I used to locate the audio.c commit that fails, it appears to get much further in the audio attach code before it fails.

  Heading down that rabbit hole.....

I replaced the 1.337 version of audio.c with the 1.335 and found that the kernel again would boot - but when I enabled AUDIO_DEBUG to get more information, it would crash with SIR again.

One other possibly related problem is that a few times I had sshd fault on startup at boot. I wasn't paying much attention to which kernel that was at the time.

I started running native builds and getting segment violations, and worked my way to earlier versions of the tree and found that the problems started when gcc was switched to 5.3 in May 2016. I even managed to hit a kernel that got the SIR when using gcc 5.3. Building that with gcc 4.8 worked fine.

Then I decided to try DDB with the failing kernel. After refreshing my memory of sparc64 assembly (I was rather rusty), I noticed one difference between a working kernel and a crashing kernel: the stack pointer was somewhat lower in the crashing kernel, and it looked like the new gcc used more stack space than gcc 4.8 did. I suspected at that point that perhaps the kernel stack was overwriting the pcb. I changed the stack size (SSIZE in param.h) to 4 pages initially and the kernels would now boot. I then dropped it to 3 pages and it was still OK.
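
For readers who want to check a stack-depth suspicion like the one above, one common technique is to "paint" the stack region with a known byte pattern and later scan for how much of the pattern survives. The following is only a minimal user-space sketch of that idea, not code from the NetBSD tree: the names stack_paint, stack_high_water and pages_needed, the fill byte, and the 8 KB page size are all assumptions made for illustration, and it models a stack that grows downward from the high end of a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 8192u  /* assumption: 8 KB pages */
#define FILL_BYTE  0xA5   /* arbitrary paint pattern (hypothetical) */

/* Fill the whole stack region with a known pattern before it is used. */
void stack_paint(unsigned char *base, size_t len)
{
    assert(base != NULL);
    memset(base, FILL_BYTE, len);
}

/*
 * Return how many bytes of the region have been touched, assuming the
 * stack grows down from base + len toward base.  Bytes at the low end
 * that still hold the pattern are counted as untouched.
 */
size_t stack_high_water(const unsigned char *base, size_t len)
{
    size_t untouched = 0;

    assert(base != NULL);
    while (untouched < len && base[untouched] == FILL_BYTE)
        untouched++;
    return len - untouched;
}

/* Round a byte count up to whole pages. */
size_t pages_needed(size_t bytes, size_t page_bytes)
{
    assert(page_bytes != 0);
    return (bytes + page_bytes - 1) / page_bytes;
}
```

As a usage example: paint a two-page buffer, overwrite its top 10000 bytes to simulate stack use, and stack_high_water() reports 10000, which pages_needed() rounds up to 2 pages. The measurement is a lower bound only: it undercounts if the deepest bytes written happen to equal the fill byte, or if stack space was reserved but never written.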

Now I had kernels that would boot consistently, but I was still having problems running native builds. I remembered looking at the port-sparc64 mailing list from the time when sparc64 reverted back to gcc 4.8, and a message there about a problem. When I tracked down the change that fixed it, I realized that the fix was to ld.elf_so, and was not in the 7.0 release that I was running all this on. One of my builds had a ld.elf_so that included the fix, and once I updated ld.elf_so, I was able to run complete native builds with no problems. I ran that with both 8.0_STABLE and current kernels.

  So it looks like a larger kernel stack is now needed for sparc64.

Mike

---
Michael L. Hitch                        mhitch%montana.edu@localhost
Operations Consulting,  University Information Technology
Montana State University, Bozeman, MT     USA

