Port-sparc archive

Re: slot-based crash! not SBusFPGA (Re: CG14 in 8-bit color)

On Sat, 8 Jan 2022 at 09:54, Romain Dolbeau <romain%dolbeau.org@localhost> wrote:
> Hello,
> During the development of framebuffer emulation in my SBusFPGA, I had
> this strange bug:
> On Fri, 3 Dec 2021 at 20:29, Romain Dolbeau <romain%dolbeau.org@localhost> wrote:
> > Neither, it doesn't crash while accessing hardware in X11, that's part
> > of why it's been a pain to diagnose. When using my own
> > re-implementation of a cg6 or a cg3 (and the cg3 is a very passive and
> > simple device), X11 works apparently fine (visually speaking), but
> > after I exit X11 the kernel will crash sooner or later (this does not
> > happen with a real cg6 of course, so it concerns literally just me! I
> > don't have a real cg3 handy to triple-check). It happens even when my
> > device is set to a lower resolution like 1024x768 or 1152x900. If I
> > deliberately use swap while running X11, it apparently gets corrupted;
> > if I use swap after leaving X11, a crash is guaranteed, and it's
> > always in 'pv_syncflags4m' IIRC.
> I finally tracked it down ... to the SBus slot, as far as I can tell!
> (I wasn't moving the real TGX+ around, it was in slot 3 while the
> SBusFPGA was in or out of slot 1).
> After emulating a cg3/cg6 with a genuine PROM (so the SW stack was
> strictly identical), my design would still crash.
> I tried single-headed by removing the 'real' TGX+, but then also moved
> the SBusFPGA to a different slot (from 1 to 2). No crash.
> Back to 1, crash again.
> Try again in 2, fine.
> In 2 with my own PROM and other devices, fine.
> Remove SBusFPGA, put the real TGX+ in slot 1 ... and crash again. With
> no SBusFPGA in the machine.
> No idea why this happens, but while the SBusFPGA with other devices
> was fine in that slot, I can't use the slot for a TGX+, real or emulated.
> I will have to try with a different SS20 to see if it's a design issue
> (unlikely!) or if the machine is somehow damaged.
> I did plug and unplug the SBusFPGA quite a lot, and it was the easiest
> slot to work with, so it may have taken some damage. Yet USB and the
> RAM disk from SBusFPGA seem OK in it.
> It did get me to code a hardware-based initializer and a bw2 emulated
> device, so it's not a complete waste of time (I'll probably need both
> if I ever get around to getting a NuBusFPGA made).

That's a little wild (though congratulations on finally finding it!)

I wonder what specific usage is triggering the issue - could something
be triggering a misdirected write to an incorrect location, corrupting
kernel memory, or maybe something goes awry under some form of
sustained load...

Very random thoughts (if it shows up on another SS20):
- Known hardware issue: Could slot 1 not be certified for
framebuffers? I vaguely recall, way (way) back, some documentation
(possibly SunOS 4) stating that a Sun framebuffer would not work in
some slots on a certain model, but it's been too long for me to
remember the details.
- Software issue: Does SunOS/Solaris on the box show the issue (in
case it's a usage pattern that only NetBSD triggers)?
- Specific trigger: It might be possible to run as a cg3 and "do less"
to find the minimum needed to trigger it, e.g. does it trigger if the
device is only probed and attached as a console device, or if a delay
is added to every write to it? This would only be worth looking at if
the first two items yield nothing, and likely only "for curiosity".

Anyway, great catch for what must have been an almost unbelievably
frustrating issue :)

