Port-vax archive
Re: KA655 Kernel Compile and Corrupt Object Files
On 2025-05-24 18:09, Josh Moyer wrote:
Thanks all for your thoughts. The CPU (and ECC memory) really do seem fine, even under load -- including when pages are swapped back in over the network. A DMA- and/or load-related problem in the DEQNA or its driver seems likely to me for a few reasons:
- DEQNAs are seemingly notorious for poor behavior...
- ...indeed, I recently noticed that mine is occasionally producing this kernel message: "qe0: xmit logic died, resetting..."
- ... and, relatedly, my reseller* told me that a minimum revision of the DEQNA (which I *did* meet or exceed for both of my DEQNAs) was required for proper interop with the KA655, suggesting that some issues were seen.
The problems with the DEQNA have nothing to do with corrupting data or
anything like that. The problem is that it can get stuck. There were
numerous firmware upgrades, but DEC never really solved it. The DEQNA
isn't built around any normal microprocessor, but has some weird
hardware design.
VMS eventually decided to stop supporting it. The PDP-11 OSes
continued to support it, but their drivers have code to detect and
recover from a stuck controller.
The "qe0: xmit logic died, resetting..." is something I've seen on lots
of VAXen with different controllers, so I doubt that is relevant. I
don't know what triggers the message. Never dove into it. But I'm pretty
sure my 4000/90 says the same, except ze0 or something like that.
But maybe that is related to some code that supposedly would recover a
stuck DEQNA...?
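
For illustration, recovery code of that kind usually takes the form of a transmit watchdog: arm a timer when a packet is queued, disarm it when the controller reports completion, and log and reset if the timer runs out first. The sketch below is a small Python simulation of that pattern, not code taken from if_qe.c. FakeNic, Driver, and the 5-tick timeout are all hypothetical names and values chosen for the example.

```python
class FakeNic:
    """Hypothetical controller model; not a model of the real DEQNA."""

    def __init__(self):
        self.stuck = False    # when True, transmits never complete
        self.pending = 0      # packets queued but not yet completed
        self.resets = 0       # how many times reset() was called

    def start_xmit(self):
        self.pending += 1

    def poll_complete(self):
        # A healthy controller finishes everything queued; a stuck one
        # finishes nothing.
        if not self.stuck:
            self.pending = 0

    def reset(self):
        self.stuck = False
        self.pending = 0
        self.resets += 1


class Driver:
    TIMEOUT_TICKS = 5  # assumed value, for illustration only

    def __init__(self, nic, name="qe0"):
        self.nic = nic
        self.name = name
        self.timer = 0        # 0 means the watchdog is disarmed
        self.log = []

    def send(self):
        self.nic.start_xmit()
        self.timer = self.TIMEOUT_TICKS   # arm the watchdog

    def tick(self):
        """Called once per clock tick."""
        self.nic.poll_complete()
        if self.nic.pending == 0:
            self.timer = 0                # transmit finished: disarm
            return
        if self.timer > 0:
            self.timer -= 1
            if self.timer == 0:
                # Transmit never completed: report it and reset the controller.
                self.log.append(f"{self.name}: xmit logic died, resetting...")
                self.nic.reset()
```

With a healthy FakeNic the log stays empty; setting nic.stuck = True before a send() produces exactly one log line and one reset after TIMEOUT_TICKS ticks. Under this pattern the message only says that a transmit did not complete in time, which would fit it showing up with different controllers.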
I will be double-checking my original assertions and looking closely at the object files to see if their contents are recognizable at all as compared to the known-good simh build. I agree with mouse's suspicion that a successful link of the hardware-compiled object files would likely produce an inoperable kernel, and I may test to confirm. I MIGHT also try testing with my hardware KA630, although math tells me a few weeks would be required for the compile.
It would be really interesting if you could get a build to complete, and
see what happens if you try to run it.
As for reproducible builds, while the date-times may differ, I wouldn't expect object files to change between builds (aside from maybe vers.o), given the same toolchain options and source inputs**. Am I mistaken? I will try to learn more about GCC and how it writes object files. I'll also be looking at if_qe.c. I've never worked on a compiler, nor a kernel-mode (and network) driver before, so this should be fun! Reading list suggestions appreciated.
Agreed. Which is why I was basically a bit surprised that the files do
differ. But I might be missing something?
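
One way to see how the files differ, rather than just that they differ, is to compare the two builds byte by byte and look at where the first mismatch falls and how many bytes are affected. A minimal Python sketch follows; the function names are made up for this example, and it treats the object files as opaque bytes without parsing their format.

```python
from pathlib import Path


def diff_summary(a: bytes, b: bytes):
    """Return (identical, first_diff_offset, differing_byte_count).

    Bytes present in only one input (length mismatch) count as differing.
    """
    common = min(len(a), len(b))
    diffs = [i for i in range(common) if a[i] != b[i]]
    extra = abs(len(a) - len(b))
    if not diffs and extra == 0:
        return True, None, 0
    first = diffs[0] if diffs else common
    return False, first, len(diffs) + extra


def compare_files(path_a, path_b):
    """Compare two files on disk, e.g. the same .o from two build trees."""
    return diff_summary(Path(path_a).read_bytes(), Path(path_b).read_bytes())
```

Calling compare_files() on the same object file from the simh build tree and the hardware build tree (paths of your choosing) gives a quick first look. A handful of differing bytes at a fixed offset would point toward something like an embedded timestamp, while large or scattered differences would look more like corruption.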
I WOULD like to test with another NIC, but all I have is another DEQNA. I cannot acquire a DELQA(/T) at this time. Is anyone willing to loan a DELQA out? (I'm in the northwest of the US -- Seattle.) I believe the DEQNA and DELQA use the same (BA123) cabinet kit, I thought I heard once?
Yes, the cabkit is the same. But I seriously doubt that would make a
difference.
Of course, the DEQNA could be really broken, but then I wouldn't expect
things to work at all.
All of that will take time***, and I HAD hoped to prioritize some improvements to the npf parser as my debut NetBSD and current spare-time project... so I guess we'll see which task wins.
Heh. Yeah, part of the problem really is how slow builds have
become... :-(
Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: bqt%softjar.se@localhost || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol