Port-vax archive


Re: NetBSD/vax - worth continuing?



On 2016-10-19 18:50, Anders Magnusson wrote:

After lots of experimentation and playing around, I think the problem
is not related to losing buffers. I'll try to explain my
observations, but this requires a bit of describing my setup as well,
so bear with me.
Hm, I still think it can be a reason :-)

I can't rule it out, but it feels unlikely that we would only lose buffers on one controller, and on the less used one at that (looking at overall activity).

Having skipped ccd, and just working on disks on the first UDA-50, the
system seems to not have any problems. But when I do disk operations
on the second UDA-50, sooner or later, the process gets stuck in
biowait, and never recovers.
It may be a ring buffer sync error, or so.  Since it only occurs on the
second controller it is likely a software bug somewhere.
I do not think it is a hardware error.

Agreed. It definitely feels like a software bug. Here too I cannot rule out a hardware problem, but I think software is the most probable cause.

But for now, this really smells as if we have some kind of issue with
additional Unibuses in NetBSD. An interesting detail is that, looking at
vmstat -i, I can see that uba0 has generated some interrupts, but
uba1 has never generated any interrupts.
Interrupt count for the adapters themselves should only be counted in
case of error, see uba_dw780int_common().
Otherwise interrupts are counted against the device.

Right.

Ragge, what interrupts would the Unibus adapter generate, and does it
make sense that only one of the adapters is generating interrupts?
No, if only one adapter would interrupt then the second disk controller
wouldn't work at all.

So why is vmstat -i only showing interrupts on uba0? The interrupt counter for uba1 is at zero.

It may be a problem with having a secondary unibus adapter.  Would it be
possible to move the UDA50 to the first unibus (for testing purposes)?

Possible, but it will take some time, as I need to convince people near the machine to replumb things. I can do lots of testing around xmas, when I'm in Sweden, but for now I only have console access, and any hardware modification needs to be carried out by someone at Update with some spare time and skills.

If that works then we have an idea about where to search for the bug at
least :-)

True. It could be an interesting test.

...and yes, using BDPs should probably help performance.

Performance is something we can worry about once we've figured out the actual problem causing the machine to hang.

By the way, in case it matters, the Unibuses are actually DW0 and DW2 (tr3 and tr5). DW1 (tr4) is not installed. All three Unibuses are prewired in the cabinet, but DW2 has an internal Unibus in the CPU cabinet, and DW0 is the standard Unibus in the expansion box that always exists.

Just in case that might be something that could affect things as well.

	Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt%softjar.se@localhost             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

