Port-amiga archive


Re: CF & DOM IDE timeouts (Was: NetBSD/amiga sysinst)

On Fri, 3 Mar 2023 at 17:12, John Klos <john%ziaspace.com@localhost> wrote:
> Hi,
> > That should be an excellent test case to see if the issue is still
> > present in HEAD and netbsd-10.
> >
> > (If the '600 _doesn't_ show the same issue with netbsd-9 then that
> > would be a different (but also useful) data point :)
> I have a data point to add:
> I have a colocated Amiga 1200 (http://lilith.zia.io) which boots the
> NetBSD bootblocks and kernel from a CF card on IDE, with root set to sd0,
> which is on a Blizzard 1260's SCSI. phase5's SCSI card doesn't work with
> the NetBSD bootblocks, which is why it's done this way.
> With NetBSD 9, I could not write to the CF card at all - all attempts to
> write would lock. This meant that all updates required that I visit the
> datacenter and upgrade the kernel by mounting the CF card on another
> machine.
> I recently visited the datacenter and upgraded it to netbsd-10 from
> 9-January-2023 and tested writing to the CF card. It worked! But when I
> tried doing lots of writes, like untargzipping base.tgz on the CF, it
> eventually locked:
> [  1176.477218] autoconfiguration error: wdc0:0:0: lost interrupt
> [  1176.494223]         type: ata tc_bcount: 8704 tc_skip: 7680
> [  1176.505040] wd0a: device timeout writing fsbn 3114031 of 3114016-3114047 (wd0 bn 3729919; cn 3700 tn 5 sn 4), xfer 1f30, retry 0
> [  1486.716389] autoconfiguration error: wdc0 channel 0: reset failed for drive 0
> [  1497.766719] wdc0:0:0: wait timed out
> [  1497.780357] wd0a: device timeout writing fsbn 3114016 of 3114016-3114047 (wd0 bn 3729904; cn 3700 tn 4 sn 52), xfer 1f30, retry 1
> [  1508.287022] wdc0:0:0: wait timed out
> Ad infinitum (well, until hard reset).

OK, so netbsd-10 appears to be "better" than netbsd-9, if not fixed
for this case. At least things are moving in the right direction!
(though it may make it harder to determine when things are fixed :)

The reset printf comes from:

        wdc->reset(chp, poll);

I wonder if:
- NetBSD pokes the chipset at the wrong time or in the wrong way and
  manages to wedge it, hence the lost interrupts and hangs
- wdc->reset() is failing to reset things
- __wdcwait_reset() is failing to read a good state (too short timeout or...?)

It might be interesting to see what a kernel with ATADEBUG shows,
though there is a possibility that outputting the debug will change
the timing enough not to trigger the issue. It may also be interesting
to see if running with polled I/O rather than interrupts avoids the
issue (though the latter definitely just for testing).

I'm not sure I'd feel right encouraging testing of that nature on a
colo datacentre machine, though :)
