Subject: Re: ahc bug in current? (was: ccd changed in current?)
To: Michael L. VanLoon -- HeadCandy.com <michaelv@mindbender.serv.net>
From: None <tober@albino.ir.bbn.com>
List: current-users
Date: 08/29/1997 14:34:12
> [...] 
> To give more info on the installed hardware, here's the dmesg output
> from booting my 1.2+ kernel:
> 
> NetBSD 1.2 (MINDBENDER) #417: Tue Mar 11 21:47:49 PST 1997
>     michaelv@MindBender.serv.net:/u/src/sys/arch/i386/compile/MINDBENDER
> [...]
> ppb0 at pci0 dev 11 function 0: Digital Equipment DECchip 21050 PCI-PCI Bridge (rev. 0x02)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> pci1 at ppb0 bus 1
> ahc1 at pci1 dev 4 function 0
> [...]

While this may or may not be the problem, you should know that the DECchip
21050 is a first-generation PCI-to-PCI bridge and contains a serious design
flaw: it is susceptible to deadlock.  When a downstream and an upstream
transaction try to proceed through a 21050 simultaneously, it is possible
that neither transaction will ever complete.
This depends on how "aggressive" the devices involved are with respect to
retrying a transaction that encountered a temporary failure.  This happens
because the 21050 can work on only one transaction (either upstream or
downstream) at a time and it does not implement PCI delayed reads.  Thus
when, for example, an upstream transaction is starting while the 21050 is
in the midst of a downstream read, the 21050 will back off the master on
the subordinate bus but then be unable to finish its downstream transaction
because the subordinate bus device is busy.  This can go on indefinitely.
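
To make the circular wait concrete, here is a toy model in Python.  It is
only a sketch under stated assumptions, not a description of the real 21050
logic: the function name and the retry_limit knob are hypothetical, and the
"pause for one cycle" behaviour simply stands in for a device that is less
aggressive about retrying.

```python
def run_bridge(retry_limit, max_cycles=1000):
    """Return the cycle on which both transactions finish, or None on lockup.

    retry_limit (hypothetical knob): how many consecutive times the device
    on the subordinate bus re-issues its upstream request before it pauses
    for one cycle.  None means it never pauses (maximally "aggressive").
    """
    downstream_done = False
    consecutive_retries = 0
    for cycle in range(max_cycles):
        if not downstream_done:
            # The bridge is committed to the downstream read, and it can
            # only work on one transaction at a time.
            pausing = retry_limit is not None and consecutive_retries >= retry_limit
            if pausing:
                # The device is off the bus this cycle, so it can answer as
                # a target and the downstream read completes.
                downstream_done = True
                consecutive_retries = 0
            else:
                # The device requests the bus as a master; the bridge backs
                # it off.  The device is therefore busy, and the downstream
                # read makes no progress either.
                consecutive_retries += 1
        else:
            # The bridge is free again, so the upstream transaction passes.
            return cycle
    return None  # neither side ever made progress


if __name__ == "__main__":
    print(run_bridge(None))  # a device that never pauses locks up
    print(run_bridge(3))     # a device that pauses eventually gets through
```

In this model a device that never pauses in its retries locks up forever,
while one that pauses occasionally gets both transactions through.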

Whether the deadlock situation will occur is difficult to predict.  It depends
on device implementation details.  It becomes much more likely that it will
happen when the same upstream device and the same downstream device are
simultaneously attempting to master each other.  It becomes more likely with
additional loading on the buses involved.  It also becomes very much more
likely with multiple-level PCI-to-PCI bridging with the 21050 (e.g. a PPB on
the main PCI bus attaches a subordinate PCI bus, which itself has a PPB
attaching a second subordinate bus, and a bus-mastering device sits on that
second subordinate bus).  This deadlock situation
can readily be detected with a PCI bus analyzer, though not everyone has
access to one.

If it is practical to do so, I suggest trying a different PPB such as the
DECchip 21150, which complies with the PCI 2.1
PPB specification and implements delayed reads.  Delayed reads solve the
deadlock problem present in the 21050 by allowing a single read through a
PPB to take place as two PCI transactions.  In the first transaction, the PPB
finds out what address the device wants to read but doesn't actually do the
read through the bridge at that time.  At some time when it is able to, the
bridge actually does the read on the other attached bus, storing the result
in an internal register which is paired with the register where it stored
the address before.  Finally, the original master eventually retries its
read and gets an immediate response because the bridge has stored the result.
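
The delayed-read sequence described above can be sketched as a small Python
model.  This is an illustration only: the class, method, and attribute names
are made up, the far bus is modelled as a plain dict, and a real bridge has
ordering and buffer-discard rules that are ignored here.

```python
RETRY = object()  # sentinel meaning "the bridge told the master to retry"


class DelayedReadBridge:
    """Toy PPB with one address register paired with one data register."""

    def __init__(self, far_bus_memory):
        self.far_bus_memory = far_bus_memory  # dict: address -> value
        self.latched_address = None
        self.latched_data = None

    def read(self, address):
        """A read attempt by the master; returns the data or RETRY."""
        if self.latched_address == address and self.latched_data is not None:
            # Final step: the result is already stored, answer immediately.
            data = self.latched_data
            self.latched_address = None
            self.latched_data = None
            return data
        if self.latched_address is None:
            # First transaction: only note what the master wants to read.
            self.latched_address = address
        return RETRY

    def service(self):
        """Called whenever the bridge is able to use the other bus."""
        if self.latched_address is not None and self.latched_data is None:
            # Do the actual read on the far bus and store the result.
            self.latched_data = self.far_bus_memory[self.latched_address]


if __name__ == "__main__":
    bridge = DelayedReadBridge({0x1000: 0xAB})
    print(bridge.read(0x1000) is RETRY)  # address latched, master backs off
    bridge.service()                     # bridge reads the far bus when free
    print(hex(bridge.read(0x1000)))      # the retry completes at once
```

The point of the sketch is that the bridge is never stuck holding the
master's bus while it waits for the far bus, so it remains free to pass
traffic in the other direction between the two halves of the read.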

It is possible that an optimization made to the ahc driver at some point,
causing it to use DMA more, could have made deadlock much more likely.
-ben