Subject: Strange PCMCIA behavior on hpcmips...
To: None <tech-kern@netbsd.org>
From: Matthew Orgass <darkstar@city-net.com>
List: tech-kern
Date: 07/29/2004 15:29:36
  I have been trying to get my updated slhci driver (a CF USB card)
working on my hpcmips machine (a clio C-1000, Vr4111).  The PCMCIA chip on
this machine is the Vadem VG-469.  Cache is VIPT 16KB I, 8KB D, direct
mapped, no L2, 16 byte line.  D cache is write-back.  ISA bus runs at
7.01MHz.  I am running the 2.0 branch.  I have run many tests, but I don't
know what to make of the results.  The driver works on my i386 machine
with LOCKDEBUG (though the rest of the USB code does not have locking) and
also does not show LOCKDEBUG problems on the Clio.  I tend to think the
problems are not a result of driver bugs, partly due to being unable to
think of any driver bugs that could cause the symptoms I see, although I
am having difficulty thinking of any problems that could cause this
behavior.  I have verified that all card access occur at the proper SPL.
Hopefully someone here will have some ideas.

  My driver configures the device to generate an interrupt every 1ms on
SOF and at the completion of every every transfer.  Each individual bus
transaction must be set up separately.  There are two sets of transfer
registers, however most of the tests I have been doing never have more
than one transfer pending at a time.

  One problem unrelated to my driver is that some CF cards do not work on
the clio, but do on my (fast) i386 machine also running the 2.0 branch.
I mention this first becaue the most serious problem I am seeing with my
driver is bad PCMCIA reads (or at least appears to be bad PCMCIA reads).
The card I have tested most frequently contains a dos file system and
works fine from CE on the Clio.  The CF card I use as the main drive of
the system works perfectly under NetBSD;  near and occasionally during
many of the tests I have done I also run md5 on a large binary (larger
than RAM size) and it has never shown any difficulties or produced an
incorrect md5, nor does it seem to have any effect on the result of the
test.  I have two cards that work and two that do not.  One of the drives
that does not work is not detected at all (last line printed is atabus at
...).  The other is detected but shows incorrect data (in particular,
indicates chs addressing rather than LBA and reports incorrect size).
When I tried a cache disabled kernel (via HPCMIPS_L1CACHE_DISABLE) this
card did not detect at all, however when tested with delay(5) on each side
of bus_space_{read,write}_1 (which for current hpcmips bus_space means all
multi, etc. functions also) I was able to mount it with a few soft errors.
I don't recall seeing error after it mounted, although I didn't do much
(just ls a directory or two).  I did not test it with smaller delays.  I
have also noticed that the two cards that work have longer delays between
the atabus line and the drive line, even on the pc.

  The main problem I am seeing is that PCMCIA reads appear to occasionally
be bad.  One of the first ways I noticed this is when testing audio:
setting the volume many times in a row to the same value via mixerctl will
sometimes show an old or new volume value different than the value being
set.  At times, register reads of status values in the slhci driver will
be incorrect, so it is not just bus data that is being corrupted.  One
kernel I tested often fails to attach the card with "unknown chip
revision", meaning the very first read was bad.  When plugging in my umass
device (a cd drive) it often fails to attach saying that the drive is
fixed.  The cd I am testing with has a number of pages of all ones (0xff)
near the beginning of the drive.  I genearlly test with the following
command: dd if=/dev/cd0a bs=1024 skip=10 count=4 | less

  The data returned has some regular errors: with the current hpcmips
bus_space it is usually one byte starting with the second byte and
repeating every 512 bytes (511 good bytes between two bad values).  The
error byte is 0 half the time or so, and arbitrary other values the rest
of the time.  Repeating the command causes the bad values to change, but
stay in the same spots.  I have also implemented a more efficient
bus_space for hpcmips that does the read or write multi loop in four
instructions.  With this bus_space, there are two bad bytes and the second
bad byte is always the same value as the first (still starting with the
second byte and repeating every 512 bytes).  If I break at __bs_rm_1
(which is called from bus_space_read_multi_1 macro) with my bus_space and
single step through reading the first several bytes then the two bytes
after I stop single stepping will be bad and the 513th and 514th bytes
from the beginning will be bad.  Note that individual
bus_space_read_multi_1 calls are for 64 bytes, since for this chip each
bus transaction must be set up separately.  There are several other
register writes and reads for each such transaction.

  If I single step through a few bytes at the beginning of read then
continue to a break point at the return of that __bs_rm_1 call the bad
bytes appear in the buffer already, so it does seem to be from the card
read (I have also tried with splhigh() in __bs_rm_1: same behavior).  One
time when I tried adding a delay(1) on each side in the read_1 and write_1
of the current bus_space, I got for the first 512 bytes one bad byte at
the second byte, but then in the second 512 bytes there were two good
bytes and then 16 bad bytes.  This made me think cache might have
something to do with it, although I can't think of any good reason how
cache could affect it (and the 16 bad bytes are not cache aligned).
Running with HPCMIPS_L1_CACHE_DISABLE caused the problem to go away, but
so did adding delay(5) in the current bus_space read_1 and write_1 (which
was estimated based on calculations of average time for the long reads
without cache from cycle counter code in the driver, although it turned
out to be longer because I had forgotten that delay runs slow on my
machine).  The second time I tried delay(1) that also caused the problem
to go away.

  Trying delay(1) again causes the first 512 byte block to be ok, but the
rest to have the single bad byte in the second byte.  This time I tested
further kernels with only the value of the delay changed.  Trying all
values from 1 to 5 results in no change.  Note that when the delay(1)
worked the delay value was the only change from the delay(5) kernel that
worked.  Trying delay(0) generally causes the same bad byte pattern, but
sometimes causes the dd to take much longer (sometimes only reading two of
the four 1k blocks), and once or twice it looked like the second 512 byte
block was correct also (although it also showed bad several times that
otherwise appeared similar, so it may just be that the bad value was
0xff).  This is delay() not DELAY(), however on hpcmips delay() is just a
function calling DELAY().  DELAY() is written in C, however the expected
code is being generated.  From cycle counting of the delay for long reset
delays it seems that delay runs 50% long.  Later VR series manuals mention
an extra branch delay, so it seems the VR4111 manual is incorrect and
branches cause an extra delay on VR4111 also.  Correcting the delay time
seems to have no effect.  When I now tested HPCMIPS_L1_CACHE_DISABLE
again, the problem appeared (I couldn't do the dd test, it repeatedly
reported being a fixed "VD-ROM").

  The only time I have seen bad reads from this chip on i386 is when I try
to draw too much power.  The specs for the card say it can do up to 100ma,
and trying to power a keyboard and mouse causes the bad reads.  However,
my umass cd drive is self powered and I don't think it should be drawing
much power.  The clio is unable to support the microdrive, but has no
trouble with my wi card that draws significantly more power than this
card's maximum.  However, perhaps the "quality" of the power is not as
good on the clio and the card is particularly sensitve.  I don't know why
the power would be so regularly affected, however.  Furthermore, the chip
revision error occurs with no device attached and the card itself uses
very little power.

  I have tried removing the ZEROWAIT pcic flag and adding the vadem
special "slow" flag (which adds extra wait states).  This seems to make no
difference.

  The other problem I am seeing occasionally is the slow scroll I
mentioned in hpcmips/26158.  On all of my recent tests I have been running
with the second patch to cpu_intr in the PR.  I don't know why it took me
so long to connect the obvious, but given the above problems it shouldn't
be surprising that interrupt problems could occur, although I can't say
that I understand precicely what is happening.  One behavior of the slow
scroll seems to me particularly strange: sometimes the scroll will come to
a complete stop and stay that way until a key is pressed.  This can be
triggered by doing "ls -R /" and then inserting my umass device.  I have
wall clocked it that way for over two minutes.  On a few occasions I have
seen it continue immediately after a kepress after being suspended for an
unknown time (but likely several minutes).  On hpcmips, there are two hard
and two soft interrupt priorities: hardclock alone is the highest hard
interrupt and softclock alone is the lowest of the soft interrupts.
Also, every ten minutes or so there is an unused compare interrupt that is
not masked so as to keep spllowersoftclock without needing a hpcmips
specific CLKF_BASEPRI.  I do not see anything that strikes me as
particularly consistant when breaking into DDB (which takes three key
presses).  It looks like holding down two keys will prevent it from coming
to a complete stop.  When I had seen the slower scroll when attaching a
uaudio device, playing audio would cause scroll to resume normal speed.

  Any ideas as to what could possibly be causing this behavior would be
appreciated.

Matthew Orgass
darkstar@city-net.com