NetBSD-Bugs archive


Re: port-i386/40449: disk errors after ACPI suspend/resume



On Wed, Jan 21, 2009 at 06:00:00PM +0000, apb%cequrux.com@localhost wrote:
> >Number:         40449
> >Category:       port-i386
> >Synopsis:       disk errors after ACPI suspend/resume
> >Confidential:   no
> >Severity:       serious
> >Priority:       high
> >Responsible:    port-i386-maintainer
> >State:          open
> >Class:          sw-bug
> >Submitter-Id:   net
> >Arrival-Date:   Wed Jan 21 18:00:00 +0000 2009
> >Originator:     Alan Barrett
> >Release:        NetBSD 5.99.1
> >Organization:
> Not much
> >Environment:
> System: NetBSD 5.99.10 i386
> Architecture: i386
> Machine: i386
> >Description:
> 
> If I suspend the system via sysctl -w machdep.sleep_state=3 and
> then resume, a constant stream of disk error messages appears.
> The errors look like this:
> 
> wd0e: error reading fsbn blah blah retrying
> wd0: (aborted command)
> cgd1: error 5
> 
> There are several pairs of wd0e and wd0 messages for each cgd1 message.
> The block numbers in the wd0e messages repeat a few times and then
> change.  The errors scroll past rapidly and continuously.  The only
> obvious way to recover is to power cycle the machine.
> 
> wd0 is an ordinary laptop SATA disk attached to an Intel
> 82801GBM/GHM controller (configured in the BIOS for compatibility
> mode).  Here are some config messages:
> 
>     piixide0 at pci0 dev 31 function 2
>     piixide0: Intel 82801GBM/GHM Serial ATA Controller (ICH7) (rev. 0x01)
>     piixide0: bus-master DMA support present
>     piixide0: primary channel wired to compatibility mode
>     ioapic0: int14 0x69<vector=0x69,delmode=0x0,dest=0x0> 0x0<target=0x0>
>     piixide0: primary channel interrupting at ioapic0 pin 14
>     atabus0 at piixide0 channel 0
> 
>     wd0 at atabus0 drive 0: <Hitachi HTS542520K9SA00>
>     wd0: drive supports 16-sector PIO transfers, LBA48 addressing
>     wd0: 186 GB, 387621 cyl, 16 head, 63 sec, 512 bytes/sect x 390721968 sectors
>     rnd: wd0 attached as an entropy source (collecting and estimating)
>     wd0: 32-bit data port
>     wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
>     wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
> 
> The disk has an MBR and a NetBSD disklabel.  wd0e is one of the disklabel
> partitions.
> 
> cgd1 used wd0e as its backing store.
> 
> >How-To-Repeat:
> suspend, then resume.

You may be able to narrow this down by using drvctl -S/-Q to
suspend/resume wd0 and its parents, beginning with wd0:

        drvctl -S wd0; drvctl -Q wd0
        drvctl -S atabus0; drvctl -Q atabus0
        drvctl -S piixide0; drvctl -Q piixide0
        drvctl -S pci0; drvctl -Q pci0

Let us see if one of those steps will reliably reproduce the problem.
If so, it may help both to compare the affected devices' PCI
configuration before and after suspend/resume, using pcictl(8), and
to read the devices' suspend/resume routines.
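The bottom-up procedure above could be scripted roughly as follows. This is a minimal sketch, not a tested tool. It assumes the functions are sourced into a root shell, that pcictl's `dump` command is available, and that reading a few blocks from the raw whole-disk device (`/dev/rwd0d` on i386) is enough to provoke the errors. The function names `cycle_path` and `pci_snapshot` and the `DRVCTL`, `PCICTL` and `PROBE` overrides are hypothetical, added only for illustration. The PCI address (pci0 device 31 function 2) is piixide0's, taken from the dmesg in the report.

```sh
# Sketch only: suspend/resume each device on the path to wd0, leaf
# first, snapshotting piixide0's PCI config space around each step.
# DRVCTL, PCICTL and PROBE are hypothetical overrides (e.g. DRVCTL=echo
# for a dry run); the defaults assume a NetBSD i386 root shell.
DRVCTL=${DRVCTL:-drvctl}
PCICTL=${PCICTL:-pcictl}
# Assumed probe: read 1 MB from the raw whole-disk partition of wd0.
PROBE=${PROBE:-"dd if=/dev/rwd0d of=/dev/null bs=64k count=16"}

# pci_snapshot: dump config space of piixide0 (pci0 dev 31 func 2,
# per the dmesg in the PR).
pci_snapshot() {
	$PCICTL pci0 dump -d 31 -f 2
}

# cycle_path DEV...: suspend and resume each DEV in the order given,
# report any change in piixide0's config space, and stop at the first
# DEV after which drvctl or the disk probe fails.
cycle_path() {
	for dev in "$@"; do
		echo "=== cycling $dev"
		before=$(pci_snapshot)
		if ! { $DRVCTL -S "$dev" && $DRVCTL -Q "$dev"; }; then
			echo "drvctl failed on $dev" >&2
			return 1
		fi
		after=$(pci_snapshot)
		if [ "$before" != "$after" ]; then
			echo "piixide0 PCI config changed across $dev suspend/resume"
		fi
		if ! $PROBE > /dev/null 2>&1; then
			echo "disk read failed after cycling $dev" >&2
			return 1
		fi
	done
}
```

For example, `cycle_path wd0 atabus0 piixide0 pci0` walks the same four steps as the drvctl commands above and stops at the first level whose suspend/resume makes the probe read fail.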

It may be desirable to suspend cgd1 before suspending its backing
store.  I don't know whether cgd(4) supports suspend and resume,
though.  Not all disk drivers refrain from issuing reads or writes
to the hardware while suspended.
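The suggested ordering (upper layer first on suspend, backing disk first on resume) might be sketched like this. It is an illustration under assumptions, not a known-good procedure: as noted above, cgd(4) may not implement suspend/resume at all, in which case `drvctl -S cgd1` fails and nothing further is suspended. The function name `suspend_stack` and the `DRVCTL` override are hypothetical.

```sh
# Sketch only: suspend devices in the order given (upper layers first),
# then resume whatever was suspended in reverse order.  DRVCTL is a
# hypothetical override (e.g. DRVCTL=echo for a dry run).
DRVCTL=${DRVCTL:-drvctl}

# suspend_stack DEV...: returns non-zero if any suspend or resume fails.
suspend_stack() {
	resume_list=""
	status=0
	for dev in "$@"; do
		if ! $DRVCTL -S "$dev"; then
			echo "cannot suspend $dev; resuming what was suspended" >&2
			status=1
			break
		fi
		# Prepend, so the resume loop runs in reverse order.
		resume_list="$dev $resume_list"
	done
	for dev in $resume_list; do
		$DRVCTL -Q "$dev" || status=1
	done
	return $status
}
```

For example, `suspend_stack cgd1 wd0` issues `drvctl -S cgd1`, `drvctl -S wd0`, then `drvctl -Q wd0`, `drvctl -Q cgd1`.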

Dave

-- 
David Young             OJC Technologies
dyoung%ojctech.com@localhost      Urbana, IL * (217) 278-3933

