Subject: Re: Testers wanted: amd64 ACPI suspend/resume
To: Jared D. McNeill <jmcneill@invisible.ca>
From: Brian de Alwis <bsd@cs.ubc.ca>
List: current-users
Date: 09/29/2007 19:59:50
I gave GENERIC.MP a try on a ThinkPad T60 (2GHz Core Duo) laptop
and successfully suspended and resumed from a single-user boot!

The kernel was always able to suspend: the little moon light came
on, and the screen and fan turned off.  Resume, unfortunately,
wasn't always successful, and I saw three possible outcomes.  In
every case the video was successfully restored.

(1) On a successful resume I'd get a flashing cursor.  Hitting enter
    would show the root prompt (# ).  I could run a command here
    and, provided it had been in the cache before I suspended, it
    would run.  Otherwise the program would seem to hang; ^T
    (status) showed it was in biowait.  After about 40 seconds,
    the kernel would print

	Resuming devices: mainbus0 cpu0 cpu1 ioapic0 acpi0 acpibut0 [...]

    and the machine would be fine.  Both CPUs were reported.  After
    a first successful resume, I could then suspend and resume
    multiple times with no problems.  I did once get a `wd0 soft
    error' message.

(2) There were two failure modes.  The first started the same as the
    successful case described in (1), except that a page fault
    occurred while the "Resuming devices:" message was being
    printed.  The console looked something like:

	Resuming devices: mainbus0 cpu0 cpu1 ioapic0 acpi0 acpilid0 acpibut0 attimer1 hpet0 pcppi1 npx1 pckbc1 wskbd0 pms0 wsmouse0 pckbc2 acpiec0 acpibat0 acpicad0 apm0 pci0 pchb0 agp0 ppb0 pci1 vga1 wsdisplay0 azalia0uhci:host controller halted
	 audio0 ppb1 pci2 wm0 ppb2 pci3 ath0 ppb3 pci4 ppb4 pci5 uhci0 uhub0 uhci1 uhub1 uhci2 uhub2 audio1 uhci3 uhub3 ehci0 uhub4 ppb5 pci6 pcib0 isa0 ispnp0 piixide0 atabus0uvm_fault(0xc0a77a40, 0, 1) -> 0xe
	kernel: supervisor trap page fault, code=0
	Stopped in pid 0.3 (system) at  netbsd:AcpiPsPopScope+0x21: movl 0x8(%eax),%ecx

    Doing a bt:

	AcpiPsPopScope(c231c1ec,cbb35b1c,c231c014,c231c00c,c0a76b20)
	AcpiPsParseLoop(c231c000,c2188480,0,c071c67a,0)
	AcpiPsParseAml(c231c000,3,c21937c0,cbb6cbf8,16d)
	AcpiPsExecutePass(6,2194600,0,cbb35c28,cbb35c20)
	AcpiPsExecuteMethod(cbb35c28,1,cbb35bfc,c071cf50,c21b3e0c)
	AcpiNsEvaluateByHandle(cbb35c28,0,3,c0939acc,cbb41c40)
	AcpiEvAsyncExecuteGpeMethod(c21b3e0c,c0a35050,cbb41c40,c0118440,0)
	sysmon_task_queue_thread(cbb41c40,0,c01002d2,fbff,c01002d2)

(3) The second failure mode happened only once.  Immediately on
    resume I got this message:

	ACPI Error (utmisc-0311): Release of non-allocated OwnerId: 23 [20060217]

    The machine was initially completely unresponsive, but suddenly
    started responding to ^T.  Jumping into DDB showed it was trying
    to do a sync, but whatever it was doing was progressing very
    slowly.  I could switch between VTs, but none accepted input
    except VT0.  Trying even the simplest commands in DDB caused
    the machine to hang.


I also tried suspend/resume on a kernel built with DIAGNOSTIC,
MPDEBUG, LOCKDEBUG, and MPVERBOSE.  This failed on resume with a
lock-related panic.  The console looked something like:

    [...]
    cpu1: err0 10000<vector=0,delmode=0,masked,dest=0> 0<target-0>
     cpu1: ioapic0Mutex error: lockdebug_wantlock: locking against myself

    acpi0Block address   0x0000000c1e77848 type                     spin
    shared holds :                  0 exclusive:                 1
    shared wanted:                  0 exclusive:                 1
    current cpu  :                  1 last held:                 1
    current lwp  : 0x00000000cbb21a80 last held: 0x0000000cbb21a80
    last locked  : 0x00000000c83fd67f unlocked : 0x0000000c841aea9
    owner field  : 0x0000000000000600 wait/spin:               0/1

     acpilid0panic: LOCKDEBUG
     aStopped in pid 0.4 (system) at  netbsd.cpu_Debugger+0x4:     popl %ebp
    db{1}> bt
    cpu_Debugger
    panic
    lockdebug_abort1
    lockdebug_wantlock
    mutex_vector_enter
    idle_loop
    _prop_dictionary_keysym32_pool
    Bad frame pointer: 0xc1e77800

I wondered if this might be because of the verbose CPU detail
messages.

Brian.

On 2007.09.28 20:06:31 -0400, Jared D. McNeill wrote:
> Heyas folks --
>
> Thanks to Joerg, suspend/resume is now working properly for at least the 
> two of us on amd64 on the jmcneill-pm branch. If anybody else could test it 
> out and send us success/failure reports, that would be fantastic.
>
> Same rules as i386 apply wrt. playing with machdep.acpi_vbios_reset and 
> vbetool post as necessary.
>
> Cheers,
> Jared

-- 
  Brian de Alwis | Software Practices Lab | UBC | http://www.cs.ubc.ca/~bsd/
      "Amusement to an observing mind is study." - Benjamin Disraeli