Port-ofppc archive


Re: Current hangs at boot



On 14-May-2008 Jochen Kunz wrote:
> I run this firmware on purpose: It doesn't have that very annoying 60
> second delay when netbooting...

I don't blame you.  I hate that.  And I would prefer to not make people upgrade
firmware, and just support it where possible.  Upgrading is my last resort.

> It seems to be something different: The B50 has only two PCI slots on
> the PCI riser card. The -150 has more slots and a PCI-PCI bridge on the
> PCI riser card.

Aha.  Well.. now that's definitely a difference.

> arch/powerpc/pci/pci_machdep_ofw.c:genofw_find_node_by_devfunc()
> goes into an infinite loop if it hits the PCI-PCI bridge. The following
> hack breaks the loop:
> 
> Can you give me some hints to understand that code in
> pci_machdep_ofw.c for further debugging? (I am missing the big
> picture.)

Well.. I have to think about it a bit here.. but basically we start at a
specific node, maybe /.  We then look to see if the regs for that node match
the bus/dev/func we are looking for.  If they do not, we check whether the
node has a child.  If so, we descend into it.  If not, we look to see if there
is a sibling/peer node.  If we find one, we look at that one.  Finally, if that
fails, we jump back up one level to the parent of the current node, and restart
the loop, probably hoping to move to the next sibling/peer node.

What might be happening here, is that no node on the tree ever happens to
return the right bus/dev/func set.  I have seen two different behaviours on
OFW, between the IBM and the Pegasos.  On one firmware, if you encounter the
last node, the next node is null, and you would bail out of the for loop.  On
the other firmware, the firmware happily loops you around back to the beginning
of the tree, where you can continue finding peers forever.

It's been a little while, but I think IBM does the second (the wrap-around).
You can look at the OFWDUMP code in ofwboot to see which, and how I hacked
around it there.
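
To make the walk above concrete, here is a minimal sketch in C.  It is not the
code from pci_machdep_ofw.c: struct node, find_by_devfunc() and the max_steps
budget are all hypothetical, and the tree is a plain in-memory structure rather
than real OF_child()/OF_peer() calls.  The step budget is one assumed way to
survive a firmware whose peer list wraps around; I am not claiming it is how
ofwboot handles it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for an OFW device tree node. */
struct node {
	const char *name;
	int bus, dev, func;		/* decoded from "reg"; -1 if absent */
	struct node *parent, *child, *peer;
};

/*
 * Depth-first walk: check the node, descend into a child if there is
 * one, otherwise move to a peer, otherwise climb until some ancestor
 * has a peer.  max_steps bounds the walk so that a peer list that
 * wraps around cannot keep us here forever.
 */
static const struct node *
find_by_devfunc(const struct node *start, int bus, int dev, int func,
    int max_steps)
{
	const struct node *n = start;
	int steps = 0;

	assert(max_steps > 0);
	while (n != NULL && steps++ < max_steps) {
		if (n->bus == bus && n->dev == dev && n->func == func)
			return n;
		if (n->child != NULL) {
			n = n->child;
			continue;
		}
		/* No child: climb until this node or an ancestor has a peer. */
		while (n != NULL && n->peer == NULL)
			n = n->parent;
		if (n != NULL)
			n = n->peer;
	}
	return NULL;	/* not found, or step budget exhausted */
}
```

On a well-behaved tree the walk ends when it climbs past the root.  On a tree
whose last peer points back at the first one, the inner climb never sees a null
peer, so only the budget stops it.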

What you need to look at to debug this is the ofw dump for the node in
question.  It's having a problem with that ppb, so if you could provide a
.properties on that node, it would help.  According to the dump I have for a
7043-150, that node does have the proper reg data for 0/23/0, so it's odd that
it would never locate it.  Maybe also printf the node name in
genofw_find_node_by_devfunc's for loop, to see where it is looking, and make
sure the function itself is checking things properly.
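
For checking the reg data by hand, here is a small sketch of how bus/dev/func
come out of the first reg cell (phys.hi), assuming the layout from the Open
Firmware PCI bus binding: bus in bits 16-23, device in bits 11-15, function in
bits 8-10.  The helper names are made up for illustration; the kernel has its
own macros for this.

```c
#include <assert.h>
#include <stdint.h>

/*
 * phys.hi layout per the Open Firmware PCI bus binding:
 *   npt000ss bbbbbbbb dddddfff rrrrrrrr
 * These helper names are hypothetical.
 */
static unsigned phys_hi_bus(uint32_t hi)  { return (hi >> 16) & 0xff; }
static unsigned phys_hi_dev(uint32_t hi)  { return (hi >> 11) & 0x1f; }
static unsigned phys_hi_func(uint32_t hi) { return (hi >> 8) & 0x07; }

/* Nonzero if the phys.hi cell names the given bus/dev/func. */
static int
reg_matches(uint32_t hi, unsigned bus, unsigned dev, unsigned func)
{
	assert(bus < 256 && dev < 32 && func < 8);
	return phys_hi_bus(hi) == bus &&
	    phys_hi_dev(hi) == dev &&
	    phys_hi_func(hi) == func;
}
```

With that layout, 0/23/0 corresponds to a phys.hi whose bus/dev/func bits are
0x0000b800 (23 << 11), ignoring the space code and register number bits.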

It might also be useful to compile an OFWDUMP version of ofwboot and boot that
on the machine, to get yourself a full dump of the ofw.  It's usually pretty
handy.  If you do so, send me a copy.

> BTW: How do I "boot net -d" from OFW?
> I.e. how can I make the kernel entering DDB as soon as possible?

Yeah.. umm.. I did it once.  But I always have to play around with things to
remember how I did it.  It's really deranged how the options work in IBM ofw.
It might be something stupid like boot -d net or something.  I can never
remember.  If you figure it out, post it here.  :)

>> Yeah, that's how most of the POWER3/POWER4 series service processors
>> work.  You can power them up, but not down.  You can get back to the
>> service processor by issuing a powerdown from the OS via rtas.
> Braindead. What do I need a service processor for? Power on the machine
> when it is off. OK, that works. But as important: Power down / reset
> the machine if it is stuck at some point.

Well.. it's one step better than the older ones, that do nothing.  It's not
great, and it's not everything you would have hoped for.  Recycling a
hung 270 is a real PITA though.

> Thanks for taking care of the -170.

Yeah.  Hang prior to banner is annoying to debug.

---
Tim Rightnour <root%garbled.net@localhost>
NetBSD: Free multi-architecture OS http://www.netbsd.org/
Genecys: Open Source 3D MMORPG: http://www.genecys.org/

