tech-kern archive


Re: PCIVERBOSE causing kernel stack overflow during boot - why?



On Tue, 25 Oct 2016, Paul Goyette wrote:

As I mentioned earlier, I also have an "unusual" pcibus. On my machine it is bus 255, and has a whole bunch of unsupported devices, similar to what your pcictl shows.

So, I just built a GENERIC+PCIVERBOSE+DDB_COMMANDONENTER kernel from HEAD and tried to boot it. Sure enough, it's broken! In my case I got a "stack overflow" panic. I set command_on_enter to "bt;c" and got a backtrace, but the c(ontinue) did not dump or reboot because there was no working keyboard at that point. Here's the relevant portion of the backtrace:

	...
	ssp_init()
	pci_aprint_devinfo_fancy()
	pciprint() + 0x4c
	config_found()
	...

I'll have a deeper look...

Hmmm, the code at pciprint + 0x4c is actually a call to pci_devinfo() and the last few instructions in pci_devinfo are

   ...
   0xffffffff801818e5 <+831>:   callq  0xffffffff80870050 <snprintf>
   0xffffffff801818ea <+836>:   movslq %eax,%rbx
   0xffffffff801818ed <+839>:   add    %r15,%rbx
   0xffffffff801818f0 <+842>:   jmpq   0xffffffff801817c6 <pci_devinfo+544>
   0xffffffff801818f5 <+847>:   callq  0xffffffff8084806e <__stack_chk_fail>
   End of assembler dump.

Earlier in pci_devinfo() we have

   ...
   0xffffffff8018160c <+102>:   callq  *0xfed6b6(%rip)        # 0xffffffff8116ecc8 <pci_findvendor>
   ...
   0xffffffff80181623 <+125>:   callq  *0xfed697(%rip)        # 0xffffffff8116ecc0 <pci_findproduct>
   ...
   0xffffffff80181682 <+220>:   callq  0xffffffff80870050 <snprintf>
   0xffffffff80181687 <+225>:   mov    -0xd8(%rbp),%edx
   0xffffffff8018168d <+231>:   test   %edx,%edx
   0xffffffff8018168f <+233>:   jne    0xffffffff8018173b <pci_devinfo+405>
   0xffffffff80181695 <+239>:   mov    -0x38(%rbp),%rax
   0xffffffff80181699 <+243>:   xor    0x106d1e0(%rip),%rax        # 0xffffffff811ee880 <__stack_chk_guard>
   0xffffffff801816a0 <+250>:   jne    0xffffffff801818f5 <pci_devinfo+847>
   0xffffffff801816a6 <+256>:   add    $0xb8,%rsp
   0xffffffff801816ad <+263>:   pop    %rbx
   0xffffffff801816ae <+264>:   pop    %r12
   0xffffffff801816b0 <+266>:   pop    %r13
   0xffffffff801816b2 <+268>:   pop    %r14
   0xffffffff801816b4 <+270>:   pop    %r15
   0xffffffff801816b6 <+272>:   pop    %rbp
   0xffffffff801816b7 <+273>:   retq
   ...

That call to snprintf() appears to be the one located at line 615 of src/sys/dev/pci/pci_subr.c
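
For readers less familiar with SSP: the epilogue in the dump above loads the frame's saved canary from -0x38(%rbp), XORs it with __stack_chk_guard, and jumps to the __stack_chk_fail call if the result is nonzero. A minimal user-space sketch of that idea follows. The names demo_guard, struct demo_frame and the demo_frame_*() helpers are hypothetical and exist only for illustration; a real frame is laid out by the compiler, not by a struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for __stack_chk_guard; the value is arbitrary for this demo. */
static const uintptr_t demo_guard = (uintptr_t)0x5a5a5a5aUL;

/* Toy model of a protected frame: a local buffer followed by the canary. */
struct demo_frame {
	char      buf[16];	/* local buffer */
	uintptr_t canary;	/* copy of the guard, stored on function entry */
};

/* Model of the prologue: clear the buffer and store the canary. */
void
demo_frame_enter(struct demo_frame *f)
{
	memset(f->buf, 0, sizeof(f->buf));
	f->canary = demo_guard;
}

/*
 * Write len bytes starting at the buffer. The frame is treated as raw
 * bytes so that running past buf into the canary is well defined here.
 */
void
demo_frame_write(struct demo_frame *f, size_t len)
{
	assert(len <= sizeof(*f));
	memset((unsigned char *)f, 'A', len);
}

/* Model of the epilogue: XOR with the guard, nonzero means "smashed". */
int
demo_frame_smashed(const struct demo_frame *f)
{
	return (f->canary ^ demo_guard) != 0;
}
```

Filling exactly sizeof(buf) bytes leaves the canary intact, while a single extra byte is enough to make the check fail, which is the condition that leads to the panic in the real kernel.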

So, my guess is that some manipulation of cp is triggering the SSP check, probably in the class/subclass/interface checks.
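
As one illustration of how manipulating cp can go wrong in general (this is not a confirmed diagnosis of the panic): snprintf() returns the length the output would have had without truncation, so a bare "cp += snprintf(cp, ep - cp, ...)" can leave cp beyond the end of the buffer once it fills up. The movslq/add pair after the snprintf call at <+831> is at least consistent with the return value being added to a pointer. The sketch below shows a clamped append that avoids the problem. append_clamped() is a hypothetical helper, not a function from pci_subr.c, and cp and ep are assumed to mark the current position and the end of the buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from pci_subr.c.
 * Appends formatted text at *cpp without ever moving *cpp past ep,
 * where ep points one past the end of the buffer.
 * Returns 0 on success, -1 if the output was truncated or on error.
 */
int
append_clamped(char **cpp, char *ep, const char *fmt, ...)
{
	va_list ap;
	size_t room;
	int n;

	assert(*cpp <= ep);
	room = (size_t)(ep - *cpp);
	if (room == 0)
		return -1;		/* no space even for the NUL */

	va_start(ap, fmt);
	n = vsnprintf(*cpp, room, fmt, ap);
	va_end(ap);

	if (n < 0)
		return -1;		/* output error; *cpp unchanged */
	if ((size_t)n >= room) {
		/* Truncated: vsnprintf wrote room - 1 bytes plus a NUL. */
		*cpp = ep - 1;
		return -1;
	}
	*cpp += n;			/* safe: n < room, so *cpp < ep */
	return 0;
}
```

A caller would set cp = buf and ep = buf + len once, chain the appends, and stop at the first -1, so the remaining size passed to vsnprintf() can never wrap around to a huge value.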




On Sun, 23 Oct 2016, Jaromír Doleček wrote:

Here is the output from lspci/pcictl.

I'll try that DDB_COMMANDONENTER also - the machine is remote though,
so I'll send it later when I get it.

Thanks.

Jaromir

2016-10-19 7:23 GMT+02:00 Paul Goyette <paul at whooppee.com>:
On Tue, 18 Oct 2016, Paul Goyette wrote:

Just as an added experiment, can you try to boot the non-PCIVERBOSE
kernel, and at the boot prompt enter

        load pciverbose

before actually booting?

As far as getting a back-trace, you could set DDB_COMMANDONENTER="bt" in
your config file ....

The dmesg looks interesting, especially with that strange pci9 bus.  My
machine has a similar "management devices" pci bus.


Also, if you have installed pkgsrc/sysutils/pciutils it would be useful to
get the output from

        lspci -tvnn

Otherwise, please provide the output from the following two commands:

        pcictl pci0 list -N
        pcictl pci0 list -n






On Mon, 17 Oct 2016, Jaromír Doleček wrote:

Hi,

I've got an amd64 system which panics with 'stack overflow detected'
on boot, somewhere halfway through probing the pci9 bus, when booted with
a kernel built with PCIVERBOSE. The same kernel config without PCIVERBOSE
boots fine. The dmesg without PCIVERBOSE is attached.

Any idea what might be causing this?

I've had a cursory look at the pci code, and it doesn't seem as if anything
would be allocating extra space there. Maybe some interaction with the
dev_verbose module code? Unfortunately I can't get a backtrace, as this
happens before the keyboard is probed and attached.

Jaromir





+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+


+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+


!DSPAM:580ce657154322062083821!


+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+

+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+


Home | Main Index | Thread Index | Old Index