Port-alpha archive


Occasional core dumps in netbsd-8



On two different Alpha systems (API CS20, AlphaServer DS25), I'm seeing occasional core dumps that don't seem to follow any pattern or have any obvious cause. Compiling pkgsrc packages fails every once in a great while, but restarting the same compile succeeds. Since both systems have ECC memory, it's highly doubtful that both have hardware issues.


Here's a core from egrep:

Core was generated by `egrep'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Hit heuristic-fence-post without finding enclosing function for address 0x696c2f7273752e
This warning occurs if you are debugging a function without any symbols
(for example, in a stripped executable).  In that case, you may wish to
increase the size of the search with the `set heuristic-fence-post' command.

Otherwise, you told GDB there was a function where there isn't one, or
(more likely) you have encountered a bug in GDB.
#0  0x00696c2f7273752e in ?? ()


From dirname:

Core was generated by `dirname'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Hit heuristic-fence-post without finding enclosing function for address 0x120010f44
This warning occurs if you are debugging a function without any symbols
(for example, in a stripped executable).  In that case, you may wish to
increase the size of the search with the `set heuristic-fence-post' command.

Otherwise, you told GDB there was a function where there isn't one, or
(more likely) you have encountered a bug in GDB.
#0  0x0000000120010f44 in ?? ()


I'm also seeing strange behavior when the API CS20 first boots. For the first several minutes, something as simple as an "ls -l" of a lot of files or a "cvs update" causes errors like this:

ssh_dispatch_run_fatal: Connection to 2602:fe6b:1012:1:4abd:4a19:6e29:9e0d: message authentication code incorrect

After a certain amount of traffic, the problem goes away completely.

Similar strange behavior happened when I was using a Silicon Image 3512 card in the CS20: no disklabel could be seen on a drive, but after running "dd if=/dev/wd1c of=/dev/null bs=1m count=100", the disklabel and other data on the drive became visible.

I have Tobias' fix added on both systems:
http://mail-index.netbsd.org/port-alpha/2017/07/29/msg000857.html

It doesn't seem to fix these problems, but I haven't seen any filesystem corruption since adding it.


I doubt these problems are related, but if anyone has ideas about their causes, I'd be happy to try anything I can.

John

