Current-Users archive


Re: Serious bugs in NetBSD-current, have they been fixed?



Hi Tom,

On 25/10/21 14:01, Thomas Mueller wrote:
One of these bugs relates to entropy and how it impedes building many packages in pkgsrc.

I seemed to get around this bug on one computer but not the other.

It's the old story that it's not a bug, but a feature. It's quite possible that it is "fixed" now (I'm running 9.99.79).

I'm going to assume that you want entropy that is "good enough" rather than "guaranteed" because that's going to be easier for all of us.

I think what you need to do is the following:

	1) Run some command like "ls -lR /" to generate some entropy
	2) Run "sysctl -w kern.entropy.consolidate=1"
	3) Run "/etc/rc.d/random_seed stop" to store the entropy
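If it helps, the three steps can be put into one small script. This is only a sketch: the `run` and `seed_entropy` helpers and the `DRY_RUN` switch are names I made up, and it defaults to printing the commands rather than running them. Set `DRY_RUN=0` and run it as root to do it for real.

```shell
#!/bin/sh
# Sketch of steps 1-3 above. DRY_RUN and the helper names are illustrative.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD: print CMD in dry-run mode, otherwise execute it via sh -c.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $1"
    else
        sh -c "$1"
    fi
}

seed_entropy() {
    # 1) Generate some activity; the listing itself is not interesting.
    run "ls -lR / >/dev/null 2>&1"
    # 2) Ask the kernel to consolidate what it has gathered.
    run "sysctl -w kern.entropy.consolidate=1"
    # 3) Store the entropy so it survives the next boot.
    run "/etc/rc.d/random_seed stop"
}

seed_entropy
```

The steps are deliberately not chained with `&&`, because `ls -lR /` can return a non-zero status for reasons that don't matter here.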

This is only needed on systems that don't have a built-in secure random number generator and I don't have any such systems running right now.

Rebooting a machine with "shutdown -r now" will run step 3 for you. If you are like me, then you might be in the habit of running "reboot" when setting up a new machine instead of "shutdown -r now". Doing that skips step 3 and doesn't end well.
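If you are not sure whether step 3 happened, you can look for the saved seed file before rebooting. The sketch below assumes the seed lives at /var/db/entropy-file, which I believe is the rc.conf default for random_file; adjust `SEED_FILE` if yours differs. `seed_saved` is just a helper name for illustration.

```shell
#!/bin/sh
# Assumption: the seed is stored at the rc.conf default location.
SEED_FILE="${SEED_FILE:-/var/db/entropy-file}"

# seed_saved FILE: succeed if FILE exists and is not empty.
seed_saved() {
    [ -s "$1" ]
}

if seed_saved "$SEED_FILE"; then
    echo "seed file present: $SEED_FILE"
else
    echo "no seed file at $SEED_FILE; run /etc/rc.d/random_seed stop"
fi
```

A non-empty file only tells you that something was saved at some point, not how good the entropy in it is.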

Other bug is longer-standing and plagued me in NetBSD 8.99.51 and again in 9.99.82.

That bug causes device timeouts on some types of hard drive but not all.

Sample output is, excerpt from /var/run/dmesg.boot on the following reboot:

...
wd1d: device timeout writing fsbn 2391623176 of 2391623176-2391623199 (wd1 bn 2391623176; cn 2372642 tn 0 sn 40), xfer e0, retry 3

You haven't told us what sort of hardware this is. Drive models, motherboard chipsets, etc.

My personal experience is that this indicates a physical failure of a hard disk. I've also seen errors caused by SATA cables and SSD firmware. I have a small stack of failed HDDs and SSDs and the only reason it isn't a large stack is that I've thrown half of them away.

Try running a SMART tool to ask the disk what it thinks is going on. The SMART report is quite hard to read, but very thorough. Also run "dd if=/dev/rwd1d of=/dev/null bs=32k" a few times and see if it hangs at the same place every time. If it does, then you have a bad block on your disk.
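To make the dd runs easier to compare, something like the following keeps only dd's "records in" count, so you can see whether each run stops after the same number of 32k blocks. It is a sketch: `read_test` is a made-up helper name, and the device is whatever you pass on the command line (for example /dev/rwd1d). If the drive hangs rather than returning an error, the script will hang with it, which is itself the answer.

```shell
#!/bin/sh
# read_test DEVICE: read DEVICE to the end (or to the first error) in
# 32k blocks and print dd's "N+M records in" line for comparison.
read_test() {
    dd if="$1" of=/dev/null bs=32k 2>&1 | grep 'records in'
}

# Only touch a device if one was given, e.g.:  sh readtest.sh /dev/rwd1d
# For the SMART report itself, "atactl wd1 smart status" should work on
# NetBSD (assuming the disk is wd1).
if [ $# -gt 0 ]; then
    read_test "$1"
    read_test "$1"
fi
```

Two runs that stop at the same record count point at a bad block; counts that move around are more suggestive of a cable or controller problem.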

You can also try booting into SeaTools from Seagate to run a full health check on the disk (works for non-Seagate disks).

To test the cable, just use another cable that is as different as possible from the current cable and hope for the best. SMART will tell you about some cable problems, but then we get into knowing how to read the report.

Cheers,
Lloyd

