Subject: Re: CPU 100% on lock on file write
To: Andrew Doran <ad@netbsd.org>
From: Daniel Carosone <dan@geek.com.au>
List: tech-kern
Date: 02/21/2007 23:39:01

On Wed, Feb 21, 2007 at 11:16:16AM +0000, Andrew Doran wrote:
> > I've tried to run 2 writes to a FFS1 filesystem mounted without softdeps:
> > dd if=/dev/zero of=file1 bs=1024 count=500000000&
> > dd if=/dev/urandom of=file2 bs=1024 count=500000000&
> >
> > Surprisingly, the writes are quite slow (with the drive busy at 1 to 2%)
> >
> > I think the 2 dd commands are busy on spin locks in the kernel.
>
> At a guess, a lot of that time is being soaked up by rndpool_extract_data().
> That will put back pressure on the big lock and keep out the second dd.

I'm not sure the problem here is the rnd device.

Rather, I'm not sure the *interesting* problem here is rnd per se.
Clearly, you're correct that one cpu is spent in rnd, with the lock
held, starving out the other dd: /dev/urandom is not a high-rate
source, and is not a good choice for filling a file full of
unpredictable junk quickly.  I don't expect it's much faster even
without the other /dev/zero dd.

That's a separate known problem with urandom, and there are a number
of solutions to that which involve different urandom RNGs and periodic
reseeding from the rnd pool, rather than locking changes. They're
discussed from time to time, but no one's ever got round to
implementing them. (A better recipe for fast filling in the meantime
is a random-key cgd, or a faster userspace PRNG.)
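
To make the "faster userspace prng" option concrete, here is a minimal
sketch, assuming Python's hashlib is available: take one short seed from
/dev/urandom (via os.urandom) and expand it in userspace, so the kernel
pool is touched once rather than on every block. The names keystream and
fill_file are made up for illustration, and SHAKE-256 in counter mode is
only a stand-in for whatever generator you actually trust.

```python
import hashlib
import os


def keystream(seed, nbytes, chunk=1 << 16):
    """Yield nbytes of pseudo-random data expanded from a short seed.

    Assumption: SHAKE-256 over (seed || block counter) stands in for a
    real generator; this is an illustration, not a vetted design.
    """
    counter = 0
    remaining = nbytes
    while remaining > 0:
        n = min(chunk, remaining)
        # Each counter value gives a different block from the same seed.
        yield hashlib.shake_256(seed + counter.to_bytes(8, "big")).digest(n)
        counter += 1
        remaining -= n


def fill_file(path, nbytes, seed=None):
    """Write nbytes of junk to path, reading the kernel source only once."""
    if seed is None:
        seed = os.urandom(32)  # the single read from the kernel
    with open(path, "wb") as f:
        for block in keystream(seed, nbytes):
            f.write(block)
```

Something like fill_file("file2", 500000000 * 1024) would then stand in
for the second dd, with all the per-block work done outside the kernel.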

However, the current rnd is a good way to demonstrate the problem I
think Manuel is trying to show, because it's an easy way to make one
cpu spend a lot of time in the kernel.  Fixes for rnd would be great,
and are probably low-hanging heavy fruit as far as getting it out of
the big lock - but more interesting would be to make the other dd able
to proceed in spite of whatever is hogging the kernel on the other
cpu. Certainly that means reading /dev/zero, and eventually as much of
the file writing path as possible as well.

Or maybe I'm reading too much into it, and Manuel really just wasn't
thinking about the relative cpu cost of /dev/urandom vs /dev/zero,
used different devices to avoid possible lock contention for the same
device, and just picked an unfortunate example.  Manuel?  (Do we need
/dev/one, /dev/two, etc as well? :)

> It would not be hard to put a spinlock into the rnd driver, and have it drop
> the kernel_lock when we want to extract data from the pool. The only issue
> is that in order to avoid deadlocking against the kernel lock, the spinlock
> would need to be taken at splaudio() on x86, splsched() to be portable.
>
> Alternatively, some of the more expensive steps like the SHA transforms
> could be modified to run without holding any locks. I haven't looked too
> closely though.

If we're looking at reworking it, urandom should be a cloning driver;
each open should get its own prng state (pool equivalent), and use a
faster prng. At one time "something like yarrow" was the popular
choice.  That can run outside the big lock with private locks on its
own data almost entirely uncontested (funky fd-passing or contention
between several threads on the one fd notwithstanding) and an
occasional reseeding from the real pool when appropriate.
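
As a rough userspace model of that shape (not kernel code, and not
Yarrow), the sketch below gives each "open" its own generator state with
a private lock, and only touches the shared pool when a reseed is due.
EntropyPool, OpenState and reseed_after are hypothetical names, and
SHAKE-256 again stands in for the faster PRNG.

```python
import hashlib
import os
import threading


class EntropyPool:
    """Stand-in for the shared rnd pool, guarded by one global lock."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of the big lock
        self.extractions = 0           # how often the pool was touched

    def extract(self, n):
        with self._lock:
            self.extractions += 1
            return os.urandom(n)


class OpenState:
    """Per-open generator state, as a cloning driver might allocate it."""

    def __init__(self, pool, reseed_after=1 << 20):
        self._pool = pool
        self._lock = threading.Lock()  # private, so normally uncontested
        self._reseed_after = reseed_after
        self._reseed()

    def _reseed(self):
        # The only path that takes the shared pool's lock.
        self._key = self._pool.extract(32)
        self._counter = 0
        self._since_reseed = 0

    def read(self, n):
        with self._lock:
            out = bytearray()
            while len(out) < n:
                if self._since_reseed >= self._reseed_after:
                    self._reseed()
                take = min(n - len(out),
                           self._reseed_after - self._since_reseed)
                # Assumption: SHAKE-256 over (key || counter), for illustration.
                block_input = self._key + self._counter.to_bytes(8, "big")
                out += hashlib.shake_256(block_input).digest(take)
                self._counter += 1
                self._since_reseed += take
            return bytes(out)
```

Two readers each holding their own OpenState only meet on the pool lock
at reseed time, which is the point of the design; several threads sharing
one fd would still contend on that state's private lock.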

--
Dan.