Subject: Re: corrupt METALOG files on SMP machines with UNPRIVED build
To: <>
From: David Laight <david@l8s.co.uk>
List: tech-toolchain
Date: 05/21/2002 15:15:01
On Tue, May 21, 2002 at 03:25:07PM +0200, Christian Limpach wrote:
> Hi!
> 
> I've noticed on several occasions now that an UNPRIVED build on a SMP
> machine will fail because the METALOG file gets messed up when two instances
> of install update it at the same time.  The corrupt lines will look like
> this:
> ./devel/netbsd/build/current-next6./devel/netbsd/build/current-next68k/root/
> usr/include/sys/ptrace.h type=file mode=0444
>  uname=root gname=wheel time=1021321141.0
> [...  several lines which are ok...]
> 8k/root/usr/include/sys/protosw.h type=file mode=044
> 4 uname=root gname=wheel time=985202549.0
> 
> This is with -j 3 on Linux.  The filesystem is reiserfs.  Has anybody seen
> this with NetBSD?

Well, xinstall does flock(fileno(metafp), LOCK_EX) and won't
write to the file if the lock doesn't succeed.

It does seem to rely on the lock working to avoid 'corrupt' lines.
It is also using block buffering, so a buffer flush can land in the
middle of a line - which must be what is causing the errors you are
seeing (I've not seen a printf() that splits %s output - although
I have seen ones that don't always buffer printf()!)

OTOH in some cases the METALOG is written with 'echo xxx >>METALOG'.
At least some shells treat this as open, then 'seek to eof', rather
than opening with O_APPEND - so concurrent opens can lead to
processes overwriting from the same point!
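
To illustrate the difference, here is a small single-process sketch.
two_writers() is a made-up name; it opens the same file twice, seeks
both descriptors to end-of-file before either writes (standing in for
two shells that got there at the same moment), and then writes one
line through each:

```c
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical demo: two descriptors on the same file, one line
 * written through each.  Returns the final file size, or -1 on error.
 */
off_t
two_writers(const char *file, int use_append)
{
	int flags = O_WRONLY | O_CREAT | (use_append ? O_APPEND : 0);
	const char *a = "first line\n", *b = "other line\n";
	off_t size = -1;
	int fd1, fd2;

	fd1 = open(file, flags, 0644);
	fd2 = open(file, flags, 0644);
	if (fd1 == -1 || fd2 == -1)
		goto out;
	/* Both writers find the same end-of-file before either writes. */
	if (lseek(fd1, 0, SEEK_END) == -1 || lseek(fd2, 0, SEEK_END) == -1)
		goto out;
	if (write(fd1, a, strlen(a)) != (ssize_t)strlen(a) ||
	    write(fd2, b, strlen(b)) != (ssize_t)strlen(b))
		goto out;
	size = lseek(fd1, 0, SEEK_END);
out:
	if (fd1 != -1)
		close(fd1);
	if (fd2 != -1)
		close(fd2);
	return size;
}
```

Starting from an empty file, the plain open leaves only the second
line (the first has been overwritten), whereas with O_APPEND the
kernel moves each write to the current end-of-file and both lines
survive.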

ISTM that if the echo xxx >>METALOG lines are going to work
against concurrent xinstall requests, extreme care needs to
be taken to ensure that every write consists of whole lines.

But it does look as though your system has a broken flock().


	David

-- 
David Laight: david@l8s.co.uk