NetBSD-Bugs archive


bin/41512: Parallel cross builds randomly fail with "File Exists"

>Number:         41512
>Category:       bin
>Synopsis:       Parallel cross builds randomly fail with "File Exists"
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat May 30 15:20:00 +0000 2009
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current as of 2009.
>Environment:
System: NetBSD
Architecture: i386
Machine: i386
>Description:

I am trying to cross-build NetBSD-current on a Linux host (Intel Core
2 Quad Q6600 running Ubuntu Server x86_64 8.04.29), passing "-j 6" to
take advantage of the four processor cores.  These are fresh release
builds into an empty destination directory.

I find that the builds frequently fail with nbinstall reporting a
"File exists" error as in the following:

    #   install
    /bracket/build/2009. -U -M
    /bracket/build/2009. -D
    /bracket/build/2009. -h sha1 -N
    /bracket/build/2009. -l h -r -o root -g wheel -m 444
    i486--netbsdelf-install: link
    /bracket/build/2009. ->
    File exists
    Error code 1

The failing nbinstall is invoked with the option "-l h -r", causing it
to link rather than copy; to be precise it will link to a temporary
file, and then rename the temporary file to the final install target.
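
As a rough illustration of the link-then-rename sequence described
above, here is a minimal C sketch.  It is not the actual do_link() code
from xinstall.c; the function name link_then_rename() and its parameters
are made up for this example, and the caller is assumed to supply the
temporary file name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the real do_link(): hard-link `from` to the
 * temporary name `tmp`, then rename `tmp` onto the final target `to`.
 * Returns 0 on success, -1 with errno set on failure.
 */
int link_then_rename(const char *from, const char *tmp, const char *to)
{
    assert(from != NULL && tmp != NULL && to != NULL);

    /* link() fails with EEXIST if `tmp` already exists. */
    if (link(from, tmp) == -1)
        return -1;

    /* rename() replaces any existing `to` in a single step. */
    if (rename(tmp, to) == -1) {
        int saved = errno;      /* keep rename()'s error across unlink() */
        (void)unlink(tmp);
        errno = saved;
        return -1;
    }
    return 0;
}
```

In this sketch only the link() step is sensitive to the temporary name
already existing; rename() will happily overwrite its destination.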

By adding some debug printfs to xinstall.c, I have determined that the
"File exists" error originates in the link() call in do_link() where
the source file is linked to a temporary file, not in the subsequent
rename() call renaming the temporary file to the final target.

The name of the temporary file is generated using mktemp(3).  I assume
the problem is caused by the race condition inherent in mktemp(); the
bug can happen if nbinstall processes invoked from two branches of the
parallel make happen to generate the same temporary file name.  Since
the check for file existence in mktemp() and the subsequent link() are
not a single atomic operation, the two nbinstall processes can both
generate the same file name, then both check that it does not yet
exist, and finally both attempt to link to it.
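
The interleaving described above can be replayed deterministically in a
single process.  The following C sketch is only an illustration under
stated assumptions: name_is_free() is a hypothetical stand-in for the
existence check that mktemp() performs, replay_race() is a made-up name,
and the two "processes" A and B are simulated by consecutive calls
rather than by real concurrency.

```c
#define _XOPEN_SOURCE 700   /* for lstat(2) under strict ISO C modes */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for mktemp()'s check: is `path` unused right now? */
static int name_is_free(const char *path)
{
    struct stat sb;
    return lstat(path, &sb) == -1 && errno == ENOENT;
}

/*
 * Replay the race for two installers A and B that picked the same
 * temporary name `tmp`.  `src` must be an existing file.
 * Returns B's errno from link(), 0 if B unexpectedly succeeded,
 * or -1 if the scenario could not be set up.
 */
int replay_race(const char *src, const char *tmp)
{
    assert(src != NULL && tmp != NULL);

    /* Step 1: A and B each check the name and both find it unused. */
    int a_free = name_is_free(tmp);
    int b_free = name_is_free(tmp);
    if (!a_free || !b_free)
        return -1;

    /* Step 2: A links first and succeeds. */
    if (link(src, tmp) == -1)
        return -1;

    /* Step 3: B links to the same name and loses. */
    int b_errno = 0;
    if (link(src, tmp) == -1) {
        b_errno = errno;
        fprintf(stderr, "link %s -> %s: %s\n", tmp, src, strerror(b_errno));
    }

    (void)unlink(tmp);
    return b_errno;
}
```

On a POSIX system the second link() is expected to fail with EEXIST,
which strerror() renders as "File exists".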

There is a comment in the source for do_link() indicating that the call
to mktemp() has been reviewed and found to be "safe", but presumably
that just means that the code is free from security holes, not that it
actually works :)


>How-To-Repeat:
Perhaps by attempting a similar cross-build (Linux host, -j 6), but I
suspect the conditions under which mktemp() will produce filename
collisions are timing related and therefore hard to reproduce on a
different host.  Instead of reproducing the problem, its existence may
be verified through code inspection, bearing in mind that mktemp()
does not guarantee that the file names returned are unique, only that
no file of that name exists at the time of the call.


>Fix:
There are several ways this could be fixed.  One would be to make
do_link() in xinstall.c generate the name of the temporary file using
something like

  snprintf(tmpl, sizeof(tmpl), "%s.inst.XXXXXX", to_name);

instead of

  snprintf(tmpl, sizeof(tmpl), "%s/inst.XXXXXX", xdirname(to_name));

which would guarantee that the temporary files are unique because each
of the concurrent nbinstall processes is installing a different file.
Another would be to retry the mktemp() call when the link() fails.
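
For illustration, the two ideas can be combined as in the following C
sketch.  This is not a patch against xinstall.c: link_with_retry(),
fill_template() and MAX_ATTEMPTS are hypothetical names, and
fill_template() is a simplified stand-in for mktemp() that uses rand()
purely to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_ATTEMPTS 16     /* arbitrary bound chosen for this sketch */

/* Stand-in for mktemp(): replace the trailing X's with random characters. */
static void fill_template(char *tmpl)
{
    static const char cs[] = "abcdefghijklmnopqrstuvwxyz0123456789";
    size_t n = strlen(tmpl);

    while (n > 0 && tmpl[n - 1] == 'X')
        tmpl[--n] = cs[rand() % (int)(sizeof(cs) - 1)];
}

/*
 * Link `from` to a temporary name derived from `to_name`, retrying with
 * a fresh name whenever link() reports EEXIST.  On success the chosen
 * name is left in `tmpl` and 0 is returned; otherwise -1 with errno set.
 */
int link_with_retry(const char *from, const char *to_name,
                    char *tmpl, size_t tmpl_size)
{
    assert(tmpl != NULL && tmpl_size > 0);

    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        /* Per-target template, as in the first suggestion above. */
        int n = snprintf(tmpl, tmpl_size, "%s.inst.XXXXXX", to_name);
        if (n < 0 || (size_t)n >= tmpl_size) {
            errno = ENAMETOOLONG;
            return -1;
        }
        fill_template(tmpl);

        if (link(from, tmpl) == 0)
            return 0;           /* this process now owns the name */
        if (errno != EEXIST)
            return -1;          /* a real error, do not retry */
        /* EEXIST: another process took the name first; try again. */
    }
    errno = EEXIST;
    return -1;
}
```

The point of the loop is that link() itself serves as the atomic
existence check, so a collision costs one more attempt instead of
failing the build.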
