Subject: Re: sh reads byte by byte
To: None <tech-userlevel@netbsd.org>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-userlevel
Date: 01/21/2007 23:43:03
On Sat, Jan 20, 2007 at 07:45:35PM +0000, David Laight wrote:
> On Sat, Jan 20, 2007 at 05:01:08PM +0100, Manuel Bouyer wrote:
> > Hi,
> > while trying to see how to speedup audit-packages as used by the bulk builds,
> > I noticed (using ktrace) that something like this:
> > #!/bin/sh
> > 
> > while read a b c; do
> > echo $a $b $c
> > done < /tmp/file
> > 
> > will read /tmp/file byte by byte. With the current pkg-vulnerabilities
> > this makes 221909 syscalls. Would there be a way to have it read the
> > file in a more efficient way (rewriting it in another language is not an
> > option for now) ?
> 
> Not easily, the problem is that it has to leave the fd positioned to the
> correct byte after each 'read' - since it might fork/exec some other
> process that reads from the same fd, and will expect to get the byte
> following the newline.
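
To make that constraint concrete, here is a minimal sketch. The scratch
file and its three lines are made up for illustration; they are not taken
from pkg-vulnerabilities. The shell's read consumes one line, then cat,
which shares the same fd, has to pick up at the byte after the newline:

```shell
#!/bin/sh
# Hypothetical scratch file, only for this demonstration.
tmp=$(mktemp) || exit 1
printf 'one\ntwo\nthree\n' > "$tmp"

# read takes the first line; cat inherits the same fd and must
# start right after the first newline, so it prints the remaining lines.
out=$( { read first; echo "read got: $first"; cat; } < "$tmp" )
echo "$out"

rm -f "$tmp"
```

If sh had buffered more than one line here, cat would lose whatever sh
had already pulled in, unless sh could seek the fd back.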

Sure. But I was hoping this could be written differently, so that sh would
read the whole file at once and work on an in-memory copy, using some
tricks. But I've found none.

The problem I have is that in a bulk upload this is called for every
package, so we're doing 221909 syscalls * 6100 packages (about 1.35
billion syscalls in total), and this takes a lot of time (several hours
on a Xeon 5130). If there were a way to load this into some kind of table
once, it would help a lot. But sh is not perl :(

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--