tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: RFC: lseek() extensions: SEEK_HOLE / SEEK_DATA with patch



On Sun, Aug 07, 2011 at 06:52:03PM +0200, Reinoud Zandijk wrote:
 > > > i've implemented the SEEK_HOLE / SEEK_DATA additions to lseek() as
 > > > introduced by Solaris for ZFS.
 > > 
 > > What does this operation have to do with seeking? And why involve the
 > > seek pointer, especially at a time when new calls are being added left
 > > and right to cope with files accessed from more than one thread at a
 > > time in the same process?
 > 
 > One `seeks' the next hole of data block in the file from a given pos? Sounds
 > logical to me.

But you don't. You want to return the position of the hole, but
setting the current file position (seeking) to it in the file is
valueless. If you're trying to print out the ranges in a file, there's
no need to touch the seek pointer at all, and if you're trying to copy
and preserve holes, you *don't* want to seek to the hole, you want to
*read* up to the hole, which means you want to find out where it is
*without* changing the seek pointer.

Even your example code does this: it seeks repeatedly, teleporting the
seek pointer all over the place for no reason, then does a bunch of
reads.

 > The seek positions are returned by the lseek() calls.

So? A proper API for the functionality would return the positions too.

 > So yes, it changes the file pointer too but thats a side-effect

Well right! It's a completely unnecessary, useless, and in fact
actively harmful side effect, exactly the sort of thing you don't want
to do.

 > and the file pointer is not meant to be used by multiple threads
 > anyway. Just use pread()/pwrite() with mutexes. 

...precisely! You cannot safely use this API from multiple threads. A
proper API for this functionality, that didn't involve the seek
pointer for no reason, would be thread-safe.

 > A copier can thus only be:
 > 
 >     while (pos < extent) {
 >         data = lseek(fdi, pos, SEEK_DATA);
 >      hole = lseek(fdi, data, SEEK_HOLE);
 >      printf("DATA = %"PRIi64"\n", data);
 >      printf("HOLE = %"PRIi64"\n", hole);
 >      ....

So? Not is that not thread-safe, it also doesn't actually copy
anything. How about:

   struct range range;

   while (pos < len) {
      fgetrange(fd, pos, &range);
      if (range.r_type == RANGE_DATA) {
         amount = range.r_end - pos;
         if (amount > sizeof(buf)) {
            amount = sizeof(buf);
         }
         pread(fd, buf, amount, pos);
         pwrite(outfd, buf, amount, pos);
      }
      pos = range.r_end;
   }

 > And yes, it could be changed during the two calls but thats a race
 > condition anyway.

Yes THAT'S THE POINT

 > I don't think whatever interface one uses that
 > you can access a single file without careful design.

Buncombe.

 > Since its also used by Solaris and Linux (work-in-progress) 

That doesn't make it right...

-- 
David A. Holland
dholland%netbsd.org@localhost


Home | Main Index | Thread Index | Old Index