port-i386: Re: Possible bug relating to malloc()/realloc(), popen(), and read()

Subject: Re: Possible bug relating to malloc()/realloc(), popen(), and read()
To: None <port-i386@NetBSD.org>
From: Vincent Stemen <netbsd@crel.us>
List: port-i386
Date: 12/05/2004 17:22:34
On Sat, Dec 04, 2004 at 01:22:16AM -0500, der Mouse wrote:
> >> Here is your actual problem.  [...]
> 
> > Ok.  I thought it would always read the full block_size unless it
> > ran into EOF or got an error.
> 
> No; this has never been promised for anything but plain files.  It
> would be perfectly legal - though rather odd and inefficient - for a
> read from a pipe to always return at most one byte.
> 
> > Apparently I cannot depend on that due to buffering issues when
> > reading from a network pipe as in popen.
> 
> A "network pipe"?  Pipes and network connections are actually quite
> different in recent NetBSD.  (In less recent NetBSD, they have been
> only slightly different; going back before that, before NetBSD qua
> NetBSD existed, there was a time when network connections didn't exist
> and pipes were yet a third sort of animal, one which no longer exists.
> There may even have been other sorts of pipe, I don't know.)

I referred to it as a network connection based on the following
statements in the popen() manual page in NetBSD-2.0 beta, which say
it uses sockets.

    Historically, popen was implemented with a unidirectional pipe;
    ...
    Since popen is now implemented using sockets, the type may request
    a bidirectional data flow.

> The advantage of read is that it avoids stdio buffering, whereas the
> advantage of fread is that it provides stdio buffering. :-)

Yes, I came to the conclusion that using stdio would be better in this
case.  I rewrote the routine using stdio and it seems to work great
now :-).
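
For anyone who runs into the same thing, here is a minimal sketch of
the two approaches discussed above.  It is not the routine from my
program; the helper names read_full and stream_read_full are made up
for illustration.  The first loops around read() until the requested
count has arrived, EOF is reached, or an error occurs.  The second
relies on fread(), which does that looping itself.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until `want` bytes have
 * arrived, EOF is hit, or a real error occurs.  Returns the number of
 * bytes stored in buf, or -1 on error. */
static ssize_t
read_full(int fd, void *buf, size_t want)
{
    char *p = buf;
    size_t got = 0;

    while (got < want) {
        ssize_t n = read(fd, p + got, want - got);

        if (n == 0)
            break;              /* EOF: return what we have */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        got += (size_t)n;       /* short read: loop for the rest */
    }
    return (ssize_t)got;
}

/* Hypothetical stdio equivalent, e.g. for the FILE * returned by
 * popen(): fread() returns fewer than `want` bytes only on EOF or
 * error, so no explicit loop is needed. */
static size_t
stream_read_full(FILE *fp, void *buf, size_t want)
{
    return fread(buf, 1, want, fp);
}
```

With either helper, a return value smaller than the requested size
means EOF or an error rather than "the pipe happened to be partly
filled".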


> >>  /* I consider NULL harmful;
> 
> > I am curious why you consider NULL harmful?
> 
> The lesser reason is that it tends to get confused with NUL, ASCII
> character code 0, which C canonically writes '\0'.  For example, I have
> actually seen code that does things like "*sp = NULL;" to terminate a
> character string - and yes, I know that's broken, that's my point:
> someone got NULL and NUL confused enough to write it.
> 
> The greater reason is that it isn't what it purports to be, which is, a
> polymorphic nil pointer (called a "null pointer" by most references, a
> term I dislike because of its spelling similarity to, and confusion
> danger with, NULL).  It is a polymorphic nil pointer in some contexts,
> yes, such as the RHS of an assignment statement where the LHS is of
> pointer type.  But it is not a polymorphic nil pointer in exactly those
> cases where a polymorphic nil pointer is most needed - those where
> there is no compile-time type available for the rvalue.  The commonest
> such case that comes to mind is an argument where no prototype in scope
> specifies a type for that argument.  (I _think_ that's the only case,
> but I'm not quite sure enough to say so outright.)
> 
> In fact, NULL is a polymorphic nil pointer in only and exactly those
> cases where 0 is also a polymorphic nil pointer.  (Indeed, one of the
> acceptable definitions for NULL is just that: 0.)  Other cases -
> perhaps the commonest is the execl() arglist terminator - require a
> cast in order to be portable to machines where int and char * are not
> the same size, where integer 0 is not the same bit pattern as a nil
> char *, or where integer 0 and nil char * are passed in different ways.
> 
> This leaves as the only benefit of NULL over 0 that it is documentation
> to human readers that the rvalue is conceptualized as a nil pointer by
> the code's author.  Given the confusion and the misuses I have seen
> result, I consider this benefit to be far outweighed by the problems
> that come with it.

Interesting points.  I seem to remember a push toward using NULL
rather than 0 in C a number of years ago.  It looks like it has been
switching back the other way in recent years.  I have always liked
using plain 0; I personally think it is more readable.  It seems
compilers have finally gotten smarter about automatically converting
it for the context it is used in.  For example, a long time ago (10
or 12 years or more) I always used '\0' to terminate character
strings and to compare for end of string.  I seem to remember getting
casting errors or warnings from the compiler if I used 0 without a
cast when assigning or comparing to anything other than an int.  Now,
however, I never seem to get warnings, even with -Wall, when using 0
in place of NULL or '\0'.
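
To make the quoted execl() point concrete, here is a small sketch of
the one place where a bare 0 (or an uncast NULL defined as 0) is not
enough: the terminator of a variadic argument list.  The helper names
run_sh and terminate_at are hypothetical, and the /bin/sh path is an
assumption about the system.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `cmd` through /bin/sh in a child process
 * and return its exit status, or -1 on failure. */
static int
run_sh(const char *cmd)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* execl() is variadic, so no prototype tells the compiler
         * that the terminator is a pointer.  The cast makes it a
         * nil char * on every platform; a bare 0 would be passed
         * as an int. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)0);
        _exit(127);             /* reached only if execl() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* By contrast, a string terminator is a character, not a pointer. */
static void
terminate_at(char *s, size_t n)
{
    s[n] = '\0';    /* plain 0 also works here; NULL would be wrong */
}
```

In assignments and comparisons the compiler knows the target type, so
0 converts correctly without a cast; only the unprototyped argument
needs one.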

I also found a comment in a posting from Bjarne Stroustrup (the
designer and original implementor of C++) from back in 1999 where he
said, "I use 0 and I don't recommend NULL," when discussing pointer
comparisons.

Another interesting point though:  It looks like the Linux kernel
developers are moving from using 0 to using NULL, based on this patch
I found titled "[PATCH 494] NULL vs. 0 cleanups".

http://www.ussg.iu.edu/hypermail/linux/kernel/0410.0/0368.html


> > What I still am not clear about is:
> >     Why does it consistently read only 1024 bytes only on the first
> >     read and then always read the full specified block size on all
> >     subsequent reads.
> 
> This is almost certainly an artifact of the way the pipes popen uses
> are implemented - see below for more.
> 
> >     Why does it not do that on FreeBSD?  This is one of the things
> >     that made me wonder if there was a bug in NetBSD.
> 
> Because FreeBSD does pipes differently.  I think FreeBSD still does
> pipes as AF_LOCAL socketpairs (this is the intermediate historical
> implementation I referred to above).

Yes, I see in the manual where FreeBSD says it uses a bi-directional
pipe, whereas NetBSD says it uses sockets.


> >     Also, why does the problem not exist when I run it through the
> >     debugger (gdb).  In that case it always seems to read the full
> >     specified block size, even on the first read if I step through
> >     it.  If I let it run at normal speed through the first two reads,
> >     it still only reads 1024 bytes on the first read.  It is
> >     apparently timing related.
> 
> This does not really surprise me, though to explain it in full would
> probably require delving into things I can't really look at, such as
> the exact alignment of the buffers used in the output code of the
> process you popen()ed.  I would guess that the first write writes only
> 1024 bytes, so when running at full speed it immediately
> context-switches to your code, which finds only 1024 bytes waiting.  If
> you stop in gdb, your process stops while interacting with you, the
> writing application gets to run again, and it fills up the pipe, so
> when you get around to actually doing the read - even if only a very
> short time later on human timescales - the pipe is full and you get all
> the bytes you ask for (since you ask for less than the pipe can hold).

That's a good point.  Thanks for your analysis.  
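
The reader's side of that explanation can be sketched in a few lines.
This only illustrates that read() on a pipe returns whatever is
waiting, up to the size requested; it uses plain pipe() rather than
popen(), and it does not explain why the writer's first write would
be 1024 bytes.  The function name short_read_demo is made up.

```c
#include <unistd.h>

/* Hypothetical demo: write `first` bytes into a fresh pipe, then ask
 * read() for `want` bytes.  Returns what read() reported, or -1 on
 * bad arguments or failure. */
static ssize_t
short_read_demo(size_t first, size_t want)
{
    static char out[4096], in[4096];
    int fds[2];
    ssize_t n;

    /* first == 0 is rejected because read() would block forever. */
    if (first == 0 || first > sizeof out || want > sizeof in)
        return -1;
    if (pipe(fds) < 0)
        return -1;

    if (write(fds[1], out, first) != (ssize_t)first) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Only `first` bytes are waiting, so read() returns at most
     * that many even when asked for more. */
    n = read(fds[0], in, want);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling short_read_demo(1024, 4096) reports 1024, while
short_read_demo(4096, 4096) reports the full 4096, which mirrors the
difference between running at full speed and pausing in gdb long
enough for the writer to fill the pipe.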


> (The part that would require complicated explanation is why the first
> write writes only 1024 bytes - and why later writes don't, or,

Yes, the part that puzzled me the most is why that behavior was
consistent.  I would have expected it to be more random as to which
reads were short.  If the developers feel this is normal behavior, I
will try not to use any more list bandwidth on the issue :-).
However, if you know the cause or probable cause and can find the
time to tell me through direct email, or point me to some
documentation that would explain it, I would appreciate it very much.

> alternatively, why the writer gets to write twice for one read by the
> reader.)

Regards,
Vincent


-- 
Vincent Stemen
Avoid the VeriSign/Network Solutions domain registration trap!
Read how Network Solutions (NSI) was involved in stealing our domain name.
http://www.InetAddresses.net