Subject: Re: xdr problem
To: None <current-users@NetBSD.ORG>
From: der Mouse <mouse@Collatz.McRCIM.McGill.EDU>
List: current-users
Date: 12/03/1996 13:32:04
> I got the following problem when I'm using the xdr protocol.  It's a
> problem in the sense that NetBSD behaves in another way than all
> other Unix systems I have access to (Linux(i386), Solaris2.5(Sparc),
> AIX(rs6000), HP-UX).

This is a _good_ thing; it has allowed you to find a bug in your code.

> Run the following program to create the file "MyFile2" containing two
> integers xdr encoded. (I omitted all error business for simplicity.)

>   fd = open("MyFile2", O_WRONLY|O_CREAT|O_EXCL, 0666);
>   fp = fdopen(fd, "a");
>   setbuf(fp, rawbuf);
>   xdrstdio_create(handle, fp, XDR_ENCODE);
>   xdr_int(handle, &a);
>   xdr_int(handle, &b);
>   xdr_destroy(handle);
>   fflush(fp);
>   close(fd);

> And now read the created file in:

>   fd = open("MyFile2", O_RDONLY);
>   fp = fdopen(fd, "r");
>   setbuf(fp, rawbuf); 
>   xdrstdio_create(handle, fp, XDR_DECODE);
>   xdr_int(handle, &a);
>   xdr_destroy(handle);
>   fp = fdopen(fd, "r");
>   setbuf(fp, rawbuf); 
>   xdrstdio_create(handle, fp, XDR_DECODE);
>   xdr_int(handle, &b);
>   xdr_destroy(handle);
>   printf("a: %d, b: %d\n", a, b);

> The results are:

> 	NetBSD(i386)   a: 1, b: 0
> 	Linux(i386)    a: 1, b: 2
> 	Solaris(Sparc) a: 1, b: 2
> 	AIX(RS6000)    a: 1, b: 2
> 	HP-UX          a: 1, b: 2

> O.k., one can say why you are doing "fdopen" twice.  But this should
> point out the problem only.

Doing the fdopen twice - on the same underlying file descriptor - is
_exactly_ the problem.  (Leaving off error checking on the second
xdr_int call in the second program is the second part of the problem; I
feel certain that xdr_int call is failing, and you're not noticing.)

stdio - which you are layering your XDR stream on top of - does
buffering.  Each of the two stdio streams you create tries to read a
bufferful from the underlying file descriptor, and both share that
descriptor's file offset.  The first one succeeds in reading all 8
bytes, uses the first 4, and leaves the second 4 hanging around in its
stdio buffer until program exit.  The second one then tries to read,
finds the descriptor already at end-of-file, and presumably passes
that failure up to you, where you ignore it.  b is probably not
getting changed.

Why does it behave differently on other systems?  I don't know.
Perhaps their stdio-based XDR streams use stdio differently.  Perhaps
their stdios behave differently.

> The fdopen manual page says that the streams with mode "r" are always
> positioned at the beginning of the file.

Not quite.  The manpage you get when you say "man fdopen" does say
that, but it says it when describing fopen(), not fdopen().

> But this problem is bound to xdr as the following programs show.

> Write again two integers to the file "MyFile1"

>   fd = open("MyFile1", O_WRONLY|O_CREAT|O_EXCL, 0666);
>   write(fd, &a, sizeof(int));
>   write(fd, &b, sizeof(int));
>   close(fd);

> And read the file in

>   fd = open("MyFile1", O_RDONLY);
>   fp = fdopen(fd, "r");
>   fread(&a, sizeof(int), 1, fp);
>   fp = fdopen(fd, "r");
>   fread(&b, sizeof(int), 1, fp);
>   printf("a: %d, b: %d\n", a, b);

> And all of the above mentioned operating systems behave the same:

> 	NetBSD(i386)   a: 1, b: 0
> 	Linux(i386)    a: 1, b: 0
> 	Solaris(Sparc) a: 1, b: 0
> 	AIX(RS6000)    a: 1, b: 0
> 	HP-UX          a: 1, b: 0

Right, all of them give you the "broken" behavior, presumably including
an error return from the second fread(), which you ignore.  Here, all
the stdios - not just the NetBSD one - are reading in a full bufferful
- or rather as much as is available - the first time, then (presumably)
failing the second time.

The real puzzle here is not why NetBSD's stdio XDR streams "fail", but
rather why the others "work".  And indeed, I tried your XDR test
programs on SunOS (4.1.3) and they "fail", just as NetBSD does, just as
I would have expected from all the OSes.  Perhaps the Linux, Solaris,
AIX, and HP-UX XDR packages fiddle with the stdio buffering themselves,
which would explain it.

In any case, writing code that depends on an undocumented and
unspecified detail of the implementation, as this code does, is just
asking for trouble.  I'm surprised it took you as long as it did to get
that trouble.
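
For comparison, a minimal sketch of a reader that does not depend on
any buffering detail, assuming the raw-int layout of the MyFile1
example: open one stream, do both reads through it, and check each
one.  The names write_two_ints and read_two_ints are hypothetical.
The XDR version should have the same shape: one xdrstdio_create()
handle on one stream, and two xdr_int() calls whose return values are
each tested.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write two raw ints to `path`.  0 on success. */
int write_two_ints(const char *path, int a, int b)
{
    FILE *fp;
    int ok;

    assert(path != NULL);
    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    ok = fwrite(&a, sizeof a, 1, fp) == 1 &&
         fwrite(&b, sizeof b, 1, fp) == 1;
    /* fclose flushes; a failed flush is a failed write. */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read both ints back through ONE stream, checking
 * every read.  Returns 0 on success, -1 on any failure; *a and *b are
 * only meaningful when 0 is returned. */
int read_two_ints(const char *path, int *a, int *b)
{
    FILE *fp;
    int ok;

    assert(path != NULL && a != NULL && b != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    /* Both reads share one buffer, so read-ahead by the first read is
     * simply consumed by the second. */
    ok = fread(a, sizeof *a, 1, fp) == 1 &&
         fread(b, sizeof *b, 1, fp) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

Called as read_two_ints("MyFile1", &a, &b), this either yields both
values or reports failure; it never leaves b silently untouched.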

					der Mouse

		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B