Subject: bin/15412: join doesn't deal with '-e' option properly
To: None <gnats-bugs@gnats.netbsd.org>
From: Duncan McEwan <duncan@mcs.vuw.ac.nz>
List: netbsd-bugs
Date: 01/29/2002 15:29:50
>Number:         15412
>Category:       bin
>Synopsis:       join doesn't deal with '-e' option properly
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 28 18:30:01 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Duncan McEwan
>Release:        NetBSD 1.5ZA, join.c,v 1.19 2000/06/10 19:21:05
>Organization:
	Victoria University of Wellington, New Zealand
>Environment:
System: NetBSD shed11.mcs.vuw.ac.nz 1.5ZA NetBSD 1.5ZA (GEN_X) #0: Fri Jan 4 12:56:58 NZDT 2002 mark@turakirae.mcs.vuw.ac.nz:/mnt/SAVE/build.obj/sys/arch/i386/compile/GEN_X i386
Architecture: i386
Machine: i386

>Description:
	With the options '-a1 -a2 -eZZZ', join does not output the string ZZZ
	in place of the missing fields when a line in one file has no line
	with a matching join field in the other.

>How-To-Repeat:
	Given "file1" containing

	line1
	line2
	line4

	and "file2" containing

	line1
	line3

	The command "join -1 1 -2 1 -a1 -a2 -e ZZZ -o1.1,2.1 file1 file2"
	produces

	line1 line1
	line2
	line3
	line4

	On Solaris 2.8 and OSF 4.0 the same command produces what I believe
	is the correct output:

	line1 line1
	line2 ZZZ
	ZZZ line3
	line4 ZZZ
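
	The expected behaviour can be modelled in a few lines of C.  This is
	only a sketch, not code from join.c: model_join() is a hypothetical
	helper that assumes one field per line, sorted input and no duplicate
	keys, and it hard-codes the equivalent of '-a1 -a2 -o1.1,2.1' with a
	space as the separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of "join -a1 -a2 -e EMPTY -o1.1,2.1" for two sorted
 * lists of unique keys.  The result is written to out as newline-terminated
 * lines; the return value is the number of bytes written.
 */
size_t
model_join(const char **f1, size_t n1, const char **f2, size_t n2,
    const char *empty, char *out, size_t outsz)
{
	size_t i = 0, j = 0, len = 0;
	const char *left, *right;
	int cmp, n;

	assert(empty != NULL && out != NULL && outsz > 0);
	out[0] = '\0';
	while (i < n1 || j < n2) {
		/* An exhausted file compares as "greater" than the other. */
		if (i == n1)
			cmp = 1;
		else if (j == n2)
			cmp = -1;
		else
			cmp = strcmp(f1[i], f2[j]);

		/* The side without a matching line gets the -e string. */
		left = (cmp <= 0) ? f1[i] : empty;
		right = (cmp >= 0) ? f2[j] : empty;

		n = snprintf(out + len, outsz - len, "%s %s\n", left, right);
		if (n < 0 || (size_t)n >= outsz - len) {
			out[len] = '\0';	/* drop the truncated line */
			break;
		}
		len += (size_t)n;

		if (cmp <= 0)
			i++;
		if (cmp >= 0)
			j++;
	}
	return len;
}
```

	Called with the keys from "file1" and "file2" above and "ZZZ" as the
	replacement string, it fills the buffer with the four lines that
	Solaris and OSF produce.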

>Fix:
	There are two problems in join.c 1.19.  Firstly, outoneline() doesn't
	call outfield() when olist[cnt].fileno != F->number, so nothing at all
	is output for a -o field that belongs to the other file.  The patch
	below fixes this by defining a "constant" LINE, noline, with no fields
	and passing it to outfield(), which then always outputs the string
	contained in the variable "empty" (if it's not NULL).

	With this change the output produced becomes

	line1 line1
	line2 ZZZ
	line3 ZZZ
	line4 ZZZ

	which is still not quite right :-(

	This turned out to be due to the variable "input2" being initialised
	incorrectly: its "number" field was set to 1 rather than 2, so an
	unpaired line from file2 was treated as if it came from file1.

	The patch below fixes both of these problems.

	After working all this out I decided (too late!) to check the FreeBSD
	CVS repository.  They have already fixed both problems, and their fix
	for the first differs from mine, so you may prefer to use theirs
	for compatibility.
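
	The interaction between the two problems can be seen in a reduced
	sketch.  The types and the sketch_* functions here are simplified,
	hypothetical stand-ins and not the code from join.c; the sketch
	assumes that outfield() falls back to "empty" whenever the requested
	field does not exist in the line, and it writes to a buffer instead
	of stdout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the LINE and OLIST types in join.c. */
typedef struct {
	const char **fields;	/* line field(s) */
	unsigned long fieldcnt;	/* line field(s) count */
} SLINE;

typedef struct {
	unsigned long fileno;	/* file number (1 or 2) */
	unsigned long fieldno;	/* field number, zero-based */
} SOLIST;

/* A line with no fields: every lookup in it falls back to "empty". */
static const SLINE noline = { NULL, 0 };

/* Append s to the NUL-terminated string in buf, truncating if needed. */
static void
append(char *buf, size_t bufsz, const char *s)
{
	size_t len = strlen(buf);

	if (len + 1 < bufsz)
		snprintf(buf + len, bufsz - len, "%s", s);
}

/* Emit one field, or the -e string if the line has no such field. */
static void
sketch_outfield(const SLINE *lp, unsigned long fieldno, const char *empty,
    char *buf, size_t bufsz, int *needsep)
{
	if ((*needsep)++)
		append(buf, bufsz, " ");
	if (lp->fieldcnt <= fieldno) {
		if (empty != NULL)
			append(buf, bufsz, empty);
	} else
		append(buf, bufsz, lp->fields[fieldno]);
}

/*
 * Emit an unpaired line that came from file "number".  Entries of the -o
 * list that name the other file are filled from noline.
 */
void
sketch_outoneline(unsigned long number, const SLINE *lp, const SOLIST *olist,
    size_t olistcnt, const char *empty, char *buf, size_t bufsz)
{
	int needsep = 0;
	size_t cnt;

	assert(buf != NULL && bufsz > 0);
	buf[0] = '\0';
	for (cnt = 0; cnt < olistcnt; ++cnt) {
		if (olist[cnt].fileno == number)
			sketch_outfield(lp, olist[cnt].fieldno, empty,
			    buf, bufsz, &needsep);
		else
			sketch_outfield(&noline, 0, empty,
			    buf, bufsz, &needsep);
	}
}
```

	For the line "line3" from file2 and an output list equivalent to
	-o1.1,2.1, passing number 2 gives "ZZZ line3", while passing number 1
	(the effect of the wrong initialiser) gives "line3 ZZZ".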

*** join.c.prev	Sun Jun 11 07:21:05 2000
--- join.c	Tue Jan 29 15:14:49 2002
***************
*** 76,81 ****
--- 76,83 ----
  	u_long fieldalloc;	/* line field(s) allocated count */
  } LINE;
  
+ LINE noline = {"", 0, 0, 0, 0};	/* arg to outfield if no line to output */
+ 
  typedef struct {
  	FILE *fp;		/* file descriptor */
  	u_long joinf;		/* join field (-1, -2, -j) */
***************
*** 88,94 ****
  	u_long setalloc;	/* set allocated count */
  } INPUT;
  INPUT input1 = { NULL, 0, 0, 1, NULL, -1, 0, 0, },
!       input2 = { NULL, 0, 0, 1, NULL, -1, 0, 0, };
  
  typedef struct {
  	u_long	fileno;		/* file number */
--- 90,96 ----
  	u_long setalloc;	/* set allocated count */
  } INPUT;
  INPUT input1 = { NULL, 0, 0, 1, NULL, -1, 0, 0, },
!       input2 = { NULL, 0, 0, 2, NULL, -1, 0, 0, };
  
  typedef struct {
  	u_long	fileno;		/* file number */
***************
*** 433,438 ****
--- 435,450 ----
  		for (cnt = 0; cnt < olistcnt; ++cnt) {
  			if (olist[cnt].fileno == F->number)
  				outfield(lp, olist[cnt].fieldno);
+ 			else
+ 				/*
+ 				 * because of the way "noline" is initialised
+ 				 * this call to outfield will either produce
+ 				 * no output or the contents of the variable
+ 				 * "empty" (set by the -e option).  I did it
+ 				 * this way to avoid duplicating the code
+ 				 * from outfield() here.
+ 				 */
+ 				outfield(&noline, 1);
  		}
  	else
  		for (cnt = 0; cnt < lp->fieldcnt; ++cnt)
>Release-Note:
>Audit-Trail:
>Unformatted: