Subject: Re: bin/6485: sup seg faults trying to retrieve current-src
To: None <netbsd-bugs@netbsd.org>
From: Dave Sainty <dave@dtsp.co.nz>
List: netbsd-bugs
Date: 11/25/1998 00:27:34
dave writes:

> >Number:         6485
> >Category:       bin
> >Synopsis:       sup seg faults trying to retrieve current-src
> >Confidential:   no
> >Severity:       serious
> >Priority:       medium
> >Responsible:    bin-bug-people (Utility Bug People)
> >State:          open
> >Class:          sw-bug
> >Submitter-Id:   net
> >Arrival-Date:   Mon Nov 23 06:05:01 1998
> >Last-Modified:
> >Originator:     Dave Sainty
> >Organization:
> Dynamic Technology Services and Products Ltd
> >Release:        sup compiled from sources 16/11/98
> >Environment:
> >Description:
> 	Very repeatably sup is segfaulting when it gets to current-src.  This
> 	first happened on 18/11, the sup worked on 16/11, no sup on the 17th.

So it turned out to be a malloc() failure.  I can't see why though,
none of the system limits were ever approached, and I have to try
extremely hard to make a dent on my swap space.  So... I'm assuming
it's an old UVM bug of some kind.  The kernel is fairly old, compiled
20/8/98.

I rebooted recently (11 days ago), which I think must have been the
trigger, as previously the machine has been running perfectly, and
nothing has changed that could explain sudden memory allocation
problems.

This patch picks up on the malloc failure, and instead of the core
produces:

SUP 8.26 (4.3 BSD) for file netbsd-current.sup at Nov 24 23:47:29
SUP Upgrade of current-src at Tue Nov 24 23:47:31 1998
SUP Fileserver 8.13 (4.3 BSD) 13821 on caesar.sai.dtsp.co.nz at 23:47:31
SUP Requesting changes since Nov 16 13:28:04 1998
SUP: Error reading file list from file server
SUP: Upgrade of current-src aborted at Nov 24 23:50:35 1998
SUP: Aborted

However, it's incomplete, because Tinsert() will validly return NULL
in cases where it's conditionally inserting if a matching node does
not exist and it turns out it does.

Due to the design flaw in the Tinsert interface not allowing for
malloc() failures, possibly a better thing to do would be to exit() as
soon as the malloc/salloc fails in Tmake().  It will always end in
tears anyway. :)

Cheers,

Dave

--- src/usr.sbin/sup/source/supmsg.c.orig	Fri Aug 28 23:13:10 1998
+++ src/usr.sbin/sup/source/supmsg.c	Wed Nov 25 00:09:01 1998
@@ -336,6 +336,10 @@
 			if (x != SCMOK)  break;
 			t = Tinsert (&listT,name,TRUE);
 			free (name);
+			if (t == NULL) {
+				x = SCMERR;
+				break;
+			}
 			t->Tmode = mode;
 			t->Tflags = flags;
 			t->Tmtime = mtime;
@@ -377,6 +381,10 @@
 			if (x != SCMOK)  break;
 			t = Tinsert (&needT,name,TRUE);
 			free (name);
+			if (t == NULL) {
+				x = SCMERR;
+				break;
+			}
 			if (update)  t->Tflags |= FUPDATE;
 			x = readstring (&name);
 		}
--- src/usr.sbin/sup/source/stree.c.orig	Wed Oct  8 00:56:11 1997
+++ src/usr.sbin/sup/source/stree.c	Wed Nov 25 00:16:10 1998
@@ -97,20 +97,28 @@
 {
 	register TREE *t;
 	t = (TREE *) malloc (sizeof (TREE));
-	t->Tname = (p == NULL) ? NULL : salloc (p);
-	t->Tflags = 0;
-	t->Tuid = 0;
-	t->Tgid = 0;
-	t->Tuser = NULL;
-	t->Tgroup = NULL;
-	t->Tmode = 0;
-	t->Tctime = 0;
-	t->Tmtime = 0;
-	t->Tlink = NULL;
-	t->Texec = NULL;
-	t->Tbf = 0;
-	t->Tlo = NULL;
-	t->Thi = NULL;
+	if (t != NULL) {
+		if (p == NULL) {
+			t->Tname = NULL;
+		} else if ((t->Tname = salloc (p)) == NULL) {
+			free(t);
+			return (NULL);
+		}
+
+		t->Tflags = 0;
+		t->Tuid = 0;
+		t->Tgid = 0;
+		t->Tuser = NULL;
+		t->Tgroup = NULL;
+		t->Tmode = 0;
+		t->Tctime = 0;
+		t->Tmtime = 0;
+		t->Tlink = NULL;
+		t->Texec = NULL;
+		t->Tbf = 0;
+		t->Tlo = NULL;
+		t->Thi = NULL;
+	}
 	return (t);
 }