Subject: pkg/18911: wget will dump core when parsing certain html files
To: None <gnats-bugs@gnats.netbsd.org>
From: None <brook@biology.nmsu.edu>
List: netbsd-bugs
Date: 11/04/2002 16:18:05
>Number:         18911
>Category:       pkg
>Synopsis:       wget will dump core when parsing certain html files
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    pkg-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Nov 03 19:19:00 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     
>Release:        NetBSD 1.6
>Organization:
Brook G. Milligan                      Internet:  brook@nmsu.edu
Department of Biology
New Mexico State University            Telephone:  (505) 646-7980
Las Cruces, New Mexico  88003  U.S.A.  FAX:        (505) 646-5665
>Environment:
	
	
System: NetBSD xp000477.massey.ac.nz 1.6 NetBSD 1.6 (GENERIC_LAPTOP) #0: Sun Sep 8 19:55:58 UTC 2002 autobuild@tgm.daemon.org:/autobuild/i386/OBJ/autobuild/src/sys/arch/i386/compile/GENERIC_LAPTOP i386
Architecture: i386
Machine: i386
>Description:
	When wget parses HTML files it allocates storage for various
buffers.  If a buffer becomes too small, wget reallocates it and
copies the contents from the old location to the new one.  This is
accomplished (at least in part) by the macro DO_REALLOC_FROM_ALLOCA in
wget.h.  When the block being grown was allocated by alloca, the macro
calls memcpy with the new, enlarged size (sizevar has already been
updated by that point) rather than the original size.  The source
block is only as large as the original size, so memcpy reads past its
end.  If the discrepancy between the two sizes is large enough, memcpy
reads addresses outside the valid memory range of the process and wget
dumps core.
>How-To-Repeat:
	Create an HTML file with a large attribute value (e.g., a long
quoted string) and execute "wget -F -i filename".  For example, the
following file reproduces the core dump for me:
<HTML>
<HEAD>
<META
name="keywords"

content="This is a long list of keywords intended to cause some buffer
overrun (I presume).  In any case, the observable effect is that wget
1.7 dumps core for the following command:

wget -F -i this_file

There are 2021 characters within this quote.  One fewer is handled
fine (at least by wget 1.7) and does not cause a core dump.  I am not
sure what happens for wget 1.8.

0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 0123456789 0123456789
0123456789 0123456789 0123456789 012

Did it work?  Did wget dump core?  Which versions dump core and which
don't?">

</HEAD>
</HTML>
>Fix:
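	Apply the following patch.  It saves the original value of
sizevar before the macro enlarges it and passes that saved value to
memcpy as the copy length.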

$NetBSD$

--- src/wget.h.orig	Mon May 28 07:35:15 2001
+++ src/wget.h	Mon Nov  4 13:20:16 2002
@@ -230,6 +230,7 @@
 #define DO_REALLOC_FROM_ALLOCA(basevar, sizevar, needed_size, allocap, type) do	\
 {										\
   /* Avoid side-effectualness.  */						\
+  long size = (sizevar);							\
   long do_realloc_needed_size = (needed_size);					\
   long do_realloc_newsize = 0;							\
   while ((sizevar) < (do_realloc_needed_size)) {				\
@@ -245,7 +246,7 @@
       else									\
 	{									\
 	  void *drfa_new_basevar = xmalloc (do_realloc_newsize);		\
-	  memcpy (drfa_new_basevar, basevar, sizevar);				\
+	  memcpy (drfa_new_basevar, basevar, size);				\
 	  (basevar) = drfa_new_basevar;						\
 	  allocap = 0;								\
 	}									\



>Release-Note:
>Audit-Trail:
>Unformatted: