Subject: Re: pcmcia patches in current (6/30/97)
To: Herb Peyerl <hpeyerl@beer.org>
From: Chris G. Demetriou <cgd@pa.dec.com>
List: tech-kern
Date: 07/04/1997 21:36:04
> Wolfgang Rupprecht <wolfgang@wsrcc.com>  wrote:
>  > The problem is that any network activity runs a very high risk of a
>  > panic in the vm_* code.  I can ftp out and do an ls on the server, but
>  > the laptop usually panics before the ls listing finishes.
> 
> Yes, I was having pretty bad problems with a machine that had 5 3com's 
> in it and Chris was also having problems... We (mostly chris) spent a 
> fair bit of time trying to narrow down the problem and, last I saw of
> the discussion, it seemed to be a compiler issue. 

Yah.

My testing indicated that with the do/while(0) changes in mbuf.h,
dev/ic/elink3.c (the back-end for the 'ep' driver), in particular the
function epget(), is miscompiled when the default i386 kernel
optimization, -O2, is turned on.

My data points:

	(1) when all kernel files are compiled with -O2 and with no
	    special patches applied, the kernel crashes when I try
	    to 'ssh' in or use X11 applications (in particular,
	    ghostview) with a remote display.

	(2) when all kernel files are compiled with -O2 except for
	    elink3.c compiled without -O2 (and with a few warning
	    flags disabled so some sketchy inlines don't cause
	    warnings) and with no special patches applied, the kernel 
	    works properly.

	(3) when all kernel files are compiled with -O2 and with the
	    patch below applied to mbuf.h and "#define
	    _LOSING_COMPILER" added only to elink3.c, the kernel
	    works properly.

epget() uses MGET, MGETHDR, and MCLGET, and that's what the diff below
changes to not use do/while(0).  I tried changing fewer, but that
didn't yield a kernel that was stable.  I also tested changing
elink3.c to use m_get instead of MGET and m_gethdr instead of MGETHDR,
and with that change only MCLGET needed the do/while(0) removed.  None
of the uses of the mbuf macros in elink3.c should have their behaviour
changed by use of the do/while(0).

That's why I think this is a compiler bug.  It could be something
different, but I know where I'd put my money.  8-)

When I defined _LOSING_COMPILER for all files in the kernel and
compiled with optimization, I found that elink3.o and a handful of
other object files (uipc_mbuf.o, and a few NFS objects, if i recall
correctly) differ from a non-_LOSING_COMPILER optimized build.  This
makes sense (because register allocation is likely to change in the
presense of a do/while(0)), but indicates that other functionality
which I wasn't testing (most importantly NFS) may cause similar
problems using the new macro definitions.


Anyway, I have a workaround that makes my system function properly.
I'm not even going to attempt to dig into gcc to find the problem.
(It's just not worth my time.)



chris
===================================================================
Index: mbuf.h
===================================================================
RCS file: /cvsroot/src/sys/sys/mbuf.h,v
retrieving revision 1.1.1.3
diff -c -r1.1.1.3 mbuf.h
*** mbuf.h	1997/06/09 03:43:15	1.1.1.3
--- mbuf.h	1997/07/05 04:03:41
***************
*** 41,46 ****
--- 41,54 ----
  #ifndef _SYS_MBUF_H_
  #define _SYS_MBUF_H_
  
+ #ifdef	_LOSING_COMPILER
+ #define	_MBUF_LOSING_DO
+ #define	_MBUF_LOSING_WHILE
+ #else
+ #define	_MBUF_LOSING_DO		do
+ #define	_MBUF_LOSING_WHILE	while (0)
+ #endif
+ 
  #ifndef M_WAITOK
  #include <sys/malloc.h>
  #endif
***************
*** 198,204 ****
   * If 'how' is M_WAIT, these macros (and the corresponding functions)
   * are guaranteed to return successfully.
   */
! #define	MGET(m, how, type) do { \
  	MALLOC((m), struct mbuf *, MSIZE, mbtypes[type], (how)); \
  	if (m) { \
  		MBUFLOCK(mbstat.m_mtypes[type]++;); \
--- 206,212 ----
   * If 'how' is M_WAIT, these macros (and the corresponding functions)
   * are guaranteed to return successfully.
   */
! #define	MGET(m, how, type) _MBUF_LOSING_DO { \
  	MALLOC((m), struct mbuf *, MSIZE, mbtypes[type], (how)); \
  	if (m) { \
  		MBUFLOCK(mbstat.m_mtypes[type]++;); \
***************
*** 209,217 ****
  		(m)->m_flags = 0; \
  	} else \
  		(m) = m_retry((how), (type)); \
! } while (0)
  
! #define	MGETHDR(m, how, type) do { \
  	MALLOC((m), struct mbuf *, MSIZE, mbtypes[type], (how)); \
  	if (m) { \
  		MBUFLOCK(mbstat.m_mtypes[type]++;); \
--- 217,225 ----
  		(m)->m_flags = 0; \
  	} else \
  		(m) = m_retry((how), (type)); \
! } _MBUF_LOSING_WHILE
  
! #define	MGETHDR(m, how, type) _MBUF_LOSING_DO { \
  	MALLOC((m), struct mbuf *, MSIZE, mbtypes[type], (how)); \
  	if (m) { \
  		MBUFLOCK(mbstat.m_mtypes[type]++;); \
***************
*** 222,228 ****
  		(m)->m_flags = M_PKTHDR; \
  	} else \
  		(m) = m_retryhdr((how), (type)); \
! } while (0)
  
  /*
   * Macros for tracking external storage associated with an mbuf.
--- 230,236 ----
  		(m)->m_flags = M_PKTHDR; \
  	} else \
  		(m) = m_retryhdr((how), (type)); \
! } _MBUF_LOSING_WHILE
  
  /*
   * Macros for tracking external storage associated with an mbuf.
***************
*** 280,286 ****
   * MEXTADD adds pre-allocated external storage to
   * a normal mbuf; the flag M_EXT is set upon success.
   */
! #define	MCLGET(m, how) do { \
  	MBUFLOCK( \
  		if (mclfree == 0) \
  			(void)m_clalloc(1, (how)); \
--- 288,294 ----
   * MEXTADD adds pre-allocated external storage to
   * a normal mbuf; the flag M_EXT is set upon success.
   */
! #define	MCLGET(m, how) _MBUF_LOSING_DO { \
  	MBUFLOCK( \
  		if (mclfree == 0) \
  			(void)m_clalloc(1, (how)); \
***************
*** 299,305 ****
  		(m)->m_ext.ext_arg = NULL;  \
  		MCLINITREFERENCE(m); \
  	} \
! } while (0)
  
  #define	MEXTMALLOC(m, size, how) do { \
  	(m)->m_ext.ext_buf = \
--- 307,313 ----
  		(m)->m_ext.ext_arg = NULL;  \
  		MCLINITREFERENCE(m); \
  	} \
! } _MBUF_LOSING_WHILE
  
  #define	MEXTMALLOC(m, size, how) do { \
  	(m)->m_ext.ext_buf = \