tech-net archive


Re: checking m->m_pkthdr.csum_flags in ip_output()



On Fri, May 16, 2008 at 05:14:10PM -0500, David Young wrote:
> On Fri, May 16, 2008 at 03:05:03PM +0100, Patrick Welche wrote:
> > On Sun, May 04, 2008 at 12:33:05PM +0900, Takahiro Kambe wrote:
> > > Hi,
> > > 
> > > In message <20080415.203216.41648300.taca%back-street.net@localhost>
> > >   on Tue, 15 Apr 2008 20:32:16 +0900 (JST),
> > >   Takahiro Kambe <taca%back-street.net@localhost> wrote:
> > > > > Today, a NetBSD 4.0_STABLE machine panicked in ip_output() when
> > > > > forwarding an IPv4 multicast packet.  The packet was a short
> > > > > (36 octets) UDP/IP packet.
> > > ...
> > > > The kernel has DIAGNOSTIC option enabled and corresponding code
> > > > fragments in ip_output().
> > > > 
> > > > #ifdef  DIAGNOSTIC
> > > >         if ((m->m_flags & M_PKTHDR) == 0)
> > > >                 panic("ip_output: no HDR");
> > > > 
> > > >         if ((m->m_pkthdr.csum_flags & (M_CSUM_TCPv6|M_CSUM_UDPv6)) != 
> > > > 0) {
> > > >                 panic("ip_output: IPv6 checksum offload flags: %d",
> > > >                     m->m_pkthdr.csum_flags);
> > > >         }
> > > > 
> > > >         if ((m->m_pkthdr.csum_flags & (M_CSUM_TCPv4|M_CSUM_UDPv4)) ==
> > > >             (M_CSUM_TCPv4|M_CSUM_UDPv4)) {
> > > >                 panic("ip_output: conflicting checksum offload flags: 
> > > > %d",
> > > >                     m->m_pkthdr.csum_flags);
> > > >         }
> > > > #endif
> > > > 
> > > > > It seems that this diagnostic code checks that M_CSUM_TCPv4 and
> > > > > M_CSUM_UDPv4 are mutually exclusive.
> > > > I confirmed that bge(4) sets both M_CSUM_TCPv4 and M_CSUM_UDPv4 in
> > > > m->m_pkthdr.csum_flags for ordinary unicast IP packets.
> > > > 
> > > > I don't know whether this is a bug in bge(4) or whether the above
> > > > DIAGNOSTIC check is wrong or obsolete.
> > 
> > Don't know whether relevant, but a 4.99.60/i386 box with bge gave:
> > 
> > uvm_fault(0xcdfae574, 0, 1) -> 0xe
> > kernel: supervisor trap page fault, code=0
> > Stopped in pid 22172.1 (dhcpd) at       0xc03a6f25:     movl    0x14(%eax),%eax
> > db{1}> bt/l
> > m_length(0,0,cd985abc,c0377c4f,5) at 0xc03a6f25
> > bpf_mtap(c2d822c0,0,cd985aec,c03a8f5d,cd985a05) at netbsd:bpf_mtap+0x17
> > bge_start(c2da7004,178,9000003,3,0) at netbsd:bge_start+0x10c
> > ifq_enqueue(c2da7004,c3111300,c2da7004,2,cdfae574) at netbsd:ifq_enqueue+0x13f
> > ether_output(c2da7004,c3111300,c06077a0,0,c06077a0) at netbsd:ether_output+0x71e
> > bpf_write(cdc82300,cdc82300,cd985c60,d5bf99c0,1) at netbsd:bpf_write+0x126
> > do_filewritev(7,bfbfc668,3,cdc82300,1) at netbsd:do_filewritev+0x270
> > sys_writev(cdfac900,cd985d04,cd985cfc,cd985d10,c03d0d79) at netbsd:sys_writev+0x3f
> > syscall(cd985d48,b3,ab,bfbf001f,bfbf001f) at netbsd:syscall+0x141
> > 
> > yesterday...
> 
> It looks like IFQ_POLL()/IFQ_DEQUEUE() did not honor their contract.  In
> order to reach the bpf_mtap() statement, IFQ_POLL() had to return m_head
> != NULL.  According to altq(9), "It is guaranteed that IFQ_DEQUEUE()
> immediately after IFQ_POLL() returns the same packet."
> 
> Are you using ALTQ?

No altq - also this was a kernel from 21st April - I hope I didn't
hijack Takahiro's thread - I just noticed that both reports involve
bge.
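
For anyone following along, the poll-then-dequeue contract quoted from
altq(9) can be sketched in userland C roughly as below.  This is only a toy
model under stated assumptions: struct pkt, struct pktq, pktq_poll(),
pktq_dequeue(), toy_tap() and toy_start() are all hypothetical names made up
for illustration, not the kernel's IFQ_POLL()/IFQ_DEQUEUE() macros and not
the bge_start() code.  The point is just that the tap step should only ever
see the non-NULL packet that the preceding poll returned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an mbuf and an interface send queue. */
struct pkt {
	struct pkt *next;
	int len;
};

struct pktq {
	struct pkt *head;
	struct pkt *tail;
};

static int tapped_bytes;	/* what the toy "tap" has seen so far */

static void
pktq_enqueue(struct pktq *q, struct pkt *p)
{
	assert(p != NULL);
	p->next = NULL;
	if (q->tail != NULL)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
}

/* Peek at the next packet without removing it. */
static struct pkt *
pktq_poll(const struct pktq *q)
{
	return q->head;
}

/* Remove and return the next packet, or NULL if the queue is empty. */
static struct pkt *
pktq_dequeue(struct pktq *q)
{
	struct pkt *p = q->head;

	if (p != NULL) {
		q->head = p->next;
		if (q->head == NULL)
			q->tail = NULL;
		p->next = NULL;
	}
	return p;
}

/* Stand-in for a packet tap: it dereferences its argument. */
static void
toy_tap(const struct pkt *p)
{
	tapped_bytes += p->len;
}

/*
 * Toy start routine: poll, dequeue, verify both agree, then tap.
 * Returns the number of packets handled, or -1 if the dequeue did not
 * hand back the packet that the poll had just returned.
 */
static int
toy_start(struct pktq *q)
{
	int sent = 0;

	for (;;) {
		struct pkt *peek = pktq_poll(q);
		if (peek == NULL)
			break;		/* queue empty, nothing to tap */

		struct pkt *m = pktq_dequeue(q);
		if (m != peek)
			return -1;	/* contract broken; do not tap */

		toy_tap(m);
		sent++;
	}
	return sent;
}
```

With two packets of 36 and 64 octets queued, toy_start() handles both and
the tap sees 100 octets in total; on an empty queue it returns 0 without
touching the tap at all.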

Cheers,

Patrick

