Subject: Re: kern/22002: panic: double tcp_freeq() may happen - TAILQ_*
To: None <enami@but-b.or.jp>
From: Havard Eidnes <he@netbsd.org>
List: netbsd-bugs
Date: 06/28/2003 01:01:34
> > However, contrary to what tcp_freeq()
> > expect, it appears that the segq and timeq queues are of different
> > length:
>
> Depending on where the pool_do_put + 0x35e is, the page which this tp
> was allocated from may be allready back to system and reuse for
> another use.
Hm, yes. pool_do_put+0x35e is pretty near the end, at:
(gdb) list *pool_do_put+0x35e
0xc016c00a is in pool_do_put (/sys/arch/i386/compile/BLACKHOLE/../../../../kern/subr_pool.c:991).
986 * the list, and make it the current page. The next allocation
987 * will get the item from this page, instead of further fragmenting
988 * the pool.
989 */
990 else if (ph->ph_nmissing == (pp->pr_itemsperpage - 1)) {
991 TAILQ_REMOVE(&pp->pr_pagelist, ph, ph_pagelist);
992 TAILQ_INSERT_HEAD(&pp->pr_pagelist, ph, ph_pagelist);
993 pp->pr_curpage = ph;
994 }
995 }
(gdb) i line subr_pool.c:991
Line 991 of "/sys/arch/i386/compile/BLACKHOLE/../../../../kern/subr_pool.c"
starts at address 0xc016c008 <pool_do_put+860>
and ends at 0xc016c067 <pool_do_put+955>.
(gdb) p 0x35e
$5 = 862
(gdb)
> Is the rest of *tp looks valid?
I think it doesn't look like garbage, anyway:
(gdb) up 10
#10 0xc01bf150 in tcp_freeq (tp=0xc089f164)
at /sys/arch/i386/compile/BLACKHOLE/../../../../netinet/tcp_subr.c:1181
1181 TAILQ_REMOVE(&tp->segq, qe, ipqe_q);
(gdb) p tp
$1 = (struct tcpcb *) 0xc089f164
(gdb) p *$
$2 = {t_family = 0, segq = {tqh_first = 0xc0867b04, tqh_last = 0xc089f168},
t_timer = {{c_link = {tqe_next = 0x1, tqe_prev = 0x1}, c_time = 853258484,
c_arg = 0xc089f164, c_func = 0, c_flags = 0}, {c_link = {tqe_next = 0x0,
tqe_prev = 0x0}, c_time = 0, c_arg = 0x0, c_func = 0, c_flags = 0}, {
c_link = {tqe_next = 0x1, tqe_prev = 0x1}, c_time = 853978385,
c_arg = 0xc089f164, c_func = 0, c_flags = 0}, {c_link = {tqe_next = 0x0,
tqe_prev = 0x0}, c_time = 0, c_arg = 0x0, c_func = 0, c_flags = 0}},
t_state = 8, t_rxtshift = 0, t_rxtcur = 2, t_dupacks = 0, t_peermss = 1460,
t_ourmss = 1460, t_segsz = 512, t_force = 0 '\000', t_flags = 16384,
t_template = 0x0, t_inpcb = 0xc0b796c8, t_in6pcb = 0x0, t_delack_ch = {
c_link = {tqe_next = 0x1, tqe_prev = 0x1}, c_time = 853258404,
c_arg = 0xc089f164, c_func = 0, c_flags = 0}, snd_una = 2911700198,
snd_nxt = 2911700198, snd_up = 2911700197, snd_wl1 = 414097017,
snd_wl2 = 2911700197, iss = 2911700161, snd_wnd = 17485,
snd_recover = 2911700198, rcv_wnd = 17520, rcv_nxt = 414097018,
rcv_up = 414097017, irs = 414096871, rcv_adv = 414114538,
snd_max = 2911700198, snd_cwnd = 2048, snd_ssthresh = 376539136,
t_rcvtime = 17065168, t_rtttime = 0, t_rtseq = 2911700197, t_srtt = 0,
t_rttvar = 0, t_rttmin = 2, max_sndwnd = 17520, t_oobflags = 0 '\000',
t_iobc = 0 '\000', t_softerror = 0, snd_scale = 0 '\000',
rcv_scale = 0 '\000', request_r_scale = 0 '\000',
requested_s_scale = 0 '\000', ts_recent = 0, ts_recent_age = 0,
ts_timebase = 17065168, last_ack_sent = 414097018, timeq = {tqh_first = 0x0,
tqh_last = 0xc089f294}, t_sc = {lh_first = 0x0}}
(gdb)
> > (gdb) up 10
> > #10 0xc01bf150 in tcp_freeq (tp=0xc089f164)
> > at /sys/arch/i386/compile/BLACKHOLE/../../../../netinet/tcp_subr.c:1181
> > 1181 TAILQ_REMOVE(&tp->segq, qe, ipqe_q);
> > (gdb) p tp
> > $1 = (struct tcpcb *) 0xc089f164
> > (gdb) p tp->segq
> > $2 = {tqh_first = 0xc0867b04, tqh_last = 0xc089f168}
>
> tp->segq is already inconsitient, since tqh_last points
> &tp->segq.tqh_first and this means this tailq is empty but tqh_first
> disagrees.
Yep.
> > (gdb) p tp->timeq
> > $6 = {tqh_first = 0x0, tqh_last = 0xc089f294}
>
> This tqh_last should be &tp->timeq.tqh_first, but is it so? For usual
> sizeof(struct tcpcb), the value is beyond the struct.
No, I don't quite think so -- I think this one is consistent, but I
included too little information to show that:
(gdb) p tp->timeq
$3 = {tqh_first = 0x0, tqh_last = 0xc089f294}
(gdb) p &$.tqh_first
$4 = (struct ipqent **) 0xc089f294
(gdb)
which seems quite normal for a tailq.
> > > >Fix:
> > > Explicitly mark queues as empty when they have been released
> > > in tcp_freeq()?
>
> Mark inp_ppcb NULL a bit earlier so that tcp_drain won't pick up this?
Yah, that sounds like a reasonable plan. I guess the problem is that
the storage is released before all possible references to the storage
are gone.
Regards,
- Havard