Subject: IPv4 Packet Processing in NetBSD (was Re: pf for NetBSD)
To: None <dyoung@pobox.com>
From: Joel Wilsson <joelw@unix.se>
List: tech-kern
Date: 11/08/2002 18:15:44
--OXfL5xGRrasGEqWY
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Nov 07, 2002 at 03:17:13PM -0600, David Young wrote:
> 
> On Thu, Nov 07, 2002 at 08:53:39PM +0100, Jochen Kunz wrote:
> > On 2002.11.07 17:47 Joel Wilsson wrote:
> > 
> > > Heh, not really. I do have a good idea about how an IP packet goes
> > > through the kernel, how timeouts, fragmentation, and a few other 
> > > things work, which I didn't know before.
> > What about writing a documentation about that? ;-)
> > "IP Packet Processing in NetBSD" 
> 
>   I would love to read such a document.

Okay, you can read it, but only if you all point out the errors
I'm sure exist in this first draft ;)

It's a bit short, I think, so if you could point out anything you think
is missing that'd be great.


--OXfL5xGRrasGEqWY
Content-Type: application/x-troff-ms
Content-Disposition: attachment; filename="ipinput.ms"
Content-Transfer-Encoding: 7bit

.TL
IPv4 Packet Processing in NetBSD

.AU
Joel Wilsson

.AB
While extensive documentation is available for the 4.4BSD-Lite kernel's
IP stack, it is not freely available and not completely accurate for the
NetBSD code base. The basic design is the same, but the code has changed
a bit over the last 8 years. This document is a brief introduction to
how IP version 4 packets are processed by the NetBSD kernel.
.AE

.2C
.NH
A packet for you (maybe).
.PP
We will limit the discussion to Ethernet, IP version 4 and UDP.
When a packet arrives, the Ethernet network interface card (NIC) raises
an interrupt, and the NIC's driver is the first part of the kernel to
see the packet.
When the kernel has time, hopefully very soon, it calls the driver's
interrupt handler routine. The interrupt handler copies the packet into
an mbuf, a special data structure used a lot by the networking
subsystem. Mbufs are quite complicated and have many uses, but for now
you can think of an mbuf as representing a single packet, since we
assume the packet is small enough to fit in a single mbuf.

The first few bytes of the packet are the basic Ethernet header, which
is defined as struct ether_header in sys/net/if_ether.h.
.TS
center expand ;
c s
c c
| l | r | .
Ethernet header
Field	Size
_
Destination	6 bytes
_
Source	6 bytes
_
Type	2 bytes
_
.TE

The destination address is usually the receiving NIC's MAC (media access
control) address. All NICs have a unique MAC address, although it can be
changed.
When the interrupt handler has copied the packet into an mbuf, it
calls the function ether_input. Unless the kernel has been configured to
support bridging, the packet is dropped if the destination address
doesn't match the NIC's address and is not a broadcast or multicast
address. Then, after saving the type
field, the Ethernet header is stripped off. What happens next
depends on the value of that field: it tells us whether the packet is an
IP version 4 packet or something else. Other supported protocols include
AppleTalk, IPX and IP version 6.
If the packet is of an unknown or unsupported type, it is dropped.
In our case, it's an IPv4 packet, so ether_input puts it on the IPv4
packet queue and schedules a software interrupt.

.NH
Entering the IP stack.
.PP
Software interrupts are different from normal interrupts in that they
are not generated by a physical event, like the completion of a disk
sector read, but they are handled in almost the same way by the kernel.
So eventually, when the kernel isn't busy with hardware interrupts, it
will
call the interrupt handler that ether_input decided should take care of
the packet. For IPv4, that's ipintr. All ipintr does is take packets
from the IPv4 packet queue and pass them on to ip_input. This is where
the real fun begins. First of all, ip_input makes a whole lot of checks
to see if the packet is valid, most of which have something to do with
the IP header, which is defined as struct ip in sys/netinet/ip.h and
looks like this:
.bp
.TS
center expand ;
c s
c c
| l | r | .
IP header
Field	Size
_
IP version	4 bits
_
Header length	4 bits
_
Type of Service	1 byte
_
Total length	2 bytes
_
Identification	2 bytes
_
Flags and fragment offset	2 bytes
_
Time To Live	1 byte
_
Protocol	1 byte
_
Header checksum	2 bytes
_
Source address	4 bytes
_
Destination address	4 bytes
_
.TE

If no IP addresses have been configured, the packet can't be for us, so
it's dropped. If it's smaller than the minimum IP packet size, it's
dropped. If it has a multicast or otherwise reserved source address, a
bad header checksum, the wrong IP version, or isn't allowed by the
packet filter, it's
dropped. If it passes all those tests, it is either forwarded to another
host (if the kernel has been configured for router functionality), or it
is destined for this host. IP packets can be fragmented, so ip_input
checks if this packet is a fragment of a bigger packet. If it is, we
look in the queue of fragments to see if we have all the other fragments
of that packet, and try to reassemble the fragments into the original
packet if we do.
Otherwise, the fragment is put on the fragment queue, where it
will stay until it either times out or all other fragments arrive, and
ip_input returns.
Still here? Then ip_input uses the protocol field in the IP header to
pass the packet up through the stack to the next level. The packet we
received was a UDP packet, so ip_input calls udp_input.

.NH
UDP processing
.PP
udp_input looks at the UDP header, which is called struct udphdr. The
definition can be found in sys/netinet/udp.h, but this is how it looks:
.TS
center expand ;
c s
c c
| l | r | .
UDP header
Field	Size
_
Source port	2 bytes
_
Destination port	2 bytes
_
UDP length	2 bytes
_
UDP checksum	2 bytes
_
.TE

You might recall that the IP header also has length and checksum fields
and wonder why UDP has them too. The checksum here is
different from the IP checksum, because it covers the whole UDP
datagram, while the IP checksum covers only the IP header. The length is
included because there could be non-UDP data in the IP packet after this
datagram.
After making sure the UDP length and checksum fields are valid,
udp_input uses the IP header's source and destination address fields
together with the source and destination port fields in the UDP header
to create source and
destination socket address structures. It then calls udp4_realinput
with the socket addresses, the mbuf, and the header length as arguments.
udp4_realinput has an easy job: it just looks up which sockets are
listening on the destination address and port and then calls
udp4_sendup. A socket is the data
structure that represents a process's binding to a particular port.
If the packet's destination is a multicast address, there can be more
than one listener, but that's usually not the case. udp4_sendup is
another very simple function; it copies the mbuf, puts the copy on the
socket's receive queue, and notifies the socket that new data has
arrived. If the process has used ioctl to indicate it wants to do
asynchronous I/O on the socket, the kernel will send it a SIGIO signal.
The packet can finally be read from userland.
--OXfL5xGRrasGEqWY--