Subject: Re: Config ...
To: None <grefen@hprc.tandem.com>
From: Chris G. Demetriou <cgd@netbsd.org>
List: tech-kern
Date: 08/21/1998 08:56:53
Stefan Grefen <grefen@hprc.tandem.com> writes:
> > >     4) We don't have a 'device identification system' 
> > > 	    means reattaching a device generates a new one, and does not 
> > > 	    revive a deconfigured one		[so far only PCMCIA]
> > 
> > So, i'd say that this is Wrong.
> 
> This is broken :-) 
> 
> Two scenarios:
>     1) On my Libretto I've only one PCMCIA slot.
>        What I want to do is:
> 	Remove the Ethernet card,
> 	Insert the Flashdisk from electronic still camera
> 	Remove the Flashdisk
> 	Insert the Ethernet card
> 	Remove the Ethernet card
> 	Insert the Flashdisk from electronic still camera ..
> 
> 	etc.
> 
>     Today I would end up with smc99 and wd180, after some
>     iterations, which is completely bogus. With devices really gone 
>     I would have to redo all the network configuration every time.

Actually, if the detach code were finished, it'd end up
repeatedly attaching and detaching the same device numbers.  With
multiple slots, you could get into the situation where you always use
more and more unit numbers, but that is a bug, and easy to fix.
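
A toy model of that, for illustration only (Python, not kernel code; the
UnitAllocator class and its attach/detach methods are made-up names, and
"lowest free unit" is an assumed policy, not what the current code does):

```python
class UnitAllocator:
    """Toy model: hand out the lowest free unit number per driver."""

    def __init__(self):
        self.in_use = {}  # driver name -> set of attached unit numbers

    def attach(self, driver):
        units = self.in_use.setdefault(driver, set())
        unit = 0
        while unit in units:  # find the lowest number not in use
            unit += 1
        units.add(unit)
        return unit

    def detach(self, driver, unit):
        # Detached means gone: the unit number is free for reuse.
        self.in_use[driver].remove(unit)


# One slot, cards swapped back and forth three times.
alloc = UnitAllocator()
seen = []
for _ in range(3):
    eth = alloc.attach("smc")
    alloc.detach("smc", eth)
    disk = alloc.attach("wd")
    alloc.detach("wd", disk)
    seen.append((f"smc{eth}", f"wd{disk}"))
print(seen)
```

Every pass through the loop yields smc0 and wd0.  With two slots, a
detached unit number is likewise handed back out on the next attach
rather than being burned.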

Re: network configuration: "that's what a daemon to sense that you've
inserted a card and run a script based on that is for."  Also,
realistically, in the 'modern world' (all i really care about,
w/pcmcia), you want to be using DHCP on your network interfaces, to
configure them and find you a good IP address.
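
The dispatch half of such a daemon could be as small as a lookup table.
A sketch, assuming the kernel reports (event, device class, device name)
in some form; the script paths and the command_for name are hypothetical,
and nothing here is an existing interface:

```python
# Hypothetical script paths, for illustration only.
SCRIPTS = {
    ("attach", "network"): "/etc/hotplug/net-attach",  # e.g. start DHCP
    ("detach", "network"): "/etc/hotplug/net-detach",
    ("attach", "disk"): "/etc/hotplug/disk-attach",
}


def command_for(event, dev_class, dev_name):
    """Return the argv to run for this event, or None if nothing is set up."""
    script = SCRIPTS.get((event, dev_class))
    if script is None:
        return None
    return [script, dev_name]


print(command_for("attach", "network", "smc0"))
print(command_for("detach", "disk", "wd0"))
```

The point is only that the per-card policy lives in user-land scripts,
keyed on what was just attached or detached, not in the kernel.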


>   2) Scsi:
>     If you rescan the SCSI bus and the user has switched the scsi-id of
>     two devices what do you do? 
>     Assume it's two tape drives? 
>     I know you shouldn't do it, but Joe Blow will do it and scream if
>     his data goes south.

You detach the first instance, attach the second.

>     More common: you turn off one of your disks/tapedrives. 
>     Are you going to renumber the remaining ones?

Already-attached devices should stay attached, and not be detached
unless the kernel happens to sense that they've gone away.  (I.e. if
you don't tell the system to freeze the SCSI bus -- however you might
do that -- and disconnect stuff temporarily, it might go away.  But
that's your fault because you didn't freeze the SCSI bus.)


> The problem is that there may be device information kept at places where
> you don't expect it (May be implicitly in some daemon process). You can't  
> delete it all when the kernel wants to unconfigure the drive. (It would
> be a major effort I fear doing it on the kernel level only).
> I tried that a long time ago (for my old pcmcia code).

In the case of things like ethernet, you already have that problem
when, e.g. you want to switch networks.  The solution is _not_ to come
up with a horrible kludge, it's to _fix_ the programs so that they can
cope with a dynamic environment.

No, not an easy task.  Yes, the right solution.


> > If you detach a device, it should go away.  Completely gone.  Not
> > 'flagged inactive', not 'kinda-sorta there.' Gone.  detached.  no
> > longer a valid device.
> 
> But you can't delete all 'secondary' knowledge of the device. That gets nasty 
> if you do a bus-rescan.

Rescan shouldn't reassign unit numbers to children.  It should simply
say "are there any children here which aren't attached, and if so,
let's attach them."  There's no point in renumbering everything just
to handle a new device.
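
As a sketch of that rule (a toy Python model, not kernel code; rescan,
the locator keys and the driver_for callback are names invented here):

```python
def rescan(present, attached, driver_for):
    """Attach any child found on the bus that is not attached yet.

    present:    dict locator -> description of the hardware seen there
    attached:   dict locator -> (driver, unit); updated in place
    driver_for: callable mapping a description to a driver name or None
    Returns the list of locators that were newly attached.
    """
    new = []
    for locator in sorted(present):
        if locator in attached:
            continue  # already attached: leave its unit number alone
        driver = driver_for(present[locator])
        if driver is None:
            continue  # no driver matches; it simply stays unattached
        used = {u for d, u in attached.values() if d == driver}
        unit = 0
        while unit in used:  # lowest unit not taken by this driver
            unit += 1
        attached[locator] = (driver, unit)
        new.append(locator)
    return new


# Two disks already attached at targets 3 and 5; a third shows up at 4.
attached = {3: ("sd", 0), 5: ("sd", 1)}
present = {3: "disk", 4: "disk", 5: "disk"}
new = rescan(present, attached, {"disk": "sd", "tape": "st"}.get)
print(new, attached)
```

The new disk becomes sd2; sd0 and sd1 are never touched.  Note that this
sketch never detaches anything: that only happens when the kernel senses
a device has gone away, which is a separate path.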

> The kernel calls a function to take down the device, calls a bus specific
> function to reclaim io-space/interrupts.
> Then switches all entry points to error returns.
> This can be hidden in specfs and the generic network interface code.

... and any other place that might need access to the device, of
course.  tty layer, if the device is open, etc.  I.e. you kludge a lot
of things in a lot of places, to avoid the complexity of doing detach
right.


> There should be an option where the user can say this device is really
> gone.

And the user who doesn't know about this and who goes from card to
card to card (which should work fine) gets stuck with an increasing
number of devices and kernel bloat?  "no thank you."

The logical behaviour of devices is that when you detach them, they
are gone.  As far as your computer's concerned, when you've taken your
ethernet out, you may well have put it straight into a shredder.  As
far as I can tell, as far as a naive user is concerned, that is true
as well.  They just want whatever card they happen to stick in to work
(via dynamic user-land adjustment of configuration parameters,
e.g. start up dhcp on the network interface) and when they take it out
it's _gone_.



> > it's cleaner if you just _remove_ the device from the system,
> > i.e. unconfigure it.  That has several advantages:
> > 
> > 	* all of the existing semantics around "is a device there?"
> > 	  are preserved.
> > 
> > 	* it allows for dynamic _unloading_ of device drivers, etc.
> > 	  For instance, say you pop in a SCSI pcmcia card.  The 'right
> > 	  thing' is to have it configure itself, loading a driver if
> > 	  necessary, function for a while, then unconfigure itself when
> > 	  you pop it out and unload the driver.
> 
> I agree, but 
> 	1) LKM is not in a state to really allow that (only for the basic
> 	    driver)

Uh, bug, not feature.

> 	2) If we go down that path, we should introduce 'virtual devices'
> 	    like eth[0-n]  etc. in which case the real hardware id of
> 	    the device is never exported to a real 'user'-process.

Why?  I would argue that any user-land process that wants to talk to a
hardware interface, if properly written (if there are APIs available)
_should_ be able to cope in some sane way with the interface going
away.

Most aren't properly written, sure.  So, have your dynamic-event daemon
kill them and restart them if you want that behaviour, or...

What do virtual devices buy you?


> > 	* method for direct-config bus device drivers to say to a daemon
> > 	  "I have this device here, that i've not a clue about.  What can
> > 	  you do for me."
> 
> I would make that passive, e.g. have a process ask which devices are in 
> which state (up-and-running, unprobed, probe-failed). The daemon would
> come in too late anyway.

"syslog starts up relatively late in the game, but it manages to get
kernel messages anyway..."

It's not as if there's much state to worry about anyway.  i mean, the
way I see this working is:

	configuration happens, some devices maybe don't get matched.

	daemon says "rescan."

	bus code passes back the information "I have devices X, Y, and
	Z which I can't cope with, here's information about them, what
	can you do for me?"

	daemon loads some more kernel drivers

	daemon says "rescan."

etc.  In my world, there's no "sort-of attached," there's only "is" or
"isn't."  "Isn't" only happens because:

	* the kernel wasn't told to attach that type of device there
	  ("locator-related" issues)
	* the kernel has no driver for that type of device.
	* the device was just detached from the device tree (i.e. by
	  user command).
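
The rescan loop described above, as a toy model (Python, purely
illustrative; kernel_rescan, daemon and the "loadable" set are invented
names standing in for whatever interface and driver modules would
really exist):

```python
def kernel_rescan(hardware, drivers, attached):
    """Attach what the loaded drivers match; return the rest, described."""
    unmatched = {}
    for locator, kind in hardware.items():
        if locator in attached:
            continue  # "is": nothing to do
        if kind in drivers:
            attached.add(locator)
        else:
            unmatched[locator] = kind  # "isn't": tell the daemon about it
    return unmatched


def daemon(hardware, drivers, loadable):
    """Rescan, load drivers for what came back unmatched, rescan again."""
    attached = set()
    while True:
        unmatched = kernel_rescan(hardware, drivers, attached)
        wanted = set(unmatched.values()) & loadable
        if not wanted:
            return attached, unmatched  # nothing more the daemon can do
        drivers |= wanted  # stands in for loading kernel drivers


hardware = {"slot0": "ether", "slot1": "scsi", "slot2": "mystery"}
attached, left = daemon(hardware, {"ether"}, {"scsi", "audio"})
print(sorted(attached), left)
```

Here slot0 attaches on the first pass, slot1 after the daemon loads a
driver, and slot2 stays unattached because no driver exists for it.
There is no third state anywhere in the model.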



The biggest sticking point in my mind is the issue of "how do you
decide that a device is really gone," and what meaning does "hardware
not there but still attached" have.

As noted, I think it's ... nonobvious to have an attached device stick
around after the hardware is removed.  "If somebody borrows your
blender, do you queue jobs for your blender?  Do you try to wash it?"
8-) It also leads to the situation where a user "just doing the naive
thing" (in an abnormal, but not too weird way) can get into a bad
situation and have to dig through the manuals or reboot to find out
what's going on.

The "still attached" issue is more irksome.  You can't assume that
settings can be kept over hardware removal; the hardware may not allow
that.  (e.g. power on the tape drive, it rewinds the tape, your state
about that tape is now hosed, and the kernel can't know that.)  And
there are other pathological cases, where e.g. you're plugging a
PCMCIA card into a DOS box because you want to change some EEPROM
setting that you can't change from NetBSD.  It's really the same card
that you took out, but it may be detected differently.  You'll want
your existing dynamic configuration software solution (e.g. network
restart scripts) to cope with it, but it might be attached as a
different interface, etc.

In a nutshell, keeping devices "still attached" when hardware has been
removed is non-intuitive, and it adds a bunch of Weird (wrong) semantics.


cgd
-- 
Chris Demetriou - cgd@netbsd.org - http://www.netbsd.org/People/Pages/cgd.html
Disclaimer: Not speaking for NetBSD, just expressing my own opinion.
Plug: Get your official NetBSD-1.3.2 CDROM set today! http://www.netbsd.com/