Subject: Re: Changing the I/O scheduler on-the-fly
To: Juan RP <juan@xtrarom.org>
From: Julio M. Merino Vidal <jmmv84@gmail.com>
List: tech-kern
Date: 09/19/2005 13:36:03
On 9/19/05, Juan RP <juan@xtrarom.org> wrote:
> On Fri, 09 Sep 2005 12:51:04 +0900
> YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp> wrote:
>=20
> > as you are going to make them by-name,
> > you don't need to expose them to userland anymore, right?
>=20
> Yes, and I have new patch ready for review at
> http://www.xtrarom.org/disk_scheduler.diff
>=20
> I've changed my previous patch to pass strings instead of
> bits as you (and blymn & cube) suggested me in the past, also it is
> possible to specify the sort method, a new file was created to
> add the three functions I've added (sys/kern/subr_disk_sched.c).
>=20
> By default it will use BUFQ_SORT_RAWBLOCK if it's not specified (it will
> be the same than BUFQ_DISK_DEFAULT_STRAT()|BUFQ_SORT_RAWBLOCK).
>=20
> I haven't removed the ioctl to list the available schedulers built in
> the kernel, because in the future (as cube pointed out) we could
> specify add/remove strategies for some drivers.
>=20
> Is this ok? is there something wrong?

+=09=09if (argv[0] !=3D NULL)
+=09=09=09strcpy(dks.dks_schedname, argv[0]);

This can overflow the target string, causing a segfault in the program.
Use strlcpy setting a limit of sizeof(dks.dks_schedname).  There are
several other strcpy's in the code with the same problem.

+        =09if (ioctl(fd, DIOCSIOSCHED, &dks) =3D=3D -1)
+                =09err(1, "%s: scheduler", dvname);

Use EXIT_FAILURE as the error code.  (Same problem in other calls
to err.)

+=09switch (strat) {
+=09case BUFQ_FCFS:
+=09=09strcpy(dks->dks_schedname, "fcfs");
+=09=09break;
+=09case BUFQ_DISKSORT:
+=09=09strcpy(dks->dks_schedname, "disksort");
+=09=09break;
[...]

This is not scalable.  The kernel already has a 'bufq_strats' link
sets that holds all available link sets, changeable at build time.
Iterate over that link set (__link_set_foreach) to do what you need.
(See sys/bufq.h.)  Note: I don't know if that link set alone will be
enough for your patch, but if it is not, it should be changed or
a new one added.  Having big switches in the code with hardcoded
values in them goes against extensibility.

+=09if (strcmp(dks->dks_schedname, "fcfs") =3D=3D 0)
+=09=09bqflag =3D BUFQ_FCFS;

Same as above (avoid "big switches"), and also use strncmp.
We'd get false positives otherwise (based on incorrect user
input).

Now, I see that in disk_list_sched you already iterate over the
link set.  However, the switch inside it is wired.  Note that the items
in the link set already carry out the name of the item, which you can
reuse.

+=09char=09=09dks_schedname[32];=09/* disk scheduler name */
+=09char=09=09dks_methodname[32];=09/* sorting method name */

I'd define a constant holding the value 32 instead of using the value
directly.

And another thing: in the operation to list schedulers, DIOCLIOSCHED,
returning a single string with the whole list in it is suboptimal.  Instead=
,
change it to return a vector (if it was dynamically sized, that'd be
superb, but I'm not sure how you'd do it) so that the user can iterate
over it and print its items as he wishes.

Hope this helps,

--=20
Julio M. Merino Vidal <jmmv84@gmail.com>
http://www.livejournal.com/users/jmmv/
The NetBSD Project - http://www.NetBSD.org/