Subject: Re: RFC: patch to change the disk I/O scheduler via dkctl(8)
To: None <tech-kern@netbsd.org>
From: Juan RP <juan@xtrarom.org>
List: tech-kern
Date: 09/25/2005 19:31:41
On Sun, 25 Sep 2005 12:30:43 -0400
christos@zoulas.com (Christos Zoulas) wrote:
> +#define DKSCHEDNMAX 32 /* strategy max
> name */ +#define DKSCHEDVMAX 16 /* max
> strategies */
>
> probably no need to use a struct. you can just use a buffer.
> the get scheduler call will return strat:method
> and the set scheduler call can take [strat]:[method]. This way you can
> set only the strat or only the method. Or you can have 2 pairs of
> ioctls one for method and one for scheduler.
> The list schedulers ioctl takes and returns a
> struct disk_sched {
> size_t bufsiz;
> char *schedbuf;
> };
> which will return a comma separated list of schedulers; on return
> bufsiz will be set to the required size to satisfy the operation, so
> you can call it with bufsiz = 0 and schedbuf = NULL; to find out how
> much you need if you want.
Grr, the way you're suggesting here was implemented sometime
ago... and jmmv suggested me to use a vector instead of using
a buffer.
> +
> + /* strategy */
> + __link_set_foreach(bq_strat, bufq_strats) {
> + if ((*bq_strat)->bs_name != NULL) {
> + /* skip dummy strategy (subr_disk.c) */
> + if ((*bq_strat)->bs_id == 0)
> + continue;
> + /* strategy found, let's copy it then */
> + if ((*bq_strat)->bs_id == strat) {
> + strlcpy(dks->dks_schedname,
> + (*bq_strat)->bs_name,
> + sizeof(dks->dks_schedname));
> + break;
>
> that check for default should be done before the link set foreach
> and maybe the link set walk should be skipped.
Do you mean this one: if ((*bq_strat)->bs_id == 0) ?
I cannot check the value of the current strategy without running
__link_set_foreach(), or am I wrong?
> + /* sort method */
> + switch (method) {
> + case BUFQ_SORT_RAWBLOCK:
> + strlcpy(dks->dks_methodname, "sortblk",
> + sizeof(dks->dks_methodname));
> + break;
> + case BUFQ_SORT_CYLINDER:
> + strlcpy(dks->dks_methodname, "sortcyl",
> + sizeof(dks->dks_methodname));
> + break;
> + default:
>
> Is the default case really useful? when can it happen?
It's useful, because it needs to assign a default sort method.
Otherwise the kernel will panic().
> +int
> +disk_set_sched(void *ptr, struct disk_sched *dks)
> +{
> + __link_set_decl(bufq_strats, const struct bufq_strat);
> + const struct bufq_strat * const *bq_strat;
> + struct buf *bp;
> + struct bufq_state *bufq = ptr; /* orig bufq */
> + struct bufq_state tbufq; /* temporal bufq */
> + int s, bqflag;
> + boolean_t strat = FALSE;
> +
> + bqflag = 0;
> + __link_set_foreach(bq_strat, bufq_strats) {
> + if (strncmp(dks->dks_schedname, (*bq_strat)->bs_name,
> + sizeof(dks->dks_schedname)) == 0) {
> + bqflag = (*bq_strat)->bs_id;
> + strat = TRUE;
> + break;
> + }
> + }
> +
> + if (!strat)
> + return EINVAL;
> +
> move that before the foreach, then change the |= below to =. Then you
> can change the assignment in the foreach loop to |=. Then you don't
> need to initialize bqflag to zero. the names sortblk and sortcyl
> should probably be in a static array and the array used instead, so
> that we don't repeat strings in the kernel (yes the linker merges
> them, but what if they were in different files?)
Ok.
> retrieving revision 1.311
> diff -b -u -r1.311 wd.c
> --- sys/dev/ata/wd.c 5 Sep 2005 22:55:31 -0000 1.311
> +++ sys/dev/ata/wd.c 22 Sep 2005 01:28:08 -0000
> @@ -73,6 +73,8 @@
> #endif /* ATADEBUG */
>
> #include "rnd.h"
> +/* to change the disk I/O scheduler on the fly */
> +#include "opt_disk_scheduler_custom.h"
>
> Why have this conditional?
I thought it might be useful for embedded devices with low memory,
I can disable it if you want.
I want to say all things I've fixed in my patch:
* First version: patch implemented passing bits from userland to kernel
and vice-versa.
Answer: yamt and cube suggested to make this passing strings.
* Second version: patch implemented passing strings from kernel to
userland and using a buffer for the strategies.
Answer: jmmv said that using a buffer for the strategies is not
scalable, and having hardcoded values is unoptimal.
* Third version: patch implemented passing strings and using a vector
for the strategies and fixed all hardcoded values from userland/kernel.
Answer: christos suggests a buffer for the strategies, using a comma
delimiter.
So what's next? I've implemented the patch three times with three
different ways...
I would like to know what's the FINAL way to implement it, because I've
spent a lot of time implementing all these things for nothing...
Thanks.