Subject: Re: RFC: patch to change the disk I/O scheduler via dkctl(8)
To: None <tech-kern@netbsd.org>
From: Juan RP <juan@xtrarom.org>
List: tech-kern
Date: 09/25/2005 19:31:41
On Sun, 25 Sep 2005 12:30:43 -0400
christos@zoulas.com (Christos Zoulas) wrote:

> +#define	DKSCHEDNMAX	32		/* strategy max
> name */ +#define	DKSCHEDVMAX	16		/* max
> strategies */
> 
> probably no need to use a struct. you can just use a buffer.
> the get scheduler call will return strat:method
> and the set scheduler call can take [strat]:[method]. This way you can
> set only the strat or only the method. Or you can have 2 pairs of
> ioctls one for method and one for scheduler.
> The list schedulers ioctl takes and returns a
> struct disk_sched {
> 	size_t bufsiz;
> 	char *schedbuf;
> };
> which will return a comma separated list of schedulers; on return
> bufsiz will be set to the required size to satisfy the operation, so
> you can call it with bufsiz = 0 and schedbuf = NULL; to find out how
> much you need if you want.

Grr, the way you're suggesting here was implemented sometime
ago... and jmmv suggested me to use a vector instead of using
a buffer.

> +
> +	/* strategy */
> +	__link_set_foreach(bq_strat, bufq_strats) {
> +		if ((*bq_strat)->bs_name != NULL) {
> +			/* skip dummy strategy (subr_disk.c) */
> +			if ((*bq_strat)->bs_id == 0)
> +				continue;
> +			/* strategy found, let's copy it then */
> +			if ((*bq_strat)->bs_id == strat) {
> +				strlcpy(dks->dks_schedname,
> +			    	    (*bq_strat)->bs_name,
> +			    	    sizeof(dks->dks_schedname));
> +				break;
> 
> that check for default should be done before the link set foreach
> and maybe the link set walk should be skipped.

Do you mean this one:  if ((*bq_strat)->bs_id == 0) ?

I cannot check the value of the current strategy without running
__link_set_foreach(), or am I wrong?

> +	/* sort method */
> +	switch (method) {
> +	case BUFQ_SORT_RAWBLOCK:
> +		strlcpy(dks->dks_methodname, "sortblk",
> +		    sizeof(dks->dks_methodname));
> +		break;
> +	case BUFQ_SORT_CYLINDER:
> +		strlcpy(dks->dks_methodname, "sortcyl",
> +		    sizeof(dks->dks_methodname));
> +		break;
> +	default:
> 
> Is the default case really useful? when can it happen?

It's useful, because it needs to assign a default sort method.
Otherwise the kernel will panic().
 
> +int
> +disk_set_sched(void *ptr, struct disk_sched *dks)
> +{
> +	__link_set_decl(bufq_strats, const struct bufq_strat);
> +	const struct bufq_strat * const *bq_strat;
> +	struct buf *bp;
> +	struct bufq_state *bufq = ptr;	/* orig bufq */
> +	struct bufq_state tbufq;	/* temporal bufq */
> +	int s, bqflag;
> +	boolean_t strat = FALSE;
> +
> +	bqflag = 0;
> +	__link_set_foreach(bq_strat, bufq_strats) {
> +		if (strncmp(dks->dks_schedname, (*bq_strat)->bs_name,
> +		    sizeof(dks->dks_schedname)) == 0) {
> +			bqflag = (*bq_strat)->bs_id;
> +			strat = TRUE;
> +			break;
> +		}
> +	}
> +
> +	if (!strat)
> +		return EINVAL;
> +
> move that before the foreach, then change the |= below to =. Then you
> can change the assignment in the foreach loop to |=. Then you don't
> need to initialize bqflag to zero. the names sortblk and sortcyl
> should probably be in a static array and the array used instead, so
> that we don't repeat strings in the kernel (yes the linker merges
> them, but what if they were in different files?)

Ok.
 
> retrieving revision 1.311
> diff -b -u -r1.311 wd.c
> --- sys/dev/ata/wd.c	5 Sep 2005 22:55:31 -0000	1.311
> +++ sys/dev/ata/wd.c	22 Sep 2005 01:28:08 -0000
> @@ -73,6 +73,8 @@
>  #endif /* ATADEBUG */
>  
>  #include "rnd.h"
> +/* to change the disk I/O scheduler on the fly */
> +#include "opt_disk_scheduler_custom.h"
> 
> Why have this conditional?

I thought it might be useful for embedded devices with low memory,
I can disable it if you want.

I want to say all things I've fixed in my patch:

* First version: patch implemented passing bits from userland to kernel
and vice-versa.

Answer: yamt and cube suggested to make this passing strings.

* Second version: patch implemented passing strings from kernel to
userland and using a buffer for the strategies.

Answer: jmmv said that using a buffer for the strategies is not
scalable, and having hardcoded values is unoptimal.

* Third version: patch implemented passing strings and using a vector
for the strategies and fixed all hardcoded values from userland/kernel.

Answer: christos suggests a buffer for the strategies, using a comma
delimiter.

So what's next? I've implemented the patch three times with three
different ways...

I would like to know what's the FINAL way to implement it, because I've
spent a lot of time implementing all these things for nothing... 

Thanks.