Subject: Versioning Structs Considered Harmful.
To: None <tech-kern@netbsd.org>
From: None <toddpw@toddpw.org>
List: tech-kern
Date: 04/18/2004 07:20:55
I just filed PR kern/25226.

This PR is caused by a fundamental pitfall: if you have a syscall that
takes a pointer to a struct, and you then add fields into the struct, you
are probably setting yourself up for a horrible COMPAT problem later on.

Right now we have a situation where a 1.6.2 mount_msdos running on a 2.0
kernel will get screwy results, because the kernel expects 'gmtoff' where
mount_msdos passes 'flags', and the kernel expects 'flags' where mount_msdos
passes garbage. Worse, if the garbage flags have VERSIONED set, then the
kernel looks at more garbage to set the directory permissions of the mount.

This is a fundamental screw and requires a (relatively low-impact) flag day.
Developers should think VERY CAREFULLY before adding fields to syscall args
structures. It is better to deal with the pain of adding a length argument
and letting that stand in as the version of the structure, than to go this
route and set us up for absolute COMPAT lossage months or years later.


The patch attached to the PR is one totally safe solution, but it has the
obnoxious property that it calls copyin() three times for the same struct.

A better solution to this problem can be seen with an interface like bind(2)
which passes in an explicit length as another syscall argument. This allows
the kernel to perform a single read and if it faults, the user gets EFAULT.

In contrast, the "backwards compatible" versioning scheme introduced by
sys/fs/msdosfs/msdosfsmount.h version 1.3 requires a lot of care to get
right. I added extra braces below to illustrate how the struct grows;
also, the bogus change in -r1.4 has been cleaned up:

struct msdosfs_args {
        char    *fspec;         /* blocks special holding the fs to mount */
        struct  export_args export;     /* network export information */
        uid_t   uid;            /* uid that owns msdosfs files */
        gid_t   gid;            /* gid that owns msdosfs files */
        mode_t  mask;           /* mask to be applied for msdosfs perms */
        int     flags;          /* one flag indicates versioned structure */
}; // original
        int     version;        /* version of the struct */
        mode_t  dirmask;        /* v2: mask to be applied for msdosfs perms */
}; // v2
        int     gmtoff;         /* v3: offset from UTC in seconds */
}; // v3

To avoid reading off the end of the user's struct, you first read the original
struct to get 'flags', then if the VERSIONED flag is set you read the next bit
to get 'version', then you know how big the struct is. However, if the version
number is bumped by one, then you must either read the largest possible struct
(and risk bogus EFAULT returns) or consult a lookup table to figure out which
version of the struct is which true length.

Avoiding the lookup table pretty much requires that 'version' be defined as
the sizeof() the struct. This also has an advantage, that the defaulting code
is very regular and uses symbolic offsetof()/sizeof() expressions instead of
hardcoded numbers. (An example of this is also in the PR's patch.)

-- 
Todd Whitesel
toddpw @ toddpw.org