Subject: Re: CVS commit: src/sys/kern
To: Mindaugas R. <rmind@NetBSD.org>
From: Elad Efrat <elad@bsd.org.il>
List: source-changes
Date: 04/04/2007 16:28:11
Mindaugas R. wrote:
> Elad Efrat <elad@bsd.org.il> wrote:
>> therefore, I kindly ask you again to revert.
> I would certainly revert if at least one more people would ask for that (I
> re-asked them to take a look at this). But I still have no objections on this.

the fact of the matter is that you modified code you are not familiar
with and don't understand without compile testing it and/or getting
your changes reviewed. THIS is the real issue here.

I do not expect netbsd to make proper judgment here. like I said, it
took me several days to convince an "old timer" that the kernfs KASSERT
was wrong. it took me *two weeks* to make core@ understand why the
kauth_impl.h change is devastating. I'm afraid I'm not going to put much
more effort into this case than I already have. if the powers that be in
this project believe this is how development should proceed, with people
completely ignoring any kind of commit guidelines and quality assurance,
so be it.

>> later on, we lookup the table itself. according to your change, if it
>> doesn't exist, we return ENOENT. this is *after* a file_free() call that
>> (lacking return value :) succeeded in removing a file from the table.
> Heh, so rather let to dereference NULL pointer?

can you, for the record, list the chain of events that may lead to a
NULL dereference here?

> Yes, in a normal case, instead of this check, there could be simply KASSERT,

a KASSERT is closer to what's needed there, I agree.

> but in this case, when there is a lack of locking (this code is not MP
> safe,by the way) 

you are telling me it's not MP safe after I showed you there's an open
pr for it from over 3 months ago and told you I waited for ad@ to commit
the newlock2 locking primitives? are you for real?

> - I would really rather put the checks in such places. 

then it should be pointed out again that *you* are not in place to make
such calls. you are a NEW developers, you are NOT FAMILIAR with the code
you changed, you DID NOT COMPILE TEST before committing, and you DID NOT
GET YOUR CHANGES REVIEWED either. now you are talking to the person who
*designed and wrote* this subsystem and I'm telling you that what you
did is wrong. instead of acknowledging that your change is *WRONG* no
matter how you look at it -- I even showed you what the *right* way of
doing it would be, if you really wanted to, because right now you are
returning ENOENT -- you continue arguing.

> To be
> honest, as long I look at the code, I would tend to agree with Jared to mark
> fileassoc as experimental, while somebody fixes/finishes it.

this is way over your head. I'll explain.

the reason fileassoc lacked locking primitives is that I waited for
ad@ to commit and stablize newlock2. I *discussed* this pr with ad@
and yamt@, and, as a matter of fact, *prior* to newlock2, in a biglock
kernel, it is impossible to trigger the NULL deref.

the idea was, once newlock2 is in, to add locking primitives to
fileassoc. however, because of problems just like this one, of people
in this project messing up code they're not familiar with, I decided
it is best for me to wait until there are people who actually know what
they're doing in charge. this still didn't happen, so newlock2 was
merged and fileassoc remained without any locking. so *now* it *is* a
problem. fileassoc is not "experimental": it is probably more thoroughly
tested than a huge portion of the tree.

to trigger the NULL dereference you'd have to:

  1. delete a file that has an entry in fileassoc
  [ time your actions right, and ]
  2. as root, unmount the filesystem with the file you deleted in #1

I'm not too familiar with vfs -- perhaps you are -- so I'm not 100% sure
if the unmount will succeed without '-f' if sys_unlink() didn't return
yet, but you get the idea: the NULL deref in the kernel can only be
triggered by a malicious root user. since netbsd did not (yet) implement
the strict notion of user->root->kernel privilege transition in the
bsd44 secmodel, I'm going to argue that this bug is a non-issue, and
that it is far from critical.

bottom line, ONCE AGAIN, please revert.

> Anyway, your primary claim that "it is wrong" is not true.

sure it is. you are returning ENOENT after successfully removing an
entry for the file from the tables because the table vanished in the
middle of the process. basically you should return 0, because there's
no entry count to lower. :)

-e.