Subject: PR 36963
To: None <tech-kern@netbsd.org>
From: Jan Danielsson <jan.m.danielsson@gmail.com>
List: tech-kern
Date: 09/19/2007 02:39:02
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hello all,

   For background, read:
http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=36963

   I have "confirmed" (reproduced many times, with 100% success rate)
that the bug does disappear if I log in with my user "jan", and log out
again. I even got postfix running that way(!).

   To summarize:
   - When I start my system, postfix, postgresql, apache, etc. won't run.
postfix complains about problems with the permissions on some directory
(I have verified that the permissions are in fact ok). postgres just
hangs if a database has been initialized. If none exists, it tries to
create it, but fails due to permission problems. apache doesn't complain,
but it doesn't start.
   - If I log in and out with my user "jan", which belongs to the "wheel"
group, I can start postfix, postgresql and apache without any problems.
Once they are up and running, they seem to be fine. But the fix is only
temporary. It takes only a few minutes for the permission problem to
reassert itself.
   - The bug is global. When it kicks in, it appears to affect all
users, except my user "jan" (which belongs to the group wheel) and root.
   - If I can start postgresql, and I wait a few minutes until the "bug"
gets "activated", then I cannot start postfix, and vice versa.
   - I wrote a simple program which just calls opendir("."). It fails
for all users I have tried (except "root" and "jan") when the "bug" is
"active". The opendir(".") succeeds for all users when the "bug" is
"inactive" (I can disable its effect temporarily by doing the
login/logout trick as mentioned above).
   - I have tried to boot on a netbsd-4 (as of yesterday) kernel, but
the problem persists.
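
   For reference, a minimal sketch of what such an opendir(".") test
could look like. This is not the original program: the helper name
try_opendir() and the diagnostic output are assumptions made for
illustration. It returns 0 on success or the errno value on failure,
and prints the calling uid/euid/gid so that runs as different users can
be compared.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the original test program): try to open
 * `path` as a directory.  Returns 0 on success, or the errno value
 * reported by opendir() on failure.
 */
int
try_opendir(const char *path)
{
	DIR *dir;
	int saved;

	assert(path != NULL);

	dir = opendir(path);
	if (dir == NULL) {
		/* Save errno before stdio has a chance to change it. */
		saved = errno;
		fprintf(stderr,
		    "opendir(\"%s\") failed (uid=%ld euid=%ld gid=%ld): %s\n",
		    path, (long)getuid(), (long)geteuid(), (long)getgid(),
		    strerror(saved));
		return saved;
	}

	closedir(dir);
	return 0;
}
```

   Calling try_opendir(".") from a small main() as each affected user,
once while the "bug" is "active" and once after the login/logout trick,
would show which errno (for example EACCES) the failing case returns.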

   What I may try: I may add postgresql and/or postfix to the wheel
group and see if that changes anything. But I doubt it'll make any
difference.

   I just want to know if this problem makes sense to anyone? It seems
that no one is able to reproduce it. And I've noticed quite a lot of
skepticism when I've described the problem; so I assume it's so weird
that very few actually believe me. :-/

   I would just like to check if anyone has any interest in this
problem. I've built a kernel with the built-in kernel debugger. If the
bug appears with that too, I'll be able to track it down, if I just
spend enough time with it. However, it took me many days to learn the
ins and outs of the OS/2 kernel with its kernel debugger, and I fear
it'll take just as long with the NetBSD kernel. So *any* hints on where
I should start looking are *very* welcome. I have some ideas (see the
ktrace dumps in the PR), but if anyone feels like giving me some more
pointers, then please do.

- --
Kind regards,
Jan Danielsson

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (NetBSD)

iD4DBQFG8G+muPlHKFfKXTYRChahAJdWYJ8KoqmJa5a9tS5HIpXILsDbAJ4y8wYF
HgVo+YJxBf8RNLeSwqLeNw==
=qrcW
-----END PGP SIGNATURE-----