Subject: Re: Problems with postfix on NetBSD 4 RC1
To: Michael van Elst <mlelstv@serpens.de>
From: Jan Danielsson <jan.m.danielsson@gmail.com>
List: netbsd-users
Date: 09/16/2007 02:46:48
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Michael van Elst wrote:
>>   http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=36963
> 
> Without further data, this is difficult to analyse.

   As I discover new data which I think may be relevant, I will try to
add it to the PR.

> But I would
> guess, that the permissions are indeed wrong. What permissions
> did you verify?

   I made sure that the ownership/groups are correct, as well as the
rwx-permissions, on all directories up to the root.
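
   As a rough sketch of that kind of check (not a transcript from this
machine): the function below prints the "ls -ld" line for a directory
and for each of its parents, ending with /. The name walk_up is made up
for illustration, and it only shows the classic owner/group/mode bits,
nothing about mount options or anything else that might matter.

```shell
#!/bin/sh
# Print owner, group and mode for a directory and every parent up to /.
# walk_up is a hypothetical helper name, not an existing tool.
walk_up() {
    dir=$1
    # Require an absolute path so the loop is guaranteed to reach "/".
    case $dir in
        /*) ;;
        *) echo "walk_up: need an absolute path" >&2; return 1 ;;
    esac
    while :; do
        ls -ld "$dir"
        [ "$dir" = "/" ] && break
        dir=$(dirname "$dir")
    done
}

# Example (path taken from this thread): walk_up /home/pkgsrc
```

   Running it once in the "good" state and once in the "bad" state, and
comparing the two outputs, would show whether anything visible to ls
actually changes.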

   I understand your skepticism, I truly do. If someone described this
problem to me, and I hadn't seen it with my own eyes, I too would
default to thinking: "This is obviously a case of simple
misconfiguration." It's what everyone on IRC assumed too.

   Tell me what outputs you need to be convinced, and I'll add them to
the PR. It may take some time, because I haven't figured out any way to
control the behavior. Suddenly, it'll just switch to a mode where
applications start complaining about permission problems. And yesterday,
I may have witnessed the first time it actually switched back to working
again(!).

   I can start with this, I guess:

- ---------------------------
nl102-238-202# cd /
nl102-238-202# ls -l | grep home
drwxr-xr-x  10 root  wheel      512 Sep 15 23:02 home
nl102-238-202# cd home
nl102-238-202# ls -l | grep pkgsrc
drwxr-xr-x   6 pkgsrc  users       512 Sep 11 23:49 pkgsrc
nl102-238-202# su - pkgsrc
$ pwd
/home/pkgsrc
$ ls -l
ls: .: Permission denied
$ exit
nl102-238-202# user info pkgsrc
login   pkgsrc
passwd  <not important>
uid     1001
groups  users
change  NEVER
class
gecos
dir     /home/pkgsrc
shell   /bin/ksh
expire  NEVER
- ---------------------------

   But this is actually pretty uninteresting. The *interesting* part is
that if I reboot, I'll probably be able to get the directory listing
when I've su:ed to pkgsrc and run "ls -l". Then, if I go out for a walk,
or if I use my normal user "jan" to startx, and browse around using
Firefox for a while, then "ls -l" with the su:ed pkgsrc user *won't* work.

> As for postfix, this looks like a manually created queue or an
> incomplete or broken configuration. Places to look at are
> the queue_directory and mail_owner parameters in main.cf.
> If the configuration is ok, you may try 'postfix set-permissions'.

   I've done that, it didn't help. There's another command, which tries
to verify the postfix installation, but it complains about problems with
the permissions. I also asked someone with a working postfix
configuration to verify the owner/group and rwx-permissions against
mine. They were identical.
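
   For anyone who wants to repeat that comparison, it could be done
roughly like this. This is a sketch only: perm_listing is a made-up
helper name, and it assumes the directory names contain no whitespace
(it takes the last field of each "ls -ld" line as the path).

```shell
#!/bin/sh
# Print "mode owner group path" for every directory below the given
# root, sorted by path, so listings from two machines can be diffed.
# perm_listing is a hypothetical helper; assumes no whitespace in names.
perm_listing() {
    find "$1" -type d -exec ls -ld {} \; |
        awk '{ print $1, $3, $4, $NF }' |
        sort -k 4
}

# Example: perm_listing <queue directory> > listing.txt on each machine,
# then diff the two files.
```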

> As for pgsql, this looks like an existing database with a different
> owner. Please check permissions and ownerships of /var/pgsql and
> everything underneath.

   Believe me -- there's nothing wrong with them; I'm 100% sure they
are correct. But just for a second opinion, I went through all this with
someone on IRC too. I got all the output he requested, he reviewed it,
and just came to the conclusion that "it's strange".

   There's some further data worth mentioning. I tried to make it clear
in the PR, but I'm guessing it wasn't stated clearly enough. The
permission problem is _not_ always there. For instance, almost every
time just after I have booted, I can do this:

# su - pkgsrc
$ ls -l
<it lists the files properly>

   After a few hours of using the system (without changing any system
configuration whatsoever), I get this:

# su - pkgsrc
$ ls -l
ls: .: Permission denied

   Once it gets into this state, it generally doesn't revert, but I
believe it has done so _once_ (meaning I could "su" to pkgsrc and list
the files, even though it previously didn't work). Again, I did *nothing*
to muck around with any system configuration during the change in
behavior. (Now it's back to not being able to list the files, btw.)


   I'll reiterate, because it's such an important point: It's not a 100%
consistent problem. Some things work when the system is booted, but once
the problem starts, it seems to affect several applications. Case in
point: When I rebooted last time, I noticed that after su:ing to the
pkgsrc user, I could indeed "ls -l". I quickly tried to launch
postgresql, and it did start (though I'm not totally sure that there's a
correlation there). But now I can no longer "ls -l" as the pkgsrc user,
so I _assume_ (based on previous experience) that if I stopped
postgresql now, I would not be able to start it again until I re-enter
the state where "ls -l" works for the pkgsrc user (and I have no idea
how to reach that, other than just trying it from time to time).
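
   The "trying it from time to time" part could at least be automated,
so that the moment of the switch gets a timestamp. A minimal sketch
follows; probe_once and probe_loop are made-up names, and the interval,
count and log file in the usage note are arbitrary.

```shell
#!/bin/sh
# Run a probe command and print one timestamped line saying whether it
# succeeded ("good") or failed ("bad"). Hypothetical helper names.
probe_once() {
    if "$@" >/dev/null 2>&1; then
        state=good
    else
        state=bad
    fi
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$state"
}

# Repeat the probe COUNT times, sleeping INTERVAL seconds in between.
probe_loop() {
    interval=$1
    count=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        probe_once "$@"
        i=$((i + 1))
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
    return 0
}
```

   Run as root, something like
probe_loop 60 1440 su - pkgsrc -c 'ls -l' >> /var/tmp/probe.log
would record one line per minute for a day, and the log could then be
matched against whatever else was happening when "good" turns to "bad".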

> Neither case seems to be a default configuration. You may want to
> show the changes you made.

   My postgresql configuration worked perfectly on my 3.1 system. The
installation now is identical to it (sans NetBSD/amd64 3.1). Also, note
that I have postgresql currently running, because I managed to start it;
I assume because I found a window when "ls -l" was working for the
pkgsrc user.

   In the case of postgresql, I also tried renaming /var/pgsql to
/var/pgsql_old and letting it initialize a completely new
database. When it's in the "bad" state ("ls -l" won't work for the
pkgsrc user), the database initialization simply won't work (problems
with permissions). If it's in the "good" state, the database
initialization will work flawlessly.


   There seem to be two states: one in which su:ed sessions fail to
access directories due to permission problems, and one in which
everything works as expected. But there is one exception: I haven't seen
postfix work, ever, which may be due to some other problem, or I simply
haven't been able to time it properly. But it does always complain about
a permission on the directory "defer" (again, the ownership, group and
permissions have been verified by a user who is running a fully
functional postfix).

   Again, just ask for any more information, and I'll provide it. But
please remember that the "good" state is *very* rare. It pretty quickly
switches to the "bad" state, and it appears to stay there (with that one
exception I mentioned).

- --
Kind regards,
Jan Danielsson

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (NetBSD)

iD8DBQFG7Hz4uPlHKFfKXTYRCrvjAJ9BBrYqsTh/+aTLDtinwoVhy2SzPACeKV8v
MEjeY0MbbZTuO4vL5jbyjKo=
=iaxa
-----END PGP SIGNATURE-----