Subject: Re: Bug found: help to isolate it
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Lista de NetBSD Users <list10@sepc.edu.mx>
List: netbsd-users
Date: 05/20/2002 20:19:46
On Sun, 19 May 2002, Manuel Bouyer wrote:

> BTW, I suspect you have some very special setup. I have > 20
> NetBSD servers (i386, alpha, sparc), some of them with very large
> uptime (> 500 days) and the only time I've seen that was when someone
> hit the 'scroll lock' on the console. Unlocking it unwedged syslogd.

When I set up a new production server, I go through
the following checklist:

0) Write a list of services and packages this server will need.
   For each package, check which kernel options it needs.
   For each package, check whether it can have its own logfile
   (sendmail->maillog, named->local1, pgsql->local0,
    imap-uw->maillog, dhcpd->local2, etc)
1) Go through the BIOS setup and turn off APM and similar features.
2) Install NetBSD, no X, no games.
3) Compile a new kernel with the devices found in dmesg
   and the options needed.
4) If this server will move a lot of mail, then
   mkdir /var/log/mail and touch /var/log/mail/maillog
   and adjust /etc/newsyslog.conf with the number of
   backup files and the size.
5) In our case the server will run pgsql and horde/imp
   (apache, php4, etc. are needed), so add to newsyslog.conf:
   /var/log/horde/horde   640  7    750  *     Z
   /var/log/pgsql/pgsql   640  7    750  *     Z
   Also edit /etc/syslog.conf and add:
   local0.*     /var/log/pgsql/pgsql
   local4.*     /var/log/horde/horde
6) Edit the conf files of pgsql and horde so that they
   a) use syslogd and b) use localX (0 or 4).
   This is not enough, because we still got a lot of
   local0/local4 messages in /var/log/messages, so we added
   local0 and local4 to the .none list on the following line
   of syslog.conf:

*.info;auth,authpriv,cron,ftp,kern,local0,local4,lpr,mail.none  /var/log/messages
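
As a rough sketch of the file-creation part of steps 4 and 5: the loop below makes one directory and one empty logfile per service. LOGROOT is a hypothetical variable introduced here so the commands can be tried in a scratch directory first; on the real server it would be /var/log and the script would run as root. The pid file path in the last comment is an assumption, not taken from the setup above.

```shell
#!/bin/sh
# Dry run by default: LOGROOT is a hypothetical variable, not part of the
# original setup. Use LOGROOT=/var/log (as root) on the real server.
LOGROOT="${LOGROOT:-/tmp/logroot-sketch}"

# One directory and one empty logfile per service, as in steps 4 and 5.
for f in mail/maillog horde/horde pgsql/pgsql; do
    mkdir -p "$LOGROOT/${f%/*}"   # ${f%/*} strips the file name, leaving the directory
    touch "$LOGROOT/$f"
done

# After editing /etc/syslog.conf, syslogd has to reread it, for example:
#   kill -HUP `cat /var/run/syslogd.pid`   (pid file path is an assumption)
```

Ownership and modes are left out on purpose; newsyslog applies the modes listed in newsyslog.conf when it rotates the files.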
--------------------------
This is my newsyslog.conf
# logfilename           [user:group]    mode ngen size time [ZBN-] [/pidfile] [sigtype]
#
/var/cron/log           root:wheel      600  5    50   *     Z
#/var/log/aculog        uucp:dialer     640  7    *    24    Z
/var/log/authlog                        600  9    75   *     Z
/var/log/kerberos.log                   640  7    *    24    Z
/var/log/lpd-errs                       640  7    10   *     Z
/var/log/mail/maillog                   600  9    750  *     Z
/var/log/messages                       644  9    750  *     Z
/var/log/wtmp                           644  7    75   *     ZBN
/var/log/xferlog                        640  7    250  *     Z
/var/log/horde/horde                    640  7    750  *     Z
/var/log/pgsql/pgsql                    640  7    750  *     Z
----------------------------------
#       $NetBSD: syslog.conf,v 1.6 1997/02/21 09:04:26 mikel Exp $

*.err;kern.*;auth.notice;authpriv.none;mail.crit /dev/console
*.info;auth,authpriv,cron,ftp,kern,local0,local4,lpr,mail.none  /var/log/messages
kern.debug                                       /var/log/messages

# The authpriv log file should be restricted access; these
# messages shouldn't go to terminals or publically-readable
# files.
auth,authpriv.info                           /var/log/authlog

cron.info                                    /var/cron/log
ftp.info                                     /var/log/xferlog
lpr.info                                     /var/log/lpd-errs
mail.info                                    /var/log/mail/maillog
#uucp.info                                   /var/spool/uucp/ERRORS

*.emerg                                      *
*.notice;auth.debug                          root

local0.*                                     /var/log/pgsql/pgsql
local4.*                                     /var/log/horde/horde
-----------------------------------------------------

At this point, I believe this is the right way to
use syslogd with packages. If I am wrong, please
tell me.
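
Because the checklist creates every logfile by hand before pointing syslogd at it, one cheap sanity check is to list the file targets in syslog.conf that do not exist yet. The function below is only a sketch: missing_logs is a made-up name, and it assumes one selector and one action per line, with the action in the last field.

```shell
#!/bin/sh
# missing_logs CONF: print every file target named in CONF that does not exist.
# Comment lines are skipped, and only actions that start with "/" are checked,
# so targets such as "root" or "*" are ignored.
missing_logs() {
    awk '$1 !~ /^#/ && NF >= 2 && $NF ~ /^\// { print $NF }' "$1" |
    while read -r path; do
        [ -e "$path" ] || echo "$path"
    done
}
```

It would be called as `missing_logs /etc/syslog.conf`; no output means every listed file is present.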

> Can you give more details about your setup (hardware, console type, software
> using syslogd, syslogd.conf, etc ...)

server# fstat | grep c05cee80
nobody   httpd        234    8* unix dgram c064e700 <-> c05cee80
nobody   httpd        231    5* unix dgram c0645c80 <-> c05cee80
root     named        126    3* unix dgram c05cef40 <-> c05cee80
root     syslogd      117    3* unix dgram c05cee80

sendmail is not in this list because it is not running as a daemon.
I do not see pgsql in the list... horde/imp are php4 programs, so
I suppose they run inside one of the httpd processes in the list.
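
The same lookup can be written as a small filter, sketched here under the assumption that fstat output has the column layout shown above. peers_of is a hypothetical helper name; it reads fstat output on stdin and prints the command and pid of every process whose socket is connected ("<->") to the given address.

```shell
#!/bin/sh
# peers_of ADDR: read fstat output on stdin and print "command pid" for each
# process holding a unix socket connected to ADDR. Lines without a "<->"
# peer column (such as syslogd's own socket) are skipped.
peers_of() {
    awk -v addr="$1" 'NF >= 9 && $(NF-1) == "<->" && $NF == addr { print $2, $3 }'
}
```

It would be used as `fstat | peers_of c05cee80`, which for the listing above gives the two httpd processes and named.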

The hardware is the following (from dmesg)

Standard hardware was omitted, and the number of disks
and LAN cards may vary. All the memory is ECC.
---- we have 8 like this ------------------------
cpu0: Intel Pentium II/Celeron (Deschutes) (686-class), 350.82 MHz
total memory = 127 MB
avail memory = 116 MB
using 1658 buffers containing 6632 KB of memory
BIOS32 rev. 0 found at 0xf0210
pchb0: Intel 82443BX Host Bridge/Controller (rev. 0x03)
pcib0: Intel 82371AB PCI-to-ISA Bridge (PIIX4) (rev. 0x02)
pciide0 at pci0 dev 7 function 1: Intel 82371AB IDE controller (PIIX4) (rev. 0x01)
uhci0 at pci0 dev 7 function 2: Intel 82371AB USB Host Controller (PIIX4) (rev. 0x01)
Intel 82371AB Power Management Controller (PIIX4) (miscellaneous bridge, revision 0x02) at pci0 dev 7 function 3 not configured
vga0 at pci0 dev 10 function 0: ATI Technologies Mach64 B (rev. 0x5c)
ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
ex0 at pci0 dev 14 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x30)
ex1 at pci0 dev 16 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x30)
sd0 at scsibus0 target 0 lun 0: <IBM, DDRS-39130D, DC1B> SCSI2 0/direct fixed
---- we have 3 like this --------------------------
NetBSD 1.5.2 (MINERVA) #2: Wed Nov 14 17:45:50 CST 2001
    gallegos@minerva:/usr/src/sys/arch/i386/compile/MINERVA
cpu0: Intel Pentium Pro (686-class), 198.96 MHz
total memory = 127 MB
avail memory = 116 MB
using 1659 buffers containing 6636 KB of memory
BIOS32 rev. 0 found at 0xf0210
pchb0: Intel 82441FX PCI and Memory Controller (PMC) (rev. 0x02)
pcib0: Intel 82371SB PCI-to-ISA Bridge (PIIX3) (rev. 0x01)
pciide0 at pci0 dev 7 function 1: Intel 82371SB IDE Interface (PIIX3) (rev. 0x00)
vga0 at pci0 dev 12 function 0: S3 Trio32/64 (rev. 0x54)
ex0 at pci0 dev 14 function 0: 3Com 3c905B-TX 10/100 Ethernet (rev. 0x30)
ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
sd0 at scsibus0 target 0 lun 0: <IBM, DDRS-39130D, DC1B> SCSI2 0/direct fixed
sd1 at scsibus0 target 1 lun 0: <IBM, DDRS-39130D, DC1B> SCSI2 0/direct fixed
sd2 at scsibus0 target 2 lun 0: <IBM, DDRS-39130D, DC1B> SCSI2 0/direct fixed
sd3 at scsibus0 target 3 lun 0: <IBM, DDRS-39130D, DC1B> SCSI2 0/direct fixed
-----------------------------------------

Well... I think that is all we can do for now, other than
wait for the next event.

The case of the second server, where syslogd stopped, is
the most interesting... but that is for another mail
message (tonight or tomorrow).

Thanks a lot

Heron Gallegos