Subject: email (was Re: Recoverable Network File System?)
To: NetBSD User's Discussion List <netbsd-users@NetBSD.org>
From: Chuck Yerkes <chuck+nbsd@2003.snew.com>
List: netbsd-users
Date: 12/12/2003 18:47:05
Quoting Sean J.Schluntz (schluntz@workofstone.com):
> Thanks for the feedback, I guess that brings me down to one question.  
> Is it possible to turn off the write cache?  I have one instance where 
> two mail servers may try to write to the same network mounted mail 
> spool at the same time :/  Having one hit cached could really run me in 
> to trouble.

Ah, we're talking about mail.
That opens up other cans of worms.

I've worked with mail systems a little bit.  I also worked for a
company that wrote an HA application for Suns a while back.

FURTHER, I've worked with production systems, including trading
floors where 3 minutes of downtime means, no matter how badly the trader
has done for several months, you just cost him the deal of a lifetime
and cost the company millions^WBILLIONS of dollars :)
(part of why we wrote HA for Sun).


REALITY.
Let's just breathe for a minute and ponder it.
In reality, failover and redundant systems are harder to run than
vanilla systems.  That means only people trained in it, used to it,
deeply familiar with it are allowed to make changes to it.  Ever.
Jane Jr. System admin is NOT allowed to update inetd.conf or services.
Because it can affect the redundant systems.

EMAIL
Email is a 30 year old application.  Since DNS, we got MX records.
This allows messages to travel to machines with redundancy.
HA is usually an impediment here.  Worried about mail in transit?
Fine, mirror or use RAID on the spool disks.
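
A minimal sketch of how MX preferences give that redundancy, assuming
the usual rule that a sending MTA tries the lowest preference value
first and picks randomly among hosts that share a preference.  The
hostnames and the delivery_order() helper are made up for illustration;
no real DNS lookup is done here.

```python
import random


def delivery_order(mx_records):
    """Return hostnames in the order a sender would try them.

    mx_records is a list of (preference, hostname) tuples.  Lower
    preference values are tried first; hosts that share a preference
    are shuffled so load spreads across them.
    """
    by_pref = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host)

    order = []
    for pref in sorted(by_pref):
        hosts = by_pref[pref][:]
        random.shuffle(hosts)
        order.extend(hosts)
    return order


# Hypothetical zone data: two equal primaries and one backup.
records = [
    (20, "backup-mx.example.org"),
    (10, "mx1.example.org"),
    (10, "mx2.example.org"),
]
print(delivery_order(records))
```

If mx1 is down, the sender moves on to mx2, then to the backup, with no
shared storage or failover software involved.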

MailStore:
This is where mail lives once it lands.  In older systems, this
might be a version 7 mail box (mbox) or related.  Fine for ISPs
and POP where the user empties it regularly.  Sucks for IMAP and
corp where mail stays.  Courier (an almost RFC IMAP server) and
Cyrus (an actual RFC IMAP server) store a message per file.
Sendmail's server stores 1 file even if it's delivered to 100 people
on the same box (handy when the sales person sends a 5 line, 2MB
powerpoint slide to everyone or when Sammy Student sends an MP3 of
himself in the shower yodelling Inna Gadda Davida to all his buddies
on campus).

A file per message means I can delete message 147 of 20584 with an unlink()
where version 7 boxes must be copied and rewritten.
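
A rough sketch of that difference, assuming a toy store with one file
per message and a toy mbox whose messages start with a "From " line.
The function names and file layout are illustrative only; a real mail
store also needs locking and proper "From " escaping, which this skips.

```python
import os
import tempfile


def delete_from_message_files(store_dir, name):
    """One file per message: deleting is a single unlink()."""
    os.unlink(os.path.join(store_dir, name))


def delete_from_mbox(mbox_path, index):
    """Single mbox file: every surviving message is copied out again."""
    messages = []
    with open(mbox_path) as src:
        for line in src:
            if line.startswith("From "):
                messages.append([])      # a separator line starts a new message
            if messages:
                messages[-1].append(line)

    del messages[index]

    tmp_path = mbox_path + ".tmp"
    with open(tmp_path, "w") as dst:
        for msg in messages:
            dst.writelines(msg)
    os.replace(tmp_path, mbox_path)      # swap the rewritten box into place


with tempfile.TemporaryDirectory() as d:
    # Store 1: three messages, one file each.
    for i in range(3):
        with open(os.path.join(d, f"msg{i}"), "w") as f:
            f.write(f"Subject: test {i}\n\nbody {i}\n")
    delete_from_message_files(d, "msg1")
    remaining_files = sorted(os.listdir(d))

    # Store 2: the same three messages in one mbox file.
    mbox = os.path.join(d, "mbox")
    with open(mbox, "w") as f:
        for i in range(3):
            f.write("From sender@example.org Fri Dec 12 18:47:05 2003\n")
            f.write(f"Subject: test {i}\n\nbody {i}\n\n")
    delete_from_mbox(mbox, 1)
    with open(mbox) as f:
        remaining_mbox = f.read()

print(remaining_files)
print(remaining_mbox)
```

The unlink() cost stays the same no matter how big the mailbox is.  The
mbox rewrite grows with the size of the whole box.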


EMAIL in transit or on the mail store is ALL about I/O.  A $30k computer
with $2k of software mirrored disks will be half the speed of a $4k
machine with a $18000 RAID box.  It's about I/O.  My favored RAID vendor
can WRITE at 50MB/s in real world usage.  Per RAID box (stripe them
across PCI busses for fun and more speed).

NETWORKS
At 100Mb/s, 100baseTX can give us files at about the speed of a 7200RPM
Barracuda drive.  Maybe 8MB/s.  GigE is faster, but still not as fast as
local 15kRPM drives.  Not close.
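
The arithmetic behind those numbers, as a back-of-envelope sketch.  The
8MB/s and 50MB/s figures come from this message; the 1000MB spool size
is an arbitrary example, and the 12.5MB/s figure is just the raw wire
rate before any protocol overhead.

```python
def wire_rate_mb_per_s(megabits_per_s):
    """Convert a link speed in Mb/s to a raw ceiling in MB/s (8 bits per byte)."""
    return megabits_per_s / 8.0


def seconds_to_move(size_mb, rate_mb_per_s):
    """Time to move size_mb megabytes at a sustained rate."""
    return size_mb / rate_mb_per_s


fast_ethernet_ceiling = wire_rate_mb_per_s(100)  # raw 100baseTX ceiling
observed_network = 8.0                           # "Maybe 8MB/s" from above
raid_write = 50.0                                # RAID write figure from above
spool_mb = 1000                                  # assumed example size

print(f"100baseTX raw ceiling: {fast_ethernet_ceiling} MB/s")
print(f"{spool_mb}MB over the network: "
      f"{seconds_to_move(spool_mb, observed_network):.0f} s")
print(f"{spool_mb}MB to local RAID:    "
      f"{seconds_to_move(spool_mb, raid_write):.0f} s")
```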


Now you want to access the mailstore's data over a network.


REALITY:
Again, I deal with this a lot.
1) If your mailstore is down for an hour during production day or a couple
   at night, life goes on.  Really.  As long as it's infrequent, people get
   cranky, but keep the torches and pitchforks away.
2) A machine that's not a "joe bob's discount hut" machine is unlikely to fail.
   A machine with dual power supplies, ECC RAM and decent cooling might crash.
   It *might* fail in a year.  Not likely, but perhaps.

If, just if, that were to happen, with your mail and software on
external RAID....  If the machine is unrecoverable....  You plug
the RAID into the cold spare and turn it on.

If it takes 60 minutes during the day or a couple hours at night,
you're moving slowly.  And yet you meet acceptable downtimes.
Anyone can be trained to move a SCSI or FCAL plug and power up a
machine.
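
To put a number on "acceptable", here is a small sketch that assumes
exactly one such outage per year.  The one-per-year rate is an
assumption for illustration, not a measurement.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, ignoring leap years


def availability_pct(downtime_minutes_per_year):
    """Percentage of the year the service is up, given total downtime."""
    return 100.0 * (1 - downtime_minutes_per_year / MINUTES_PER_YEAR)


# One cold-spare swap per year, at the two durations mentioned above.
for minutes in (60, 120):
    print(f"{minutes} min down per year: {availability_pct(minutes):.3f}% up")
```

Even the slow two-hour swap, once a year, leaves the mailstore up well
over 99.9% of the time.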


Your gain is a machine that you can hire about anyone to work on.
You're not playing games with shares and hoping you don't corrupt
them somehow.
Your mail server gets SPEED.  It appears to respond to users quickly.

A DL320 with external RAID will serve IMAP for 20,000 people ok.
I might want two if concurrency exceeded 8-10k people.

A spare DL320 will cost you $3000? ($2k?).  You can run straight unix on it.
Your mail is debuggable.  You'll live in a reality that's workable.