Subject: Re: [Linux-HA] Integrating OCF framework w/ (Net|Free)BSD rc.d
To: General Linux-HA mailing list <linux-ha@lists.linux-ha.org>
From: Alan Robertson <alanr@unix.sh>
List: tech-userlevel
Date: 05/18/2006 23:10:08
Brian A. Seklecki wrote:
> 
> What is OCF? The Open Cluster Framework defines the extensions an RC 
> script needs in order to register a system service as a cluster 
> resource in the Linux-HA infrastructure.
> 
> For those of you unfamiliar with OCF, please refer to the draft standard 
> at: 
> http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt?rev=HEAD 
> 
> http://linux-ha.org/HeartbeatResourceAgent
> http://linux-ha.org/LSBResourceAgent
> http://linux-ha.org/OCFResourceAgent
> http://linux-ha.org/ResourceAgentSpecs
> 
> Fortunately, our rc.d infrastructure is extensible enough that Port 
> maintainers should not need to duplicate their work by writing separate 
> OCF scripts.  Existing in-tree and Ports-provided rc.d-compliant 
> scripts can be extended with very little effort.
> 
> However, some discussion about how best to accomplish this would be 
> prudent.
> 
> ==
> 
> *) Exit codes
> 
> We deviate slightly from LSB, just as LSB deviates from OCF.
> 
> Example: when we pass a "stop" command to run_rc_command() for an 
> already stopped service, we return code 1, whereas OCF/LSB calls for 
> code 3.

Actually, both call for code 0.  This specific case is covered by other 
language elsewhere in the spec.  There are no known cases where the OCF 
deviates from something the LSB defines; that was a design goal.  My 
guess is that this is a documentation problem, some other 
misunderstanding, or a bug.  Can you point out a specific example?

> We could put a conditional check around the return/exit statement; 
> however, OCF "compatibility mode" would need to be a variable we source 
> in from the service's RC script, e.g. 'checkyesno ocf_compat' at line 
> 691 in rc.subr.
> 
> Other exit codes would also need to be adjusted (see the standards 
> document).
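A minimal sketch of such an opt-in compatibility mode, assuming a hypothetical rc.conf knob named ocf_compat and a hypothetical helper ocf_fix_rc (neither exists in rc.subr). It only covers the stop-on-a-stopped-service case discussed above, using the code 0 both specs call for; other translations from the spec would be further branches in the same case statement. checkyesno is re-implemented as a stand-in so the example runs without rc.subr.

```shell
#!/bin/sh
# Sketch of an opt-in OCF exit-code mode for rc.d scripts.
# ASSUMPTIONS: the "ocf_compat" knob and the ocf_fix_rc helper are
# hypothetical; neither exists in rc.subr.

# Stand-in for rc.subr's checkyesno so this runs on its own; like the
# real one, it takes the *name* of a variable, not its value.
checkyesno()
{
	eval _value=\${$1:-}
	case $_value in
	[Yy][Ee][Ss]|[Tt][Rr][Uu][Ee]|[Oo][Nn]|1)
		return 0 ;;
	*)
		return 1 ;;
	esac
}

# ocf_fix_rc ACTION RC WAS_RUNNING
#   ACTION      - the rc.d command that was run (start, stop, ...)
#   RC          - the code run_rc_command() returned
#   WAS_RUNNING - "yes" if the service was running beforehand
# Prints the exit code the script should use.
ocf_fix_rc()
{
	_action=$1
	_rc=$2
	_was_running=$3

	# Without the knob, keep today's rc.d behaviour untouched.
	if ! checkyesno ocf_compat; then
		echo "$_rc"
		return 0
	fi

	case $_action in
	stop)
		# OCF/LSB: stopping an already stopped service is a success.
		if [ "$_was_running" != yes ]; then
			_rc=0
		fi
		;;
	esac
	echo "$_rc"
}
```

With ocf_compat="YES" in rc.conf, "ocf_fix_rc stop 1 no" prints 0, while with the knob unset it passes the original 1 through, so existing behaviour only changes for users who ask for it.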
> 
> *) Environment variables for per-instance services
> 
> Currently we don't have a uniform method for dealing with multiple 
> instances of a service.  The Apache2 method ("Profiles") is nice: it 
> uses an "apache2.sh [instance] [argument]" syntax.  OCF scripts expect 
> instances to be distinguished by environment variables exported by the 
> calling application (OCF_RESKEY_*), which is not an issue.
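One way to bridge the two conventions, sketched under assumptions: a hypothetical helper ocf_import_reskeys copies every exported OCF_RESKEY_<param> into an rc.conf-style <name>_<param> variable, so an existing rc.d script can read per-instance settings the way it already reads rc.conf. The helper and the naming scheme are illustrative, not part of rc.subr or the OCF spec.

```shell
#!/bin/sh
# Sketch: copy the OCF_RESKEY_* variables exported by the cluster
# manager into rc.conf-style "<name>_<param>" variables.
# ASSUMPTION: ocf_import_reskeys and this naming scheme are hypothetical.

ocf_import_reskeys()
{
	_prefix=$1
	# List the parameter names of all exported OCF_RESKEY_* variables.
	for _key in $(env | sed -n 's/^OCF_RESKEY_\([A-Za-z0-9_]*\)=.*/\1/p'); do
		# e.g. slapd_config="$OCF_RESKEY_config"; the value is
		# expanded once and never re-parsed by the shell.
		eval "${_prefix}_${_key}=\"\$OCF_RESKEY_${_key}\""
	done
}
```

Called as "ocf_import_reskeys $name" after load_rc_config, the cluster manager's per-instance settings would override the defaults from rc.conf.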
> 
> *) Additional arguments
> 
> The spec calls for the additional arguments "monitor" (to augment the 
> existing "status") and "metadata" (to describe the service in XML 
> conforming to a DTD), plus an optional "validate-all".
> 
> These can easily be implemented using the extra_commands variable:
> 
>   extra_commands="monitor metadata"
> 
>   monitor_cmd="slapd_monitor"
>   metadata_cmd="slapd_metadata"
> 
> Note: commands with hyphens in their names (or function names) do not 
> seem to be honored.  Ideally, 'monitor' should be an intelligent service 
> check: not just checking whether a TCP socket is open, but whether the 
> service is actually healthy.  That implies calling or exec'ing an 
> outside program, like a Nagios health check: something that feeds the 
> service generated input and checks the result against the expected 
> output.

And OCF supports multiple levels of checks.  You can define a "shallow" 
check, a deep check, and a really, really deep check.

With Linux-HA, you can schedule the shallow check every 5 seconds, the 
deep check every 5 minutes and the really really deep check once a day.
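As far as I read the draft spec, the requested depth reaches the agent in the OCF_CHECK_LEVEL environment variable, with 0 as the cheapest check, so a single monitor routine can serve all three schedules. A minimal sketch, assuming hypothetical probe_* functions and the 10/20 thresholds as placeholders that each service would fill in.

```shell
#!/bin/sh
# Sketch: one monitor action serving several check depths.
# ASSUMPTIONS: the probe_* functions and the 10/20 thresholds are
# hypothetical placeholders to be filled in per service.

OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Shallow: is the process named in the PID file still alive?
probe_process()
{
	[ -r "${pidfile:-}" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}

# Deeper: placeholder that always succeeds; e.g. connect to the port.
probe_socket() { return 0; }

# Deepest: placeholder that always succeeds; e.g. run a real query
# and compare the answer with the expected output.
probe_query() { return 0; }

slapd_monitor()
{
	probe_process || return $OCF_NOT_RUNNING
	_level=${OCF_CHECK_LEVEL:-0}
	if [ "$_level" -ge 10 ]; then
		probe_socket || return $OCF_ERR_GENERIC
	fi
	if [ "$_level" -ge 20 ]; then
		probe_query || return $OCF_ERR_GENERIC
	fi
	return $OCF_SUCCESS
}
```

Keeping the cheap process check first means every depth reports "not running" (7) the same way, and the expensive probes only run when explicitly requested.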

> Note: none of the included OCF examples do truly objective testing yet! 
> (Sorry guys, systems that don't permit fragmented packets to the 
> network broadcast address aren't going to run Apache's mod_status by 
> default }:> ).  We can leave the $servicename_monitor() semantics up to 
> the port maintainer, and the end user can always do more aggressive 
> checking.

I'm missing something regarding apache here.  You may have to help me in 
my ignorance.  Maybe use single syllable words?  ;-)

[regarding doing things better: patches are being accepted ;-)]

> However, our current "status" routine checks for the existence of a PID 
> file and cross-references it against the process list (which is more 
> thorough than most other RC systems).  Given that, as a temporary fix, 
> "monitor" can easily be mapped to "status" using a small code snippet:


We already do this for the 'lsb' style init scripts.   This is all done 
through plugins.  Maybe you need a FreeBSD plugin (in lieu of the "lsb" 
plugin)?
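Heartbeat's resource-agent plugins are written in C, so the following is only a shell approximation of what a "bsd" class might do: drive an unmodified rc.d script through its one* commands (which ignore the rc.conf enable knob) and translate the results into OCF codes, including the status-to-monitor mapping. rcd_ocf_wrap is hypothetical, and it assumes the script's status command exits non-zero when the service is not running, as rc.subr's appears to.

```shell
#!/bin/sh
# Hypothetical OCF wrapper around an existing rc.d script.
# Usage: rcd_ocf_wrap /etc/rc.d/SCRIPT ACTION
# ASSUMPTION: "onestatus" exits non-zero when the service is down.

OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

rcd_ocf_wrap()
{
	_script=$1
	_action=$2
	case $_action in
	start)
		"$_script" onestart >/dev/null 2>&1 && return $OCF_SUCCESS
		return $OCF_ERR_GENERIC ;;
	stop)
		# OCF/LSB: stopping a stopped service counts as success.
		"$_script" onestatus >/dev/null 2>&1 || return $OCF_SUCCESS
		"$_script" onestop >/dev/null 2>&1 && return $OCF_SUCCESS
		return $OCF_ERR_GENERIC ;;
	monitor|status)
		# Map rc.subr's "not running" onto OCF_NOT_RUNNING.
		"$_script" onestatus >/dev/null 2>&1 && return $OCF_SUCCESS
		return $OCF_NOT_RUNNING ;;
	*)
		return $OCF_ERR_UNIMPLEMENTED ;;
	esac
}
```

The advantage of this design is that no rc.d script has to change; the cost is that meta-data and validate-all would still need per-service content somewhere.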

> As for the "metadata" and "validate-all" routines, the XML metadata 
> simply describes the command-line and environment-variable arguments 
> valid for the OCF service script, so we might be able to develop a 
> reusable set of routines for generating the output without any crazy 
> dependencies on libxml.

We just do cats from here documents.  You certainly don't need libxml to 
generate fixed content (constant) XML.
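A here-document version of the meta-data action might look like the sketch below. It is modelled loosely on the agents shipped with Heartbeat; the agent name, the single "config" parameter, the default path and all timeouts are illustrative assumptions rather than values taken from the spec.

```shell
#!/bin/sh
# Sketch: constant OCF meta-data emitted with a here-document.
# ASSUMPTIONS: agent name, parameter, default path and timeouts
# are illustrative only.

slapd_metadata()
{
	# Quoted 'EOF' so the shell performs no expansion on the XML.
	cat <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="slapd">
<version>1.0</version>
<longdesc lang="en">
Manages an OpenLDAP slapd instance through its rc.d script.
</longdesc>
<shortdesc lang="en">OpenLDAP slapd</shortdesc>
<parameters>
<parameter name="config" unique="1" required="0">
<longdesc lang="en">Path to the slapd configuration file.</longdesc>
<shortdesc lang="en">Configuration file</shortdesc>
<content type="string" default="/usr/local/etc/openldap/slapd.conf"/>
</parameter>
</parameters>
<actions>
<action name="start" timeout="20"/>
<action name="stop" timeout="20"/>
<action name="monitor" depth="0" timeout="10" interval="5"/>
<action name="meta-data" timeout="5"/>
<action name="validate-all" timeout="5"/>
</actions>
</resource-agent>
EOF
}
```

Because the content is constant, the only per-port work is writing the parameter list; a shared rc.subr routine would not need to parse or validate XML at all.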

> The trickier question arises when we want to start making in-tree rc.d/ 
> scripts OCF compliant/compatible (nfsd, named, inetd, sendmail, ntpd, 
> etc. come to mind).
> 
> I'm interested in any discussion or thoughts on a strategy or approach 
> for adding OCF compatibility to our rc.d/ system.

I like the idea that you're discussing this and taking it semi-seriously.

I think your choices might include:
  a) write a "bsd" RA plugin (reasonably easy)
  b) adopt OCF for your scripting (moderately difficult - esp.
                                   politically)
  c) define Yet Another RC standard to make things better
  d) adopt LSB conventions for your scripts (very unlikely I would guess)


-- 
     Alan Robertson <alanr@unix.sh>

"Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions." - William 
Wilberforce