tech-userlevel archive

libcodecs(3), take 2



Many thanks to everyone for the feedback, both on and off list.

I've taken everything on board, and made the following changes:

        peer review changes
        + name change to libcodecs(3) and codecs(1)
        + name changes for some external functions
        + type of output array in allocated space codec changed
          to (arguably more correct) void **

        other changes
        + autoconf glue
        + c++ guard in codecs.h
        + new bz2 compression added
        + 64-bit original size in gzip compression now in network order
        + added endianness runtime indication -- I'm still in two minds
          about this, and may rip it out again
        + various bugs fixed

The new archive is in

        http://www.netbsd.org/~agc/codecs-20100920.tar.gz

and I've attached the libcodecs(3) library man page.

Once again, all feedback gratefully received.

Many thanks,
Alistair
LIBCODECS(3)            NetBSD Library Functions Manual           LIBCODECS(3)

NAME
     libcodecs -- string coding and decoding functions for transforming data

LIBRARY
     library ``libcodecs''

SYNOPSIS
     #include <codecs.h>

     int
     codecs_transform(codecs_t *codecs, const char *in, const size_t insize,
         const char *operation, void *out, size_t outsize);

     int
     codecs_alloc_transform(codecs_t *codecs, const char *in,
         const size_t insize, const char *operation, void **outp,
         size_t *outsize);

     int
     codecs_inplace_transform(codecs_t *codecs, void *input, int size,
         const char *operation);

     int
     codecs_size(codecs_t *codecs, const char *operation,
         const unsigned insize);

     int
     codecs_input_needed(codecs_t *codecs, const char *operation);

     int
     codecs_begin(codecs_t *codecs, const char *subset, ...);

     int
     codecs_lockdown(codecs_t *codecs);

     int
     codecs_add(codecs_t *codecs, const char *operation,
         int (*)(const char *, const size_t, const char *, void *, size_t),
         const char *multiplier, const int input_needed);
DESCRIPTION
     libcodecs is a library interface which implements various transformations
     from input data to output data.  Text is transformed by the libcodecs
     library, converting the input to the output format.  New transformations
     can be added to the table.  The table can also be locked to prevent
     further transformations being added.  A lot of these transformations are
     available at the system level already.  However, libcodecs provides a
     single, consistent interface to the transformations, in a way that is
     easy to provide as an interface for scripting languages and from the
     shell.
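
     The idea of a transformation table which accepts new entries until it
     is locked can be sketched as follows.  This is only an illustration:
     the table_t layout, the simplified xform_fn callback type and the
     table_add(), table_lockdown(), table_find() and zero_fill() names are
     all hypothetical, and are not taken from the libcodecs sources.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified callback type (not the libcodecs signature). */
typedef int (*xform_fn)(const char *in, size_t insize, void *out,
    size_t outsize);

#define TABLE_MAX 16

/* Hypothetical table layout; the real codecs_t is not shown in the man page. */
typedef struct {
	const char *name[TABLE_MAX];
	xform_fn    fn[TABLE_MAX];
	size_t      count;
	int         locked;	/* non-zero once table_lockdown() was called */
} table_t;

/* Add a named entry; returns 1 on success, 0 if locked, full or duplicate. */
int
table_add(table_t *t, const char *name, xform_fn fn)
{
	size_t i;

	if (t->locked || t->count == TABLE_MAX) {
		return 0;
	}
	for (i = 0; i < t->count; i++) {
		if (strcmp(t->name[i], name) == 0) {
			return 0;
		}
	}
	t->name[t->count] = name;
	t->fn[t->count] = fn;
	t->count++;
	return 1;
}

/* Refuse all further additions to the table. */
int
table_lockdown(table_t *t)
{
	t->locked = 1;
	return 1;
}

/* Look up an entry by name; returns NULL if the name is unknown. */
xform_fn
table_find(const table_t *t, const char *name)
{
	size_t i;

	for (i = 0; i < t->count; i++) {
		if (strcmp(t->name[i], name) == 0) {
			return t->fn[i];
		}
	}
	return NULL;
}

/* Example entry: fill the whole output area with NUL bytes. */
int
zero_fill(const char *in, size_t insize, void *out, size_t outsize)
{
	(void)in;
	(void)insize;
	memset(out, 0, outsize);
	return (int)outsize;
}
```

     A caller would zero a table_t, add entries with table_add(), and then
     call table_lockdown(); any later table_add() returns 0 while lookups
     continue to work.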

     The basic way of using the libcodecs library is to call the
     codecs_transform() function to transform the text.  Two alternative
     functions are provided.  codecs_alloc_transform() dynamically allocates
     the space for the output array using calloc(3).  In-place
     transformations can be made using the codecs_inplace_transform()
     function.  An ``in-place'' transformation means that the transformation
     is done using temporary storage which is allocated, and then the
     transformed text is copied over the original input, thereby making the
     operation appear to have transformed the text in situ.
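
     The ``in-place'' approach described above can be sketched as a small
     wrapper around an ordinary out-of-place transformation.  This is a
     minimal sketch, not the library's code: the xform_fn type and the
     upper_xform() and inplace_apply() names are hypothetical, and the
     sketch assumes that the output is never larger than the input.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified callback type (not the libcodecs signature). */
typedef int (*xform_fn)(const char *in, size_t insize, void *out,
    size_t outsize);

/*
 * Example out-of-place transformation: copy the input, converting
 * lowercase letters to uppercase.  Returns the number of bytes written,
 * or -1 if the output area is too small.
 */
int
upper_xform(const char *in, size_t insize, void *out, size_t outsize)
{
	char *o = out;
	size_t i;

	if (outsize < insize) {
		return -1;
	}
	for (i = 0; i < insize; i++) {
		o[i] = (char)toupper((unsigned char)in[i]);
	}
	return (int)insize;
}

/*
 * Run fn into temporary storage, then copy the result over the original
 * buffer.  Returns the number of bytes now valid in buf, or -1 on error.
 */
int
inplace_apply(xform_fn fn, void *buf, size_t size)
{
	void *tmp;
	int n;

	if (size == 0) {
		return 0;
	}
	if ((tmp = calloc(1, size)) == NULL) {
		return -1;
	}
	n = fn(buf, size, tmp, size);
	if (n < 0 || (size_t)n > size) {
		n = -1;		/* leave the caller's buffer untouched */
	} else {
		memcpy(buf, tmp, (size_t)n);
	}
	free(tmp);
	return n;
}
```

     Because the result is only copied back after the transformation has
     succeeded, a failed transformation leaves the caller's data intact.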

     The transformation table holding information on all the possible
     transformations can be initialised using the codecs_begin() function.
     The function can be used to limit the transformations which get loaded
     into the transformation table.  At the present time, the following
     subsets of transformations are defined:

     all      will load all the following subsets of transformations

     charset  will load all the transformations relating to character sets,
              including base64 and base85, EBCDIC, RAD50, etc.

     digest   will load all the transformations relating to message digests,
              including md5, sha1, etc

     fill     will load all the transformations relating to region fill,
              including zero and randomise

     format   will load all the transformations relating to formatting of out-
              put, such as hexadecimal dumping, rotation, etc

     edit     will load all the transformations relating to editing of output,
              such as sed and edit functionality

     hash     will load all the transformations relating to 32bit hashing.

     network  will load all the transformations relating to network name reso-
              lution

     It is not necessary to call this function prior to using any of the
     functionality in the libcodecs library -- if the table has not been
     initialised by the time of the first call, then codecs_begin() will be
     called automatically.

     The internal transformation information carries information on the
     worst-case size of the output array.  This size can be calculated using
     the codecs_size() function, passing into the function the size of the
     input buffer.  The codecs_input_needed() function will return an
     indication of whether an input buffer is needed.  Please note that an
     input buffer is needed for the codecs_inplace_transform() ``in-place''
     transformation call.  The codecs_valid_op() function is used to verify
     that the current operation is a known transformation.
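
     As an illustration of a worst-case output size, the sketch below
     computes an upper bound for encodings which turn each group of input
     bytes into a fixed-size group of output bytes.  The function names are
     hypothetical, the group sizes are taken from the descriptions of the
     individual transformations, and no space for a terminating NUL is
     included; how libcodecs itself stores and applies its size multiplier
     is not described here.

```c
#include <stddef.h>

/*
 * Upper bound on the output size when every group of in_group input
 * bytes, or part of a group, becomes out_group output bytes.
 */
size_t
worst_case_size(size_t insize, size_t in_group, size_t out_group)
{
	size_t groups = (insize + in_group - 1) / in_group;	/* round up */

	return groups * out_group;
}

/* base64: each 3 input bytes become 4 output bytes. */
size_t
base64_encoded_size(size_t insize)
{
	return worst_case_size(insize, 3, 4);
}

/* base85: each 4 input bytes become 5 output bytes. */
size_t
base85_encoded_size(size_t insize)
{
	return worst_case_size(insize, 4, 5);
}

/* 4-character hexadecimal constants: each input byte becomes 4 bytes. */
size_t
bin2hex_size(size_t insize)
{
	return worst_case_size(insize, 1, 4);
}
```

     A caller would use such a bound to size the output buffer before
     making the transformation call.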

     There are a number of pre-defined transformations provided:

     asa          [format] perform Fortran control character transformations
                  in the form of the POSIX asa(1) command.

     ascii2ebcdic
                  [charset] convert the input from ASCII character encodings
                  to EBCDIC character encodings.

     base64decode
                  [charset] perform atob, or base64, decoding.  Each sequence
                  of 4 bytes is transformed back into a 3 byte sequence.

     base64encode
                  [charset] perform btoa, or base64, encoding.  Each sequence
                  of 3 bytes is transformed into a 4 byte sequence from the
                  pre-defined 64-byte set.

     base85decode
                  [charset] perform base85 decoding.  Each sequence of 5 bytes
                  is transformed back into a 4 byte sequence.

     base85encode
                  [charset] perform base85 encoding.  Each sequence of 4 bytes
                  is transformed into a 5 byte sequence from the pre-defined
                  85-byte set.

     bin2hex      [charset] encodes the input string as 4-character C-string
                  style hexadecimal constants.

     bswap16      [format] perform a bytewise swap of the 16-bit entity

     bswap32      [format] perform a bytewise swap of the 32-bit entity

     bswap64      [format] perform a bytewise swap of the 64-bit entity

     dos2unix     [format] DOS style line-endings are transformed into Unix
                  style line-endings.

     ebcdic2ascii
                  [charset] convert the input from EBCDIC character encodings
                  to ASCII character encodings.

     edit         [edit] edit the input text with the ``EDITOR'' or ``VISUAL''
                  editor, as defined in the environment.

     from-uri     [charset] convert from a percent-encoded URI to ASCII text.

     full-uuencode
                  [charset] convert the given text into uuencoded text (see
                  also the uuencode and uudecode transforms), adding a file
                  header and trailer.

     gethostinfo  [network] attempt to reverse resolve the hostname, given
                  the IP address (either IPv4 or IPv6) as input.

     getipaddress
                  [network] attempt to resolve the IP address (both IPv4 and
                  IPv6), given the hostname as input.

     gunzip       [compress] decompress the input buffer using zlib(3)

     gzip         [compress] compress the input buffer using zlib(3)

     hex2bin      [charset] decodes the input string from 4-character C-string
                  style hexadecimal constants to binary output.

     hexdump      [format] converts the input text to an ASCII-clean hexadeci-
                  mal dump format, including a printable representation of the
                  input text.

     md5          [digest] calculate the MD5 digest using MD5Data(3)

     metaphone    [charset] calculate the metaphone phonetic value for the
                  input.

     rad50decode  [charset] converts the input text from DEC RADIX-50 format
                  to the original text. Due to the limited range of the
                  RADIX-50 character set, some of the original text may have
                  been lost.

     rad50encode  [charset] converts the input text to DEC RADIX-50 format
                  from the original text. Due to the limited range of the
                  RADIX-50 character set, some of the original text may be
                  lost.

     randomise    [fill] fill the output with random values.

     rmd160       [digest] calculate the RMD160 digest using RMD160Data(3)

     rot          [format] transform the input text with a circular rotation.
                  The most famous of these is the Caesar rot13(6) transforma-
                  tion, but this transformation allows any length of rotation
                  to be used.

     secs2str     [format] transforms the input value (as the ASCII-encoded
                  decimal value of seconds since the start of the epoch) to a
                  colon-separated value representing the date.

     sed          [edit] performs a sed(1) transformation on a regular expres-
                  sion. Please note that full, extended regular expressions,
                  as defined in re_format(7) are used to match.

     size         [digest] returns the size of the input as a decimal string

     sha1         [digest] calculate the SHA1 digest using SHA1Data(3)

     sha256       [digest] calculate the SHA256 digest using SHA256_Data(3)

     sha512       [digest] calculate the SHA512 digest using SHA512_Data(3)

     soundex      [charset] calculate the soundex phonetic value for the
                  input.

     str2secs     [format] transforms the input value (as the colon-separated
                  value representing the date) to an ASCII-encoded decimal
                  value representing seconds since the start of the epoch.

     strunvis     [charset] uses the strunvis(3) transformation on the input
                  data.

     strvis       [charset] uses the strvis(3) transformation on the input
                  data.

     strvisc      [charset] uses the strvisc(3) transformation on the input
                  data.

     substring    [edit] extract a substring of the input string, and place it
                  in the output string.

     to-uri       [charset] convert from ASCII text to a percent-encoded URI.

     to-lower     [charset] change any uppercase letters in the input string
                  to lowercase.

     to-unicode   [charset] translate to unicode-16 from UTF-8

     to-upper     [charset] change any lowercase letters in the input string
                  to uppercase.

     to-utf8      [charset] translate from unicode-16 to UTF-8

     unix2dos     [charset] the Unix-style line-endings are converted to DOS
                  style line-endings.

     uudecode     [charset] transform the input text from uudecode(1) text to
                  the original text.

     uuencode     [charset] encode the input text as uuencode(1) text.

     zero         [fill] produce an area containing NUL bytes in the output.
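
     To make one of the entries above concrete, the following is a minimal
     sketch of the 3-byte to 4-byte mapping described for base64encode.
     It uses the standard base64 alphabet with `=' padding; the
     b64_encode() name and its signature are hypothetical, and whether
     libcodecs pads or line-wraps its output in the same way is an
     assumption.

```c
#include <stddef.h>

/* The standard base64 alphabet: 64 output characters. */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Encode in[0..insize) as base64 into out, adding a terminating NUL.
 * Returns the number of characters written (excluding the NUL), or -1
 * if outsize is too small.
 */
int
b64_encode(const unsigned char *in, size_t insize, char *out, size_t outsize)
{
	size_t need = ((insize + 2) / 3) * 4 + 1;
	size_t i, o = 0;

	if (outsize < need) {
		return -1;
	}
	/* full groups: 3 input bytes become 4 output characters */
	for (i = 0; i + 2 < insize; i += 3) {
		out[o++] = b64_alphabet[in[i] >> 2];
		out[o++] = b64_alphabet[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
		out[o++] = b64_alphabet[((in[i + 1] & 0x0f) << 2) | (in[i + 2] >> 6)];
		out[o++] = b64_alphabet[in[i + 2] & 0x3f];
	}
	/* a trailing partial group is padded with '=' */
	if (insize - i == 1) {
		out[o++] = b64_alphabet[in[i] >> 2];
		out[o++] = b64_alphabet[(in[i] & 0x03) << 4];
		out[o++] = '=';
		out[o++] = '=';
	} else if (insize - i == 2) {
		out[o++] = b64_alphabet[in[i] >> 2];
		out[o++] = b64_alphabet[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
		out[o++] = b64_alphabet[(in[i + 1] & 0x0f) << 2];
		out[o++] = '=';
	}
	out[o] = '\0';
	return (int)o;
}
```

     For example, the 3-byte input "Man" encodes to the 4 characters
     "TWFu", while the 1-byte input "M" encodes to "TQ==".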

     A number of hash functions have also been implemented, namely:

     dumbhash       [hash] implements a simple hashing scheme based on the
                    addition of the value of each character in the string.

     dumbmulhash    [hash] implements a simple hashing scheme based on the
                    addition of the value of each character in the string mul-
                    tiplied by its position in the string.

     lennart        [hash] implements a simple and fast generic string hasher
                    based on Peter K. Pearson's article in CACM 33-6, p. 677.

     crchash        [hash] implements a hash used in CRC calculations

     perlhash       [hash] implements the addition-based hash algorithm used
                    internally in the perl interpreter.

     perlxorhash    [hash] implements the XOR-based hash algorithm used inter-
                    nally in the perl interpreter.

     pythonhash     [hash] implements the hash algorithm used internally in
                    the python interpreter.

     mousehash      [hash] implements an XOR-based hash algorithm from der
                    Mouse.

     bernstein      [hash] implements a multiplicative-based hash algorithm
                    from Daniel Bernstein.

     honeyman       [hash] implements an XOR-based hash algorithm from Peter
                    Honeyman.

     pjwhash        [hash] implements the so called `hashpjw' function by P.J.
                    Weinberger from Aho/Sethi/Ullman, COMPILERS: Principles,
                    Techniques and Tools, 1986, 1987 Bell Telephone Laborato-
                    ries, Inc.

     bobhash        [hash] implements another, more complex hash algorithm.

     torekhash      [hash] implements a hash algorithm due to Chris Torek, and
                    using Duff's device.

     byacchash      [hash] implements the hash function found in the Berkeley
                    byacc(1) program.

     tclhash        [hash] implements the hash algorithm used internally in
                    the tcl interpreter.

     gawkhash       [hash] implements the hash algorithm used internally in
                    the gawk interpreter, also using Duff's device.

     gcc3_hash      [hash] implements one of the hash algorithms found in gcc3

     gcc3_hash2     [hash] implements another of the hash algorithms found in
                    gcc3

     nemhash        [hash] implements another hash function
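
     Three of the simpler schemes listed above can be sketched as follows.
     These are illustrations only, under stated assumptions: the function
     names are hypothetical, the position-weighted hash assumes positions
     start at 1, and the multiplicative hash uses the widely published
     Bernstein form (start at 5381, multiply by 33, add the character),
     which may differ in detail from the libcodecs implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Additive hash: the sum of the character values. */
uint32_t
additive_hash(const char *s, size_t len)
{
	uint32_t h = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		h += (unsigned char)s[i];
	}
	return h;
}

/*
 * Position-weighted hash: each character value is multiplied by its
 * position in the string (assumed here to be 1-based) and summed.
 */
uint32_t
weighted_hash(const char *s, size_t len)
{
	uint32_t h = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		h += (uint32_t)(unsigned char)s[i] * (uint32_t)(i + 1);
	}
	return h;
}

/* Multiplicative hash in the Bernstein style: h = h * 33 + c. */
uint32_t
mul33_hash(const char *s, size_t len)
{
	uint32_t h = 5381;
	size_t i;

	for (i = 0; i < len; i++) {
		h = h * 33 + (unsigned char)s[i];
	}
	return h;
}
```

     The additive hash gives the same value for any reordering of the
     same characters; the position-weighted and multiplicative forms do
     not share that weakness.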

SEE ALSO
     asa(1), sed(1), uudecode(1), uuencode(1), calloc(3), MD5Data(3),
     RMD160Data(3), SHA1Data(3), SHA256_Data(3), SHA512_Data(3), strunvis(3),
     strvis(3), strvisc(3), zlib(3), rot13(6), re_format(7)

HISTORY
     The libcodecs library first appeared in NetBSD 6.0.

AUTHORS
     Alistair Crooks <agc%NetBSD.org@localhost>

NetBSD 5.0                    September 18, 2010                    NetBSD 5.0

