pkgsrc-Bugs archive


PR/43388 CVS commit: pkgsrc/biology/phylip



The following reply was made to PR pkg/43388; it has been noted by GNATS.

From: OBATA Akio <obache%netbsd.org@localhost>
To: gnats-bugs%gnats.NetBSD.org@localhost
Cc: 
Subject: PR/43388 CVS commit: pkgsrc/biology/phylip
Date: Sat, 10 Jul 2010 11:26:33 +0000

 Module Name:   pkgsrc
 Committed By:  obache
 Date:          Sat Jul 10 11:26:32 UTC 2010
 
 Modified Files:
        pkgsrc/biology/phylip: Makefile PLIST distinfo
        pkgsrc/biology/phylip/patches: patch-aa
 
 Log Message:
 Update phylip to 3.69.
 Based on PR#43388 by Wen Heping.
 
 version 3.69 (September, 2009)
 
         * If there are more than about 50 species in the tree, Treedist can
          fail to compute distances among the trees. This is due to an overflow
          problem inadvertently introduced in version 3.68. There is no
          workaround with the 3.68 executable, but if you can recompile you can
          fix it by replacing line 1179 of treedist.c, which is currently
 
             maxgrp = pow(2,tip_count);
 
           by
 
             maxgrp = 100000;
 
           This is fixed in version 3.69. Versions prior to 3.68 will not have
          this problem.
         * In Dnacomp, Pars, and Dollop, if the Shimodaira-Hasegawa test is
          performed and there are trees perfectly tied with the best tree, the
          P values were incorrect (being 0 instead of 1).
         * A team from Iowa State University noticed that time was being wasted
          in calculations in Dnapenny in the bound calculations. This has now
          been remedied and it should be noticeably faster.
         * In the molecular likelihood programs, ancestral state probabilities
          were being incorrectly calculated for user trees that had internal
          multifurcations. This has been corrected.
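 
 The Treedist item above blames an overflow in a group count computed as 2
 raised to the number of tips. A minimal sketch of a guarded version follows,
 assuming the count is held in a C long. The name capped_pow2 and its cap
 argument are hypothetical and are not taken from treedist.c. It illustrates
 why a fixed ceiling such as 100000 sidesteps the overflow, not how version
 3.69 actually fixes it.

```c
/* Hypothetical helper, not from treedist.c: computes 2^tip_count as a long,
 * but returns `cap` as soon as the next doubling would exceed it, so the
 * multiplication can never overflow. */
long capped_pow2(int tip_count, long cap)
{
    long groups = 1;
    for (int i = 0; i < tip_count; i++) {
        if (groups > cap / 2)   /* doubling would pass the ceiling */
            return cap;
        groups *= 2;
    }
    return groups;
}
```

 For example, capped_pow2(10, 100000) yields 1024, while
 capped_pow2(50, 100000) stops at the ceiling of 100000.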
 
 version 3.68 (August, 2008)
 
         * We received some reports that Dnaml was freezing on some data sets in
          the Windows executables. This seems to have been because of incorrect
          handling of small increases in the log-likelihood, causing the
          algorithm to fall into loops. It was temporarily cured in version 3.67
          by changing the compiler optimization level, downwards from -O3 to
          -O1. Now the underlying problem of small differences of log-likelihood
          has been addressed too, so you should use the new Windows executables
          (3.68) to avoid having these problems on Windows systems.
         * We found that the .DMG (disk image) archive for Mac OS X contained
          executables for the Intel Mac but not universal binaries that would
          work on both Intel Mac and PowerPC systems. Oops. We recompiled and
          reposted the archives (on 23 August 2007). They should work on both
          kinds of systems now.
         * We were told that on a Linux computer with a 64-bit Intel Itanium
          chip the bootstrapping program Seqboot creates blatantly wrong
          bootstrap samples with characters sampled too many times (or none).
          On a 64-bit AMD processor the program works fine. The problem is in
          the random number function "randum" in phylip.c. It seems to be a
          problem with optimization on the GCC compiler. It is cured by
          dropping the compiler optimization level from -O3 to -O2.
         * In Protdist the program would blow up if it computes a distance
          greater than 100.0. This is owing to a subscript error in the code
          that writes out the distances, in line 1874 where
 
                       else if (d[j][k] < 1000.0)
 
           should have been
 
                       else if (d[i][j-1] < 1000.0)
 
           If you have this problem and cannot upgrade to version 3.68 or
          recompile the program with this change, and your data comes from
          bootstrapping, try omitting just that replicate, or else rerunning
          the bootstrapping with a different random number seed (which might not
          happen to drop as many of the sites that caused these two sequences to
          be so distant).
         * When Dnadist is used and the lower-triangular output format is
          chosen, the resulting file has headers at the top of columns and is
          human-readable but is not machine readable. The (temporary) solution
          is not to use this option for the time being.
         * In Mac OS X, Drawgram produces some alarming lines of text at the top
          of its terminal window when it first runs. These are just scripting
          commands that were not erased because we do not clear the screen at
          the right moment. The workaround is simply to ignore these commands.
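 
 The Dnaml item above attributes the freezes to the handling of small
 increases in the log-likelihood. The sketch below shows one generic way to
 keep such a search from looping: treat any gain no larger than a tolerance as
 convergence. The function stop_index and the parameter tol are hypothetical
 and are not claimed to match the Dnaml code.

```c
#include <stddef.h>

/* Hypothetical helper: given successive log-likelihood values, return the
 * index of the first step whose gain over the previous value is not larger
 * than `tol`. If every step gains more than `tol`, return the last index
 * (or 0 for an empty series). */
size_t stop_index(const double *lnl, size_t n, double tol)
{
    for (size_t i = 1; i < n; i++) {
        if (lnl[i] - lnl[i - 1] <= tol)
            return i;           /* gain too small: stop iterating here */
    }
    return n > 0 ? n - 1 : 0;
}
```

 With a positive tolerance, a run of tiny or zero gains ends the search
 instead of being retried indefinitely.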
 
 version 3.67 (July, 2007)
 
         * We had our first reports on the behavior of PHYLIP Windows
          executables on Windows Vista. The programs work fine. The only thing
          that did not work is the self-extraction program that unpacks the
          archives. For some reason it did not work on Vista. The work-around
          was that, after you got an archive file like phylipwx.exe onto your
          system, you had to change the file extension from "exe" to "zip".
          Then you had to click on the file. You were presented with options
          including "Extract all files". If you chose that, the archive was
          unpacked. The programs would then work. Although we provided "zip"
          archive versions of the package, we have now got a new version of
          WinZip which is supposed to have a self-extractor that works on
          Windows Vista, and it was used to produce the self-extracting
          archive since 27 August 2007.
         * On Mac OS X systems, if our distributed executables are placed in a
          folder whose path contains a name with an internal blank, such as
          /Users/ianr/the files/ then the script that causes each of our
          programs to run when you click on the corresponding icon does not
          work, and there is an error message. This is a scripting error in our
          Mac OS X setup, and it was corrected in version 3.67. In the meantime,
          if you have this problem, the solution is to put PHYLIP in a folder
          whose path does not have any folder that has a blank in its name. In
          the above example, all that would be necessary is to rename the folder
          the files to the_files
         * We are still getting reports of stickiness of the tree, and
          occasionally of negative branch lengths, in Dnamlk and Promlk which
          do not do as good a job of searching for best trees as they should.
          This has turned out to be an issue of nodes getting stuck when they
          collide in moving them on the "time" scale. Some major changes were in
          the code in the 3.67 release to eliminate this stickiness and give a
          good search.
         * An error was made in putting together the matrices for the PAM
          mutation model in Protdist, Proml, and Promlk. These programs will
          give PAM calculations inconsistent with earlier (v3.65 and before)
          versions, and with other programs. The matrices were corrected in
          version 3.67. This does not affect JTT or PMB models.
         * The W (within-species variation) option of CONTRAST uses somewhat
          incorrect equations to infer within-species covariances and
          phylogenetic covariances. These were corrected in version 3.67.
          Anyone severely impacted by the problem in the meantime should contact
          me.
         * Protdist sometimes results in distances greater than or equal to
          100.000. When this happens, the distance can run together with the
          previous number in the output file. For example, a distance of 0.31766
          followed by one which is 127.43986 might look like this:
          "0.31766127.43986". This causes trouble in any program that tries to
          use this distance matrix. One symptom of this may be the program
          reporting that two distances which are expected to be equal are
          unequal -- but then printing them both out, and they appear to be
          equal! In this case it would print out a message warning you that
          0.31766 was not equal to 0.31766. It is doing so because one of them
          is actually seen by it as 0.31766127 and the other 0.31766. In all
          future versions, there will be a blank printed between the two
          numbers. For the present, use an editor to find them and insert the
          blank by hand. If this is difficult, a Sed script (which can be used
          on Linux or Unix machines) has been written by Doug Scofield, and is
          available from him at: this link. Many thanks to him for this. As you
          can see, this problem is the result of us not thinking of what happens
          when the distances are big, and the fix in the code is trivial -- just
          ensuring that there is at least one blank between successive
          distances.
         * Contml, with gene frequencies, has a bug in the transformation to
          variables that have approximate Brownian motion as their evolutionary
          process. This can lead to weird trees. It might be preferable to go
          back to the 3.5c version if you need to use Contml for this. We
          believe that this will be correctly fixed in the 3.67 version. If
          people can recompile the source code, they can replace the function
          transformgfs with this one and recompile (you should be able to save
          it from your browser using the Save As choice in its File menu).
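 
 The Protdist item above describes two distances running together in the
 output. A minimal sketch of that effect follows, assuming a fixed field width
 of nine characters; the actual format string used in protdist.c is not given
 here, and format_pair and read_first are hypothetical names. A value of 100
 or more fills the whole field, so no blank is left unless one is printed
 explicitly.

```c
#include <stdio.h>

/* Hypothetical helper: writes two distances into `out`. With blank == 0 the
 * fields are simply concatenated; with blank != 0 an explicit space is
 * printed between them. */
int format_pair(char *out, size_t size, double a, double b, int blank)
{
    if (blank)
        return snprintf(out, size, "%9.5f %9.5f", a, b);
    return snprintf(out, size, "%9.5f%9.5f", a, b);
}

/* Hypothetical helper: reads back the first number of a row the way a
 * downstream program might. Returns 0.0 if nothing could be parsed. */
double read_first(const char *row)
{
    double value = 0.0;
    if (sscanf(row, "%lf", &value) != 1)
        return 0.0;
    return value;
}
```

 Formatting 0.31766 and 127.43986 without the blank gives a row in which the
 first number reads back as 0.31766127; with the blank it reads back as
 0.31766.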
 
 version 3.66 (August, 2006)
 
         * Program Treedist was found to compute the Branch Score Distance
          incorrectly. It will, in most cases, get the branch lengths in
          terminal branches incorrect and then be likely to find a nonzero
          distance between trees when they are really identical, and incorrect
          distances when they are not identical. Alas, there is no workaround to
          avoid this. All distances done with this option before version 3.66
          should be regarded as incorrect unless all terminal branches have the
          same length, or unless the order of species in the tree is the same as
          in the first tree in the file. The Symmetric Difference option, which
          does not use branch lengths, works properly.
         * Program Dnamlk, when run on Linux or Windows systems, sometimes gave
          negative branch lengths for some branches on the tree. This is bad.
          Although we at first thought that this was a compiler bug, it seems to
          be a lack of initialization of some pointers. Program Promlk may have
          the same problem, as they share code. If you have this problem you can
          work around it by not using the Global menu option when running Dnamlk
          (or Promlk). If you need more extensive tree search the J (Jumble)
          option may be your best bet.
         * On Windows (at least, on Windows XP), our executables for version
          3.65 produce output files (outfile) and output tree files (outtree)
          that have end-of-line characters that result in their being hard to
          read on the Notepad editor. They appear as one big line. If you use
          the Wordpad editor, or Microsoft Word itself, the files will be
          readable. This is an end-of-line compiler setting we got wrong when
          compiling the programs.
         * Programs Dnaml and Proml sometimes failed to iterate branch lengths
          in trees enough -- this can result in them failing to find as good a
          tree as the molecular clock versions Dnamlk and Promlk, a phenomenon
          that is not supposed to occur. The problem results from the
          iteration code in function makenewv giving up too easily when branch
          lengths are very short. The resulting branches get "stuck" at length
          0 when they should not. If you can recompile the programs, the
          problem can be solved by the following changes:
               o In file phylip.h change the value of the constant iterations to
                8 instead of 4.
               o In files dnaml.c and proml.c, change function makenewv to
                replace
 
                    done = fabs(y-yold) < epsilon;
 
                 by
 
                    done = fabs(y-yold) < 0.1*epsilon;
 
               o In dnaml.c, in function makenewv, also replace
 
                      if (yold < epsilon)
                         yold = epsilon;
 
                 by
 
                      if (y < epsilon)
                         y = epsilon;
 
           We think these fix the problem. Some more thorough fixes are
          implemented in the 3.66 code.
         * The Mac OS X archives (in .dmg form) appeared at first sight not to
          have any executables directory in the package. This is owing to
          strange placement of icons once we package the files. The OS X
          executables are there -- their folder is just way down the window. Use
          the scroll bar to look for them. You should be able to use the
          View/Rearrange menus to make the folder icons appear in a more
          reasonable place. (Or this can be done once all of the contents of the
          .dmg archive are copied out to another folder).
         * Programs Dnaml and Proml (but not Dnamlk or Promlk), from version
          3.64 on, crashed if the Categories (C) option is used, even if all
          categories are given the same rate of change. This unpleasant
          behavior does not occur if the menu option for "Speedier but rougher
          analysis" is changed to "No, not rough". That slows down the run but
          allows it to succeed.
 
           The fix turns out to be that all instances in dnaml.c of calls to
          function copynode (or all instances in proml.c of calls to
          prot_copynode) that involve an argument lrsaves should have the third
          argument be rcategs instead of categs.
         * In Seqboot, when menu item J is set to Permute species within
          characters it is impossible to change menu item W (character weights).
          This is a glitch in the menuing code. If you can change the source
          code and recompile, change at line 215 of seqboot.c:
 
                   ((permute || ild || lockhart)
                     && (strchr("ACDEFSJPRXNI%1.20",ch) != NULL)) ||
           to be:
                   (permute && (strchr("ACDEFSJPRWXNI%1.20",ch) != NULL)) ||
                   ((ild || lockhart)
                     && (strchr("ACDEFSJPRXNI%1.20",ch) != NULL)) ||
 
           If you are stuck with our executables and need this feature, you can
          also work around it in the following devious way:
              1. Set menu item J to some other setting where menu item W appears
                in the menu, such as Bootstrap,
              2. Change menu item W
              3. Then change item J to Permute species within characters
         * Our Makefile for Unix had some problem finding some of the
          X-windows libraries on Mac OS X systems on Intel Macs. This
          prevented the compilation of Drawtree and Drawgram. You might have
          had to use those two programs by using their PowerMac Mac OS X
          executables. All the other programs did compile and run correctly on
          Intel Macs.
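 
 The Seqboot menu item above comes down to which characters appear in the
 string passed to strchr. A minimal sketch of that test follows;
 menu_key_allowed is a hypothetical wrapper, not a function from seqboot.c,
 and only the two key strings are taken from the text above.

```c
#include <string.h>

/* Hypothetical wrapper: returns 1 if `ch` is one of the accepted menu keys
 * listed in `keys`, 0 otherwise. The explicit '\0' check is needed because
 * strchr also matches the string terminator. */
int menu_key_allowed(const char *keys, char ch)
{
    return ch != '\0' && strchr(keys, ch) != NULL;
}
```

 With the original key string "ACDEFSJPRXNI%1.20" the key W is rejected; with
 "ACDEFSJPRWXNI%1.20" it is accepted, which is what the permute case needs.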
 
 version 3.65 (August, 2005)
 
         * Protpars sometimes gave the result "0 trees found" or else simply
          hung and did not complete its run. This was a bug. The program should
          always get at least one tree -- if it does not, that is a bug and not
          a judgement on your data, provided the data file is in our format!
         * Proml and Restml, and maybe some others, seg-faulted when run on
          enough multiple data sets, as in bootstrapping. If you have a version
          that has this problem and can recompile the programs, here is a fix
          for Proml and Restml. In function "inputdata", replace the lines
 
             makeweights();
             if ( firstset ) alloclrsaves();
             else resetlrsaves();
 
           by
 
             if ( !firstset ) freelrsaves();
             makeweights();
             alloclrsaves();
 
           and you can also eliminate the now-unnecessary function
          "resetlrsaves". (Thanks to Jacques Rougemont for this).
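 
 The Proml and Restml fix above replaces "allocate once, then reset" with
 "free, then allocate again" for every data set after the first. A minimal
 sketch of that pattern follows. The type save_buffer and the functions
 free_saves, alloc_saves and prepare_data_set are hypothetical stand-ins and
 are not the PHYLIP functions freelrsaves and alloclrsaves.

```c
#include <stdlib.h>

/* Hypothetical per-data-set scratch storage. */
typedef struct {
    double *values;
    size_t count;
} save_buffer;

/* Release the storage and leave the buffer in a clean, empty state. */
void free_saves(save_buffer *buf)
{
    free(buf->values);
    buf->values = NULL;
    buf->count = 0;
}

/* Allocate zeroed storage for `count` values. Returns 0 on success, -1 on
 * allocation failure. */
int alloc_saves(save_buffer *buf, size_t count)
{
    buf->values = calloc(count, sizeof *buf->values);
    if (buf->values == NULL && count > 0) {
        buf->count = 0;
        return -1;
    }
    buf->count = count;
    return 0;
}

/* Same order as the fix: free the old storage unless this is the first data
 * set, then always allocate for the current size. */
int prepare_data_set(save_buffer *buf, int firstset, size_t count)
{
    if (!firstset)
        free_saves(buf);
    return alloc_saves(buf, count);
}
```

 Because the storage is rebuilt for each data set, a later data set that is
 larger than the first still gets a buffer of the right size.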
 
 version 3.64 (July, 2005)
 
         * Treedist had trouble on Windows systems reading trees. This was due
          to problems with the ftell command on CygWin. It has been fixed by
          having the files read as binary files.
         * Trees with branch lengths compared using Treedist may have incorrect
          distances when evaluated as unrooted trees, owing to miscalculation of
          branch lengths for the bottommost branches.
         * Runs of Seqboot on Mac OS X systems with gene frequencies data have
          shown incorrect results -- wrong numbers of loci sampled, for
          example. This is due to bad code generated by the Metrowerks
          Codewarrior compiler when set to higher levels of optimization (our
          source code is OK). We will recompile the program at a lower level of
          optimization in the next bug-fixing release. If you can follow our
          compiling instructions and have this compiler, you can produce a
          correctly working executable. Alternatively you can use the gcc
          compiler and use our Unix Makefile to recompile this program (by
          typing "make seqboot"). This is quite easy to do and all Mac OS X
          releases have the gcc compiler in them -- it only needs to be
          installed.
         * In runs of Proml, Dnaml or Restml with user trees, if one puts in a
          user tree with an internal multifurcation and asks the program to re-
          estimate the branch lengths for that tree, the branch lengths in only
          two of the furcs will be re-estimated if they already have branch
          lengths. This is due to a bug in the function "initrav" causing it to
          fail to enter one or more of the subtrees. A workaround until the next
          release is as follows: Use Retree to remove all branch lengths on the
          tree. The tree's branch lengths will then all be re-estimated when it
          is used as a user tree.
         * The example output in the Treedist documentation gives distances
          computed by version 3.62 or earlier, in which the tree distance is not
          square-rooted.
 
 version 3.63 (December, 2004)
 
         * The DNA and protein likelihood programs could have problems with
          underflow if very large numbers of sequences were analyzed. Underflow
          protection code was needed to make this much less likely to happen.
         * A number of programs had the problem that when M (multiple data set)
          runs are done, if the data sets differ in the number of characters
          from data set to data set, they only allocate enough memory for the
          first data set, and then can crash on subsequent, larger, data sets.
          For bootstrap and permutation runs this should not be a problem, but
          for jackknife runs it might be. One work-around until we fixed this
          was to move the data set with the most characters to the front, so
          that enough space is allocated. The programs we think had this problem
          are: Clique, Dnacomp, Proml, Promlk, Protdist, Dollop, Gendist, Pars,
          Restml, and Restdist.
         * When the Branch Score distances are computed in program Treedist, the
          sum of squares of differences between branches was not square-rooted,
          as the documentation web page says it is.
         * Fitch and Contml may die when asked to do Jumbling, in some cases.
         * Dnaml had inconsistencies in results when branch lengths of a user
          tree were estimated, and when the same numbers were provided in the
          user tree.
         * Trees fed into Contrast could cause trouble if they contained
          unifurcations (forks with only one descendant). The program did not
          complain about this, as it should have.
         * End-of-line characters in input files in certain cases caused trouble
          in Mac OS X (for example when the files came over from Windows).
         * When printing a rooted tree out in Kitsch, the root was not placed
          intermediate between its two descendants.
         * The variable numtrees was sometimes used when still uninitialized in
          Pars.
         * Restdist had a site-aliasing bookkeeping bug that could lead to
          incorrect results.
         * Restml would not allow site lengths greater than 8, because an array
          was of fixed size when it should have been dynamically allocated.
         * The variable name howmany conflicts with predefined names in some
          older Sun compilers. It will henceforth be deliberately misspelled to
          avoid this.
         * With larger data sets being analyzed, Proml, Promlk, Dnaml, and
          Dnamlk have had to have underflow protection code installed, as
          likelihoods were getting too small.
         * Treedist was giving wrong answers when asked to compute all distances
          between trees in two files that had unequal numbers of trees. This
          was a bookkeeping error.
         * The variable scanned was uninitialized in the Drawtree and Drawgram
          programs, which could sometimes cause problems.
         * The lack of initialization of a variable, delta, in Dnadist meant that
          different results could be obtained from interactive runs than were
          obtained in runs under the control of a command file.
         * Dnadist was sometimes stopping when encountering sequences that had
          an infinite or indeterminate distance (i.e. when the sequences were
          too different or when they had no sites in common), when it should
          have printed out "-1" and continued. When it was supposed to print
          "-1" in some recent versions of PHYLIP it printed "1.0000" instead.
 
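 Two items in the 3.63 list mention underflow protection in the likelihood
 programs. The sketch below shows one common technique, rescaling a running
 product whenever it becomes very small and counting the rescalings. The
 source does not say which technique PHYLIP uses, so scaled_value,
 SCALE_LIMIT, SCALE_STEP and both functions are hypothetical.

```c
/* Hypothetical scaled number: the true value is
 * mantissa * SCALE_STEP^(-scalings). */
typedef struct {
    double mantissa;
    int scalings;
} scaled_value;

#define SCALE_LIMIT 1.0e-100   /* rescale when the mantissa drops below this */
#define SCALE_STEP  1.0e+100   /* factor applied at each rescaling */

/* Plain product of n copies of `factor`; underflows to 0.0 for small
 * factors and large n. */
double naive_product(double factor, int n)
{
    double p = 1.0;
    for (int i = 0; i < n; i++)
        p *= factor;
    return p;
}

/* Same product, but the mantissa is pulled back up whenever it falls below
 * SCALE_LIMIT and the number of rescalings is recorded. */
scaled_value scaled_product(double factor, int n)
{
    scaled_value v = { 1.0, 0 };
    for (int i = 0; i < n; i++) {
        v.mantissa *= factor;
        while (v.mantissa > 0.0 && v.mantissa < SCALE_LIMIT) {
            v.mantissa *= SCALE_STEP;
            v.scalings++;
        }
    }
    return v;
}
```

 Multiplying 1e-50 by itself 100 times underflows to zero in the naive
 version, while the scaled version keeps a positive mantissa and a nonzero
 rescaling count from which the logarithm of the true value can be recovered.
 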
 version 3.62 (September, 2004)
 
         * The ftp link used by our "Get Me PHYLIP" page to fetch the version
          3.62 Linux gzip'ed sources and documentation archive was incorrect
          until recently (I hadn't updated it to fetch version 3.62). If you had
          trouble fetching this archive in version 3.62, please try one more
          time. It will work now.
         * A number of people have found, with Fitch and with Contml, that
          version 3.61 crashes on multiple Jumbling (option J) or on bootstrap
          runs. This is fairly serious. It does not happen with versions of
          these programs earlier than 3.6 (such as 3.6a3 or 3.573c). This
          release fixes these problems.
 
 
 To generate a diff of this commit:
 cvs rdiff -u -r1.22 -r1.23 pkgsrc/biology/phylip/Makefile
 cvs rdiff -u -r1.6 -r1.7 pkgsrc/biology/phylip/PLIST
 cvs rdiff -u -r1.5 -r1.6 pkgsrc/biology/phylip/distinfo
 cvs rdiff -u -r1.2 -r1.3 pkgsrc/biology/phylip/patches/patch-aa
 
 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
 

