2019.04.26
Moved Windows-specific code from src/cluster.c to windows/library.c. Fixed a memory leak in X11/gui.c. Expanded doctests in python/__init__.py. Fixed a bug in the treecluster routine in python/clustermodule.c that caused the weight argument to be checked even if the distance matrix was provided. Cleaned up formatting in python/test/test_Cluster.py.

2018.11.22
Fixed memory leaks in kcluster and treecluster in src/cluster.c, and in Hierarchical and KMeans in src/command.c; these memory leaks occurred when memory allocation failed. Fixed a bug in the distancematrix function in Bio/Cluster/__init__.py, where the weight argument was not properly initialized if it had the default value None, causing the function to fail.

2018.04.29
In the Python wrapper python/clustermodule.c, use PyTree_new to initialize Tree objects instead of PyTree_init, as Tree objects are immutable. Check whether the arguments passed are nodes by using PyType_IsSubtype instead of comparing against PyNodeType directly, as Node in __init__.py is a subtype of Node in clustermodule.c. Rewrote indexing code for PyTree for Python3. Fixed a bug in Tree.cut in Bio/Cluster/__init__.py, which calculated the indices but did not return them. Updated the documentation for Tree.cut and Tree.sort in Bio/Cluster/__init__.py. Added tests for Tree.cut. Updated the documentation for Pycluster. Fixed a bug in src/cluster.c where a NULL pointer was freed if memory allocation failed.

2018.04.14
The calculation of the Spearman correlation and Kendall's tau now takes the values of the weight array into account (previously, the weights were ignored for the Spearman correlation and Kendall's tau). Let cuttree in the C Clustering Library return an int to be able to indicate memory allocation problems. The distancematrix function in the C Clustering Library now requires storage for the distance matrix to be allocated before calling distancematrix.
The Python wrappers python/clustermodule.c, python/__init__.py were rewritten to allow compilation without requiring numpy; numpy is needed at runtime only.

2017.09.23
Fixed the kmedoids function, which may count the number of solutions incorrectly (and report an error message) if the initial clustering is given by the user. Using the C11-standard definition of "int main", where argv is not a const char* array. Save the Interface Builder files for the Mac version in XIB format. Update Controller.m in the Mac version to avoid warnings converting Objective-C strings to plain C strings.

2017.08.19
Include Michael Eisen's original demo.txt example data in the distribution. For k-means clustering, show the number of solutions found separately for genes and arrays. Let cuttree number the clusters incrementally in the left-to-right order of the leaves in the hierarchical clustering tree. Algorithm::Cluster::cuttree now returns an array. Removed mean, median from the Python tests. Updates for Unicode handling in Python3. Fix slice bug in Pycluster's Tree class. Write a new function sorttree that reorders a hierarchical clustering tree such that the nodes are ordered according to a user-specified order, while maintaining the structure of the hierarchical clustering tree. The sorttree function is used in the HierarchicalCluster function, and can also be called from Pycluster and from Algorithm::Cluster.

2014.10.11
Show the number of solutions found in kcluster in the GUI programs. Fixed a bug in X11/gui.c in which the result of fclose was checked instead of the result of Save, causing spurious error messages. Fixed a bug in python/clustermodule.c that caused integer overflows for large data sizes. Fixed Unicode issues for single characters. Removed the mean, median tests for Pycluster. Fixed a bug in command.c that caused meaningless files to be written if no clustering was requested.

2013.08.03
Added checks for all calls to malloc in src/data.c, and corresponding error messages in the GUI and command line programs. Using MinGW instead of Cygwin to create the installer for Windows. Using productbuild instead of packagemaker to create the installer for Mac. Removed the mean and median functions from Pycluster, as these are available from Numerical Python anyway. This also avoids problems with deprecated API usage. Added checks for calls to malloc in Algorithm::Cluster.

2010.12.27
Modified Pycluster/Bio.Cluster to be able to compile and run with Python 3. Changed the number of passes in perl/t/14_kmedoids to avoid spurious test failures. Some minor updates in the documentation.

2010.04.05
Remove the #include's of stdio.h where they are not needed (perl/Cluster.xs, src/cluster.c). Remove the tests on the eigenvectors calculated for PCA when the corresponding eigenvalue is zero, since those eigenvectors are strongly affected by roundoff error, and are not relevant for PCA anyway.

2010.03.29
Rewrote the code for Principal Component Analysis and made it accessible from Python, Perl, and the GUI and command line programs. The previous code for the singular value decomposition is no longer available. Fixed a bug in src/data.c, in which sizeof(char**) was used instead of sizeof(char*). Check memory allocation in X11/gui.c and src/command.c more carefully.

2009.09.11
Fix an error in the calculation of the number of times the optimal solution is found in the kmeans and kmedians routines. In Algorithm::Cluster, fix the routine identifying which values in the data matrix are missing. Try to be more consistent in the coding standards of Cluster.xs and Cluster.pm. Make the docstrings in Python/__init__.py consistent with the Python recommendations. Replaced the tests for NumPy arrays in Python/__init__.py by tests for None.

2009.05.30
Rewrote the Algorithm::Cluster test scripts using Perl's testing framework. The test 02_matrix_parse.t was dropped.
Rewrote two lines in perl/Cluster.xs to be ANSI compliant. Added a missing reference to numpy in python/__init__.py. Improved unsigned/int correctness in python/clustermodule.c. Added a cast from int to char in src/data.c. Fixed a bug in src/cluster.c in the calculation of the number of times the best solution was found in kmedoids.

2009.04.21
Fixed a bug in Perl/Cluster.xs, which caused the size of the array returned by somcluster to be incorrect if transpose==1. Modified python/__init__.py to be consistent with the Python style guide. Updated the docstrings. Removed the CALL (STDCALL) definitions in src/cluster.c; this doesn't seem to be needed any more. Updated windows/resources.rc to be consistent with the latest version of windres.

2009.03.22
Fixing a bug in Cluster 3.0 on Windows, which caused Cluster 3.0 to crash if a user attempts to read in a file that does not have a file extension. For Cluster 3.0 with X11, don't use the obsolete Xp library. In Pycluster, call test_Cluster.py from "python setup.py test" instead of from a separate script test.py. When initializing a Record, read the data file line by line instead of all lines at the same time. Updating the unit tests. In Algorithm::Cluster, removed the typemap as it was not being used. Replace Record by Algorithm::Cluster::Record for robustness. Introduced the Algorithm::Cluster::Tree and Algorithm::Cluster::Node classes to represent hierarchical clustering results. Adding cut and scale methods to the Tree class. This will affect Perl scripts calling treecluster. Make sure that all tests and examples are included in the distribution.

2008.10.01
Removed the print_matrix_dbl function from perl/Cluster.xs, since it was not being used. Removed the deprecated PyArray_FromDims function from python/clustermodule.c. Check memory allocations rigorously in src/command.c.

2008.09.12
Fixed a memory leak in the spearman function in src/cluster.c.
Converted python/__init__.py to the new NumPy; this should have been done in the previous release.

2008.08.30
The command-line version of Cluster 3.0 now allows it to be used without actually clustering, for example for normalization only (patch by Jeff Chang). Better handling of memory allocation failures in Algorithm::Cluster. Fixed a memory leak in clustercentroids in Algorithm::Cluster. Converted Pycluster to the new Numerical Python (NumPy 1.1.1).

2008.07.29
In the command-line version, missing breaks after each case in a switch caused median-centering to be applied when mean-centering was intended. Updated the documentation for Algorithm::Cluster, which was still showing the old output for Algorithm::Cluster::treecluster. The documentation is now distributed with the Algorithm::Cluster and Pycluster distributions. The DataFile class is now removed from python/__init__.py; replaced by the Record class.

2008.07.05
Fixed a bug in the pairwise single-linkage clustering algorithm. The size of the hierarchical tree is one less than the number of items to be clustered. However, one more node is used during the calculation of the hierarchical tree. Therefore, memory for one more node should be allocated. Used Python's unit test framework for the Pycluster tests. Added a version() function to Pycluster and Algorithm::Cluster.

2008.03.08
The executable for the Cluster 3.0 GUI can now also be used as a command-line program. Having separate executables is no longer necessary. It is still possible to compile Cluster 3.0 as a command-line only program, though. The Makefile.am and configure.ac files were updated accordingly. In Pycluster, the DataFile class was renamed Record. The recommended way to read a data file is now to use the Pycluster.read(handle) function, which returns a Record object. The clustercentroids function was added to Algorithm::Cluster. This module now also contains a Record class, which stores the data in a Cluster/TreeView-type expression data file.
The appropriate functions in Algorithm::Cluster can now also be called as methods on a Record object. The treecluster function in Algorithm::Cluster now returns a single array with three columns. The last column returns the distances between items; this information was previously in a separate linkdist variable.

2007.11.21
Updated X11/gui.c to correctly deal with 64-bits architectures. This affects only the GUI, not the calculation itself. Replaced Netscape by Firefox as the default browser to read the help files on Unix/Linux. Updated the documentation. In Algorithm::Cluster, fixed the argument checking for the method argument in clusterdistance. In Pycluster, added argument checking for the arguments method and dist in clusterdistance. In src/data.c, replaced the use of strtok by a new tokenizer function to deal correctly with the case in which UNIQID is an empty string.

2007.06.19
Updated contact address to RIKEN. Rewrapped the docstrings in python/__init__.py and python/clustermodule.c to make each line fit. Removed a spurious return in python/clustermodule.c that prevented the distance matrix from being freed. In py_distancematrix in python/clustermodule.c, avoid the first row of the distance matrix from being freed, since this row is always NULL. Avoid this first row also in find_closest_pair in src/cluster.c. Rewrote the k-means algorithm in src/cluster.c. The previous version used a floating-point comparison that caused this routine to hang on some platforms.

2007.02.28
In src/cluster.c, let spearman return 1.0 if all ranks are equal. Fix casting of the result of floor.

2007.02.26
Allow a list of rows to represent the distance matrix in the Python functions kmedoids and treecluster. Make the docstrings for Pycluster more readable. Change i to l in the format string in PyArg_ParseTuple* functions for Pycluster to behave correctly on 64-bits machines. Updated standard exceptions in Pycluster. One minor change in src/cluster.c.

2006.09.25
Replaced calls to pow() by exp(log()) or sqrt(). The function pow() caused crashes on AIX (see Google for more information about the pow() bug on AIX). Check for sqrt, log, and exp presence in the math library; don't check pow. Replaced the ranlib random number generator by a random number generator written from scratch. Hence, ranlib is no longer needed, and was removed from the C Clustering Library. The examples were updated accordingly.

2006.05.12
Updated the website of Java TreeView in the documentation. Removed skipped lines in Makefile.PL. Updated Makefile.am to account for the new build process on Mac OS X (building a universal binary with XCode 2.2.1). Fixed the Makefile.am files such that configure looks for the Motif libraries and its dependencies. Also, check the math library only once for the sqrt and pow functions. The routine PerformPCA in src/data.c now returns an error message if a memory allocation error occurred (NULL otherwise). The routine randomassign was removed from src/cluster.h, and declared static in src/cluster.c, since no external code uses it. The functions getclustermean and getclustermedian were replaced by a single function getclustercentroids; the function getclustermedoid was renamed getclustermedoids. The functions kcluster and kmedoids now return if a memory allocation error occurs, and set *ifound equal to -1. Hierarchical clustering solutions are now represented as an array of Node structs, where each Node struct contains the numbers of the subnodes that were joined as well as the distance between them. The treecluster routine now returns a pointer to a newly allocated array of Node structs; cuttree accepts an array of Node structs as input. The cuttree no longer checks its input; if the array of Node structs is inconsistent, a segmentation fault may occur. However, cuttree is called only from python/clustermodule.c, which guarantees that the input to cuttree is consistent.
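The Node representation described above can be illustrated with a small Python sketch. It is not the library's cuttree code. It assumes that a non-negative subnode number refers to an item, that a negative number -(i+1) refers to node i, and that nodes are stored in the order in which they were joined; the names Node and cut_tree and the cluster numbering (by first appearance of an item) are choices made for this illustration only.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    """One joining event: two subnodes and the distance between them."""
    left: int        # assumed: >= 0 is an item, -(i + 1) refers to node i
    right: int
    distance: float


def cut_tree(tree: List[Node], nclusters: int) -> List[int]:
    """Group the items of a tree into nclusters clusters.

    Only the first (nitems - nclusters) joining events are applied, so the
    last (nclusters - 1) joins are left undone.
    """
    nitems = len(tree) + 1
    if not 1 <= nclusters <= nitems:
        raise ValueError("nclusters must be between 1 and the number of items")
    parent = list(range(nitems))          # union-find forest over items

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    representative = []                   # one item contained in each node
    for index, node in enumerate(tree):
        a = node.left if node.left >= 0 else representative[-node.left - 1]
        b = node.right if node.right >= 0 else representative[-node.right - 1]
        representative.append(a)
        if index < nitems - nclusters:
            parent[find(b)] = find(a)
    labels = {}
    # Number the clusters in order of first appearance (illustration only).
    return [labels.setdefault(find(item), len(labels)) for item in range(nitems)]


tree = [Node(0, 1, 0.1), Node(2, 3, 0.2), Node(-1, -2, 0.9)]
print(cut_tree(tree, 2))
```

Because the sketch trusts its input in the same way the entry describes, an inconsistent list of nodes would raise an IndexError here rather than cause a segmentation fault.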
In Python, hierarchical clustering solutions are implemented as a Tree class, a read-only list of Node objects. The cuttree function is now a method of the Tree class. Removed unneeded code from perl/Cluster.xs. Fixed the Perl test case in perl/t/14_medoids.t; previously, the clustering problem resulted in two nodes with an equal distance, making the solution depend on roundoff. The Python file reading routines were moved from python/data.py to python/__init__.py. Reading Cluster/TreeView-type files is now implemented as part of the DataFile class. In python/clustermodule.c, None arguments are interpreted as missing arguments. The clustercentroid function was renamed clustercentroids. Makefile.PL now checks for 64-bits architectures, and adds the -fPIC flag if needed. Simplified the quicksort calls in src/cluster.c and src/data.c. The function getrank now returns NULL if it fails due to a memory error. In the k-means/k-medians/k-medoids routines, previously the iteration stopped if no item reassignments were made, or if a periodic loop was detected in the EM algorithm. Instead, we now monitor the within-cluster sum of distances, and stop the iteration if no further improvement is obtained. The routine svd sets ierr to an error flag if a memory allocation error occurs. Rewrote the initial random cluster assignment in kcluster, requiring no extra memory to be allocated.

2006.02.26
In the Perl module Algorithm::Cluster, allow hierarchical clustering to be applied to a user-defined distance matrix. In the Python extension module Pycluster / Bio.Cluster: fix the glue code in python/clustermodule.c in order for the extension module to work correctly on 64-bits machines.
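The stopping rule introduced in the 2006.05.12 entry (stop when the within-cluster sum of distances no longer improves) can be sketched as follows. This is a simplified illustration rather than the code in src/cluster.c: it assumes the squared Euclidean distance, takes the initial assignment from the caller, and, as in the 2003.01.07 entry further down, never moves the last remaining item out of a cluster. The function names are hypothetical.

```python
def squared_distance(p, q):
    """Sum of squared differences between two points (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(data, initial, nclusters):
    """Iterate k-means until the within-cluster sum of distances stalls."""
    clusterid = list(initial)
    counts = [clusterid.count(c) for c in range(nclusters)]
    if min(counts) == 0:
        raise ValueError("every cluster needs at least one item to start")
    best = float("inf")
    saved = list(clusterid)
    while True:
        # Recompute each centroid as the mean of its current members.
        centroids = []
        for c in range(nclusters):
            members = [p for p, cid in zip(data, clusterid) if cid == c]
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        total = 0.0
        for i, point in enumerate(data):
            distances = [squared_distance(point, c) for c in centroids]
            nearest = min(range(nclusters), key=distances.__getitem__)
            current = clusterid[i]
            # Do not empty a cluster: its last item stays where it is.
            if nearest != current and counts[current] > 1:
                counts[current] -= 1
                counts[nearest] += 1
                clusterid[i] = nearest
            total += distances[clusterid[i]]
        if total >= best:
            # No further improvement: return the best assignment seen.
            return saved, best
        best, saved = total, list(clusterid)


labels, error = kmeans([[0.0], [0.1], [5.0], [5.1]], [0, 1, 0, 1], 2)
print(labels, error)
```

Comparing the running total against the best value so far avoids the explicit check for periodic loops: the total cannot increase from one pass to the next, and only finitely many assignments exist, so the loop ends.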

2005.10.15
The k-means clustering routine accepts all eight distance functions available in the C Clustering Library. However, using distance functions other than the Euclidean distance and the city-block distance is discouraged. The reason is that other distance functions (such as the Pearson distance) calculate distances between data vectors that are effectively scaled (by subtracting the mean and dividing by the standard deviation for the Pearson distance), whereas the centroid calculation is performed by averaging the data vectors without normalization. A more correct way to use these normalized distance functions is to normalize the data (using the "Adjust data" tab in the GUI program) before starting the k-means clustering calculation. To discourage the use of distance functions other than the Euclidean distance and the city-block distance, in the GUI-version the distance defaults to the Euclidean distance for k-means and SOM calculations (other distances can still be chosen, though). A similar argument can be made against the use of distance functions other than the Euclidean distance and the city-block distance in pairwise centroid-linkage hierarchical clustering.

Fixed a bug in the command-line version of the code that caused the -ng and -na flags to have an effect only if the -cg and -ca flags were also specified. Fixed the Load routine in src/data.c so that it doesn't crash if the user attempts to read an empty file. Fixed the reading of empty lines in the data file in the Load routine in src/data.c. Removed the AlwaysCreateUninstallIcon option from the Inno Setup configuration file, as it is no longer supported by Inno Setup. Fixed a bug in windows/gui.c that caused arrays to be centered if the "Center genes" checkbox is checked. Simplified the way in which the bitmap is displayed in the "File format" help window, and fixed its position (previously, it was partly covered by the text on Windows XP).
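The advice earlier in this entry, to normalize the data first and then run k-means with the Euclidean distance, rests on a simple identity that the following sketch checks numerically. It assumes the Pearson distance is defined as 1 - r and that the standard deviation is the population one (dividing by n); the function names are made up for the illustration. Under those assumptions, the mean squared difference between two standardized vectors equals twice their Pearson distance.

```python
import math


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_distance(x, y):
    """1 - r, with r the Pearson correlation of x and y (assumed definition)."""
    xs, ys = standardize(x), standardize(y)
    r = sum(a * b for a, b in zip(xs, ys)) / len(x)
    return 1.0 - r


def mean_squared_difference(x, y):
    """Squared Euclidean distance divided by the number of observations."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
print(pearson_distance(x, y))
print(mean_squared_difference(standardize(x), standardize(y)))
```

After standardizing, averaging the vectors to obtain a centroid and measuring distances are done on the same scale, which is the consistency the entry asks for.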
Fixed a bug in FilterDialogProc in windows/gui.c that caused a NULL pointer to be freed the first time the filter is applied. Gave ID_KMEANS_ARRAY_METRIC and ID_KMEANS_BUTTON different identifier numbers in windows/resources.rc. Updated windows/resources.rc to comply with the latest version of windres. Modified somworker in src/cluster.c to take the mask into account. Changed my email address, as I'm now at Columbia University.

2005.04.27
Bug fix in py_treecluster in python/clustermodule.c: the variable nnodes was not set if the distance matrix is specified instead of the data matrix. Bug fix in py_treecluster in python/clustermodule.c: the routine returns if the linkdist array cannot be allocated. Further cleanup of perl/Cluster.xs. Input data are now checked more strictly; any error will cause the calculation to abort. Previously, some errors were ignored, for example rows of unequal size in the data matrix. The algorithm for pairwise single-linkage hierarchical clustering was replaced by the SLINK algorithm: Sibson, R. (1973). SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1): 30-34. The clustering result produced by this algorithm is identical to the single-linkage hierarchical clustering result, but the SLINK algorithm is much faster and uses much less memory. Hence, it can be used for large data sets. Removed the alternative implementation (using a cache) of pairwise centroid-linkage hierarchical clustering, as the new single-linkage routine looks more useful.

2005.02.23
Removed the harmonically summed Euclidean distance from the set of available distance metrics. For the Euclidean distance and the city-block distance, the sum is now divided by the number of observations present. Previously, the sum was divided by the number of observations present and multiplied by the total number of present and missing observations.
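The change in scaling for data with missing values can be made concrete with a short sketch. It assumes that the Euclidean measure sums squared differences and that a missing observation is marked by None; both function names are hypothetical and the code is an illustration, not the routine in src/cluster.c.

```python
def euclid_present(x, y):
    """Current rule: sum of squared differences over the observations
    present in both vectors, divided by the number of such observations."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no observations are present in both vectors")
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def euclid_previous(x, y):
    """Earlier rule: the same average, multiplied by the total number of
    present and missing observations."""
    return euclid_present(x, y) * len(x)


x = [1.0, None, 3.0, 4.0]
y = [2.0, 5.0, None, 6.0]
print(euclid_present(x, y), euclid_previous(x, y))
```

With two of the four positions usable, the squared differences are 1 and 4, so the current rule gives 2.5 and the earlier rule gives 10.0.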
In k-means clustering, when calculating the centroid of a cluster, the normalized expression profiles of the elements are summed. Previously, the unnormalized profiles were summed. Normalization depends on the distance metric. For the Euclidean distance and city-block distance, no normalization is used. For the Pearson correlation and the absolute Pearson correlation, the mean is subtracted from each expression profile, and the expression profile is divided by its standard deviation. For the uncentered Pearson correlation and the absolute uncentered Pearson correlation, each expression profile is divided by its standard deviation. For the Spearman rank correlation and Kendall's tau, the expression profiles are replaced by the ranks.

In k-means clustering, previously the order in which genes/microarrays are reassigned was randomized. Since all genes/microarrays are reassigned before the cluster centroids are recalculated, the effect of the randomization is minimal. The order has an effect only if the last remaining gene is to be reassigned to a different cluster, as those reassignments are prevented in our implementation of k-means clustering. In the present algorithm, the order in which genes/microarrays are reassigned is no longer randomized.

Previously, the k-means clustering algorithm returned the expression profiles of the centroids of the k clusters. Some applications of the k-means algorithm simply discard these (e.g. the Cluster 3.0 program). Since the centroids can be recalculated trivially, the current implementation of k-means clustering does not return the centroid profiles.

The hierarchical clustering routine treecluster previously set the first two elements of the clustering result to (0,0) if the available memory was insufficient. In the present implementation, treecluster is a function that returns 0 if not enough memory was available, and 1 if the calculation was successful.
Previously, the treecluster algorithm took an argument applyscale to indicate if the link distances should be scaled for usage by Java TreeView. Since such scaling can be applied trivially after the treecluster routine, this argument was removed. Added a memory-efficient implementation of pairwise centroid-linkage hierarchical clustering (currently accessible from Python only).

Cluster 3.0 GUI, all platforms:
o) Changed the layout of the "Adjust" tab page, such that users cannot choose both mean and median centering at the same time.
o) For hierarchical clustering, if the user specifies to calculate the weights, then for the first calculation of the distance matrix the weights as stored in the data file are used. For the actual clustering, the distance matrix is recalculated using the calculated weights. Previously, a distance matrix calculated as part of the weights calculation was reused for the clustering calculation, leading to inconsistencies.
o) Calculation of the weights for hierarchical clustering is now implemented as a separate function in src/cluster.c.

Cluster 3.0 for Mac OS X:
o) Send the retain message to the directory variable after setting it by calling NSHomeDirectory(); otherwise the directory is autoreleased.

Cluster 3.0 for Unix/Linux:
o) In the routine MenuFile, a pointer to the static variable directory is passed to the OpenFile and SaveFile routines via the client_data pointer. The OpenFile and SaveFile take care of saving the last accessed directory in this variable. This avoids a call back to MenuFile.
o) The routine Cleanup was removed. Previously, the WM_DELETE_WINDOW message caused the callback function Cleanup to be called, which in turn called Free, Filter, and MenuFile to free allocated memory. The CMD_FILE_EXIT command in MenuFile called Cleanup. Now, the WM_DELETE_WINDOW message has MenuFile as the callback function, passing CMD_FILE_EXIT as the client_data argument.
The CMD_FILE_EXIT case in MenuFile takes care of freeing allocated memory. Hence, exiting the program via the menu or by closing the window becomes equivalent.
o) The routine InitFilemanager is replaced by the case ID_FILEMANAGER_INIT in the routine FileManager.
o) Setting the FileMemo and Jobname is now done by the cases ID_FILEMANAGER_SET_FILEMEMO and ID_FILEMANAGER_SET_JOBNAME in the routine FileManager. Previously, the corresponding code was located in OpenFile.
o) Updating the rows and columns in the file manager is done by the case ID_FILEMANAGER_UPDATE_ROWS_COLUMNS in the routine FileManager. Previously, this was done by the case ID_SOM_UPDATE in the routine SOM by calling the routines SetRows and SetColumns. These two have been removed.
o) In the routine Filter, freeing the static pointer "use" is achieved via the ID_FILTER_FREE message. Previously, the Filter routine checked if the Widget pointer is NULL. When freeing "use", we now check it to make sure it is not NULL.
o) All routines except main() are now declared static.
o) To create the pull down menu, the defined values CMD_FILE_OPEN and CMD_FILE_SAVE are used instead of the hard-coded 0 and 1.

Cluster 3.0, command-line version:
o) Changed the definition of the command-line option -l (previously -l 0|1) and -s (previously -s 0|1). Added the command-line options -cg a|m (center each row in the data set by subtracting the row mean or median), -ng (normalize each row in the data set), -ca a|m (center each column in the data set by subtracting the column mean or median), -na (normalize each column in the data set).
o) Allow the user to choose the number of repetitions for k-means clustering.

Perl interface Algorithm::Cluster:
o) Added the kmedoids function.
o) Added the distancematrix function.
o) Allow the initial cluster assignments to be specified by the user in the kcluster and kmedoids functions.
o) Allow the parameters cluster1 and cluster2 in clusterdistance to be a single integer instead of an integer array.
o) When converting Perl arrays to C arrays, the function will return NULL if an error is detected (previously only a warning was raised). This has not yet been implemented for all conversion functions.

Documentation:
o) Equations (previously shown as PNGs) replaced by HTML code.

2004.06.09
Replaced 1.e99 by DBL_MAX in src/cluster.c. Rewrote the Makefile and the Inno Setup Compiler script for the Windows version of Cluster 3.0 such that both an ANSI (Windows 95, 98, Me) and a UNICODE (Windows NT, 2000, XP) version are included. The installer determines on which version of Windows it is being run, and installs the appropriate version. Some minor changes were needed in windows/gui.c for it to compile correctly for both ANSI and UNICODE. The routine distancematrix in src/cluster.c now returns NULL if not enough memory can be allocated to store the distance matrix. In src/data.c, the function ClearDistanceMatrix now does not take any arguments. Previously, an argument specifying the size of the distance matrix was needed to free the matrix appropriately. As this is error-prone, the size of the distance matrix is now stored as part of the _distancematrix struct. The routines that make sure that genes and arrays are saved in the correct order in the .cdt output file are now static routines in src/data.c, except for the new routine ResetIndex. Added a routine GetWidgetItemInt to X11/gui.c to make reading integer values from edit boxes easier, as well as a routine ShowError to display error messages. Code cleanup in src/data.c in the PerformSOM routine. The calling code in src/command.c, windows/gui.c, mac/Controller.m, X11/gui.c was updated accordingly. In the new version, the SOM calculation is started after opening all output files, and no calculation is performed if the function fails to open any of the files.
Code cleanup in the hierarchical clustering section of windows/gui.c, mac/Controller.m, X11/gui.c. Previously, some unnecessary steps in the calculation were performed. The treecluster routine now returns (0,0) as the first linking event if the routine fails due to lack of memory. This may occur if treecluster needs to calculate the distance matrix but cannot allocate enough memory to store it. Added the code to check if the first clustering event is (0,0) to python/clustermodule.c and perl/Cluster.xs. In src/command.c, routines that are only used locally are now defined static. In mac/Controller.m, added a struct FileHandle and the routines OpenFile and CloseFile for file access management. In src/data.c, local variables are now defined static. In src/data.c, SetIndex, SetClusterIndex, SetOrderIndex were rewritten as ResetIndex, SetClusterIndex. Replaced empty argument lists by "void" in function declarations.

2004.05.09
Bug fix in python/clustermodule.c: In clusterdistance and clustercentroid, the TRANSPOSE variable was not initialized to 0. Check for missing gene names in the Load function in src/data.c. The function returns an error message if the gene name is missing in a data row. The Windows and Mac OS X versions of Cluster 3.0 can now handle UNICODE file names. Fixed a bug with the City Block distance measure. Previously, distances were not scaled correctly, causing errors with Java TreeView.

2004.04.02
Cleaned up python/clustermodule.c. Fixed an error in the test routine test_Cluster.py, which caused an incorrect result to be displayed in the results file test_Cluster. The kcluster routine in the Perl interface Algorithm::Cluster now includes the residual within-cluster sum of distances in the returned output. Updated the Perl example scripts accordingly, and added tests for this output variable to t/10_kcluster.t. Added the city-block distance to the Unix/Linux version of Cluster 3.0.
In Pycluster, for the routines "treecluster" and "clusterdistance" the order of the arguments "method" and "dist" was interchanged for consistency with kcluster. Generalized the usage of clusterdistance in Pycluster such that clusters containing only one item can be represented as a list containing one item (e.g. index1==[17]) or as the item number itself (index1=17). Removed the Windows Registry keys that are specific to TreeView from the installer program for Windows. Removed the "Launch Java TreeView" button from Cluster 3.0. With the improved installers for Java TreeView, this button is no longer needed. Minor cleanup of the automake/autoconf files. This affects the configure script. Fixed the version number in the command-line version of Cluster 3.0. In the Perl interface Algorithm::Cluster, fixed a memory allocation error that caused core dumps when clustering microarrays (transpose==1). Code cleanup: removed casts in front of malloc. Updated the automake/autoconf files. For the GUI-version of Cluster 3.0 on Unix/Linux, use configure or configure --with-x. For the command-line version of Cluster 3.0, use configure --without-x (so the --with-motif is no longer used). Added the -lXt and -lX11 libraries to the link command.

2004.01.27
Cleaned up the examples in the perl/examples subdirectory. Renamed the installer program for Cluster 3.0 for Windows from ctvsetup.exe to clustersetup.exe. Changed the file name of the Cluster 3.0 manual from ctv.pdf to cluster3.pdf (the name of the manual for the C Clustering Library is cluster.pdf as before). Updated the Makefile in the examples subdirectory. Added the cuttree routine to the C Clustering Library. This routine takes the tree structure generated by the hierarchical clustering routines, and groups the elements in the tree into a given number of clusters. Generalized the kcluster routine so that it can start from an initial clustering specified by the user. This is also available from Python, but not (yet) from Perl.
Updated the manual. Added the city-block distance to the Perl interface. Fixed a bug in the Macintosh-version of Cluster 3.0, which prevented the arrays from being adjusted in the Adjust tab. Added the k-medoids algorithm to the C Clustering Library, and added the corresponding Python interface to the k-medoids routine. Rewrote the Python interface to the kcluster routine. Added a Python interface to the distancematrix routine. Added tests to Bio.Python / Pycluster. Modified the command-line version of Cluster 3.0 to enable users to specify which kind of hierarchical clustering is used (pairwise complete-, single-, centroid-, or average-linkage).

2003.09.07
Bug fix in the pairwise average-linkage hierarchical clustering algorithm (palcluster in src/cluster.c). The last row in the distance matrix was not being copied properly. Thanks to Chris Torrence of Research Systems, Inc. for noticing this bug. For the Mac OS X version, the close button in the main window was disabled.

2003.07.17
Bug fix in the GUI for Cluster 3.0 on Linux/Unix. The bug led to a crash when users try to use the help window for the file format. The hierarchical clustering routine in Pycluster can now cluster a data set if only the distance matrix is available, and the original gene expression data are not available. This is also useful in cases where the distance is defined but the original data are not well defined, for example when clustering proteins based on the similarity of their shape. Clustering using the distance matrix only is not possible for centroid linkage clustering, which always needs the original data. The INSTALL file was added to MANIFEST for the Perl package Algorithm::Cluster. A command line version of Cluster 3.0 was written, with the command line options consistent with Gavin Sherlock's xcluster program, following a suggestion by QuangQiu Wang of Stanford University.
For the compilation, "configure" now writes the Makefile for the command line program, while "configure --with-motif" generates the Makefile for the GUI program. There is no longer the option to compile the library by itself, as it is easier to do this by simply collecting the appropriate source files. 2003.06.13 The city-block distance was added as one of the measures of similarity in gene expression data. The manual was updated on how to use the clustering routines from Biopython. 2003.05.28 The license was changed to the Python License instead of the GNU Lesser General Public License for the C Clustering Library and Pycluster. Algorithm::Cluster is covered by the Artistic License (same license as Perl itself). Replaced the routine for the Singular Value Decomposition with a routine that is compatible with the Python License. Cleaned up the file reading routine in src/data.c. Bug fix in the PCA routine in src/data.c; the data matrix was not being scaled correctly. In the Perl test script 01_mean_median.t, write out floating point values with a limited number of decimals. Previously, the comparison of floating point values led to spurious errors when running this test script. 2003.05.06 Fix for a bug in GeneKCluster, where the temporary array cdata was not allocated enough space, leading to crashes. Also a speed improvement in the kcluster routine (thanks to Minkov Minsky). 2003.04.25 Several changes in version 1.17, including a fix for a bug in kcluster that significantly increased the running time, and a fix for a file reading error in data.c, affecting Cluster 3.0. The file reading error caused missing values to be interpreted as a zero. Thanks to Justin Klekota of Harvard University for noticing the bug in kcluster. Cleaned up the API for hierarchical clustering and self-organizing maps. This version also includes the updated documentation for the Perl interface. 2003.04.04 Fixed some memory leakage problems in somassign in cluster.c. 
2003.03.30 Fixed a file reading problem in Cluster 3.0 due to the different end of line character on Macintosh computers. Files edited on Macs (e.g. with Excel) can now be read by Cluster 3.0. Thanks to Ivan Baxter of the Scripps Research Institute for noticing this error. Some cleanup in the Perl interface (e.g., removing unused variables). 2003.03.22 Added the Perl interface to the C Clustering Library. The Perl interface was written by John Nolan. 2003.02.21 Updated the manual (thanks to Timothy Chklovski of MIT for pointing out an error in the description of pairwise complete-linkage clustering in the manual for Cluster 3.0). Fixed a typing error in the Python interface to the SOM routine. 2003.02.01 Cleaned up palworker (improved speed and memory requirements). 2003.01.20 Bug fix in palworker. Thanks to Jin Hu Huang, Flinders University, Australia, for noticing this bug. 2003.01.07 Change in the onepass routine in src/cluster.c. The number of elements in each cluster is tracked during the iteration. If a certain cluster has only one element left, no reassignment takes place in order to avoid empty clusters. 2003.01.03 Change in the onepass routine in src/cluster.c. Checking for periodic solutions is interrupted as soon as a different cluster id is found. 2002.12.12 Bug fix in src/data.c (gene weights and array weights were switched). 2002.12.05 Bug fix in X11/gui.c (unallocated string problem). Some changes in the documentation to refer to OpenMotif. 2002.11.05 Cluster 3.0 was ported to Unix/Linux using Motif. Some functions were added to Pycluster, mainly to read and write Cluster/TreeView style files. The Java/C hybrid JavaCluster will no longer be maintained because of portability problems.