man mclfamily (Conventions) - a description of the mcl family of cluster applications.
NAME
mclfamily - a description of the mcl family of cluster applications.
mcl is the Amsterdam implementation of the Markov Cluster Algorithm. It is described in the mcl manual. Several other utilities are part of the MCL distribution. This manual page gives an overview.
mcl(1)          the cluster algorithm
mclfaq(7)       MCL Frequently Asked Questions
mcxio(5)        the graph/matrix input/output format
mcxconvert(1)   convert between interchange/binary storage types
mcxassemble(1)  create matrices from raw data
mcxdump(1)      dump a matrix, optionally with label substitutions
mcxload(1)      load label data into matrix and tab files
mcxarray(1)     transform array data to MCL matrices
mcxmap(1)       relabel indices in a graph/matrix
mcx(1)          general matrix operations
mcxsubs(1)      extracting submatrices in various ways
clmdist(1)      compute split/join distance between clusterings
clminfo(1)      compute performance measure for clusterings
clmmeet(1)      compute intersection of clusterings
clmmate(1)      find best matching clusters between clusterings
clmclose(1)     fetch connected components from graphs or subgraphs
clmimac(1)      interpret MCL iterand/matrix as clustering
clmresidue(1)   extend subgraph clustering
clmformat(1)    display clusters as html or txt files
clmorder(1)     reorder indices to represent blocks from different clusterings
parsing/assembly/clustering/display BLAST pipeline parse BLAST files
Entries marked * are not available if only a default install is done.
DESCRIPTION
mcl(1) - the clustering program. Since the 05-314 release it has the ability to read in and cluster label input.
mclfaq(7) - Frequently Asked Questions.
mcxio(5) - a description of the mcl matrix format.
mcxconvert(1) - convert matrices from interchange mcl format to binary mcl format or vice versa.
mcxassemble(1) - assemble a matrix/graph from partial edge weight scores. Useful intermediate format to be used when transforming application specific data into an mcl input matrix. However, mcl has now acquired the ability to read graphs directly from label input, removing the need for mcxassemble in most cases.
mcxdump(1) - dump matrices in a line-based format, optionally map indices to labels. Either a node pair (matrix entry) or a node list (matrix row) is output per line.
mcxload(1) - load matrices and tab files from a line-based ID1 ID2 format. It can load bipartite structures in which the two columns contain labels from different domains. It has many options to further symmetrize and transform the input.
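To make the idea of a line-based ID1 ID2 format concrete, the following Python sketch builds a label-to-index table (the role played by a tab file) and a symmetrized edge map from such lines. It is not how mcxload is implemented. The function name `load_label_pairs`, the optional third weight column, and the choice to symmetrize by keeping the larger weight are all assumptions made for this illustration.

```python
def load_label_pairs(lines):
    """Parse 'ID1 ID2 [weight]' lines into (tab, matrix).

    tab maps each label to an integer index in order of first appearance.
    matrix maps index -> {neighbour index: weight}. It is made symmetric
    by keeping the larger of the weights seen for a pair (an assumption).
    """
    tab = {}
    matrix = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        # A missing weight column is treated as weight 1.0 (assumption).
        weight = float(fields[2]) if len(fields) > 2 else 1.0
        a = tab.setdefault(fields[0], len(tab))
        b = tab.setdefault(fields[1], len(tab))
        for src, dst in ((a, b), (b, a)):
            row = matrix.setdefault(src, {})
            row[dst] = max(row.get(dst, 0.0), weight)
    return tab, matrix


example = ["cat hat 0.2", "hat bat 0.16", "bat cat 1.0", "bat cat 0.5"]
tab, matrix = load_label_pairs(example)
print(tab)
print(matrix)
```

For a bipartite structure one would keep two separate label tables, one per column, instead of the single `tab` used here.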
mcxarray(1) - transform array data to MCL matrices. The data may be of rectangular M x N type. Either an M x M or an N x N dimensional matrix can be made, by computing correlation scores between the vectors in one of the two domains. The Pearson correlation coefficient and the cosine are supported, and further tearing and pruning options can be applied.
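The step from an M x N data table to an M x M similarity matrix can be sketched in Python as follows. This is only an illustration of the Pearson variant, not mcxarray's code. The names `pearson` and `correlation_matrix` and the `cutoff` parameter (a stand-in for a pruning option) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant vector: correlation undefined, use 0 here
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(rows, cutoff=0.0):
    """Build an M x M matrix of row correlations, dropping scores below cutoff."""
    m = len(rows)
    result = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            score = pearson(rows[i], rows[j])
            if score >= cutoff:
                result[i][j] = result[j][i] = score
    return result


data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
scores = correlation_matrix(data, cutoff=0.5)
for row in scores:
    print(row)
```

Transposing `data` before the call would give the N x N matrix for the other domain.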
mcxmap(1) - relabel indices in a graph.
mcx(1) - an interpreter for a stack language that enables interaction with the mcl matrix libraries. It can be used both from the command line and interactively, and supports a rich set of operations such as transposition, scaling, column scaling, multiplication, Hadamard powers and products, et cetera. The general aim is to provide handles for simple number and matrix arithmetic, and for graph, set, and clustering operations. The following is a very simple example of implementing and using mcl in this language.
    2.0 .i def                    # define inflation value.
    /small lm                     # load matrix in file 'small'.
    dim id add                    # add identity matrix.
    st .x def                     # make stochastic, bind to x.
    { xpn .i infl vm } .mcl def   # define one mcl iteration.
    20 .x .mcl repeat             # iterate 20 times
    imac                          # interpret matrix as clustering.
    vm                            # view matrix (clustering).
One of the more interesting uses of this language is running mcl with more complicated inflation profiles than the two-constant approach used in mcl itself.
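For readers unfamiliar with the stack language, the same steps (add the identity, make stochastic, alternate expansion and inflation, interpret the result as a clustering) can be sketched in plain Python. This is a small dense-matrix illustration, not the mcl or mcx implementation. All function names are hypothetical, as is the `inflation_at` callback, which is one possible way to express an inflation profile that varies per iteration.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (make the matrix column stochastic)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(out[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] /= total
    return out


def expand(m):
    """Expansion: ordinary matrix squaring."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation: entrywise (Hadamard) power followed by column rescaling."""
    return normalize_columns([[v ** power for v in row] for row in m])


def mcl(adjacency, inflation_at, steps=20):
    """Run a fixed number of expansion/inflation steps on a square matrix."""
    n = len(adjacency)
    # Add the identity matrix (self loops), then make stochastic.
    m = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for step in range(steps):
        m = inflate(expand(m), inflation_at(step))
    return m


def interpret_as_clustering(m, eps=1e-6):
    """Treat the support of each nonzero row as a cluster."""
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two triangles joined by a single edge between nodes 2 and 3.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = interpret_as_clustering(mcl(graph, lambda step: 2.0))
print(clusters)
```

Passing a different callback, for example `lambda step: 1.4 if step < 3 else 2.0`, changes the inflation value from one iteration to the next.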
mcxsubs(1) - compute a submatrix of a given matrix, where row and column index sets can be specified as lists of indices combined with lists of clusters in a given clustering. Useful for inspecting local cluster structure.
clmdist(1) - compute the split/join distance between two partitions. The split/join distance is better suited for measuring partition similarity than the long-known equivalence mismatch coefficient. The former measures the number of node moves required to transform one partition into the other, the latter measures differences between volumes of edges of unions of complete graphs associated with partitions.
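The split/join distance can be sketched in a few lines of Python. The sketch uses the commonly stated formula: for every cluster in one partition take its largest overlap with any cluster in the other partition, count the nodes left over, and add the counts for both directions. The function name and the list-of-sets representation are assumptions for this example. It is not clmdist's code.

```python
def split_join_distance(p, q):
    """Split/join distance between two partitions of the same node set."""
    p = [set(c) for c in p]
    q = [set(c) for c in q]
    n = sum(len(c) for c in p)
    if n != sum(len(c) for c in q):
        raise ValueError("partitions must cover the same number of nodes")

    def projection(a, b):
        # For each cluster in a, the size of its best overlap in b.
        return sum(max(len(x & y) for y in b) for x in a)

    return (n - projection(p, q)) + (n - projection(q, p))


one_block = [{1, 2, 3, 4}]
two_blocks = [{1, 2}, {3, 4}]
print(split_join_distance(one_block, two_blocks))  # 2
print(split_join_distance(two_blocks, two_blocks))  # 0
```

In the first call the two nodes 3 and 4 have to be split off, which accounts for the distance of 2.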
clminfo(1) - compute a performance measure saying how well a clustering captures the edge weights of the input graph. Useful for comparing different clusterings on the same graph, best used in conjunction with clmdist - because comparing clusterings at different levels of granularity should somewhat change the performance interpretation. The latter issue is discussed in the clmdist(1) entry.
clmmeet(1) - compute the intersection of a set of clusterings, i.e. the largest clustering that is a subclustering of all. Useful for measuring the consistency of a set of different clusterings at supposedly different levels of granularity (in conjunction with clmdist).
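The intersection of clusterings described above can be sketched as follows: two nodes end up in the same cluster of the result exactly when they share a cluster in every input clustering. The function name `meet` and the list-of-lists representation are assumptions for this example, and the sketch assumes all clusterings are partitions of the same node set.

```python
def meet(*clusterings):
    """Return the coarsest clustering that refines every given clustering."""
    # For each clustering, map every node to the index of its cluster.
    index = [
        {node: i for i, cluster in enumerate(c) for node in cluster}
        for c in clusterings
    ]
    groups = {}
    for node in index[0]:
        # Nodes with the same tuple of cluster indices stay together.
        key = tuple(idx[node] for idx in index)
        groups.setdefault(key, []).append(node)
    return sorted(sorted(g) for g in groups.values())


coarse = [[1, 2, 3], [4, 5, 6]]
fine = [[1, 2], [3, 4], [5, 6]]
print(meet(coarse, fine))  # [[1, 2], [3], [4], [5, 6]]
```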
clmmate(1) - find best matching clusters between two different clusterings.
clmclose(1) - fetch connected components from graphs or subgraphs.
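Fetching connected components, from a whole graph or from a subgraph induced by a node subset, can be sketched with a breadth-first search. The function name, the edge-list input, and the optional `nodes` argument are assumptions for this illustration. It does not reflect clmclose's options or file formats.

```python
from collections import deque


def connected_components(edges, nodes=None):
    """Return components of the graph, or of the subgraph induced by nodes."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    allowed = set(adjacency) if nodes is None else set(nodes)

    seen = set()
    components = []
    for start in sorted(allowed):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency.get(node, ()):
                # Only follow edges that stay inside the allowed node set.
                if neighbour in allowed and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


edges = [(0, 1), (1, 2), (3, 4)]
print(connected_components(edges))                      # [[0, 1, 2], [3, 4]]
print(connected_components(edges, nodes={0, 2, 3, 4}))  # [[0], [2], [3, 4]]
```

Removing node 1 from the subgraph disconnects nodes 0 and 2, which is why they appear as separate components in the second call.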
clmimac(1) - interpret MCL iterands as clusterings. The clusterings associated with early iterands may contain overlap, should you be interested therein.
clmresidue(1) - extend a clustering of a subgraph onto a clustering of the larger graph.
clmformat(1) - display clusters suitable for scrutinizing.
clmorder(1) - reorder indices to represent blocks from different clusterings.
mclpipeline(1) - set up a pipeline from the data parsing stage up to the clustering format/display stage.
mclblastline(1) - BLAST specific pipeline.
mcxdeblast(1) - BLAST parser. Can be used to directly stream a graph into mcl. Can also prepare input for mcxassemble or can be plugged into the heavy-weight mclblastline.