man msort (Commands) - sort records in complex ways

NAME

msort - sort records in complex ways

SYNOPSIS

msort <options> [<input file>]

DESCRIPTION

msort is a program for sorting text files in sophisticated ways. It was initially developed for alphabetizing dictionaries of languages whose ordering may differ considerably from that of English, but it has many other uses.

msort can sort blocks of text delimited in a number of ways, not just lines. Particular fields of a record can be specified as sort keys either by their position, counted from either end, or by matching regular expressions against their tags.
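The idea of records and positional fields can be sketched in Python. This is only an illustration of the concept, not msort's implementation: the function names split_records and field are hypothetical, records are assumed to be separated by blank lines, and fields by whitespace.

```python
import re

def split_records(text):
    """Split text into records separated by two or more newlines."""
    return [r for r in re.split(r"\n{2,}", text.strip("\n")) if r]

def field(record, n, delimiters=None):
    """Return field n of a record: 1 is the first field, -1 the last.
    With delimiters=None, fields are separated by runs of whitespace."""
    if n == 0:
        raise ValueError("field numbers start at 1 or -1, not 0")
    fields = record.split(delimiters)
    return fields[n - 1] if n > 0 else fields[n]

text = "walrus noun 3\n\naardvark noun 12\n\nzebra noun 7\n"
records = split_records(text)

# Sort lexicographically on the first field.
by_first = sorted(records, key=lambda r: field(r, 1))

# Sort numerically on the last field, counted from the right.
by_last = sorted(records, key=lambda r: int(field(r, -1)))
```

Counting from the right is convenient when records have a variable number of leading fields but a fixed final one.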

msort is capable of sorting on multiple keys, so that when two records tie on one key, the tie may be broken on another. Any or all keys may be optional. How absent optional keys are ordered with respect to present keys may be set separately for each key.
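As a rough illustration of tie-breaking with an optional key, the following Python sketch sorts on a primary key and then on a secondary key that may be absent. The helper optional_key and the sample records are hypothetical; the sketch shows the general technique, not how msort itself is written.

```python
def optional_key(value, absent_first=True):
    """Wrap a possibly missing key so that absent values sort before
    (absent_first=True) or after (absent_first=False) present ones."""
    if value is None:
        return (0, "") if absent_first else (2, "")
    return (1, value)

records = [
    {"headword": "bank", "sense": "river"},
    {"headword": "apple"},
    {"headword": "bank", "sense": "money"},
    {"headword": "bank"},
]

# Primary key: headword. Ties are broken on the optional "sense" key,
# with records lacking a sense placed before those that have one.
absent_first = sorted(
    records,
    key=lambda r: (r["headword"], optional_key(r.get("sense"), True)),
)

# Same keys, but records lacking a sense are placed last among ties.
absent_last = sorted(
    records,
    key=lambda r: (r["headword"], optional_key(r.get("sense"), False)),
)
```

Because the placement of absent keys is part of each key's definition, it can differ from one key to the next.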

msort can sort lexicographically, on numerical values, on dates and times, and on the length of strings.

msort allows you to specify arbitrary sort orders and to define virtually unlimited numbers of multigraphs of effectively unlimited length. The sort order and multigraphs are defined separately for each key.
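To show what a custom sort order with multigraphs means, here is a minimal Python sketch. It assumes an order in which "ch" and "ll" are separate letters following "c" and "l", as in traditional Spanish alphabetization. The function make_key and the abbreviated letter list are hypothetical and do not reflect msort's sort order file format or internals.

```python
def make_key(order):
    """Build a sort-key function from a list of graphemes in rank order.
    Multi-character entries (multigraphs) are treated as single units."""
    rank = {g: i for i, g in enumerate(order)}
    # Try longer graphemes first so that "ch" wins over "c".
    longest_first = sorted(order, key=len, reverse=True)

    def key(word):
        ranks, pos = [], 0
        while pos < len(word):
            for g in longest_first:
                if word.startswith(g, pos):
                    ranks.append(rank[g])
                    pos += len(g)
                    break
            else:
                # Characters missing from the order sort after listed ones.
                ranks.append(len(order) + ord(word[pos]))
                pos += 1
        return ranks

    return key

# Assumed, abbreviated order for illustration only.
order = ["a", "b", "c", "ch", "d", "e", "i", "l", "ll",
         "m", "n", "o", "r", "s", "u", "z"]
words = ["cuna", "chico", "llama", "luz", "dama", "casa"]

custom = sorted(words, key=make_key(order))
plain = sorted(words)
```

With this order "chico" follows "cuna" and "llama" follows "luz", whereas a plain character-by-character sort places them the other way round.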

msort can reverse the characters in a key, allowing it to be used to generate reverse dictionaries.
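The effect of reversing the characters of a key can be illustrated with a short Python sketch. The word list is made up, and the code only demonstrates the idea behind a reverse dictionary, in which words sharing an ending are grouped together.

```python
words = ["walking", "talked", "running", "walked", "runs"]

# Compare words by their reversed spelling, so that the last
# character is the most significant one.
reverse_sorted = sorted(words, key=lambda w: w[::-1])
```

Here the words ending in "ed" come first, followed by those ending in "ing" and then "runs".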

msort currently operates on 8-bit characters. Except for the fact that it uses whitespace characters as default record and field delimiters, it makes no assumptions about the character set other than that 0x00 (NUL) terminates a string.

For usage information, execute msort with no arguments.

Full information about msort is currently to be found in the reference manual, which is distributed as a PDF (Portable Document Format) file. If a copy is not available locally, you can download a copy of the manual by going to: http://www.cis.upenn.edu/~wjposer/msort.html

OPTIONS

-h
Print usage message
-v
Print version message
-D
List defaults
-F
List command line options
-L
List limits
-b
A record is terminated by two or more newlines
-l
A record consists of a single line
-r <separator>
A record is terminated by the specified separator character
-d <character>+
Fields are delimited by the named character(s)
-w
Sort on the entire text of the record
-M <records>
Set initial maximum number of records
-m
End-of-line in the input data is marked by Carriage Return (0x0D) as on the Macintosh rather than by Line Feed (0x0A) as on Unix systems.
-I
Invert sense of comparisons globally
-B
No characters fall outside the Basic Multilingual Plane (that is, none has a value greater than 0xFFFF).
-p
Do not make internal use of the Private Use areas. By default, multigraphs are assigned internally to codepoints in the Supplementary Private Use areas if full Unicode is in use, or to codepoints in the Private Use area if input is restricted to the Basic Multilingual Plane by means of the -B option. If your input makes use of the Private Use areas, this option prevents interference with your input. In this case, multigraphs will be assigned to the Low and High Surrogate areas (0xD800-0xDFFF). Note that this limits the number of multigraphs to 2,048.
-q
Be quiet - do not chat while working
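The limit of 2,048 multigraphs mentioned under -p follows directly from the size of the surrogate range. A quick check in Python, with variable names chosen only for this illustration:

```python
# Bounds of the combined High and Low Surrogate areas.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF

# Number of codepoints in the inclusive range.
surrogate_count = SURROGATE_LAST - SURROGATE_FIRST + 1
```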

Key specific options:

-e <m,n>
Sort on characters m through n. Positive indices start from one. Negative indices indicate position with respect to the end of the record. For example, the range 3,-2 consists of the third character through the next-to-last character.
-n <field number>
Sort on the specified field (counting from one; negative means from right)
-t <tag regexp>
Sort on the field with the specified tag
-o <comparison>
The key is optional. If it is absent, the record compares as less than (<), equal to (=), or greater than (>) records in which the key is present
-C
Fold case
-c <key type>
l(exicographic), i(so8601 date/time), t(ime), d(ate), n(umeric), s(ize), h(ybrid), r(andom)
-f <date format>
Permutation of ymd with separators, e.g. y/m/d for international date format, m/d/y for American date format.
-W <file name>
Read from the named file the list of characters to be treated as separators in the sort order definition file.
-S <file name>
Read substitutions from named file
-s <file name>
Read sort order from named file
-x <file name>
Read exclusions from named file
-X <exclusions>
Exclude specified characters
-i
Invert sense of comparisons
-R
Reverse characters of key
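The index convention described for the -e option can be made concrete with a small Python sketch. The function char_range is hypothetical and only mirrors the rule stated above: positive indices count from one, negative indices count back from the end, and both ends of the range are inclusive.

```python
def char_range(record, m, n):
    """Return characters m through n of record, inclusive.
    Positive indices count from one at the start; negative indices
    count back from the end, so -1 is the last character."""
    def to_offset(i):
        if i == 0:
            raise ValueError("indices start at 1 or -1, not 0")
        return i - 1 if i > 0 else len(record) + i

    start, end = to_offset(m), to_offset(n)
    return record[start:end + 1]

# The range 3,-2: third character through next-to-last character.
key_text = char_range("abcdefgh", 3, -2)
```

For the record "abcdefgh" the range 3,-2 selects "cdefg".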

SEE ALSO

AUTHOR

Bill Poser (billposer@alum.mit.edu)

LICENSE

GNU General Public License (http://www.gnu.org/licenses/gpl.txt), version 2.