NAME

sreformat - convert sequence file to different format

SYNOPSIS

sreformat [options] format seqfile

DESCRIPTION

sreformat reads the sequence file seqfile in any supported format, reformats it into a new format specified by format, then prints the reformatted text.

Supported input formats include (but are not limited to) the unaligned formats FASTA, Genbank, EMBL, SWISS-PROT, PIR, and GCG, and the aligned formats SELEX, Clustal, and GCG MSF.

Available unaligned output file format codes include fasta (FASTA format); embl (EMBL/SWISS-PROT format); genbank (Genbank format); gcg (GCG single sequence format); gcgdata (GCG flatfile database format); strider (MacStrider format); zuker (Zuker MFOLD format); ig (Intelligenetics format); pir (PIR/CODATA flatfile format); squid (an undocumented St. Louis format); and raw (raw sequence, no other information). The available aligned output file format codes include selex (SELEX/HMMER/Pfam annotated alignment format); msf (GCG MSF format); and a2m (aligned FASTA format, called A2M by the UC Santa Cruz HMM group).

Unaligned format files cannot be reformatted to aligned formats. However, aligned formats can be reformatted to unaligned formats; the gap characters are simply stripped out.
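
As an illustration of the aligned-to-unaligned conversion described above, here is a minimal Python sketch. It is not taken from the sreformat source: the helper name strip_gaps and the set of gap characters in GAP_CHARS are assumptions made for this example only.

```python
# Hypothetical helper; the gap alphabet below is an assumption for this
# sketch, not the set that sreformat itself recognizes.
GAP_CHARS = ".-_~"


def strip_gaps(aligned_row: str, gap_chars: str = GAP_CHARS) -> str:
    """Return the aligned row with every gap character removed."""
    return "".join(ch for ch in aligned_row if ch not in gap_chars)


# A toy two-sequence alignment (rows have equal length).
alignment = {
    "seq1": "ACG--UACG.",
    "seq2": "AC-GGUA-GA",
}

# After stripping, the rows are ordinary unaligned sequences and may
# differ in length.
unaligned = {name: strip_gaps(row) for name, row in alignment.items()}
print(unaligned)
```

The reverse direction has no equivalent: once the gaps are gone, the column structure cannot be recovered from the sequences alone, which is why unaligned input cannot be written in an aligned format.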

This program was originally named reformat, but that name clashes with a GCG program of the same name.

OPTIONS

-d
DNA; convert U's to T's, to make sure a nucleic acid sequence is shown as DNA not RNA. See -r.
-h
Print brief help; includes version number and summary of all options, including expert options.
-l
Lowercase; convert all sequence residues to lower case. See -u.
-r
RNA; convert T's to U's, to make sure a nucleic acid sequence is shown as RNA not DNA. See -d.
-u
Uppercase; convert all sequence residues to upper case. See -l.
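
The effect of -d, -r, -l, and -u on the residues can be sketched in a few lines of Python. This is an illustration under stated assumptions, not sreformat's implementation: the function names to_dna and to_rna are hypothetical, and the sketch assumes that the U/T conversion preserves the case of each residue.

```python
# Translation tables for the U <-> T conversions.
# Assumption: case is preserved (U -> T and u -> t).
_TO_DNA = str.maketrans("Uu", "Tt")
_TO_RNA = str.maketrans("Tt", "Uu")


def to_dna(seq: str) -> str:
    """Mimic -d: show a nucleic acid sequence as DNA."""
    return seq.translate(_TO_DNA)


def to_rna(seq: str) -> str:
    """Mimic -r: show a nucleic acid sequence as RNA."""
    return seq.translate(_TO_RNA)


seq = "GAUuaca"
print(to_dna(seq))          # like -d
print(to_rna("GATtaca"))    # like -r
print(seq.lower())          # like -l
print(seq.upper())          # like -u
```

These conversions only make sense for nucleic acid sequences; applied to a protein sequence, a T/U substitution would silently change threonine residues.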

EXPERT OPTIONS

--pfam
For SELEX alignment output format only, put the entire alignment in one block (don't wrap into multiple blocks). This is close to the format used internally by Pfam in Stockholm and Cambridge.
--sam
Try to convert gap characters to UC Santa Cruz SAM style, where a . means a gap in an insert column, and a - means a deletion in a consensus/match column. This only works for converting aligned file formats, and only if the alignment already adheres to the SAM convention of upper case for residues in consensus/match columns, and lower case for residues in insert columns. This is true, for instance, of all alignments produced by old versions of HMMER. (HMMER2 produces alignments that adhere to SAM's conventions even in gap character choice.) This option was added to allow Pfam alignments to be reformatted into something more suitable for profile HMM construction using the UCSC SAM software.
--samfrac <x>
Try to convert the alignment gap characters and residue cases to UC Santa Cruz SAM style, where a . means a gap in an insert column and a - means a deletion in a consensus/match column, and upper case means match/consensus residues and lower case means inserted residues. This will only work for converting aligned file formats, but unlike the --sam option, it will work regardless of whether the file adheres to the upper/lower case residue convention. Instead, any column containing more than a fraction <x> of gap characters is interpreted as an insert column, and all other columns are interpreted as match columns. This option was added to allow Pfam alignments to be reformatted into something more suitable for profile HMM construction using the UCSC SAM software.
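
The column rule described for --samfrac can be sketched in Python as follows. This is a hedged illustration, not the sreformat code: the function name sam_style, its parameter max_gap_frac (standing in for <x>), and the gap alphabet GAP_CHARS are assumptions made for the example.

```python
# Assumed gap alphabet for this sketch only.
GAP_CHARS = ".-_~"


def sam_style(rows, max_gap_frac):
    """Rewrite alignment rows using the SAM gap and case conventions.

    A column whose gap fraction is strictly greater than max_gap_frac
    is treated as an insert column (lower-case residues, '.' gaps).
    Every other column is a match column (upper-case residues, '-' gaps).
    """
    if not rows:
        return []
    ncols = len(rows[0])
    if any(len(row) != ncols for row in rows):
        raise ValueError("all alignment rows must have the same length")

    out = [[] for _ in rows]
    for col in range(ncols):
        column = [row[col] for row in rows]
        gap_frac = sum(ch in GAP_CHARS for ch in column) / len(rows)
        is_insert = gap_frac > max_gap_frac
        for i, ch in enumerate(column):
            if ch in GAP_CHARS:
                out[i].append("." if is_insert else "-")
            else:
                out[i].append(ch.lower() if is_insert else ch.upper())
    return ["".join(chars) for chars in out]


rows = ["ACg-T", "A-.GT", "AC-GT", "a-.g-"]
for row in sam_style(rows, 0.5):
    print(row)
```

In this toy alignment only the third column exceeds the 0.5 threshold (three gaps out of four rows), so it becomes an insert column. The second column has exactly half gaps, which is not more than the threshold, so it stays a match column.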

SEE ALSO

alistat getseq seqstat

AUTHOR

This software and documentation are Copyright (C) 1992-1998 Washington University School of Medicine. They are freely distributable under the terms of the GNU General Public License. See COPYING in the source code distribution for more details, or contact me.

Sean Eddy
Dept. of Genetics
Washington Univ. School of Medicine
4566 Scott Ave.
St Louis, MO 63110 USA
Phone: 1-314-362-7666
FAX  : 1-314-362-7855
Email: eddy@genetics.wustl.edu