ProGraphMSA project page
Introduction:
ProGraphMSA is a state-of-the-art multiple sequence alignment tool
that produces phylogenetically sensible gap patterns while remaining
robust to alternative splicing and to errors in the branching pattern
of the guide tree. This is achieved by incorporating a graph-based
sequence representation, as in POA, and combining it with the
advantages of the phylogeny-aware algorithm in Prank.
Further, we account for variations in the substitution pattern by using
estimated amino acid frequencies and by implementing context-specific
profiles as in CS-Blast.
The latest versions of ProGraphMSA include an alignment model
supporting insertions and deletions of tandem repeat units.
Downloads:
If you experience any problems downloading or running
ProGraphMSA, contact me at
bladam@.szalkowsutki@##inf.ethz*.ch
Example:
Downloading and building ProGraphMSA from source on Linux:
# download and extract source
wget http://people.inf.ethz.ch/sadam/ProGraphMSA/files/ProGraphMSA-current.tar.gz
tar xzvf ProGraphMSA-current.tar.gz
# change into the source directory
cd ProGraphMSA
Running ProGraphMSA:
# perform an alignment and output Stockholm format:
./ProGraphMSA.sh input_sequences.fasta -o output.stk
# perform an alignment and output FASTA format:
./ProGraphMSA.sh --fasta input_sequences.fasta -o output.fasta
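ProGraphMSA reads its input sequences from a FASTA file (the required `<fasta file>` argument). A minimal sketch, using a hypothetical file name `example.fasta` and made-up toy protein sequences, of preparing such an input and checking its record count before aligning it:

```shell
#!/bin/sh
# Write three toy protein sequences (made-up data for illustration) in FASTA format.
cat > example.fasta <<'EOF'
>seqA
MKTAYIAKQRQISFVKSHFSRQ
>seqB
MKTAYIAKQRQISFVKSHFSR
>seqC
MKSAYIAKQRQISFVKAHFSRQ
EOF

# Every FASTA record starts with a '>' header line; count them as a sanity check.
n=$(grep -c '^>' example.fasta)
echo "records: $n"
```

The file can then be aligned exactly as above, e.g. `./ProGraphMSA.sh --fasta example.fasta -o example_aln.fasta`.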
Downloading and installing tandem repeat detectors:
# download T-REKS
wget http://bioinfo.montp.cnrs.fr/t-reks/T-Reks.jar
# download and extract TRUST
wget http://www.ibi.vu.nl/programs/trustwww/trust.tgz
tar xzf trust.tgz
# adjust the installation path in the wrapper script (falls back to vi if $EDITOR is unset)
echo "Trust path: $(pwd)/Align"
"${EDITOR:-vi}" trust2treks.py
Running ProGraphMSA+TR:
# perform an alignment using T-REKS to detect tandem repeats and output Stockholm format:
./ProGraphMSA+TR.sh input_sequences.fasta -o output.stk
# perform an alignment using T-REKS to detect tandem repeats and output FASTA format:
./ProGraphMSA+TR.sh --fasta input_sequences.fasta -o output.fasta
# perform an alignment using TRUST to detect tandem repeats and output Stockholm format:
./ProGraphMSA+TR.sh --custom_tr_cmd trust2treks.py input_sequences.fasta -o output.stk
Documentation:
Command line parameters:
Usage: ProGraphMSA [--ancestral_seqs] [--all_trees] [-i <iterations>] [-T] [-M]
       [-m] [-a] [-C <count>] [-F] [--custom_model <file>] [-w] [-c <file>]
       [-r] [-R] [--custom_tr_cmd <command>] [--trd_output <filename>]
       [--read_repeats <T-Reks format output>] [--repalign]
       [--repeat_indel_ext <probability>] [--repeat_indel_rate <rate>] [-A]
       [-P <distance>] [-p <distance>] [-D <distance>] [-d <distance>]
       [-x <distance>] [-l <distance>] [-E <probability>] [-e <probability>]
       [-g <rate>] [-f] [--dna] [--codon] [-t <newick file>] [-o <filename>]
       [--] [--version] [-h] <fasta file>
Tandem-repeat related parameters:
| -R, --repeats | use T-REKS to identify tandem repeats |
| --custom_tr_cmd <command> | custom command for detecting tandem repeats |
| --trd_output <filename> | write TR detector output to file |
| --read_repeats <T-REKS format output> | read TR detector output from file |
| --repalign | re-align detected tandem repeat units |
| --repeat_indel_ext <probability> | repeat indel extension probability |
| --repeat_indel_rate <rate> | insertion/deletion rate for repeat units (per site) |
Guide tree, distances, and substitution model:
| -i <iterations>, --iterations <iterations> | number of iterations re-estimating the guide tree [default: 2] |
| -m, --mldist | use distances estimated by a maximum-likelihood method |
| -a, --nwdist | estimate the initial distance tree from Needleman-Wunsch alignments |
| -D <distance>, --max_dist <distance> | maximum distance for alignment |
| -F, --estimate_aafreqs | estimate equilibrium amino acid frequencies from the input data |
| -w, --darwin | use the model of evolution from Darwin (Gonnet matrix and different indel model parameters); otherwise WAG is used |
| --custom_model <file> | custom substitution model in qmat format |
| -c <file>, --cs_profile <file> | path to a library of context-sensitive profiles (a copy is distributed in the 3rd_party folder) |
| -A, --no_force_align_m | do not force alignment of the initial methionine |
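To illustrate what estimating amino acid frequencies "from the input data" (`-F`) refers to, here is a minimal sketch that computes plain empirical residue frequencies from a FASTA file by counting. This simple counting is an assumption for illustration only; ProGraphMSA's own estimator may work differently. The file names `freq_input.fasta` and `aa_freqs.txt` and the toy sequences are hypothetical.

```shell
#!/bin/sh
# Two made-up toy protein sequences for illustration.
cat > freq_input.fasta <<'EOF'
>s1
MKTAYIAK
>s2
MKTAYLAK
EOF

# Skip '>' header lines, count every letter on sequence lines (ignoring gaps
# and other symbols), then print each residue's relative frequency.
awk '
    /^>/ { next }
    {
        seq = toupper($0)
        for (i = 1; i <= length(seq); i++) {
            c = substr(seq, i, 1)
            if (c ~ /[A-Z]/) { count[c]++; total++ }
        }
    }
    END {
        for (c in count) printf "%s %.4f\n", c, count[c] / total
    }
' freq_input.fasta | sort > aa_freqs.txt
cat aa_freqs.txt
```

For these 16 residues, A and K each occur 4 times (0.2500), while I and L occur once each (0.0625).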
Parameters for adjusting the indel model:
| -l <distance>, --edge_halflife <distance> | edge half-life (evolutionary distance at which the probability of re-using an unused graph edge is halved) |
| -E <probability>, --end_indel_prob <probability> | probability of mismatching sequence ends (set to -1 to disable this feature) |
| -e <probability>, --gap_ext <probability> | gap extension probability |
| -g <rate>, --indel_rate <rate> | insertion/deletion rate |
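A worked reading of two of the indel-model parameters above, as a sketch under two assumptions the table does not state: that the probability of re-using an unused graph edge decays exponentially with evolutionary distance (which is what a half-life implies), and that gap lengths follow the geometric distribution conventionally implied by a gap extension probability. With half-life $\ell$ (`-l`), evolutionary distance $d$, and gap extension probability $e$ (`-e`):

```latex
\frac{P_{\mathrm{reuse}}(d)}{P_{\mathrm{reuse}}(0)} = 2^{-d/\ell},
\qquad
P(\text{gap length} = k) = (1-e)\,e^{k-1}, \quad k \ge 1,
\qquad
\mathbb{E}[\text{gap length}] = \sum_{k \ge 1} k\,(1-e)\,e^{k-1} = \frac{1}{1-e}.
```

For example, at $d = 2\ell$ the re-use probability has dropped to $1/4$ of its initial value, and $e = 0.9$ gives a mean gap length of $1/(1-0.9) = 10$ residues.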
Input/Output:
| -f, --fasta | output FASTA format (instead of Stockholm) |
| -t <newick file>, --tree <newick file> | initial guide tree |
| -o <filename>, --output <filename> | output file name |
| -I, --input_order | output sequences in input order (default: tree order) |
| --dna | align DNA sequences |
| --codon | align DNA sequences using a codon model |
| --ancestral_seqs | output all ancestral sequences |
| <fasta file> | (required) input sequences |
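When supplying your own guide tree with `-t`, it helps to check beforehand that every input sequence appears as a leaf. A minimal sketch, assuming (not confirmed here) that ProGraphMSA matches tree leaf labels against the FASTA identifiers, i.e. the header text up to the first whitespace. The file names `guide_input.fasta` and `guide.nwk` and the toy data are hypothetical, and names containing regex metacharacters would need escaping.

```shell
#!/bin/sh
# Hypothetical toy inputs: three FASTA records and a Newick tree over the same names.
cat > guide_input.fasta <<'EOF'
>seqA
MKTAYIAKQRQISFVKSHFSRQ
>seqB
MKTAYIAKQRQISFVKSHFSR
>seqC
MKSAYIAKQRQISFVKAHFSRQ
EOF
printf '%s\n' '((seqA:0.1,seqB:0.1):0.05,seqC:0.2);' > guide.nwk

# For each FASTA identifier, look for a leaf label in the tree: preceded by
# '(' or ',' and followed by ':', ',' or ')'.
missing=0
for name in $(sed -n 's/^>\([^[:space:]]*\).*/\1/p' guide_input.fasta); do
    if ! grep -Eq "[(,]${name}[:,)]" guide.nwk; then
        echo "not in tree: $name"
        missing=$((missing + 1))
    fi
done
echo "missing leaves: $missing"
```

If no leaves are missing, the tree can be passed along with the input, e.g. `./ProGraphMSA.sh -t guide.nwk guide_input.fasta -o output.stk`.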