T-Coffee is an algorithm optimised for aligning DNA and protein sequences.
ClustalW, Muscle, and T-Coffee are progressive alignment algorithms. Progressive methods generally start by building a guide tree from the pairwise distances between every pair of sequences to be aligned. The multiple sequence alignment is then built up step by step, with the tree determining the order in which sequences and groups of sequences are merged.
You can display the guide tree for ClustalW and T-Coffee alignments, but not for Muscle alignments.
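To make the guide-tree step concrete, here is a minimal sketch that turns a pairwise distance matrix into a tree using UPGMA (average-linkage clustering). UPGMA is used here only as one simple, well-known way to build a guide tree; ClustalW, Muscle and T-Coffee may use other methods (such as neighbour-joining), and the function name `upgma` and the distances below are illustrative.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a guide tree by UPGMA from pairwise distances.

    names: list of sequence labels
    dist:  dict mapping frozenset({a, b}) -> distance between a and b
    Returns the tree as nested tuples, e.g. (("A", "B"), ("C", "D")).
    """
    # Each current cluster (a label or a nested tuple) maps to its size.
    clusters = {name: 1 for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Average-linkage distance from the new cluster to every other one.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 4}
tree = upgma(["A", "B", "C", "D"], {frozenset(k): v for k, v in pairs.items()})
print(tree)  # (('A', 'B'), ('C', 'D'))
```

A progressive aligner would then align A with B and C with D, and finally align the two resulting sub-alignments to each other.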
All three algorithms are integrated into the MSA editor, so you can run each of them on the same set of sequences and compare the results.
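T-Coffee first builds a library of all pairwise alignments. Before building the MSA, it also aligns each sequence in every pair with the other sequences in the set, and uses these third sequences to check how consistent each pairwise alignment is with the rest of the set.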
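The consistency idea can be sketched as a library extension: the weight of a residue pair (x, y) is increased whenever x and y are both aligned to the same residue z of a third sequence, by the smaller of the two weights (x, z) and (z, y). This is a minimal sketch of that scheme; the data layout (residues as `(sequence, position)` tuples) and the example weights are assumptions made for illustration, not T-Coffee's internal format.

```python
from collections import defaultdict


def extend_library(library):
    """Consistency-extend a primary library of weighted residue pairs.

    library: dict mapping ((seq_a, pos_a), (seq_b, pos_b)) -> weight.
    Returns a dict holding the extended weight for every ordered pair.
    """
    # partners[x][y] = weight of the primary pair (x, y), stored both ways.
    partners = defaultdict(dict)
    for (x, y), w in library.items():
        partners[x][y] = w
        partners[y][x] = w

    extended = defaultdict(float)
    # Start from the direct (primary) weights.
    for x, nbrs in partners.items():
        for y, w in nbrs.items():
            extended[(x, y)] += w
    # Add support from every path x - z - y through a third sequence.
    for x, nbrs in partners.items():
        for z, w_xz in nbrs.items():
            for y, w_zy in partners[z].items():
                if y[0] != x[0] and y[0] != z[0]:
                    extended[(x, y)] += min(w_xz, w_zy)
    return dict(extended)


lib = {
    (("A", 0), ("B", 0)): 50,
    (("A", 0), ("C", 0)): 80,
    (("C", 0), ("B", 0)): 60,
}
ext = extend_library(lib)
print(ext[(("A", 0), ("B", 0))])  # 110.0
```

The A/B pair gains min(80, 60) = 60 from sequence C, so residue pairs that the whole set agrees on end up with higher weights when the MSA is built.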
T-Coffee is regarded as slightly slower than ClustalW, but it produces more accurate alignments for distantly related amino acid sequences. The method is described in the original T-Coffee publication. Incidentally, T-Coffee stands for Tree-based Consistency Objective Function For AlignmEnt Evaluation.
Hold it down until the popup menu allows you to choose the T-Coffee algorithm. This setting always defaults to the last used algorithm.
(-dp_mode)
gotoh_pair_wise: implementation of the Gotoh algorithm (quadratic in memory and time).
myers_miller_pair_wise: implementation of the Myers and Miller dynamic programming algorithm (quadratic in time and linear in space). This algorithm is recommended for very long sequences. It is about 2 times slower than gotoh_pair_wise and only accepts tg_mode=1 or 2 (i.e. terminal gaps are not penalized for opening).
fasta_pair_wise: implementation of the FASTA algorithm. The sequences are hashed to find matching k-tuple words, and dynamic programming is carried out only on the ndiag best-scoring diagonals. This is much faster than the two previous modes, but less accurate. This mode is controlled by the parameters -ktuple, -diag_mode and -ndiag.
cfasta_pair_wise: the c stands for "checked". It uses the same algorithm as fasta_pair_wise, but dynamic programming is carried out on the ndiag best diagonals, then on the 2*ndiag best, and so on until the scores converge. Complexity depends on how divergent the sequences are, but is usually L*log(L), with an accuracy comparable to the first two modes (this was checked on BAliBASE). This mode is controlled by the parameters -ktuple, -diag_mode and -ndiag.
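The gotoh_pair_wise mode is named after Gotoh's dynamic programming recurrence for affine gap penalties, which keeps three matrices (match, gap in one sequence, gap in the other) and therefore needs quadratic time and memory. The sketch below computes a global alignment score with that recurrence. The simple match/mismatch scores and penalty values are assumptions for illustration; T-Coffee scores residue pairs against its library instead. The Myers and Miller variant computes the same alignment with a divide-and-conquer scheme that keeps memory linear.

```python
NEG = float("-inf")


def gotoh_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_ext=-1):
    """Global alignment score with affine gaps (Gotoh recurrence).

    A gap of length k scores gap_open + gap_ext * k (both negative).
    Uses three (len(a) + 1) x (len(b) + 1) matrices.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + gap_ext * i  # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = gap_open + gap_ext * j  # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Open a new gap (pay gap_open) or extend an existing one.
            X[i][j] = max(M[i - 1][j] + gap_open + gap_ext,
                          X[i - 1][j] + gap_ext,
                          Y[i - 1][j] + gap_open + gap_ext)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_ext,
                          Y[i][j - 1] + gap_ext,
                          X[i][j - 1] + gap_open + gap_ext)
    return max(M[n][m], X[n][m], Y[n][m])


print(gotoh_score("ACGT", "AGT"))     # -1: three matches, one gap of length 1
print(gotoh_score("AAAAAA", "AA"))    # -5: one long gap beats two short ones
```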
(-diag_threshold)
Sets the threshold value used when selecting diagonals. The default value is 0.
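The diagonal selection used by the fasta-based modes can be sketched as follows: every k-tuple word of one sequence is hashed, each word shared with the other sequence votes for the diagonal (i - j) it lies on, and only the best diagonals are kept for dynamic programming. The parameter names mirror -ktuple, -ndiag and -diag_threshold, but scoring a diagonal by its number of shared words and keeping those whose count exceeds the threshold are assumptions; -diag_mode may select a different scoring.

```python
from collections import Counter, defaultdict


def best_diagonals(a, b, ktuple=2, ndiag=3, diag_threshold=0):
    """Rank the diagonals of the a-vs-b dot plot by shared k-tuple hits.

    Diagonal d contains the cells (i, j) with i - j == d.
    Keeps diagonals whose hit count exceeds diag_threshold and returns
    at most ndiag of them, best first.
    """
    # Hash every k-tuple word of b by its start positions.
    index = defaultdict(list)
    for j in range(len(b) - ktuple + 1):
        index[b[j:j + ktuple]].append(j)
    # Each word of a that also occurs in b votes for one diagonal.
    hits = Counter()
    for i in range(len(a) - ktuple + 1):
        for j in index.get(a[i:i + ktuple], ()):
            hits[i - j] += 1
    kept = [(d, c) for d, c in hits.items() if c > diag_threshold]
    kept.sort(key=lambda dc: (-dc[1], dc[0]))  # most hits first
    return [d for d, _ in kept[:ndiag]]


print(best_diagonals("ACGTACGT", "ACGTACGT"))  # [0, -4, 4]
```

In the spirit of cfasta_pair_wise, a caller could rerun the alignment with ndiag, 2*ndiag, and so on, stopping once the score no longer changes.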
(-gapopen)
Indicates the penalty applied for opening a gap. The penalty must be negative. If no value is provided when using a substitution matrix, a value is computed automatically. Some guidelines for tuning -gapopen and -gapext: in T-Coffee, matches get a score between 0 (a match with no support in the library) and 1000 (a match perfectly consistent with the library). The default cosmetic penalty is set to -50 (5% of a perfect match). If you want to tune -gapopen and see a strong effect, you should therefore consider values between 0 and -1000. The default value is 0.
(-gapext)
Indicates the penalty applied for extending a gap. The default value is 0.
(-tg_mode)
0: terminal gaps penalized with -gapopen + -gapext*len
1: terminal gaps penalized with -gapext*len only
2: terminal gaps unpenalized.
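The three terminal-gap modes can be written as a small penalty function. The document gives the gapopen + gapext*len formula only for terminal gaps in mode 0; applying the same formula to internal gaps is an assumption of this sketch, and the function name `gap_penalty` is illustrative. The example values use the -50 cosmetic penalty mentioned above, on the 0 to 1000 match scale.

```python
def gap_penalty(length, gapopen, gapext, terminal=False, tg_mode=0):
    """Penalty for one gap of `length` positions (penalties are negative).

    Internal gaps (assumed): gapopen + gapext * length.
    Terminal gaps, by tg_mode:
      0 -> gapopen + gapext * length
      1 -> gapext * length (no opening penalty)
      2 -> 0 (unpenalized)
    """
    if length <= 0:
        return 0
    if not terminal or tg_mode == 0:
        return gapopen + gapext * length
    if tg_mode == 1:
        return gapext * length
    if tg_mode == 2:
        return 0
    raise ValueError(f"unknown tg_mode: {tg_mode}")


print(gap_penalty(3, -50, -1))                            # -53
print(gap_penalty(3, -50, -1, terminal=True, tg_mode=1))  # -3
print(gap_penalty(3, -50, -1, terminal=True, tg_mode=2))  # 0
```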
(-clean_aln)
This flag causes T-Coffee to post-process the multiple alignment. Residues with a reliability score smaller than or equal to -clean_threshold (as given by an evaluation that uses -clean_evaluate_mode) are realigned to the rest of the alignment. Residues with a score higher than the threshold form a rigid framework that cannot be altered. The cleaning algorithm is greedy: it starts from the top-left segment of low-consistency residues and works its way left to right, top to bottom along the alignment. You can have this operation carried out for several cycles using the -clean_iteration flag. The purpose of this operation is mostly cosmetic. To ensure a decent-looking alignment, the gap opening penalty (gop) is set to -20 and the gap extension penalty (gep) to -1. There is no penalty for terminal gaps, and the matrix is blosum62mt.
Note: Gaps are always considered to have a reliability score of 0.
Note: Using the cleaning option can cause memory overflow when aligning large sequences.
The default is off.
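The first step of cleaning, separating the rigid framework from the residues that may be realigned, can be sketched as below. Treating a "segment" as a maximal run of consecutive low-scoring cells within one sequence, and visiting segments row by row from the top left, is an interpretation of the description above; the realignment itself is not shown, and the 0 to 9 scores in the example are hypothetical.

```python
def low_consistency_segments(rows, scores, clean_threshold):
    """List (row, start_col, end_col) runs of cells that cleaning may realign.

    rows:   aligned sequences, '-' marks a gap
    scores: per-cell reliability scores, same shape as rows
    A cell is flexible if its score <= clean_threshold; gaps always score 0.
    Cells above the threshold belong to the rigid framework.
    Segments are listed from the top left, left to right, top to bottom.
    """
    segments = []
    for r, (row, row_scores) in enumerate(zip(rows, scores)):
        start = None
        for c, (ch, s) in enumerate(zip(row, row_scores)):
            flexible = (0 if ch == "-" else s) <= clean_threshold
            if flexible and start is None:
                start = c                      # a low-consistency run begins
            elif not flexible and start is not None:
                segments.append((r, start, c - 1))
                start = None                   # back in the rigid framework
        if start is not None:
            segments.append((r, start, len(row) - 1))
    return segments


rows = ["AC-GT", "ACCGT"]
scores = [[9, 2, 0, 9, 9], [9, 9, 9, 1, 3]]
print(low_consistency_segments(rows, scores, clean_threshold=3))
# [(0, 1, 2), (1, 3, 4)]
```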
(-clean_threshold)
See above for details.
(-clean_iteration)
See above for details.
(-iterate)
Sequences are extracted in turn and realigned to the MSA. If -iterate is set to -1, each sequence is realigned; otherwise, -iterate sets the number of iterations.
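The extract-and-realign loop can be sketched as follows. The `realign` callback is hypothetical and stands in for T-Coffee's own profile alignment; cycling through the sequences in input order when -iterate is a positive number is likewise an assumption.

```python
def refinement_order(n_seqs, iterate=-1):
    """Indices of the sequences extracted, in turn, for realignment.

    iterate == -1: every sequence once. Otherwise `iterate` rounds,
    cycling through the sequences in order (the order is an assumption).
    """
    if n_seqs == 0:
        return []
    if iterate == -1:
        return list(range(n_seqs))
    return [k % n_seqs for k in range(iterate)]


def iterate_refine(msa, realign, iterate=-1):
    """Leave-one-out refinement of an MSA given as a list of gapped rows.

    realign(others, seq) is a caller-supplied (hypothetical) function that
    aligns the ungapped seq to the rows in `others` and returns the updated
    (others, new_row), all of equal length.
    """
    msa = list(msa)
    for i in refinement_order(len(msa), iterate):
        seq = msa[i].replace("-", "")       # extract the sequence, drop gaps
        others = msa[:i] + msa[i + 1:]
        others, row = realign(others, seq)  # realign it to the rest
        msa = others[:i] + [row] + others[i:]
    return msa


# Toy callback: pads the sequence with trailing gaps, for demonstration only.
pad = lambda others, seq: (others, seq.ljust(len(others[0]), "-"))
print(iterate_refine(["AC-G", "ACTG"], pad))  # ['ACG-', 'ACTG']
```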
(-distance_matrix_mode)
This flag sets the method used to compute the distance matrix (the distance between every pair of sequences) from which the dendrogram (guide tree) is computed.
slow: the chosen dp_mode using the extended library.
fast: the fasta dp_mode using the extended library.
very_fast: the fasta dp_mode using blosum62mt.
ktup: k-tuple matching (as in Muscle).
The default value is very_fast.
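The ktup mode avoids dynamic programming altogether and compares sequences by the short words they share. The sketch below uses a fractional common k-tuple count in the spirit of Muscle's k-mer distance: F is the number of shared words (counting repeats up to the smaller count) divided by the number of words in the shorter sequence, and the distance is 1 - F. The exact formula and word length T-Coffee uses are not given here, so treat both as assumptions.

```python
from collections import Counter


def ktup_distance(x, y, k=2):
    """Distance 1 - F, with F the fraction of shared k-tuples."""
    cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    cy = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    shared = sum((cx & cy).values())      # & keeps the minimum count per word
    denom = min(len(x), len(y)) - k + 1   # words in the shorter sequence
    if denom <= 0:
        return 1.0
    return 1.0 - shared / denom


def distance_matrix(seqs, k=2):
    """All-against-all ktup distances, ready for guide-tree building."""
    return [[ktup_distance(a, b, k) for b in seqs] for a in seqs]


print(ktup_distance("ACGT", "ACGT"))  # 0.0
print(ktup_distance("AAAA", "CCCC"))  # 1.0
```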