Performing a Muscle alignment

Muscle is a fast algorithm optimised for aligning protein sequences.

ClustalW, Muscle, and T-Coffee are progressive alignment algorithms. Progressive alignments generally build a guide tree that represents the pairwise relationships between each possible pair of sequences in the alignment. A multiple sequence alignment is then built sequentially using the tree as a construction guide.

You can display this guide tree for ClustalW and T-Coffee alignments. You cannot display a guide tree for a Muscle alignment.

All three algorithms are integrated into the MSA editor. This means you can try all three algorithms on the same alignment to see the results.

Muscle does not perform full pairwise alignments to build its tree. Instead, it uses an approximate method that counts the short subsequences (k-mers, also called k-tuples or words) that each pair of sequences shares. This is much faster for alignments containing many sequences, where the number of pairwise alignments needed to construct the tree is high.
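The word-counting idea can be illustrated with a short Python sketch. This is not Muscle's actual distance measure; the function names, the word length of 3 and the normalisation by the shorter sequence are assumptions chosen only to show why counting shared words is cheap compared with aligning every pair.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k=3):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_similarity(a, b, k=3):
    """Fraction of words shared by a and b, relative to the shorter sequence."""
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    shortest = min(len(a), len(b)) - k + 1
    return shared / shortest if shortest > 0 else 0.0


def all_pair_similarities(seqs, k=3):
    """Score every pair of sequences; n sequences give n*(n-1)/2 pairs."""
    return {
        (i, j): kmer_similarity(seqs[i], seqs[j], k)
        for i, j in combinations(range(len(seqs)), 2)
    }
```

Each score needs only one pass over each sequence, whereas a full pairwise alignment of two sequences takes time proportional to the product of their lengths. With 1,000 sequences there are 499,500 pairs to score, so the saving per pair adds up quickly.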

Muscle is generally regarded as faster than ClustalW and T-Coffee, at the cost of being slightly less accurate.

To create an alignment with Muscle

Hold it down until the popup menu allows you to choose the Muscle algorithm. This setting always defaults to the last used algorithm.

Parameters

Maximum Iterations

(-maxiters, default=16)

Number of iterations. The recommendation is 2 for large alignments.

Max Trees Considered

(-maxtrees, default=1)

Maximum number of trees to consider. Usually only the first is needed.

Max Processing Time

(-maxhours, default=999 hours)

Give up after the specified length of time. The value is a floating-point number of hours.

Profile

The protein profile scoring function (only selectable for amino acid alignments).

- log-expectation (-le) - default

- PAM200 (-sp)

- VTML240 (-sv)

Diagonals

- OFF (default)

- ON (-diags) - use 6-mer words common to two sequences as seeds for diagonals.

- 1st Iteration (-diags1) - use the optimization only in the first iteration.

- 2nd Iteration (-diags2) - use the optimization only in the second iteration.

No Anchors (-noanchors) - disable the use of "vertical blocks" to anchor alignments. This is very slow, so only use it if absolute accuracy is required.
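The flags listed above are command-line options of the stand-alone Muscle 3.x program. As a rough sketch of how the dialog settings map onto those flags, the hypothetical Python helper below builds an argument list. The executable name `muscle`, the `-in` and `-out` options and the file names are assumptions about a separately installed command-line copy; how MacVector itself invokes Muscle is not described here.

```python
def muscle_args(infile, outfile, maxiters=16, maxtrees=1, maxhours=None,
                profile=None, diagonals="off", noanchors=False):
    """Build an argument list for a stand-alone Muscle 3.x executable.

    profile: None (leave Muscle's default), "log-expectation", "PAM200"
             or "VTML240"; only meaningful for amino acid alignments.
    diagonals: "off", "on", "1st" or "2nd".
    """
    profile_flags = {"log-expectation": "-le", "PAM200": "-sp", "VTML240": "-sv"}
    diagonal_flags = {"off": None, "on": "-diags", "1st": "-diags1", "2nd": "-diags2"}
    if profile is not None and profile not in profile_flags:
        raise ValueError(f"unknown profile: {profile!r}")
    if diagonals not in diagonal_flags:
        raise ValueError(f"unknown diagonals setting: {diagonals!r}")

    # "muscle", "-in" and "-out" are assumptions about the command-line tool.
    args = ["muscle", "-in", infile, "-out", outfile,
            "-maxiters", str(maxiters), "-maxtrees", str(maxtrees)]
    if maxhours is not None:
        args += ["-maxhours", str(maxhours)]
    if profile is not None:
        args.append(profile_flags[profile])
    if diagonal_flags[diagonals] is not None:
        args.append(diagonal_flags[diagonals])
    if noanchors:
        args.append("-noanchors")
    return args


# Example: a single iteration with diagonals on and the VTML240 profile.
fast_protein = muscle_args("proteins.fasta", "proteins.afa",
                           maxiters=1, profile="VTML240", diagonals="on")
```

The resulting list can be passed to `subprocess.run` if a command-line copy of Muscle is available on the system.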

Fastest Speed:

Here are the recommended options for the fastest possible alignment:

Protein

Maximum Iterations = 1, Diagonals = ON, Profile = VTML240

DNA

Maximum Iterations = 1, Diagonals = ON

Length:

There is a maximum usable length of around 10,000 residues. Muscle is primarily a protein alignment tool.

Alignments without a common region

Because they first build a guide tree from pairwise alignments, progressive alignment algorithms have difficulty with alignments in which the sequences do not all share a common region. For example, take an alignment of three sequences: one 5 kb sequence and two subsequences of it, each around 1 kb long. One subsequence aligns to positions 1,000 to 2,000 of the long sequence and the other aligns to positions 4,000 to 5,000. Most progressive algorithms try to create initial pairwise alignments of all combinations of sequences, and this skews the result towards alignments in which at least one segment overlaps between all of the input sequences. Because it does not create pairwise alignments, Muscle is unique amongst these three algorithms in that it will align this type of data, as long as the "Diagonals" optimization parameter is set to "On".
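The following Python sketch illustrates how shared 6-letter words can place each fragment on the long sequence independently, without needing a region common to all three sequences. It is a simplified illustration of diagonal seeding in general, not Muscle's implementation; the function names and the randomly generated test sequence are assumptions.

```python
import random
from collections import Counter, defaultdict


def word_index(seq, k=6):
    """Map each k-letter word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def best_diagonal(long_seq, fragment, k=6):
    """Return (offset, hits) for the diagonal with the most shared words.

    offset is the start position in long_seq minus the start position
    in fragment, so every word match on one diagonal has the same offset.
    """
    index = word_index(long_seq, k)
    diagonals = Counter()
    for j in range(len(fragment) - k + 1):
        for i in index.get(fragment[j:j + k], ()):
            diagonals[i - j] += 1
    return diagonals.most_common(1)[0] if diagonals else (None, 0)


# A made-up 5 kb sequence and two 1 kb fragments cut from it.
random.seed(0)
long_seq = "".join(random.choice("ACGT") for _ in range(5000))
frag_a = long_seq[1000:2000]
frag_b = long_seq[4000:5000]

offset_a, hits_a = best_diagonal(long_seq, frag_a)  # offset_a is 1000
offset_b, hits_b = best_diagonal(long_seq, frag_b)  # offset_b is 4000
```

The two fragments share no region with each other, yet each one is located by its own diagonal on the long sequence.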

Related Topics

The MSA Editor

Creating an Alignment

T-Coffee

Displaying alignments

Performing a ClustalW alignment

Guide tree