MSA Editor: Calculating the consensus line

Underneath the aligned sequences, you can display a line of residue codes that shows the nature and extent of their consensus. To access the consensus calculation options, select the Consensus tab in the Multiple Alignment Options dialog box.

MacVector offers multiple modes for calculating the consensus line:

Nucleic Acid alignments

- Consensus Identity

- % Identity

- DNA Residues

Protein alignments

- ClustalW default groups

- Consensus Identity

- % Identity

and the property-based color groups:

- Functionality

- Alpha-helix

- Alpha-helix + P450

- Hydrophobicity + Charge

- Acidity + Basicity

- Smooth Scaling

- Structural Position

- Steric Bulk

- Gascuel & Golmard

- Chou-Fasman Alpha Helix

- Chou-Fasman Beta Sheet

- Levitt Alpha Helix Forming

- Levitt Beta Sheet Forming

- Levitt Turn Forming

- Dayhoff Matrix

- Helix Terminators

- Chemical Type

User-defined color groups will also appear in these menus.

Algorithm options

MacVector offers a choice of two methods for dealing with conflicting bases at a position when calculating the consensus of a nucleic acid alignment:

USE THRESHOLD mode

The Use threshold option generates a consensus residue code for each position based on the highest percentage agreement that can be found between the aligned sequences. For example, if in 10 aligned sequences there are 9 Gs and one C, there is 90% agreement with a consensus code of G. The percentage agreement must exceed a minimum threshold value. Use the Threshold text box to set this value, within the range 51-100%. If the consensus threshold is not met by any single-base code, MacVector then tests all 2-base ambiguities against the criterion (see "IUPAC key" in the Windows help menu). For example, if the separate percentages of G and C fail the criterion, but their combined percentage meets it, the consensus line will indicate S at that position. If no 2-base ambiguity meets the criterion, MacVector then tests all 3-base ambiguities. If all these fail, the consensus residue at that position will be N.

In Threshold mode, ambiguous base codes in sequences are treated as equal probabilities of the alternative bases (e.g. an S residue contributes 0.5 C and 0.5 G to the consensus base for its position).
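The threshold procedure described above can be sketched in Python for a single alignment column. This is an illustrative sketch, not MacVector's actual implementation: the names `IUPAC`, `base_fractions` and `threshold_consensus` are hypothetical, the column is assumed to contain no gaps, a candidate is accepted when its percentage is greater than or equal to the threshold, and ties between equally good candidates are broken arbitrarily (first found).

```python
from fractions import Fraction
from itertools import combinations

# Standard IUPAC nucleotide codes mapped to the bases they stand for (DNA only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: a set of bases -> the IUPAC code that represents it.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def base_fractions(column):
    """Return the fraction of a gap-free column attributable to each base.

    An ambiguous code contributes equal shares to its alternatives,
    e.g. S adds 0.5 to C and 0.5 to G.
    """
    totals = dict.fromkeys("ACGT", Fraction(0))
    for residue in column:
        bases = IUPAC[residue.upper()]
        for base in bases:
            totals[base] += Fraction(1, len(bases))
    return {base: total / len(column) for base, total in totals.items()}


def threshold_consensus(column, threshold=51.0):
    """Return a consensus code for one column (hypothetical helper).

    Tries single bases, then 2-base ambiguities, then 3-base ambiguities,
    and falls back to N if nothing reaches the threshold.
    Assumption: "meets the threshold" means >= threshold.
    """
    if not column:
        return "N"
    fractions = base_fractions(column)
    for size in (1, 2, 3):
        best_code, best_pct = None, 0
        for combo in combinations("ACGT", size):
            pct = 100 * sum(fractions[base] for base in combo)
            if pct >= threshold and pct > best_pct:
                best_code, best_pct = CODE_FOR[frozenset(combo)], pct
        if best_code is not None:
            return best_code
    return "N"
```

For the example in the text, `threshold_consensus("GGGGGGGGGC", 51)` returns `G` (90% agreement), while raising the threshold to 95 makes the single-base test fail and yields `S`.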

IGNORE THRESHOLD mode

When the Ignore Threshold option is selected, the residue code on the consensus line must match every residue in that column, so that ambiguous residue codes may be required. For example, if in 10 aligned sequences there are 9 Gs and one C, the consensus code is S. The Threshold text box is disabled when you select this option, since the effective threshold is fixed at 100%.
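A minimal sketch of this mode, for illustration only: the consensus code is the IUPAC code that covers every base present in the column. The names `IUPAC` and `covering_consensus` are hypothetical, the column is assumed to contain no gaps, and expanding ambiguous input codes into their alternative bases is an assumption, since the text does not say how MacVector treats them in this mode.

```python
# Standard IUPAC nucleotide codes mapped to the bases they stand for (DNA only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: a set of bases -> the IUPAC code that represents it.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def covering_consensus(column):
    """Return the IUPAC code that matches every residue in a gap-free column."""
    present = set()
    for residue in column:
        # Assumption: an ambiguous input code counts as all of its alternatives.
        present.update(IUPAC[residue.upper()])
    return CODE_FOR[frozenset(present)]
```

With nine Gs and one C, `covering_consensus("GGGGGGGGGC")` returns `S`, matching the example above; a column containing all four bases returns `N`.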

For aligned protein sequences, the threshold method is always used to generate the consensus code.

TREATMENT OF GAPS in sequences

The Treat gaps as valid characters check box lets you choose whether to treat gaps as valid characters in the consensus calculation, or to ignore them. If gaps are valid characters, the consensus line shows a gap wherever the percentage of gaps exceeds the threshold. For example, if the threshold is set at 51% and there are 11 aligned sequences, with 6 gaps, 3 Gs and 2 Cs, then the consensus code for that position will be "-" (gap), with about 55% agreement (6 of 11). If gaps are ignored, agreement is calculated as a percentage of the remaining residues. In the example above, the consensus code will be G, with 60% agreement (3 of the 5 non-gap residues).
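The arithmetic in this example can be checked with a short Python sketch. The function name `top_residue` and its `gaps_valid` parameter are hypothetical and only mirror the check box described above; the sketch reports the most common character and its percentage agreement, and leaves the threshold comparison to the caller.

```python
from collections import Counter


def top_residue(column, gaps_valid):
    """Return (most common character, percentage agreement) for one column.

    If gaps_valid is False, gap characters are dropped before counting,
    so agreement is a percentage of the remaining residues.
    """
    counts = Counter(column)
    if not gaps_valid:
        counts.pop("-", None)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0  # column held nothing but ignored gaps
    residue, count = counts.most_common(1)[0]
    return residue, 100.0 * count / total


# The column from the example: 6 gaps, 3 Gs and 2 Cs in 11 sequences.
column = "-" * 6 + "GGG" + "CC"
print(top_residue(column, gaps_valid=True))   # gap, 6 of 11
print(top_residue(column, gaps_valid=False))  # G, 3 of 5
```

With gaps counted, the gap character wins with 6/11 (roughly 54.5%, shown as 55%); with gaps ignored, G wins with 3/5, or 60%.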

Appearance of the consensus line

The No gaps in consensus check box allows you to replace the gap character "-" in the consensus line with a space.

The No spaces in consensus check box lets you replace undefined characters in the consensus by N (in nucleotide windows) or X (in protein windows).

The No ambiguities in consensus check box is used only with nucleotide alignments. It allows you to display only unambiguous residue codes (A, G, C, T or U) on the consensus line.

Related Topics

The MSA Editor

Setting MSA Editor display options

Displaying a consensus line

Working with color groups