The consensus shown below the reference sequence in the Align to Reference Editor is calculated according to the following rules:
- For each consensus residue, the algorithm counts the number of overlapping A, G, C, T and gap residues. Scoring is as follows:
A non-ambiguous residue scores 12.
A two-residue ambiguity scores 6 for each residue.
A three-residue ambiguity scores 4 for each residue.
An 'N' scores 3 for each residue.
A gap scores 12 for the gap.
For example, Y would score 6 for C and 6 for T, whereas B (not A) would score 4 each for C, G and T.
- If there are no overlapping residues or gaps, the consensus is set to a space. Otherwise, the required consensus threshold is calculated as the user-defined threshold percentage multiplied by the maximum achievable score. So if there are 10 overlapping residues, the maximum achievable score is 10 * 12 = 120, and with the threshold set to 75% the threshold score would be 0.75 * 120 = 90.
- If any single residue or the gap score meets or exceeds the threshold, then that residue is chosen as the consensus.
- Otherwise, if a single residue plus the gap score meets or exceeds the threshold, then the lower case of the residue is used as the consensus.
- Otherwise, we repeat with all the two-residue ambiguities. If a combination of two residues meets or exceeds the threshold, that IUPAC ambiguity is used as the consensus. If two (or more) different two-residue ambiguities meet or exceed the threshold but have the same score, then we use 'N'; otherwise we return the higher-scoring ambiguity.
- Otherwise, if the highest scoring two residue ambiguity (not three residue ambiguities) plus the gap score meets or exceeds the threshold, we return the lower case ambiguity.
- If all else fails, we return 'N'.
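The rules above can be sketched in Python as follows. This is only an illustrative reading of the list, not the program's actual code: the names `column_scores` and `call_consensus`, the use of "-" as the gap symbol, and the way ties are broken in the single-residue steps (first of A, C, G, T wins) are all assumptions made for the example.

```python
from itertools import combinations

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_FOR = {bases: code for code, bases in IUPAC.items()}
GAP = "-"   # assumed gap symbol
FULL = 12   # score of one unambiguous residue or one gap


def column_scores(column):
    """Sum the per-base and gap scores for one alignment column."""
    scores = dict.fromkeys("ACGT" + GAP, 0)
    for sym in column:
        if sym == GAP:
            scores[GAP] += FULL
        else:
            bases = IUPAC[sym.upper()]
            for base in bases:
                # 12, 6, 4 or 3 depending on how ambiguous the code is.
                scores[base] += FULL // len(bases)
    return scores


def call_consensus(column, threshold_pct):
    """Return the consensus character for one column (hypothetical helper)."""
    if not column:
        return " "
    scores = column_scores(column)
    max_score = FULL * len(column)

    def ok(score):
        # Integer comparison avoids floating-point rounding at the boundary.
        return score * 100 >= threshold_pct * max_score

    gap = scores[GAP]
    # Single residue or gap on its own.
    best = max("ACGT" + GAP, key=lambda s: scores[s])
    if ok(scores[best]):
        return best
    # Single residue plus gap: lower-case residue.
    base = max("ACGT", key=lambda b: scores[b])
    if ok(scores[base] + gap):
        return base.lower()
    # Two-residue ambiguities; a tie for the top score gives 'N'.
    pairs = {a + b: scores[a] + scores[b] for a, b in combinations("ACGT", 2)}
    top = max(pairs.values())
    winners = [p for p, s in pairs.items() if s == top]
    if ok(top):
        return CODE_FOR[winners[0]] if len(winners) == 1 else "N"
    # Best two-residue ambiguity plus gap: lower-case ambiguity.
    if ok(top + gap):
        return CODE_FOR[winners[0]].lower()
    return "N"


print(call_consensus("CCCCTTT---", 75))
```

With ten overlapping symbols and a 75% threshold the required score is 90, so a column of six A and four gaps is called as "a" under this sketch, because A alone scores 72 but A plus gap scores 120.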
For example, if 10 reads are aligned, the maximum score is 120. If the threshold is set to, say, 75%, we first check whether any individual residue (or the gap) scores 90 or above. If not, we look for the highest-scoring combination of two residues (or a residue and a gap). If that combination meets or exceeds 90, that ambiguity is called (a residue plus a gap is called as the lower-case residue). If there is a tie, I *think* we call the triple ambiguity. If no combination of two residues meets or exceeds the threshold, we try a three-way score. Remember that a three-way score could be A+T+gap, which would then be called as a lower-case "w". After that we try a four-way or even a five-way score; e.g. for 2A+2G+2C+2T+2gap you would get a consensus of lower-case "n".
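As a small illustration of the arithmetic and naming in this example, the sketch below computes the threshold score and looks up the IUPAC letter for a combination of residues, lower-casing it when a gap is part of the combination. The function names `threshold_score` and `combination_code` are hypothetical and exist only for this example; they are not part of the program.

```python
# Sorted base combination -> IUPAC ambiguity code.
IUPAC_BY_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V",
    "ACGT": "N",
}


def threshold_score(n_overlapping, threshold_pct, full_score=12):
    """Score a candidate must reach, e.g. 75% of 10 * 12."""
    return threshold_pct * n_overlapping * full_score / 100


def combination_code(bases, include_gap=False):
    """IUPAC letter for a set of bases; lower case if a gap is included."""
    key = "".join(sorted(set(bases.upper())))
    code = IUPAC_BY_BASES[key]
    return code.lower() if include_gap else code


print(threshold_score(10, 75))                      # 90.0
print(combination_code("AT", include_gap=True))     # w
print(combination_code("ACGT", include_gap=True))   # n
```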
Related Topics.
The main window
Automatic assembly
SNP Report
Editing assemblies
Saving assemblies