The heterozygote analysis tool lets you either view heterozygotes in Sanger trace files or permanently change the basecalled sequence, replacing each called heterozygote with an ambiguity code. The tool works on multiple trace files in the Assembly project manager or the Align to Reference editor. You can also run it on a single trace file in the Single Trace Editor.
How to run Heterozygote Analysis
To view putative heterozygotes
- Select trace files in the Align to Reference editor or Assembly Project Editor, or open a trace file in the Single Trace Editor.
- Run ANALYZE | HETEROZYGOTE ANALYSIS
- An Options dialog will appear. Adjust the options if needed and click OK.
- A summary dialog will appear showing how many heterozygotes were found and in how many sequences.
- Click OK
- A new tab will appear in the Results window showing the location of the possible heterozygotes.
You can click the highlighted blue position to jump to that heterozygote. Note that if you are working in the Assembly Project or Align to Reference editor, the heterozygote is displayed in the Single Trace Editor.
To permanently basecall the sequence
Assembler
- Add trace files to your Assembly Project
- Select one or more trace files
- Select ANALYZE | BASECALL | USING HETEROZYGOTE ANALYSIS
- An Options dialog will appear. Adjust the options if needed and click OK.
- A summary dialog will appear showing how many heterozygotes were found and in how many sequences.
- Click OK
Align to Reference
- Go to FILE | NEW | ALIGN SEQUENCES TO A REFERENCE
- Choose your reference sequence
- Add trace files to the alignment
- Select one or more trace files
- Select the BASECALL toolbar button or ANALYZE | BASECALL | USING HETEROZYGOTE ANALYSIS
- An Options dialog will appear. Adjust the options if needed and click OK.
- A summary dialog will appear showing how many heterozygotes were found and in how many sequences.
- Click OK
Trace File Editor
- Double-click a trace file to open it.
- Select ANALYZE | BASECALL | USING HETEROZYGOTE ANALYSIS
- An Options dialog will appear. Adjust the options if needed and click OK.
- A summary dialog will appear showing the number of heterozygotes found.
- Click OK
Once the analysis has run, a new BASECALL line shows the new sequence, with each heterozygote indicated by an ambiguity code. In the Assembler Project editor, an H also appears in the status column.
Ensure the BASECALL toolbar button is toggled to green to view basecalled lines.
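The text does not list which ambiguity symbols are written into the BASECALL line. Assuming the tool uses the standard IUPAC nucleotide codes (an assumption, not confirmed here), the following minimal Python sketch shows how a called two-base heterozygote maps to a single ambiguity character. `ambiguity_code` and `apply_heterozygotes` are hypothetical helper names, not part of the program.

```python
# Standard IUPAC two-base ambiguity codes. Assumption: the tool uses these.
IUPAC_PAIRS = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}
VALID_BASES = set("ACGT")


def ambiguity_code(base1: str, base2: str) -> str:
    """Return the IUPAC code for a pair of called bases (hypothetical helper)."""
    b1, b2 = base1.upper(), base2.upper()
    if b1 not in VALID_BASES or b2 not in VALID_BASES:
        raise ValueError(f"not a valid base pair: {base1}/{base2}")
    if b1 == b2:
        return b1  # homozygous position: keep the single base
    return IUPAC_PAIRS[frozenset((b1, b2))]


def apply_heterozygotes(seq: str, calls: dict[int, tuple[str, str]]) -> str:
    """Write an ambiguity code at each 0-based position listed in `calls`."""
    out = list(seq)
    for pos, (a, b) in calls.items():
        out[pos] = ambiguity_code(a, b)
    return "".join(out)


print(apply_heterozygotes("ACGTACGT", {1: ("C", "T"), 6: ("G", "A")}))  # → AYGTACRT
```

Because the pair is stored as a `frozenset`, the order in which the two bases are reported does not matter: an A/G and a G/A call both become R.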
Settings
The default settings should be sufficient for most trace files, but a number of settings can be adjusted where needed. The default value of each setting is shown in brackets.
- Percent of each peak width to use (50%)
- Minimum Heterozygote threshold (35%)
- Minimum Base Call threshold (75%). This setting has two effects, described below.
- Use peak normalization (on)
- Normalise peak heights using a window of (25) residues
- Minimum number of normalised residues (3)
- Ignore low quality regions (yes), where a window of (21) residues has an average quality value of less than (20)
When peak normalization is off, the calls are based purely on the area under the curve for each trace at each residue. The midpoint of each peak is set by the original basecall and is NOT adjusted, and "baseline noise" is not taken into account. Both modes (normalized and non-normalized) still use a "massaged" set of trace curves: for each peak, the expected "bleed-over" from the adjacent peaks is subtracted to better isolate the true signal. The area under the curve for each trace is then taken over a window around the peak whose size is set by Percent of each peak width to use. In practice, peaks are typically only 11-12 "units" apart, so a full peak covers about 5 units on either side of the peak, and only around 11 values are added together to get the area for each trace. With the peak width set to 50%, only about 5 values are used: the peak itself, 2 units before it and 2 units after it.
Without normalization, the values for the four traces always add up to 100%, so the two residues of a heterozygote should NEVER exceed 100% combined. With normalization, the percentage can easily exceed 100% if both residues have a stronger signal than would be expected from the surrounding residues used for the normalization.
The Minimum Base Call threshold has two effects:
- If an individual trace peak exceeds the threshold (75% of the expected value by default), the position is assumed to be a clean single-residue call. It is not considered as a potential heterozygote, and the analysis moves on to the next residue. This check does NOT confirm that the called residue actually matches that trace. It helps with primer "blobs", where more than one trace may be very high at a position (particularly at the beginning of reads): such positions are essentially noise and are skipped rather than treated as heterozygotes.
- To call a heterozygote, the combined signal of the two residues must exceed the threshold. For example, if A is 35% of the expected value and G is 35% of the expected value, their combined signal is only 70%, which falls below the 75% threshold, so no heterozygote is called.
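The call logic for the non-normalized mode can be illustrated with a simplified Python sketch. It is not the program's implementation, and several assumptions are made: traces are plain lists of intensities per base, the peak position and peak spacing come from the original basecall, bleed-over subtraction and peak normalization are omitted, the "expected value" is taken to be the total area of all four traces, and the weaker of the two residues must reach the Minimum Heterozygote threshold. `HetSettings`, `peak_areas`, `call_position` and `single_point` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

BASES = "ACGT"


@dataclass(frozen=True)
class HetSettings:
    """Hypothetical container mirroring three of the dialog settings."""
    peak_width_pct: float = 50.0      # Percent of each peak width to use
    het_threshold: float = 35.0       # Minimum Heterozygote threshold
    basecall_threshold: float = 75.0  # Minimum Base Call threshold


def peak_areas(traces: dict[str, list[float]], peak: int, spacing: int,
               pct: float) -> dict[str, float]:
    """Sum each trace over a window centred on the basecall position `peak`.

    A full peak spans (spacing - 1) // 2 points either side (5 for 11-12);
    `pct` scales that, so 50% keeps the peak plus 2 points either side.
    """
    half = int(((spacing - 1) // 2) * pct / 100)
    lo, hi = max(peak - half, 0), peak + half + 1
    return {b: float(sum(traces[b][lo:hi])) for b in BASES}


def call_position(traces: dict[str, list[float]], peak: int, spacing: int,
                  s: HetSettings = HetSettings()) -> Optional[tuple[str, str]]:
    """Return the two bases of a heterozygote at `peak`, or None.

    Non-normalized mode only; bleed-over correction and normalization omitted.
    """
    areas = peak_areas(traces, peak, spacing, s.peak_width_pct)
    total = sum(areas.values())
    if total <= 0:
        return None
    pct = {b: 100.0 * a / total for b, a in areas.items()}  # sums to 100%
    first, second = sorted(pct, key=pct.get, reverse=True)[:2]
    # Effect 1: one dominant trace means a clean single-residue call; skip it.
    if pct[first] > s.basecall_threshold:
        return None
    # Assumption: the weaker of the two residues must reach the het threshold.
    if pct[second] < s.het_threshold:
        return None
    # Effect 2: the two residues together must exceed the base-call threshold.
    if pct[first] + pct[second] <= s.basecall_threshold:
        return None
    return first, second


def single_point(values: dict[str, float], length: int = 11,
                 peak: int = 5) -> dict[str, list[float]]:
    """Hypothetical test traces with signal only at the peak position."""
    traces = {b: [0.0] * length for b in BASES}
    for b, v in values.items():
        traces[b][peak] = v
    return traces


print(call_position(single_point({"A": 45, "G": 40, "C": 10, "T": 5}), 5, 11))   # → ('A', 'G')
print(call_position(single_point({"A": 35, "G": 35, "C": 15, "T": 15}), 5, 11))  # → None
```

The second call reproduces the worked example: A and G each reach 35%, but their combined 70% does not exceed the 75% Minimum Base Call threshold, so no heterozygote is reported.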
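The low-quality filter can be sketched as a sliding window over the per-residue quality values. The exact rule is not stated, so this Python sketch assumes that a residue is ignored if ANY full window of 21 residues containing it averages below 20, and that reads shorter than the window are judged on their overall mean. `low_quality_mask` is a hypothetical name.

```python
def low_quality_mask(quals: list[int], window: int = 21,
                     min_avg: float = 20.0) -> list[bool]:
    """Flag every position covered by a window whose mean quality < min_avg.

    Assumption: a residue is "low quality" if ANY full window containing it
    averages below the cutoff; reads shorter than `window` use their overall mean.
    """
    n = len(quals)
    if n < window:
        low = n > 0 and sum(quals) / n < min_avg
        return [low] * n
    mask = [False] * n
    total = sum(quals[:window])  # running sum of the current window
    for start in range(n - window + 1):
        if start:
            # Slide the window one residue: add the new end, drop the old start.
            total += quals[start + window - 1] - quals[start - 1]
        if total / window < min_avg:
            for i in range(start, start + window):
                mask[i] = True
    return mask


quals = [5] * 21 + [40] * 40  # a poor 5' end followed by good sequence
print(sum(low_quality_mask(quals)))  # → 29
```

With this input, every window starting at positions 0 to 8 averages below 20, so the flagged region runs from position 0 to the end of the last such window (position 28). Positions flagged True would simply be skipped by the heterozygote search.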
Related Topics
Assembler
The project window
Automatic assembly
Saving assemblies
Assembling sequences
Vector trimming
Assembling
Assembler Parameters