The SNP tab in the Align to Reference editor determines possible and probable SNPs in the reference sequence and aligned reads.
First, SNPs are determined in each aligned read. Any single base in a read that differs from the reference sequence is considered a putative SNP. If the read does not contain trace data, all mismatches are considered possible SNPs. If the read does contain trace data, the putative SNP is scored by comparing the area under the curve for the called base with the total area under the curves for all four bases. The thresholds are 90% for probable and 75% for possible; if the score is below 75%, the mismatch is not regarded as a SNP. For example, if "A" accounts for 90% or more of the signal it is a "Probable" SNP, and if it accounts for at least 75% but less than 90% it is a "Possible" SNP.
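The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the program's actual code: the function names, the dictionary of per-base peak areas, and the treatment of a score of exactly 75% as "possible" are all assumptions.

```python
from typing import Mapping, Optional

PROBABLE_CUTOFF = 90.0  # percent, threshold stated in the text
POSSIBLE_CUTOFF = 75.0  # percent, threshold stated in the text


def score_called_base(called: str, peak_areas: Mapping[str, float]) -> float:
    """Return the called base's share of the total peak area, as a percentage."""
    total = sum(peak_areas.get(base, 0.0) for base in "ACGT")
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * peak_areas.get(called, 0.0) / total


def classify_mismatch(called: str,
                      peak_areas: Optional[Mapping[str, float]]) -> Optional[str]:
    """Classify a mismatch as 'probable', 'possible', or None (not a SNP).

    peak_areas is None for a read without trace data (hypothetical convention).
    """
    if peak_areas is None:
        # No trace data: every mismatch is treated as a possible SNP.
        return "possible"
    score = score_called_base(called, peak_areas)
    if score >= PROBABLE_CUTOFF:
        return "probable"
    if score >= POSSIBLE_CUTOFF:  # assumption: exactly 75% counts as possible
        return "possible"
    return None
```

For instance, a called "C" whose peak covers 92 of 100 area units would be classified as probable, while one covering 80 of 100 would be possible.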
The SNPs found in each read are listed at the bottom of the SNP tab, below the SNP analysis of the reference sequence. An example output for a single read called B02b is as follows:
B02b length 745
1 probable SNPs, 4 possible SNPs
Probable SNPs
    998 T -> C (92.0)
Possible SNPs
    518 T -> G (79.5)
    705 C -> T (87.2)
    1119 C -> G (89.8)
    1120 T -> G (89.9)
The number is the position of the putative SNP in the reference sequence. The first base is the reference base and the second is the putative SNP. The percentage score is shown in brackets.
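If the per-read listing is copied out of the SNP tab as text, entries like those in the B02b example can be pulled apart with a regular expression. The sketch below assumes the entry layout seen in that example (position, reference base, "->", SNP base, score in brackets); the names `ReadSnp` and `parse_read_snps` are hypothetical, and entries without a bracketed score would not be matched.

```python
import re
from typing import List, NamedTuple


class ReadSnp(NamedTuple):
    position: int   # position in the reference sequence
    ref_base: str   # base in the reference
    snp_base: str   # base called in the read
    score: float    # percentage score shown in brackets


# Matches entries such as "998 T -> C (92.0)"; layout assumed from the example.
SNP_ENTRY = re.compile(r"(\d+)\s+([ACGT])\s*->\s*([ACGT])\s*\((\d+(?:\.\d+)?)\)")


def parse_read_snps(text: str) -> List[ReadSnp]:
    """Extract every SNP entry found in a block of per-read output."""
    return [
        ReadSnp(int(pos), ref, snp, float(score))
        for pos, ref, snp, score in SNP_ENTRY.findall(text)
    ]
```

Header text such as "B02b length 745" is skipped because it does not fit the entry pattern.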
For the reference sequence, SNPs are grouped according to whether they are possible or probable. All SNPs identified in a read are shown, even if only a single read contains that SNP. A SNP is regarded as probable if one or more reads contain a probable SNP at that position. Note: a SNP that has been identified in a read without trace data will be regarded as a probable SNP, even if any corresponding SNPs in trace files are only possible SNPs.
Probable SNPs in the Reference Sequence
    616 G -> T = 20% (1/5)
    705 C -> T = 33% (1/3)
    998 T -> C = 16% (1/6)
    2813 T -> C = 12% (1/8)
    3080 T -> G = 16% (1/6)
Probable+Possible SNPs in the Reference Sequence
    518 T -> G = 16% (1/6)
    554 T -> G = 20% (1/5)
    616 G -> T = 20% (1/5)
    705 C -> T = 100% (3/3)
    998 T -> C = 66% (4/6)
    1119 C -> G = 14% (1/7)
    1120 T -> G = 14% (1/7)
    1395 C -> T = 14% (1/7)
    1410 C -> T = 14% (1/7)
    2128 G -> T = 14% (1/7)
    2813 T -> C = 12% (1/8)
    3080 T -> G = 16% (1/6)
    3196 T -> G = 20% (1/5)
The number is the position of the putative SNP in the reference sequence. The first base is the reference base and the subsequent bases are those that differ from the reference. The figure in brackets is the number of reads that contain this base with a score exceeding the cut-off values mentioned above, out of the number of reads aligned at that position. The percentage expresses the same ratio.
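The reference-level figures can be reproduced with a short Python sketch. It rests on several assumptions: the denominator is taken to be the number of reads aligned at the position, the percentage is truncated rather than rounded (which matches the sample, where 4/6 is shown as 66%), and the special handling of reads without trace data is ignored. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def count_calls(calls: Sequence[Optional[str]]) -> Tuple[int, int, int]:
    """Summarise per-read calls at one reference position.

    Each call is 'probable', 'possible', or None (read matches the reference
    or scored below the cut-off). Returns (probable, probable+possible, reads).
    """
    probable = sum(1 for call in calls if call == "probable")
    either = sum(1 for call in calls if call in ("probable", "possible"))
    return probable, either, len(calls)


def format_reference_snp(position: int, ref_base: str, snp_base: str,
                         reads_with_snp: int, reads_at_position: int) -> str:
    """Format one entry in the style of the reference SNP listing."""
    # Integer division truncates, e.g. 4/6 gives 66 rather than 67.
    percent = (100 * reads_with_snp) // reads_at_position
    return (f"{position} {ref_base} -> {snp_base} = {percent}% "
            f"({reads_with_snp}/{reads_at_position})")


# Position 998 in the sample: six reads, one probable and three possible calls.
calls_998 = ["probable", "possible", "possible", "possible", None, None]
probable, either, reads = count_calls(calls_998)
print(format_reference_snp(998, "T", "C", probable, reads))
print(format_reference_snp(998, "T", "C", either, reads))
```

The first printed entry corresponds to the "Probable" group and the second to the "Probable+Possible" group for the same position.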