Dotplots are among the earliest pairwise alignment tools used in molecular biology. They are invaluable for finding repeats within a sequence and for finding areas of similarity between two sequences. The MacVector implementation of DNA matrix analysis uses techniques developed by Pustell. The method enables you to:
- look for regions of similarity between two sequences using a dot matrix plot
- display matching regions as either diagonal lines or as diagonally arranged characters whose values indicate the degree of similarity
- look for regions of similarity at the amino acid level between a nucleic acid sequence and a protein sequence using a dot matrix plot
- look for repeats and inverted repeats
To perform a protein - DNA matrix analysis:
- Choose Analyze | Pustell Protein - DNA.
- In the left-hand scrolling list, select a protein sequence to display along the X-axis. Only protein sequences appear in this list.
- In the right-hand scrolling list, select a DNA sequence to display along the Y-axis. Only DNA sequences appear in this list.
- You can limit the analysis to a region of each of the sequences by typing in the numbers that bracket the region in the X-Region and Y-Region text boxes, or by selecting a region from the features table drop-down menu that appears to the right of the text boxes.
- Select the Scoring Matrix button to choose a scoring matrix file.
- Enter a window size in the window size text box. Whenever MacVector finds an exact match, it examines the segment (or window) of the aligned sequence that surrounds the matching region. The length of the segment is the value typed in for window size.
- Enter a minimum score in the min. % score text box. Whenever MacVector finds an exact match, it computes a total score for the window using the match / mismatch scores in the scoring matrix. It then determines a percent score by dividing the window's score by the score that would occur if every position in the window matched. If this percent score equals or exceeds the value of min. % score, the window is saved.
- Choose a value from the hash value drop-down menu. The hash value is the length an exact match between the two sequences must reach before MacVector attempts to score and align that matching region. A hash value of 1 is the most sensitive; 2 is the least sensitive.
- Use the strand drop-down menu to choose whether to use the plus strand of the DNA sequence (++), the reverse complement strand (+-), or both. For the initial analysis, we recommend that you choose both.
- Choose the genetic code for the translation of the DNA from the genetic code drop-down menu.
- Select OK to perform the analysis.
Alternatively, select Defaults to restore the default settings, or Cancel to close the dialog box without performing the analysis.
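The window scoring described in the window size and min. % score steps can be sketched in Python as follows. This is only an illustration, not MacVector's code: the `toy_score` function stands in for a real scoring matrix file, the window is assumed to be centred on the matching position, the "all positions match" score is assumed to be the sum of each protein residue scored against itself, and windows that run off the end of a sequence are simply truncated.

```python
# Toy scoring scheme (an assumption, not a MacVector scoring matrix file):
# identical residues score +5, any mismatch scores -2.
def toy_score(a: str, b: str) -> int:
    return 5 if a == b else -2


def window_percent_score(protein, translated, i, j, window, score=toy_score):
    """Percent score of the window centred on the diagonal through (i, j).

    protein    -- amino acid sequence on the X-axis
    translated -- one translated reading frame of the DNA sequence
    i, j       -- positions of the exact match in each sequence
    window     -- number of aligned positions to examine
    """
    half = window // 2
    total = 0    # score actually observed in the window
    perfect = 0  # score if every position in the window matched
    for k in range(-half, window - half):
        pi, tj = i + k, j + k
        # Positions outside either sequence are skipped (truncated window).
        if 0 <= pi < len(protein) and 0 <= tj < len(translated):
            total += score(protein[pi], translated[tj])
            perfect += score(protein[pi], protein[pi])
    return 100.0 * total / perfect if perfect > 0 else 0.0


def window_is_saved(percent_score: float, min_percent_score: float) -> bool:
    # The window is kept when it equals or exceeds the threshold.
    return percent_score >= min_percent_score
```

With this toy scheme, a five-residue window containing a single mismatch scores (4 x 5 - 2) / 25 = 72%, so it would be saved at a min. % score of 70 but discarded at 75.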
TIP: For most comparisons, start with a hash value of 2, because it is unusual for two sequences to possess significant similarity without having regions of that size that match exactly.
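To make the effect of the hash value, strand, and genetic code settings concrete, here is a minimal Python sketch of how exact-match seeds might be found between a protein and a translated DNA sequence. It is not MacVector's implementation: the function names are hypothetical, the standard genetic code is hard-coded, the DNA is assumed to be uppercase A, C, G, and T, and translating all three reading frames of each selected strand is an assumption.

```python
from collections import defaultdict

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str, frame: int) -> str:
    """Translate one reading frame (0, 1, or 2); unknown codons become X."""
    codons = (dna[p:p + 3] for p in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def seed_matches(protein: str, dna: str, hash_value: int = 2, strands: str = "both"):
    """Return (strand, frame, protein_pos, translated_pos) for each exact
    match of hash_value residues. strands is "++", "+-", or "both"."""
    # Index every word of length hash_value in the protein.
    index = defaultdict(list)
    for i in range(len(protein) - hash_value + 1):
        index[protein[i:i + hash_value]].append(i)

    templates = []
    if strands in ("++", "both"):
        templates.append(("++", dna))
    if strands in ("+-", "both"):
        templates.append(("+-", reverse_complement(dna)))

    hits = []
    for strand, seq in templates:
        for frame in range(3):
            residues = translate(seq, frame)
            for j in range(len(residues) - hash_value + 1):
                for i in index.get(residues[j:j + hash_value], ()):
                    hits.append((strand, frame, i, j))
    return hits
```

Calling `seed_matches("MKT", "ATGAAAACC", hash_value=1)` returns every seed found with `hash_value=2` plus additional single-residue matches, which is why the smaller hash value is more sensitive but produces more windows to score.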
Related Topics
Analyze Menus
Performing a DNA matrix analysis
Displaying DNA matrix results
Performing a protein matrix analysis