To align sequences using a scoring matrix pre-filter
- Choose Database | Align to Folder.
- Select Folder to Search.
- Choose a value from the hash value drop-down menu. The hash value sets how long an exact match between two sequences must be before MacVector will attempt to score and align that matching region. For protein sequences, a hash value of 1 is the most sensitive and 2 the least sensitive; for DNA sequences, 1 is the most sensitive and 6 the least sensitive. TIP: For most comparisons, start with a hash value of 2 for a protein query sequence or 6 for a DNA query sequence, because two sequences rarely share significant similarity without containing exact matches at least that long.
- In the scores to keep text box, enter the number of matching sequences to retain. If the search finds more matches than this, the sequences with the lowest scores are dropped from the list.
- From the processing drop-down menu, choose whether optimal alignments are performed on the fly or at the end of the database search:
  - Choose none to save the sequences in order of their initial score. If more matches are found than you want to keep, the sequences with the lowest initial scores are dropped. At the end of the search, MacVector optimally aligns each saved sequence with the query sequence, introducing indels and gaps wherever they improve the optimized score. The matches are then listed by optimized score in the results windows.
  - Choose align to perform an optimal alignment for any sequence whose initial score exceeds a minimum cut-off score. The matching sequences are then saved in order of their optimized score rather than their initial score. If more matches are found than you want to keep, the sequences with the lowest optimized scores are dropped.
- Select Scoring Matrix to choose a scoring matrix file.
- To limit the search to a region of the query sequence, type the start and end positions of the region in the Region text boxes, or select a region from the features table drop-down menu to the right of the text boxes.
- If you have a protein query sequence, select the align to DNA check box to compare the query only with the nucleic acid sequences in the folder. Each nucleic acid sequence is translated on the fly in all six reading frames, and the resulting amino acid sequences are compared with the protein query sequence. Do not select this check box if you want to compare the query with protein sequences. NOTE: The align to DNA check box appears only when the query is a protein sequence.
- If the align to DNA check box is selected, use the genetic code drop-down menu to choose the genetic code that will be used to translate the nucleic acid sequences in the folder.
- Select OK to perform the alignment.
When the analysis is complete, the Folder Sequence Query dialog box is displayed. This dialog box is described in the "Displaying alignment results" help topic.
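The hash value acts as a word-length seed filter, similar in spirit to the ktup parameter of the FASTA program: a folder sequence is considered for scoring and alignment only if it shares an exact run of at least that many residues with the query. The following is a minimal sketch under the assumption that the hash value equals the word length `k`; the function names are hypothetical, and MacVector's actual indexing and the scoring of the matching region are not shown.

```python
def words(seq: str, k: int) -> set[str]:
    """Return every length-k substring ("word") of seq, case-insensitively."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def passes_hash_filter(query: str, subject: str, k: int) -> bool:
    """True if query and subject share at least one exact match of length k."""
    if k < 1:
        raise ValueError("hash value must be at least 1")
    return not words(query, k).isdisjoint(words(subject, k))


def filter_folder(query: str, folder: list[tuple[str, str]], k: int) -> list[str]:
    """Names of folder sequences that pass the filter.

    The query's words are computed once and reused for every folder sequence.
    """
    if k < 1:
        raise ValueError("hash value must be at least 1")
    query_words = words(query, k)
    return [name for name, seq in folder if not query_words.isdisjoint(words(seq, k))]


if __name__ == "__main__":
    query = "ACGTACGGTCA"
    print(passes_hash_filter(query, "GGGACGTTTT", 4))  # shares the word ACGT
    print(passes_hash_filter(query, "GGGACGTTTT", 6))  # no shared 6-letter word
```

The two calls show why smaller values are more sensitive: the same pair of sequences passes with a word length of 4 but is skipped with a word length of 6.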
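The scores to keep limit means that only the best N matches survive the search, and any lower-scoring match is dropped once the list is full. A bounded min-heap is one common way to maintain such a list; the sketch below assumes each hit is a hypothetical `(score, name)` pair. With none processing, the score would be the initial score; with align, it would be the optimized score. The trimming logic is the same either way. MacVector's internal data structure is not documented here.

```python
import heapq


def keep_top_scores(hits, scores_to_keep):
    """Return the `scores_to_keep` highest-scoring (score, name) pairs, best first.

    A min-heap of at most `scores_to_keep` entries holds the current best hits.
    Once it is full, a new hit replaces the lowest-scoring entry only if it
    scores higher, so the lowest scores are the ones dropped.
    """
    if scores_to_keep <= 0:
        return []
    heap = []
    for score, name in hits:
        if len(heap) < scores_to_keep:
            heapq.heappush(heap, (score, name))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, name))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    hits = [(12, "seqA"), (40, "seqB"), (7, "seqC"), (33, "seqD")]
    print(keep_top_scores(hits, 2))
```

The heap keeps memory proportional to the number of scores kept rather than to the number of sequences in the folder.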
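Both processing options end with an optimal alignment in which gaps are introduced only where they raise the score. This help topic does not say which algorithm MacVector uses. As a stand-in, the sketch below computes a Smith-Waterman local alignment score with a linear gap penalty. The `score` callable takes the place of the scoring matrix file, and the +5/-4 match/mismatch values and the gap penalty of -4 are hypothetical choices for illustration.

```python
def local_alignment_score(a, b, score, gap=-4):
    """Smith-Waterman local alignment score with a linear gap penalty.

    score(x, y) returns the substitution score for residues x and y, playing
    the role of a scoring matrix. Only the best score is returned, not the
    alignment itself.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                0,                                         # start a new local alignment
                prev[j - 1] + score(a[i - 1], b[j - 1]),   # align a[i-1] with b[j-1]
                prev[j] + gap,                             # gap in b
                cur[j - 1] + gap,                          # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best


def simple_dna(x, y):
    """Hypothetical substitution scores: +5 for a match, -4 for a mismatch."""
    return 5 if x == y else -4


if __name__ == "__main__":
    # Seven matches plus one gap (7 * 5 - 4) beat the best ungapped alignment (ACGT, 20).
    print(local_alignment_score("ACGTACGT", "ACGTCGT", simple_dna))
```

In this example the best ungapped alignment scores 20, while allowing one gap lets all seven residues of the shorter sequence match, which is the sense in which an indel "improves the optimized score".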
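The align to DNA option translates each nucleic acid sequence in all six reading frames (three offsets on each strand) using the genetic code chosen from the drop-down menu. The following sketch illustrates six-frame translation, assuming the NCBI standard code (translation table 1), with the vertebrate mitochondrial code (table 2) shown as an example of an alternative table. It is not MacVector's implementation; the function names and the +1 to -3 frame labels are hypothetical.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Standard genetic code (NCBI table 1), amino acids listed in TCAG codon order.
STANDARD_CODE = dict(zip(
    CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))

# Vertebrate mitochondrial code (NCBI table 2) differs from table 1 at four codons.
VERT_MITO_CODE = {**STANDARD_CODE, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str, code: dict[str, str]) -> str:
    """Translate complete codons; codons not in the table (e.g. containing N) become 'X'."""
    return "".join(code.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna: str, code: dict[str, str] = STANDARD_CODE) -> dict[str, str]:
    """Translate the forward strand and the reverse complement at offsets 0, 1 and 2."""
    dna = dna.upper().replace("U", "T")
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:], code)
        frames[f"-{offset + 1}"] = translate(rc[offset:], code)
    return frames


if __name__ == "__main__":
    print(six_frame_translations("ATGGCCTAA"))
    print(six_frame_translations("ATGGCCTGA", VERT_MITO_CODE)["+1"])  # TGA read as Trp
```

Each of the six translated sequences would then be compared with the protein query; choosing a different genetic code changes only the codon table.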
Retrieving hits
- Highlight the hit or hits that you want to retrieve.
- To open the sequences on your Desktop, choose Database | Retrieve and Open. The hits are opened but not saved.
- To save the sequences to your disk, choose Database | Retrieve to Disk. The files are saved but not opened.
- To save the hits to a single FASTA or FASTQ file, choose Database | Retrieve to File. All hits are saved directly to one multi-sequence file, which is especially useful for retrieving reads from large FASTQ files.
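Retrieve to File writes every hit into one multi-sequence file. The following minimal sketch shows what such a file contains, assuming each hit is available as a `(name, sequence)` pair, or a `(name, sequence, quality)` triple for FASTQ reads. The function names are hypothetical, and the 60-character FASTA line width is a common convention rather than a documented MacVector setting.

```python
def format_fasta(records, width=60):
    """Format (name, sequence) records as multi-sequence FASTA text,
    wrapping each sequence at `width` characters per line."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return ("\n".join(lines) + "\n") if lines else ""


def format_fastq(records):
    """Format (name, sequence, quality) records as FASTQ text, four lines per read."""
    out = []
    for name, seq, qual in records:
        if len(seq) != len(qual):
            raise ValueError(f"{name}: sequence and quality lengths differ")
        out.append(f"@{name}\n{seq}\n+\n{qual}\n")
    return "".join(out)


def write_hits(text, path):
    """Write the formatted records to a single file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


if __name__ == "__main__":
    hits = [("read1", "ACGT" * 20), ("read2", "GGCC")]
    print(format_fasta(hits), end="")
    print(format_fastq([("read3", "ACG", "III")]), end="")
```

Writing all hits into one file, rather than one file per hit, keeps the output manageable when thousands of reads are retrieved from a large FASTQ file.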
Related Topics
Align to folder
Displaying alignment results
Importing Fastq data