
Scoring matrix file

You can make the following modifications to the scoring matrix files to customize both the matrix comparisons and database searches:

- change the match/mismatch scores in the scoring matrix

- change the hash codes used during the hashing step

- change the deletion penalty

- change the gap penalty

- change the parameters used to calculate the cut-off score for performing an optimized alignment

Match and mismatch scores

The scoring matrix is displayed as a lower-triangular matrix. The number at coordinate (H,V) in the matrix is the score assigned to an alignment between the two residues H and V.

From left to right across the horizontal axis and from top to bottom down the vertical axis, the order for nucleotide codes is:

- A C M G R S V T W Y H K D B N

and for amino acid codes is:

- A C D E F G H I K L M N P Q R S T V W Y B Z X *

The match and mismatch scoring is symmetric, so any change to the score for a pair (H,V) is automatically applied to (V,H) in the matrix.
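To illustrate how a lower-triangular matrix can hold a full symmetric table, the following Python sketch stores only the cells on or below the diagonal in a flat list and folds (H,V) and (V,H) onto the same cell. The class name `LowerTriangularMatrix` and the flat-list layout are assumptions for illustration only; they are not MacVector's file format or internal representation.

```python
# Hypothetical sketch: a symmetric scoring matrix stored as a lower triangle.
# The residue order matches the nucleotide axis order listed above.
NUCLEOTIDE_ORDER = "ACMGRSVTWYHKDBN"


class LowerTriangularMatrix:
    """Stores only cells (i, j) with j <= i; (H,V) and (V,H) share one cell."""

    def __init__(self, alphabet: str, default: int = 0) -> None:
        self.alphabet = alphabet
        self.index = {residue: i for i, residue in enumerate(alphabet)}
        n = len(alphabet)
        # The lower triangle of an n x n matrix (diagonal included) has n*(n+1)/2 cells.
        self.cells = [default] * (n * (n + 1) // 2)

    def _offset(self, h: str, v: str) -> int:
        i, j = self.index[h], self.index[v]
        if j > i:  # fold the upper triangle onto the lower triangle
            i, j = j, i
        # Row i starts after rows 0..i-1, which hold 1 + 2 + ... + i cells.
        return i * (i + 1) // 2 + j

    def get(self, h: str, v: str) -> int:
        return self.cells[self._offset(h, v)]

    def set(self, h: str, v: str, score: int) -> None:
        self.cells[self._offset(h, v)] = score


m = LowerTriangularMatrix(NUCLEOTIDE_ORDER)
m.set("A", "C", -5)
print(m.get("C", "A"))  # -5: the same cell serves both orders
```

Storing only the lower triangle roughly halves the number of cells and makes symmetry automatic, because there is only one place to write a score for any pair.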

To edit the match/mismatch scores

1. Open the scoring matrix that you want to edit.

A window opens, displaying the lower-triangular scoring matrix.

2. Select the value to be edited using one of the following methods:

- click directly on a value

- type the first letter of a pair, then hold down the Shift key and type the second letter.

3. Type the new score (it may be positive or negative) and press the Enter key.

The scores are symmetric, so if you change the score for the aligned pair A by C, the score for the pair C by A changes at the same time. Values in the scoring matrix must lie between -99 and 99.

4. Select OK to save the changes.

5. Choose File | Save to make the changes permanent.
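The two rules stated in step 3, the -99 to 99 range and symmetric updates, can be modeled as a small validation function. In this Python sketch, `set_score`, `pair_key`, and the dictionary keyed by an unordered residue pair are hypothetical names; MacVector enforces these rules in its own editor, and the sketch only mirrors the behavior described above.

```python
# Hypothetical model of the editing rules described in the steps above.
MIN_SCORE, MAX_SCORE = -99, 99


def pair_key(h: str, v: str) -> frozenset:
    # frozenset ignores order, so (A, C) and (C, A) map to the same key.
    # A diagonal pair such as (A, A) becomes the one-element set {"A"}.
    return frozenset((h, v))


def set_score(scores: dict, h: str, v: str, score: int) -> None:
    """Store a match/mismatch score, rejecting values outside -99..99."""
    if not MIN_SCORE <= score <= MAX_SCORE:
        raise ValueError(f"score {score} is outside {MIN_SCORE}..{MAX_SCORE}")
    scores[pair_key(h, v)] = score


scores = {}
set_score(scores, "A", "C", -4)
print(scores[pair_key("C", "A")])  # -4: the edit applies to (C, A) as well
try:
    set_score(scores, "A", "A", 120)
except ValueError as err:
    print(err)  # score 120 is outside -99..99
```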

Hash codes

All residues that have the same hash code are treated as identical by the program. For example, look at the hash codes in the nucleic acid scoring matrix "DNA database matrix": the ambiguous bases W (A or T) and D (not C, that is, A, G, or T) are arbitrarily assigned the same hash code as T. There are 21 possible hash codes for amino acids (0 to 20) and four for nucleic acids (0 to 3).
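The effect of shared hash codes can be sketched by translating each residue to its code and comparing codes instead of letters. Only two facts come from the text above: W and D share T's code, and nucleic acid codes run from 0 to 3 (amino acid codes from 0 to 20). The specific numbers given to A, C, G, and T, and the names `DNA_HASH_CODES`, `validate`, and `hash_sequence`, are assumptions made for illustration.

```python
# Hypothetical hash-code table for a nucleic acid matrix. The values for
# A, C, G and T are assumed; W and D reuse T's code as described above.
DNA_HASH_CODES = {"A": 0, "C": 1, "G": 2, "T": 3, "W": 3, "D": 3}
NUCLEIC_RANGE = range(0, 4)   # nucleic acid hash codes are 0..3
AMINO_RANGE = range(0, 21)    # amino acid hash codes are 0..20


def validate(codes: dict, allowed: range) -> None:
    """Raise ValueError if any residue has a hash code outside the allowed range."""
    for residue, code in codes.items():
        if code not in allowed:
            raise ValueError(f"hash code {code} for {residue!r} is out of range")


def hash_sequence(seq: str, codes: dict) -> list:
    """Translate a sequence into its hash codes."""
    return [codes[residue] for residue in seq]


validate(DNA_HASH_CODES, NUCLEIC_RANGE)
# "GAT" and "GAW" produce the same codes, so they are treated as identical.
print(hash_sequence("GAT", DNA_HASH_CODES))  # [2, 0, 3]
print(hash_sequence("GAW", DNA_HASH_CODES))  # [2, 0, 3]
```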

To edit the hash codes

1. Open the scoring matrix that you want to edit.

A window opens, displaying the lower-triangular scoring matrix.

2. Click the button labeled with two interlocking rectangles in the shaded header at the top of the window.

The Hash Code Editor dialog box is displayed.

3. In the text boxes, assign the same hash code to any residue types that you want treated as equivalent.

4. Select OK to save the changes.

5. Choose File | Save to make the changes permanent.

Tweak values

The tweak values (the cut-off score parameters and the deletion and gap penalties) are used to evaluate alignment scores and can be edited if required.

The cut-off score parameters are substituted into an equation that is used to calculate the minimum score that an initial alignment must have before an optimized alignment is performed. Ordinarily, these parameters should be left at their default values. Accepted values are from 1 to 200.

The deletion penalty (single residue indel) is the value subtracted from the score of an aligned region for each one-residue insertion or deletion introduced to improve the alignment.

If an insertion or deletion longer than one residue is introduced into an alignment, the score is reduced by the gap penalty (continuing indel) multiplied by the number of residues in the insertion or deletion after the first one.
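Read together, the two paragraphs above suggest that an indel of length L costs deletion + gap * (L - 1), assuming the deletion penalty covers the first residue and the gap penalty each residue after it. The Python function below is a sketch of that arithmetic; its name and the example penalty values are illustrative assumptions, not MacVector defaults.

```python
def indel_penalty(length: int, deletion: int, gap: int) -> int:
    """Total score reduction for one insertion or deletion of `length` residues.

    Assumes the deletion penalty covers the first residue and the gap
    penalty applies to each additional residue.
    """
    if length < 1:
        raise ValueError("an indel must be at least one residue long")
    return deletion + gap * (length - 1)


# Example values chosen for illustration (both within the accepted 0..100 range).
print(indel_penalty(1, deletion=10, gap=2))  # 10: single-residue indel
print(indel_penalty(5, deletion=10, gap=2))  # 18: 10 + 2 * 4
```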

Accepted values for the deletion and gap penalties are from 0 to 100. By altering these penalties, you can control the number and size of the insertions or deletions that you permit when aligning two sequences.

A large deletion penalty coupled with a small gap penalty makes it difficult to introduce an insertion or deletion into an alignment but fairly easy to extend one once it has been introduced. Alignments performed with this combination will contain few insertions or deletions, but those that do occur may be fairly long. Use this setting if you think that sequences in the database may contain insertions or deletions of blocks of residues relative to your query sequence.

A small deletion penalty coupled with a large gap penalty tends to produce alignments containing many short insertions or deletions. Use this setting if you expect the differences between the sequences to involve single-residue insertions or deletions.
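The trade-off between the two settings can be checked with a little arithmetic. This Python sketch compares the cost of one 4-residue indel with the cost of four separate 1-residue indels under each setting. The penalty values are hypothetical, and the per-indel formula assumes the deletion penalty covers the first residue and the gap penalty each additional residue, as described above.

```python
def indel_penalty(length: int, deletion: int, gap: int) -> int:
    # Assumed model: deletion penalty for the first residue,
    # gap penalty for every residue after the first.
    return deletion + gap * (length - 1)


# Hypothetical settings, both within the accepted 0..100 range.
SETTINGS = {
    "large deletion, small gap": (20, 1),
    "small deletion, large gap": (2, 20),
}

for name, (deletion, gap) in SETTINGS.items():
    one_long = indel_penalty(4, deletion, gap)         # a single 4-residue indel
    four_short = 4 * indel_penalty(1, deletion, gap)   # four 1-residue indels
    print(f"{name}: one 4-residue indel costs {one_long}, "
          f"four 1-residue indels cost {four_short}")

# large deletion, small gap: one 4-residue indel costs 23, four 1-residue indels cost 80
# small deletion, large gap: one 4-residue indel costs 62, four 1-residue indels cost 8
```

Under the first setting the single long indel is far cheaper, so alignments favor a few long indels; under the second, scattered single-residue indels are cheaper.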

To edit the tweak values

1. Open the scoring matrix that you want to edit.

A window opens, displaying the lower-triangular scoring matrix.

2. Click the button labeled with a mouse icon in the shaded header at the top of the window.

The Tweak Editor dialog box is displayed.

3. Type new values in the text boxes for the tweak values you want to change.

4. Select OK to save the changes.

5. Choose File | Save to make the changes permanent.
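Before saving, it can help to check the tweak values against the accepted ranges given above: 1 to 200 for the cut-off score parameters and 0 to 100 for the deletion and gap penalties. The `TweakValues` class and its field names in this Python sketch are hypothetical. The equation that turns the cut-off parameters into a minimum score is not documented here, and neither is the number of cut-off parameters, so the sketch accepts any number of them and checks only their ranges.

```python
from dataclasses import dataclass


@dataclass
class TweakValues:
    """Hypothetical container for the values edited in the Tweak Editor."""
    deletion_penalty: int
    gap_penalty: int
    # The number of cut-off parameters is not specified; any count is accepted.
    cutoff_params: tuple = ()

    def problems(self) -> list:
        """Return a description of every value outside its accepted range."""
        issues = []
        for name, value in (("deletion penalty", self.deletion_penalty),
                            ("gap penalty", self.gap_penalty)):
            if not 0 <= value <= 100:
                issues.append(f"{name} {value} is outside 0..100")
        for i, value in enumerate(self.cutoff_params, start=1):
            if not 1 <= value <= 200:
                issues.append(f"cut-off parameter {i} ({value}) is outside 1..200")
        return issues


print(TweakValues(10, 2, (50, 120)).problems())  # []
print(TweakValues(150, 2, (0,)).problems())
# ['deletion penalty 150 is outside 0..100', 'cut-off parameter 1 (0) is outside 1..200']
```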
