
Performing a ClustalW alignment

ClustalW, Muscle, and T-Coffee are progressive alignment algorithms. Progressive alignment generally begins by building a guide tree from the pairwise relationships between every pair of sequences in the alignment. The multiple sequence alignment is then built up step by step, using the tree as a construction guide.

You can display the guide tree for ClustalW and T-Coffee alignments, but not for Muscle alignments.

All three algorithms are integrated into the MSA editor, so you can run each of them on the same set of sequences and compare the results.

ClustalW aligns sequences in two stages. The first, pairwise, stage compares each sequence individually with every other sequence. The Pairwise Alignment parameters control the speed and sensitivity of this stage. The second, multiple alignment, stage progressively merges the sequences, starting with the pair that scored highest in the pairwise stage. The Multiple Alignment parameters control this stage.
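The pairwise stage can be pictured with a minimal Python sketch. It is not ClustalW's or MacVector's implementation: `percent_identity` is a crude, ungapped stand-in for a real pairwise alignment score, and the function names and example sequences are hypothetical. The sketch only shows the idea of scoring every pair and finding the pair that the multiple alignment stage would merge first.

```python
from itertools import combinations


def percent_identity(a, b):
    """Ungapped percent identity over the shorter sequence (illustrative stand-in)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def pairwise_scores(seqs):
    """Score every possible pair of sequences; keys are (name, name) tuples."""
    return {
        (p, q): percent_identity(seqs[p], seqs[q])
        for p, q in combinations(sorted(seqs), 2)
    }


def first_pair(scores):
    """Return the highest-scoring pair, the starting point for progressive merging."""
    return max(scores, key=scores.get)


# Hypothetical example sequences.
seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTGTACCA"}
scores = pairwise_scores(seqs)
best = first_pair(scores)
print(best, scores[best])  # ('s1', 's2') 87.5
```

A real progressive aligner goes further and derives a full guide tree from these scores; the sketch stops at the first merge.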

The parameters available in each panel depend on the type of sequences being aligned and the alignment speed selected. Set the parameters in each panel to suit your sequence type and alignment requirements.

  1. TO PERFORM a ClustalW alignment

  2. Choose Analyze | ClustalW Alignment.

    The ClustalW Alignment dialog box is displayed.

    SETTING options in the Pairwise Alignment panel

  3. If you are aligning protein sequences, choose the protein scoring matrix from the Matrix drop-down menu.
  4. Choose from the Alignment Speed drop-down menu:

    - Slow means that the initial pairwise alignments are performed using a full dynamic programming algorithm.

    - Fast uses the faster, approximate method of Wilbur & Lipman (1983).

  5. For Slow alignments, do the following:

    - enter a value between 0 and 100 in the Open Gap Penalty text box. This is the score that is subtracted when a gap is inserted in an alignment. Increasing the gap opening penalty makes gaps less frequent.

    - enter a value between 0 and 10 in the Extend Gap Penalty text box. This is the penalty for extending the gap by 1 residue. Increasing the extend gap penalty makes gaps shorter. Terminal gaps are not penalized.
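How the two penalties interact can be shown with a short Python sketch. It assumes one common convention, in which a gap of length n costs the open penalty plus n times the extend penalty; the exact formula MacVector applies is not stated here, and `gap_cost` is a hypothetical helper, not a MacVector function.

```python
def gap_cost(length, open_penalty, extend_penalty, terminal=False):
    """Score subtracted for one gap of the given length.

    Assumed convention: open_penalty + extend_penalty * length.
    Terminal gaps are not penalized, as described above.
    """
    if length <= 0 or terminal:
        return 0.0
    return open_penalty + extend_penalty * length


# One gap of 4 residues versus two separate gaps of 2 residues each.
one_long = gap_cost(4, open_penalty=10.0, extend_penalty=0.5)
two_short = 2 * gap_cost(2, open_penalty=10.0, extend_penalty=0.5)
print(one_long, two_short)  # 12.0 22.0
```

Under this convention a high open penalty favors a few long gaps over many short ones, while a high extend penalty keeps each gap short.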

  6. For Fast alignments, do the following:

    - choose a value from the Ktuple drop-down menu. This is the number of consecutive residues that must match exactly before MacVector will attempt to score the matching region. Increasing the value speeds the alignment, but if the value is too large, significant homology between sequences may be missed. Throughout MacVector, Ktuple size is also referred to as hash size or word size.

    - enter a value between 1 and 500 in the Gap Penalty text box. This is the penalty for each gap. The parameter has little effect on speed or sensitivity unless extreme values are entered.

    - enter a value between 1 and 50 in the Top Diagonals text box. The number of Ktuple matches on each diagonal of an imaginary dot-matrix plot is counted, and only the diagonals with the most matches are used in the alignment. This value specifies how many of those diagonals are used. A large value increases sensitivity; a small value makes the alignment faster.

    - enter a value between 1 and 50 in the Window Size text box. This value specifies the window size around each top diagonal. Diagonals that fall inside this window are used in the alignment. Increasing the window size results in a more sensitive, but slower, alignment. Decreasing the window size increases the speed, but may result in small regions of homology being missed.
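The Ktuple and Top Diagonals settings can be illustrated with a small Python sketch. It is a simplified illustration in the spirit of the Wilbur & Lipman approach, not MacVector's code: the function names are hypothetical, and gap penalties and the window around each diagonal are left out. A diagonal is identified by the offset i - j between the matching positions in the two sequences.

```python
from collections import Counter, defaultdict


def ktuple_diagonal_counts(a, b, k):
    """Count exact k-tuple matches on each diagonal (i - j) of the a-versus-b dot matrix."""
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)  # positions of every k-tuple in b
    counts = Counter()
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], ()):
            counts[i - j] += 1
    return counts


def top_diagonals(counts, n):
    """Return the n diagonals that carry the most k-tuple matches."""
    return [diagonal for diagonal, _ in counts.most_common(n)]


# Hypothetical example sequences.
counts = ktuple_diagonal_counts("ACGTACGT", "TTACGT", k=3)
print(dict(counts))              # {-2: 2, 2: 3}
print(top_diagonals(counts, 1))  # [2]
```

A larger k means fewer, more specific matches to index, which is why raising Ktuple speeds the search at the cost of sensitivity.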

    SETTING options in the Multiple Alignment panel

  7. If you are aligning protein sequences, choose the protein scoring matrix series from the Matrix drop-down menu.
  8. Enter a value between 0 and 100 in the Open Gap Penalty text box. This is the score that is subtracted when a gap is inserted in an alignment. Increasing the gap opening penalty makes gaps less frequent.
  9. Enter a value between 0 and 10 in the Extend Gap Penalty text box. This is the penalty for extending the gap by 1 residue. Increasing the extend gap penalty makes gaps shorter. Terminal gaps are not penalized.
  10. Enter a value between 0% and 100% in the Delay Divergent text box. By delaying the alignment of the divergent sequences until after the most closely related sequences have been aligned, a more accurate alignment is achieved. The default value is 40%, which means that if a sequence is less than 40% identical to any other sequence, its alignment is delayed.
  11. Choose from the Transitions drop-down menu as follows:

    - weighted to score transitions (A <-> G, T <-> C) more highly than transversions (A <-> T, A <-> C, G <-> T, G <-> C)

    - unweighted to treat transitions and transversions equally.

    NOTE: This option is only present for nucleic acid alignments.
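Two of the Multiple Alignment settings can be sketched in Python. The first function classifies a nucleotide substitution as a transition or a transversion, following the definitions above. The second applies one reading of Delay Divergent: a sequence is delayed when its best identity to every other sequence falls below the threshold. That reading, the function names, and the identity values are assumptions made for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(x, y):
    """Classify a pair of DNA bases as 'match', 'transition', or 'transversion'."""
    x, y = x.upper(), y.upper()
    if not {x, y} <= PURINES | PYRIMIDINES:
        raise ValueError("expected the bases A, C, G, or T")
    if x == y:
        return "match"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def delayed_sequences(identity, threshold=40.0):
    """Names whose highest percent identity to any other sequence is below threshold.

    identity maps each sequence name to a dict of {other_name: percent_identity}.
    """
    return [name for name, row in identity.items() if max(row.values()) < threshold]


# Hypothetical percent identities between three sequences.
identity = {
    "a": {"b": 85.0, "c": 30.0},
    "b": {"a": 85.0, "c": 35.0},
    "c": {"a": 30.0, "b": 35.0},
}
print(substitution_type("A", "G"))  # transition
print(substitution_type("A", "C"))  # transversion
print(delayed_sequences(identity))  # ['c']
```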

    SETTING options on the Protein Gap Parameters panel

    If you are aligning protein sequences, you have a number of additional parameters to set.

  12. Enter a value between 0 and 100 in the Gap Separation Distance text box. This allows you to discourage gaps being opened too close together by increasing the open gap penalty for a specific distance from existing gaps. For example, if the gap separation distance is set to 8, the open gap penalty is increased if the new position is within 8 residues of an existing gap.
  13. Select the End Gap Separation check box to treat end gaps as internal gaps when applying the Gap Separation Distance, so that new gaps are not opened too close to them. If the check box is cleared, end gaps are ignored.
  14. Select the Residue-specific Penalties check box to modify the gap opening penalty at each position in the alignment or sequence. The Open Gap Penalty is multiplied by a residue-specific gap modification factor. If there is a mixture of residues at the position, the multiplying factor is the average of all the contributions from each sequence.
  15. Select the Hydrophilic Penalties check box to increase the chances of a gap opening within a hydrophilic stretch, as defined by Hydrophilic Residues.
  16. In the Hydrophilic Residues text box, enter the IUPAC one-letter code for each residue that is to be considered hydrophilic. Any run of 5 hydrophilic residues is considered to be a hydrophilic stretch.
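The hydrophilic-stretch rule and the gap separation rule can be sketched in Python. The default residue string `GPSNDQEKR` is the commonly documented ClustalW default and is an assumption here, since MacVector's default may differ. Both function names are hypothetical, and the sketch only finds the affected positions; it does not compute the modified penalties.

```python
def hydrophilic_stretches(seq, hydrophilic="GPSNDQEKR", min_run=5):
    """Return (start, end) half-open index ranges of runs of at least min_run hydrophilic residues."""
    residues = set(hydrophilic.upper())
    runs = []
    start = None
    for i, aa in enumerate(seq.upper()):
        if aa in residues:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(seq) - start >= min_run:
        runs.append((start, len(seq)))  # run that reaches the end of the sequence
    return runs


def near_existing_gap(position, gap_positions, separation=8):
    """True if position lies within `separation` residues of an existing gap."""
    return any(0 < abs(position - g) <= separation for g in gap_positions)


# Hypothetical protein fragment: GSNDKE (indices 3 to 8) qualifies, GPS does not.
print(hydrophilic_stretches("MLVGSNDKEAWLLGPSF"))  # [(3, 9)]
print(near_existing_gap(20, [14, 40]))             # True
print(near_existing_gap(30, [14, 40]))             # False
```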

    ALIGNING sequences

  17. From the scrolling list in the Sequences To Align panel, highlight the sequences that you want to align. Use Shift-click or Command (Apple)-click to add sequences to, or remove them from, the selection. Sequences are aligned in the order they appear in the list.
  18. If you want to change the order in which sequences are aligned, do the following:

    - put the mouse pointer over the arrow at the left of the sequence you want to move

    - hold the mouse button down and drag the sequence to the required position

    - release the mouse button

  19. Select OK to perform the alignment.

    A dialog box is displayed, informing you of the progress of the alignment. When the alignment is complete, the ClustalW Alignment Results dialog box and the Multiple Sequence Alignment Editor window are displayed.

Related Topics

Analyze Menus

ClustalW

Performing a Muscle alignment

Performing a T-Coffee alignment

Displaying alignments

Guide tree