
Vector trimming

Typical sequencing projects use a directed or shotgun sub-cloning approach to generate short overlapping sequences that are then assembled into a single longer sequence. Reads commonly have vector sequence at the beginning and/or end, which can interfere with the assembly. cross_match is an algorithm that masks out vector sequence to prevent this interference. This step is not strictly essential, because phrap (the assembly algorithm we will use) can often detect vector sequence in a collection of similar sequences. However, using cross_match is highly recommended to reduce the likelihood of anomalous assemblies.
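To make the idea of masking concrete, here is a toy sketch in Python. It is not how cross_match works internally: cross_match compares reads to vectors by sequence alignment and tolerates mismatches, whereas this sketch only looks for exact matches at the two ends of a read. The function name mask_vector_ends, the min_len threshold, and the use of "X" as the mask character are all assumptions made for illustration.

```python
def mask_vector_ends(read: str, vector: str, min_len: int = 12,
                     mask_char: str = "X") -> str:
    """Mask leading/trailing runs of `read` that exactly match part of `vector`.

    Toy illustration only: exact substring matching, no mismatches or gaps.
    """
    read_u, vector_u = read.upper(), vector.upper()
    n = len(read_u)

    # Longest prefix of the read (at least min_len long) found in the vector.
    left = 0
    for k in range(n, min_len - 1, -1):
        if read_u[:k] in vector_u:
            left = k
            break

    # Longest suffix of the remaining read found in the vector.
    right = 0
    for k in range(n - left, min_len - 1, -1):
        if read_u[n - k:] in vector_u:
            right = k
            break

    # Keep the read the same length; only the matched ends are replaced.
    return mask_char * left + read[left:n - right] + mask_char * right


# Made-up sequences, short enough to check by eye.
toy_vector = "AAAACCCCGGGGTTTT"
toy_read = "CCCCGGGGACACACAC"  # starts with 8 bases of "vector"
print(mask_vector_ends(toy_read, toy_vector, min_len=4))
```

The masked read keeps its original length, so positions in the read still line up with the underlying trace data; only the masked ends are excluded from consideration.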

USING CROSS_MATCH

Make sure you have no selections in the project window. To toggle a selection off, click on the selection while holding down the command (⌘) key.

- Choose Analyze | Vector Trim (cross_match).

The cross_match parameters dialog will appear. The algorithm needs to know which vectors were used in the sequencing experiments, so the first time you use it, the dialog will display the empty Vectors tab.

- In the Vectors tab, click on the Add… button to bring up the file selection dialog. Select the files containing your vectors.

The tab will refresh to reflect the new vectors that have been added. You can also click on the "Add Vector:" popup menu to add one of the recent vector files that you used. The menu remembers the last 20 vector files you added to any project, so you can use this as a shortcut to rapidly import common vectors into any new projects you create. The files you select should be in either MacVector single sequence or FastA text format. FastA format files can have multiple sequences in them, so you can create just one file with all of the vectors you use on a regular basis to further simplify vector importing.
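If your vectors are stored as separate FastA files, a combined multi-sequence file can be built outside MacVector. The following is a minimal Python sketch, assuming each input is a plain-text FastA file whose records start with a ">" header line. The function name combine_fasta is hypothetical and is not part of MacVector.

```python
from pathlib import Path
from typing import Iterable


def combine_fasta(inputs: Iterable[Path], output: Path) -> int:
    """Concatenate FastA files into one multi-sequence file.

    Returns the number of sequence records (header lines) written.
    """
    count = 0
    with Path(output).open("w", encoding="ascii", newline="\n") as out:
        for path in inputs:
            text = Path(path).read_text(encoding="ascii")
            for line in text.splitlines():
                line = line.rstrip()
                if not line:
                    continue  # drop blank lines between records
                if line.startswith(">"):
                    count += 1  # each ">" header starts a new sequence
                out.write(line + "\n")
    return count
```

For example, calling combine_fasta([Path("pUC19.fasta"), Path("pBluescript.fasta")], Path("my_vectors.fasta")) would write both vectors to my_vectors.fasta, which could then be selected with the Add… button. The file names here are placeholders.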

- Click on the Parameters tab in the dialog to view the other cross_match parameters. The default values are usually adequate for most projects.

- Click on the OK button to dismiss the dialog and run the algorithm.

The algorithm should complete within a few seconds. The project window then updates with the new data.

VIEWING TRIMMED SEQUENCES

The status of each sequence in the assembly project changes to include an "X", indicating that the sequence has been trimmed with cross_match. Typically, the ClipL entries show values other than "1", indicating that vector sequence was masked at the beginning of the read.

- Double-click on any vector-trimmed trace sequence to open a trace editor window. Masked residues are shown in gray italics. Note that viewing the trim is not currently supported for plain sequence files.
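The relationship between masking at the start of a read and a ClipL value other than "1" can be sketched as follows. This assumes that ClipL is the 1-based position of the first unmasked residue and that masked residues are represented by "X" in a text copy of the read. Both are assumptions for illustration, not a description of MacVector's internal storage.

```python
def clip_left(masked_read: str, mask_char: str = "X") -> int:
    """Return the 1-based position of the first unmasked residue.

    A read with no masking at its start gives 1. A fully masked read
    gives len(masked_read) + 1, i.e. nothing is left after clipping.
    """
    unmasked = masked_read.lstrip(mask_char)
    return len(masked_read) - len(unmasked) + 1


# Made-up reads: one untouched, one with three masked leading residues.
for read in ("ACGTACGT", "XXXACGTACGT"):
    print(read, clip_left(read))
```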

ADDITIONAL DOCUMENTATION

The phrap.pdf file in the MacVector/Documentation/ folder contains the original University of Washington documentation for the cross_match algorithm.

Related Topics

Assembler

The project window

Short Read Assembly

Saving assemblies

Assembling sequences

Base calling

Assembling