
Phrap

phrap is the assembly algorithm from the University of Washington that has been incorporated into Assembler. It is designed to work in concert with phred and cross_match. In particular, it understands quality values and uses them to make better assemblies, especially in areas with repetitive sequences. Phrap also calculates a quality value for each residue in the consensus sequence, using the same scale as phred. For assemblies, a value of 40 (1 error in 10,000) is considered acceptable.
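On the phred scale, a quality value Q corresponds to an estimated error probability P through Q = -10 x log10(P), which is why a value of 40 equates to 1 error in 10,000. The short Python sketch below illustrates that conversion only. The function names are hypothetical and are not part of phrap, phred or MacVector.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a phred-scaled quality value to an estimated error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a phred-scaled quality value."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the range (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Illustrative quality values only; 40 is the threshold mentioned in the text.
    for q in (10, 20, 30, 40):
        p = phred_to_error_probability(q)
        print(f"Q{q}: about 1 error in {round(1 / p):,} bases")
```

Each increase of 10 in the quality value therefore represents a tenfold drop in the expected error rate.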

phrap is described in more detail in the phrap.pdf document that can be found in the MacVector/Documentation folder. This is the original documentation from the University of Washington. It is somewhat technical in places, but it describes the assembly algorithmic strategy and the effects of changing various parameters in great detail.

USING PHRAP

Make sure you have no selections in the project window. To toggle a selection off, click on the selection while holding down the command (Apple) key.

Choose Analyze | Assemble (phrap).

The phrap parameters dialog will appear. Not all of the parameters described in phrap.pdf are available in the dialog. However, it is unlikely that you will ever need to adjust any parameters other than those displayed in the Basic tab. If you are assembling short read data, such as that produced by next generation sequencers, you will need to click the short read defaults button. However, you will likely achieve better results if you use Velvet instead.

Click on the OK button to dismiss the dialog and run the algorithm using the default values. Again, you can close the progress dialog and carry on working elsewhere in MacVector if you expect assembly to take a long time. The job is submitted to the Job Manager and can be monitored from there.

Once assembly is complete, the project window is updated to reflect the data change. In this case, all of the reads should be assembled into a single contig.

VIEWING CONTIGS

In the assembler project window, click on the disclosure triangle next to a contig to reveal the contents of the contig.

The items within the contig are grayed out to indicate that you cannot open them individually. This is to prevent you from inadvertently changing the sequence of a trace that has been carefully aligned in a contig. However, you do have full editing control from within the Contig Editor. Note how the #, Start and Stop columns have been updated to display additional information. The number of reads assembled in the contig is indicated on the top line, while the orientation of each read in the contig is indicated on the other lines. The start and stop locations of each read within the contig are also indicated in the appropriate columns.

Double-click on a contig to open it in the Contig Editor.

You can edit the contig as described in the Contig Editor section. In addition, you can directly run any MacVector nucleic acid analysis on the consensus sequence of the contig.

ADDITIONAL DOCUMENTATION

The phrap.pdf file in the MacVector/Documentation/ folder contains the original University of Washington documentation for the phrap algorithm.

RELATED TOPICS

Assembler

Quick Start

Automatic Assembly of Sub-projects with Phrap

Saving assemblies

Assembling sequences

Short Read Assembly

Bowtie

Velvet

Base calling

Importing Fastq data

Vector trimming

Assembler Parameters