There are two ways to open a Contig Assembly Project window:
- Choose Assembly Project from the File | New menu.
- Open a previously saved assembly project.
The Assembly Project window contains two views: the Project Tab and the Properties Tab.
The window contains a toolbar and a scrollable list of the contents of the project. The list will initially be empty, but you can add sequences, reads, reference sequences and chromatogram files.
PROJECT WINDOW
PROJECT WINDOW TOOLBAR
The toolbar has the following controls/buttons, from left to right:
- The Add Seqs button lets you add trace files, MacVector sequence files or FastA format sequences to the project. This is a shortcut to the main menu item Edit | Add Sequences from File... When adding reads in Fastq format MacVector will ask which type of reads the file contains and also the quality score.
- Clicking the Remove (minus) button lets you remove sequences from the project. If you have assembled sequences into a contig, clicking this button will dissolve any selected contigs and move the assembled sequences back to the main level of the project.
- Clicking the Reset button will delete all clipping information from each trace file. This means any "bad sequence" from either end of the trace will be reset to be "good sequence". All other filetypes are ignored.
- Clicking the Prefs button opens the project Preferences dialog (see below).
- The Phred button will basecall any trace files in the project. If no trace files are present this button is disabled.
- The Crossmatch button will allow you to do vector trimming.
- The Phrap button will bring up the phrap dialog for de novo assemblies.
- The Velvet button will bring up the Velvet dialog for de novo assembly.
- The SPAdes button will bring up the SPAdes dialog for de novo assembly.
- The Bowtie button will bring up the Bowtie dialog for reference assemblies. Note that at least one reference sequence and one reads file must be added to the project for this button to be enabled.
- The Add Ref button allows you to add a reference sequence for use in reference assemblies.
PROJECT PREFERENCES DIALOG
This has two tabs:
(a) The Settings tab lets you adjust a number of default display properties for the Contig Editor window.
- Use Tiled Mode - when selected, all assembled sequences are displayed in the contig editor, with each sequence placed on a separate line. When not selected, only the assembled sequences that overlap the current visible range are displayed.
- Consensus at top - in "non-tiled" mode, the consensus may be in the center of the screen, with the forward sequences above the consensus and the reverse sequences below. When this checkbox is selected, the consensus is placed at the top of the screen. In tiled mode, the consensus is always at the top of the screen.
- Enable mouse-over qualities - when this is selected, you can hover the mouse pointer over a residue in the contig editor to display the phred quality value of that residue. Deselect this if you find the automatic display of the values to be annoying.
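As background for the quality values shown on mouse-over: a phred quality value Q is conventionally related to the estimated probability P that a base call is wrong by Q = -10 log10(P). The short Python sketch below illustrates that standard relationship only. The function name is hypothetical and it is not part of MacVector.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a phred quality value Q into the estimated base-call
    error probability, using the standard relation P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Print the error probability for a few typical quality values.
for q in (10, 20, 30, 40):
    print(f"Q{q}: estimated error probability {phred_to_error_probability(q):.4f}")
```

A higher value therefore means a more reliable base call: each increase of 10 in Q corresponds to a tenfold drop in the estimated error probability.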
(b) The Vectors tab displays the vector sequences associated with the current project. The vectors are used by the cross_match vector trim algorithm to mask out any vector sequences in each sample file.
- Add Vector: - this popup menu has a list of the last 20 vector sequence files you added to any project. You can select items from this list to add the sequence(s) in that file to the current project.
- Add - click on this button to select one or more files containing vector sequences to be added to the project. You can import sequences from files in MacVector and/or FastA formats.
- Remove - select one or more vectors in the list, then click on this button to remove those vectors from the project.
PROJECT WINDOW COLUMNS
The project window has a number of columns that display information about the individual sequences and contigs. Most of the columns can be sorted by clicking on the column header.
- Name - the name of the sequence. All sequences and contigs in a project MUST have a unique name. If you try to import sequences with duplicate names, you will be prompted to choose how they should be handled. The icon next to the name indicates if the object is a contig, a trace or a plain sequence. You can directly edit this field to change the name.
- Status - initially blank. The status field shows different messages depending on the type of sequence it is associated with. For single reads, it shows whether the read has been base called with phred ("P") or masked for vector sequences with cross_match ("X"). For file-based sequence collections it shows the document type, and for reference sequences it shows "REF". It is blank for contigs, reference contigs and child contigs.
- Length - the length of the sequence or contig.
- # - for contigs, this field indicates the number of reads that have been assembled. For sequences in a contig, the field indicates orientation using "->" for forward reads and "<-" for reverse reads.
- ClipL - the first residue from the 5' end that is not masked. Typically this will be "1", although cross_match or phrap may change this.
- ClipR - the last valid residue at the 3' end of a sequence. Initially, this is simply the last residue of the sequence, but cross_match and phrap may change this.
- Start - for sequences in a contig, the start location of the sequence within the contig.
- Stop - for sequences in a contig, the location of the last residue of the sequence within the contig.
- Definition - any descriptions associated with a sequence.
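To illustrate how the ClipL and ClipR columns relate to the usable part of a read, here is a minimal Python sketch. It assumes both values are 1-based, inclusive residue positions, which is consistent with ClipL typically being "1" and ClipR initially being the last residue. The function name and the example read are hypothetical and do not come from MacVector.

```python
def unclipped_region(sequence: str, clip_l: int, clip_r: int) -> str:
    """Return the residues from clip_l to clip_r.

    Assumption: clip_l and clip_r are 1-based, inclusive positions,
    as the ClipL / ClipR column descriptions suggest.
    """
    if not (1 <= clip_l <= clip_r <= len(sequence)):
        raise ValueError("clip positions must satisfy 1 <= ClipL <= ClipR <= length")
    # Convert the 1-based inclusive range to a Python slice.
    return sequence[clip_l - 1:clip_r]


read = "NNACGTACGTNN"  # hypothetical read with two masked residues at each end

# Before any trimming: ClipL = 1 and ClipR = length, so the whole read is valid.
print(unclipped_region(read, 1, len(read)))

# After trimming: only residues 3 to 10 remain valid.
print(unclipped_region(read, 3, 10))
```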
You can double-click on an item to open the editor associated with the object, e.g. the trace editor or the contig editor. Note that double-clicking only lets you directly edit sequences that are supported by the MacVector Single Sequence Editor. For example, you cannot edit fastq, fasta or plain text files. You cannot see individual reads associated with a child or reference contig, or indeed any file-based sequence collection. You should complete any editing on these before adding them to the project.
SEQUENCES LIST
- Newly added sequences will appear in the list with a suitable icon next to their names indicating the type of sequence imported.
Note that you can hold down the <shift> key to select multiple sequences to import. Also note that reads in a fastq file that have fewer than 10 bases, have mismatched quality and sequence lines, or have missing quality lines will be ignored.
- Fastq files are added as a file-based sequence collection. This stores a reference to the original file rather than importing and storing the file itself. This is done to save disk space, as fastq files may be many gigabytes in size. If you move the original file, you will need to use the Locate button in the Assembly Project window to point the project at the new file path.
- Once a read has been incorporated into a contig (of whatever type), it is removed from the individual or file-based sequence list.
- Clicking on the disclosure triangle next to a reference contig will show the individual child contigs that belong to the reference contig.
- Clicking on the disclosure triangle next to a contig will show the individual reads that make up that contig.
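The three conditions under which fastq reads are ignored on import (fewer than 10 bases, mismatched quality and sequence lines, missing quality lines) can be sketched in Python as follows. This is an illustration only, not MacVector's actual parser: it assumes strict four-line records with no wrapped lines, reads "mismatched" as differing lengths, and the function and variable names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

MIN_READ_LENGTH = 10  # threshold stated in the text above


def usable_reads(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for records that pass all checks.

    Assumption: strict four-line fastq records (header, sequence,
    '+' separator, quality) with no line wrapping.
    """
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped), 4):
        record = stripped[i:i + 4]
        if len(record) < 4:
            continue  # truncated record, so the quality line is missing
        header, sequence, separator, quality = record
        if not header.startswith("@") or not separator.startswith("+"):
            continue  # not a well-formed record
        if not quality:
            continue  # missing quality line
        if len(sequence) < MIN_READ_LENGTH:
            continue  # fewer than 10 bases
        if len(sequence) != len(quality):
            continue  # quality and sequence lines do not match in length
        yield header[1:], sequence, quality


# Hypothetical sample data: one valid read and one read for each failure case.
sample = [
    "@read1", "ACGTACGTACGT", "+", "IIIIIIIIIIII",  # passes all checks
    "@read2", "ACGT", "+", "IIII",                  # too short
    "@read3", "ACGTACGTACGT", "+", "IIII",          # length mismatch
    "@read4", "ACGTACGTACGT", "+", "",              # missing quality
]

kept = [name for name, _, _ in usable_reads(sample)]
print(kept)
```

Running a check like this on a fastq file before adding it to a project gives a rough idea of how many reads would be skipped, under the assumptions stated above.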
PROPERTIES WINDOW
- Clicking on the Properties Tab shows a summary for the project.
Related Topics.
Assembler
Automatic Assembly of Sub-projects with Phrap
Align to Reference
Importing Fastq data
Editing contigs
SPAdes; de novo assembly
Heterozygote Analysis of Sanger Trace files
Map View