Bowtie
Bowtie is a gapped aligner capable of extremely fast alignment of short and long sequences against a much larger reference sequence.
Using Bowtie
- Choose File | New | Assembly Project to create a new, empty Assembly Project window.
- Click on the Add Reads tool bar button, then select the sequence files you wish to assemble and click on the Open button.
Note that you can hold down the <shift> key to select multiple sequence files to import. You can also drag and drop reads files onto the Assembly Project window. Reads in a fastq file that have fewer than 10 bases, have mismatched quality and sequence lines, or have missing quality lines are ignored. This filtering step is currently disabled for paired-end reads.
Fastq files are added as a file-based sequence collection, which stores a reference to the original file rather than importing and storing its contents. This saves disk space, as fastq files may be many gigabytes in size. If you move the original file, you will need to use the Locate button in the Assembly Project window to point to the new file path.
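The read filtering rule described above can be sketched in a few lines of Python. This is only an illustration, not MacVector's implementation: the names `MIN_BASES`, `keep_record` and `filter_fastq` are hypothetical, the input is assumed to be strict four-line fastq records, and "mismatched quality and sequence lines" is interpreted here as a difference in length.

```python
from typing import Iterator, List, Tuple

MIN_BASES = 10  # threshold quoted in the text; the constant name is hypothetical


def keep_record(seq: str, qual: str) -> bool:
    """Return True if a read passes the three checks described in the text."""
    if len(seq) < MIN_BASES:   # fewer than 10 bases
        return False
    if not qual:               # missing quality line
        return False
    if len(seq) != len(qual):  # assumed meaning of "mismatched" lines
        return False
    return True


def filter_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for each four-line record that passes."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (s.strip() for s in lines[i:i + 4])
        if header.startswith("@") and keep_record(seq, qual):
            yield header[1:], seq, qual
```

Passing the lines of a small fastq file to `filter_fastq` yields only the reads that would survive the import step under these assumptions.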
- Click on the Add Ref tool bar button, then select the reference sequences you wish to align the reads against and click on the Open button.
- Choose Analyze | Bowtie to run the Bowtie algorithm on all of the sequences in the project. Note that if no sequences are selected, Bowtie will be run on ALL of the files in the project. However, if any sequences are selected then at least one reference sequence and at least a single file containing reads must be selected.
In the dialog you'll see an important setting called Presets. There are four presets, ranging from Very Sensitive to Very Fast.
You can also disable the generation of child contigs. A child contig is defined as a region of the reference contig that is bounded by two regions without overlapping reads and at least a single base with no coverage (or either end of the reference sequence). For many tasks, such as RNA-Seq analysis or variant analysis, child contigs are not needed, and disabling them will improve performance.
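The child contig definition above amounts to finding the maximal stretches of the reference that are covered by reads. The following Python sketch illustrates that idea under stated assumptions: reads are given as 1-based inclusive (start, stop) positions on the reference, the function name `child_contigs` is hypothetical, and two reads that abut without leaving an uncovered base are treated as belonging to the same child contig. It is not how Assembler itself computes child contigs.

```python
from typing import List, Tuple


def child_contigs(reads: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge read intervals (1-based, inclusive) into maximal covered regions."""
    regions: List[Tuple[int, int]] = []
    for start, stop in sorted(reads):
        if regions and start <= regions[-1][1] + 1:
            # The read overlaps or directly abuts the current region,
            # so there is no uncovered base separating them.
            prev_start, prev_stop = regions[-1]
            regions[-1] = (prev_start, max(prev_stop, stop))
        else:
            # At least one base with no coverage precedes this read,
            # so a new region starts here.
            regions.append((start, stop))
    return regions
```

For example, reads at 1-50 and 40-90 merge into a single region 1-90, while a read at 120-150 starts a new region because bases 91-119 have no coverage.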
Paired Reads
Paired reads can be aligned against a reference. By default, paired-read alignment is enabled if two reads files are selected and disabled if three or more files are selected. However, it can be turned back on manually. The read files must be sequentially numbered so that the two files of each pair are adjacent when they are submitted. For example, "READSFILE_A_1.fastq", "READSFILE_A_2.fastq", "READSFILE_B_1.fastq" and "READSFILE_B_2.fastq" will work for two pairs called READSFILE_A and READSFILE_B.
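Before adding many files, it can be useful to check that their names follow the numbering convention shown in the example. The Python sketch below groups file names into pairs by a shared stem and a trailing `_1` or `_2`. The regular expression and the function name `group_pairs` are assumptions made for illustration, based only on the example names above; Assembler's own matching rules may differ.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical pattern: <stem>_1.fastq / <stem>_2.fastq (also accepts .fq).
PAIR_SUFFIX = re.compile(r"^(?P<stem>.+)_(?P<mate>[12])\.(?:fastq|fq)$")


def group_pairs(filenames: List[str]) -> Dict[str, Tuple[str, str]]:
    """Map each pair name to its (_1 file, _2 file); skip incomplete pairs."""
    mates: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        match = PAIR_SUFFIX.match(name)
        if match:
            mates[match.group("stem")][match.group("mate")] = name
    return {
        stem: (found["1"], found["2"])
        for stem, found in sorted(mates.items())
        if "1" in found and "2" in found
    }
```

A stem that has only one of the two mates is left out of the result, which makes missing or misnamed files easy to spot.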
Assembly
There are 11 steps to a Bowtie assembly:
- Determining the read file encoding (step 1 of 11)
- Creating reference FASTA file (step 2 of 11)
- Creating read FASTQ files (step 3 of 11)
- Running Bowtie indexing and analysis... (step 4 of 11)
- Extracting the consensus sequence and contigs (step 5 of 11)
- Generating coverage data for (step 6 of 11)
- Generating contig for (step 7 of 11)
- Generating INDELs for (step 8 of 11)
- Generating child contigs for (step 9 of 11)
- Generating SNP report for (step 10 of 11)
- Gathering unassembled reads (step 11 of 11)
Viewing Contigs
- In the Assembly Project window, double click on the reference contig to open it in the Contig Editor.
- Click on the disclosure triangle next to the reference contig to reveal the individual child contigs. Note that a child contig is defined as a region of the reference contig that is bounded by two regions without overlapping reads and at least a single base with no coverage (or either end of the reference sequence).
- Double click on a child contig to open it in the Contig Editor. Note how the #, Start and Stop columns have been updated to display additional information. The number of reads assembled in the contig is indicated on the top line, while the orientation of each read in the contig is indicated on the other lines. The start and stop locations of each child contig are also indicated in the name of the child contig.
- Reference Contigs may be renamed by Option-clicking on the Reference Contig name.
- You can edit the contig as described in the Contig Editor section. In addition, you can directly run any MacVector nucleic acid analysis on the consensus sequence of the contig.
- The settings used to produce a Bowtie alignment are stored within the Comment Annotations field of the Reference Contig.
- Contigs can be easily exported to use in other applications or for further assembly.
- The BAM files produced by Bowtie are contained within the File Package that the Assembly Project is saved as.
Variant reporting
Assembler has multiple reports showing variants in the assembly. These are described in Contig Editor for Reference Assemblies.
The INDEL reporting is taken from the VCF tab. INDELs are determined using the FQ column (consensus quality) in the VCF file. If positive, FQ equals the Phred-scaled probability of there being two or more different alleles. If negative, FQ equals the minus Phred-scaled probability of all chromosomes being identical.
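Because FQ is Phred-scaled, its magnitude can be converted to a probability with the standard Phred relation Q = -10 log10(P), while its sign selects which of the two cases above applies. The Python sketch below shows only that arithmetic. The function names are hypothetical, and the sketch does not reinterpret which event the probability refers to; that follows the description of the FQ column above.

```python
from typing import Tuple


def phred_to_probability(q: float) -> float:
    """Convert a Phred-scaled value Q to a probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def split_fq(fq: float) -> Tuple[str, float]:
    """Return the sign case of an FQ value and the probability for |FQ|."""
    if fq > 0:
        case = "positive"
    elif fq < 0:
        case = "negative"
    else:
        case = "zero"
    return case, phred_to_probability(abs(fq))
```

A magnitude of 30, for instance, corresponds to a probability of 0.001, and a magnitude of 20 to 0.01.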
Related Topics
Assembler
Quick Start
Assembling sequences
Bowtie
Importing Fastq data
Bowtie Preferences
Short Read Assembly
Importing existing assemblies to an Assembly Project
Base calling
Saving assemblies