Assembling reads against a reference with Bowtie

Bowtie

Bowtie is a gapped aligner capable of extremely fast alignment of short and long sequences against a much larger reference sequence.

Using Bowtie

Note that you can hold down the <shift> key to select multiple sequences to import. You can also drag and drop reads files onto the Assembly Project window. Reads in a fastq file that have fewer than 10 bases, have mismatched quality and sequence lines, or have missing quality lines are ignored. This filtering step is currently disabled for paired-end reads.
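The filtering rules above can be illustrated with a minimal Python sketch. It is not MacVector's implementation: the function name `filter_fastq` is hypothetical, each record is assumed to occupy exactly four lines, and "mismatched" is interpreted here as the sequence and quality lines having different lengths.

```python
import io

MIN_READ_LENGTH = 10  # reads with fewer bases than this are ignored


def filter_fastq(handle, min_length=MIN_READ_LENGTH):
    """Yield (header, sequence, quality) for records that pass the checks.

    Assumption: every record is exactly four lines (header, sequence,
    separator, quality). Real fastq parsers handle more edge cases.
    """
    lines = [line.rstrip("\r\n") for line in handle]
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        if len(record) < 4:
            continue  # truncated record, so the quality line is missing
        header, sequence, _separator, quality = record
        if len(sequence) < min_length:
            continue  # fewer than 10 bases
        if not quality:
            continue  # empty quality line
        if len(quality) != len(sequence):
            continue  # quality and sequence lines do not match
        yield header, sequence, quality


example = io.StringIO(
    "@r1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"  # kept
    "@r2\nACGT\n+\nIIII\n"                  # too short
    "@r3\nACGTACGTACGT\n+\nIIII\n"          # length mismatch
)
kept_headers = [header for header, _, _ in filter_fastq(example)]
```

In this example only the record `@r1` passes all three checks.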

Fastq files are added as a file-based sequence collection, which stores a reference to the original file rather than importing and storing its contents. This saves disk space, as fastq files may be many gigabytes in size. If you move the original file, you will need to use the Locate button in the Assembly Project window to point to the new file path.

In the dialog you'll see an important setting called presets. There are four presets, ranging from Very sensitive to Very Fast.
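For readers who also run the aligner from the command line, the following Python sketch shows how four such presets could map onto the preset flags of the Bowtie 2 command-line tool. It rests on assumptions: that the dialog's presets correspond to Bowtie 2's `--very-fast`, `--fast`, `--sensitive` and `--very-sensitive` flags, and that the two middle presets are labelled "Fast" and "Sensitive". Neither is confirmed by this page. The function `build_bowtie2_command` is hypothetical and only builds the argument list; it does not run anything.

```python
# Assumed mapping from dialog preset labels to Bowtie 2 preset flags.
# The "fast" and "sensitive" labels are guesses, not taken from the dialog.
PRESET_FLAGS = {
    "very fast": "--very-fast",
    "fast": "--fast",
    "sensitive": "--sensitive",
    "very sensitive": "--very-sensitive",
}


def build_bowtie2_command(preset, index_prefix, reads_path, sam_path):
    """Return a bowtie2 argument list for unpaired reads (not executed)."""
    flag = PRESET_FLAGS[preset.lower()]
    return [
        "bowtie2", flag,
        "-x", index_prefix,  # prefix of the index built from the reference
        "-U", reads_path,    # unpaired reads file
        "-S", sam_path,      # SAM output file
    ]


command = build_bowtie2_command("Very sensitive", "ref", "reads.fastq", "out.sam")
```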

You can also disable the generation of child contigs. A child contig is a region of the reference contig that is bounded on each side either by a stretch with no overlapping reads (at least a single base with no coverage) or by an end of the reference sequence. For many tasks, such as RNA-Seq analysis or variant analysis, child contigs are not needed, and disabling them will improve performance.
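The child contig definition can be made concrete with a short Python sketch. It assumes per-base coverage is available as a list of read depths; the function name `child_contig_regions` is hypothetical, and this is an illustration of the definition rather than the Assembler's own code.

```python
def child_contig_regions(coverage):
    """Return (start, end) half-open intervals of maximal covered runs.

    Each run is bounded by at least one zero-coverage base or by an end
    of the reference, matching the definition in the text above.
    """
    regions = []
    start = None
    for position, depth in enumerate(coverage):
        if depth > 0 and start is None:
            start = position                    # a covered run begins
        elif depth == 0 and start is not None:
            regions.append((start, position))   # run ended at a gap
            start = None
    if start is not None:
        regions.append((start, len(coverage)))  # run reaches the reference end
    return regions


regions = child_contig_regions([0, 2, 3, 0, 0, 1, 1])
```

Here the two covered runs at positions 1 to 2 and 5 to 6 would each become a child contig.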

Paired Reads

Paired reads can be aligned against a reference. By default, paired-read alignment is enabled if two reads files are selected and disabled if three or more files are selected, although it can be turned back on manually. The reads files must be sequentially numbered so that the two files of each pair are next to each other when they are submitted. For example, "READSFILE_A_1.fastq", "READSFILE_A_2.fastq", "READSFILE_B_1.fastq" and "READSFILE_B_2.fastq" will work for two pairs called READSFILE_A and READSFILE_B.
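The naming convention can be checked before submitting files with a small Python sketch. It assumes the `_1.fastq` / `_2.fastq` suffix pattern from the example above; the function `pair_read_files` is hypothetical and is not part of MacVector.

```python
import re
from collections import defaultdict

# Assumed pattern: <pair name>_1.fastq and <pair name>_2.fastq
PAIR_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<mate>[12])\.fastq$")


def pair_read_files(filenames):
    """Group reads files into {pair_name: (mate_1_file, mate_2_file)}."""
    mates = defaultdict(dict)
    for name in filenames:
        match = PAIR_SUFFIX.match(name)
        if match is None:
            raise ValueError(f"not a numbered reads file: {name}")
        mates[match["base"]][match["mate"]] = name
    pairs = {}
    for base in sorted(mates):
        found = mates[base]
        if set(found) != {"1", "2"}:
            raise ValueError(f"incomplete pair: {base}")
        pairs[base] = (found["1"], found["2"])
    return pairs


pairs = pair_read_files([
    "READSFILE_A_1.fastq", "READSFILE_A_2.fastq",
    "READSFILE_B_1.fastq", "READSFILE_B_2.fastq",
])
```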

Assembly

There are 11 steps to a Bowtie assembly:

  1. Determining the read file encoding (step 1 of 11)
  2. Creating reference FASTA file (step 2 of 11)
  3. Creating read FASTQ files (step 3 of 11)
  4. Running Bowtie indexing and analysis... (step 4 of 11)
  5. Extracting the consensus sequence and contigs (step 5 of 11)
  6. Generating coverage data for (step 6 of 11)
  7. Generating contig for (step 7 of 11)
  8. Generating INDELs for (step 8 of 11)
  9. Generating child contigs for (step 9 of 11)
  10. Generating SNP report for (step 10 of 11)
  11. Gathering unassembled reads (step 11 of 11)
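
As an illustration of step 1, the sketch below guesses the quality-score offset of a fastq file from its quality characters. It assumes that "read file encoding" refers to the Phred+33 versus Phred+64 quality encoding, which this page does not state. The function `guess_quality_offset` is hypothetical, and the thresholds are a common heuristic rather than MacVector's documented behaviour.

```python
def guess_quality_offset(quality_lines):
    """Return 33, 64, or None if the encoding cannot be decided.

    Heuristic: characters below ';' (ASCII 59) only occur with an
    offset of 33; characters above 'J' (ASCII 74) suggest an offset of 64.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        return None  # no quality characters to inspect
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None      # all characters fall in the ambiguous range


offset = guess_quality_offset(["IIII#", "IIIII"])
```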

Viewing Contigs

Variant reporting

Assembler has multiple reports showing variants in the assembly. These are described in Contig Editor for Reference Assemblies.

The INDEL reporting is taken from the VCF tab. INDELs are determined using the FQ column (consensus quality) in the VCF file. If positive, FQ equals the phred-scaled probability of there being two or more different alleles. If negative, FQ equals the minus phred-scaled probability of all chromosomes being identical.
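
To make the FQ sign convention easier to read, the following Python sketch converts an FQ value into the statement it refers to and the probability implied by the standard Phred formula, Q = -10 log10(p). The function `interpret_fq` is hypothetical, and how the Assembler turns these values into reported INDELs is not described here, so treat this only as a reading aid.

```python
def interpret_fq(fq):
    """Return (statement, probability) for an FQ value.

    The magnitude is converted with the standard Phred formula,
    p = 10 ** (-|FQ| / 10). The sign selects which statement from the
    text above the value refers to.
    """
    probability = 10 ** (-abs(fq) / 10)
    if fq > 0:
        statement = "two or more different alleles"
    elif fq < 0:
        statement = "all chromosomes identical"
    else:
        statement = None  # FQ of 0 carries no information under this reading
    return statement, probability


statement, probability = interpret_fq(-20)
```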

Related Topics

Assembler

Quick Start

Assembling sequences

Bowtie

Importing Fastq data

Bowtie Preferences

Short Read Assembly

Importing existing assemblies to an Assembly Project

Base calling

Saving assemblies