The Coverage tab, shown for reference contigs assembled by Bowtie, displays detailed information about read coverage across your reference sequence.
This report is useful for RNA-Seq expression analysis: for each feature (CDS) it lists the detailed values that such analysis needs, including RPKM and TPM. For a graphical view of coverage across multiple references, see the Coverage tab of the main Assembly Project window.
The Coverage tab contains the following columns:
- Name: this is the preferred name of the feature. For CDS features it is typically the contents of the /gene qualifier, but MacVector will use other qualifiers if /gene is not present.
- Type: the type of feature. By default, MacVector displays only CDS and gene features, but other feature types can be requested.
- Start: the start location of the feature.
- Stop: the stop location of the feature.
- Length: the length of the feature, in bases.
- Depth: the average depth of coverage across the entire length of the feature (rounded down).
- Reads: the total number of reads that aligned to the feature.
- RPKM: Reads Per Kilobase of transcript per Million mapped reads. This common metric normalizes read counts for both sequencing depth and feature length, so that expression levels can be compared between genes. It is calculated as follows:
  - Count the total reads in the sample and divide that number by 1,000,000. This is the "per million" scaling factor.
  - Divide each feature's read count by the "per million" scaling factor. This normalizes for sequencing depth and gives reads per million (RPM).
  - Divide each RPM value by the length of the feature, in kilobases. This gives RPKM.
- TPM: Transcripts Per Million. This variation on RPKM applies the same two normalizations in the opposite order, normalizing for feature length first and sequencing depth second:
  - Divide each feature's read count by its length in kilobases. This gives reads per kilobase (RPK).
  - Add up all the RPK values in the sample and divide the total by 1,000,000. This is the "per million" scaling factor.
  - Divide each RPK value by the "per million" scaling factor. This gives TPM.
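The RPKM and TPM steps above can be sketched in Python as follows. This is a minimal illustration of the arithmetic, not MacVector's own implementation; the function names `rpkm` and `tpm`, and the read counts, feature lengths, and sample size in the example, are hypothetical. Lengths are given in bases (as in the Length column) and converted to kilobases inside the functions.

```python
def rpkm(counts, lengths_bp, total_reads):
    """Reads Per Kilobase per Million mapped reads, one value per feature."""
    # Step 1: the "per million" scaling factor for sequencing depth.
    per_million = total_reads / 1_000_000
    result = []
    for count, length in zip(counts, lengths_bp):
        rpm = count / per_million               # Step 2: reads per million (RPM)
        result.append(rpm / (length / 1_000))   # Step 3: divide by length in kb
    return result


def tpm(counts, lengths_bp):
    """Transcripts Per Million, one value per feature."""
    # Step 1: reads per kilobase (RPK) for each feature.
    rpk = [count / (length / 1_000) for count, length in zip(counts, lengths_bp)]
    # Step 2: the "per million" scaling factor is the sum of all RPK values / 1,000,000.
    per_million = sum(rpk) / 1_000_000
    # Step 3: divide each RPK value by the scaling factor.
    return [value / per_million for value in rpk]


# Hypothetical example: three features in a sample of 2,000,000 mapped reads.
counts = [500, 1_000, 2_500]
lengths_bp = [1_000, 2_000, 500]
print(rpkm(counts, lengths_bp, total_reads=2_000_000))  # [250.0, 250.0, 2500.0]
print(tpm(counts, lengths_bp))
```

The only difference between the two functions is the order of the depth and length normalizations, which is why the TPM values of a sample always add up to one million.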
The advantage of TPM is that the TPM values in every sample add up to the same total (one million). Each value therefore represents the same kind of proportion in every run, so you can directly compare the values for the same gene between different runs.
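To illustrate this point, the sketch below computes the per-sample totals of RPKM and TPM for two hypothetical runs over the same three features. The run names, counts, and lengths are made up, and for simplicity it assumes that every read in each run mapped to one of these three features. The RPKM total changes with the composition of each run, while the TPM total stays at one million.

```python
def rpkm(counts, lengths_bp, total_reads):
    # Normalize for depth (reads per million), then for length (per kilobase).
    per_million = total_reads / 1_000_000
    return [(c / per_million) / (l / 1_000) for c, l in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    # Normalize for length first (RPK), then scale so the sample sums to 1,000,000.
    rpk = [c / (l / 1_000) for c, l in zip(counts, lengths_bp)]
    per_million = sum(rpk) / 1_000_000
    return [r / per_million for r in rpk]


lengths_bp = [1_000, 2_000, 500]   # hypothetical feature lengths, in bases
run_a = [500, 1_000, 2_500]        # hypothetical read counts, run A
run_b = [3_000, 500, 500]          # hypothetical read counts, run B

for name, counts in (("A", run_a), ("B", run_b)):
    total = sum(counts)  # assumption: all reads mapped to these features
    rpkm_total = round(sum(rpkm(counts, lengths_bp, total)))
    tpm_total = round(sum(tpm(counts, lengths_bp)))
    print(name, rpkm_total, tpm_total)
# A 1500000 1000000
# B 1062500 1000000
```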
Exporting Data to Excel
The data in the Coverage tab is formatted to make it easy to export into Microsoft Excel for further analysis: the columns are tab-separated, so when you copy and paste the data into Excel, each value is placed in a separate cell.
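Because the copied data is plain tab-separated text, it can also be read by a script instead of Excel. The sketch below uses Python's standard `csv` module with a tab delimiter. It assumes the copied text includes a header row whose column names match those described above; the feature names and values shown are hypothetical placeholders, not real MacVector output.

```python
import csv
import io

# Hypothetical text as it might look after copying rows from the Coverage tab.
# Assumption: the first line is a header row with the column names listed above.
copied = (
    "Name\tType\tStart\tStop\tLength\tDepth\tReads\tRPKM\tTPM\n"
    "geneA\tCDS\t101\t1100\t1000\t37\t500\t250.0\t83333.3\n"
    "geneB\tCDS\t1501\t3500\t2000\t37\t1000\t250.0\t83333.3\n"
)

# Split each line on tab characters, the same way Excel splits pasted text into cells.
rows = list(csv.DictReader(io.StringIO(copied), delimiter="\t"))
for row in rows:
    print(row["Name"], int(row["Reads"]), float(row["TPM"]))
# geneA 500 83333.3
# geneB 1000 83333.3
```

In practice you would replace `copied` with text read from the clipboard or from a file saved after pasting.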
Related Topics
Assembler
Align to Reference
Editing Assembler contigs
Saving assemblies
Map View
Bowtie
Bowtie: Technical details