Flye is a de novo assembler tuned for error-prone long reads such as those produced by Pacific Biosciences and Oxford Nanopore Technologies sequencers. It is remarkably fast and can often produce complete, full-length assemblies of smaller genomes (e.g. less than 20 Mbp). However, in the absence of additional high quality reads, the final consensus sequences tend to have a fairly high error rate, particularly in homopolymer runs. Nonetheless, Flye can be a very valuable tool for analyzing genomes, as the long reads can span typical repeat sequences (Insertion Sequences, rRNA operons, etc.) to generate the correct structure of a genome.
The general steps to run Flye are:
- Create a new Assembly Project using FILE | NEW | ASSEMBLY PROJECT.
- Click on the Add Reads button to add PacBio or Oxford Nanopore reads in fasta, fastq or gzipped (.gz) format.
- Double-click on the Status column of the imported read file(s) and set the data type to "PacBio" or "Oxford Nanopore" as appropriate.
- Select the file(s) you want to assemble and click on the Flye toolbar button.
- Set the Expected genome size to the approximate size of the genome you are assembling. If in doubt, err on the small side.
- Choose a suitable Initial minimum coverage. This, along with Expected genome size, is one of the two most critical parameters. The default is 50, but you may try as low as 20 or as high as 500. Depending on your input data, even the difference between 50 and 60 can be important. Larger values typically give better results at the expense of longer processing time, but there are many occasions where smaller values do best.
- There are two polishing options: Flye has its own internal polishing option, and MacVector also includes the option of polishing with Racon. Polishing refers to re-aligning the input data against the calculated consensus to generate a more accurate consensus. Each round of polishing yields diminishing improvements, and too much polishing can sometimes introduce more errors.
- Because the values of Expected genome size and Initial minimum coverage can be so critical, it is often useful to run trial Flye assemblies that skip the additional post-assembly steps, such as polishing and calculating the read coverage across each contig. Select the option to skip these steps to speed up assemblies so that you can try different values for Expected genome size and Initial minimum coverage. Once you have determined the optimal parameters to give you a full length assembly, uncheck the box to run the full suite of polishing tools.
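For readers who also run Flye outside MacVector, the trial-and-error search over Expected genome size and Initial minimum coverage described above can be sketched as a small Python script that builds one Flye command line per parameter combination. This is an illustration only, not how MacVector invokes Flye. It assumes that Expected genome size corresponds to Flye's --genome-size option and that Initial minimum coverage corresponds to --asm-coverage; that mapping is a guess. The file name reads.fastq.gz, the output folder naming scheme, and the function names are hypothetical. The script only prints the commands; it does not run them.

```python
from itertools import product


def flye_command(reads, out_dir, genome_size, asm_coverage,
                 read_type="nano-raw", iterations=1, threads=4):
    """Build (but do not run) a Flye command line as a list of arguments.

    genome_size is a string such as "5m", the form shown in Flye's usage text.
    Assumption: MacVector's "Initial minimum coverage" maps to --asm-coverage.
    """
    if read_type not in ("nano-raw", "pacbio-raw"):
        raise ValueError("read_type must be 'nano-raw' or 'pacbio-raw'")
    return [
        "flye",
        f"--{read_type}", reads,
        "--out-dir", out_dir,
        "--genome-size", str(genome_size),
        "--asm-coverage", str(asm_coverage),
        "--iterations", str(iterations),  # number of Flye polishing rounds
        "--threads", str(threads),
    ]


def sweep_commands(reads, genome_sizes, coverages, read_type="nano-raw"):
    """Return one command per (genome size, coverage) combination."""
    commands = []
    for size, cov in product(genome_sizes, coverages):
        # Hypothetical naming scheme so each trial gets its own output folder.
        out_dir = f"flye_gs{size}_cov{cov}"
        commands.append(flye_command(reads, out_dir, size, cov, read_type))
    return commands


if __name__ == "__main__":
    # Trial values around the defaults discussed above; adjust to your data.
    for cmd in sweep_commands("reads.fastq.gz", ["4m", "5m"], [30, 50, 60]):
        print(" ".join(cmd))
```

Each printed line can be pasted into a terminal where Flye is installed. Keeping the polishing rounds at a single iteration during the sweep mirrors the idea of running quick trial assemblies first and polishing fully only once suitable parameters are found.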
Expect Flye to take anywhere from less than 5 minutes (a small bacterial genome with minimal options) to overnight for larger genomes with multiple rounds of polishing. It is not currently realistic to expect to assemble large mammalian or plant genomes using Flye on a single Macintosh computer.
See the Flye help topic for more details.
Related Topics.
Assembler
Quick Start
Assembling sequences
Short Read Assembly
Base calling
Importing existing assemblies to an Assembly Project
Bowtie
SPAdes
Velvet
Importing Fastq data