Assembler supports short read data in the Fastq format. This is a widely accepted format that contains only the basecalled sequence and quality data for each read. Fastq does not contain any of the image data that raw read formats (such as SRF and SFF) contain, and is therefore small enough to be used practically on a desktop Mac. For example, an SRF file containing the same number of reads as a 200 MB Fastq file would likely exceed 20 GB in size.
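To illustrate how little a Fastq file holds per read, the following is a minimal Python sketch that splits Fastq text into reads. It assumes the common four-line record layout (an @ header line, the sequence, a + separator line, and the quality string) with no line wrapping. The names FastqRead and parse_fastq are hypothetical and are not part of MacVector.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRead(NamedTuple):
    """One read: its name, basecalled sequence, and encoded quality string."""
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRead]:
    """Yield reads from four-line Fastq records (assumes no line wrapping)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(stripped, None)
        separator = next(stripped, None)
        quality = next(stripped, None)
        if (quality is None or not header.startswith("@")
                or not separator.startswith("+")):
            raise ValueError(f"malformed Fastq record near {header!r}")
        # Drop the leading '@' so only the read name remains.
        yield FastqRead(header[1:], sequence, quality)
```

The function accepts any iterable of lines, so an open text file handle can be passed directly, for example parse_fastq(open("reads.fastq")), where reads.fastq is a placeholder filename.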
Most sequencer software supports the export of short read data in the Fastq format. Additionally, the Short Read Archives of the EBI and NCBI allow data to be downloaded either in Fastq format or in their own short read archive format, which can be quickly converted to Fastq.
Reads in the Fastq format also contain a quality score for each base in the read. Three encodings, built on two different quality scoring schemes, are commonly found in Fastq files. The original and most common format contains the standard Phred quality score, which has a range of 0 to 93. Because this is a two-digit score, each value is stored as a single character using the ASCII character codes from 33 to 126, and so this format is commonly called Phred33. The second most common format is the Illumina 1.3 format, which also contains a Phred quality score, in this case from 0 to 40. It is encoded using the ASCII character codes from 64 to 104, and so is commonly called Phred64. The third format is Solexa/Illumina 1.0, which is now deprecated. It uses a Solexa/Illumina quality score from -5 to 40, encoded using the ASCII character codes from 59 to 104.
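The three encodings above differ only in the offset subtracted from each character code and in the scoring scheme. The following Python sketch shows the arithmetic. The names OFFSETS, decode_quality and solexa_to_phred are hypothetical and are not part of MacVector. The Solexa to Phred conversion uses the commonly published formula Q_phred = 10 * log10(10^(Q_solexa / 10) + 1).

```python
import math

# ASCII offset for each encoding described above. Solexa/Illumina 1.0
# shares the offset of 64 with Phred64 but allows scores down to -5,
# which is why its character codes start at 59.
OFFSETS = {"phred33": 33, "phred64": 64, "solexa64": 64}


def decode_quality(quality_line, encoding="phred33"):
    """Convert an encoded quality string to a list of integer scores."""
    offset = OFFSETS[encoding]
    return [ord(ch) - offset for ch in quality_line]


def solexa_to_phred(q_solexa):
    """Convert one Solexa/Illumina 1.0 score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)
```

For example, decode_quality("!I~") gives [0, 40, 93] under Phred33, and decode_quality(";h", "solexa64") gives [-5, 40]. Choosing the wrong encoding shifts every score by 31, which is why the encoding has to be stated when the file is added.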
When you add reads in Fastq format, MacVector will ask which type of reads the file contains and which quality score encoding it uses. Reads are represented by a different symbol in the Map View depending on whether the read is from a 454, SOLiD, Illumina or Sanger sequencer. This is useful for hybrid assembly, as you can easily visualize the provenance of the reads. Fastq files are added as a file-based sequence collection. This stores a reference to the original file rather than importing and storing the file itself. This is done to save disk space, as Fastq files may be many gigabytes in size. If you move the original file, you will need to use the Locate button in the Assembly Project window to point to the new file path.
MacVector will import single read files containing up to 15,000,000 reads. Separate paired files can contain up to 30,000,000 reads (15,000,000 pairs), and interleaved files can contain up to 15,000,000 reads (7,500,000 pairs).
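Before adding a large file, it can be useful to check its read count against the limits above. The following Python sketch assumes four-line Fastq records with no line wrapping, and it interprets the 30,000,000 limit for separate paired files as the combined total across both files. The names LIMITS, count_reads and within_limit are hypothetical and are not part of MacVector.

```python
# Read limits taken from the text above, keyed by file layout.
# "paired" is assumed to be the combined total of both paired files.
LIMITS = {
    "single": 15_000_000,
    "paired": 30_000_000,
    "interleaved": 15_000_000,
}


def count_reads(lines):
    """Count reads in an iterable of Fastq lines (four lines per read)."""
    non_blank = sum(1 for line in lines if line.strip())
    return non_blank // 4


def within_limit(read_count, layout):
    """Return True if read_count does not exceed the limit for layout."""
    return read_count <= LIMITS[layout]
```

An open file handle can be passed to count_reads directly, for example count_reads(open("reads.fastq")), where reads.fastq is a placeholder filename. For separate paired files, add the counts of the two files together before calling within_limit with the "paired" layout.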