How do I extract specific reads from an NGS dataset?

You can use the Database | Align to Folder tool to scan large fasta or fastq files containing NGS data and retrieve just those reads that match a specific target sequence.

Align to Folder scans folders on your file system that contain sequences, including reads in gzipped fasta and fastq files, and searches them for matches to an input sequence. You can then retrieve the matching sequences/reads of interest into a much smaller fasta/fastq file. The search is aware of paired-end reads, so when you retrieve hits, both reads of a pair are saved into a pair of fasta or fastq files, even if only one of them matched the query sequence.
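To make the paired-end behaviour concrete, here is a minimal Python sketch of the idea. It is not MacVector's implementation: the function names are hypothetical, an exact substring test stands in for the alignment search, and the two mate files are assumed to list reads in the same order.

```python
import gzip
import io


def open_reads(path):
    """Open a plain or gzipped FASTQ file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def parse_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def pair_key(header):
    """Reduce a header to an ID shared by both mates (drops '@', comments, /1 or /2)."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def extract_pairs(reads1, reads2, target):
    """Return both mates of every pair in which either read contains target.

    Assumption: substring matching is only a stand-in for a real alignment search.
    """
    hits1, hits2 = [], []
    for rec1, rec2 in zip(reads1, reads2):
        if pair_key(rec1[0]) != pair_key(rec2[0]):
            raise ValueError("mate files are out of sync")
        if target in rec1[1] or target in rec2[1]:
            hits1.append(rec1)
            hits2.append(rec2)
    return hits1, hits2


# Tiny in-memory example: only the first read of pair "a" contains the target.
r1 = io.StringIO("@a/1\nACGTACGT\n+\nIIIIIIII\n@b/1\nTTTTTTTT\n+\nIIIIIIII\n")
r2 = io.StringIO("@a/2\nGGGGGGGG\n+\nIIIIIIII\n@b/2\nCCCCCCCC\n+\nIIIIIIII\n")
hits1, hits2 = extract_pairs(parse_fastq(r1), parse_fastq(r2), "GTAC")
```

In this example the second read of pair "a" does not contain the target, yet it is kept alongside its mate, which mirrors the behaviour described above.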

How to use Align to Folder to extract reads that match your target sequence.

  1. Choose Database | Align to Folder.
  2. Set the Search Folder to the location of your data and select the Folder.
  3. Click OK.
  4. To retrieve the hits, select the numbered rows in the Folder Description List results tab (use Select All).
  5. Choose Database | Retrieve to File...

For single-end (SE) reads, a single file will be produced. If you used paired-end (PE) data, two files will be produced, -1.fastq and -2.fastq.
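If you want to confirm outside MacVector that a retrieved pair of files is consistent, a short script can check that both files hold the same reads in the same order. This is only a sketch: the function names are hypothetical, the files are assumed to be uncompressed four-line FASTQ, and any file names you pass in (for example hits-1.fastq and hits-2.fastq) are placeholders for whatever names you chose when saving.

```python
import itertools


def fastq_ids(path):
    """Yield the read ID from every header line (every 4th line) of a FASTQ file."""
    with open(path) as handle:
        for line in itertools.islice(handle, 0, None, 4):
            if not line.strip():
                continue  # tolerate a stray blank line at the end of the file
            name = line[1:].split()[0]
            # Drop a trailing /1 or /2 mate suffix if the reads carry one.
            yield name[:-2] if name.endswith(("/1", "/2")) else name


def count_matched_pairs(path1, path2):
    """Count read pairs, raising ValueError if the two files disagree."""
    count = 0
    for id1, id2 in itertools.zip_longest(fastq_ids(path1), fastq_ids(path2)):
        if id1 != id2:
            raise ValueError(f"pair {count + 1} differs: {id1!r} vs {id2!r}")
        count += 1
    return count
```

Calling count_matched_pairs on the two retrieved files returns the number of pairs, or raises an error if one file has extra reads or the read IDs fall out of step.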

Related Topics.

Align to Folder

Mapping sequences against a reference.

How do I? - videos