
How do I automatically sort a sequencing dataset into sub-projects?

Automatic Assembly of Sub-projects with Phrap (Sub-Assemblies)

New to MacVector 18.6 is the ability to sort and assemble reads from different datasets into individual sub-projects. This functionality is located in the Phrap parameters dialog. When enabled and configured appropriately for your dataset, it automatically breaks out the input reads into sub-projects that are assembled separately.

A simple pattern-matching text box lets you define which characters in the input filenames should be treated as project names and which should be treated as read names. After assembly, contigs can be exported to a variety of file formats, including FASTA and FASTQ, with the project name retained in the contig names.

This function can be a great time saver if you run many small, related sequencing projects, as long as you use a well-defined naming convention.

Pattern Matching

The reads in your datasets must follow a defined naming standard. You need to construct a pattern that matches both the project name and the read name. A set of characters is available for building a pattern that defines which part of the filename is the read name and which part is the project name. As an aid to constructing a pattern, when you type these characters in the dialog, the sub-project name is dynamically updated to show what the sub-projects will be named. These characters are:

This is best demonstrated with an example. Here we have a sequencing dataset called BASENAME. Each individual sample that was sequenced is numbered from 1000 to 1100. Typical read names are:

List of read names

Your pattern for this could be:

PPPPPPPP-PPPPxxxx

We can break this down as follows for the first read name:

BASENAME-1001g07_0x00.s01_1.scf

The above set of reads would produce the following three sub-assemblies:
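To illustrate how a pattern of this kind might split filenames into sub-projects, here is a minimal Python sketch. It is not MacVector's implementation, and the semantics are assumptions inferred from the example pattern: `P` is taken to mark a project-name character, `x` a read-name character, any other pattern character (such as the `-` separator) must match the filename literally, and characters beyond the end of the pattern are treated as part of the read name. The function names `split_name` and `group_reads` are hypothetical, and all read names except the first are invented for illustration.

```python
from collections import defaultdict


def split_name(filename, pattern):
    """Return (project, read) for a filename, or None if the pattern does not fit.

    Assumed semantics (not confirmed by the MacVector documentation):
    'P' = project-name character, 'x' = read-name character, and any other
    pattern character is a literal that must match and is kept in the field
    currently being built.
    """
    if len(filename) < len(pattern):
        return None
    project, read = "", ""
    current = "project"
    for pat_char, name_char in zip(pattern, filename):
        if pat_char == "P":
            current = "project"
        elif pat_char == "x":
            current = "read"
        elif pat_char != name_char:
            return None  # literal character (e.g. the separator) did not match
        if current == "project":
            project += name_char
        else:
            read += name_char
    # Assumption: anything after the pattern belongs to the read name.
    read += filename[len(pattern):]
    return project, read


def group_reads(filenames, pattern):
    """Group filenames by the project name extracted with the pattern."""
    groups = defaultdict(list)
    for name in filenames:
        parts = split_name(name, pattern)
        key = parts[0] if parts else "UNMATCHED"
        groups[key].append(name)
    return dict(groups)


reads = [
    "BASENAME-1001g07_0x00.s01_1.scf",  # read name from the example above
    "BASENAME-1001h07_0x00.s01_1.scf",  # hypothetical
    "BASENAME-1002a01_0x00.s01_1.scf",  # hypothetical
    "BASENAME-1003b02_0x00.s01_1.scf",  # hypothetical
]

for project, members in group_reads(reads, "PPPPPPPP-PPPPxxxx").items():
    print(project, len(members))
```

Running a quick check like this over a list of filenames before importing them can reveal reads that do not follow the naming convention, since they end up under `UNMATCHED`.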

How to sort reads into sub-projects

  1. Choose File | NEW | ASSEMBLY PROJECT.
  2. Click ADD SEQS to add your dataset.
  3. Click ASSEMBLE | PHRAP.
  4. Click the Sub-Assemblies tab in the Phrap dialog.
  5. Toggle the Enable Sub-assemblies setting to on.
  6. Ensure your separator character is listed in the Valid Separators box.
  7. Construct a suitable matching pattern (see above).
  8. Click OK.

Related Topics

Assembler

The project window

Automatic assembly

Saving assemblies

Assembling sequences

Vector trimming

Assembling

Assembler Parameters