Only consider visible features This option is useful if you want to avoid cluttering your sequences with large numbers of hidden features. For example, the standard GenBank pBR322 vector sequence has around 60 features assigned to it, but only four are displayed in most plasmid maps.
Discard shorter duplicates Discards any features that lie entirely within another feature of the same type, so that only the longest feature is retained. Normally, if features have different start or stop locations, they are considered to be different. However, many vectors have slight differences in the extent of the replication origin or in features such as T7 or SP6 promoters. So, when comparisons are made with the numerous annotated sequences in the sequence folder, it is possible for the same feature to be annotated several times, with each annotation differing by one or two residues at either end. Checking this option removes such duplicates.
Note that a "duplicate" is any feature that has the same start and stop as another feature of the same type, regardless of qualifiers. If two CDS features have the same start and stop, they are considered to be the same even if they have different qualifiers, descriptions, etc. If a shorter CDS lies within a longer CDS, this option discards the shorter CDS.
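The containment rule described above can be sketched in Python. This is only an illustration under stated assumptions, not MacVector's implementation: the `Feature` tuple and the `discard_shorter_duplicates` function are hypothetical names, and coordinates are treated as inclusive.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    # Hypothetical minimal feature record; qualifiers are deliberately ignored.
    type: str
    start: int
    stop: int
    label: str = ""

def discard_shorter_duplicates(features):
    """Keep only features not enclosed by a kept feature of the same type."""
    # Longest first, so an enclosing feature is always examined before
    # anything it contains. sorted() is stable, so among features with
    # identical coordinates the first one listed is the one retained.
    ordered = sorted(features, key=lambda f: f.stop - f.start, reverse=True)
    kept = []
    for f in ordered:
        enclosed = any(
            k.type == f.type and k.start <= f.start and f.stop <= k.stop
            for k in kept
        )
        if not enclosed:
            kept.append(f)
    return sorted(kept, key=lambda f: (f.start, f.stop))

features = [
    Feature("rep_origin", 100, 700, "ori A"),
    Feature("rep_origin", 101, 699, "ori B"),   # differs by one residue at each end
    Feature("CDS", 150, 300, "x"),
    Feature("CDS", 150, 300, "y"),              # same location, different label
    Feature("promoter", 120, 140),
]
result = discard_shorter_duplicates(features)
```

Here `result` keeps "ori A", the promoter, and the first CDS. The CDS is not discarded for lying inside the replication origin, because the comparison applies only to features of the same type.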
Allow gaps in CDS features Permits gaps in CDS features when searching for matches between features in the sequence folder and the new sequence. In general, MacVector incorporates some fuzziness in the identification of matching features, allowing a limited number of gaps and mismatches in the alignment. However, it does not usually permit gaps in CDS features because these give rise to frameshifts in the encoded protein, potentially leading to something completely different to that which was encoded by the original annotated feature. Checking this option removes this restriction, which could be useful if you suspect that the new sequence may have sequencing errors, since allowing gaps in CDS features should ensure that they are annotated as expected.
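The frameshift problem mentioned above can be illustrated with a short Python sketch. It is not MacVector code: the `translate` helper and the example sequence are made up for illustration, and the codon table is the standard genetic code.

```python
# Build the standard genetic code from the conventional TCAG-ordered string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

original = "ATGGCTAGCAAAGGT"
# Simulate a single-residue gap (deletion) at the fourth base.
with_gap = original[:3] + original[4:]

protein_original = translate(original)   # "MASKG"
protein_with_gap = translate(with_gap)   # "MLAK"
```

A single gap changes every codon downstream of it, which is why gaps in CDS features are not permitted unless this option is checked.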
Minimum feature length The auto-annotation algorithm uses sequence similarity to determine if a feature is present in the new sequence. This matching method can sometimes lead to very short features being incorrectly added. Consider, for example, a 4 base pair misc_feature being used to label an important MboI site in a sequence in the sequence folder. If no minimum feature length is specified, then that feature will be added at every MboI (GATC) site in the new sequence, that is, every 256 bp on average.
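The 256 bp figure follows from simple arithmetic, sketched below in Python. The sketch assumes a random sequence with all four bases equally frequent; the function names and the test sequence are hypothetical.

```python
def expected_spacing(motif_length, alphabet_size=4):
    """Average distance between chance occurrences of a motif of this length,
    assuming equal base frequencies (an idealization)."""
    return alphabet_size ** motif_length

def count_occurrences(sequence, motif):
    """Count occurrences of motif in sequence, including overlapping ones."""
    count = 0
    pos = sequence.find(motif)
    while pos != -1:
        count += 1
        pos = sequence.find(motif, pos + 1)
    return count

spacing_4bp = expected_spacing(4)     # 4**4 = 256
spacing_20bp = expected_spacing(20)   # 4**20, far longer than any plasmid
hits = count_occurrences("GATCAAGATCGATC", "GATC")
```

Each additional residue in the minimum length makes a chance match four times less likely, so even a modest minimum removes most spurious short annotations.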
Residues around point feature A point feature is one where the start and stop locations are the same. These might be SNP locations, a replication start site, or just a particular point of interest. MacVector treats these as a special case. When the algorithm encounters a point feature, it examines the region containing the specified number of residues centered on that point and uses it to determine if the point feature is present in the new sequence. Note. This parameter is used only when the Include point features option in the Point Feature Characteristics section (described below) is selected.
Maximum allowed mismatches MacVector incorporates some fuzziness in the identification of matching features, allowing a limited number of gaps and mismatches in the alignment. This parameter controls the number of mismatches permitted in matching features. The default value of 1 means that only 1 residue in 100 can be mismatched.
Maximum allowed gaps This parameter controls the number of gaps permitted in matching features. The default value of 0.5 means that only 1 gap in 200 residues is permitted.
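The two limits can be read as percentages of the feature length. The Python sketch below works through that arithmetic; the function names are hypothetical, and rounding down to a whole number of events is an assumption, since the text does not say how fractional allowances are treated.

```python
import math

def allowed_events(feature_length, percent):
    """Mismatches or gaps permitted for a feature of the given length.
    Rounding down is an assumption made for this sketch."""
    return math.floor(feature_length * percent / 100)

def within_limits(feature_length, mismatches, gaps,
                  max_mismatch_percent=1.0, max_gap_percent=0.5):
    """True if an alignment stays inside both limits (defaults from the text)."""
    return (mismatches <= allowed_events(feature_length, max_mismatch_percent)
            and gaps <= allowed_events(feature_length, max_gap_percent))

mismatch_budget = allowed_events(1000, 1.0)   # 10 mismatches in a 1000-residue feature
gap_budget = allowed_events(1000, 0.5)        # 5 gaps in a 1000-residue feature
short_gap_budget = allowed_events(100, 0.5)   # 0 under the round-down assumption
```

Under these assumptions a 1000-residue feature tolerates up to 10 mismatches and 5 gaps, while a 100-residue feature tolerates 1 mismatch and no gaps.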
Include point features Includes point features when annotating the new sequence. The Residues around point feature setting in the Feature Characteristics section (described above) is used to identify matches.
Include point features enclosed by other features Will include point features only if the feature that encloses them is added. For example, suppose you have SNPs annotated within a CDS feature. If the CDS feature gets added to the new sequence, then so do all the SNPs that lie within it.
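The enclosure rule can be sketched in a few lines of Python. This is an illustration only; the function name and the coordinate values are hypothetical, and ranges are treated as inclusive.

```python
def enclosed_point_features(point_positions, added_features):
    """Return the point positions lying inside any added (start, stop) range."""
    return [
        p for p in point_positions
        if any(start <= p <= stop for start, stop in added_features)
    ]

# SNPs annotated in the folder sequence, and the CDS ranges that were
# actually added to the new sequence (made-up coordinates).
snps = [120, 450, 980, 1500]
added_cds = [(100, 500), (900, 1000)]
carried_over = enclosed_point_features(snps, added_cds)
```

The SNP at 1500 is left out because no added feature encloses it.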
The Feature Modifications section enables you to adjust the way the auto-annotate algorithm handles any existing features in the new sequence.
Leave existing qualifiers and graphics unchanged Discards matching features if the new sequence already has features of the same type at the same location. Use this option to make sure that no existing features are changed while still allowing new features to be added.
Replace qualifiers and graphics for existing features Replaces existing features in the new sequence with matching features of the same type at the same location. Note. Features are replaced but not removed. If there are existing features that do not match any features in the sequence folder, then they are retained unchanged.
Replace only graphics for existing features Retains all of the qualifier and description information associated with any existing features but replaces the graphical appearance information with that of the matching feature. This option is particularly useful if you have downloaded a sequence from Entrez or imported a GenBank or EMBL format file, since it ensures that the feature takes on the graphical appearance you prefer, without losing any textual annotations.
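The difference between the three Feature Modifications options can be summarized in a small Python sketch. The dictionary layout, the mode names, and the `apply_matches` function are all hypothetical; the sketch only mirrors the behavior described above for features of the same type at the same location.

```python
def apply_matches(existing, matches, mode):
    """Merge matching folder features into the existing feature list.

    mode is one of "leave", "replace_all", "replace_graphics"
    (hypothetical names for the three options described in the text).
    """
    if mode not in ("leave", "replace_all", "replace_graphics"):
        raise ValueError(f"unknown mode: {mode}")
    # Index by (type, start, stop); copies keep the caller's dicts untouched.
    result = {(f["type"], f["start"], f["stop"]): dict(f) for f in existing}
    for m in matches:
        key = (m["type"], m["start"], m["stop"])
        if key not in result:
            result[key] = dict(m)                    # new features are always added
        elif mode == "replace_all":
            result[key] = dict(m)                    # qualifiers and graphics replaced
        elif mode == "replace_graphics":
            result[key]["graphics"] = m["graphics"]  # qualifiers retained
        # mode == "leave": the matching feature is discarded
    return list(result.values())

existing = [
    {"type": "CDS", "start": 10, "stop": 100,
     "qualifiers": {"gene": "bla"}, "graphics": "blue arrow"},
    {"type": "misc_feature", "start": 200, "stop": 220,
     "qualifiers": {}, "graphics": "grey box"},
]
matches = [
    {"type": "CDS", "start": 10, "stop": 100,
     "qualifiers": {"gene": "ampR"}, "graphics": "red arrow"},
    {"type": "promoter", "start": 1, "stop": 9,
     "qualifiers": {}, "graphics": "green arrow"},
]
left = apply_matches(existing, matches, "leave")
replaced = apply_matches(existing, matches, "replace_all")
regraphed = apply_matches(existing, matches, "replace_graphics")
```

In all three modes the unmatched misc_feature is retained and the new promoter is added; only the treatment of the existing CDS differs.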
Limitations of auto-annotation A limitation of the auto-annotation function is that the entire feature from the scanned folder must be present in the target sequence uninterrupted before it will be added. If even a single residue is missing from one end, it will not be considered a full-length match and will not be annotated. Similarly, if a feature has been interrupted, e.g. because you inserted a fragment of DNA into a gene, neither part of the feature will be annotated.
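The full-length requirement can be demonstrated with a deliberately simplified Python sketch. Exact substring matching stands in for the fuzzy matching MacVector actually uses, and the sequences and the `is_full_length_match` name are made up for illustration.

```python
def is_full_length_match(feature_seq, target_seq):
    """True only if the whole feature occurs uninterrupted in the target.
    Exact matching is a simplification of the tolerant matching in the text."""
    return feature_seq in target_seq

feature = "ATGGCTAGCAAA"

intact = "CCCC" + feature + "GGGG"
truncated = "CCCC" + feature[:-1]                      # last residue missing
interrupted = feature[:6] + "TTTTTT" + feature[6:]     # fragment inserted mid-feature

intact_ok = is_full_length_match(feature, intact)
truncated_ok = is_full_length_match(feature, truncated)
interrupted_ok = is_full_length_match(feature, interrupted)
```

Only the intact case matches; the truncated and interrupted targets yield no annotation at all, not a partial one.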