Annotation Attributes
A standard set of attributes with strictly defined meanings is added to the Vega annotation. Where present, these are shown on the Gene Summary and Transcript Summary panels.
Transcript level attributes
- bicistronic
- transcript contains two confidently annotated CDSs. Support may come from e.g. proteomic data, cross-species conservation or published experimental work
- CAGE supported TSS
- transcript 5' end overlaps an ENCODE or FANTOM CAGE cluster
- dotter confirmed
- transcript checked using DOTTER dotplot alignment of homology evidence to genomic sequence to confirm exon structure
- inferred exon combination
- transcript model contains all possible in-frame exons supported by homology, experimental evidence or conservation, but the exon combination is not directly supported by a single piece of evidence and may not be biological. Used for large genes with repetitive exons (e.g. titin (TTN)) to represent all the exons that individual transcript variants can draw from
- inferred transcript model
- transcript model is not supported by a single piece of transcript evidence. May be supported by multiple fragments of transcript evidence or by combining different evidence sources e.g. protein homology, RNA-seq data, published experimental data
- low sequence quality
- transcript supported by transcript evidence that, while mapping best-in-genome, shows regions of poor sequence quality
- not organism-supported
- transcript built using evidence from other species. mRNA, EST or protein homology evidence from orthologous loci in other species can be used to build variants on the condition that the homology is perfectly co-linear and all normal splicing rules are upheld
- non-submitted evidence
- transcript supported by sequence evidence from an as yet unpublished experimental study
- readthrough
- transcript connecting two independent loci, i.e. transcript has exons that overlap exons from transcripts belonging to two or more different loci
- retained intron CDS
- CDS codes through an internal retained intron (compared to a reference variant)
- retained intron final
- CDS ends in, or downstream of, a retained intron that, compared to a reference variant, is immediately downstream of the last coding exon
- retained intron first
- CDS starts in, or upstream of, a retained intron that, compared to a reference variant, is immediately upstream of the first coding exon
- RNA-Seq supported only
- transcript is either supported in full by RNA-seq data or has a unique splice feature that is supported only by RNA-seq data
- RP supported TIS
- transcript contains a CDS that has a translation initiation site supported by ribosome profiling data
- upstream ATG
- an upstream ATG exists, but the ATG for the current CDS has been chosen taking into account factors like cross-species conservation, strength of Kozak sequence, signal peptides, experimental evidence, and ribosome profiling
- 3' nested supported extension
- 3' end extended based on RNA-seq data
- 3' standard supported extension
- 3' end extended based on RNA-seq data
- 454 RNA-Seq supported
- annotated based on RNA-seq data
- 5' nested supported extension
- 5' end extended based on RNA-seq data
- 5' standard supported extension
- 5' end extended based on RNA-seq data
- RNA-Seq supported only
- annotated based on RNA-seq data
- RNA-Seq supported partial
- annotated based on a mixture of RNA-seq data and EST/mRNA/protein evidence
- nested 454 RNA-Seq supported
- annotated based on RNA-seq data
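The readthrough definition above is essentially an interval-overlap rule, and it can be sketched in a few lines. The following is a minimal illustration only, not the annotation pipeline's actual method. It assumes exons are given as 1-based inclusive (start, end) pairs on the same chromosome and strand; the function names and the locus identifiers in the usage note are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: an exon is a 1-based, inclusive (start, end) pair, and all
# exons compared here lie on the same chromosome and strand.
Exon = Tuple[int, int]


def exons_overlap(a: Exon, b: Exon) -> bool:
    """Return True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def loci_overlapped(transcript_exons: List[Exon],
                    locus_exons: Dict[str, List[Exon]]) -> Set[str]:
    """Return the loci that have an exon overlapping any transcript exon."""
    hits: Set[str] = set()
    for locus, exons in locus_exons.items():
        if any(exons_overlap(t, e) for t in transcript_exons for e in exons):
            hits.add(locus)
    return hits


def looks_like_readthrough(transcript_exons: List[Exon],
                           locus_exons: Dict[str, List[Exon]]) -> bool:
    """Flag a transcript whose exons overlap exons of two or more loci."""
    return len(loci_overlapped(transcript_exons, locus_exons)) >= 2
```

For example, with hypothetical loci `{"LOCUS_A": [(100, 200), (300, 400)], "LOCUS_B": [(1000, 1100)]}`, a transcript with exons `[(150, 200), (300, 400), (1000, 1050)]` touches both loci and would be flagged, whereas a transcript with the single exon `(150, 200)` would not. In practice the attribute is assigned by manual annotation, so a check like this could only suggest candidates.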
Gene level attributes
- fragmented locus
- locus consists of non-overlapping transcript fragments either because of genome assembly issues (i.e., gaps or mis-assemblies), or because supporting transcripts (e.g., from another species) cannot be completely mapped, or because the supporting transcripts are non-overlapping end pairs (i.e., 5' and 3' ESTs from a single cDNA)
- orphan
- protein-coding locus with no paralogs or orthologs
- overlapping locus
- exon(s) of the locus overlap exon(s) of a readthrough transcript or a transcript belonging to another locus
- reference genome error
- locus overlaps a sequence error or an assembly error in the reference genome that affects its annotation (e.g., 1 or 2 bp insertion/deletion, substitution causing premature stop codon). The main effect is that affected transcripts that would have had a CDS are currently annotated without one
- retrogene
- protein-coding locus created via retrotransposition
- ncRNA host
- locus is host to ncRNAs such as piRNA, miRNA, snoRNA, etc.
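The fragmented locus definition above rests on the locus's transcript fragments not overlapping one another. As a rough sketch only, the check below tests that condition on genomic spans. It assumes each transcript fragment is summarised as a 1-based inclusive (start, end) span on one chromosome and strand, and it reads "non-overlapping" as pairwise non-overlapping; the function name is hypothetical, and the causes listed in the definition (assembly gaps, incomplete mapping, EST end pairs) are not modelled.

```python
from typing import List, Tuple

# Assumption: each transcript fragment is reduced to its genomic span,
# given as a 1-based, inclusive (start, end) pair.
Span = Tuple[int, int]


def is_fragmented(spans: List[Span]) -> bool:
    """Return True if there are two or more spans and none of them overlap."""
    if len(spans) < 2:
        return False
    ordered = sorted(spans)
    furthest_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start <= furthest_end:
            # This span overlaps an earlier one, so the locus is connected here.
            return False
        furthest_end = max(furthest_end, end)
    return True
```

Under these assumptions, `is_fragmented([(100, 200), (500, 600)])` reports a fragmented layout, while `is_fragmented([(100, 300), (250, 600)])` does not, because the two spans share bases 250 to 300.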