
Mouse Assembly and Annotation Information

Mus musculus


This site presents data from the manual annotation of the C57BL/6J reference strain mouse genome by the Havana group at the Wellcome Trust Sanger Institute. Vega also shows artificial loci generated by the mouse knockout programs.

The loci chosen for manual annotation are spread throughout the genome, but some regions have necessarily received more focus than others. As part of the GENCODE project, the annotation is now being extended to the whole genome. To date, chromosomes 1, 2, 3, 4, 5, 6, 7, 11, X and Y have been fully annotated. Grey shading identifies unannotated regions of the other chromosomes.

Additional mouse strains

Several candidate Insulin Dependent (Type 1) Diabetes (Idd) susceptibility loci from the Non-Obese Diabetic (NOD) mouse strains NOD/MrkTac and NOD/ShiLtJ, from 129 strains, and from the C57BL/6J reference have been annotated and mapped to chromosomes 1, 3, 4, 5, 11 and 17 to investigate the role of these regions in Idd. Further information.

Manual annotation of the mouse genome is being undertaken primarily by the Havana group at the Wellcome Trust Sanger Institute. Full acknowledgements, including contributions of other groups.

External database identifiers

  • Vega mouse has CCDS identifiers assigned to translations where appropriate. Transcripts with CCDS identifiers attached are highlighted in light blue on Location-based views. The CCDS identifiers themselves are accessible on the Gene Summary, Gene External References and Transcript Summary pages. More information about CCDS.
  • Records are downloaded from MGI, and the Vega gene names are associated with the identifiers in the downloaded file. The external sources added are MGI, EntrezGene, PubMed and RefSeq.
  • UniProt records, Gene Ontology (GO) terms and Gene Ontology Annotation (GOA) records are imported into Vega. These are generated by the EBI UniProt and GOA teams.

Genome Summary

Last Full Update: 7 February 2017
Datafreeze Date: 18 October 2016
Total Bases: 3,515,031,163
Golden Path Length: 2,725,521,370
Annotated bases: 2,121,804,135
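
The figures above can be related to one another with simple arithmetic. The following is a minimal sketch, assuming that the golden path length is the appropriate denominator for annotation coverage (the page itself does not say which denominator it intends); the variable names are illustrative only.

```python
# Figures copied by hand from the Genome Summary above.
total_bases = 3_515_031_163
golden_path_length = 2_725_521_370
annotated_bases = 2_121_804_135

# Assumption: coverage is measured against the golden path,
# not against the total base count.
annotated_fraction = annotated_bases / golden_path_length

print(f"{annotated_fraction:.1%} of the golden path is annotated")
# prints "77.8% of the golden path is annotated"
```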

GRCm38 assembly genes

Havana: 41,175
  Protein coding: 19,208
  lncRNAs: 7,054
    lincRNA: 4,174
    antisense: 2,484
    sense intronic: 279
    bidirectional promoter lncRNA: 89
    sense overlapping: 25
    3prime overlapping ncRNA: 2
    macro lncRNA: 1
  ncRNAs: 2
    scRNA: 1
    rRNA: 1
  Unclassified processed transcripts: 765
  Pseudogenes: 10,821
    processed pseudogene: 7,601
    unprocessed pseudogene: 2,460
    transcribed unprocessed pseudogene: 220
    transcribed processed pseudogene: 211
    IG pseudogene: 161
    polymorphic pseudogene: 76
    TR pseudogene: 44
    unitary pseudogene: 28
    translated processed pseudogene: 12
    transcribed unitary pseudogene: 8
  IG: 264
  TR: 226
  Other: 2,835
    TEC: 2,835
Readthrough genes: 263
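
The subtype counts for the GRCm38 assembly can be cross-checked against their category totals. The following is a minimal sketch, assuming the figures are copied by hand from the list above and that Readthrough genes are reported separately from the Havana total (an inference from the arithmetic, not something the page states); the dictionary names are illustrative only.

```python
# Counts copied by hand from the GRCm38 assembly gene summary.
lncrnas = {
    "lincRNA": 4174,
    "antisense": 2484,
    "sense intronic": 279,
    "bidirectional promoter lncRNA": 89,
    "sense overlapping": 25,
    "3prime overlapping ncRNA": 2,
    "macro lncRNA": 1,
}
ncrnas = {"scRNA": 1, "rRNA": 1}
pseudogenes = {
    "processed pseudogene": 7601,
    "unprocessed pseudogene": 2460,
    "transcribed unprocessed pseudogene": 220,
    "transcribed processed pseudogene": 211,
    "IG pseudogene": 161,
    "polymorphic pseudogene": 76,
    "TR pseudogene": 44,
    "unitary pseudogene": 28,
    "translated processed pseudogene": 12,
    "transcribed unitary pseudogene": 8,
}
other = {"TEC": 2835}

# Top-level categories. Readthrough genes (263) are left out on the
# assumption that they are counted separately from the Havana total.
havana = {
    "Protein coding": 19208,
    "lncRNAs": sum(lncrnas.values()),
    "ncRNAs": sum(ncrnas.values()),
    "Unclassified processed transcripts": 765,
    "Pseudogenes": sum(pseudogenes.values()),
    "IG": 264,
    "TR": 226,
    "Other": sum(other.values()),
}

print(sum(lncrnas.values()))      # 7054, matches "lncRNAs"
print(sum(pseudogenes.values()))  # 10821, matches "Pseudogenes"
print(sum(havana.values()))       # 41175, matches "Havana"
```

Under these assumptions each subtotal agrees with the figure reported for its category, and the categories add up to the Havana total.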

Other strain genes

Havana: 1,454
  Protein coding: 858
  lncRNAs: 190
    lincRNA: 99
    antisense: 85
    sense intronic: 6
  Unclassified processed transcripts: 22
  Pseudogenes: 384
    processed pseudogene: 215
    unprocessed pseudogene: 157
    transcribed unprocessed pseudogene: 8
    transcribed processed pseudogene: 3
    polymorphic pseudogene: 1
Readthrough genes: 7

About this species