Mouse Assembly and Annotation Information

Mus musculus

Summary

This site presents data from the manual annotation of the C57BL/6J reference strain mouse genome by the Havana group at the Wellcome Trust Sanger Institute. Vega also shows artificial loci generated by the mouse knockout programs.

The loci chosen for manual annotation are spread throughout the genome, but some regions have necessarily received more focus than others. As part of the GENCODE project, the annotation is now being extended to the whole genome. To date, chromosomes 1, 2, 3, 4, 5, 6, 7, 11, X and Y have been fully annotated. Grey shading identifies unannotated regions of the other chromosomes.

Additional mouse strains

Several candidate Insulin-Dependent (Type 1) Diabetes (Idd) susceptibility loci from the Non-Obese Diabetic (NOD) mouse strains NOD/MrkTac and NOD/ShiLtJ and from 129 strains, as well as from the C57BL/6J reference, have been annotated and mapped to chromosomes 1, 3, 4, 5, 11 and 17 to investigate the role of these regions in Idd. Further information.

Manual annotation of the mouse genome is being undertaken primarily by the Havana group at the Wellcome Trust Sanger Institute. Full acknowledgements, including contributions of other groups.

External database identifiers

Vega mouse has CCDS identifiers assigned to translations where appropriate. Transcripts with CCDS identifiers attached are highlighted in light blue on Location-based views. The CCDS identifiers themselves are accessible on the Gene Summary, Gene External References and Transcript Summary pages. More information about CCDS.
Records are downloaded from MGI, and Vega gene names are associated with the identifiers in the downloaded file. The external sources added are MGI, Entrez Gene, PubMed and RefSeq.
UniProt records, Gene Ontology (GO) terms and Gene Ontology Annotation (GOA) records are imported into Vega. These are generated by the EBI UniProt and GOA teams.

Genome Summary

Last Full Update	7 February 2017
Datafreeze Date	18 October 2016
Total Bases	3,515,031,163
Golden Path Length	2,725,521,370
Annotated bases	2,121,804,135
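
As a rough gauge of how much of the assembly has been annotated, the annotated bases can be expressed as a fraction of the golden path length. The Python sketch below uses the figures from the table above. The variable names are illustrative, and comparing against the golden path rather than the total bases is an assumption, since the page does not say how annotated bases are counted.

```python
# Figures copied from the Genome Summary table.
total_bases = 3_515_031_163
golden_path_length = 2_725_521_370
annotated_bases = 2_121_804_135

# Assumption: annotated bases are measured against the golden path,
# not against the total bases.
annotated_fraction = annotated_bases / golden_path_length

print(f"Annotated fraction of golden path: {annotated_fraction:.1%}")
```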

GRCm38 assembly genes

Havana:	41,175
  Protein coding	19,208
  lncRNAs:	7,054
    lincRNA	4,174
    antisense	2,484
    sense intronic	279
    bidirectional promoter lncRNA	89
    sense overlapping	25
    3prime overlapping ncRNA	2
    macro lncRNA	1
  ncRNAs:	2
    scRNA	1
    rRNA	1
  Unclassified processed transcripts	765
  Pseudogenes:	10,821
    processed pseudogene	7,601
    unprocessed pseudogene	2,460
    transcribed unprocessed pseudogene	220
    transcribed processed pseudogene	211
    IG pseudogene	161
    polymorphic pseudogene	76
    TR pseudogene	44
    unitary pseudogene	28
    translated processed pseudogene	12
    transcribed unitary pseudogene	8
  IG	264
  TR	226
  Other:	2,835
    TEC	2,835
Readthrough genes	263
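
The GRCm38 counts above are nested: the Havana total is the sum of the top-level categories, and the lncRNA and pseudogene totals are the sums of their subtypes. The Python sketch below checks that arithmetic. The grouping it encodes (for example, IG and TR as top-level categories, and readthrough genes counted outside the Havana total) is inferred from the numbers, not stated on the page, so treat it as an assumption.

```python
# Gene counts copied from the GRCm38 listing.
havana_total = 41_175

# Assumed top-level categories that make up the Havana total.
top_level = {
    "Protein coding": 19_208,
    "lncRNAs": 7_054,
    "ncRNAs": 2,
    "Unclassified processed transcripts": 765,
    "Pseudogenes": 10_821,
    "IG": 264,
    "TR": 226,
    "Other": 2_835,
}

# Subtype counts, in the order they are listed.
lncrna_subtypes = [4_174, 2_484, 279, 89, 25, 2, 1]
pseudogene_subtypes = [7_601, 2_460, 220, 211, 161, 76, 44, 28, 12, 8]

# Readthrough genes (263) are left out: the total only balances without them.
top_level_sum = sum(top_level.values())
lncrna_sum = sum(lncrna_subtypes)
pseudogene_sum = sum(pseudogene_subtypes)

print("Top-level categories match Havana total:", top_level_sum == havana_total)
print("lncRNA subtypes match lncRNA total:", lncrna_sum == top_level["lncRNAs"])
print("Pseudogene subtypes match pseudogene total:",
      pseudogene_sum == top_level["Pseudogenes"])
```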

Other strain genes

Havana:	1,454
  Protein coding	858
  lncRNAs:	190
    lincRNA	99
    antisense	85
    sense intronic	6
  Unclassified processed transcripts	22
  Pseudogenes:	384
    processed pseudogene	215
    unprocessed pseudogene	157
    transcribed unprocessed pseudogene	8
    transcribed processed pseudogene	3
    polymorphic pseudogene	1
Readthrough genes	7
