Mouse Assembly and Annotation Information
Mus musculus
Summary
This site presents data from the manual annotation of the C57BL/6J reference strain mouse genome by the Havana group at the Wellcome Trust Sanger Institute. Vega also shows artificial loci generated by the mouse knockout programs.
The loci chosen for manual annotation are spread throughout the genome, but some regions have necessarily received more focus than others. As part of the GENCODE project, the annotation is now being extended to the whole of the genome. To date, chromosomes 1, 2, 3, 4, 5, 6, 7, 11, X and Y have been fully annotated. Grey shading is used to identify unannotated regions of other chromosomes.
Additional mouse strains
Several candidate Insulin Dependent (Type 1) Diabetes (Idd) susceptibility loci from the Non-Obese Diabetic (NOD) mouse strains NOD/MrkTac and NOD/ShiLtJ and from 129 strains, as well as from the C57BL/6J reference, have been annotated and mapped to chromosomes 1, 3, 4, 5, 11 and 17 to investigate the role of these regions in Idd. Further information.
Manual annotation of the mouse genome is being undertaken primarily by the Havana group at the Wellcome Trust Sanger Institute. Full acknowledgements, including contributions of other groups.
External database identifiers
- Vega mouse has CCDS identifiers assigned to translations where appropriate. Transcripts that have CCDS identifiers attached are highlighted in light blue on Location-based views. The CCDS identifiers themselves are accessible on the Gene Summary, Gene External References and Transcript Summary pages. More information about CCDS.
- Records are downloaded from MGI, and Vega gene names are associated with the identifiers in the downloaded file. The external sources added are MGI, EntrezGene, PubMed and RefSeq.
- UniProt records, Gene Ontology (GO) terms and Gene Ontology Annotation (GOA) records are imported into Vega. These are generated by the EBI UniProt and GOA teams.
Genome Summary
Last Full Update | 7 February 2017 |
Datafreeze Date | 18 October 2016 |
Total Bases | 3,515,031,163 |
Golden Path Length | 2,725,521,370 |
Annotated bases | 2,121,804,135 |
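The three base counts above can be turned into a rough measure of annotation coverage. The short Python sketch below is illustrative only: it assumes that "Annotated bases" is best compared against the golden path length rather than the total base count, which this page does not state, and the function and variable names are made up for the example.

```python
# Figures copied from the Genome Summary table.
TOTAL_BASES = 3_515_031_163
GOLDEN_PATH_LENGTH = 2_725_521_370
ANNOTATED_BASES = 2_121_804_135


def fraction(part: int, whole: int) -> float:
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Assumption: coverage is measured against the golden path, not total bases.
annotated_fraction = fraction(ANNOTATED_BASES, GOLDEN_PATH_LENGTH)
print(f"Annotated share of the golden path: {annotated_fraction:.1%}")
```

Under that assumption, a little over three quarters of the golden path had been manually annotated at the time of the data freeze.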
GRCm38 assembly genes
Havana: | 41,175 |
Protein coding | 19,208 |
lncRNAs: | 7,054 |
lincRNA | 4,174 |
antisense | 2,484 |
sense intronic | 279 |
bidirectional promoter lncRNA | 89 |
sense overlapping | 25 |
3prime overlapping ncRNA | 2 |
macro lncRNA | 1 |
ncRNAs: | 2 |
scRNA | 1 |
rRNA | 1 |
Unclassified processed transcripts | 765 |
Pseudogenes: | 10,821 |
processed pseudogene | 7,601 |
unprocessed pseudogene | 2,460 |
transcribed unprocessed pseudogene | 220 |
transcribed processed pseudogene | 211 |
IG pseudogene | 161 |
polymorphic pseudogene | 76 |
TR pseudogene | 44 |
unitary pseudogene | 28 |
translated processed pseudogene | 12 |
transcribed unitary pseudogene | 8 |
IG | 264 |
TR | 226 |
Other: | 2,835 |
TEC | 2,835 |
Readthrough genes | 263 |
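Because the table above lists category totals and their subtypes in a single column, it can be hard to see which rows roll up into which. The following Python sketch checks the arithmetic under one reading of the layout: the rows ending in a colon, plus Protein coding, Unclassified processed transcripts, IG and TR, are treated as top-level categories, and Readthrough genes is treated as a separate count that is not part of the Havana total. That grouping is an assumption inferred from the numbers, not something the page states, and all names in the code are hypothetical.

```python
# Counts copied from the "GRCm38 assembly genes" table.
HAVANA_TOTAL = 41_175

# Assumed top-level categories (Readthrough genes deliberately excluded).
CATEGORY_TOTALS = {
    "Protein coding": 19_208,
    "lncRNAs": 7_054,
    "ncRNAs": 2,
    "Unclassified processed transcripts": 765,
    "Pseudogenes": 10_821,
    "IG": 264,
    "TR": 226,
    "Other": 2_835,
}

# Subtype counts, in table order, for the categories that list subtypes.
SUBTYPE_COUNTS = {
    "lncRNAs": [4_174, 2_484, 279, 89, 25, 2, 1],
    "ncRNAs": [1, 1],
    "Pseudogenes": [7_601, 2_460, 220, 211, 161, 76, 44, 28, 12, 8],
    "Other": [2_835],
}


def find_mismatches(havana_total, category_totals, subtype_counts):
    """Return a list of mismatch messages; an empty list means consistent."""
    problems = []
    category_sum = sum(category_totals.values())
    if category_sum != havana_total:
        problems.append(
            f"categories sum to {category_sum}, expected {havana_total}"
        )
    for name, counts in subtype_counts.items():
        subtype_sum = sum(counts)
        if subtype_sum != category_totals[name]:
            problems.append(
                f"{name}: subtypes sum to {subtype_sum}, "
                f"expected {category_totals[name]}"
            )
    return problems


mismatches = find_mismatches(HAVANA_TOTAL, CATEGORY_TOTALS, SUBTYPE_COUNTS)
print("consistent" if not mismatches else "\n".join(mismatches))
```

With the figures as published, the check reports no mismatches, which supports reading IG and TR as categories in their own right rather than as pseudogene subtypes.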
Other strain genes
Havana: | 1,454 |
Protein coding | 858 |
lncRNAs: | 190 |
lincRNA | 99 |
antisense | 85 |
sense intronic | 6 |
Unclassified processed transcripts | 22 |
Pseudogenes: | 384 |
processed pseudogene | 215 |
unprocessed pseudogene | 157 |
transcribed unprocessed pseudogene | 8 |
transcribed processed pseudogene | 3 |
polymorphic pseudogene | 1 |
Readthrough genes | 7 |