COSMIC Release v90
5 Sep 2019
COSMIC version 90 is here. This is a special release for COSMIC: we have focused on updating and upgrading the architecture and systems behind COSMIC, substantially updating and standardising our variant content across current sets of genes, transcripts and proteins. In particular, systematically reannotating our coding and non-coding mutations across current transcripts has revealed an additional 3 million coding mutations, some of which are likely to have substantial consequences at the protein level. As a result, 9 million coding mutations are now available for exploration in this release. In addition, we have adopted current standards of nomenclature, annotation and description to strengthen COSMIC's support for the FAIR principles (Findable, Accessible, Interoperable and Reusable), increasing its accessibility and interoperability.
The field of cancer genetics has changed substantially in the 15 years since COSMIC's inception. We always strive to capture and combine some of the least accessible cancer mutation data from a wide variety of sources, to create a broad genetic resource across all forms of human cancer. However, bioinformatics resources around the world update on irregular schedules, which makes it very difficult to ensure that any information in this interdependent field is fully up to date. With the v90 release we are beginning the process of placing COSMIC at the forefront of fully standardised, FAIR bioinformatics resources, with future releases revealing the most recent and novel consequences of somatic mutations in cancer.
The significant changes in v90 include:
- Genes, transcripts and proteins have been updated to Ensembl release 93, independently for the GRCh37 and GRCh38 assemblies.
- All COSMIC variants with known genomic coordinates have been fully reannotated using Ensembl's Variant Effect Predictor (VEP). This provides accurate, standardised annotation of these variants, applied uniformly across all relevant transcripts and genes.
- New stable genomic identifiers (COSV) indicate the definitive position of each variant on the genome. These unique identifiers allow variants to be mapped between the GRCh37 and GRCh38 assemblies and displayed on a selection of transcripts.
- The cross-references between COSMIC genes and other widely-used databases, such as HGNC, RefSeq, UniProt and CCDS, have been updated.
- COSMIC variants now have a complete, standardised representation that follows the most recent HGVS recommendations, where possible.
- Gene fusions have been remapped onto the updated transcripts on both the GRCh37 and GRCh38 assemblies, together with the genomic coordinates of their breakpoints.
- Where duplicate variants exist at a single genomic location, they have been merged into one representative variant.
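As an illustration only, merging duplicates can be thought of as grouping records by their genomic position and change. The Python sketch below assumes, hypothetically, that two records are duplicates when they share the same chromosome, position, reference allele and alternate allele on one assembly (with alleles already normalised), and that the first record seen becomes the representative. The record layout, the placeholder identifiers and this selection rule are our assumptions, not a description of COSMIC's actual pipeline.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (legacy_id, chrom, pos, ref, alt) on a single assembly.
Record = Tuple[str, str, int, str, str]


@dataclass(frozen=True)
class GenomicKey:
    """Genomic location and change used to detect duplicates (assumed normalised)."""
    chrom: str
    pos: int
    ref: str
    alt: str


@dataclass
class MergedVariant:
    key: GenomicKey
    representative: str  # assumed rule: the first legacy ID seen is kept
    merged_ids: List[str] = field(default_factory=list)  # other legacy IDs folded in


def merge_duplicates(records: Iterable[Record]) -> Dict[GenomicKey, MergedVariant]:
    """Collapse records that share a genomic key into one representative variant."""
    merged: Dict[GenomicKey, MergedVariant] = {}
    for legacy_id, chrom, pos, ref, alt in records:
        key = GenomicKey(chrom, pos, ref, alt)
        if key in merged:
            merged[key].merged_ids.append(legacy_id)
        else:
            merged[key] = MergedVariant(key=key, representative=legacy_id)
    return merged


# Placeholder identifiers and coordinates, for illustration only.
example = [
    ("COSM_A", "1", 1000, "A", "T"),
    ("COSM_B", "1", 1000, "A", "T"),  # same location and change as COSM_A
    ("COSM_C", "2", 2000, "C", "G"),
]
for variant in merge_duplicates(example).values():
    print(variant.representative, variant.merged_ids)
```

Keying on a frozen dataclass keeps the grouping logic to a single dictionary lookup; a real pipeline would also need to decide which record's metadata survives the merge.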
What will look different?
We have updated our definition of a mutation, introducing a parent-child relationship between the variant as described at the genomic level (the parent) and the variant as annotated on one or more transcripts (the child or children). This new relationship also requires a new genomic mutation identifier, COSV, for the parent variant.
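To make the parent-child model more concrete, here is a minimal Python sketch. It assumes, for illustration, a genomic parent identified by a COSV identifier that holds one child annotation per transcript, with any legacy COSM or COSN identifier attached to the child. The class names, field names, transcript IDs and the `COSV_EXAMPLE` identifier are hypothetical and do not reflect COSMIC's internal schema or download-file columns.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TranscriptAnnotation:
    """Child: the variant as annotated on one transcript (hypothetical fields)."""
    transcript_id: str
    hgvs_c: str                      # coding-level HGVS description
    hgvs_p: Optional[str] = None     # protein-level description, if any
    legacy_id: Optional[str] = None  # assumed: COSM/COSN legacy identifier, if one exists


@dataclass
class GenomicVariant:
    """Parent: the variant at the genomic level, identified by a COSV identifier."""
    cosv_id: str
    chrom: str
    pos: int
    ref: str
    alt: str
    children: List[TranscriptAnnotation] = field(default_factory=list)

    def add_annotation(self, annotation: TranscriptAnnotation) -> None:
        self.children.append(annotation)

    def annotation_for(self, transcript_id: str) -> Optional[TranscriptAnnotation]:
        """Return the child annotation on a given transcript, or None if absent."""
        for annotation in self.children:
            if annotation.transcript_id == transcript_id:
                return annotation
        return None


# One genomic change with different consequences on two placeholder transcripts.
variant = GenomicVariant("COSV_EXAMPLE", "1", 1000, "A", "T")
variant.add_annotation(TranscriptAnnotation("TX_1", "c.100A>T", "p.(Lys34Ter)", "COSM_A"))
variant.add_annotation(TranscriptAnnotation("TX_2", "c.-15A>T"))  # 5' UTR: no protein change
```

The point of the split is that the genomic change is stored once, while its consequence can differ from transcript to transcript; in this example the same change is a stop-gain on one transcript and an untranslated-region change on another.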
Inevitably, we have had to make changes to the website to accommodate the new parent-child model, but we have tried to limit the number and extent of these changes to minimise disruption. The website fully supports the existing identifiers (COSM and COSN), which are now referred to as legacy mutation identifiers; where variants have been updated or merged, for example, the browser is redirected automatically. Full details of these changes and their effects can be found on the variant updates page.
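The redirect behaviour can be pictured as a lookup that follows a table of retired identifiers to their replacements. The Python sketch below is only a guess at the general idea, not how the COSMIC website is implemented: the redirect table, the function name `resolve_identifier` and the hop limit are all hypothetical, and the limit simply guards against a malformed table that loops.

```python
from typing import Dict


def resolve_identifier(identifier: str, redirects: Dict[str, str], max_hops: int = 10) -> str:
    """Follow hypothetical old-ID -> new-ID redirects until reaching an ID with no redirect."""
    visited = set()
    current = identifier
    while current in redirects:
        # Stop on a cycle or an unexpectedly long chain instead of looping forever.
        if current in visited or len(visited) >= max_hops:
            raise ValueError(f"redirect chain for {identifier!r} does not terminate")
        visited.add(current)
        current = redirects[current]
    return current


# Placeholder table: COSM_B was merged into COSM_A.
redirects = {"COSM_B": "COSM_A"}
print(resolve_identifier("COSM_B", redirects))
```

Identifiers that were never retired simply resolve to themselves, so existing links keep working unchanged.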
The updates to the model also required changes to the download files. Several files now contain significantly more rows because of the additional data generated by comprehensive annotation across multiple transcripts. We have also added extra columns to some files to accommodate the new identifiers. Full details of the changes to the download files can be found on the variant updates page.
Note that because the mutation model has changed in v90, the statistics describing the release, such as mutation count, are no longer directly comparable to those in previous releases.
For full details of the v90 release, please see the release notes. For more technical details, check the variant updates page. And don't forget you can keep up to date with us via Twitter.