This blog post is a joint contribution by Joannella Morales, Jane Loveland, Adam Frankish, Fiona Cunningham and Astrid Gall.
We are pleased to introduce the Matched Annotation from the NCBI and EMBL-EBI (MANE) project. This new joint initiative between EMBL-EBI’s Ensembl project and NCBI’s RefSeq project aims to release a genome-wide transcript set that contains one well-supported transcript per protein-coding locus. Every transcript in the MANE set will align perfectly to GRCh38 and will be 100% identical (5’ UTR, coding sequence, 3’ UTR) between the RefSeq (NM) transcript and the corresponding Ensembl (ENST) transcript.
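To make the identity criterion above concrete, here is a minimal Python sketch of what "100% identical across 5’ UTR, coding sequence and 3’ UTR" means for a RefSeq/Ensembl pair. The `Transcript` class, the `is_identical_pair` function, the accessions and the toy sequences are all hypothetical illustrations; this is not the MANE pipeline or any Ensembl or RefSeq API, and it does not check alignment to GRCh38.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical, simplified transcript record (not an Ensembl or RefSeq data model)."""
    accession: str
    five_prime_utr: str
    cds: str
    three_prime_utr: str


def is_identical_pair(refseq: Transcript, ensembl: Transcript) -> bool:
    """Return True only if all three regions match exactly, base for base."""
    return (
        refseq.five_prime_utr == ensembl.five_prime_utr
        and refseq.cds == ensembl.cds
        and refseq.three_prime_utr == ensembl.three_prime_utr
    )


# Toy sequences and placeholder accessions, for illustration only.
nm = Transcript("NM_example", "GGC", "ATGAAATAA", "CCT")
enst = Transcript("ENST_example", "GGC", "ATGAAATAA", "CCT")
enst_longer_utr = Transcript("ENST_example_2", "AGGC", "ATGAAATAA", "CCT")

print(is_identical_pair(nm, enst))             # True
print(is_identical_pair(nm, enst_longer_utr))  # False: the 5' UTR differs by one base
```

Note that the second pair encodes the same protein but would still not qualify under this criterion, because the match has to hold over the UTRs as well as the coding sequence.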
You might remember that we conducted a survey about transcripts in the spring. We asked the community whether Ensembl and RefSeq should define a primary transcript for all protein-coding genes, and suggested that such a transcript could be useful as a default for displays, variant reporting and comparative genomics, for example. We received numerous responses (thank you!), and were amazed to see the varied (and sometimes strong!) opinions on the subject. We have taken this feedback on board and spent the summer formulating strategies, designing pipelines, refining outputs and working together towards this goal.
We envision the adoption of the MANE set as a default set across genomics resources. Selecting one high-quality transcript at each protein-coding locus, and using it consistently across all genomics resources, will give researchers a common starting view of biology, whether the intent is to use it for reporting variants, comparative genomics or any other endeavour. That said, all the transcripts we annotate should always be considered, and we are certainly NOT saying that biology can be simplified to a single transcript at each genomic locus. We anticipate expanding the project to include a larger set of transcripts that are well-supported, predicted to be functional or relevant to specific user groups. In addition, clinical data are more often reported using RefSeq (NM) transcripts, whereas large-scale genomics projects more often default to Ensembl (ENST) transcripts. By eliminating differences between RefSeq and Ensembl annotation for the MANE subset of transcripts, this project will encourage and facilitate bi-directional exchange of data between the two. Finally, the perfect alignment of all MANE transcripts to GRCh38 makes the set compatible with NGS technologies and other resources that default to GRCh38. Our goal is to release a “beta” set by the end of the year for testing by stakeholders.
We are holding several community outreach initiatives to obtain feedback on our efforts. Last week, we hosted a transcript workshop at the Global Alliance for Genomics and Health (GA4GH) plenary meeting in Basel, Switzerland. Next week, jointly with the GRC, we will host a workshop at the American Society of Human Genetics (ASHG) annual meeting in San Diego, CA, USA: “Getting the Most from the Reference Assembly and Reference Materials: Updates and Developments from the Genome Reference Consortium (GRC) and Genome in a Bottle (GIAB)”, presented jointly by Terence Murphy (NCBI) and Joannella Morales. It will be held 1-4pm on 16th October and is open to all ASHG attendees. There will also be a platform presentation about the MANE project by Jane Loveland, 9:15-9:30am on Saturday 20th October (Session #94): “Converging on transcript annotation from Ensembl/GENCODE and RefSeq”. Please join us in San Diego at the workshop and talk, or get in touch via the Ensembl helpdesk. Representatives of the project will also be available for discussion at other times during the ASHG meeting. We look forward to your input on our new initiative.