The number of genes and transcripts we have in Ensembl can make your VEP results very big. Filtering your results after running the VEP is the best way to make this more manageable, but you can also reduce the results in your run itself, to only get one result per variant or variant/gene combo.
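As a rough sketch of the "reduce the results in your run itself" option, the snippet below assembles a VEP command line with either `--pick` (one consequence per variant) or `--per_gene` (one consequence per variant/gene combination). The helper name `build_vep_command`, the file names and the use of `--cache` are assumptions made for illustration; check the VEP documentation for the options that suit your installation.

```python
import shlex

# Hypothetical helper: maps a reduction mode onto a VEP command-line flag.
REDUCTION_FLAGS = {
    "pick": "--pick",          # one consequence per variant
    "per_gene": "--per_gene",  # one consequence per variant/gene combination
}


def build_vep_command(input_file, output_file, mode="pick"):
    """Return the argument list for a VEP run that limits the output."""
    if mode not in REDUCTION_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        "vep",
        "-i", input_file,
        "-o", output_file,
        "--cache",  # assumption: a local VEP cache is installed
        REDUCTION_FLAGS[mode],
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(build_vep_command("variants.vcf", "vep_output.txt", "per_gene")))
```

The argument list can be passed to `subprocess.run` if you would rather launch the VEP from Python than from the shell.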
We will make changes to the directory layouts of both the Ensembl Genomes FTP server (ftp://ftp.ensemblgenomes.org/pub/) and the Ensembl GRCh37 FTP server (ftp://ftp.ensemblorg.ebi.ac.uk/pub/grch37/) that may affect your pipelines. These changes will come into effect in Ensembl Genomes release 43/Ensembl release 96, which are scheduled for April 2019. Here are the details, so that you can plan any required updates to existing scripts and pipelines ahead of the releases.
As the community’s capacity for genome sequencing expands, so do its ambitions. Recently, many exciting global genomics projects have been launched, including the Vertebrate Genomes Project (VGP), Darwin Tree of Life (DToL), Earth BioGenome Project (EBP), i5K (insects) and 10KP (plants). Between them, they aim to sequence the genomes of every eukaryote on Earth, and Ensembl are excited to take on the annotation of some of those genomes.
Joannella Morales, Jane Loveland and Adam Frankish contributed to this post.
Back in October, we introduced you to our new joint initiative with the NCBI — the Matched Annotation from the NCBI and EMBL-EBI (MANE) transcript set. We are now pleased to update you on our progress so far.
The goal of this project is to share annotation and converge on a high-confidence, genome-wide transcript set, with a matched transcript in both RefSeq and Ensembl/GENCODE. We are doing this in two phases. During phase 1, we will release the “MANE Select” transcript set to include one well-supported transcript for every protein-coding locus. We envision the adoption of the MANE Select set as a default set across genomics resources. In phase 2, we intend to release an expanded set (“MANE Plus”) to include additional transcripts per locus that are well-supported or of particular user interest.
In the next release of Ensembl (Ensembl 96) we will remove our database patches script from the main Ensembl repository.
There is now a dedicated module that uses the EBI OLS service to load the ontologies Ensembl requires. Because this module is now in charge of loading the required data, the previous database patches have been moved to the ols-ensembl-loader repository.
If you need to update your system with future patches, please refer to the sql directory of the ols-ensembl-loader repository, where the files are already available.
Please contact the Ensembl Helpdesk if you have any questions or want to find out more about how this might affect your work.
Today we are meeting Guy, who works in the Plants team of Ensembl Genomes. He talks about how he came to Ensembl, his interests and experiences so far.
We’re looking for a software development manager to lead our infrastructure team, maintaining our database, API infrastructure and internal genome analysis tools. We’re looking for MScs, PhDs or equivalent in Computational, Physical or Biological Sciences with experience developing APIs, communicating technical information, software development and working with large datasets. Closes 10th April.
Did you know you can upload your own data for display alongside the reference genomes in Ensembl? For some file types, and for files larger than 20MB, you will need to create a URL to attach the data, rather than uploading from your local directory. It’s not difficult to create these URLs, but there are quite a few steps, so read on to find out how!