From release 70 we store and display information on the type of consequence a variant has on overlapping regulatory regions (Ensembl regulatory features and Ensembl motif features) for human and mouse.
One of the major benefits of this is that we can now highlight the predicted consequence types for variants overlapping regulatory regions in the Region in Detail view. The Variation – Genes and regulation page gives more information on the type of consequence a single variant has on a specific regulatory region.
We store the data in two new tables: regulatory_feature_variation and motif_feature_variation. Both are populated in a similar way to the transcript_variation table. You can find further information on the table structures on our Variation database schema description page.
You can access the data using the Ensembl Variation Perl API. Please check the API documentation for examples of how to use the RegulatoryFeatureVariationAdaptor and the MotifFeatureVariationAdaptor. These new modules allow you to fetch MotifFeatureVariations or RegulatoryFeatureVariations for a VariationFeature, MotifFeature or RegulatoryFeature. This is in addition to the existing functionality for getting all RegulatoryFeatureVariations and MotifFeatureVariations using the VariationFeatureAdaptor.
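As a rough sketch (not taken from the API documentation), the subroutine below collects the SO terms predicted for each regulatory feature that a single VariationFeature overlaps. It assumes the API methods get_all_RegulatoryFeatureVariations, regulatory_feature, get_all_alternate_RegulatoryFeatureVariationAlleles, get_all_OverlapConsequences and SO_term behave as described in the API documentation; the subroutine name regulatory_consequences is hypothetical, and the variant name in the usage comment is a placeholder. Please check the API documentation before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: map each regulatory feature overlapped by a
# VariationFeature to the sorted list of SO terms predicted for it.
sub regulatory_consequences {
    my ($vf) = @_;
    my %terms_by_feature;
    for my $rfv (@{ $vf->get_all_RegulatoryFeatureVariations }) {
        my $feature_id = $rfv->regulatory_feature->stable_id;
        my %seen;
        # Consequences are attached to each alternate allele, so collect
        # them across all alternate alleles of this overlap.
        for my $allele (@{ $rfv->get_all_alternate_RegulatoryFeatureVariationAlleles }) {
            $seen{ $_->SO_term } = 1 for @{ $allele->get_all_OverlapConsequences };
        }
        $terms_by_feature{$feature_id} = [ sort keys %seen ];
    }
    return \%terms_by_feature;
}

# Example usage against the public server (needs the Ensembl Perl API and
# network access; 'rs...' is a placeholder, not a real variant name):
#
#   use Bio::EnsEMBL::Registry;
#   my $registry = 'Bio::EnsEMBL::Registry';
#   $registry->load_registry_from_db(-host => 'ensembldb.ensembl.org',
#                                    -user => 'anonymous');
#   my $va = $registry->get_adaptor('human', 'variation', 'variation');
#   my $variation = $va->fetch_by_name('rs...');
#   for my $vf (@{ $variation->get_all_VariationFeatures }) {
#       my $by_feature = regulatory_consequences($vf);
#       print "$_: @{ $by_feature->{$_} }\n" for sort keys %$by_feature;
#   }
```

Because the subroutine only calls methods on the objects it is given, the same pattern should carry over to MotifFeatureVariations (via get_all_MotifFeatureVariations), though the way a motif feature is identified would differ.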
If you have any questions, please email the helpdesk.
If you are interested in the evolutionary history of your preferred Ensembl gene, you are in luck. Starting from this release (69), Ensembl has a new gene gain/loss tree view for just this purpose. The view shows the evolutionary history of a gene family by highlighting gains (expansions) and losses (contractions) in the number of members belonging to the family.
The example below shows part of the evolutionary history of the human gene ZNF235 as displayed by this new view. As you can see, it is a species tree with annotated branches showing significant expansions (in red), significant contractions (in green) or no significant change (in blue). Each node, whether it represents an extant species or an ancestor, is labelled with the number of members of the family and the statistical significance of the change (or lack of it).
View the example in Ensembl
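To make the colour coding concrete, here is a small sketch of how a single branch could be classified from the family size at its parent and child nodes plus a p-value for the change. The 0.01 cut-off and the subroutine name branch_colour are hypothetical assumptions for illustration, not the thresholds used to build the view; the help page describes the actual method.

```perl
use strict;
use warnings;

# Hypothetical significance cut-off; the real view may use a different one.
my $ALPHA = 0.01;

# Classify a branch by the change in family size along it:
# red = significant expansion, green = significant contraction,
# blue = no significant change.
sub branch_colour {
    my ($parent_members, $child_members, $p_value) = @_;
    return 'blue' if $p_value >= $ALPHA || $child_members == $parent_members;
    return $child_members > $parent_members ? 'red' : 'green';
}

print branch_colour(4, 9, 0.001), "\n";   # prints red (expansion)
print branch_colour(6, 2, 0.004), "\n";   # prints green (contraction)
print branch_colour(5, 6, 0.3), "\n";     # prints blue (not significant)
```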
If you want to know more about this view and how the data are generated, check out its help page.
Please try it out with your preferred genes and let us know what you think (helpdesk contact form). We are working to include more useful information in the view, and your input is important!
Did you ever wish you could resize our images/views to make them bigger? There is now a new icon on the blue image toolbar in beta.ensembl.org that lets you resize the image with one click.
On clicking the icon, a menu to choose the size will appear with your current size greyed out (see figure below). There is also a best fit option which will resize the image according to your screen resolution.
Have a look and let us know your thoughts by sending them to firstname.lastname@example.org or by clicking on the black feedback button at the right of the views in Ensembl Beta.
All feedback and suggestions for improvement are welcome, in particular:
- Is it useful?
- Is the menu clear enough?
- Any other improvements?
Many thanks for your feedback!
Have you ever spent time changing your favourite Ensembl view (for example adding new tracks, changing the track order, or uploading custom data) and wished you could easily send the configured display to a colleague through a single URL? You can now do this on beta.ensembl.org.
Configurable images now have a link icon in their toolbars. If you click on this, it will give you a link to share with another user.
If you have any custom tracks turned on for the image, you will get the option to share these too (this is opt-in via checkboxes). This works with uploaded files, attached URLs, DAS and data hubs.
Custom tracks will only be shareable if they are displayed on the image (or in the case of data hubs, if any of the tracks in the hub are displayed).
If you send the URL to a colleague, they will see the image configured in the same way as you have it.
You can also share configurations for a whole page by using the Share this page button in the left menu.
Please try it out. If you encounter any problems, please use the Feedback button on the beta site to tell us about them (or email email@example.com), making sure to include the link you are trying to share.
We are pleased to announce that we are now providing access to the ENCODE integrative analysis data from within Ensembl. These analyses bring together a multitude of experiments aimed at identifying functional elements in the human genome sequence. This data is provided from an external source (a track hub at the EBI). Although the Ensembl code supporting track hubs is still in preliminary form, we considered this ENCODE set important enough to release the code early so that we could provide access to it.
Important: Please read the instructions below before activating this data!
As this dataset is very large (over 2800 tracks), it is not switched on by default in the Ensembl browser. To add the ENCODE hub tracks, click on the link below. Warning: users of IE6 or IE7 should not do this, because performance in those browsers is inadequate and the page will not load.
Link to add ENCODE integrative analysis hub
No tracks from the hub are switched on by default. To turn on tracks from ENCODE, go to ‘Configure this page’ and click on one of the submenus under ‘ENCODE data’, for example ‘ENCODE genome segmentations’. It will take a few seconds to bring up the track list. Then switch tracks on or off by clicking on the box next to the track name and choosing a track style. For genome segmentations, the ‘Compact’ track style works well. More information on configuring the display is available in our recently released video tutorial on the Region in Detail view. Here’s an example of a region showing a few ENCODE tracks (HepG2 and K562 genome segmentations and cytosolic RNASeq tracks):
If you no longer need access to the ENCODE set of tracks, the hub can be turned off by going to the ‘Manage your data’ link in the left hand menu, and clicking on the trash bin icon for the ‘ENCODE data’ source to delete it from the ‘Configure this page’ menu.
We will be working over the next few months to extend our track hub support, including improving performance and adding features to the configuration interface.
From release 68, we are using Sequence Ontology (SO) terms for variation consequences, in an effort to standardise terms across the different browsers and make it easier for users to compare variation annotation between them. The UCSC Genome Browser will use these terms on their SNP details page from around mid-August, dbSNP will update their web display in the next few weeks, and the ICGC also intends to standardise on SO terms for describing somatic mutation consequences.
At the same time, we have added a few more specific consequences for SNPs and in-dels (for example splice donor variant and splice acceptor variant), and consequences for larger structural variants are now available through the Variant Effect Predictor (VEP). The complete list of terms and definitions is in our documentation. As you will see, the SO equivalents for our old terms are fairly straightforward. The most notable difference is that we have replaced “non-synonymous” with the more specific term “missense” for amino acid changes other than stop gained, as we already have a specific term for stop gained.
The old Ensembl terms are still available on the website (using “Configure this page”), and if you have text files or VEP output files with our old Ensembl terms, you can easily update these to use the SO terms by running the following script.
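That conversion script is not reproduced in this post. Purely as an illustration of the kind of rewrite involved (and not the script itself), the sketch below rewrites a comma-separated consequence field using a small, partial lookup table. Only the NON_SYNONYMOUS_CODING to missense_variant pairing comes from the text above; the other pairs, the table name %SO_FOR_OLD_TERM and the subroutine to_so_terms are assumptions to be checked against the documented term list. Terms missing from the table are passed through unchanged.

```perl
use strict;
use warnings;

# Partial, illustrative mapping from old Ensembl consequence names to SO
# terms; check every pair against the documented term list before use.
my %SO_FOR_OLD_TERM = (
    NON_SYNONYMOUS_CODING => 'missense_variant',
    SYNONYMOUS_CODING     => 'synonymous_variant',
    STOP_GAINED           => 'stop_gained',
    INTRONIC              => 'intron_variant',
    INTERGENIC            => 'intergenic_variant',
);

# Convert a comma-separated consequence field; terms that are not in the
# table are returned unchanged rather than guessed.
sub to_so_terms {
    my ($field) = @_;
    return join ',', map { $SO_FOR_OLD_TERM{$_} // $_ } split /,/, $field;
}

print to_so_terms('NON_SYNONYMOUS_CODING,SPLICE_SITE'), "\n";
# prints missense_variant,SPLICE_SITE
```

Leaving unknown terms untouched is deliberate: some old terms (such as splice site consequences) no longer map one-to-one onto a single SO term.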
For release 67 we changed how we store the protein function predictions from SIFT and PolyPhen so that they can be used for more than just Ensembl transcripts, including RefSeq transcripts. We use these tools to compute the predicted effect of every possible amino acid substitution in the human proteome (over 2 billion predictions!). Now, the complete set of predictions for a particular protein is retrieved using the protein sequence itself as an identifier (we actually use the MD5 hash of the sequence) rather than an Ensembl stable identifier. This means that you can retrieve predictions for any protein that has the same amino acid sequence as an Ensembl translation. So if you work with RefSeq transcripts, you can now get SIFT and PolyPhen predictions, using both the Variant Effect Predictor (VEP) and the Variation API, for any missense variants that fall in the 95% of RefSeq transcripts that match an Ensembl transcript exactly.
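Because the lookup key is derived from the amino acid sequence alone, any two transcripts whose translations are identical end up with the same key. A minimal sketch using the core Perl module Digest::MD5; the subroutine name protein_md5 is hypothetical, the peptide is made up, and stripping whitespace (for example from FASTA-wrapped input) is our assumption rather than a documented Ensembl step.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: hex MD5 digest of a protein sequence, with
# whitespace (e.g. FASTA line breaks) removed first.
sub protein_md5 {
    my ($seq) = @_;
    $seq =~ s/\s+//g;
    return md5_hex($seq);
}

# Two copies of the same made-up peptide, one wrapped over two lines,
# produce the same key and so would share one set of predictions.
my $from_ensembl = 'MKTAYIAKQRQISFVKSHFSRQ';
my $from_refseq  = "MKTAYIAKQRQ\nISFVKSHFSRQ";
print protein_md5($from_ensembl) eq protein_md5($from_refseq)
    ? "same key\n" : "different keys\n";
```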
New in release 67 are also predictions from both classifier models supplied with PolyPhen. Previously we provided predictions using a classifier trained on the HumVar dataset, which is intended to distinguish severely deleterious alleles from the background of abundant variation with milder effects. This is still the default, but when using the API you can now also opt to use predictions from the classifier trained on the HumDiv dataset, which is intended to help evaluate rarer alleles potentially involved in complex disease. For more details on how these datasets are composed, please refer to the PolyPhen website.
The Variant Effect Predictor (VEP) software can predict the consequence of genomic variants using the genomic annotations provided by Ensembl. In release 63 of Ensembl we have added new features to both the script and web versions of the VEP.
Regulatory consequences have made their return; the VEP now reports if a variant falls within a regulatory region or a transcription factor binding motif, and furthermore if the variant falls in a high information locus within the motif.
The VEP now also has a dedicated area of the Ensembl website documentation.
To improve performance for users in the USA, we have now deployed a mirror of the public database server; to use it, simply pass the flag “--host useastdb.ensembl.org” when running the script.
We have also implemented a caching system in the VEP, such that it is possible to use almost all of the functionality of the script without it querying the database at all. Simply download and unpack a pre-built cache, run the script with the flag “--cache”, and hey presto! No more network dependencies.
We have now made “whole genome mode” the default run mode of the script – this code has been rewritten and optimized such that it should be suitable for all use cases. We’ve also improved the status output of the script as it runs, so users with lots of data can easily track their progress.
See the new documentation for further details on all of these new features, or just download the script!
It is now possible to filter your input variants by their frequency as observed in the 1000 Genomes or HapMap populations. You can either include or exclude input variants that are co-located with existing variants, based on frequencies in any particular population or across a range of populations.
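The include/exclude choice can be pictured with the sketch below, which decides whether to keep one input variant given the frequencies of a co-located known variant in each population. This is a hypothetical model of the option, not the VEP's code: the subroutine keep_variant, its argument names, the frequencies and the 0.05 cut-off are all made up, and treating "any listed population reaching the cut-off" as the test is our assumption.

```perl
use strict;
use warnings;

# Hypothetical filter: decide whether to keep an input variant, given the
# frequencies of a co-located known variant in each population.
#   freqs       => { population => frequency } (empty if nothing co-located)
#   populations => populations to check
#   cutoff      => minimum frequency for a population to count
#   mode        => 'exclude' drops variants reaching the cutoff in any
#                  listed population; 'include' keeps only those variants
sub keep_variant {
    my (%args) = @_;
    my $freqs = $args{freqs} || {};
    my $hits  = grep { defined $freqs->{$_} && $freqs->{$_} >= $args{cutoff} }
                @{ $args{populations} };
    return $args{mode} eq 'exclude' ? !$hits : !!$hits;
}

# Made-up frequencies for a single co-located variant
my %known = (CEU => 0.32, YRI => 0.01);
for my $pop ('CEU', 'YRI') {
    my $keep = keep_variant(freqs => \%known, populations => [$pop],
                            cutoff => 0.05, mode => 'exclude');
    print "$pop: ", ($keep ? 'kept' : 'dropped'), "\n";
}
```

With these made-up values the CEU check drops the variant (0.32 reaches the cut-off) while the YRI check keeps it.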
As before, you can access the web VEP through the tools page, or via the “Manage your data” link on any species-specific page.
Alongside our website, Ensembl provides direct access to our databases through our public MySQL server ensembldb.ensembl.org, and as of today we are pleased to announce the availability of a second MySQL mirror hosted on the east coast of the US. The new server is running on Amazon Cloud with the hostname useastdb.ensembl.org.
It can be accessed directly with the mysql client using port 5306 and username anonymous:
mysql -h useastdb.ensembl.org -u anonymous -P5306
It may also be accessed through our Perl API with the following registry incantation:
use Bio::EnsEMBL::Registry;
my $registry = 'Bio::EnsEMBL::Registry';
# Same host, user and port as the mysql command above
$registry->load_registry_from_db(-host => 'useastdb.ensembl.org',
                                 -user => 'anonymous',
                                 -port => 5306);
useastdb will provide the current Ensembl release alongside the previous one on a rolling basis. This means that useastdb currently hosts only the release 63 and 62 databases; after our next release, it will host the release 64 and 63 databases. Our full set of older releases will continue to be hosted on ensembldb.ensembl.org.
We hope that our users enjoy the faster access to our data that this new MySQL mirror should provide.
I am happy to announce the arrival of a new documentation resource with Release 63, intended to help programmers get the most out of EnsEMBL.
Using a custom filter and the open source tool Doxygen we now bring you a more pleasing perspective on the EnsEMBL API, with the following features:
- Search box – find the class you need fast
- Inherited methods – no more hunting for superclasses
- Class and dependency diagrams – see how the API is structured
- Multiple perspectives – view by class, namespace, directory or method
The new reference can be found through the website, so update your bookmarks and have a look around. You might see some artifacts in the automatically generated documentation, but we aim to remove these as part of an ongoing effort to standardise code comments. I hope you enjoy this new, modern view of our API.