The human pangenome: a more diverse human reference genome in Ensembl

The human pangenome, a high-quality collection of reference human genome sequences that captures the diversity of human populations better than the current human reference genome, is now available through Ensembl.

The work was led by the international Human Pangenome Reference Consortium (HPRC), a group of 14 institutes, including EMBL’s European Bioinformatics Institute (EMBL-EBI), funded by the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH).

Researchers have released a new human pangenome reference, a high-quality collection of reference human genome sequences that captures substantially more diversity from different human populations than what was previously available. Credit: Darryl Leja, NHGRI.

Genome sequences differ only slightly among individuals. In the case of humans, any two genomes are, on average, more than 99% identical. Small genomic differences contribute to each person’s uniqueness and can provide insights about their health, helping to diagnose disease and guide medical treatments. 

These small differences mean that using one standard reference genome, as many studies currently do, can have limitations. While the previous reference genome sequence was single and linear, the pangenome represents many different versions of the human genome sequence at the same time. This gives researchers a wider range of options for using the pangenome in analysing other human genome sequences. 

The new pangenome reference is a collection of different genomes against which to compare an individual genome sequence. Like a map of the subway system, the pangenome graph has many possible routes for a sequence to take, represented by the different colors. The detouring paths at the top of the image represent single nucleotide variants (SNVs), which are single-letter differences. The yellow path that loops around itself and repeats the same nucleotides represents a duplication variant. The pink path that loops counterclockwise and follows the nucleotide sequence backwards represents an inversion variant. At the bottom, the green and dark blue paths miss the C nucleotide in their route and represent a deletion variant. The light blue path, which has extra nucleotides in its route, represents an insertion variant. Credit: Darryl Leja, NHGRI.
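
As a rough illustration of the graph described in the caption above, the following Python sketch models a tiny sequence graph in which each node holds a short stretch of DNA and each genome is a path through the nodes. The node IDs, sequences and path names are all invented for illustration; they are not taken from the HPRC data or from any Ensembl interface. The sketch also assumes that walking a node "backwards" means reading its reverse complement, which is how an inversion is commonly modelled in sequence graphs.

```python
# Toy sequence graph: node id -> DNA segment (all values are made up).
NODES = {1: "AT", 2: "C", 3: "G", 4: "GA", 5: "TT", 6: "CA"}

# Each path is a list of (node_id, orientation) steps. "+" reads the node
# forwards; "-" reads it backwards on the opposite strand (assumption:
# reverse complement), which is how the inversion is modelled here.
PATHS = {
    "reference":   [(1, "+"), (2, "+"), (4, "+"), (6, "+")],
    "snv":         [(1, "+"), (3, "+"), (4, "+"), (6, "+")],            # C replaced by G
    "deletion":    [(1, "+"), (4, "+"), (6, "+")],                      # skips the C node
    "insertion":   [(1, "+"), (2, "+"), (5, "+"), (4, "+"), (6, "+")],  # extra TT node
    "duplication": [(1, "+"), (2, "+"), (4, "+"), (4, "+"), (6, "+")],  # visits GA twice
    "inversion":   [(1, "+"), (2, "+"), (4, "-"), (6, "+")],            # GA read backwards
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence as read from the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def spell(path):
    """Concatenate the node sequences visited by a path, honouring orientation."""
    parts = []
    for node_id, orientation in path:
        seq = NODES[node_id]
        parts.append(seq if orientation == "+" else reverse_complement(seq))
    return "".join(parts)


def sequences():
    """Spell out every path in the toy graph."""
    return {name: spell(path) for name, path in PATHS.items()}


if __name__ == "__main__":
    for name, seq in sequences().items():
        print(f"{name:12s} {seq}")
```

Every path shares most of its nodes with the others, so the graph stores the common sequence once and records only where individual genomes diverge. This is the sense in which a pangenome represents many versions of the genome at the same time.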

Expanding the range of genomes to increase the diversity present in the human reference genome will help advance personalised medicine by enabling clinicians to better tailor treatment to individual patients. This draft human pangenome reference includes the maternal and paternal genome sequences from 47 people, and the researchers aim to increase this number to 350 by mid-2024. The work, published in the journal Nature, is one of several papers published today by HPRC members. Most of the genomes used to create the human pangenome reference were collected as part of the 1000 Genomes Project, the largest public catalogue of human variation and genotype data from a wide range of populations.

Accessing the human pangenome data 

To help researchers understand how the genes present differ across the individual genomes represented in the human pangenome, Ensembl has mapped the high-quality annotation of the reference human genome, generated as part of the GENCODE project, across the pangenome.

The human pangenome sequences and annotation are openly accessible on the Ensembl human pangenome project page and through Ensembl Rapid Release.

More about the Human Pangenome Reference Consortium

The Human Pangenome Reference Consortium (HPRC) is a project funded by the National Human Genome Research Institute to sequence and assemble genomes from individuals of diverse populations, in order to better represent the genomic landscape of humankind.

Institutions involved in the HPRC can be found on the project’s main page.

Information about the range of populations included in the project can be found on the project’s population sampling and representation page.

This blog post was adapted from the NHGRI and EMBL-EBI press release.


  1. The Female Pan-Genome: Why were Males excluded?

    In their Article “The draft human pangenome reference” (W.-W. Liao et al., Nature 617, 312-324; 2023), the authors report their results on 47 diverse human diploid genomes. They analyzed samples from different Homo sapiens populations (51% from Africa, 34% Americas, 13% Asia, 2% Europe), which represent members of 4 of the 5 continental groups that were formerly assigned to the so-called “Five Human Races”, i.e. Africans, American Indians, Asians, Caucasians and Pacific Islanders (see Q. Spencer, Phil. Stud. 175, 1013-1037; 2018).

    Accordingly, Liao et al. 2023 noted differences between genomes of African ancestry and those of European descent, which corroborates earlier findings (see the reference quoted above).

    However, the members of the “Human Pangenome Reference Consortium” excluded the genetic information of the male X and Y chromosomes, as mentioned on p. 314 of the Article. Specifically, as detailed on p. 35 of the Methods-Addendum of the Research paper (pp. 14-63), the X and Y sex chromosomes from the 19 male samples were excluded from their analysis.

    Hence, only the 28 female tissue samples, inclusive of the XX sex chromosomes, were analysed completely, and male-specific, Y-chromosome-encoded genetic information was removed. No explanation is given as to why the authors acted in this way.

    Accordingly, only the “Female Pangenome” was studied and published. This exclusion of the male germ line is not acceptable. The genetic difference between human males (XY) vs. females (XX) of ca. 2-3%, based on the ca. 20 000 protein-coding genes of the H. sapiens genome, was ignored by the authors.

    As mentioned by other researchers, we must distinguish between the male and female human genome, so that one mixed “Pan-Genome of both Sexes” does not represent the evolved sexual dimorphism, and therefore the nature, behaviour and physiology, of our species.

    In my opinion, Gender-bias has no place in the natural sciences, and in the next step, both the “XX- and the XY-Pan-Genome” should be analysed and revealed.

    U. Kutschera, AK Evolutionsbiologie, 79104 Freiburg i.Br., Germany

    1. Hi Dr. Ulrich Kutschera,
      Thank you for your valuable feedback on The Human Pangenome project.

      Unfortunately, we are not able to answer questions about the rationale for the HPRC sequencing. I would advise you to contact the Human Pangenome coordinating centre for further information.

      Best wishes,
