Pangenomes: incorporating diversity into genomic analyses

Abstract

The human reference genome is one of the most widely used resources in biological research. It is the basis for studying the functional biology of the human genome, genetic variation and its implications in disease, evolutionary relationships between humans and other species, and countless other basic biological and clinical questions. As a “reference”, it serves as a standard scaffold against which new genomic data is compared. To be effective, a reference must be similar enough to the sample that the two can be compared and the differences between them identified and interpreted. However, the human reference genome is a “linear” genome that represents just one copy of a genome and carries no information about genetic diversity. Because of this lack of diversity, the reference genome can differ significantly from an individual's genome and can bias analyses so that new samples appear more similar to the reference than they actually are. In its current form, the human reference genome is not representative of the human population, and analyses that use it can be inaccurate for people who are dissimilar to the reference. One emerging alternative to a linear reference genome is a “pangenome” reference that represents a collection of genomic sequences. A pangenome incorporates information about genetic variants and can therefore better represent the genetic makeup of a population. The Human Pangenome Reference Consortium (HPRC) is working to produce a new pangenome reference that incorporates the genomes of hundreds of individuals from around the globe, selected to best represent global human genetic diversity as we currently understand it. With an improved reference, we can reduce biases in genetic analyses and make genomic research and genetic testing more accurate and useful for diverse populations.

Date
Event: Google Genomics Deep Dive
Location: Palo Alto, CA, USA