Tim Stephens, UC Santa Cruz
The UC Santa Cruz Genomics Institute has received a $2 million grant from the W.M. Keck Foundation for ongoing research to develop a comprehensive map of human genetic variation. The Human Genome Variation Map will be a valuable new resource for medical researchers, as well as for basic research on human evolution and diversity.
The Keck grant provides funding over two years for UC Santa Cruz researchers to create a full-scale map, building on the results of a one-year pilot project funded by the Simons Foundation.
"We've been experimenting with pilot regions of the genome and evaluating a variety of methods. The next steps will be to take it from a prototype to a full-scale genome reference that we can release to the community," said Benedict Paten, a research scientist at the Genomics Institute and co-principal investigator of the project.
The Human Genome Variation Map is needed to overcome the limitations of using a single reference sequence for the human genome. Currently, new data from sequencing human genomes is analyzed by mapping the new sequences to one reference set of 24 human chromosomes to identify variants. But this approach leads to biases and mapping ambiguities, and some variants simply cannot be described with respect to the reference genome, according to David Haussler, distinguished professor of biomolecular engineering and scientific director of the Genomics Institute at UC Santa Cruz.
Global Alliance
Haussler and Paten are coordinating their work on the new map with the Global Alliance for Genomics and Health (GA4GH), which involves more than 300 collaborating institutions that have agreed to work together to enable secure sharing of genomic and clinical data. The overall vision of the global alliance includes a genomics platform based on something akin to the planned Human Genome Variation Map, along with open-source software tools to enable researchers to mine the data for new scientific and medical breakthroughs. In the long run, the map will be used to identify genomic variants encountered in precision medical care as well, Haussler said.
The UCSC team has been collaborating with leading genomics researchers at other institutions to develop the map, which Paten began working on in 2014 as co-chair of the GA4GH Reference Variation Task Team. The new Human Genome Variation Map will replace the current assortment of isolated, incompatible databases of human genetic variation with a single, fundamental representation formalized as a very large mathematical graph. The clean mathematical formulation is a major strength of this new approach, Paten said.
The primary reference genome is a linear sequence of DNA bases (represented by the letters A, C, T, and G). To build the Human Genome Variation Map, each new genome will be merged into the reference genome at the points where it matches the primary sequence, with variations appearing as additional alternate paths in the map.
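To make the idea of "alternate paths" concrete, here is a toy sketch in Python. It is not the project's actual data model. The class `VariationGraph`, the helper `merge_snp`, and the node-numbering scheme are hypothetical, invented for illustration. The sketch merges a single sample that differs from the reference at one base. Stretches where the sample matches the reference become shared nodes, and the differing base becomes an alternate branch between them. A real genome graph would also need alignment to find the matching points, plus support for insertions, deletions, and both DNA strands.

```python
from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Toy sequence graph (hypothetical, for illustration only).

    Nodes hold DNA segments, directed edges record which segment may
    follow which, and each named path is one genome spelled through the graph.
    """
    nodes: dict[int, str] = field(default_factory=dict)
    edges: dict[int, set[int]] = field(default_factory=dict)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_node(self, seq: str) -> int:
        # Node IDs are assigned sequentially starting at 1.
        node_id = len(self.nodes) + 1
        self.nodes[node_id] = seq
        self.edges[node_id] = set()
        return node_id

    def add_path(self, name: str, node_ids: list[int]) -> None:
        # Record the genome as a walk and add an edge for every step.
        for a, b in zip(node_ids, node_ids[1:]):
            self.edges[a].add(b)
        self.paths[name] = list(node_ids)

    def spell(self, name: str) -> str:
        # Concatenate the segments along a named path.
        return "".join(self.nodes[n] for n in self.paths[name])

    def walks(self, start: int, end: int) -> list[list[int]]:
        # Enumerate every walk from start to end (assumes an acyclic graph).
        if start == end:
            return [[end]]
        return [[start] + rest
                for nxt in sorted(self.edges[start])
                for rest in self.walks(nxt, end)]


def merge_snp(reference: str, pos: int, alt: str, sample: str) -> VariationGraph:
    """Merge one sample that differs from the reference by a single base.

    Shared sequence becomes shared nodes; the differing base becomes an
    alternate branch between them. Assumes 0 < pos < len(reference) - 1
    so that both flanking segments are non-empty.
    """
    if not 0 < pos < len(reference) - 1:
        raise ValueError("pos must leave non-empty flanks on both sides")
    g = VariationGraph()
    left = g.add_node(reference[:pos])         # shared left flank
    ref_allele = g.add_node(reference[pos])    # reference base
    alt_allele = g.add_node(alt)               # sample's base
    right = g.add_node(reference[pos + 1:])    # shared right flank
    g.add_path("reference", [left, ref_allele, right])
    g.add_path(sample, [left, alt_allele, right])
    return g


g = merge_snp("ACGTACGT", 3, "C", "sample1")
print(g.spell("reference"))  # ACGTACGT
print(g.spell("sample1"))    # ACGCACGT
print(g.paths["sample1"])    # [1, 3, 4]
print(len(g.walks(1, 4)))    # 2
```

In this toy model, the sample shares nodes 1 and 4 with the reference. Its variant is therefore identified by the one node it adds (node 3), rather than by an offset on a single linear sequence.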
Mathematical structure
This mathematical graph-based structure will augment the existing human reference genome with all common human variations, providing a means to name, identify, and analyze variations precisely and reproducibly. "The original human reference genome project gave us a detailed picture of one human genome. This map will give us a detailed picture of the world's variety of human genomes," Paten said.
In the spirit of the original human genome project, the Human Genome Variation Map will be publicly and freely available to all. Fifteen years ago, Haussler's team at UC Santa Cruz assembled the first human genome sequence, posted it publicly on the Internet, and went on to create the widely used UCSC Genome Browser. The new project has many parallels with that earlier work.
"This is an infrastructure project for genomics that everyone agrees is important," Paten said. "It is ambitious, and it requires a fundamental shift from thinking of the reference as one sequence to thinking of it as this structure that incorporates all variation. But now is the time to do it. We need to build a model that works, and make it easy enough to use to get community acceptance."