1000 Genomes Project Releases Pilot Data
The completion of three pilot projects designed to determine how best to build an extremely detailed map of human genetic variation begins a new chapter in the international 1000 Genomes Project, which began in 2008. Completion of the pilots launches the full-scale effort to build a public database of human genetic variation from the genomes of 2,500 people from 27 populations around the world. With the announcement, groups involved in the project placed their final data in freely available databases that the worldwide research community can access and use.
"Mapping all the shared normal variation in human populations is a critical step to interpreting medically actionable genetic changes," said Richard Gibbs, director of the Baylor College of Medicine (BCM) Human Genome Sequencing Center, a major contributor to the effort, and also a professor in the department of molecular and human genetics at BCM.
"The 1000 Genomes project has a simple goal: peer more deeply into the genetic variations of the human genome to understand the genetic contribution to common human diseases," said Eric D. Green, director of the National Human Genome Research Institute, which provides major funding to the effort. "I am excited about the progress being made on this resource for use by scientists around the world and look forward to seeing what we learn from the next stage of the project."
Variation accounts for majority of diseases
Recent studies looking for variations that contribute to common human ailments, such as heart disease and diabetes, indicate that a host of rare variations account for much of the burden of disease in the human population. Complex and detailed maps, such as those to be assembled from the project, provide a potent tool for identifying those rare variations.
The pilot program tested the viability of three strategies. BCM designed and coordinated the strategy that involved targeting the sequencing to gene coding regions. This project provided the most complete data for the exons (or coding regions) of 1,000 genes, as it was designed to deeply sample the DNA in each of nearly 700 people. An estimated two percent of the human genome is composed of protein-coding genes.
"We also developed new methods to target variation in genes, and showed that this approach gave maximum information about this important class of human variation," said Fuli Yu, an assistant professor in the BCM Human Genome Sequencing Center and coordinator of the study.
Fast-paced project
The project's fast pace was made possible by next-generation sequencing technology, which can produce thousands or millions of sequences rapidly. The techniques involved allow researchers to evaluate all the rare variants found in areas of the genome known to be associated with human disease.
Another of the pilot projects involved using a variety of sequencing technologies to sequence the genomes of six people (two nuclear families, each made up of two parents and one daughter) at high coverage (meaning in exacting detail). Each sample was sequenced 20 to 60 times over, uncovering a more complete picture of DNA variation in these families. By using different technologies, scientists also obtained a better understanding of the strengths of each sequencing platform.
The other pilot project sequenced the genomes of 179 people in less detail, subjecting each sample to an average of approximately four sequencing passes. Researchers then combined the data from different people to discover which genetic variants they share. This technique will help uncover the genomic variations shared among people or populations.
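To give a rough sense of why pooling data across many lightly sequenced people can work, the arithmetic can be sketched in a few lines of Python. This is an illustration only, not the project's analysis method: it assumes that the number of reads covering any one site follows a Poisson distribution around the average number of passes, a common textbook simplification that the project description does not state. The function names and the threshold of 10 reads are hypothetical choices made for this example.

```python
import math


def prob_at_least(k: int, mean_depth: float) -> float:
    """P(X >= k) for X ~ Poisson(mean_depth).

    Assumption (not from the article): per-site read depth is Poisson
    distributed around the average number of sequencing passes.
    """
    below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below


def pooled_depth(n_people: int, mean_depth: float) -> float:
    """Average total depth at a site when reads from all samples are combined."""
    return n_people * mean_depth


if __name__ == "__main__":
    min_reads = 10  # hypothetical threshold, chosen only for illustration
    for label, depth in [("low coverage, 4 passes", 4.0),
                         ("high coverage, 20 passes", 20.0)]:
        p = prob_at_least(min_reads, depth)
        print(f"{label}: P(at least {min_reads} reads at a site) = {p:.4f}")

    # 179 people at about four passes each, as in the low-coverage pilot
    print("pooled depth across 179 people:", pooled_depth(179, 4.0))
```

Under this simplified model, a single sample sequenced at four passes rarely has many reads at a given site, but 179 such samples together contribute roughly 716 passes of combined depth, which is the intuition behind combining data from different people to find variants they share.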
Researchers can obtain the data freely through the 1000 Genomes Web site or from the NCBI or the EBI. Researchers with limited computing power will be able to access the data through Amazon Web Services' Elastic Compute Cloud. The database contains all forms of variation found in the genome, from single changes, called single nucleotide polymorphisms (SNPs), to small insertions and deletions (of genetic material) to the large changes in chromosome structure and copy number called copy number variations.
In addition to Baylor College of Medicine's Human Genome Sequencing Center, much of the pilot work was carried out by researchers at the Wellcome Trust Sanger Institute in the United Kingdom; BGI Shenzhen in China; the Broad Institute of MIT and Harvard in Massachusetts; the Washington University Genome Sequencing Center at the Washington University School of Medicine in St. Louis; and Boston College.