Wednesday, October 7, 2015

Type 2 Diabetes knowledge portal goes live

Human genetic data contains valuable clues for solving the mystery of how human disease works, and the amount of data is growing as major projects around the world engage patients in large genomic studies. These projects would be most powerful if researchers worldwide collaborated with each other, sharing their findings and making results available to scientists from many disciplines including molecular biology and drug development. However, there are few frameworks for sharing the data in a way that properly credits researchers and protects patient privacy -- and as a result, today, only a relative handful of highly specialized geneticists can analyze the data or access the full results of studies.

What if anyone could?

Making that kind of sharing possible is the ultimate goal of a user-friendly web portal and knowledgebase launched October 7 at a special session of the American Society of Human Genetics annual meeting, by an international team that hopes to dramatically expand the number of researchers who can use human genetic data to study type 2 diabetes (T2D).

Research on other common diseases faces the same challenges and could therefore benefit from similar web portals.

“The investigators involved in the Accelerating Medicines Partnership in Type 2 Diabetes believe that genomic data should benefit all of humankind, as long as proper confidentiality protections are in place,” said Jose Florez, Chief of the Diabetes Unit and an Institute Member at the Broad Institute, who is one of the lead scientists in the project. “In this way the collective brainpower of all those with bright ideas, whether they come from academia, big Pharma, the biotechnology sector or government can be brought to bear on these difficult problems.”

The T2D portal -- which is open to the world free of charge -- is being developed by a team of scientists and software engineers at the Broad Institute, the University of Michigan, Oxford University, and many other collaborators as part of a worldwide scientific consortium with contributors from academia, industry, and non-profit organizations. Financial support is provided by the Accelerating Medicines Partnership in Type 2 Diabetes -- a collaboration of the National Institutes of Health, five major pharmaceutical companies, and three large non-profits -- as well as the Carlos Slim Foundation.

“Through AMP, we have an unprecedented opportunity to advance international research in type 2 diabetes,” said NIH Director Francis S. Collins in a press release. “Our hope is that this portal – and this partnership – will lead to better disease targets and a shorter, less expensive drug development process, enabling companies to get safe and effective medications to patients who need them faster.”

The current knowledgebase contains genetic and clinical data from dozens of academic institutions and partners worldwide, and researchers hope to add greatly to it over the next few years. The data already include detailed information from more than a hundred thousand participants in studies of T2D, as well as results from some of the largest genetic studies ever conducted on related traits such as obesity and glucose and insulin levels. Some of the datasets were mined by collaborating investigators in recent years to identify the genetic variants that influence T2D risk in new or unexpected ways, such as a set of variants near the gene SLC16A11 that are present in almost half of Mexicans with Native American ancestry, and another set of variants that inactivate the gene SLC30A8 and protect variant carriers against T2D.

The studies represented in the portal -- and those that researchers hope to add -- use different technologies and formats. As a result, they must be not only aggregated but also “harmonized” -- rendered compatible with each other. Both tasks demand major work before publicly accessible knowledgebases can become a reality for T2D or other complex traits, said Jason Flannick, a Research Associate at Massachusetts General Hospital and Technical Lead at the Broad Institute, who leads the Broad portal development team.

“There is a wealth of genetic data that has been and continues to be generated by investigators around the world that could be hugely valuable to many of the people focused on downstream biological or therapeutic research,” said Flannick. “But enabling these heterogeneous and scattered datasets to be accessible via one place, and building a sustainable model that will scale to the much larger datasets that are being produced as next-generation sequencing becomes more and more routine, requires a major investment in software and bioinformatics.”

Scientists can use the portal to easily browse results from many studies simultaneously, a process that has previously required specialized software and analysts. In addition, the portal is designed to anticipate and answer biological questions from users who may not already be familiar with statistics or genomics, enabling a broader community to learn from these results for the first time.

Users can:
  • Retrieve all genetic variants identified within a gene of interest -- learning which types of variants are present in different populations, what molecular effects are predicted for them, and how strongly those variants have been associated with T2D or related conditions in human studies.
  • Explore data for an entire chromosomal region, visualizing the effects of hundreds of variants across datasets via easy-to-use modules, including a custom, web-only version of the Broad's Integrative Genomics Viewer.
  • Identify sets of variants that are associated with different combinations of traits, using a query builder that can construct custom searches across multiple datasets.
  • Investigate whether deactivating a gene in humans is likely to increase or decrease T2D risk by using an “on-the-fly” custom analysis feature that has recently been released for beta-testing.

At no point are users allowed to see protected genetic data that could be used to identify research participants -- they see only the results of analyses.

Over the coming years, the team hopes to contribute to a federated network of portals that would house many more datasets representing other studies, phenotypes, and data types -- for instance, results from large studies of gene expression and function across different human cell types and tissues, such as GTEx and ENCODE. Collaborators, including Daniel MacArthur and Benjamin Neale at the Broad as well as a dedicated team led by Michael Boehnke and Gonçalo Abecasis at the University of Michigan, are already developing new methods and software tools destined for the knowledgebase.

“Combining the most extensive T2D genetic data possible with cutting-edge tools for data analysis, reporting, and visualization will accelerate genetic discovery and enable scientists to ask and answer the questions needed to identify new targets for treatment,” said Boehnke.

Ultimately, the foundation of the portal is the data; it will need collaborators from many countries and projects to succeed, said Mark McCarthy from Oxford University. Data from over a dozen countries are already in the portal. “The aspiration is to engage with researchers around the world,” he said. “We will continue to reach out so that the portal contains data that are relevant to researchers and patients everywhere.”