Monday, July 11, 2016

World-wide cooperation to address a world-wide problem

If you’re reading this post, you’re likely well aware that type 2 diabetes (T2D) is one of the biggest health problems we face and that its incidence is rising. Clearly, we need a better understanding of how T2D develops and what the risk factors are, along with more effective treatments.

Along with environmental and behavioral factors, variation in the human genome plays an important role in susceptibility to T2D. Mutations that alter gene expression or affect the function of proteins and noncoding RNAs can lead to differences in physiology and, ultimately, to differences in T2D risk. To begin to understand this, we first need to know which variants contribute to T2D and by how much. And for that, we need genetic association data—lots of it. Large amounts of data allow us to refine the genetic association map: reconfirming some previous signals, establishing that others are not significant, and adding evidence for or against the causal roles of variants.

Addressing this need, a study published today in Nature (Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton et al.) presents the results of an international collaboration that has generated an unprecedented amount of T2D genetic data. As befits an approach to a huge problem, everything about this study is huge: the number of collaborators (more than 300, from 22 countries), the number of individual genomes sampled (120,000), the number of variants analyzed (tens of millions), and the number of funding organizations (more than 60). The result is the most comprehensive look at the genetics of T2D available to date.

One of the major projects described in the paper, led by the Genetics of Type 2 Diabetes (GoT2D) Consortium, was whole-genome sequencing for 2,657 people, half T2D cases and half controls. Whole-genome sequence analysis is the only way in which the influence of rare variants can be assessed comprehensively.

An open question in the T2D genetics community has been whether rare variants account for most of the T2D risk, or whether the risk stems from many common variants, each of small effect. This study begins to answer this question. It shows that most T2D risk can be ascribed to the modest effects of a large number of common alleles, and that there is likely no treasure trove of rare variants of large effect waiting to be found.

This project uncovered more than a dozen loci that were associated with T2D at genome-wide significance. Most were common variants, and some, such as the variant rs11759026 near CENPW, had not been seen before in genome-wide association studies. This study also called into question the previously identified associations of some variants and supplied better candidates for the actual T2D risk variant. For example, the noncoding variant rs10401969 had been associated with the CILP2 locus, but the additional data from this project now point to a linked missense variant in TM6SF2 as causal. This is an exciting finding, since TM6SF2 is involved in fat metabolism and could have a direct role in the development of T2D.

In another project reported by Fuchsberger and colleagues, combining exome sequence data from the T2D-GENES (Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples) Consortium with the exome sequences obtained by the GoT2D project resulted in a data set of sequences from nearly 13,000 individuals, from five different ethnic groups. Data sets stratified by ancestry allow investigation of population-specific associations that might otherwise be obscured. The larger sample size and the focus on coding variation, which presumably has larger effects on protein function, offered another way to maximize discovery of rare variants, if any were present. Another benefit was to help implicate specific genes in previously associated genomic regions.

One variant identified by this approach has an immediately understandable relationship to diabetes: the rs2233580 variant causes a missense mutation in the PAX4 gene, which encodes a transcription factor that has been implicated in pancreatic islet differentiation. Interestingly, this is a common variant in East Asian populations but is nearly absent in the other ancestries studied. Other variants in the same gene have previously been associated with early-onset monogenic diabetes, so this result is a reminder that different mutations in the same gene can have very different effects on the disease process. Other work in this study reaffirmed this conclusion for other genes.

The scale of this study is unprecedented, and we’ve only touched upon a small piece of it here. But something else is unprecedented about these data: they are available for anyone to explore, right now, in the T2D Knowledge Portal. Researchers don’t need to go to various sites to gather bits and pieces of the data, harmonize them, and analyze them; the data sets are globally accessible in the Portal along with pre-computed analyses and sophisticated tools for custom analyses.

The data sets from this study in the Portal are:

  • GoT2D WGS - whole-genome sequence data
  • GoT2D WGS + replication - whole-genome sequence data plus imputed genotypes
  • 13K exome sequence analysis
  • 82K exome chip analysis

All of these are described in more detail on our Data page. You can see a list of the cohorts and even view their case/control selection criteria. Our Variant Finder tool may be applied to all of these sets, and the Genetic Association Interactive Tool (GAIT) accesses the 17K exome sequence analysis data set that includes the 13K exome sequence analysis data from this study along with additional data from the SIGMA Consortium, previously published by Estrada et al. in JAMA. You’ll also see results from these data sets in various tables and displays on the Gene and Variant pages of the Portal.

In a review article that was also published today in Nature Reviews Genetics, Flannick and Florez advocate for the aggregation of genetic data in general, and the T2D Knowledge Portal in particular, as a way to democratize the study of T2D and accelerate discoveries that will improve patient care.

“Data from human genetics is highly valuable in identifying and validating the role of specific targets for development of new medicines,” said David Altshuler, who was previously the principal investigator for the T2D genetics studies and Portal at Broad, and is now Chief Scientific Officer at Vertex Pharmaceuticals. “When government, non-profits and companies work together with patients to increase our knowledge of the genetic causes of disease, everyone benefits.”

The Accelerating Medicines Partnership in Type 2 Diabetes funds the T2D Knowledge Portal as a means to facilitate collaboration, with the goal of benefitting patients with T2D world-wide. “Whether you are a biologist exploring a specific pathway in a model system, a pharmaceutical investigator examining an appealing drug target, or a clinician pondering whether a newly identified variant is the cause of a patient’s symptoms, having well curated human genetic data matched to carefully defined phenotypes at your fingertips should provide rapid insight and accelerate discovery,” said Jose Florez, the Chief of the Diabetes Unit at the Massachusetts General Hospital and a human geneticist at the Broad Institute, who leads one of the groups developing the Knowledge Portal. The deposition of the huge data sets from the Fuchsberger et al. study into the Portal has demonstrated that the processes in place for data intake, harmonization, and quality control are functional and can work at scale. We hope that other researchers and consortia will follow suit and help to make the Portal an even more powerful catalyst for new insights into T2D.