If you’re reading this post, you’re likely well aware that type 2 diabetes (T2D)
is one of the biggest health problems we face and that its incidence is rising.
Clearly, we need a better understanding of how T2D develops and what the risk
factors are, along with more effective treatments.

Along with environmental and behavioral factors, variation in the human genome
plays an important role in susceptibility to T2D. Mutations that alter gene
expression or affect the function of proteins and noncoding RNAs can lead to
differences in physiology and, ultimately, to differences in T2D risk. To begin
to understand this, we first need to know which variants contribute to T2D and
by how much. And for that, we need genetic association data, and lots of it.
Large amounts of data allow us to refine the genetic association map:
reconfirming some previous signals, showing that others are not significant,
and adding evidence for or against the causal roles of specific variants.

Addressing this need, a study published today in Nature (Fuchsberger, Flannick,
Teslovich, Mahajan, Agarwala, Gaulton et al.) presents the results of an
international collaboration that has generated an unprecedented amount of T2D
genetic data. As befits an approach to a huge problem, everything about this
study is huge: the number of collaborators (more than 300, from 22 countries),
the number of individuals sampled (120,000), the number of variants analyzed
(tens of millions), and the number of funding organizations (more than 60). The
result is the most comprehensive look at the genetics of T2D available to date.

One of the major projects described in the paper, led by the Genetics of Type 2
Diabetes (GoT2D) Consortium, was whole-genome sequencing of 2,657 people, half
T2D cases and half controls. Whole-genome sequencing is the only way to assess
the influence of rare variants comprehensively.

An open question in the T2D genetics community has been whether most T2D risk
comes from rare variants or from many common variants of small effect. This
study begins to answer that question. It shows that most T2D risk can be
ascribed to the modest effects of a large number of common alleles, and that
there is likely no treasure trove of rare variants of large effect waiting to be
found.

This project uncovered more than a dozen loci associated with T2D at
genome-wide significance. Most of the associated variants were common, and
some, such as rs11759026 near CENPW, had not been detected in earlier
genome-wide association studies. The study also called into question some
previously identified variant associations and supplied better candidates for
the actual T2D risk variants. For example, the noncoding variant rs10401969 had
been associated with T2D at the CILP2 locus, but the additional data from this
project now point to a linked missense variant in TM6SF2 as causal. This is an
exciting finding, since TM6SF2 is involved in fat metabolism and could have a
direct role in the development of T2D.

In another project reported by Fuchsberger and colleagues, exome sequence data
from the T2D-GENES (Type 2 Diabetes Genetic Exploration by Next-generation
sequencing in multi-Ethnic Samples) Consortium were combined with the exome
sequences obtained by the GoT2D project, yielding a data set of sequences from
nearly 13,000 individuals from five different ethnic groups. Data sets
stratified by ancestry allow investigation of population-specific associations
that might otherwise be obscured. The larger sample size, together with the
focus on coding variants (which presumably have larger effects on protein
function), offered another way to maximize discovery of rare variants, if any
were present. The exome data also helped to implicate specific genes in
previously associated genomic regions.

One variant identified by this approach has an immediately understandable
relationship to diabetes: rs2233580 is a missense variant in the PAX4 gene,
which encodes a transcription factor that has been implicated in pancreatic
islet differentiation. Interestingly, this variant is common in East Asian
populations but nearly absent in the other ancestries studied. Other variants in
the same gene have previously been associated with early-onset monogenic
diabetes, so this result is a reminder that different mutations in the same gene
can have very different effects on the disease process. Other analyses in this
study reinforced this conclusion for additional genes.

The scale of this study is unprecedented, and we’ve only touched upon a small
piece of it here. But something else is unprecedented about these data: they
are available for anyone to explore, right now, in the T2D Knowledge Portal.
Researchers don’t need to go to various sites to gather bits and pieces of the
data, harmonize them, and analyze them; the data sets are globally accessible in
the Portal along with pre-computed analyses and sophisticated tools for custom
analyses.

The data sets from this study in the Portal are:

- GoT2D WGS – whole-genome sequence data
- GoT2D WGS + replication – whole-genome sequence data plus imputed genotypes
- 13K exome sequence analysis
- 82K exome chip analysis

All of these are described in more detail on our Data page, where you can see a
list of the cohorts and even view their case/control selection criteria. Our
Variant Finder tool can be applied to all of these data sets. The Genetic
Association Interactive Tool (GAIT) accesses the 17K exome sequence analysis
data set, which combines the 13K exome sequence analysis data from this study
with additional data from the SIGMA Consortium, previously published by Estrada
et al. in JAMA. You’ll also see results from these data sets in various tables
and displays on the Gene and Variant pages of the Portal.

In a review article that was also published today in Nature Reviews Genetics,
Flannick and Florez advocate for the aggregation of genetic data in general, and
the T2D Knowledge Portal in particular, as a way to democratize the study of T2D
and accelerate discoveries that will improve patient care.

“Data from human genetics is highly valuable in identifying and validating the
role of specific targets for development of new medicines,” said David
Altshuler, who previously served as principal investigator for the T2D genetics
studies and the Portal at the Broad Institute and is now Chief Scientific
Officer at Vertex Pharmaceuticals. “When government, non-profits and companies
work together with patients to increase our knowledge of the genetic causes of
disease, everyone benefits.”

The Accelerating Medicines Partnership in Type 2 Diabetes funds the T2D
Knowledge Portal as a means to facilitate collaboration, with the goal of
benefiting patients with T2D worldwide. “Whether you are a biologist exploring a
specific pathway in a model system, a pharmaceutical investigator examining an
appealing drug target, or a clinician pondering whether a newly identified
variant is the cause of a patient’s symptoms, having well curated human genetic
data matched to carefully defined phenotypes at your fingertips should provide
rapid insight and accelerate discovery,” said Jose Florez, Chief of the Diabetes
Unit at Massachusetts General Hospital and a human geneticist at the Broad
Institute, who leads one of the groups developing the Knowledge Portal.

Depositing the huge data sets from the Fuchsberger et al. study into the Portal
has demonstrated that the Portal’s processes for data intake, harmonization,
and quality control work at scale. We hope that other researchers and consortia
will follow suit and help to make the Portal an even more powerful catalyst for
new insights into T2D.