
Wednesday, January 3, 2018

Complete data description now available for T2DKP WES and WGS datasets

A new Data Descriptor publication from Jason Flannick, Christian Fuchsberger, Anubha Mahajan, and colleagues (Scientific Data 4, Article number: 170179 (2017) doi:10.1038/sdata.2017.179) presents a comprehensive description of four large, important datasets that are included in the Type 2 Diabetes Knowledge Portal. These datasets are the product of the GoT2D and T2D-GENES consortia, large international groups that seek to uncover the genetic basis of type 2 diabetes.

The investigators took a variety of approaches to generate the most complete view of the genetic architecture of T2D available to date. They performed whole-exome sequencing on a group of 12,940 individuals of multiple ancestries (6,504 T2D cases and 6,436 controls) and whole-genome sequencing on 2,657 individuals of European descent, and tested the association of variants with T2D. They also used an exome chip to test coding variants in more than 80,000 people, and used imputation to test non-coding variants in an additional 44,000.

In total, the researchers sampled more than 120,000 genomes and identified more than 27 million single nucleotide polymorphisms, indels, and structural variants, testing their association with T2D. The new publication documents the experimental and analytical methods and results in complete detail. Analysis and interpretation of these data were also discussed in a previous publication (Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton et al., 2016).

This comprehensive catalog of T2D associations is available for you to search and explore via the T2D Knowledge Portal. The datasets from this study are named as follows in the T2DKP:

  • GoT2D WGS (whole-genome sequence data)
  • GoT2D WGS + replication (whole-genome sequence data plus imputed genotypes)
  • 13K exome sequence analysis
  • GoT2D exome chip analysis

All of these sets are described in more detail on our Data page, including lists of the cohorts studied and case/control selection criteria for each. Our Variant Finder tool searches all of these sets, and results from these datasets are displayed in various tables and interfaces on the Gene and Variant pages of the T2DKP.

The individual-level data in the 13K exome sequence set are also available for custom analysis via the Genetic Association Interactive Tool (GAIT) on Variant pages and the custom burden test on Gene pages. These tools allow researchers to interact with the individual-level data while protecting patient privacy. Both tools draw on the 19K exome sequence analysis dataset, which includes the 13K exome sequence data from this study along with 6,000 additional exome sequences from the SIGMA and LuCamp consortia. They let you filter samples by multiple criteria (for example, the subjects' age, BMI, or cholesterol levels) and choose covariates before running an association analysis on the fly. The custom burden test also lets you select the set of variants to include in the analysis.
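To make the idea of a gene-level burden test more concrete, here is a minimal sketch in Python. It is not the Portal's implementation: the sample fields (`age`, `bmi`, `t2d`, `genotypes`) and the helper names `filter_samples` and `burden_test` are hypothetical. It uses a simple collapsing approach (a sample counts as a carrier if it has at least one non-reference allele at any selected variant) and a Pearson chi-square test without covariates. The Portal's tools, which do support covariates, may well use a regression-based test instead.

```python
import math


def filter_samples(samples, max_age=None, max_bmi=None):
    """Keep samples that pass simple phenotype filters (hypothetical fields)."""
    kept = []
    for s in samples:
        if max_age is not None and s["age"] > max_age:
            continue
        if max_bmi is not None and s["bmi"] > max_bmi:
            continue
        kept.append(s)
    return kept


def burden_test(samples, variant_ids):
    """Collapsing burden test: carrier status vs. T2D status, no covariates.

    A sample is a carrier if it has >= 1 non-reference allele at any of the
    selected variants. Returns (table, chi2, p_value), where table[t2d][carrier]
    holds sample counts and p_value comes from a 1-df Pearson chi-square test.
    """
    table = [[0, 0], [0, 0]]
    for s in samples:
        carrier = any(s["genotypes"].get(v, 0) > 0 for v in variant_ids)
        table[int(s["t2d"])][int(carrier)] += 1

    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("no samples left after filtering")
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return table, chi2, p_value


# Toy data: 6 of 10 cases and 2 of 10 controls carry an allele at "var1".
toy = [{"age": 50, "bmi": 28, "t2d": True, "genotypes": {"var1": int(i < 6)}}
       for i in range(10)]
toy += [{"age": 50, "bmi": 28, "t2d": False, "genotypes": {"var1": int(i < 2)}}
        for i in range(10)]
table, chi2, p = burden_test(filter_samples(toy, max_age=70), ["var1"])
print(table, round(chi2, 3), round(p, 3))
```

With this toy data the chi-square statistic is 10/3 and the p-value is about 0.068. A real analysis would adjust for covariates such as age and BMI (typically with logistic regression) and would prefer an exact test when counts are small.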

Please explore these datasets and, as always, let us know what you think!

Monday, July 11, 2016

World-wide cooperation to address a world-wide problem

If you’re reading this post, you’re likely well aware that type 2 diabetes (T2D) is one of the biggest health problems we face and that its incidence is rising. Clearly, we need a better understanding of how T2D develops and what the risk factors are, along with more effective treatments.

Along with environmental and behavioral factors, variation in the human genome plays an important role in susceptibility to T2D. Mutations that alter gene expression or affect the function of proteins and noncoding RNAs can lead to differences in physiology and, ultimately, to differences in T2D risk. To begin to understand this, we first need to know which variants contribute to T2D and by how much. And for that, we need genetic association data, and lots of it. Large amounts of data allow us to refine the genetic association map by reconfirming some previous signals, showing that others do not hold up, and adding evidence for or against the causal roles of variants.

Addressing this need, a study published today in Nature (Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton et al.) presents the results of an international collaboration that has generated an unprecedented amount of T2D genetic data. As befits an approach to a huge problem, everything about this study is huge: the number of collaborators (more than 300, from 22 countries), the number of individual genomes sampled (120,000), the number of variants analyzed (tens of millions), and the number of funding organizations (more than 60). The result is the most comprehensive look at the genetics of T2D available to date.

One of the major projects described in the paper, led by the Genetics of Type 2 Diabetes (GoT2D) Consortium, was whole-genome sequencing for 2,657 people, half T2D cases and half controls. Whole-genome sequence analysis is the only way to assess the influence of rare variants comprehensively.

An open question in the T2D genetics community has been whether most T2D risk comes from rare variants or from many common variants, each of small effect. This study begins to answer this question. It shows that most T2D risk can be ascribed to the modest effects of a large number of common alleles, and that there is likely no treasure trove of rare variants of large effect waiting to be found.

This project uncovered more than a dozen loci associated with T2D at genome-wide significance. Most were common variants, and some, such as the variant rs11759026 near CENPW, had not been seen before in genome-wide association studies. This study also called into question the causal variants assigned to some previously identified associations and supplied better candidates for the actual T2D risk variants. For example, the T2D signal at the CILP2 locus had been attributed to the noncoding variant rs10401969, but the additional data from this project now point to a linked missense variant in TM6SF2 as causal. This is an exciting finding, since TM6SF2 is involved in fat metabolism and could have a direct role in the development of T2D.

In another project reported by Fuchsberger and colleagues, combining exome sequence data from the T2D-GENES (Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples) Consortium with the exome sequences obtained by the GoT2D project produced a data set of sequences from nearly 13,000 individuals in five different ethnic groups. Data sets stratified by ancestry allow investigation of population-specific associations that might otherwise be obscured. The larger sample size and the focus on coding variation, which presumably has larger effects on protein function, were another way to maximize the discovery of rare variants, if such variants were present. This approach also helped implicate specific genes in previously associated genomic regions.

One variant identified by this approach has an immediately understandable relationship to diabetes: the rs2233580 variant causes a missense mutation in the PAX4 gene, which encodes a transcription factor that has been implicated in pancreatic islet differentiation. Interestingly, this variant is common in East Asian populations but nearly absent in the other ancestries studied. Other variants in the same gene have previously been associated with early-onset monogenic diabetes, so this result is a reminder that different mutations in the same gene can have very different effects on the disease process. Other results from this study reaffirmed this conclusion for additional genes.

The scale of this study is unprecedented, and we’ve only touched upon a small piece of it here. But something else is unprecedented about these data: they are available for anyone to explore, right now, in the T2D Knowledge Portal. Researchers don’t need to go to various sites to gather bits and pieces of the data, harmonize them, and analyze them; the data sets are globally accessible in the Portal along with pre-computed analyses and sophisticated tools for custom analyses.

The data sets from this study available in the Portal are:

  • GoT2D WGS (whole-genome sequence data)
  • GoT2D WGS + replication (whole-genome sequence data plus imputed genotypes)
  • 13K exome sequence analysis
  • 82K exome chip analysis

All of these are described in more detail on our Data page, where you can see a list of the cohorts and even view their case/control selection criteria. Our Variant Finder tool can be applied to all of these sets, and the Genetic Association Interactive Tool (GAIT) accesses the 17K exome sequence analysis data set, which includes the 13K exome sequence analysis data from this study along with additional data from the SIGMA Consortium, previously published by Estrada et al. in JAMA. You’ll also see results from these data sets in various tables and displays on the Gene and Variant pages of the Portal.

In a review article that was also published today in Nature Reviews Genetics, Flannick and Florez advocate for the aggregation of genetic data in general, and the T2D Knowledge Portal in particular, as a way to democratize the study of T2D and accelerate discoveries that will improve patient care.

“Data from human genetics is highly valuable in identifying and validating the role of specific targets for development of new medicines,” said David Altshuler, who was previously the principal investigator at the Broad Institute for the T2D genetics studies and the Portal, and is now Chief Scientific Officer at Vertex Pharmaceuticals. “When government, non-profits and companies work together with patients to increase our knowledge of the genetic causes of disease, everyone benefits.”

The Accelerating Medicines Partnership in Type 2 Diabetes funds the T2D Knowledge Portal as a means to facilitate collaboration, with the goal of benefitting patients with T2D world-wide. “Whether you are a biologist exploring a specific pathway in a model system, a pharmaceutical investigator examining an appealing drug target, or a clinician pondering whether a newly identified variant is the cause of a patient’s symptoms, having well curated human genetic data matched to carefully defined phenotypes at your fingertips should provide rapid insight and accelerate discovery,” said Jose Florez, the Chief of the Diabetes Unit at the Massachusetts General Hospital and a human geneticist at the Broad Institute, who leads one of the groups developing the Knowledge Portal.

The deposition of the huge data sets from the Fuchsberger et al. study into the Portal has demonstrated that the processes in place for data intake, harmonization, and quality control are functional and can work at scale. We hope that other researchers and consortia will follow suit and help to make the Portal an even more powerful catalyst for new insights into T2D.

Wednesday, May 18, 2016

Expanding the landscape of human genetic variation data in the Type 2 Diabetes Knowledge Portal

With the addition of four new sequence data sets to our database, the number of variants and associations accessible via the Portal pages and tools has increased by millions.

Two of the new data sets are from projects that have obtained sequence data from a wide range of individuals. The ExAC data set, comprising exome sequences collected and harmonized by the Exome Aggregation Consortium, includes sequence data from 60,706 unrelated people of multiple ancestries. The 1000 Genomes data set, from the International Genome Sample Resource project (IGSR), is composed of whole-genome sequences from 2,504 individuals in four different ethnic groups. 


The allele frequencies of variants in the different ethnic groups surveyed in the 1000 Genomes data set can be seen in the “How common is…?” section on the Variant pages. Both the ExAC and 1000 Genomes data sets can also be queried using the Variant Finder tool: select them on its new “Additional Search Options” tab, where you can also add more criteria to your search.
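As a small illustration of what a per-population allele frequency means, the sketch below computes the alternate-allele frequency of one variant in each population from diploid genotype counts (0, 1, or 2 copies of the alternate allele). The function name `allele_frequencies` and the input format are hypothetical, and the population labels are only illustrative; this is not a description of how the Portal stores or computes these values.

```python
from collections import defaultdict


def allele_frequencies(genotypes):
    """Compute the alternate-allele frequency per population for one variant.

    `genotypes` is an iterable of (population, alt_allele_count) pairs, where
    alt_allele_count is 0, 1, or 2 (assumes a diploid, autosomal site).
    Returns a dict mapping each population to alt alleles / total alleles.
    """
    alt_alleles = defaultdict(int)
    total_alleles = defaultdict(int)
    for population, alt_count in genotypes:
        if alt_count not in (0, 1, 2):
            raise ValueError(f"invalid diploid genotype count: {alt_count}")
        alt_alleles[population] += alt_count
        total_alleles[population] += 2  # each person carries two alleles
    return {pop: alt_alleles[pop] / total_alleles[pop] for pop in total_alleles}


# Toy data: three people labeled "EUR" and two labeled "EAS".
example = [("EUR", 0), ("EUR", 1), ("EAS", 2), ("EAS", 1), ("EUR", 0)]
freqs = allele_frequencies(example)
# EUR: 1 alt allele out of 6; EAS: 3 alt alleles out of 4.
print(freqs)
```

Counting alleles rather than people is what makes a variant that is “common” in one population (like the PAX4 variant in East Asian samples) show up with a much higher frequency there than elsewhere.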

[Figure: The Data set pull-down menu on the “Additional Search Options” tab of the Variant Finder lets you specify 1000 Genomes or ExAC data.]

[Figure: Available selections in the Data set pull-down menu.]


The other two new data sets in the Portal were both generated by the GoT2D consortium. A whole-genome sequence data set (GoT2D WGS) adds data from 2,657 individuals, including the associations of noncoding variants that were not present in the previous whole-exome sequence data set from the GoT2D project. This new data set brings T2D association data for 30 million variants to the Portal. The GoT2D WGS + replication data set adds imputed genotypes to that set, bringing the sample size to over 47,000 and covering most low-frequency and common variants.

The new GoT2D data can be seen in multiple sections of the Portal’s Gene and Variant pages, and may also be accessed by selecting these data sets in the Variant Finder.

In addition to these major new data sets, today’s release also includes some bug fixes and improved data harmonization.

Get out there and explore the new data landscape in the Portal, and let us know what you think!