Type 2 Diabetes Knowledge Portal News: Complete data description now available for T2DKP WES and WGS datasets

A new Data Descriptor publication from Jason Flannick, Christian Fuchsberger, Anubha Mahajan, and colleagues (Scientific Data 4, Article number: 170179 (2017) doi:10.1038/sdata.2017.179) gives a complete description of four large, important datasets that are included in the Type 2 Diabetes Knowledge Portal. These datasets are the product of the GoT2D and T2D-GENES consortia, large international groups that seek to uncover the genetic basis of type 2 diabetes.

The investigators took a variety of approaches to generate the most complete view of the genetic architecture of T2D available to date. They performed whole-exome sequencing on a group of 12,940 individuals of multiple ancestries (6,504 T2D cases and 6,436 controls) and whole-genome sequencing on 2,657 individuals of European descent, and tested the association of variants with T2D. They also used an exome chip to test coding variants in more than 80,000 people, and used imputation to test non-coding variants in an additional 44,000.

In total, the researchers sampled more than 120,000 genomes and identified more than 27 million single nucleotide polymorphisms, indels, and structural variants, testing their association with T2D. The new publication documents the experimental and analytical methods and results in complete detail. Analysis and interpretation of these data were also discussed in a previous publication (Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton et al., 2016).

This comprehensive catalog of T2D associations is available for you to search and explore via the T2D Knowledge Portal. The datasets from this study are named as follows in the T2DKP:

GoT2D WGS (whole-genome sequence data)
GoT2D WGS + replication (whole-genome sequence data plus imputed genotypes)
13K exome sequence analysis
GoT2D exome chip analysis

All of these sets are described in more detail on our Data page, including lists of the cohorts studied and case/control selection criteria for each. Our Variant Finder tool searches all of these sets, and results from these datasets are displayed in various tables and interfaces on the Gene and Variant pages of the T2DKP.

The individual-level data in the 13K exome sequence set are also available for custom analysis via the Genetic Association Interactive Tool (GAIT) on Variant pages and the custom burden test on Gene pages. These tools allow researchers to interact with the individual-level data while protecting patient privacy. Both tools access the 19K exome sequence analysis dataset, which includes the 13K exome sequence data from this study along with 6,000 additional exome sequences from the SIGMA and LuCamp consortia. Both also allow you to filter samples by multiple criteria (for example, the subjects' age, BMI, or cholesterol levels) and to choose covariates before running on-the-fly association analysis. The custom burden test additionally lets you select the set of variants to consider in the analysis.

Please explore these datasets and, as always, let us know what you think!

Wednesday, January 3, 2018
