Tuesday, February 6, 2018

Federation brings three new datasets to the T2DKP

Our mission at the Type 2 Diabetes Knowledge Portal (T2DKP) is to aggregate and analyze genetic association data relevant to T2D, and to make the knowledge that can be gleaned from these data available to researchers around the world. But it isn't possible to aggregate all of the relevant data in one place: privacy regulations at the institutional, regional, and national levels determine how these data are handled, and whether or where they can be transferred.

The T2DKP is supported by the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D), a pre-competitive partnership among the National Institutes of Health, industry, and not-for-profit organizations, managed by the Foundation for the National Institutes of Health. Because AMP T2D seeks to facilitate discovery of new targets for T2D treatment by making as much data as possible available via the T2DKP, it funded the development of a mechanism for establishing interconnected federated nodes of the T2DKP that would enable researchers to interact with all of the data regardless of where those data are located.

This goal was realized with the creation, by a team led by Thomas Keane and Dylan Spalding, of a federated node of the T2DKP at the European Bioinformatics Institute (EBI). Data housed at the EBI node are stored in such a way that their specific privacy requirements are met, but they are made available for remote queries via T2DKP tools and interfaces. Results from such queries are served up alongside results from all of the datasets housed in the AMP T2D Data Coordinating Center (DCC) at the Broad Institute. Researchers may browse and query the data without needing to know where the data reside. This federation mechanism represents both an important technical advance in handling and protecting data, and a significant step forward in democratizing and improving access to genetic association results.

The first dataset to be incorporated into the Portal via the EBI federated node was the Oxford BioBank exome chip analysis dataset, which contains association data for glycemic, lipid, and blood pressure traits from over 7,100 subjects in Oxfordshire, U.K. The EBI federated node has now added three more datasets:

  • The EXTEND GWAS dataset, generated by Drs. Timothy Frayling and Andrew Wood and their colleagues, comprises 7,159 samples (1,395 T2D cases and 5,764 controls) from the Exeter EXTEND Biobank. It includes associations for a wealth of glycemic, anthropometric, cardiovascular, renal, and hepatic phenotypes, including many that are new to the T2DKP.
  • The GoDARTS Affymetrix GWAS dataset, from Dr. Colin Palmer and colleagues, includes summary-level statistics for associations with BMI and blood lipid levels from 3,307 diabetic participants in the Genetics of Diabetes Audit and Research Study in Tayside Scotland. In addition, individual-level data from over 17,000 subjects (including the set from which summary statistics were calculated) are available via the GAIT tool (see below). 
  • The Oxford BioBank Axiom GWAS dataset, from Dr. Fredrik Karpe and colleagues, includes associations for BMI and blood lipid levels from 7,193 participants, all healthy men and women between 30 and 50 years of age. It represents an additional analysis of the same samples contained in the Oxford BioBank exome chip analysis dataset.

These datasets are described in detail on our Data page. Summary results from all three sets are integrated into Gene and Variant pages in the T2DKP, and may also be viewed in the Manhattan plots accessible by searching for a phenotype from the T2DKP home page. The Variant Finder also queries these datasets.

The individual-level data behind all three of these datasets are accessible for custom association analysis in our Genetic Association Interactive Tool (GAIT) on Variant pages. Using this tool, researchers can filter samples to create a custom subset with defined characteristics such as age, gender, BMI, and other measures, and then run on-the-fly association analysis within that sample subset. GAIT now queries datasets both at the DCC and at the federated node, using the same methodology for each, in a way that is transparent to users of the tool. The new federated datasets bring the total number of individual-level samples available for custom analysis in the T2DKP to 67,768.
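
For readers who like to see the idea in code, here is a minimal Python sketch of the pattern described above: each node applies the same sample filter to its own individual-level data and hands back only aggregate counts, which are then combined. This is an illustration under stated assumptions, not GAIT's implementation. The names `Sample`, `node_counts`, and `pooled_odds_ratio` are hypothetical, the sample values are made up, and a simple allelic odds ratio stands in for the association test, which this post does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

# (case_alt, case_ref, control_alt, control_ref) allele counts
Counts = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Sample:
    """Hypothetical individual-level record held privately by one node."""
    age: float
    bmi: float
    is_case: bool
    alt_alleles: int  # copies of the tested allele: 0, 1, or 2


def node_counts(samples: Iterable[Sample], keep: Callable[[Sample], bool]) -> Counts:
    """Run at a single node: filter samples, then return aggregate counts only."""
    case_alt = case_ref = ctrl_alt = ctrl_ref = 0
    for s in samples:
        if not keep(s):
            continue
        if s.is_case:
            case_alt += s.alt_alleles
            case_ref += 2 - s.alt_alleles
        else:
            ctrl_alt += s.alt_alleles
            ctrl_ref += 2 - s.alt_alleles
    return (case_alt, case_ref, ctrl_alt, ctrl_ref)


def pooled_odds_ratio(per_node: Iterable[Counts]) -> float:
    """Sum the per-node counts and compute an allelic odds ratio.

    Pooling raw counts across nodes is an assumption made for illustration.
    """
    case_alt, case_ref, ctrl_alt, ctrl_ref = (sum(col) for col in zip(*per_node))
    if case_ref == 0 or ctrl_alt == 0:
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def high_bmi(s: Sample) -> bool:
    """Example filter: keep samples with BMI of at least 30."""
    return s.bmi >= 30


# Made-up data standing in for samples held at the DCC and at the federated node.
dcc_samples = [
    Sample(45, 31.0, True, 2),
    Sample(50, 33.0, True, 1),
    Sample(48, 32.0, False, 0),
    Sample(30, 22.0, True, 2),
]
ebi_samples = [
    Sample(55, 35.0, False, 1),
    Sample(60, 30.5, True, 1),
    Sample(41, 36.0, False, 0),
]

print(pooled_odds_ratio([
    node_counts(dcc_samples, high_bmi),
    node_counts(ebi_samples, high_bmi),
]))
```

The point of the sketch is the division of labor: the filter and the counting run where the data live, and only summary numbers cross the boundary between nodes.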