Showing posts with label federation. Show all posts

Tuesday, February 6, 2018

Federation brings three new datasets to the T2DKP

Our mission at the Type 2 Diabetes Knowledge Portal (T2DKP) is to aggregate and analyze genetic association data relevant to T2D, and to make the knowledge that can be gleaned from these data available to researchers around the world. But it isn't possible to aggregate all of the relevant data in one place: privacy regulations at the institutional, regional, and national levels determine how these data are handled, and whether or where they can be transferred.

The T2DKP is supported by the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D), a pre-competitive partnership among the National Institutes of Health, industry, and not-for-profit organizations, managed by the Foundation for the National Institutes of Health. Because AMP T2D seeks to facilitate discovery of new targets for T2D treatment by making as much data as possible available via the T2DKP, it funded the development of a mechanism for establishing interconnected federated nodes of the T2DKP that would enable researchers to interact with all of the data regardless of where the data are located.

This goal was realized with the creation, by a team led by Thomas Keane and Dylan Spalding, of a federated node of the T2DKP at the European Bioinformatics Institute (EBI). Data housed at the EBI node are stored in such a way that their specific privacy requirements are met, but they are made available for remote queries via T2DKP tools and interfaces. Results from such queries are served up alongside results from all of the datasets housed in the AMP T2D Data Coordinating Center (DCC) at the Broad Institute. Researchers may browse and query data from any location without even needing to know where the data reside. This federation mechanism represents both an important technical advance in handling and protecting data, and a significant step forward in democratizing and improving access to genetic association results.

The first dataset to be incorporated into the Portal via the EBI federated node was the Oxford BioBank exome chip analysis dataset, which contains association data for glycemic, lipid, and blood pressure traits from over 7,100 subjects in Oxfordshire, U.K. The EBI Federated Node has now added three more datasets:

  • The EXTEND GWAS dataset, generated by Drs. Timothy Frayling and Andrew Wood and their colleagues, comprises 7,159 samples (1,395 T2D cases and 5,764 controls) from the Exeter EXTEND Biobank. It includes associations for a wealth of glycemic, anthropometric, cardiovascular, renal, and hepatic phenotypes, including many that are new to the T2DKP.
  • The GoDARTS Affymetrix GWAS dataset, from Dr. Colin Palmer and colleagues, includes summary-level statistics for associations with BMI and blood lipid levels from 3,307 diabetic participants in the Genetics of Diabetes Audit and Research Study in Tayside, Scotland. In addition, individual-level data from over 17,000 subjects (including the set from which summary statistics were calculated) are available via the GAIT tool (see below).
  • The Oxford BioBank Axiom GWAS dataset, from Dr. Fredrik Karpe and colleagues, includes associations for BMI and blood lipid levels from 7,193 participants, all healthy men and women between 30 and 50 years of age. It represents an additional analysis of the same samples contained in the Oxford BioBank exome chip analysis dataset.

These datasets are described in detail on our Data page. Summary results from all three sets are integrated into Gene and Variant pages in the T2DKP, and may also be viewed in the Manhattan plots accessible by searching for a phenotype from the T2DKP home page. The Variant Finder also queries these datasets.

The individual-level data behind all three of these datasets are accessible for custom association analysis in our Genetic Association Interactive Tool (GAIT) on Variant pages. Using this tool, researchers can filter samples to create a custom subset with defined characteristics such as age, gender, BMI, and other measures, and then run on-the-fly association analysis within that sample subset. Now, GAIT queries datasets both at the DCC and at the federated node, using the same methodology for each, in a way that is transparent to users of the tool. The new federated datasets bring the total number of individual-level samples available for custom analysis in the T2DKP to 67,768.
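To make the filter-then-analyze workflow concrete, here is a minimal Python sketch. It is an illustration only: the function names, the sample fields, and the toy records are all hypothetical, and the statistic shown (a carrier odds ratio with a 0.5 continuity correction) is a simple stand-in, not the association methodology that GAIT actually uses.

```python
def filter_samples(samples, min_age=None, max_bmi=None):
    """Keep the samples that satisfy every criterion that was supplied."""
    kept = []
    for s in samples:
        if min_age is not None and s["age"] < min_age:
            continue
        if max_bmi is not None and s["bmi"] > max_bmi:
            continue
        kept.append(s)
    return kept


def carrier_odds_ratio(samples):
    """Odds of being a case for variant carriers relative to non-carriers."""
    # Start every cell of the 2x2 table at 0.5 (Haldane-Anscombe
    # correction) so that an empty cell cannot cause division by zero.
    counts = {(carrier, case): 0.5
              for carrier in (True, False)
              for case in (True, False)}
    for s in samples:
        counts[(s["carrier"], s["case"])] += 1
    return (counts[(True, True)] * counts[(False, False)]) / (
        counts[(True, False)] * counts[(False, True)])


# Made-up records for illustration; these are not real study data.
samples = [
    {"age": 55, "bmi": 31, "carrier": True, "case": True},
    {"age": 60, "bmi": 28, "carrier": True, "case": True},
    {"age": 62, "bmi": 27, "carrier": False, "case": False},
    {"age": 58, "bmi": 29, "carrier": False, "case": False},
    {"age": 35, "bmi": 24, "carrier": True, "case": False},
    {"age": 70, "bmi": 40, "carrier": False, "case": True},
]

subset = filter_samples(samples, min_age=50, max_bmi=35)
odds_ratio = carrier_odds_ratio(subset)
print(len(subset), odds_ratio)
```

In a federated setting, one plausible arrangement is for each site to apply the same filter and analysis to the samples it houses, so that individual-level records never leave that site. How the Portal actually combines results across sites is not described here.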

Friday, June 9, 2017

Providing data access, ensuring data protection

Readers of this post probably don’t need to be convinced that genetic association data have enormous potential for helping us to understand and treat complex diseases like type 2 diabetes. Significant associations between variants and diseases can suggest genes, or regions of the genome, that could be important for disease risk or progression—and this knowledge could help us identify new drug targets.

The Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D) is a pre-competitive partnership among the National Institutes of Health, industry and not-for-profit organizations, which is managed by the Foundation for the National Institutes of Health. Its mission is to make genetic association data accessible to the worldwide biomedical research community via the Type 2 Diabetes Knowledge Portal, in order to facilitate discovery of new targets for T2D treatment. But it can be a challenge to aggregate genetic data. The privacy of the individuals who contributed their health status and genomic sequences must always be protected, and there are many layers of regulation to ensure this. Restrictions at the institutional, regional, and national levels determine how data are handled and whether they can be transferred.

Until now, all of the results displayed in the Portal have been derived from data housed at the AMP T2D Data Coordinating Center (DCC) at the Broad Institute, where the Portal website resides. But some of the valuable data generated outside the U.S. cannot be transferred to the DCC. To address this issue, AMP T2D funded the development of a mechanism that enables researchers to interact with all of the data: federation. 

Federation means that data are housed at a site (a “federated node”) that meets their specific privacy requirements, but are made available for remote queries via the Portal. Results from such queries are served up alongside results from all of the datasets housed in the AMP T2D DCC. Researchers may browse and query data from any location without even needing to know where the data reside.
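As a rough illustration of this pattern, the Python sketch below sends one query to two sites and merges the answers into a single list. Everything in it is an assumption made for the example: `Node`, `federated_query`, the record fields, and the variant identifiers are hypothetical and do not reflect the Portal's actual API, and the in-memory lists stand in for real, access-controlled services.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical data site holding summary-level association records."""
    name: str
    records: list = field(default_factory=list)

    def query(self, variant_id):
        # Only matching summary-level rows are returned to the caller;
        # the underlying data stay at the site that houses them.
        return [r for r in self.records if r["variant"] == variant_id]


def federated_query(nodes, variant_id):
    """Send the same query to every node and merge the answers."""
    merged = []
    for node in nodes:
        merged.extend(node.query(variant_id))
    # Sort by p-value so the caller sees one ranked list and does not
    # need to know which site produced each row.
    return sorted(merged, key=lambda r: r["p_value"])


# Made-up records for illustration; these are not real results.
dcc = Node("DCC", [
    {"variant": "rs0000001", "dataset": "A", "p_value": 0.03},
])
ebi = Node("EBI", [
    {"variant": "rs0000001", "dataset": "B", "p_value": 0.001},
    {"variant": "rs0000002", "dataset": "B", "p_value": 0.5},
])

results = federated_query([dcc, ebi], "rs0000001")
for row in results:
    print(row["dataset"], row["p_value"])
```

The design point is that the caller works with one merged result set, while each site keeps control over what it returns.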

A federated node has now been created at the European Bioinformatics Institute (EBI) and may be accessed via the T2D Knowledge Portal. Today, Portal tools and interfaces can query both data housed at the AMP T2D DCC at the Broad Institute and data at the EBI federated node. 

According to Paul Flicek, a Senior Scientist and Team Leader of Vertebrate Genomics at EMBL-EBI, “A key mission of EMBL-EBI is to make data available to the widest possible community. Seamlessly accessing [data] stored in multiple locations via a single portal helps ensure that the data we store from many projects are maximally useful for additional research.”

The first dataset to be incorporated into the Portal via the EBI federated node is the Oxford BioBank exome chip analysis dataset, which contains association data for glycemic, lipid, and blood pressure traits from over 7,100 healthy subjects in Oxfordshire, U.K. The dataset is described on our Data page. Portal users can interact with this dataset in the same way (and with the same speed) as with other datasets. 

“Diabetes is a global problem, and it will take research and innovation on a global scale if we are to tackle it effectively,” says Mark McCarthy, Robert Turner Professor of Diabetic Medicine at the University of Oxford. “The success of our research on the genetics of diabetes depends on access to data generated by groups around the world. The federated portal provides an additional set of tools that will allow us to jointly analyse those data sets wherever they happen to be based.”

Federation represents both an important technical advance in handling and protecting data, and a significant step forward in democratizing and improving access to genetic association results. And because it is generally applicable to any kind of genetic association data, it has the potential to have an impact beyond T2D research, facilitating the study of other complex diseases and traits.