Friday, January 19, 2018

New METSIM dataset adds individual-level GWAS data to the T2DKP

The Finnish population is a valuable genetic resource. Having undergone multiple population bottlenecks, this relatively homogeneous population is enriched in low-frequency and loss-of-function variants. Even better, Finns are generally willing to participate in research studies, and many measures of their health are detailed in comprehensive electronic health records.

To take advantage of these characteristics, the METSIM (Metabolic Syndrome in Men) study (Laakso et al. 2017, J. Lipid Res. 58, 481-493) was initiated in 2005. Over 10,000 Finnish men were examined between 2005 and 2010. All of the subjects were phenotyped extensively, with an emphasis on traits associated with type 2 diabetes (T2D), cardiovascular disease, and insulin resistance, and their genotypes and exome sequences were determined. Subsets of the group have been characterized in more detail, with whole-genome sequencing and detailed analyses of transcripts and gene expression, DNA methylation, gut microbiome composition, and other phenotypes.

Now, you can easily access results from the METSIM cohort in the T2D Knowledge Portal. Variant associations with T2D, fasting glucose levels, and fasting insulin levels are available, both unadjusted and adjusted for body mass index. The individual-level data are also available for interactive analyses using our Genetic Association Interactive Tool (GAIT; see below), which allows you to design and run custom association analyses using custom subsets of the samples, while always protecting patient privacy. The addition of METSIM data brings to nearly 68,000 the number of samples available for analysis in GAIT.

The Foundation for the NIH and the Accelerating Medicines Partnership in Type 2 Diabetes were instrumental in bringing these data, generated by researchers in Finland and the U.S., to the T2DKP. Individual-level genotype data from 1,185 T2D cases and 7,357 controls were deposited into the Data Coordinating Center (AMP T2D DCC), and analysis and quality control were performed by the DCC analysis team. The experiment design and analysis are summarized on our Data page, and detailed reports that fully document the analysis are available for download.

The METSIM GWAS dataset currently has "Early Access Phase 1" status in the T2DKP, which is assigned to new data. This status denotes that although analysis and quality control checks have been performed, the data are not yet considered to be in their final state. During the early access period, users may analyze the data but may not submit the results of these analyses for publication. Find full details about the different phases of data release on our Policies page.

Results from METSIM GWAS may be viewed at these locations in the T2D Knowledge Portal:

• On Gene Pages (e.g., MTNR1B) in the Common variants and High-impact variants tables and in LocusZoom static plots, for the phenotypes T2D, T2D adjusted for BMI, fasting glucose, fasting glucose adjusted for BMI, fasting insulin, and fasting insulin adjusted for BMI;

• On Variant Pages (e.g., rs579060) in the Associations at a glance section, in the Association statistics across traits table, and in LocusZoom static plots;

• From the View full genetic association results for a phenotype search on the home page: first select one of the phenotypes listed above, and then on the resulting page, select the METSIM GWAS dataset.

Individual-level METSIM GWAS data may be used for custom interactive analyses using these tools in the T2DKP:

• Using the Variant Finder tool, you may specify multiple criteria and retrieve the set of variants meeting those criteria;

• Using the Genetic Association Interactive Tool (GAIT) on Variant Pages, you may select the METSIM GWAS dataset, choose one of 5 phenotypes for association analysis, choose custom covariates, filter the sample pool by specifying a range of values for one or more of 8 different phenotypes, and then run an on-the-fly analysis.

Phenotypes available for association analysis of METSIM GWAS data in GAIT

Covariates available for selection when analyzing METSIM GWAS data in GAIT

Samples may be filtered by setting ranges for one or more of 8 phenotypes for the METSIM GWAS dataset
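
For readers who want a feel for what "filter the sample pool, then run an association analysis" means in practice, here is a minimal Python sketch. It is only an illustration: the `Sample` record, the field names, and the toy values are all hypothetical, and the simple allelic odds ratio stands in for whatever statistical model GAIT actually runs. Covariate adjustment is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record layout, invented for this sketch only.
    sample_id: str
    has_t2d: bool      # case/control status
    bmi: float
    age: float
    alt_alleles: int   # copies of the tested allele: 0, 1, or 2


def filter_samples(samples, ranges):
    """Keep samples whose phenotypes fall inside every (low, high) range."""
    return [
        s for s in samples
        if all(low <= getattr(s, name) <= high
               for name, (low, high) in ranges.items())
    ]


def allelic_odds_ratio(samples):
    """Odds of the tested allele in cases divided by the odds in controls.

    Returns None when the ratio is undefined (a zero in the denominator).
    """
    case_alt = sum(s.alt_alleles for s in samples if s.has_t2d)
    case_ref = sum(2 - s.alt_alleles for s in samples if s.has_t2d)
    ctrl_alt = sum(s.alt_alleles for s in samples if not s.has_t2d)
    ctrl_ref = sum(2 - s.alt_alleles for s in samples if not s.has_t2d)
    if case_ref * ctrl_alt == 0:
        return None
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


# Toy data, not real METSIM values.
samples = [
    Sample("s1", True, 31.0, 60, 2),
    Sample("s2", True, 29.0, 55, 1),
    Sample("s3", False, 27.0, 58, 0),
    Sample("s4", False, 33.0, 62, 1),
    Sample("s5", True, 41.0, 50, 2),  # falls outside the BMI range below
]

# Restrict the sample pool to a BMI range, then test the variant.
subset = filter_samples(samples, {"bmi": (25.0, 35.0)})
odds_ratio = allelic_odds_ratio(subset)
print(len(subset), odds_ratio)
```

Changing the ranges passed to `filter_samples` changes which samples enter the test, which is the essence of a custom subset analysis.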

Wednesday, January 3, 2018

Complete data description now available for T2DKP WES and WGS datasets

A new Data Descriptor publication from Jason Flannick, Christian Fuchsberger, Anubha Mahajan, and colleagues (Scientific Data 4, Article number: 170179 (2017) doi:10.1038/sdata.2017.179) describes in detail four large, important datasets that are included in the Type 2 Diabetes Knowledge Portal. These datasets are the product of the GoT2D and T2D-GENES consortia, large international groups that seek to uncover the genetic basis of type 2 diabetes.

The investigators took a variety of approaches to generate the most complete view of the genetic architecture of T2D available to date. They performed whole-exome sequencing on a group of 12,940 individuals of multiple ancestries (6,504 T2D cases and 6,436 controls) and whole-genome sequencing on 2,657 individuals of European descent, and tested the association of variants with T2D. They also used an exome chip to test coding variants in more than 80,000 people, and used imputation to test non-coding variants in an additional 44,000.

In total, the researchers sampled more than 120,000 genomes and identified more than 27 million single nucleotide polymorphisms, indels, and structural variants, testing their association with T2D. The new publication documents the experimental and analytical methods and results in complete detail. Analysis and interpretation of these data were also discussed in a previous publication (Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton et al., 2016).

This comprehensive catalog of T2D associations is available for you to search and explore via the T2D Knowledge Portal. The datasets from this study are named as follows in the T2DKP:

  • GoT2D WGS (whole-genome sequence data)
  • GoT2D WGS + replication (whole-genome sequence data plus imputed genotypes)
  • 13K exome sequence analysis
  • GoT2D exome chip analysis

All of these sets are described in more detail on our Data page, including lists of the cohorts studied and case/control selection criteria for each. Our Variant Finder tool searches all of these sets, and results from these datasets are displayed in various tables and interfaces on the Gene and Variant pages of the T2DKP.

The individual-level data in the 13K exome sequence set are also available for custom analysis via the Genetic Association Interactive Tool (GAIT) on Variant pages and the custom burden test on Gene pages. These tools allow researchers to interact with the individual-level data while protecting patient privacy. Both tools draw on the 19K exome sequence analysis dataset, which includes the 13K exome sequence data from this study along with 6,000 additional exome sequences from the SIGMA and LuCamp consortia. Both also allow you to filter samples by multiple criteria (for example, the age, BMI, or cholesterol levels of the subjects) and to choose covariates before running on-the-fly association analysis. The custom burden test additionally lets you select the set of variants to consider in the analysis.
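
To make the idea of a burden test concrete, here is a small Python sketch of one common simplified form: collapse a chosen set of variants into a single carrier/non-carrier flag per sample, then compare carrier counts between cases and controls with Fisher's exact test. This is an assumption-laden illustration, not the Portal's implementation. The function names, the data layout, and the toy genotypes are invented, and covariates are ignored.

```python
from math import comb


def collapse_carriers(genotypes, variant_ids):
    """Flag each sample that carries at least one allele at any selected variant."""
    return {
        sample_id: any(calls.get(v, 0) > 0 for v in variant_ids)
        for sample_id, calls in genotypes.items()
    }


def burden_table(carriers, is_case):
    """Return (case carriers, case non-carriers, control carriers, control non-carriers)."""
    a = sum(1 for s, c in carriers.items() if c and is_case[s])
    b = sum(1 for s, c in carriers.items() if not c and is_case[s])
    c_ = sum(1 for s, c in carriers.items() if c and not is_case[s])
    d = sum(1 for s, c in carriers.items() if not c and not is_case[s])
    return a, b, c_, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Toy genotypes: sample -> {variant id: allele count}. Not real data.
genotypes = {
    "c1": {"v1": 1}, "c2": {"v2": 1}, "c3": {"v3": 1}, "c4": {},
    "k1": {}, "k2": {}, "k3": {"v3": 1}, "k4": {},
}
is_case = {s: s.startswith("c") for s in genotypes}

# Choosing which variants to include changes who counts as a carrier.
carriers = collapse_carriers(genotypes, ["v1", "v2"])
table = burden_table(carriers, is_case)
p_value = fisher_exact_two_sided(*table)
print(table, p_value)
```

Swapping `["v1", "v2"]` for a different variant list shows why the choice of variant set matters: the same samples can yield a different carrier table and a different result.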

Please explore these datasets and, as always, let us know what you think!

Wednesday, November 15, 2017

T2DKP Fall Newsletter

The latest issue of our quarterly newsletter is now available. Download it here to find out what we've been up to!

Tuesday, November 14, 2017

Announcing the Cardiovascular Disease Knowledge Portal

We are pleased to announce the launch of the Cardiovascular Disease Knowledge Portal (CVDKP). Our collaboration with Dr. Patrick Ellinor, Dr. Sek Kathiresan, and their colleagues in the Atrial Fibrillation, Global Lipids Genetics, Myocardial Infarction Genetics, and CARDIoGRAMPlusC4D consortia has created a resource that offers world-wide open access to genetic and genomic information about atrial fibrillation, myocardial infarction, and related traits, with the goal of democratizing access to genomic data and accelerating cardiovascular genomics research.

CVDKP home page

The CVDKP is constructed on a software architecture originally developed for the Type 2 Diabetes Knowledge Portal (T2DKP), which is the central product of the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D). AMP T2D is a public-private partnership between the National Institutes of Health, the U.S. Food and Drug Administration, biopharmaceutical companies, and non-profit organizations that is managed through the Foundation for the NIH. AMP seeks to harness collective capabilities, scale, and resources toward improving current efforts to develop new therapies for complex, heterogeneous diseases.

The ultimate goal of AMP T2D is to increase the number of new diagnostics and therapies for patients while reducing the time and cost of developing them, by jointly identifying and validating promising biological targets for type 2 diabetes. The T2DKP furthers that goal by aggregating, harmonizing, and displaying genetic association and epigenomic results along with user-friendly analysis tools, allowing research biologists who would not individually be able to amass and manipulate these large datasets to glean insights from the data.

We are working towards these same goals for other complex diseases, by extending the platform and analysis tools constructed for the T2DKP. In partnership with the International Stroke Genetics Consortium, we recently created a Knowledge Portal for cerebrovascular disease (CDKP) based on the same infrastructure. Now, with the advent of the Cardiovascular Disease Knowledge Portal, we have a three-member Knowledge Portal Network for the genetics of cardiometabolic and cerebrovascular disease.

Data in the CVDKP directly relevant to heart disease include genetic associations with atrial fibrillation, electrocardiogram traits, plasma lipid levels, and myocardial infarction. Additional association datasets are available for type 2 diabetes and glycemic traits, anthropometric traits, measures of kidney function, and psychiatric traits. You may browse the complete list of datasets and their descriptions on the CVDKP Data page.

As with the Cerebrovascular Disease Knowledge Portal, we continue to work with the American Heart Association Precision Medicine Platform (PMP) to provide an additional avenue for accessing the cardiovascular genetic data in the CVDKP. Currently, summary statistics from the AFGen GWAS and AFGen exome chip analysis datasets are deposited in the PMP.

We welcome all suggestions, comments, questions, and submissions of relevant datasets for the CVDKP. Please contact us!

Wednesday, October 25, 2017

New phenotypes and physical activity stratification available in the T2DKP

We’ve recently updated one dataset and added another in the Type 2 Diabetes Knowledge Portal. Associations with multiple new phenotypes are now available for the BioMe AMP T2D GWAS dataset, and the new dataset "GIANT GWAS - stratified by physical activity" adds associations with anthropometric traits for cohorts stratified by gender and physical activity levels.

The BioMe AMP T2D GWAS dataset was first added to the T2DKP in early 2017, initially with three phenotypes (T2D, fasting glucose levels, and HbA1c levels). Deposition and analysis of these data were funded by the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D), a collaboration between multiple stakeholders that aims to catalyze the clinical translation of genetic discoveries by producing and aggregating data, developing and implementing novel analytical methods and tools, and building infrastructure for data storage and presentation. This dataset was the first to be entirely produced within the AMP T2D project, including the deposition, analysis, quality control, and presentation of the data.

The data were generated at the Charles Bronfman Institute for Personalized Medicine BioMe BioBank, a biorepository located at the Mount Sinai Medical Center (MSMC) in the upper Manhattan area of New York City. MSMC serves a diverse population of over 800,000 outpatients each year. Importantly, since many BioMe participants are African American or Hispanic/Latino, this dataset adds significant ethnic diversity to the Portal’s genetic association data.

The data were subjected to quality control and association analysis by the Analysis Team at the AMP Data Coordinating Center (DCC) at the Broad Institute. In this second phase of analysis, associations with seven traits were calculated: systolic and diastolic blood pressure; HDL and LDL cholesterol levels; creatinine levels and eGFR-creat; and BMI. A detailed analysis report for these associations may be downloaded from the BioMe AMP T2D GWAS section of our Data page.

The new GIANT dataset was generated by the GIANT (Genetic Investigation of Anthropometric Traits) consortium via a meta-analysis of genetic associations for BMI, waist-hip ratio, and waist circumference from more than 200,000 adults. Samples are stratified by sex, ancestry, and physical activity level (active or inactive). This work was published in a recent paper by Graff et al.

Data from both the BioMe and GIANT studies are available at these locations in the Portal:
  • On Gene pages (see an example) in the Common variants and High-impact variants tables and in LocusZoom static plots
  • On Variant pages (see an example) in the Associations at a glance section, in the Association statistics across traits table, and in LocusZoom static plots
  • Via the Variant Finder tool
  • "Manhattan plots" of associations across the genome may be seen by selecting one of the phenotypes analyzed in these datasets in the View full genetic association results for a phenotype scroll box on the Portal home page
  • Additionally, the BioMe data are available for sample filtering and custom association analysis via the Genetic Association Interactive Tool (GAIT) on Variant pages.

Please check out the new data and contact us with any questions, comments, or suggestions.

Monday, October 16, 2017

Learn about complex disease knowledge portals at ASHG 2017

Members of the Knowledge Portal team will be attending the American Society of Human Genetics meeting this week in Orlando, FL.

We'll be talking about the continuing progress of the Type 2 Diabetes Knowledge Portal, which has grown dramatically since ASHG 2016, with loads of new data and many new features. We'll also present our work towards expanding the T2DKP framework to other complex diseases, with the recent release of a new sibling portal for stroke genetics, the Cerebrovascular Disease Knowledge Portal.

You can catch us nearly every day of the meeting:

Wednesday 10/18

10 AM - 5 PM: Find us in the exhibit hall at booth #863. We’ll be there to answer your questions and give tours and tutorials on the Knowledge Portal Network.

10-10:30 AM: Demonstration of the Type 2 Diabetes Knowledge Portal at our booth, #863.
10:30-11 AM: Demonstration of the Cerebrovascular Disease Knowledge Portal at our booth, #863.

2 PM - 4 PM: Ben Alexander will present poster #1186: The Type 2 Diabetes Knowledge Portal: Clearing a path from genetic associations to disease biology.

Thursday 10/19

10 AM - 5 PM: We will again be in the exhibit hall at booth #863.

10:30 - 11:30 AM: Portal team members will be available at the Broad Institute booth (#1037) for demonstrations and tutorials.

2-2:30 PM: Demonstration of the Type 2 Diabetes Knowledge Portal at our booth, #863.
2:30-3 PM: Demonstration of the Cerebrovascular Disease Knowledge Portal at our booth, #863.

4:15 PM–6:15 PM: Portal team members will be participating in Concurrent Invited Session #49:

Data Sharing, Analysis, and Tools to Catalyze Translation from Genomic to Clinical Knowledge
Room 330C, Level 3, Convention Center
Moderators: Benjamin Neale and Noël Burtt
Serving genetic data and tools to the world - Jason Flannick.
The EGA as a platform for effective data sharing of human genetic and phenotype data - Thomas Keane.
Converting sequence data from over 140,000 people into rare disease diagnoses - Daniel MacArthur.
Assessing the phenome-wide consequences of genetically regulated molecular traits - Hae Kyung Im.

Friday 10/20

10 AM - 2:30 PM: This is our last day in the exhibit hall at booth #863.

We look forward to meeting you at ASHG! If you have questions and cannot meet us at any of these times, or if you won’t be at ASHG, our mailbox is always open.

Thursday, October 5, 2017

Portal team presents at Festival of Genomics

The Festival of Genomics was held in Boston on October 3-4, and the Type 2 Diabetes Knowledge Portal was well represented. This meeting brings together multiple stakeholders in genomics and health care, using innovative formats to educate, promote connections, and spark conversations.

On the first day of the meeting, Noël Burtt presented a talk focusing on the T2DKP: "The Type 2 Diabetes Knowledge Portal: accelerating action from data."

Noël Burtt speaking at the Festival of Genomics

The next day, T2DKP team members participated in a session with an unusual format, designed to stimulate interaction and conversation: "In the Loop." Jason Flannick, Senior Group Leader for the Portal project, chaired the session, which was entitled "Complex data, complex solutions: tackling the challenges of genetic data sharing, interpretation, and representation." "Loop leaders," all from the Broad Institute, included:

  • Sean Simmons, speaking on "Technological solutions to privacy concerns around data access and utilization";
  • Moran Cabili, speaking on "Ethical and technical challenges to genetic data access";
  • David Siedzik, speaking on "Analysis tools for genetic data and their scalability"; and
  • Maria Costanzo, speaking on "Representing and democratizing access to genetic association results."

Left to right: Jason Flannick, Sean Simmons, Moran Cabili, David Siedzik, and Maria Costanzo participate in an "In the Loop" session

The session began with an introduction from the chair and speakers, and then the audience broke into four groups for discussion with each of the Loop leaders. Each leader facilitated discussions with two different groups, and then the entire group convened again to sum up the discussions. The group may not have come up with any breakthrough ideas, but participants definitely emerged with a clearer picture of the many challenges around aggregating and representing genetic data.

The T2DKP team is headed next to the American Society of Human Genetics meeting. Stay tuned for a preview of our participation there in the next blog post!