Monday, January 22, 2018

GWAS data re-analysis yields novel results about T2D risk

"Waste not, want not." The old proverb is about frugality, but a study published today gives it a whole new dimension. Lead author Sílvia Bonàs, directed by Josep Mercader and David Torrents and collaborating with many colleagues at the Barcelona Supercomputing Center, the Broad Institute, and other institutions (Bonàs-Guarch et al. (2018), Nature Communications 9), decided to investigate variants associated with type 2 diabetes (T2D) by re-analyzing existing GWAS data rather than initiating a new study.

This was a frugal strategy, conserving both time and resources. But the benefits of this approach went way beyond frugality. By aggregating multiple datasets and using unified, current methods for quality control, imputation, and association analysis, the researchers discovered nuggets of significant information that were not apparent in the original analyses of the individual sets. And all of these nuggets are freely available for browsing and searching in the T2D Knowledge Portal (T2DKP).

To amass these data, the researchers combined all of the individual-level T2D case-control GWAS data that were available from the European Genome-Phenome Archive (EGA) and the database of Genotypes and Phenotypes (dbGaP). After harmonization and quality control, data from 70,127 subjects (12,931 cases and 57,196 controls) remained, inspiring them to name the project "70KforT2D".

In the time since the original studies had been performed, better and more comprehensive reference panels for imputation had been generated by the 1000 Genomes and UK10K projects. By using both of these panels for imputation, the researchers were able to substantially increase the number of variants that could be imputed. They ended up with more than 15 million variants, including more than 5 million rare variants and over 1.3 million indels, which have previously been difficult to impute.

In performing association analysis, the authors took advantage of existing large datasets of T2D association summary statistics for meta-analysis, being careful to only combine non-overlapping samples. They also took advantage of the T2D Knowledge Portal to verify some associations for low-frequency variants that were located in coding regions and had suggestive, but not unambiguously significant, p-values. The significance of the T2D associations of these variants was confirmed by meta-analysis along with the associations seen in two large studies in the T2DKP (GoT2D exome chip analysis, with nearly 80,000 samples, and the 17K exome sequence analysis dataset with 17,000 samples).
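To make the idea of combining non-overlapping studies concrete, here is a minimal sketch of an inverse-variance weighted fixed-effect meta-analysis. This is a common way to pool summary statistics, but the post does not say which method the authors used, so treat the choice as an assumption. The function name and the effect sizes below are invented purely for illustration.

```python
import math

def fixed_effect_meta(betas, ses):
    """Pool per-study effect sizes (e.g. log odds ratios) and standard errors.

    Illustrative only: inverse-variance weighting is assumed here and is
    not confirmed as the method used in the study.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect size")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

# Hypothetical numbers for one variant seen in three non-overlapping studies.
beta, se, z, p = fixed_effect_meta([0.30, 0.25, 0.35], [0.10, 0.08, 0.12])
print(f"pooled beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2e}")
```

The pooled standard error is smaller than that of any single study, which is how a suggestive association can become clearly significant once independent datasets are combined.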

The association analysis identified 57 loci associated with T2D risk at genome-wide significance (p-value ≤ 5×10⁻⁸), seven of which had not previously been associated with T2D. The high quality of the data made it possible to fine-map the variants at each of these loci and construct credible sets. Many of the putative causal variants, including those in previously identified loci, were indels rather than single-nucleotide polymorphisms, underscoring the importance of an imputation procedure that discovers indels.
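For readers unfamiliar with credible sets, the sketch below shows the general idea: rank the variants at a locus by their posterior probability of being causal, then keep the smallest group whose probabilities add up to a chosen coverage level. The variant names, the probabilities, and the 95% default are all hypothetical, and how such probabilities were actually computed in the study is not described in this post.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest set of variants reaching the requested coverage.

    `posteriors` maps a variant ID to its posterior probability of being
    causal at one locus. Values are renormalized so they sum to 1.
    Illustrative sketch only, not the authors' fine-mapping pipeline.
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    # Most probable variants first.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob / total
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical locus with four candidate variants.
locus = {"variant_A": 0.60, "variant_B": 0.30, "variant_C": 0.08, "variant_D": 0.02}
print(credible_set(locus))
```

A small credible set means the data point to just a few candidate variants, which makes follow-up experiments easier to design.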

The T2D-associated loci discovered in this study give some tantalizing hints about genes potentially involved in T2D, and suggest new avenues for detailed wet-lab investigation. We can’t review all of them in this space, but one association is particularly interesting for the generalizable lessons it teaches us about case-control GWAS for T2D.

This association, which the authors validated and replicated using additional datasets, involves the X chromosome variant rs146662075. The risk allele confers a twofold elevated risk of developing T2D in males. The variant appears to affect an enhancer that could regulate expression of AGTR2, a gene known to be involved in modulating insulin sensitivity, making it a very interesting subject for investigation with regard to T2D. More work is needed to figure out whether this is really a male-specific effect, or whether it was only detectable in males because imputation for the X chromosome is more accurate in males, who have only one copy of the chromosome.

The first lesson learned from this association is that the X chromosome harbors important loci, and deserves attention in association studies. While this seems obvious, since the X chromosome comprises 5% of the genome, it has been neglected in most studies to date.

The second lesson is that for an adult-onset disease like T2D, it’s very important to pay attention to the details of case-control classification. If there are young people in the control group, they may actually be future T2D cases, destined to develop the disease later in life. When the authors tried to replicate the initial discovery for this variant in different datasets, the associations were not as significant as expected. But after digging deeper into the experimental cohorts, they found that most of the replication datasets had many subjects younger than 55, which was the average age for T2D onset for these cohorts. Re-running the analysis after excluding controls younger than 55 and also excluding those who appeared to be pre-diabetic, based on an oral glucose tolerance test, brought the replication results into concordance with the discovery results and confirmed the significance of the rs146662075 association.
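The control-refinement step described above can be pictured as a simple filter. The sketch below is only an illustration: the `Subject` record, its field names, and the `prediabetic` flag (standing in for a classification based on an oral glucose tolerance test) are hypothetical and do not reflect the actual structure of the replication datasets.

```python
from dataclasses import dataclass

@dataclass
class Subject:
    subject_id: str
    is_case: bool
    age: float
    prediabetic: bool  # hypothetical flag, e.g. derived from an OGTT result

def refine_controls(subjects, min_control_age=55):
    """Keep all cases; keep controls only if old enough and not pre-diabetic."""
    kept = []
    for s in subjects:
        if s.is_case:
            kept.append(s)
        elif s.age >= min_control_age and not s.prediabetic:
            kept.append(s)
    return kept

# Hypothetical mini-cohort.
cohort = [
    Subject("s1", is_case=True, age=48, prediabetic=False),
    Subject("s2", is_case=False, age=42, prediabetic=False),  # too young
    Subject("s3", is_case=False, age=63, prediabetic=True),   # pre-diabetic
    Subject("s4", is_case=False, age=67, prediabetic=False),
]
print([s.subject_id for s in refine_controls(cohort)])
```

The trade-off is a smaller control group in exchange for controls who are much less likely to be undiagnosed or future cases.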

In keeping with the spirit of open access, the authors provided the summary statistics from this work to the T2DKP even before publication. These results are incorporated into the T2DKP and are visible on Gene and Variant pages as well as searchable via the Variant Finder. The authors have also made the full summary statistics available for public download.

The novel and important findings from this study strongly reaffirm the value of data sharing. Not only are data sharing and re-analysis the right things to do for reasons of fairness, equity, and frugality; they can also spark new insights and move science forward in unexpected ways.

Friday, January 19, 2018

New METSIM dataset adds individual-level GWAS data to the T2DKP

The Finnish population is a valuable genetic resource. Having undergone multiple population bottlenecks, this relatively homogeneous population is enriched in low-frequency and loss-of-function variants. Even better, Finns are generally willing to participate in research studies, and many measures of their health are detailed in comprehensive electronic health records.

To take advantage of these characteristics, the METSIM (Metabolic Syndrome in Men) study (Laakso et al. 2017, J. Lipid Res. 58, 481-493) was initiated in 2005. Over 10,000 Finnish men were examined between 2005 and 2010. All of the subjects were phenotyped extensively, with an emphasis on traits associated with type 2 diabetes (T2D), cardiovascular disease, and insulin resistance, and their genotypes and exome sequences were determined. Subsets of the group have been characterized in more detail, with whole-genome sequencing and detailed analyses of transcripts and gene expression, DNA methylation, gut microbiome composition, and other phenotypes.

Now, you can easily access results from the METSIM cohort in the T2D Knowledge Portal. Variant associations with T2D, fasting glucose levels, and fasting insulin levels are available, both unadjusted and adjusted for body mass index. The individual-level data are also available for interactive analyses using our Genetic Association Interactive Tool (GAIT; see below), which allows you to design and run custom association analyses using custom subsets of the samples, while always protecting patient privacy. The addition of METSIM data brings to nearly 68,000 the number of samples available for analysis in GAIT.

The Foundation for the NIH and the Accelerating Medicines Partnership in Type 2 Diabetes were instrumental in bringing these data, generated by researchers in Finland and the U.S., to the T2DKP. Individual-level genotype data from 1,185 T2D cases and 7,357 controls were deposited into the Data Coordinating Center (AMP T2D DCC), and analysis and quality control were performed by the DCC analysis team. The experiment design and analysis are summarized on our Data page, and detailed reports that fully document the analysis are available for download.

The METSIM GWAS dataset currently has "Early Access Phase 1" status in the T2DKP, which is assigned to new data. This status denotes that although analysis and quality control checks have been performed, the data are not yet considered to be in their final state. During the early access period, users may analyze the data but may not submit the results of these analyses for publication. Find full details about the different phases of data release on our Policies page.

Results from METSIM GWAS may be viewed at these locations in the T2D Knowledge Portal:

• On Gene Pages (e.g., MTNR1B) in the Common variants and High-impact variants tables and in LocusZoom static plots, for the phenotypes T2D, T2D adjusted for BMI, fasting glucose, fasting glucose adjusted for BMI, fasting insulin, and fasting insulin adjusted for BMI;

• On Variant Pages (e.g., rs579060) in the Associations at a glance section, the Association statistics across traits table, and in LocusZoom static plots;

• From the View full genetic association results for a phenotype search on the home page: first select one of the phenotypes listed above, and then on the resulting page, select the METSIM GWAS dataset.

Individual-level METSIM GWAS data may be used for custom interactive analyses using these tools in the T2DKP:

• Using the Variant Finder tool, you may specify multiple criteria and retrieve the set of variants meeting those criteria;

• Using the Genetic Association Interactive Tool (GAIT) on Variant Pages, you may select the METSIM GWAS dataset, choose one of 5 phenotypes for association analysis, choose custom covariates, and filter the sample pool by specifying a range of values for one or more of 8 different phenotypes, then run on-the-fly analysis.

[Figure: Phenotypes available for association analysis of METSIM GWAS data in GAIT]

[Figure: Covariates available for selection when analyzing METSIM GWAS data in GAIT]

[Figure: Samples may be filtered by setting ranges for one or more of 8 phenotypes for the METSIM GWAS dataset]

Wednesday, January 3, 2018

Complete data description now available for T2DKP WES and WGS datasets

A new Data Descriptor publication from Jason Flannick, Christian Fuchsberger, Anubha Mahajan, and colleagues (Scientific Data 4, Article number: 170179 (2017) doi:10.1038/sdata.2017.179) describes in full detail four large, important datasets that are included in the Type 2 Diabetes Knowledge Portal. These datasets are the product of the GoT2D and T2D-GENES consortia, large international groups that seek to uncover the genetic basis of type 2 diabetes.

The investigators took a variety of approaches to generate the most complete view of the genetic architecture of T2D available to date. They performed whole-exome sequencing on a group of 12,940 individuals of multiple ancestries (6,504 T2D cases and 6,436 controls) and whole-genome sequencing on 2,657 individuals of European descent, and tested the association of variants with T2D. They also used an exome chip to test coding variants in more than 80,000 people, and used imputation to test non-coding variants in an additional 44,000.

In total, the researchers sampled more than 120,000 genomes and identified more than 27 million single nucleotide polymorphisms, indels, and structural variants, testing their association with T2D. The new publication documents the experimental and analytical methods and results in complete detail. Analysis and interpretation of these data were also discussed in a previous publication (Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton et al., 2016).

This comprehensive catalog of T2D associations is available for you to search and explore via the T2D Knowledge Portal. The datasets from this study are named as follows in the T2DKP:

  • GoT2D WGS (whole-genome sequence data)
  • GoT2D WGS + replication (whole-genome sequence data plus imputed genotypes)
  • 13K exome sequence analysis
  • GoT2D exome chip analysis

All of these sets are described in more detail on our Data page, including lists of the cohorts studied and case/control selection criteria for each. Our Variant Finder tool searches all of these sets, and results from these datasets are displayed in various tables and interfaces on the Gene and Variant pages of the T2DKP.

The individual-level data in the 13K exome sequence set are also available for custom analysis via the Genetic Association Interactive Tool (GAIT) on Variant pages and the custom burden test on Gene pages. These tools allow researchers to interact with the individual-level data while protecting patient privacy. Both tools access the 19K exome sequence analysis dataset, which includes the 13K exome sequence data from this study along with 6,000 additional exome sequences from the SIGMA and LuCamp consortia. Both also allow you to filter samples by multiple criteria (for example, the age, BMI, and cholesterol levels of the subjects) and to choose covariates before running on-the-fly association analysis. The custom burden test also offers the ability to select the set of variants to consider in the analysis.

Please explore these datasets and, as always, let us know what you think!