Showing posts with label Variant-trait associations. Show all posts

Monday, October 8, 2018

DIAMANTE GWAS dataset adds close to a million samples along with fine-mapping to the T2DKP

In a groundbreaking paper published today, Anubha Mahajan and colleagues (Mahajan et al., Nature Genetics 2018) report on a meta-analysis of unprecedented size for genetic associations with type 2 diabetes (T2D) along with fine-mapping analyses to identify causal variants that can suggest new therapeutic targets. We are pleased to provide access to the summary results as well as the results of the fine-mapping today in the T2D Knowledge Portal (T2DKP).

Working as part of the DIAGRAM (DIAbetes Genetics Replication And Meta-analysis) and DIAMANTE (DIAbetes Meta-ANalysis of Trans-Ethnic association studies) consortia, the researchers aggregated and meta-analyzed genome-wide association studies for about 900,000 individuals of European ancestry (about 74,000 T2D cases and 824,000 controls). The studies were imputed using the most comprehensive reference panels possible, and in all, the analysis considered about 27 million genotyped or imputed variants.

After performing T2D association analysis (both unadjusted and adjusted for body mass index), the researchers found 243 loci associated with T2D at genome-wide significance (p-value for association ≤ 5 × 10⁻⁸). Of these, 135 were novel, that is, not detected in any previous T2D association analysis.

Within these loci, each of which included multiple significantly associated variants, the researchers performed approximate conditional analysis to determine whether the associations were independent of each other. They found surprising complexity within some loci; for example, the well-known TCF7L2 locus appears to include as many as 8 distinct association signals!
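The idea behind conditioning can be illustrated with a small sketch. Note the assumptions: the study used approximate conditional analysis on summary statistics, which this does not reproduce. Instead, the sketch uses invented individual-level genotypes, a quantitative trait, and ordinary linear regression (via the Frisch-Waugh-Lovell residual trick) to show how a proxy variant's apparent effect shrinks once the lead variant is accounted for. All names and data here are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def slope(x, y):
    """Least-squares slope of y on x (model includes an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def residuals(x, y):
    """What is left of y after regressing it on x."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def conditional_slope(test, cond, y):
    """Effect of `test` on y after conditioning on `cond`."""
    return slope(residuals(cond, test), residuals(cond, y))

# Invented toy data: allele counts (0, 1, 2) for nine samples.
lead = [0, 0, 0, 1, 1, 1, 2, 2, 2]    # variant that drives the trait
proxy = [0, 0, 1, 1, 1, 0, 2, 2, 1]   # correlated with `lead`, no effect of its own
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1]
trait = [2.0 * g + e for g, e in zip(lead, noise)]

marginal = slope(proxy, trait)                        # looks strongly associated
conditional = conditional_slope(proxy, lead, trait)   # close to zero
print(f"marginal={marginal:.3f} conditional={conditional:.3f}")
```

If the proxy's effect had stayed large after conditioning, that would have pointed to a second, independent signal in the region.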

All of the T2D associations from this study may be viewed in the T2DKP. They are represented in two datasets, named "DIAMANTE (European) T2D GWAS" and "UK Biobank T2D GWAS (DIAMANTE-Europeans Sept 2018)."  Manhattan plots showing the distribution of the associations across the genome may be seen by selecting either the "Type 2 diabetes" or "Type 2 diabetes adj BMI" phenotypes from the phenotype selection menu on the T2DKP home page. On Gene pages of the T2DKP, the results may be viewed in tables of variant associations and in the interactive LocusZoom visualization (see below). Results from this study are also displayed on Variant pages of the T2DKP.


LocusZoom plot on the PPARG Gene page


The credible set analysis performed in this study is also incorporated into the T2DKP. On the "Credible sets" tab of Gene pages, you may choose to visualize any of the credible sets available for the region. Epigenomic annotations that overlap the positions of the variants in the credible set are presented in an interactive display that allows you to select particular chromatin states or tissues to view. In the example shown below, one of the credible sets in the TCF7L2 region includes just two variants, and the one with the highest posterior probability overlaps active enhancer regions in adipose and liver tissue--both of which are important for T2D.


Detail of the Credible sets tab of the TCF7L2 Gene page

The multiple causal variants identified in this study support previous investigations on the biological mechanisms behind T2D and suggest new hypotheses that will likely lead to therapeutic insights. After reading the paper and a blog post from the authors, we invite you to explore the results in the T2DKP and to contact us with any suggestions or questions!

Tuesday, May 8, 2018

NIDDK Workshop: Towards a Functional Understanding of the Diabetic Genome 2018

Recently, members of the T2D Knowledge Portal team were fortunate to participate in a fascinating workshop hosted by the NIDDK, Towards a Functional Understanding of the Diabetic Genome. Speakers highlighted the diversity of ongoing research projects that aim to translate disease-associated variants into functional insights in type 2 diabetes.

The workshop featured presentations on multiple data types that can provide clues about the mechanisms by which sequence variants affect T2D risk. Many of these offer insights into transcriptional regulation: epigenomic chromatin modifications; tissue-specific RNA levels; eQTLs; transcription factor binding sites; long-range interactions between chromosomes that bring promoters and enhancers into proximity; and regulatory pathways. Others focus on downstream processes such as protein-protein interactions, biochemical pathways, and metabolomics.

It will be crucial to integrate all of these data types with genetic association data in order to get a complete picture of how particular genomic regions influence T2D biology, and at the T2DKP we are working towards incorporating as many of these data types as possible.

Although the presentations in this workshop were diverse, some common themes were evident. One was that although the insulin-secreting beta cells in pancreatic islets are hugely significant to T2D, and most T2D risk variants influence insulin secretion, current research projects are confirming and underscoring the importance of other tissues. Fat, liver, skeletal muscle (which comprises 40% of human body weight), and brain are all intimately involved in the development of T2D.

Another common theme for ongoing T2D research is that things may often be much more complicated than they first appear. A single genomic region associated with T2D risk may harbor multiple independent causal variants, each potentially having different regulatory effects, possibly affecting different tissues, and causing varied phenotypic consequences. Even if these variants alter a protein-coding sequence, they may not act through their effects on that sequence. These genetically complicated regions, such as those elucidated in FTO or TCF7L2, may be more common than we previously thought.

A third overall conclusion from the workshop is that model organism research can accelerate the investigation of candidate genes. The short life cycles of Drosophila and zebrafish, and the versatile genetic tools available for these systems, allow for rapid and systematic interrogation of gene function. Zebrafish glucose and lipid metabolism have much in common with those processes in human cells, and with their transparent bodies, zebrafish literally give us a window into pancreatic development. In addition to being a well-developed model system, the mouse offers much greater genetic diversity than humans, with about 40 million SNPs in the mouse genome as compared to about 10 million in the human genome.

At the T2DKP, efforts to integrate many of these data types are in progress, and integration of others is being planned. We continue to work towards making the T2DKP a comprehensive resource for the T2D research community, to help accelerate the translation of variant associations into knowledge about disease mechanisms and identification of potential drug targets.



Many of the presentations at the workshop featured web resources of potential interest to T2D researchers, listed below. The T2DKP is connected with the first, the Diabetes Epigenome Atlas. We are interested in providing better connections between the T2DKP and other relevant resources. If you would be particularly interested in seeing links from the T2DKP to one of the resources below, or if you know of a resource that would be informative, we would love to hear your suggestions!

  • HaploReg: explore annotations of the noncoding genome at variants on haplotype blocks
  • ExPecto: tissue-specific gene expression effect predictions for human mutations
  • DeepSEA: predict the cell type-specific epigenetic state of a sequence and the chromatin effects of sequence variants
  • GeNets: unified web platform for network-based analyses of genetic data
  • DCell: a deep neural network simulating cell structure and function

Friday, April 27, 2018

New T2DKP release adds individual-level data for interactive analysis

With the April release of the Type 2 Diabetes Knowledge Portal, we are increasing the number of datasets and samples available for interactive analysis via the LocusZoom and GAIT tools. These tools now access individual-level data from three additional datasets, all of which were quality controlled and analyzed at the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D) Data Coordinating Center (DCC):
  • CAMP GWAS: 3,628 multi-ancestry samples from the MGH Cardiology and Metabolic Patient cohort, generated by a public-private partnership between Pfizer Inc. and Massachusetts General Hospital;
  • METSIM GWAS: 8,791 European ancestry samples from the Metabolic Syndrome in Men study.
These individual-level data are available as "dynamic" datasets, powered by Hail software, in LocusZoom on Gene pages and Variant pages of the T2DKP, for the following phenotypes: 
  • BioMe AMP T2D GWAS: type 2 diabetes, BMI, diastolic blood pressure, fasting glucose, HbA1c, HDL cholesterol, LDL cholesterol, systolic blood pressure
  • CAMP GWAS: type 2 diabetes, BMI, fasting glucose, fasting insulin
  • METSIM GWAS: type 2 diabetes, BMI, diastolic blood pressure, fasting glucose, fasting insulin, HbA1c, HDL cholesterol, LDL cholesterol, systolic blood pressure
To perform interactive analyses on these data in LocusZoom, select one of the available phenotypes in step 1 and then choose a "dynamic" dataset in step 2.


When you click on a variant in the resulting LocusZoom plot, the option to condition on that variant appears in the tooltip:


Clicking on that link starts on-the-fly association analysis for the region while conditioning on that variant, which can reveal whether association signals are independent of each other. You can choose to condition on multiple variants. The variants of your choice are listed in the upper left-hand corner of the plot, and the list may be edited:



Individual-level data from these three datasets are also available for interactive analysis via the Genetic Association Interactive Tool (GAIT) on Variant Pages. After selecting one of the datasets, you will be able to choose a phenotype for association analysis, filter the sample pool by specifying a range of values for one or more phenotypes, choose custom covariates, and then run on-the-fly association analysis for your chosen subset of samples. Find all of the details about how to use this tool in our GAIT guide.

We hope that the increased ability to interact with individual-level data in the T2DKP will be helpful to your research. As always, we are happy to answer any questions about these or other data and tools; please contact us for help.

Monday, April 9, 2018

Those hoofbeats just might come from zebras

Image by Eric Dietrich via Wikimedia Commons
A physician in the 1940s wanted to convey to his students that the most obvious diagnosis is most likely to be the correct one, so he coined a saying that has become famous: “When you hear hoofbeats, think of horses not zebras.” Applying this concept to complex disease genetics, if a risk-associated variant causes a non-synonymous mutation in a coding sequence, the first hypothesis to consider is that it affects disease risk by altering the protein. But although this is often the case, one of the lessons we can learn from a large new study, published today and now available for browsing and searching in the T2D Knowledge Portal, is that we should not forget about zebras.

The new study, from a global coalition of scientists (Mahajan et al., Nature Genetics 2018), is an exome-wide association study that surveyed the T2D associations of variants within the protein-coding regions of the genome. Including more than 81,000 T2D cases, over 370,000 controls, and multiple ancestries, this study has a three-fold larger effective sample size than any previous study. Using p-value < 2.2 × 10⁻⁷ as a threshold for significance across the exome, the authors found 69 significantly associated coding variants representing 40 distinct association signals in 38 loci, 16 of which had not been previously associated with T2D risk.
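To give a feel for what "effective sample size" means in a case-control study, here is a small sketch using one widely used convention, N_eff = 4 / (1/N_cases + 1/N_controls). This is an assumption: the post does not say which definition the authors used, and the inputs below are the rounded lower bounds quoted above, not the exact study counts.

```python
def effective_sample_size(n_cases, n_controls):
    """Size of a balanced case-control study with comparable power.

    Uses the common convention 4 / (1/n_cases + 1/n_controls);
    the paper's own definition may differ.
    """
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)

# Rounded figures from the text ("more than 81,000" and "over 370,000")
n_eff = effective_sample_size(81_000, 370_000)
print(round(n_eff))
```

Because controls greatly outnumber cases, the effective size is well below the raw total of about 451,000 samples; under this convention, adding more controls yields diminishing returns compared with adding cases.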

To get a better idea of which variants in these loci were causal for T2D risk, the researchers performed fine mapping for 37 of the 40 significant signals. They meta-analyzed T2D associations for over 500,000 individuals of European descent, performed imputation, and then generated 99% credible sets for each signal—that is, sets of variants that are 99% likely to include the causal variant. To calculate the credible sets, they used an “annotation-informed prior” model of causality that took into account the distribution of associations for different variant impact classes and also the overlap of variants with putative enhancer elements.
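The credible set construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes each variant comes with a Bayes factor for association and a relative prior weight reflecting its annotation, computes posterior probabilities proportional to prior × Bayes factor, and then adds variants in decreasing order of posterior until the requested coverage is reached. The variant names, priors, and Bayes factors are invented.

```python
def credible_set(variants, coverage=0.99):
    """Return [(variant_id, posterior)] covering at least `coverage`.

    variants: iterable of (variant_id, prior_weight, bayes_factor).
    Prior weights are relative; they need not sum to one.
    """
    weights = {vid: prior * bf for vid, prior, bf in variants}
    total = sum(weights.values())
    posteriors = {vid: w / total for vid, w in weights.items()}

    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for vid, pp in ranked:
        chosen.append((vid, pp))
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical signal: the coding variant gets a larger prior weight,
# but a non-coding variant has much stronger association evidence.
signal = [
    ("coding_1", 0.4, 50.0),
    ("noncoding_1", 0.1, 700.0),
    ("noncoding_2", 0.1, 95.0),
    ("noncoding_3", 0.1, 5.0),
]
cs99 = credible_set(signal, coverage=0.99)
for vid, pp in cs99:
    print(vid, round(pp, 3))
```

In this toy example the non-coding variant ends up with the highest posterior probability even though the coding variant was favored by the prior, which is the kind of outcome the study reports for many loci.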

The 37 association signals for which the authors generated credible sets were all due to coding variants that would cause changes in the sequence of the encoded protein. But surprisingly, the fine mapping analysis found that coding variants were likely to be causal for T2D risk at fewer than half of these loci.

One of these surprising results involves a gene that is well-known to be relevant to T2D: PPARG. Involvement of the PPARG protein in T2D is beyond doubt, since this ligand-inducible transcription factor is the target of thiazolidinedione drugs that are used to treat T2D. A common variant in PPARG, rs1801282, that causes a p.Pro12Ala change in the protein has been assumed to account for the T2D association, but there is little experimental evidence that this change affects PPARG function.

In the credible set generated in this study, the probability that rs1801282 is causal was not found to be particularly high. Included in this credible set along with rs1801282 are 19 non-coding variants. One of these was previously shown to affect a binding site for the transcription factor PRRX1 and to affect expression of PPARG2, a PPARG isoform. This suggests the intriguing possibility that the T2D risk in this locus is caused, partly or wholly, by variants affecting regulation rather than protein sequence.

A similar pattern, with partial causality due to non-coding variants, was seen at an additional 7 loci. And in 13 other loci, even though these loci were discovered via coding variant signals, non-coding variants had the highest probability of causing risk.

According to Professor Mark McCarthy of the University of Oxford, one of the principal investigators of the study, “Our study shows that we should not jump to conclusions when we see that one of our association signals includes a variant around which we can base an attractive mechanistic narrative. The “average” coding variant is more likely to be causal than the “average” noncoding variant, but even at the set of loci where we detect a significant coding variant association, it is as likely as not that the signal is driven instead by one of the non-coding variants nearby. By bringing together genetic and genomic data, we can improve our prospects for finding the causal variants at GWAS loci, but these should be the starting points for empirical studies not a destination in themselves.” Dr. McCarthy has written a commentary on this study; read it here.

So, in investigating complex disease genetics, it is still a good bet that a coding variant affects disease risk via altered protein sequence: at least in some parts of the world, hoofbeats are very often due to horses. But this study reminds us that it is always a good idea to look beyond the obvious hypothesis, and remember the zebras.

This paper includes many other discoveries, and we recommend that you read the paper to get the full story. We are pleased to announce that in addition to publishing the paper, the authors have made their results available to the T2D research community immediately upon publication, in the T2D Knowledge Portal.

The dataset in the T2DKP is named ExTexT2D (ExTended exome array genotyping for T2D) and includes associations for T2D, both unadjusted and adjusted for BMI. A description of the dataset along with a table listing the cohorts of the study subjects can be found on the Data page, and you can browse and search the ExTexT2D exome chip analysis dataset at these locations in the T2DKP:

  • On Gene pages (see an example) on the Common variants and High-impact variants tabs
  • On Variant pages (see an example) in the Associations at a glance section and the Association statistics across traits table
  • Via the Variant Finder search
  • View a Manhattan plot of associations across the genome by selecting “type 2 diabetes” or “type 2 diabetes adj BMI” in the View full genetic association results for a phenotype menu on the home page.

This dataset offers by far the largest sample size for exploring associations of low-frequency and common coding variants with T2D. The size of the study enabled evaluation of which coding variants mediate GWAS signals and which are simply "proxies" to the true causal variant, as revealed in the credible set analysis. With the addition of this dataset, the T2DKP offers in-depth information on two aspects of exome associations: common and low-frequency variant associations in ExTexT2D, and comprehensive coding variant associations in the 19K exome sequence analysis dataset (soon to include 50,000 exomes).

We are pleased to provide access to these important new results. Please contact us with any questions or comments about these new data or the T2DKP in general!

Tuesday, February 6, 2018

Federation brings three new datasets to the T2DKP

Our mission at the Type 2 Diabetes Knowledge Portal (T2DKP) is to aggregate and analyze genetic association data relevant to T2D, and to make the knowledge that can be gleaned from these data available to researchers around the world. But it isn't possible to aggregate all of the relevant data in one place: privacy regulations at the institutional, regional, and national levels determine how these data are handled, and whether or where they can be transferred.

The T2DKP is supported by the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D),  a pre-competitive partnership among the National Institutes of Health, industry, and not-for-profit organizations, managed by the Foundation for the National Institutes of Health. Because AMP T2D seeks to facilitate discovery of new targets for T2D treatment by making as much data as possible available via the T2DKP, it funded the development of a mechanism for establishing interconnected federated nodes of the T2DKP that would enable researchers to interact with all of the data regardless of where they are located.

This goal was realized with the creation, by a team led by Thomas Keane and Dylan Spalding, of a federated node of the T2DKP at the European Bioinformatics Institute (EBI).  Data housed at the EBI node are stored in such a way that their specific privacy requirements are met, but they are made available for remote queries via T2DKP tools and interfaces. Results from such queries are served up alongside results from all of the datasets housed in the AMP T2D Data Coordinating Center (DCC) at the Broad Institute. Researchers may browse and query data from any location without even needing to know where they reside. This federation mechanism represents both an important technical advance in handling and protecting data, and a significant step forward in democratizing and improving access to genetic association results.
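The federation pattern can be pictured with a toy sketch. This is not the T2DKP or EBI implementation; the `Node` class, `federated_query` function, dataset names, variant identifier, and p-values below are all hypothetical. The sketch only shows the general shape: each node keeps its own records, answers queries with summary-level results, and the caller sees one merged list without needing to know where each dataset lives.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Association:
    dataset: str
    variant: str
    p_value: float

class Node:
    """Hypothetical stand-in for one data node (for example, DCC or EBI)."""

    def __init__(self, name, records):
        self.name = name
        self._records = list(records)  # underlying data stay inside the node

    def query(self, variant):
        # Only summary-level association results are returned to the caller.
        return [r for r in self._records if r.variant == variant]

def federated_query(nodes, variant):
    """Send the same query to every node and merge the answers."""
    results = []
    for node in nodes:
        results.extend(node.query(variant))
    return sorted(results, key=lambda r: r.p_value)

# Invented example records
dcc = Node("DCC", [Association("dataset_a", "var_1", 1e-4),
                   Association("dataset_a", "var_2", 0.3)])
ebi = Node("EBI", [Association("dataset_b", "var_1", 2e-6)])

merged = federated_query([dcc, ebi], "var_1")
print([(r.dataset, r.p_value) for r in merged])
```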

The first dataset to be incorporated into the Portal via the EBI federated node was the Oxford BioBank exome chip analysis dataset, which contains association data for glycemic, lipid, and blood pressure traits from over 7,100 subjects in Oxfordshire, U.K. The EBI Federated Node has now added three more datasets:

  • The EXTEND GWAS dataset, generated by Drs. Timothy Frayling and Andrew Wood and their colleagues, comprises 7,159 samples (1,395 T2D cases and 5,764 controls) from the Exeter EXTEND Biobank. It includes associations for a wealth of glycemic, anthropometric, cardiovascular, renal, and hepatic phenotypes--including many that are new to the T2DKP.
  • The GoDARTS Affymetrix GWAS dataset, from Dr. Colin Palmer and colleagues, includes summary-level statistics for associations with BMI and blood lipid levels from 3,307 diabetic participants in the Genetics of Diabetes Audit and Research Study in Tayside Scotland. In addition, individual-level data from over 17,000 subjects (including the set from which summary statistics were calculated) are available via the GAIT tool (see below). 
  • The Oxford BioBank Axiom GWAS dataset, from Dr. Fredrik Karpe and colleagues, includes associations for BMI and blood lipid levels from 7,193 participants, all healthy men and women between 30 and 50 years of age. It represents an additional analysis of the same samples contained in the Oxford BioBank exome chip analysis dataset.
These datasets are described in detail on our Data page. Summary results from all three sets are integrated into Gene and Variant pages in the T2DKP, and may also be viewed in the Manhattan plots accessible by searching for a phenotype from the T2DKP home page. The Variant Finder also queries these datasets.

The individual-level data behind all three of these datasets are accessible for custom association analysis in our Genetic Association Interactive Tool (GAIT) on Variant pages. Using this tool, researchers can filter samples to create a custom subset with defined characteristics such as age, gender, BMI, and other measures, and then run on-the-fly association analysis within that sample subset. Now, GAIT queries datasets both at the DCC and at the Federated node, using the same methodology for each, in a way that is transparent to users of the tool. The new Federated datasets bring the total number of individual-level samples available for custom analysis in the T2DKP to 67,768.

Monday, January 22, 2018

GWAS data re-analysis yields novel results about T2D risk



"Waste not, want not." The old proverb is about frugality, but a study published today gives it a whole new dimension. Lead author Sílvia Bonàs, directed by Josep Mercader and David Torrents and collaborating with many colleagues at the Barcelona Supercomputing Center, the Broad Institute, and other institutions (Bonàs-Guarch et al. (2018), Nature Communications 9), decided to investigate variants associated with type 2 diabetes (T2D) by re-analyzing existing GWAS data rather than initiating a new study.

This was a frugal strategy, conserving both time and resources. But the benefits of this approach went way beyond frugality. By aggregating multiple datasets and using unified, current methods for quality control, imputation, and association analysis, the researchers discovered nuggets of significant information that were not apparent in the original analyses of the individual sets. And all of these nuggets are freely available for browsing and searching in the T2D Knowledge Portal (T2DKP).

To amass these data, the researchers combined all of the individual-level T2D case-control GWAS data that were available from the European Genome-Phenome Archive (EGA) and the database of Genotypes and Phenotypes (dbGaP). After harmonization and quality control, data from 70,127 subjects (12,931 cases and 57,196 controls) remained, inspiring them to name the project "70KforT2D".

In the time since the original studies had been performed, better and more comprehensive reference panels for imputation had been generated by the 1000 Genomes and UK10K projects. By using both of these panels for imputation, the researchers were able to substantially increase the number of variants that could be imputed. They ended up with more than 15 million variants, including more than 5 million rare variants and over 1.3 million indels, which have previously been difficult to impute.

In performing association analysis, the authors took advantage of existing large datasets of T2D association summary statistics for meta-analysis, being careful to only combine non-overlapping samples. They also took advantage of the T2D Knowledge Portal to verify some associations for low-frequency variants that were located in coding regions and had suggestive, but not unambiguously significant, p-values. The significance of the T2D associations of these variants was confirmed by meta-analysis along with the associations seen in two large studies in the T2DKP (GoT2D exome chip analysis, with nearly 80,000 samples, and the 17K exome sequence analysis dataset with 17,000 samples).
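To show why combining non-overlapping studies can turn suggestive evidence into a significant result, here is a sketch of a fixed-effect, inverse-variance-weighted meta-analysis. The choice of method is an assumption, since the post does not say which meta-analysis approach the authors used, and the effect sizes and standard errors are invented.

```python
import math

def fixed_effect_meta(estimates):
    """Combine (beta, standard_error) pairs from independent studies.

    Returns the pooled beta, its standard error, and a two-sided
    p-value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_w = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value

# Three hypothetical, non-overlapping studies of the same variant,
# each only suggestive on its own (z = 3, p of roughly 0.003).
studies = [(0.30, 0.10), (0.30, 0.10), (0.30, 0.10)]
pooled_beta, pooled_se, pooled_p = fixed_effect_meta(studies)
print(pooled_beta, pooled_se, pooled_p)
```

The pooled effect is unchanged, but the standard error shrinks as evidence accumulates, so the p-value drops by several orders of magnitude. The calculation is only valid when the samples do not overlap, which is why the authors took care on that point.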

The association analysis identified 57 loci associated with T2D risk at the genome-wide significance level (p-value ≤ 5 × 10⁻⁸), seven of which had not previously been associated with T2D. The high quality of the data made it possible to fine-map the variants at each of these loci and construct credible sets. Many of the putative causal variants, including those in previously identified loci, were indels rather than single-nucleotide polymorphisms, underscoring the importance of an imputation procedure that discovers indels.

The T2D-associated loci discovered in this study give some tantalizing hints about genes potentially involved in T2D, and suggest new avenues for detailed wet-lab investigation. We can’t review all of them in this space, but one association is particularly interesting for the generalizable lessons it teaches us about case-control GWAS for T2D.

This association, which the authors validated and replicated using additional datasets, involves the X chromosome variant rs146662075. The risk allele confers a 2-fold elevated risk of developing T2D in males. The variant appears to affect an enhancer that could regulate expression of AGTR2, a gene known to be involved in modulating insulin sensitivity, making it a very interesting subject for investigation with regard to T2D. More work is needed to figure out whether this is really a male-specific effect, or whether it was only detectable in males because imputation for the X chromosome is more accurate in males, who have only one copy of the chromosome.

The first lesson learned from this association is that the X chromosome harbors important loci, and deserves attention in association studies. While this seems obvious, since the X chromosome comprises 5% of the genome, it has been neglected in most studies to date.

The second lesson is that for an adult-onset disease like T2D, it’s very important to pay attention to the details of case-control classification. If there are young people in the control group, they may actually be future T2D cases, destined to develop the disease later in life. When the authors tried to replicate the initial discovery for this variant in different datasets, the associations were not as significant as expected. But after digging deeper into the experimental cohorts, they found that most of the replication datasets had many subjects younger than 55, which was the average age for T2D onset for these cohorts. Re-running the analysis after excluding controls younger than 55 and also excluding those who appeared to be pre-diabetic, based on an oral glucose tolerance test, brought the replication results into concordance with the discovery results and confirmed the significance of the rs146662075 association.
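The control-refinement step described above amounts to a simple filter. The sketch below is illustrative only: the record fields (`status`, `age`, `prediabetic`) and the sample data are hypothetical, and in the study pre-diabetes was judged from an oral glucose tolerance test rather than a ready-made flag.

```python
MIN_CONTROL_AGE = 55  # average age of T2D onset reported for these cohorts

def refine_controls(samples, min_control_age=MIN_CONTROL_AGE):
    """Keep all cases; drop controls who are young or appear pre-diabetic."""
    kept = []
    for s in samples:
        if s["status"] == "control":
            too_young = s["age"] < min_control_age
            if too_young or s["prediabetic"]:
                continue  # may be a future case, so exclude from controls
        kept.append(s)
    return kept

# Invented samples for illustration
cohort = [
    {"id": "s1", "status": "case", "age": 48, "prediabetic": False},
    {"id": "s2", "status": "control", "age": 42, "prediabetic": False},
    {"id": "s3", "status": "control", "age": 61, "prediabetic": True},
    {"id": "s4", "status": "control", "age": 67, "prediabetic": False},
]
refined = refine_controls(cohort)
print([s["id"] for s in refined])
```

Cases are kept regardless of age; only the control group is tightened, which reduces the number of misclassified "future cases" diluting the association signal.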

In keeping with the spirit of open access, the authors provided the summary statistics from this work to the T2DKP even before publication. These results are incorporated into the T2DKP and are visible on Gene and Variant pages as well as searchable via the Variant Finder. The authors have also made the full summary statistics available for public download.

The novel and important findings from this study strongly reaffirm the value of data sharing. Not only are data sharing and re-analysis the right things to do for reasons of fairness, equity, and frugality; they can also spark new insights and move science forward in unexpected ways.

Friday, January 19, 2018

New METSIM dataset adds individual-level GWAS data to the T2DKP

The Finnish population is a valuable genetic resource. Having undergone multiple population bottlenecks, this relatively homogeneous population is enriched in low-frequency and loss-of-function variants. Even better, Finns are generally willing to participate in research studies, and many measures of their health are detailed in comprehensive electronic health records.

To take advantage of these characteristics, the METSIM (Metabolic Syndrome in Men) study (Laakso et al. 2017, J. Lipid Res. 58, 481-493) was initiated in 2005. Over 10,000 Finnish men were examined between 2005 and 2010. All of the subjects were phenotyped extensively, with an emphasis on traits associated with type 2 diabetes (T2D), cardiovascular disease, and insulin resistance, and their genotypes and exome sequences were determined. Subsets of the group have been characterized in more detail, with whole-genome sequencing and detailed analyses of transcripts and gene expression, DNA methylation, gut microbiome composition, and other phenotypes.

Now, you can easily access results from the METSIM cohort in the T2D Knowledge Portal. Variant associations with T2D, fasting glucose levels, and fasting insulin levels are available, both unadjusted and adjusted for body mass index. The individual-level data are also available for interactive analyses using our Genetic Association Interactive Tool (GAIT; see below), which allows you to design and run custom association analyses using custom subsets of the samples, while always protecting patient privacy. The addition of METSIM data brings to nearly 68,000 the number of samples available for analysis in GAIT.

The Foundation for the NIH and the Accelerating Medicines Partnership in Type 2 Diabetes were instrumental in bringing these data, generated by researchers in Finland and the U.S., to the T2DKP. Individual-level genotype data from 1,185 T2D cases and 7,357 controls were deposited into the Data Coordinating Center (AMP T2D DCC), and analysis and quality control were performed by the DCC analysis team. The experiment design and analysis are summarized on our Data page, and detailed reports that fully document the analysis are available for download.

The METSIM GWAS dataset currently has "Early Access Phase 1" status in the T2DKP, which is assigned to new data. This status denotes that although analysis and quality control checks have been performed, the data are not yet considered to be in their final state. During the early access period, users may analyze the data but may not submit the results of these analyses for publication. Find full details about the different phases of data release on our Policies page.

Results from METSIM GWAS may be viewed at these locations in the T2D Knowledge Portal:

• On Gene Pages (e.g., MTNR1B) in the Common variants and High-impact variants tables and in LocusZoom static plots, for the phenotypes T2D, T2D adjusted for BMI, fasting glucose, fasting glucose adjusted for BMI, fasting insulin, and fasting insulin adjusted for BMI;

• On Variant Pages (e.g., rs579060) in the Associations at a glance section, the Association statistics across traits table, and in LocusZoom static plots;

• From the View full genetic association results for a phenotype search on the home page: first select one of the phenotypes listed above, and then on the resulting page, select the METSIM GWAS dataset.

Individual-level METSIM GWAS data may be used for custom interactive analyses using these tools in the T2DKP:

• Using the Variant Finder tool, you may specify multiple criteria and retrieve the set of variants meeting those criteria;

• Using the Genetic Association Interactive Tool (GAIT) on Variant Pages, you may select the METSIM GWAS dataset, choose one of 5 phenotypes for association analysis, choose custom covariates, and filter the sample pool by specifying a range of values for one or more of 8 different phenotypes, then run on-the-fly analysis.

Phenotypes available for association analysis of METSIM GWAS data in GAIT


Covariates available for selection when analyzing METSIM GWAS data in GAIT


Samples may be filtered by setting ranges for one or more of 8 phenotypes for the METSIM GWAS dataset


Wednesday, October 25, 2017

New phenotypes and physical activity stratification available in the T2DKP

We’ve recently updated one dataset and added another in the Type 2 Diabetes Knowledge Portal. Associations with multiple new phenotypes are now available for the BioMe AMP T2D GWAS dataset, and the new dataset "GIANT GWAS - stratified by physical activity" adds associations with anthropometric traits for cohorts stratified by gender and physical activity levels.

The BioMe AMP T2D GWAS dataset was first added to the T2DKP in early 2017, initially with three phenotypes (T2D, fasting glucose levels, and HbA1c levels). Deposition and analysis of these data were funded by the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D), a collaboration between multiple stakeholders that aims to catalyze the clinical translation of genetic discoveries by producing and aggregating data, developing and implementing novel analytical methods and tools, and building infrastructure for data storage and presentation. This dataset was the first to be entirely produced within the AMP T2D project, including the deposition, analysis, quality control, and presentation of the data.

The data were generated at the Charles Bronfman Institute for Personalized Medicine BioMe BioBank, a biorepository located at the Mount Sinai Medical Center (MSMC) in the upper Manhattan area of New York City. MSMC serves a diverse population of over 800,000 outpatients each year. Importantly, since many BioMe participants are African American or Hispanic Latino, this dataset adds significant ethnic diversity to the Portal’s genetic association data.

The data were subjected to quality control and association analysis by the Analysis Team at the AMP Data Coordinating Center (DCC) at the Broad Institute. In this second phase of analysis, associations with seven traits were calculated: systolic and diastolic blood pressure; HDL and LDL cholesterol levels; creatinine levels and eGFR-creat; and BMI. A detailed analysis report for these associations may be downloaded from the BioMe AMP T2D GWAS section of our Data page.

The new GIANT dataset was generated by the GIANT (Genetic Investigation of Anthropometric Traits) consortium via a meta-analysis of genetic associations for BMI, waist-hip ratio, and waist circumference from more than 200,000 adults. Samples are stratified by sex, ancestry, and physical activity level (active or inactive). This work was published in a recent paper by Graff et al.

Data from both the BioMe and GIANT studies are available at these locations in the Portal:
  • On Gene pages (see an example) in the Common variants and High-impact variants tables and in LocusZoom static plots
  • On Variant pages (see an example) in the Associations at a glance section, in the Association statistics across traits table, and in LocusZoom static plots
  • Via the Variant Finder tool
  • "Manhattan plots" of associations across the genome may be seen by selecting one of the phenotypes analyzed in these datasets in the View full genetic association results for a phenotype scroll box on the Portal home page
  • Additionally, the BioMe data are available for sample filtering and custom association analysis via the Genetic Association Interactive Tool (GAIT) on Variant pages.

Please check out the new data and contact us with any questions, comments, or suggestions.

Wednesday, August 30, 2017

Bringing the power of epigenomics to the T2DKP

Until recently, all of the results displayed in the Type 2 Diabetes Knowledge Portal (T2DKP) were based on genetic association data: the significance with which variants, or SNPs, occur in people’s genomes in conjunction with a disease or trait.

This information is hugely important for pinpointing regions of the genome that contribute to disease risk. It is now relatively straightforward to identify these regions, but it is still a large challenge to discover the mechanisms by which they act—especially for variants that are outside of coding sequences, without an obvious effect on the sequence of a particular protein. These non-coding variants, the most commonly seen in genetic association studies, are likely to affect tissue-specific gene regulation that could potentially be important to the disease process.

How can we overcome this challenge to find clues about the effects of these non-coding variants? Epigenomic data to the rescue!

Dr. Kyle Gaulton of the University of California at San Diego researches the transcriptional regulatory networks involved in type 2 diabetes by using epigenomic data in concert with genetic association data. He explains, "Regulatory elements control gene production and function, and are often highly specialized across cell and tissues and located far away from the genes they regulate. Molecular epigenomic hallmarks of gene regulation such as histone and DNA modifications, nucleosome depletion, chromatin conformation and DNA-protein interactions can pinpoint the precise genomic locations of regulatory elements. High-resolution epigenome maps of regulatory elements in pancreatic islets, liver, muscle, adipose and many other human tissues can then enable annotation of non-coding genetic variants and their potential gene regulatory functions. These maps are thus an invaluable component of determining how type 2 diabetes associated non-coding variants influence disease pathogenesis."

A recent paper from Dr. Gaulton and colleagues (Gaulton, KJ, et al. (2015) Nat Genet. 47:1415) illustrates the power of integrating these two data types. By combining information on transcription factor binding sites and tissue-specific chromatin states with genetic fine-mapping of T2D-associated loci, the authors elucidated the molecular mechanisms behind the effects of some T2D-associated variants, uncovering the role of the FOXA2 transcription factor in glucose homeostasis in T2D-relevant tissues.

Now, the T2DKP facilitates this type of analysis by presenting both genetic association and epigenomic data on Gene and Variant pages. We described the display of epigenomic data on Variant pages in a recent blog post. On Gene pages, epigenomic data are integrated into the LocusZoom display.

Locations of variants associated with T2D and chromatin states in pancreatic islets, across the SLC30A8 gene (partial view)


Below the plot of variant associations, chromatin states are displayed by default for the major T2D-relevant tissues. Using the pull-down menu at the top of the plot, you can choose from a diverse set to display other tissues and cell types. All of the details on how to use this interactive plot are included in our Gene Page guide.

This is only the first step for epigenomic data in the T2DKP. In the future, we plan to include additional types of epigenomic data that indicate chromatin accessibility and conformation. We will also add functionality; for example, for any given variant, you will be able to search for the tissues in which enhancer regions overlap the location of that variant.

As we actively develop this aspect of the T2DKP, we welcome your suggestions!

Sunday, February 5, 2017

Introductory guide to genetic association analysis now available

P-values. Odds ratios and betas. GWAS. Linkage disequilibrium. What does it all mean?

Human geneticists are, of course, intimately familiar with these concepts. But for people who are not human geneticists, just getting past the terminology can be frustrating. So we’ve written a basic primer and reference guide that can help users of the T2D Knowledge Portal understand the information presented in our interfaces and tools.

Our Introduction to genetic association analysis guide is available from our Resources page. Or download it here (PDF).

This guide provides a basic introduction to the rationale behind applying human genetic association studies to complex diseases like T2D, explains some of the parameters of genetic associations such as p-values and odds ratios, and describes the different types of experiment used to determine genetic associations.

Many thanks to Andrew Morris, University of Oxford, for his thoughtful review and helpful comments on this guide.

We would be happy to hear your suggestions for improvements and additions!

Monday, January 23, 2017

Insulin Sensitivity Index data added to the Portal

The loss of sensitivity to insulin, often termed insulin resistance, is characteristic of type 2 diabetes. Since this sensitivity is difficult to measure directly, researchers have developed an index that reflects it: the modified Stumvoll Insulin Sensitivity Index (ISI). The index is derived by a formula that combines fasting insulin levels with glucose and insulin levels measured two hours after a glucose load.

Now, the results of a study of genetic associations of variants with ISI are available in the T2D Knowledge Portal. These results are from a recent paper in Diabetes by co-first authors Geoffrey Walford, Stefan Gustafsson, Denis Rybin, and fellow members of the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC). (For an overview of the results, see our blog post about the paper.)

In this study, ISI was calculated for 16,753 non-diabetic individuals, and associations of their variants with ISI values were analyzed. The associations were adjusted in one of three ways: for age and sex; for age, sex, and body mass index (BMI); or according to a model that analyzed the combined influence of the genotype effect adjusted for BMI and the interaction effect between the genotype and BMI on ISI. More details about this data set and others from MAGIC may be found on our Data page.

ISI associations are a subset of the MAGIC GWAS data set. They may be viewed in the Portal by selecting one of these phenotypes:
  • ISI adjusted for age-sex
  • ISI adjusted for age-sex-BMI
  • ISI adjusted for genotype-BMI interaction
Associations with these phenotypes can be found in these locations on Portal pages:
  • On Gene Pages (see an example) in the Variants & Associations table
  • On Variant Pages (see an example) in the Associations at a glance section and in the Association statistics across traits table
  • Via the Variant Finder tool, for the phenotypes listed above
  • A "Manhattan plot" of associations across the genome may be seen by selecting one of the phenotypes listed above in the View full genetic association results for a phenotype scroll box on the Portal home page.

Tuesday, October 25, 2016

Design your own association analysis with our Genetic Association Interactive Tool (GAIT)

Genetic association analysis—identifying polymorphisms in the human genome that are correlated with altered risk of disease—is a powerful method for discovering disease mechanisms. These polymorphisms can indicate what goes wrong at the cellular level in the disease process, knowledge that is critically important for developing better diagnostics and therapies.

The Type 2 Diabetes Knowledge Portal offers a wealth of pre-calculated information on genetic associations between variants and type 2 diabetes (T2D) or other related traits. These results are computed using broadly defined groups of samples: either an entire sample set from a project, or ancestry-specific cohorts. This approach generates very valuable results, but it can mask effects that are detectable only in more narrowly defined groups: for example, individuals within a certain range of age, body mass index, or cholesterol level.

Until now, analysis of such fine-grained subsets of individual-level data has only been possible for expert geneticists with access to protected data. But our new Genetic Association Interactive Tool (GAIT) offers everyone an unprecedented amount of access to individual-level data along with an easy-to-use interface for analyzing genetic associations using custom subsets of samples and variants.

Two versions of GAIT are available in the Portal. One, on Variant pages (see an example), computes association statistics for the single variant featured on that page. The other, accessible on Gene pages (see an example), powers an interactive burden test that considers the collection of variants in or near a gene, or a selected subset of those variants.

Where to find GAIT on Gene pages (left) and Variant pages (right)


The GAIT interface offers incredible flexibility for designing custom analyses. In the interactive burden test, you can filter variants by their predicted effects, or pick and choose individual variants to include. When creating sample sets for either single-variant association analysis or a gene burden test, you can specify a gender, set ranges for the values for multiple phenotypes, and choose principal components or phenotypes to use as covariates. And all these parameters may be set differently for different ethnic groups.

The GAIT interface displays phenotype values within the sample set and allows you to filter samples by multiple criteria


Once you set parameters of your choice, GAIT computes associations on the fly, based on individual-level data. To protect patient confidentiality, GAIT will not display results from sample sets consisting of fewer than 100 individuals.
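The sample filtering and the minimum-size rule described above can be sketched in a few lines of Python. This is only an illustration, not GAIT's actual implementation: the `Sample` class, the `filter_samples` and `can_display` helpers, and the made-up BMI values are all hypothetical, and only the threshold of 100 individuals comes from this post.

```python
from dataclasses import dataclass, field

MIN_SAMPLES = 100  # results are withheld for sets with fewer than 100 individuals


@dataclass
class Sample:
    """Hypothetical record: one individual and their measured phenotypes."""
    sample_id: str
    phenotypes: dict = field(default_factory=dict)  # phenotype name -> value


def filter_samples(samples, ranges):
    """Keep samples whose values fall inside every (low, high) range, inclusive.

    Samples that lack a value for a requested phenotype are excluded.
    """
    kept = []
    for sample in samples:
        in_range = True
        for name, (low, high) in ranges.items():
            value = sample.phenotypes.get(name)
            if value is None or not (low <= value <= high):
                in_range = False
                break
        if in_range:
            kept.append(sample)
    return kept


def can_display(samples):
    """Apply the confidentiality rule: at least MIN_SAMPLES individuals."""
    return len(samples) >= MIN_SAMPLES


# Made-up cohort: 300 individuals with integer BMI values cycling from 18 to 37.
cohort = [Sample(f"s{i}", {"BMI": 18 + (i % 20)}) for i in range(300)]

narrow = filter_samples(cohort, {"BMI": (25, 30)})  # 90 individuals
wide = filter_samples(cohort, {"BMI": (20, 30)})    # 165 individuals

print(len(narrow), can_display(narrow))  # 90 False
print(len(wide), can_display(wide))      # 165 True
```

In this sketch the narrower BMI range leaves too few individuals, so an analysis on that subset would not be shown, while widening the range brings the set back above the threshold.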

To help you get familiar with this versatile tool, we’ve created a User Guide (download PDF) that summarizes all the details of the interface. Please give GAIT a try and let us know what you think!



Wednesday, August 10, 2016

Insulin sensitivity comes into focus

Many different things can be seen in any landscape, depending on your focal point.
Image by Nicooo76 via Pixabay.
When photographing a landscape, different photographers choose different perspectives. Some capture a wide-angle view, while others focus on particular details.

It’s no different for researchers who use genome-wide association studies (GWAS) to investigate the genetic landscape of type 2 diabetes (T2D). A common perspective is to study the wide range of variants that are significantly associated with the presence of T2D in patients. But it can also be very informative to concentrate on individual traits related to the physiology of T2D. In a new paper in Diabetes, co-first authors Geoffrey Walford, Stefan Gustafsson, Denis Rybin, and fellow members of the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) took this focused perspective to discover associations of genetic variants with insulin sensitivity.

Along with reduced insulin levels, the loss of insulin sensitivity (often termed insulin resistance) is a major hallmark of T2D. When muscle, liver, and fat cells become less able to respond to insulin, blood glucose levels rise. Since this can contribute to development of T2D and exacerbate its symptoms, knowing which genetic variants are associated with sensitivity to insulin could be informative for understanding pathways that contribute to T2D risk.

But insulin sensitivity is difficult to measure. Earlier GWAS have used simple estimates of insulin sensitivity, such as fasting levels of insulin, and have discovered a handful of genetic variants that influence insulin sensitivity. The “gold standard” test, the euglycemic clamp, involves giving patients continuous infusions of insulin and glucose and monitoring their blood glucose every few minutes. It’s expensive and time-consuming—not a test that is practical to perform on the tens of thousands of subjects that are commonly used in GWAS.

The authors wondered whether they could instead use an index that combines several measurements, each relatively easy to make. It’s an index with a long name: the modified Stumvoll Insulin Sensitivity Index (ISI). Developed by Stumvoll and colleagues in 2001, this index can be derived in a variety of ways. The authors chose the ISI requiring just three measurements: fasting insulin levels; glucose levels two hours after a glucose load; and insulin levels two hours after a glucose load. This ISI is as good as or better than other estimates of insulin sensitivity and correlates well with the euglycemic clamp.

So the researchers looked for variants associated with the Stumvoll ISI in nearly 17,000 participants in the discovery phase of the work. They added another 13,300 in the replication phase, adding up to about 30,000 in the combined meta-analysis. Since obesity, measured by body mass index (BMI), can affect insulin sensitivity, the authors added BMI to some of their statistical models.

First, the authors found associations between the ISI and other variants already known to affect simple measures of insulin sensitivity. This provided reassurance that the ISI was properly detecting genetic influences on insulin sensitivity. After discovery, replication, and meta-analysis, two novel genetic variants were associated with ISI at genome-wide significance (P-value < 5.0 × 10⁻⁸) in a model that tested the effect of the variant, age, sex, and the interaction between the variant and BMI: variant rs12454712, near the gene BCL2, and variant rs10506418, near the gene FAM19A2.
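For readers new to the genome-wide significance threshold, here is a small Python sketch of how it is applied. The cutoff of 5 × 10⁻⁸ is the one quoted above; it is often explained as a conventional 0.05 significance level corrected for roughly one million independent tests, which is an assumption of this sketch rather than something stated in the paper. The function names and the example p-values are hypothetical and are not results from the study.

```python
GENOME_WIDE_THRESHOLD = 5e-8  # cutoff quoted in the text


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests


def is_genome_wide_significant(p_value, threshold=GENOME_WIDE_THRESHOLD):
    """True when the p-value is strictly below the threshold."""
    return p_value < threshold


# Assumption: about one million independent tests across the genome.
print(bonferroni_threshold(0.05, 1_000_000))

# Made-up p-values for two hypothetical variants.
for label, p in [("variant_a", 3e-9), ("variant_b", 1e-6)]:
    print(label, is_genome_wide_significant(p))
```

A p-value of 1 × 10⁻⁶ would look striking in a single experiment, but it does not pass this threshold because so many variants are tested at once.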

How might these variants affect insulin sensitivity? There’s a lot more work to be done before that question can be answered. Additional studies will need to clarify whether these variants, which are near BCL2 and FAM19A2, affect these or other genes, and then how these variants actually cause changes in insulin sensitivity. 

There are some clues already in the published literature. The variant rs12454712 near BCL2 has previously been found to be associated with T2D, supporting the hypothesis that this region of the genome contributes to T2D risk through reducing insulin sensitivity. And the gene itself (BCL2) has already been implicated in glycemic metabolism: inhibiting bcl2 improves glucose tolerance in a mouse model, while a drug that inhibits the protein product of the gene (BCL2) increases blood glucose levels in certain chronic lymphocytic leukemia patients. So there’s even more reason to suspect that the rs12454712 variant might affect insulin sensitivity via BCL2.

There is as yet no evidence linking the function of the FAM19A2 protein to glucose metabolism, so the jury is out on whether the variant rs10506418 affects FAM19A2 or some other nearby gene.

By focusing on a detail of the T2D-related genetic landscape, this study has teased out two variants that may give us clues about the physiology of insulin sensitivity and the development of T2D. And that’s a valuable addition to our overall picture of T2D genetics!

Friday, June 10, 2016

Come meet the Portal team at ADA, booth #1762!

Today’s news comes to you from the Big Easy—New Orleans, LA, where the 76th Scientific Sessions of the American Diabetes Association are in full swing this weekend. Members of the Knowledge Portal team have traveled here to talk to researchers about how the Portal can become even more useful in helping to generate hypotheses that spark insights into the mechanism of T2D and the development of new therapies. Starting at 10am on Saturday June 11, we’ll be at booth #1762 in the exhibit hall, ready to hear your suggestions and give you an individual tutorial on the Portal’s tools and features. There just might be a gift waiting for you, too!

We’ve been working hard and we have an incredible number of new features to show off at #2016ADA. We’ll be featuring them individually in this space in the coming weeks, with in-depth explanation of each. To list some of the highlights:

  • a collaborative project between software engineers at the University of Michigan and the Broad Institute has come to fruition with the integration of LocusZoom into the Portal. This interactive visualization looks, superficially, like a Manhattan plot—but it’s so much more. It shows the significance of variant associations with any of several phenotypes and also displays linkage disequilibrium among nearby variants, and you can choose to do conditional analysis based on any variant.
  • engineers at the Broad Institute have developed a completely new tool, called Genetic Association Interactive Tool (GAIT), that offers a multitude of options allowing you to compute custom association statistics for a variant. You can specify the phenotype to test for association, stratify samples by ancestry, choose a subset of samples to analyze based on specific phenotypic criteria, and control for specific covariates. 
  • we’ve also redesigned and augmented many of the displays of pre-computed information that are available in the Portal
  • finally, we’ve added a lot of new, informative content: a Data page with a complete description of each data set in the Portal, more background about the AMP-T2D project that supports the Portal, and more help text to guide you as you use the Portal’s interfaces



Come to the booth and let us give you a tour of these new features—or, if you're not at ADA, take a look and let us know what you think. And take a look at this great press release from NIH about the project!

Wednesday, May 18, 2016

Expanding the landscape of human genetic variation data in the Type 2 Diabetes Knowledge Portal

With the addition of four new sequence data sets to our database, the number of variants and associations accessible via the Portal pages and tools has increased by millions.

Two of the new data sets are from projects that have obtained sequence data from a wide range of individuals. The ExAC data set, comprising exome sequences collected and harmonized by the Exome Aggregation Consortium, includes sequence data from 60,706 unrelated people of multiple ancestries. The 1000 Genomes data set, from the International Genome Sample Resource project (IGSR), is composed of whole-genome sequences from 2,504 individuals in four different ethnic groups. 


The allele frequencies of variants in the different ethnic groups surveyed in the 1000 Genomes data set can be seen in the “How common is…?” section on the Variant pages (view an example). And both the ExAC and 1000 Genomes data sets can be queried using the Variant Finder tool. You can select them via a new tab on the interface, “Additional search options”, where you can choose these data sets and also add more criteria to your search. 

The Data set pull-down menu on the "Additional Search Options" tab of the Variant Finder lets you specify 1000 Genomes or ExAC data.

Available selections in the Data set pull-down menu.


The other two new data sets in the Portal were both generated by the GoT2D consortium. A whole-genome sequence data set (GoT2D WGS) adds data from 2,657 individuals, including the associations of noncoding variants that were not present in the previous whole-exome sequence data set from the GoT2D project. This new data set brings T2D association data across 30 million variants to the Portal. The GoT2D WGS + replication data set adds imputation to that set, bringing the sample size to over 47,000 and including most low-frequency and common variants.  

The new GoT2D data can be seen in multiple sections of the Portal’s Gene and Variant pages, and may also be accessed by selecting these data sets in the Variant Finder.

In addition to these major new additions, today’s release of data also includes some bug fixes and data harmonization.

Get out there and explore the new data landscape in the Portal, and let us know what you think!

Monday, May 9, 2016

Better summaries of variant information convey the most important information at a glance

We’ve made significant improvements to the information we display on the Variant pages of the T2D Knowledge Portal. The summary at the top of each Variant page (view an example) now shows the reference nucleotide and the variant nucleotide at that position. Transcripts covering the variant are listed, along with several important details for each transcript: the change caused by the variant in the encoded protein sequence (if applicable); the Sequence Ontology term describing the consequence of the variation (for example, “missense variant”); and the expected effect of the variant on protein function, as predicted by the PolyPhen and SIFT algorithms.


Summary section of the Variant page

Just below the summary on the Variant page, we’ve also improved the graphic showing the association of the variant with T2D and related traits. We’ve re-named this section “associations at a glance” because it immediately shows the most important information about these associations. 


At-a-glance section of the Variant page.


The boxes in this graphic represent the associations of this variant with T2D (at the top) and with other traits (below, in an expandable section). Under the hood, the software is now pulling up information more quickly so that the display is more responsive. We’ve also made it more pleasant to look at, tidying up the shape of the boxes and the alignment of the information they contain.

But beyond the style improvements, we’ve added a lot of substance. Where available, each association now includes the odds ratio (for dichotomous traits) or the effect size (for continuous traits) and the direction of effect. Positive effects are shown in blue, and negative effects in purple. 

We’ve also added the sample size, in black text in the bottom left corner of the box, for each data set. This indicates the total number of individuals involved in the study. And if available, the frequency and count of the variant in the data set are shown in red and blue text at the bottom middle and bottom right corner of the box, respectively. The count indicates the number of haplotypes in the set that contain the variant, while the frequency indicates the occurrence of the variant allele in the sampled population.

This additional information can help you evaluate the significance of associations. The sample size and variant count determine the power of the data set to establish the association. The higher the power, the more accurate the estimate of the variant’s effect.
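To make the relationship between these numbers concrete, here is a minimal Python sketch of how a variant count relates to an allele frequency. It assumes an autosomal variant in diploid individuals, so that each person contributes two haplotypes, and it assumes every individual in the sample was genotyped at that position. The Portal's displayed values may be computed differently, and the function name and the numbers in the example are hypothetical.

```python
def allele_frequency(allele_count, n_individuals, ploidy=2):
    """Fraction of haplotypes that carry the variant allele.

    allele_count  -- number of haplotypes containing the variant
    n_individuals -- number of genotyped individuals
    ploidy        -- haplotypes per individual (2 for autosomes; an assumption)
    """
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    total_haplotypes = ploidy * n_individuals
    if not 0 <= allele_count <= total_haplotypes:
        raise ValueError("allele_count must be between 0 and the haplotype total")
    return allele_count / total_haplotypes


# Made-up example: the variant is seen on 50 haplotypes among 1,000 individuals.
print(allele_frequency(50, 1000))  # 0.025
```

A rare variant in a small study may be observed on only a handful of haplotypes, which is why the count, and not just the frequency, matters when judging how much weight an association can bear.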

Finally, when a variant is associated with other traits in addition to T2D, those traits in the same category are labeled with the same color. For example, in the display above, proinsulin levels, fasting glucose, HOMA-B, and two-hour glucose—all glycemic phenotypes—are labeled in orange, while triglycerides, LDL cholesterol, and cholesterol—lipid phenotypes—are labeled in red. This lets you see easily when a variant is linked to multiple traits that could reflect a common process or pathway, possibly offering a clue to the mechanism by which it affects physiology.

So this improved graphic now gives you an idea, literally at a single glance, of how strongly a variant is associated with T2D, how significant that association is, and whether it is also associated with other traits. 

We made these improvements in response to suggestions from scientists who use the T2D Knowledge Portal. We hope to hear your feedback too!

Thursday, April 28, 2016

Variant Finder results may be saved, shared, and bookmarked

You may have noticed that our Variant Finder tool has a cleaner look and clearer instructions. But did you know that you can also save your search parameters, to re-create your search later or share it with a colleague?

First, construct your search. Here’s an example:


After you click “Submit search request” you’ll be taken to the results page:



And here’s the URL of the results page for this example search:


It isn’t pretty, but it encodes the search. You can bookmark it, save it, or email it and you’ll get back the same result next time you enter it in a browser.

There’s one small caveat here. On the results page, you can modify the results table by clicking on the + signs in the table header to see options for adding more data to the table. But if you do this, those changes will not be encoded in the URL (we plan to enable this in the future); only the original search is encoded.
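If you are curious how a URL can carry a whole search, the general idea can be sketched with Python's standard library. This is not the Variant Finder's actual URL scheme: the base address, the parameter names, and the values below are all hypothetical, chosen only to show search criteria being packed into a query string and recovered from it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address; the real results page uses its own URL format.
BASE_URL = "https://example.org/variant-search/results"


def build_search_url(criteria):
    """Encode search criteria (a dict of strings) as a URL query string."""
    return f"{BASE_URL}?{urlencode(criteria)}"


def read_search_url(url):
    """Recover the search criteria from a URL built by build_search_url."""
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}


# Hypothetical search: parameter names and values are for illustration only.
criteria = {
    "phenotype": "T2D",
    "dataset": "example data set",
    "max_p_value": "5e-8",
}

url = build_search_url(criteria)
print(url)
print(read_search_url(url) == criteria)  # True
```

Because the criteria travel inside the URL itself, anyone who opens the link repeats the same search. It also suggests why changes made later on the results page are not captured: in this sketch, only what was encoded when the URL was built can be read back from it.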

Let us know how you like this feature and what other features might be useful to you. And check out our mini-tutorial on the Variant Finder to see full instructions on how to use this tool.