Showing posts with label T2D Knowledge Portal. Show all posts

Monday, October 8, 2018

DIAMANTE GWAS dataset adds close to a million samples along with fine-mapping to the T2DKP

In a groundbreaking paper published today, Anubha Mahajan and colleagues (Mahajan et al., Nature Genetics 2018) report on a meta-analysis of unprecedented size for genetic associations with type 2 diabetes (T2D) along with fine-mapping analyses to identify causal variants that can suggest new therapeutic targets. We are pleased to provide access to the summary results as well as the results of the fine-mapping today in the T2D Knowledge Portal (T2DKP).

Working as part of the DIAGRAM (DIAbetes Genetics Replication And Meta-analysis) and DIAMANTE (DIAbetes Meta-ANalysis of Trans-Ethnic association studies) consortia, the researchers aggregated and meta-analyzed genome-wide association studies for about 900,000 individuals of European ancestry (about 74,000 T2D cases and 824,000 controls). Genotypes were imputed using the most comprehensive reference panels available, and in all, the analysis considered about 27 million genotyped or imputed variants.

After performing T2D association analysis (both unadjusted and adjusted for body mass index), the researchers found 243 loci associated with T2D at genome-wide significance (p-value for association ≤ 5 × 10⁻⁸). Of these, 135 were novel: they had not been detected previously in any T2D association analysis.
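For readers less familiar with the threshold: 5 × 10⁻⁸ is conventionally motivated as a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests. The minimal Python sketch below filters variants at that threshold and groups nearby hits into loci. The `Variant` class, the 500 kb merging window, and the example positions are illustrative assumptions, not the locus definition used by DIAMANTE.

```python
from dataclasses import dataclass

# Conventional genome-wide significance threshold, often motivated as a
# Bonferroni correction of alpha = 0.05 for ~1 million independent tests.
GENOME_WIDE_P = 5e-8

@dataclass
class Variant:
    chrom: str
    pos: int        # base-pair position
    p_value: float  # association p-value

def significant_loci(variants, window=500_000):
    """Group genome-wide significant variants into loci.

    A significant variant joins the current locus if it lies on the same
    chromosome within `window` bp of the previous significant variant
    (a hypothetical merging rule); otherwise it starts a new locus.
    """
    hits = sorted((v for v in variants if v.p_value <= GENOME_WIDE_P),
                  key=lambda v: (v.chrom, v.pos))
    loci = []
    for v in hits:
        last = loci[-1][-1] if loci else None
        if last is not None and last.chrom == v.chrom and v.pos - last.pos <= window:
            loci[-1].append(v)
        else:
            loci.append([v])
    return loci
```

Real analyses typically define loci using LD as well as distance, so counts from a rule this simple are only indicative.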

Within these loci, each of which included multiple significantly associated variants, the researchers performed approximate conditional analysis to determine whether the associations were independent of each other. They found surprising complexity within some loci; for example, the well-known TCF7L2 locus appears to include as many as 8 distinct association signals!
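To give a flavor of how approximate conditional analysis works from summary statistics, here is a simplified two-variant version in Python. Assuming standardized genotypes and small effects, the z-score of a target variant conditioned on a lead variant with LD correlation r is approximately (z_target − r·z_lead) / √(1 − r²). This is an illustration only: the study's analysis handled many variants jointly using LD from a reference panel, and the function name and inputs here are hypothetical.

```python
import math

def conditional_z(z_target, z_lead, r):
    """Approximate z-score of a target variant after conditioning on a lead variant.

    Assumes standardized genotypes, small per-variant effects, and an LD
    correlation r between the two variants (e.g. from a reference panel).
    """
    if not -1 < r < 1:
        raise ValueError("variants in perfect LD cannot be separated")
    return (z_target - r * z_lead) / math.sqrt(1 - r * r)
```

If the conditional z-score drops toward zero, the target's association is explained by the lead variant; if it stays large, the two are candidates for distinct signals, as at TCF7L2.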

All of the T2D associations from this study may be viewed in the T2DKP. They are represented in two datasets, named "DIAMANTE (European) T2D GWAS" and "UK Biobank T2D GWAS (DIAMANTE-Europeans Sept 2018)."  Manhattan plots showing the distribution of the associations across the genome may be seen by selecting either the "Type 2 diabetes" or "Type 2 diabetes adj BMI" phenotypes from the phenotype selection menu on the T2DKP home page. On Gene pages of the T2DKP, the results may be viewed in tables of variant associations and in the interactive LocusZoom visualization (see below). Results from this study are also displayed on Variant pages of the T2DKP.


LocusZoom plot on the PPARG Gene page


The credible set analysis performed in this study is also incorporated into the T2DKP. On the "Credible sets" tab of Gene pages, you may choose to visualize any of the credible sets available for the region. Epigenomic annotations that overlap the positions of the variants in the credible set are presented in an interactive display that allows you to select particular chromatin states or tissues to view. In the example shown below, one of the credible sets in the TCF7L2 region includes just two variants, and the one with the highest posterior probability overlaps active enhancer regions in adipose and liver tissue, both of which are important for T2D.
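Behind a display like this is a simple interval-overlap lookup: for each credible-set variant, find the chromatin-state segment that contains its position in each tissue. The Python sketch below assumes each tissue's segmentation is a sorted list of non-overlapping, half-open intervals; the function names, positions, and state labels are hypothetical and do not describe the portal's code.

```python
from bisect import bisect_right

def state_at(pos, segments):
    """Return the chromatin-state label covering `pos`, or None.

    `segments` is one tissue's segmentation: (start, end, label) tuples,
    half-open [start, end), sorted by start and non-overlapping.
    """
    starts = [start for start, _, _ in segments]
    i = bisect_right(starts, pos) - 1  # last segment starting at or before pos
    if i >= 0 and pos < segments[i][1]:
        return segments[i][2]
    return None

def annotate_credible_set(credible_set, segmentations):
    """Map each credible-set variant to its chromatin state in every tissue.

    credible_set: list of (variant_id, position, posterior_probability).
    segmentations: dict tissue -> segments (see state_at).
    """
    return {
        vid: {tissue: state_at(pos, segs) for tissue, segs in segmentations.items()}
        for vid, pos, _pp in credible_set
    }
```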


Detail of the Credible sets tab of the TCF7L2 Gene page

The multiple causal variants identified in this study support previous investigations of the biological mechanisms behind T2D and suggest new hypotheses that will likely lead to therapeutic insights. We invite you to read the paper and a blog post from the authors, then explore the results in the T2DKP and contact us with any suggestions or questions!

Friday, May 11, 2018

T2DKP Spring Newsletter

The latest issue of our quarterly newsletter is now available. Download it here and get the latest!

Wednesday, May 2, 2018

Join the Knowledge Portal Network team!

At the Knowledge Portal Network (currently consisting of the Type 2 Diabetes, Cerebrovascular Disease, and Cardiovascular Disease Knowledge Portals), we are looking for energetic, talented people to help us produce web portals that aggregate and serve genetic association results to the world in order to spark insights into complex diseases. There are positions open for a software engineer to help in developing and producing these web portals, and for a technical release manager to manage and coordinate tasks during production and maintenance of the portals.

The positions are located at the Broad Institute in Cambridge, MA, a dynamic and exciting work environment where cutting-edge science is applied to critical biomedical problems.

Find more details and apply for the software engineer or technical release manager positions at the Broad Careers site.

Tuesday, April 17, 2018

Developing a model for collaborative science: a mid-term perspective on the AMP T2D Partnership

In 2011, Dr. Francis Collins, Director of the National Institutes of Health (NIH), met with leaders in biomedical research to discuss a frustrating problem. Continual improvements in molecular biological and genomic techniques were generating an avalanche of data relevant to complex diseases, yet the translation of these data into insights about disease mechanisms and drug targets was unacceptably slow. It was clear that an entirely new paradigm for collaborative research would be needed to speed up the extraction of knowledge from data.

The result of these discussions was the creation of the Accelerating Medicines Partnership (AMP), one branch of which focuses on type 2 diabetes (T2D)—a life-threatening disease that affects hundreds of millions of people worldwide, whose incidence is growing, and whose progression cannot yet be effectively stopped or reversed. AMP T2D, a five-year project, includes the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK); the pharmaceutical companies Janssen Pharmaceuticals, Eli Lilly and Company, Merck, Pfizer, and Sanofi; the University of Michigan; the University of Oxford; the Broad Institute; and other researchers around the globe. The Foundation for the National Institutes of Health (FNIH) also provides funding and coordination for the project.

Drawing on the strengths of both academia and industry, this public-private partnership brings together all stakeholders in a pre-competitive space to share data and combine resources, with the goal of validating new drug targets faster. Now in Spring 2018, roughly mid-way through the funding period, it is evident that this collaboration has resulted in remarkable progress on both scientific and collaborative fronts.

Genetic association data: the foundation of AMP T2D

Genetic association studies interrogate the genomes of individuals at millions of specific genomic positions to discover sequence variants that are correlated with the incidence of disease. From the outset, AMP T2D aimed to support the generation of unprecedented amounts of new genome-wide association study (GWAS), exome sequencing, and whole-genome sequencing data within the project as well as their aggregation with all relevant publicly available data. Originally, 5 sites were funded by the NIDDK to generate new data and deposit them into the AMP T2D Data Coordinating Center (DCC) at the Broad Institute. As the project evolved, another site was funded by the NIDDK and 8 more sites were funded by the FNIH. Additionally, an Opportunity Pool of funds from the NIDDK was created, allowing the AMP T2D Steering Committee to award smaller grants for complementary research projects in a flexible, science-driven manner.  Currently 10 Opportunity Pool projects are in progress, and more awards will be given in the future.

Not only has the number of genetic association studies increased since the inception of AMP T2D, but also the number of samples surveyed in each has grown dramatically, from typically under 100,000 to approaching 1 million today. The increased statistical power conferred by these large sample sizes has led to a huge increase in the number of loci found to be significantly associated with T2D, from about 70 at the start of the project to nearly 430.
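The link between sample size and the number of discovered loci comes down to statistical power. As a rough illustration, here is a Python sketch using the standard large-sample approximation for an additive logistic-regression test, Var(log OR) ≈ 1 / (N·φ(1−φ)·2f(1−f)), where φ is the fraction of cases and f the risk-allele frequency. The function name and example numbers are hypothetical, and the approximation ignores covariates, imputation quality, and ancestry structure.

```python
import math
from statistics import NormalDist

def gwas_power(odds_ratio, allele_freq, n_total, case_fraction, alpha=5e-8):
    """Approximate power of an additive logistic-regression test for one variant.

    Uses Var(log OR) ~ 1 / (N * phi * (1 - phi) * 2f(1 - f)), a large-sample
    approximation that is reasonable for small effects.
    """
    beta = math.log(odds_ratio)
    se = 1 / math.sqrt(n_total * case_fraction * (1 - case_fraction)
                       * 2 * allele_freq * (1 - allele_freq))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = abs(beta) / se
    # Probability that the test statistic falls in either rejection tail.
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)
```

For a variant with odds ratio 1.1 and frequency 0.2 in a study with 10% cases, this approximation gives power of roughly 0.37 at N = 100,000 but essentially 1 at N = 1,000,000, which is why modest-effect loci keep appearing as studies grow.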

Improvements in genomic technologies in the past few years have allowed AMP T2D collaborators to generate increasing amounts of sequencing data, which make it possible to comprehensively interrogate all alleles and to uncover rare variation. At the project’s start, T2D associations with exome sequences (covering the protein-coding regions of the genome) were available for about 13,000 samples, and no whole-genome sequencing studies had been published. Now, more than 2,600 whole genomes are available, and analysis of a set of 50,000 exomes—the largest disease-specific aggregation of exome sequencing data to date—is nearly complete. Importantly, many of the associations that have been newly discovered in sequencing studies involve relatively rare variants that affect protein-coding regions. It is often more straightforward to develop hypotheses about the impact of such variants than it is for variants outside of coding regions.

As the AMP T2D partnership has grown in prominence in the diabetes field, the DCC has been approached by investigators outside the project who want to contribute their data in order to aggregate and display them in the context of AMP T2D data. In early 2017, researchers in the 70kforT2D project, which found novel T2D associations by re-analyzing existing GWAS data, offered their results for integration into the DCC and display in the Type 2 Diabetes Knowledge Portal (T2DKP; see below) before publication.

70kforT2D GWAS was the first pre-publication dataset from outside the AMP T2D partnership to be added to the T2DKP, and it was particularly appropriate that these scientists, whose results illustrate the value of data sharing, themselves chose to freely share their results. Incorporation of datasets into the AMP T2D DCC and T2DKP offers investigators the chance to take advantage of the expertise of the AMP DCC analysis team, apply cutting-edge analysis tools to their data, and display their results broadly to the T2D research community in the context of multiple datasets. The AMP T2D DCC is open to incorporating T2D-relevant datasets from all investigators (find details on contributing data here).

In addition to the datasets generated by AMP T2D partners and other T2D researchers, which focus on associations with T2D, glycemic measures, and T2D complications, the AMP T2D DCC also collects publicly available genetic association datasets for traits relevant to T2D, such as anthropometric measures, blood pressure and lipid levels, and heart and kidney disease.

Orthogonal data types to help identify and prioritize causal variants and genes

Finding genetic variants that are associated with T2D risk is critically important to understanding the genetics of T2D, but it is only a first step. The most significantly associated variant in a genomic region may not be the causal variant that is responsible for altered T2D risk. Researchers perform fine mapping to analyze genetic associations in specific regions of the genome and generate credible sets—that is, sets of variants that are predicted to include the causal variant. Mid-way through the AMP T2D funding period, emphasis among the data-generating partners is beginning to shift from simply generating association data to performing fine mapping and credible set analysis.
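As an illustration of how a credible set can be built from summary statistics, the Python sketch below computes Wakefield's approximate Bayes factor for each variant from its effect estimate and standard error, converts these to posterior probabilities assuming a single causal variant and equal priors, and then adds variants in order of decreasing posterior probability until the requested coverage is reached. The prior effect standard deviation of 0.2 and the function names are assumptions; published fine-mapping analyses may use different priors and methods that allow multiple signals.

```python
import math

def wakefield_log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor (association vs. null), Wakefield-style.

    prior_sd is the assumed standard deviation of the true effect (log OR).
    """
    v = se ** 2
    w = prior_sd ** 2
    z2 = (beta / se) ** 2
    return 0.5 * math.log(v / (v + w)) + 0.5 * z2 * w / (v + w)

def credible_set(variants, coverage=0.99, prior_sd=0.2):
    """Smallest set of top variants whose posterior probabilities reach `coverage`.

    Assumes one causal variant in the region and equal priors, so posteriors
    are normalized Bayes factors. variants: list of (variant_id, beta, se).
    """
    log_bfs = [wakefield_log_abf(b, s, prior_sd) for _, b, s in variants]
    top = max(log_bfs)
    weights = [math.exp(x - top) for x in log_bfs]  # subtract max for stability
    total = sum(weights)
    ranked = sorted(zip((v[0] for v in variants), (w / total for w in weights)),
                    key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for vid, pp in ranked:
        chosen.append((vid, pp))
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen
```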

But even after predicting which sequence variations are responsible for altered risk, finding clues about how they affect risk requires integration with additional data types. Information about the functional importance of the genomic region where a variant is located (its relevance to gene expression, protein function, networks and pathways, metabolite levels, and more, all determined on a tissue-specific basis) can help prioritize genes and pathways for in-depth experimental investigation. These lines of research were built into AMP T2D from the beginning, and as the importance of these data types became even clearer, several Opportunity Pool awards were given to projects focusing on complementary data types that shed light on the significance of genetic associations.

Several of these projects focus on generating tissue-specific epigenomic data: histone modifications, DNA methylation, chromatin conformation, transcription factor binding, 3-dimensional chromosome structure, and other data types. Epigenomic data can provide important clues about the mechanisms by which sequence variation affects T2D risk, particularly for variants that lie outside of protein-coding regions. For example, if a risk-associated variant is seen to disrupt a transcription factor binding site, this would support the hypothesis that the transcription factor and its target genes are relevant to T2D.
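One common way to ask whether a variant disrupts a transcription factor binding site is to score both alleles against a position weight matrix (PWM) for the factor. The Python sketch below builds a log-odds matrix from a hypothetical four-position motif, finds the best-scoring window on each allele's sequence, and reports the difference; a negative value suggests the alternate allele weakens the site. The motif, flanking sequences, pseudocount, and uniform background are illustrative assumptions, and only the forward strand is scanned for brevity.

```python
import math

# Hypothetical motif ("TAAT"-like): base counts at each of four positions.
MOTIF_COUNTS = [
    {"A": 1, "C": 1, "G": 0, "T": 8},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 1, "G": 1, "T": 8},
]

def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert base counts to log2(frequency / background) scores per position."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background)
                       for b in "ACGT"})
    return matrix

LOG_ODDS = log_odds_matrix(MOTIF_COUNTS)

def best_score(seq, matrix):
    """Highest motif score over all windows of `seq` (forward strand only)."""
    width = len(matrix)
    return max(sum(matrix[j][seq[i + j]] for j in range(width))
               for i in range(len(seq) - width + 1))

def allele_effect(left, ref, alt, right, matrix):
    """Best-score difference, alt minus ref; negative means a weakened site."""
    return best_score(left + alt + right, matrix) - best_score(left + ref + right, matrix)
```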

To make these data accessible to researchers, one Opportunity Pool award supports the creation of the Diabetes Epigenome Atlas, which collects and displays epigenomic datasets relevant to T2D. In the near future, these data will be fully integrated with genetic association data in the Type 2 Diabetes Knowledge Portal (see below).

Other Opportunity Pool projects are concerned with processes downstream of gene expression. Discovering interactions between proteins implicated in T2D risk, for example, could help to uncover all of the players in pathways important for the development of T2D, increasing the number of potential drug targets. Determining the effects of variants on the levels of key metabolites can illuminate the metabolic pathways that change during the development of T2D. 

In addition to generating all of these orthogonal data types, AMP T2D partners are developing algorithms and using machine learning to classify and prioritize variants on the basis of the functional annotations that accompany them. Finally, other Opportunity Pool projects will use model organisms to test and validate drug targets that are suggested by these analyses.

Tools and methods to speed analysis and interpretation

At the inception of AMP T2D it was also clear that the development of new methods and tools would need to accompany the generation of data, and support for these activities was built into the program. One major technical effort has addressed an obstacle to global data aggregation: because of institutional and national privacy regulations, some datasets may not leave their site of origin to be aggregated with other datasets at the AMP T2D DCC. A group at the European Bioinformatics Institute has built a technical replicate of the DCC and knowledgebase, so that data stored there are as accessible for browsing, searching, and interactive analysis as the data stored at the AMP T2D DCC at the Broad Institute. This federation mechanism allows global data accessibility even when data aggregation is not permitted.

Other efforts supported by AMP T2D aim to improve the speed and efficiency with which data can be taken in and analyzed. In one project, a data intake system is being developed that will streamline the process for both data submitters and the DCC team, and will be applicable to data submission both at the Broad DCC and at other federated sites. Another project has created a software pipeline, LoamStream, that will largely automate quality control and association analysis of incoming data. Currently, LoamStream is in use for quality control of genotype data, and this has already greatly reduced the time required to process new datasets. Future work will extend the pipeline to association analysis and will also allow it to take in sequence data as well as genotype data.
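To make "quality control of genotype data" concrete, two checks that genotype QC pipelines commonly apply to each variant are a minimum call rate and a test for departure from Hardy-Weinberg equilibrium (HWE). The Python sketch below implements both, using a one-degree-of-freedom chi-square HWE test. The thresholds (call rate ≥ 0.98, HWE p ≥ 10⁻⁶) and function names are hypothetical; this is not a description of LoamStream's actual steps.

```python
import math

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) goodness-of-fit p-value for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip((n_aa, n_ab, n_bb), expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square, 1 df

def passes_variant_qc(n_aa, n_ab, n_bb, n_missing,
                      min_call_rate=0.98, min_hwe_p=1e-6):
    """Keep a variant only if it is well genotyped and consistent with HWE."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    return call_rate >= min_call_rate and hwe_chi2_p(n_aa, n_ab, n_bb) >= min_hwe_p
```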

A genetic association of a variant with T2D gains credibility if multiple independent studies replicate the association. Thus, it is important for researchers to be able to evaluate the weight of available evidence. But currently this is difficult to assess from the association datasets in the AMP T2D DCC, because many are based on overlapping sets of subjects. AMP T2D partners at the University of Michigan and University of Oxford are working on a method to take these overlaps into account and synthesize associations from multiple datasets into a “bottom-line” significance for association of a variant with T2D, which will aid in prioritizing variants for future work.
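To see why overlapping subjects matter, consider the weighted z-score method of meta-analysis. If the same individuals contribute to two datasets, their z-scores are correlated, and treating them as independent overstates the combined significance. A generic remedy is to divide the weighted sum by its standard deviation, √(wᵀRw), where R holds the null correlations between the studies' z-scores. The Python sketch below shows that approach; it is not the method under development at Michigan and Oxford, and R is assumed to be known (for example, estimated from z-score correlations at null variants).

```python
import math
from statistics import NormalDist

def combined_z(z_scores, weights, corr):
    """Weighted z-score meta-analysis allowing for correlated studies.

    corr[i][j] is the null correlation between the z-scores of studies i and j
    (1 on the diagonal); overlapping samples make it positive. With the
    identity matrix this reduces to the usual weighted Z method.
    """
    k = len(weights)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(weights[i] * weights[j] * corr[i][j]
                   for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)

def two_sided_p(z):
    """Two-sided normal p-value for a z-score."""
    return 2 * NormalDist().cdf(-abs(z))
```

Two identical datasets each with z = 4 give a combined z of about 5.66 if wrongly treated as independent, but exactly 4 once their complete overlap is modeled.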

Multiple AMP T2D projects for analysis, interpretation, and custom interactive analysis of variant-phenotype associations are ongoing at the Universities of Michigan, Chicago, and Oxford, Vanderbilt University, and the Broad Institute. These projects are aimed at facilitating, in various ways, the path from variant associations to functional knowledge, and all have been or will be integrated into the T2D Knowledge Portal (see below).

Hail software offers a pipeline that speeds up the analysis of huge genomic datasets, while the gnomAD resource aggregates and harmonizes exome and genome sequences from more than 100,000 people to provide a catalog of genetic diversity that aids in the interpretation of variant associations with disease. A tool under development in the gnomAD project will display the effects of variants on protein structures as another way to deduce their potential impact.

Other analysis modules include gene-based association methods for using expression data to predict genes that may impact a phenotype (PrediXcan and MetaXcan), and a phenome-wide association study (PheWAS) method for visualization of the associations of a variant across multiple phenotypes, which is a crucial consideration during drug development. 
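As a rough illustration of the expression-based approach: PrediXcan predicts a gene's expression as a weighted sum of allele dosages, with weights trained in a reference expression panel, and MetaXcan approximates the resulting gene-level z-score from SNP summary statistics as z_g ≈ Σ w_l (σ_l / σ_g) z_l, where σ_l² is the genotype variance of SNP l and σ_g² = wᵀΓw for the genotype covariance matrix Γ. The Python sketch below implements those two formulas with plain lists; the function names and inputs are hypothetical, and the real tools also handle weight databases, allele harmonization, and missing SNPs.

```python
import math

def predicted_expression(weights, dosages):
    """PrediXcan-style predicted expression: weighted sum of allele dosages (0-2)."""
    return sum(w * d for w, d in zip(weights, dosages))

def summary_gene_z(weights, snp_z, geno_cov):
    """MetaXcan-style gene z-score from SNP z-scores and a genotype covariance matrix.

    z_gene ~ sum_l w_l * (sigma_l / sigma_g) * z_l,
    with sigma_l^2 = geno_cov[l][l] and sigma_g^2 = w' geno_cov w.
    """
    n = len(weights)
    var_g = sum(weights[i] * weights[j] * geno_cov[i][j]
                for i in range(n) for j in range(n))
    sigma_g = math.sqrt(var_g)
    return sum(w * math.sqrt(geno_cov[l][l]) / sigma_g * z
               for l, (w, z) in enumerate(zip(weights, snp_z)))
```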

The interactive visualization tool LocusZoom will integrate many of these methods to display variant associations and credible sets, epigenomic and functional annotations, and phenotype associations across a genomic region as well as offering custom association analysis.

An example LocusZoom plot


AMP T2D Knowledge Portal: democratizing T2D genetic results for researchers world-wide

AMP T2D was founded on the idea that in order to truly accelerate progress, genomic information must be freely accessible to all scientists and presented in a way that is understandable by a broad range of researchers working on T2D biology, not only by human geneticists and bioinformaticians with special computational skills. So the roadmap for the project included not only data generation and analysis, but also the production of a publicly available web resource that would integrate data types, interpret the evidence, and present all of these results.

While it is under continuous development, mid-way through the initial funding period the T2D Knowledge Portal (T2DKP) is already a well-established resource. Other web resources collect genetic association data, but the T2DKP is unusual in providing harmonized datasets to which a consistent analysis pipeline has been applied. Rather than simply cataloging datasets, it offers distilled and synthesized results along with their interpretation, to guide more detailed exploration of the evidence. And, unlike any other extant resource, it offers researchers the ability to perform interactive queries on protected individual-level data. 

T2DKP home page

The Gene page of the T2DKP (see an example) illustrates the presentation of immediately understandable summary information along with the opportunity to drill down to the details. An algorithm considers the associations of all variants across a gene, for all phenotypes and in all datasets aggregated at the DCC, and calculates from them a “traffic light” signal for the gene: green to indicate that there is a significant association for at least one phenotype; yellow to indicate suggestive, if not highly significant, associations; and red to indicate that there is no evidence for association for any of the phenotypes considered in the T2DKP. Below this, tables and graphics invite users to explore all variants across the gene, their impacts on the encoded protein, and their associations, as well as their positions relative to epigenomic marks across the region in multiple tissues.
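The logic of the gene-level signal can be sketched as a simple threshold rule on the strongest association across variants, phenotypes, and datasets. The thresholds in this Python sketch (5 × 10⁻⁸ for green and 10⁻⁴ for yellow) are hypothetical placeholders; the portal's actual algorithm may use different cut-offs, for example for different variant classes or for gene-level tests.

```python
# Hypothetical thresholds; the portal's real cut-offs may differ.
SIGNIFICANT_P = 5e-8
SUGGESTIVE_P = 1e-4

def traffic_light(p_values):
    """Summarize a gene's evidence as 'green', 'yellow', or 'red'.

    p_values: association p-values for every variant in the gene,
    across all phenotypes and datasets considered.
    """
    best = min(p_values, default=1.0)
    if best <= SIGNIFICANT_P:
        return "green"
    if best <= SUGGESTIVE_P:
        return "yellow"
    return "red"
```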

The T2DKP currently offers the ability to run custom, interactive association analyses using two different tools. In the LocusZoom visualization, users may choose one or more variants as covariates before performing association analysis. The Genetic Association Interactive Tool (GAIT) for single variant associations, which also powers the custom burden test for gene-level associations, is even more versatile, presenting the distributions of different characteristics of the sample set (age, sex, BMI, glycemic measures, blood lipid levels, and many more) and allowing users to filter the set by multiple criteria and to choose custom covariates before performing association analysis. Both of these tools allow analytical access to the individual-level data, whether housed at the Broad DCC or at the EBI federated node, in a secure environment so that data privacy is always protected.
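For readers new to burden tests, the core idea is to collapse the rare variants in a gene into a single "carrier" indicator per person and then compare carrier counts in cases and controls. The Python sketch below does this with Fisher's exact test on the resulting 2 × 2 table. It is a deliberately simple stand-in: GAIT's burden test allows custom covariates and sample filters, which this sketch does not attempt, and the function names and data layout here are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum probabilities of all tables no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-7)))

def burden_test(case_genotypes, control_genotypes, qualifying):
    """Collapse qualifying variants into carrier status and compare cases vs controls.

    *_genotypes: per-sample dicts mapping variant id -> alternate allele count.
    qualifying: set of variant ids in the mask (e.g. predicted loss of function).
    """
    def carriers(samples):
        return sum(any(g.get(v, 0) > 0 for v in qualifying) for g in samples)

    a = carriers(case_genotypes)
    c = carriers(control_genotypes)
    return fisher_exact_two_sided(a, len(case_genotypes) - a,
                                  c, len(control_genotypes) - c)
```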

Evolution of a collaborative environment

AMP T2D organization


The AMP T2D partnership is a multifaceted project (illustrated above) that embraces several aspects of basic research and combines them with building a product, the T2DKP. In connecting scientists both within and outside of consortia, in academia and in industry, working on genetic associations or functional studies, it is becoming the nexus of the T2D genetics community. Researchers are finding the T2DKP helpful for accessing even their own results and for viewing them in the context of multiple phenotypic associations and other complementary data types. Pharmaceutical partners are finding help via the Target Prioritization project, in which the tools and methods developed within AMP T2D are being used to prioritize a list of genes of mutual interest for further investigation.

Perhaps most importantly, AMP T2D has made researchers—both within and outside of the project—aware of the value of sharing data for representation in the context of all other relevant data. Only by compiling and interpreting all available information will we be able to make the best hypotheses about genes and pathways that are possible drug targets and prioritize them for in-depth functional investigation.

AMP T2D and beyond

In the remainder of the initial AMP T2D funding period, we expect continued progress in each of the areas discussed above. The data intake and analysis pipelines will be improved, and new data will be incorporated at an increasing pace—including data from the UK Biobank, which has generated association results for 500,000 genotyped subjects and more than 2,500 traits. Associations will be added for many more phenotypes related to T2D, including diabetic complications and longitudinal phenotype data that connect the development of various traits to the timeline of incident T2D.  Much more T2D-relevant epigenomic data will be available for query as well as for browsing, via dynamic connection with the Diabetes Epigenome Atlas. And entirely new data types (for example, metabolomic and proteomic data) arising from Opportunity Pool projects will be added to the T2DKP.

Ongoing work on tools and methods will result in the addition of many more interactive modules to the T2DKP. Researchers will be able to view PheWAS data; prune lists of variants by their linkage disequilibrium relationships; calculate credible sets and genetic risk scores with custom parameters; perform more versatile interactive burden tests; prioritize genes by pre-calculated association scores; overlay the positions of coding variants on protein structures to help assess their impact; and perform enrichment analysis on sets of loci to suggest pathways implicated in disease processes.
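Two of the planned modules, LD pruning and genetic risk scores, rest on simple ideas. The Python sketch below shows greedy LD clumping (keep the most significant variant, discard variants in high LD with it, repeat) and a weighted risk score (the sum of risk-allele dosages times per-allele effect sizes). The r² threshold of 0.1, the function names, and the way LD is supplied are assumptions for illustration, not the portal's implementation.

```python
def ld_prune(variants, r2, threshold=0.1):
    """Greedy LD clumping: keep the best variant, drop its LD proxies, repeat.

    variants: list of (variant_id, p_value).
    r2: function (id1, id2) -> squared correlation, e.g. from a reference panel.
    """
    kept = []
    for vid, _ in sorted(variants, key=lambda v: v[1]):
        if all(r2(vid, k) < threshold for k in kept):
            kept.append(vid)
    return kept

def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0-2) over variants that have a weight."""
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)
```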

The Knowledge Portal platform developed for AMP T2D has already proved extensible to other complex diseases: in 2017, both the Cerebrovascular Disease and Cardiovascular Disease Knowledge Portals were launched. In the future, connections within the ecosystem formed by the T2D, Cerebrovascular, and Cardiovascular Portals will be improved, so that researchers can easily assess the impact of a variant or involvement of a gene for all of these related diseases. If funding and collaboration considerations allow, perhaps one day these Portals will merge into a single cardiometabolic disease genetics Knowledge Portal to accelerate the development of new therapeutics in this broader area.

Finally, the ultimate goal of this funding period is that by its end, the data generation, analysis, and interpretation will have facilitated the validation of multiple promising drug targets for further investigation. Given the rate of progress on multiple fronts, this seems a realistic goal. We hope that this unique collaborative environment will continue to accelerate T2D genetic research and will become a paradigm for other research communities.

Monday, April 9, 2018

Those hoofbeats just might come from zebras

Image by Eric Dietrich via Wikimedia Commons
A physician in the 1940s wanted to convey to his students that the most obvious diagnosis is most likely to be the correct one, so he coined a saying that has become famous: “When you hear hoofbeats, think of horses not zebras.” Applying this concept to complex disease genetics, if a risk-associated variant changes an amino acid in a protein-coding sequence (a non-synonymous change), the first hypothesis to consider is that it affects disease risk by altering the protein. But although this is often the case, one of the lessons we can learn from a large new study, published today and now available for browsing and searching in the T2D Knowledge Portal, is that we should not forget about zebras.

The new study, from a global coalition of scientists (Mahajan et al., Nature Genetics 2018), is an exome-wide association study that surveyed the T2D associations of variants within the protein-coding regions of the genome. With more than 81,000 T2D cases, over 370,000 controls, and participants of multiple ancestries, this study has an effective sample size three times larger than that of any previous study. Using p < 2.2 × 10⁻⁷ as the threshold for significance across the exome, the authors found 69 significantly associated coding variants representing 40 distinct association signals in 38 loci, 16 of which had not been previously associated with T2D risk.
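Effective sample size is what makes studies with different case/control balances comparable. A common definition for a case-control study is N_eff = 4 / (1/N_cases + 1/N_controls), which equals the total sample size for a balanced design and shrinks as the design becomes unbalanced. The Python sketch below computes it. In a meta-analysis N_eff is usually summed over the contributing studies, so applying it to the pooled totals quoted above is only a rough illustration and will not reproduce the paper's figure.

```python
def effective_sample_size(n_cases, n_controls):
    """Case-control effective sample size: 4 / (1/N_cases + 1/N_controls).

    Equals N_cases + N_controls for a balanced design; imbalance lowers it.
    """
    return 4 / (1 / n_cases + 1 / n_controls)
```

For example, `effective_sample_size(81_000, 370_000)` comes to roughly 266,000.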

To get a better idea of which variants in these loci were causal for T2D risk, the researchers performed fine mapping for 37 of the 40 significant signals. They meta-analyzed T2D associations for over 500,000 individuals of European descent, performed imputation, and then generated 99% credible sets for each signal—that is, sets of variants that are 99% likely to include the causal variant. To calculate the credible sets, they used an “annotation-informed prior” model of causality that took into account the distribution of associations for different variant impact classes and also the overlap of variants with putative enhancer elements.
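The idea behind an annotation-informed prior can be sketched in a few lines: each variant's prior probability of being causal is scaled by a weight for its annotation class before being combined with its Bayes factor. In the Python sketch below, the class names and enrichment weights are hypothetical placeholders. In the study, the priors were informed by the observed distribution of associations across variant impact classes and by enhancer overlap; the exact model is described in the paper.

```python
import math

def annotation_informed_pp(log_bfs, annotations, enrichment):
    """Posterior probabilities of causality with annotation-dependent priors.

    log_bfs: dict variant -> log Bayes factor for association.
    annotations: dict variant -> annotation class (e.g. 'coding', 'enhancer').
    enrichment: dict class -> relative prior weight (hypothetical values).
    Assumes exactly one causal variant in the region.
    """
    log_post = {v: lbf + math.log(enrichment[annotations[v]])
                for v, lbf in log_bfs.items()}
    top = max(log_post.values())
    unnormalized = {v: math.exp(x - top) for v, x in log_post.items()}
    total = sum(unnormalized.values())
    return {v: u / total for v, u in unnormalized.items()}
```

With equal Bayes factors, a variant in a class weighted 3:1 receives a posterior probability of 0.75, which shows how annotations can shift credible sets toward coding or enhancer variants.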

The 37 association signals for which the authors generated credible sets were all due to coding variants that would cause changes in the sequence of the encoded protein. But surprisingly, the fine mapping analysis found that coding variants were likely to be causal for T2D risk at fewer than half of these loci.

One of these surprising results involves a gene that is well-known to be relevant to T2D: PPARG. Involvement of the PPARG protein in T2D is beyond doubt, since this ligand-inducible transcription factor is the target of thiazolidinedione drugs that are used to treat T2D. A common variant in PPARG, rs1801282, that causes a p.Pro12Ala change in the protein has been assumed to account for the T2D association, but there is little experimental evidence that this change affects PPARG function.

In the credible set generated in this study, the probability that rs1801282 is causal was not found to be particularly high. Included in this credible set along with rs1801282 are 19 non-coding variants. One of these was previously shown to affect a binding site for the transcription factor PRRX1 and to affect expression of PPARG2, a PPARG isoform. This suggests the intriguing possibility that the T2D risk in this locus is caused, partly or wholly, by variants affecting regulation rather than protein sequence.

A similar pattern, with partial causality due to non-coding variants, was seen at an additional 7 loci. And in 13 other loci, even though these loci were discovered via coding variant signals, non-coding variants had the highest probability of causing risk.

According to Professor Mark McCarthy of the University of Oxford, one of the principal investigators of the study, “Our study shows that we should not jump to conclusions when we see that one of our association signals includes a variant around which we can base an attractive mechanistic narrative. The “average” coding variant is more likely to be causal than the “average” noncoding variant, but even at the set of loci where we detect a significant coding variant association, it is as likely as not that the signal is driven instead by one of the non-coding variants nearby. By bringing together genetic and genomic data, we can improve our prospects for finding the causal variants at GWAS loci, but these should be the starting points for empirical studies not a destination in themselves.” Dr. McCarthy has written a commentary on this study; read it here.

So, in investigating complex disease genetics, it is still a good bet that a coding variant affects disease risk via altered protein sequence: at least in some parts of the world, hoofbeats are very often due to horses. But this study reminds us that it is always a good idea to look beyond the obvious hypothesis, and remember the zebras.

This paper includes many other discoveries, and we recommend that you read the paper to get the full story. We are pleased to announce that in addition to publishing the paper, the authors have made their results available to the T2D research community immediately upon publication, in the T2D Knowledge Portal.

The dataset in the T2DKP is named ExTexT2D (ExTended exome array genotyping for T2D) and includes associations for T2D, both unadjusted and adjusted for BMI. A description of the dataset along with a table listing the cohorts of the study subjects can be found on the Data page, and you can browse and search the ExTexT2D exome chip analysis dataset at these locations in the T2DKP:

On Gene pages (see an example) on the Common variants and High-impact variants tabs
On Variant pages (see an example) in the Associations at a glance section and the Association statistics across traits table
Via the Variant Finder search
View a Manhattan plot of associations across the genome by selecting “type 2 diabetes” or “type 2 diabetes adj BMI” in the View full genetic association results for a phenotype menu on the home page.

This dataset offers by far the largest sample size for exploring associations of low-frequency and common coding variants with T2D. The size of the study enabled evaluation of which coding variants mediate GWAS signals and which are simply "proxies" to the true causal variant, as revealed in the credible set analysis. With the addition of this dataset, the T2DKP offers in-depth information on two aspects of exome associations: common and low-frequency variant associations in ExTexT2D, and comprehensive coding variant associations in the 19K exome sequence analysis dataset (soon to include 50,000 exomes).

We are pleased to provide access to these important new results. Please contact us with any questions or comments about these new data or the T2DKP in general!

Tuesday, March 6, 2018

T2DKP Winter Newsletter

The latest issue of our quarterly newsletter is now available. Download it here to find out what we've been up to!

Friday, January 19, 2018

New METSIM dataset adds individual-level GWAS data to the T2DKP

The Finnish population is a valuable genetic resource. Having undergone multiple population bottlenecks, this relatively homogeneous population is enriched in low-frequency and loss-of-function variants. Even better, Finns are generally willing to participate in research studies, and many measures of their health are detailed in comprehensive electronic health records.

To take advantage of these characteristics, the METSIM (Metabolic Syndrome in Men) study (Laakso et al. 2017, J. Lipid Res. 58, 481-493) was initiated in 2005. Over 10,000 Finnish men were examined between 2005 and 2010. All of the subjects were phenotyped extensively, with an emphasis on traits associated with type 2 diabetes (T2D), cardiovascular disease, and insulin resistance, and their genotypes and exome sequences were determined. Subsets of the group have been characterized in more detail, with whole-genome sequencing and detailed analyses of transcripts and gene expression, DNA methylation, gut microbiome composition, and other phenotypes.

Now, you can easily access results from the METSIM cohort in the T2D Knowledge Portal. Variant associations with T2D, fasting glucose levels, and fasting insulin levels are available, both unadjusted and adjusted for body mass index. The individual-level data are also available for interactive analyses using our Genetic Association Interactive Tool (GAIT; see below), which allows you to design and run association analyses on custom subsets of the samples, while always protecting patient privacy. The addition of METSIM data brings the number of samples available for analysis in GAIT to nearly 68,000.

The Foundation for the NIH and the Accelerating Medicines Partnership in Type 2 Diabetes were instrumental in bringing these data, generated by researchers in Finland and the U.S., to the T2DKP. Individual-level genotype data from 1,185 T2D cases and 7,357 controls were deposited into the Data Coordinating Center (AMP T2D DCC), and analysis and quality control were performed by the DCC analysis team. The experiment design and analysis are summarized on our Data page, and detailed reports that fully document the analysis are available for download.

The METSIM GWAS dataset currently has "Early Access Phase 1" status in the T2DKP, which is assigned to new data. This status denotes that although analysis and quality control checks have been performed, the data are not yet considered to be in their final state. During the early access period, users may analyze the data but may not submit the results of these analyses for publication. Find full details about the different phases of data release on our Policies page.

Results from METSIM GWAS may be viewed at these locations in the T2D Knowledge Portal:

• On Gene Pages (e.g., MTNR1B) in the Common variants and High-impact variants tables and in LocusZoom static plots, for the phenotypes T2D, T2D adjusted for BMI, fasting glucose, fasting glucose adjusted for BMI, fasting insulin, and fasting insulin adjusted for BMI;

• On Variant Pages (e.g., rs579060) in the Associations at a glance section, the Association statistics across traits table, and in LocusZoom static plots;

• From the View full genetic association results for a phenotype search on the home page: first select one of the phenotypes listed above, and then on the resulting page, select the METSIM GWAS dataset.

Individual-level METSIM GWAS data may be used for custom interactive analyses using these tools in the T2DKP:

• Using the Variant Finder tool, you may specify multiple criteria and retrieve the set of variants meeting those criteria;

• Using the Genetic Association Interactive Tool (GAIT) on Variant Pages, you may select the METSIM GWAS dataset, choose one of 5 phenotypes for association analysis, choose custom covariates, and filter the sample pool by specifying a range of values for one or more of 8 different phenotypes, then run on-the-fly analysis.
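GAIT runs its analyses on the Portal's servers, and this post doesn't describe how the range filters are applied internally. As a minimal sketch under stated assumptions, the range-filtering step can be pictured as follows. The phenotype names, the inclusive bounds, and the choice to drop any sample with a missing value for a filtered phenotype are all hypothetical.

```python
def in_range(value, low, high):
    """True if value is present and within [low, high]; a bound of None means unbounded."""
    return (value is not None
            and (low is None or value >= low)
            and (high is None or value <= high))


def filter_samples(samples, ranges):
    """Keep samples whose values fall inside every requested range.

    samples: list of dicts mapping phenotype name -> value (None if missing).
    ranges: dict mapping phenotype name -> (low, high), inclusive.
    """
    return [s for s in samples
            if all(in_range(s.get(pheno), low, high)
                   for pheno, (low, high) in ranges.items())]


# Hypothetical samples and phenotype names.
samples = [
    {"id": "S1", "BMI": 22.5, "age": 54},
    {"id": "S2", "BMI": 31.0, "age": 61},
    {"id": "S3", "BMI": 24.9, "age": None},
    {"id": "S4", "BMI": 26.0, "age": 47},
]
lean_older = filter_samples(samples, {"BMI": (None, 25.0), "age": (50, None)})
print([s["id"] for s in lean_older])
```

Only the filtered subset would then go forward to the association analysis, together with whichever covariates were chosen.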

Phenotypes available for association analysis of METSIM GWAS data in GAIT


Covariates available for selection when analyzing METSIM GWAS data in GAIT


Samples may be filtered by setting ranges for one or more of 8 phenotypes for the METSIM GWAS dataset


Wednesday, August 30, 2017

Bringing the power of epigenomics to the T2DKP

Until recently, all of the results displayed in the Type 2 Diabetes Knowledge Portal (T2DKP) were based on genetic association data: the significance with which variants, or SNPs, occur in people’s genomes in conjunction with a disease or trait.

This information is hugely important for pinpointing regions of the genome that contribute to disease risk. It is now relatively straightforward to identify these regions, but it is still a large challenge to discover the mechanisms by which they act, especially for variants that are outside of coding sequences and have no obvious effect on the sequence of a particular protein. These non-coding variants, the type most commonly seen in genetic association studies, are likely to affect tissue-specific gene regulation that could be important to the disease process.

How can we overcome this challenge to find clues about the effects of these non-coding variants? Epigenomic data to the rescue!

Dr. Kyle Gaulton of the University of California at San Diego researches the transcriptional regulatory networks involved in type 2 diabetes by using epigenomic data in concert with genetic association data. He explains, "Regulatory elements control gene production and function, and are often highly specialized across cells and tissues and located far away from the genes they regulate. Molecular epigenomic hallmarks of gene regulation such as histone and DNA modifications, nucleosome depletion, chromatin conformation and DNA-protein interactions can pinpoint the precise genomic locations of regulatory elements. High-resolution epigenome maps of regulatory elements in pancreatic islets, liver, muscle, adipose and many other human tissues can then enable annotation of non-coding genetic variants and their potential gene regulatory functions. These maps are thus an invaluable component of determining how type 2 diabetes associated non-coding variants influence disease pathogenesis."

A recent paper from Dr. Gaulton and colleagues (Gaulton, KJ, et al. (2015) Nat Genet. 47:1415) illustrates the power of integrating these two data types. By combining information on transcription factor binding sites and tissue-specific chromatin states with genetic fine-mapping of T2D-associated loci, the authors elucidated the molecular mechanisms behind the effects of some T2D-associated variants, uncovering the role of the FOXA2 transcription factor in glucose homeostasis in T2D-relevant tissues.

Now, the T2DKP facilitates this type of analysis by presenting both genetic association and epigenomic data on Gene and Variant pages. We described the display of epigenomic data on Variant pages in a recent blog post. On Gene pages, epigenomic data are integrated into the LocusZoom display.

Locations of variants associated with T2D and chromatin states in pancreatic islets, across the SLC30A8 gene (partial view)


Below the plot of variant associations, chromatin states are displayed by default for the major T2D-relevant tissues. Using the pull-down menu at the top of the plot, you can choose additional tissues and cell types to display from a diverse set. All of the details on how to use this interactive plot are included in our Gene Page guide.

This is only the first step for epigenomic data in the T2DKP. In the future, we plan to include additional types of epigenomic data that indicate chromatin accessibility and conformation. We will also add functionality; for example, for any given variant, you will be able to search for the tissues in which enhancer regions overlap the location of that variant.

As we actively develop this aspect of the T2DKP, we welcome your suggestions!

Thursday, August 17, 2017

New member of the Knowledge Portal family: the Cerebrovascular Disease Knowledge Portal

We are pleased to announce today’s launch of the Cerebrovascular Disease Knowledge Portal (CDKP), an open-access resource for the genetics of stroke built on the framework and infrastructure of the Type 2 Diabetes Knowledge Portal (T2DKP). The CDKP aggregates data from five large genome-wide association studies for stroke, and presents them along with GWAS results for T2D and other cardiometabolic and biometric phenotypes as well as epigenomic data from a wide range of tissues.


CDKP home page


Users of the T2DKP will find familiar interfaces in the CDKP, which offers the same three major entry points for exploring the data: Gene and Variant pages; the Variant Finder tool; and pages displaying genome-wide association results for each phenotype. Summary-level data are presented for browsing and searching, and researchers may perform custom analyses using individual-level data via the Genetic Association Interactive Tool (GAIT) or LocusZoom. Using the CDKP, T2D researchers can now check their favorite variants and genes for associations with a range of phenotypes related to cerebrovascular health and disease.

The CDKP has two additional layers of functionality relative to the T2DKP, addressing particular needs of the stroke research community. A Downloads page provides files of summary statistics from recent stroke genetic association studies. And a home page link leads to the Precision Medicine Platform (PMP) of the American Heart Association Institute for Precision Cardiovascular Medicine, where authorized researchers may work with selected sets of individual-level data in a secure computing environment.

The Knowledge Portal (KP) framework was designed and built by a team at the Broad Institute as part of the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D), a public-private partnership that seeks to speed up the translation of genetic association data for T2D and related traits into actionable knowledge for new T2D treatments. In a collaboration with the International Stroke Genetics Consortium, funded by the National Institute of Neurological Disorders and Stroke, the Broad team incorporated stroke genetic data into the KP framework and customized the user interface for the stroke genetics research community.

This first application of the scalable, open-source KP software platform to a complex disease area other than T2D has paved the way for future collaborations to extend this platform to additional diseases, facilitating the translation of genetic data into actionable knowledge to improve human health.

Tuesday, July 11, 2017

Inaugural issue of the T2DKP quarterly newsletter

We've started a quarterly newsletter to keep you informed of the latest developments at the T2D Knowledge Portal. Download our Summer 2017 issue!

Monday, June 19, 2017

T2D Portal team at ADA 2017

Members of the T2D Knowledge Portal team returned last week from the 77th Scientific Sessions of the American Diabetes Association, inspired and invigorated by many great discussions with T2D researchers, educators, and clinicians.

In preparation for the conference, we set ourselves goals to add several new features to the Portal:

  • incorporate several new datasets and implement a new interactive Data page for exploring all datasets (see details)
  • add epigenomic data to shed light on the potential regulatory roles of genomic regions (see details)
  • implement a complete redesign of the Gene page that integrates multiple datasets to summarize the significance of each gene to T2D and related phenotypes (see details)
  • connect with the new Federated Node of the Portal at EBI to provide seamless access to data housed there alongside data housed at the AMP T2D Data Coordinating Center at the Broad Institute (see details)

On the first day of the conference, Noël Burtt and Jason Flannick presented a mini-symposium focusing on the Portal to several hundred attendees.




This clearly generated a lot of interest: our exhibit booth was a busy place for the next three days.


T2D Portal team members at our exhibit booth

Multiple conversations happened at the booth!

We handed out a general guide to the Portal (download), and also presented a moderated poster (download).

At the booth, we especially enjoyed talking with people in the T2D field who are not geneticists but are simply curious about the genetics of T2D and the mission of the Portal. We encourage everyone to explore the Portal and to feel free to ask us any questions, even if they seem elementary. Please contact us any time with questions or feedback!

Monday, June 12, 2017

T2D Knowledge Portal now distills and summarizes genetic information for individual genes

The Type 2 Diabetes (T2D) Knowledge Portal presents genetic data relevant to T2D on two major types of page: Variant pages for individual variants, or SNPs; and Gene pages focusing on individual genes. Visual displays on Variant pages provide an immediate indication of the possible significance of each variant for T2D. But until now, Gene pages have presented large amounts of information from disparate sources without much integration or interpretation to guide the viewer.

Now, that has all changed with our release of the new Gene page. It guides researchers through an organized workflow that can help them take advantage of the aggregated data in the Portal to move from a variant of interest, to a gene of interest, to an assessment of the potential involvement of that gene’s product in T2D.

The central feature of the new Gene page is an at-a-glance display that summarizes the strength of the evidence for associations of the gene with T2D or related traits. An algorithm scans the comprehensive collection of datasets within the Portal to find data on variants in the gene, and the overall conclusion is shown by a “traffic light” icon. A green light indicates that there is strong evidence for association of at least one variant in the gene with at least one phenotype; a yellow light indicates that there is suggestive evidence, and a red light indicates that the data aggregated in the Portal contain no evidence for associations of variants within this gene.
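The post doesn't state the exact cutoffs behind the traffic light, so the following is only a minimal sketch of how such a three-level summary could work. It assumes the algorithm takes the smallest p-value among all variants in the gene, across every phenotype and dataset, and compares it with two thresholds: the conventional genome-wide significance level of 5 x 10-8 for "strong" evidence, and a hypothetical 1 x 10-4 for "suggestive" evidence.

```python
GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold
SUGGESTIVE = 1e-4   # hypothetical cutoff for "suggestive" evidence


def traffic_light(p_values):
    """Summarize a gene from the p-values of its variants across all phenotypes and datasets."""
    best = min(p_values, default=1.0)  # no data at all counts as no evidence
    if best <= GENOME_WIDE:
        return "green"   # strong evidence for at least one association
    if best <= SUGGESTIVE:
        return "yellow"  # suggestive evidence
    return "red"         # no evidence in the aggregated data


print(traffic_light([1e-3, 2e-9]))
```

Taking the minimum reflects the wording above: a single strongly associated variant, with any one phenotype, is enough for a green light.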

Figure 1. Traffic light display for MTNR1B


Several sections of the page below the traffic light allow the user to drill down to much more information about the variants within the gene, their individual associations, and their collective impact on the disease burden of the gene. An interactive LocusZoom plot allows users to view the linkage disequilibrium relationships and associations from multiple datasets, with a wide variety of phenotypes, for common variants. The plot also displays the location of chromatin states, which can indicate the regulatory role of a region, in multiple tissues.


Figure 2. LocusZoom plot of the credible set of T2D-associated variants in MTNR1B (above) and chromatin state annotations for the region (below).

In the example shown above, the traffic light (Fig. 1) shows that variants in the MTNR1B gene encoding the melatonin receptor have one or more strong phenotypic associations (view the MTNR1B Gene page in the T2D Knowledge Portal). The table of common variants for MTNR1B (not shown) tells us that the most significantly associated variant is rs10830963. And a view of the LocusZoom plot for the credible set of variants associated with T2D (Fig. 2, top) shows that in fact the credible set for this region contains only rs10830963, further supporting its significance. The chromatin state annotations for this region (Fig. 2, bottom) provide evidence for a regulatory effect in pancreatic islets, consistent with a potential role in T2D. This information, easily found in the Portal today, replicates the results of a 2015 genetic analysis that required over 100 authors (Gaulton, KJ, et al. (2015) Nature Genetics 47:1415).

The new Gene page presents a lot of information and we can't cover it all in this space. But don't worry, we've created a guide to the page that explains every feature in detail. It's linked from the top of the page, or you can download it here.

With the inclusion of the new Gene page, the Portal now enables the rapid generation of testable hypotheses, by integrating, interpreting, and presenting information that previously could only be generated by coordinated research across a consortium. This new development brings the T2D Knowledge Portal project one step closer to informing the discovery of new targets and treatments for T2D.

Wednesday, June 7, 2017

New clues about variant effects: epigenomic data now available in the Portal

The T2D Knowledge Portal aggregates a wealth of genetic association data identifying variants that are associated with type 2 diabetes and related traits. These identifications show us that something within these genomic regions contributes to the risk of developing T2D. That’s an important first step, but in order to make use of this information to develop new T2D treatments, we need to figure out exactly what is causing the effect and how it relates to the disease process.

If a variant lies within a gene and changes a protein sequence, it can be relatively straightforward to formulate testable hypotheses about its effects. But most of the variants that are significantly associated with T2D—and with complex diseases in general—lie within noncoding regions of the genome and are likely to affect regulation of genes that could be far removed from the chromosomal position of the variant. It can be difficult to find clues about which genes are affected by these distant, noncoding changes, but now, we present a new type of data in the T2DKP that can help address this challenge.

The pattern of epigenetic modifications within a genomic region can provide important clues about its regulatory role. The distribution of these position-specific and tissue-specific marks—for example, covalent modifications of the histone proteins that package DNA—is characteristic of elements such as enhancers or transcription start sites. The Roadmap Epigenomics Consortium has developed methods for detecting these modifications genome-wide (hence the term “epigenomics”) and integrating their positional data, using the ChromHMM algorithm, to categorize genomic regions into “chromatin states”. The presence of these states in a given genomic region in different tissue types can give hints about whether that region might be involved in regulation of specific genes or pathways.

Now, you can view the tissue-specific chromatin states spanning the position of each variant on Variant pages within the Portal. We have incorporated epigenomic data from a study (Varshney et al., 2017) in which the locations of 13 distinct chromatin states were determined across a diverse set of cell lines and tissues, including pancreatic islets. The new “Epigenomic annotations” section of each Variant page (see an example) presents information about chromatin states in three different ways.

1. An interactive table listing chromatin states in this region, the tissue or cell line in which they were observed, and their genomic coordinates. Filter the table by chromatin state or by tissue to find states of particular interest.



2. A matrix displaying chromatin states by tissue type. This graphic gives a quick indication of chromatin states that are present in this region, across the whole panel of tissues.



3. A graphic showing the positions of chromatin states relative to the position of the variant.
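All three views rest on the same underlying question: which chromatin-state intervals span the variant's position, and in which tissues? A minimal sketch of that lookup is shown below, including the optional tissue and state filters offered by the interactive table. The record layout, the state labels, and the made-up coordinates are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass


@dataclass
class ChromatinState:
    tissue: str
    state: str
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def states_at_variant(annotations, chrom, pos, tissue=None, state=None):
    """Return annotations spanning chrom:pos, optionally restricted to one tissue and/or state."""
    return [
        a for a in annotations
        if a.chrom == chrom and a.start <= pos <= a.end
        and (tissue is None or a.tissue == tissue)
        and (state is None or a.state == state)
    ]


# Hypothetical annotations with made-up coordinates.
annotations = [
    ChromatinState("pancreatic islets", "Active enhancer", "8", 117_000, 118_500),
    ChromatinState("liver", "Quiescent", "8", 116_000, 119_000),
    ChromatinState("adipose", "Active enhancer", "8", 120_000, 121_000),
]

hits = states_at_variant(annotations, "8", 117_800)
# Tissues in which an enhancer state overlaps the variant.
print(sorted({a.tissue for a in hits if a.state == "Active enhancer"}))
```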


These new features represent only the first phase of incorporating this new data type into the Portal. In the future, we will be adding more of these data along with more versatile interfaces for exploring them. Please check out our new epigenomic annotations and send us your feedback!

Wednesday, May 31, 2017

See you in San Diego!

Members of the T2D Knowledge Portal team are gearing up for the 77th Scientific Sessions of the American Diabetes Association, June 9-13 in San Diego, CA. We'll be releasing exciting new features of the Portal just before the conference, and we have a wide variety of presentations planned for each day.

On the opening day of the conference (Friday, June 9), join us for a mini-symposium that will present a comprehensive guide to the T2D Knowledge Portal and how you can use it to further your type 2 diabetes research. We will be exhibiting at booth #2452 on Saturday, Sunday, and Monday, and each day, genetics experts will be available at the booth to answer questions and discuss both the Portal and the genetics of T2D. On Saturday, members of the Portal team will participate in a moderated poster session, and posters will also be displayed on Monday. And on Sunday morning, our principal investigator, Dr. Jose C. Florez, will give a symposium presentation on "Mining the Genome for Therapeutic Targets."

Find the full details in the schedule below and follow us on Twitter (@T2DKP) for up-to-the-minute news throughout the conference. We're looking forward to meeting you!

Friday, June 9, 2017

Mini-Symposium: A Researcher’s Guide to Exploring Diabetes Genetic Data in the Type 2 Diabetes Knowledge Portal
Chair: Mark McCarthy
11:30am - 12:30pm, Room 28

11:30-11:50am    Noël Burtt: Data, Analysis, and Tools in the Type 2 Diabetes Knowledge Portal
11:50am-12:10pm  Jason Flannick: Demonstration of Questions that Can Be Addressed Using the Portal
12:10-12:30pm    Question and Discussion Period


Saturday, June 10, 2017

  • Exhibiting at booth #2452, 10am - 4pm
  • Moderated Poster Session: Genetic Data, Pathways, and Variants for Type 2 Diabetes and Related Traits. 12:30-1:30pm, Hall B
Poster 1765-P
The Type 2 Diabetes Knowledge Portal: Accelerating Type 2 Diabetes Research through Community Access to Human Genetic Information and Tools
Presenter: Maria C. Costanzo

Poster 1766-P
Key Biological Pathways for Type 2 Diabetes Determined by Genetic Cluster Analysis on Related Traits
Presenter: Miriam S. Udler


Sunday, June 11, 2017

  • Symposium presentation: Mining the Genome for Therapeutic Targets. 
Dr. Jose C. Florez
9:20-9:55am, Ballroom 20D

  • Exhibiting at booth #2452, 10am - 4pm


Monday, June 12, 2017

  • Exhibiting at booth #2452, 10am - 2pm
  • Poster session, 12-1pm, Hall B
Poster 1765-P
The Type 2 Diabetes Knowledge Portal: Accelerating Type 2 Diabetes Research through Community Access to Human Genetic Information and Tools
Presenter: Maria C. Costanzo

Poster 1795-P
Type 2 Diabetes Gene Bioinformatically Identified by Variants Mapping to Amino-Acid Changes in Three-Dimensional Protein Space
Presenter: Marcin von Grotthuss

Wednesday, May 3, 2017

Explore new datasets and phenotypes in the T2D Knowledge Portal

We are releasing multiple new datasets in the Portal and have updated existing sets with associations for new phenotypes. To make it even easier to browse and explore these sets, we've also updated our Data page and added new functionality. Here's an overview of what's new in the Portal today.

17K exome sequence analysis dataset has grown to 19K
The Data Coordinating Center (DCC) of the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D) analyzes exome sequence data contributed by AMP T2D consortium members to find variant associations with T2D and related traits. Until now, the exome sequencing dataset available in the Portal has consisted of exome sequences from about 17,000 individuals. Today, we have added exome sequencing performed on 2,000 Danish subjects by the LuCamp (Lundbeck Foundation Centre for Applied Medical Genomics in Personalised Disease Prediction, Prevention and Care) consortium, making a total of nearly 19,000 exomes. This is just a taste of things to come: at the AMP T2D DCC we are currently analyzing additional exome sequences that will bring the total up to 52,000!

New community-contributed datasets: GENESIS GWAS and 70KforT2D GWAS

We are grateful to two groups from the larger T2D research community who have shared data that will make the T2D Knowledge Portal even more valuable to worldwide T2D researchers.

The GENEticS of Insulin Sensitivity (GENESIS) consortium performed GWAS on over 2,700 nondiabetic participants, finding genetic associations with direct measures of insulin sensitivity.

The 70KforT2D project collected, harmonized, and re-analyzed public GWAS data from over 70,000 individuals to find T2D genetic associations.

New public dataset: VATGen GWAS

The VATGen GWAS consortium performed meta-analysis of GWAS data from a mixed-ancestry group of more than 18,000 people to identify genetic associations with the localization of body fat deposition, leading to insights into adipocyte development.

Updated dataset: glucose-stimulated insulin secretion phenotypes in MAGIC GWAS

A study by Prokopenko et al. analyzed genetic associations with insulin secretion. Associations of variants with nine different measures of insulin secretion, among them corrected insulin response (CIR) and disposition index (DI), have now been added to the MAGIC GWAS dataset.

ExAC updated to gnomAD exomes and whole genomes
The Exome Aggregation Consortium (ExAC) has more than doubled in size and has morphed into the Genome Aggregation Database (gnomAD). More than 120,000 exome sequences and 15,000 whole genome sequences are now available, and these data are accessible via several tools and interfaces in the T2D Knowledge Portal.

New Data page: explore datasets using new filters

As our collection of data grows, it becomes more difficult to understand the differences between datasets and to find those of interest. To address this challenge, we've reorganized and streamlined our Data page.


A section of the Data page, expanded to show phenotype selection.

At the top of the Data page, you can choose to filter the dataset table by data type, phenotype category, or both. When you click on a phenotype category, the phenotypes within that category are available for selection. Clicking on the name of any dataset expands a section with details and references for each. 

In the coming days, watch this space for more details about each of these new developments. And as always, please contact us if you have any comments or questions.


Wednesday, March 15, 2017

The Portal’s interactive burden test: now more versatile than ever

Significant associations between genes and T2D or related phenotypes can provide powerful insights into disease mechanisms and possible therapies. The T2D Knowledge Portal includes results from pre-computed analyses of genetic associations for a large, and growing, number of datasets. But what if you want to do a more fine-grained analysis? You might want to test whether the disease burden for a gene differs between groups of people with specific characteristics—for example, lean people with T2D versus obese people without T2D. Or you might want to test the aggregate effect of a specific subset of variants, such as those that are likely to knock out the function of a protein of interest.

Our interactive burden test on Gene pages, powered by the Genetic Association Interactive Tool (GAIT), allows you to do all that and more. The burden test treats a gene as the unit of inquiry, combining all the variants it contains in a single statistical test of disease association. We described the basics of the burden test and GAIT in a recent blog post. Now, we’ve added some options for selecting variants in the interactive burden test that make this tool even more versatile.
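This post doesn't specify which statistic GAIT's burden test computes. To illustrate what treating a gene as the unit of inquiry means, here is a simple collapsing test, swapped in for illustration only: each sample counts as a carrier if it has at least one alternate allele among the selected variants, and carrier counts in cases and controls are compared with a two-sided Fisher's exact test. The function names and input layout are hypothetical, and unlike GAIT this sketch does not adjust for covariates.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= p_obs * (1 + 1e-7)))


def burden_test(samples, selected_variants):
    """samples: list of (is_case, {variant_id: alt_allele_count}) tuples."""
    table = [[0, 0], [0, 0]]  # rows: cases, controls; columns: carriers, non-carriers
    for is_case, genotypes in samples:
        carrier = any(genotypes.get(v, 0) > 0 for v in selected_variants)
        table[0 if is_case else 1][0 if carrier else 1] += 1
    (a, b), (c, d) = table
    return table, fisher_exact_two_sided(a, b, c, d)


# Hypothetical genotypes: 4 cases and 4 controls.
samples = ([(True, {"v1": 1})] * 3 + [(True, {})]
           + [(False, {"v2": 1})] + [(False, {})] * 3)
print(burden_test(samples, {"v1", "v2"}))
```

Changing the set of selected variants changes who counts as a carrier, which is why the variant selection options described below can change the result of the test.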

The variant selection step of the burden test on a Gene page is pre-populated with all of the variants present in the selected dataset that are located within the gene and its 100 kb up- and downstream flanking regions. You can create a specific subset of these by checking or un-checking individual variants. The table may be sorted by multiple criteria in order to find variants of interest: chromosomal coordinate; minor allele count; predictions of the effect allele’s impact on the encoded protein; and the protein change or type of mutation caused by the effect allele.
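The default list described above amounts to a positional window: the gene body plus 100 kb on each side. A minimal sketch of that selection, followed by one of the sort options, is shown below. The variant records, field names, and gene coordinates are hypothetical.

```python
FLANK_BP = 100_000  # 100 kb up- and downstream of the gene, as described above


def default_variant_list(variants, chrom, gene_start, gene_end, flank=FLANK_BP):
    """Return variants on `chrom` within [gene_start - flank, gene_end + flank], sorted by position."""
    lo, hi = max(1, gene_start - flank), gene_end + flank
    in_window = [v for v in variants if v["chrom"] == chrom and lo <= v["pos"] <= hi]
    return sorted(in_window, key=lambda v: v["pos"])


# Hypothetical variants around a hypothetical gene at 8:500,000-520,000.
variants = [
    {"id": "a", "chrom": "8", "pos": 350_000, "mac": 12},
    {"id": "b", "chrom": "8", "pos": 410_000, "mac": 3},
    {"id": "c", "chrom": "8", "pos": 515_000, "mac": 40},
    {"id": "d", "chrom": "8", "pos": 600_000, "mac": 1},
    {"id": "e", "chrom": "8", "pos": 640_000, "mac": 7},
    {"id": "f", "chrom": "9", "pos": 510_000, "mac": 5},
]
window = default_variant_list(variants, "8", 500_000, 520_000)
by_mac = sorted(window, key=lambda v: v["mac"], reverse=True)  # sort by minor allele count
print([v["id"] for v in window], [v["id"] for v in by_mac])
```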


Section of the interactive burden test interface showing the default list of variants for the SLC30A8 gene. Options for customizing the list are located above the variant table.

The table of variants may be filtered so that the test considers only certain categories of variants, with varying predicted impacts on the encoded protein. Previously, the burden test offered filters based on an unpublished method. Now, we have replaced those filters with the set that was used in a recent major publication: The genetic architecture of type 2 diabetes, by Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton, et al.

Variant filters in the interactive burden test

All coding variants--selects variants within the coding sequence, from the dataset that was initially selected for the burden test

Protein-truncating + missense with MAF<1%--selects variants in both of these categories:
  • protein-truncating (predicted to cause a truncated protein to be generated, either by creating a premature stop codon or by causing a frameshift) 
  • missense with MAF<1% (cause a missense mutation AND have a minor allele frequency, or MAF, of less than 1%). The MAF limit eliminates common variants, which would not be expected to have very deleterious effects.

Protein-truncating + possibly deleterious missense with MAF<1%--selects variants in both of these categories:

Protein-truncating + probably deleterious missense--selects variants in both of these categories:

Protein-truncating only--selects variants predicted to cause a truncated protein to be generated, either by creating a premature stop codon or by causing a frameshift.
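Three of these filters are fully defined above, and they can be sketched as simple predicates on each variant's predicted consequence and MAF. The two "deleterious missense" filters are left out because their criteria are not listed here. The consequence labels (Sequence Ontology terms of the kind used by common annotation tools), the filter names, and the record layout are assumptions, and the set of coding consequences is deliberately not exhaustive.

```python
# Protein-truncating, per the definition above: premature stop codon or frameshift.
PTV = {"stop_gained", "frameshift_variant"}
# Non-exhaustive set of coding consequences (an assumption for this sketch).
CODING = PTV | {"missense_variant", "synonymous_variant",
                "inframe_insertion", "inframe_deletion"}

# Hypothetical filter names mapped to predicates on a variant record.
MASKS = {
    "all_coding": lambda v: v["consequence"] in CODING,
    "ptv_plus_missense_maf1": lambda v: (
        v["consequence"] in PTV
        or (v["consequence"] == "missense_variant" and v["maf"] < 0.01)
    ),
    "ptv_only": lambda v: v["consequence"] in PTV,
}


def apply_mask(variants, mask_name):
    """Return the variants kept by the named filter."""
    keep = MASKS[mask_name]
    return [v for v in variants if keep(v)]


# Hypothetical annotated variants.
variants = [
    {"id": "p1", "consequence": "stop_gained", "maf": 0.0001},
    {"id": "m_rare", "consequence": "missense_variant", "maf": 0.004},
    {"id": "m_common", "consequence": "missense_variant", "maf": 0.12},
    {"id": "s1", "consequence": "synonymous_variant", "maf": 0.02},
    {"id": "i1", "consequence": "intron_variant", "maf": 0.3},
]
for name in MASKS:
    print(name, [v["id"] for v in apply_mask(variants, name)])
```

Note that, following the wording above, the MAF limit applies only to the missense variants, not to the protein-truncating ones.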

Using these filters, you can tailor the list of variants to those with specific impact on the encoded protein. If you would like to customize the list even further by adding variants that were not present in the default list, there is now an option to add single or multiple variants, using dbSNP IDs (e.g., rs112881768) or identifiers in the format “chromosome_coordinate_reference-nucleotide_variant-nucleotide” (e.g., 8_112881768_G_A).

When “single variant” is selected, once you begin typing, variant IDs that match your entry are suggested. When “multiple” is selected, you may type or paste in a list of variant IDs, separated by commas or returns. Note that any added variants are not subject to the filters, which act only on the default list of variants for a gene.
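A pasted list of this kind has to be split and each entry recognized as either a dbSNP ID or a coordinate-style ID. The sketch below shows one way to do that. The accepted chromosome names (1 to 2 digits, X, or Y), the strict uppercase nucleotides, and the function name are assumptions; the Portal's own validation rules are not described here.

```python
import re

RSID = re.compile(r"^rs\d+$")
# chromosome_coordinate_reference-nucleotide_variant-nucleotide, e.g. 8_112881768_G_A
CHR_POS_REF_ALT = re.compile(r"^(\d{1,2}|X|Y)_(\d+)_([ACGT]+)_([ACGT]+)$")


def parse_variant_list(text):
    """Split text on commas and line breaks, then classify each non-empty entry."""
    parsed = []
    for token in re.split(r"[,\r\n]+", text):
        token = token.strip()
        if not token:
            continue
        if RSID.match(token):
            parsed.append(("dbsnp", token))
        elif (m := CHR_POS_REF_ALT.match(token)):
            chrom, pos, ref, alt = m.groups()
            parsed.append(("coordinate", (chrom, int(pos), ref, alt)))
        else:
            parsed.append(("invalid", token))
    return parsed


print(parse_variant_list("rs112881768, 8_112881768_G_A\nnot_a_variant"))
```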

Our GAIT User Guide (download PDF), which summarizes all the details of the interface, has been updated with the latest changes. Please check out our new, improved interactive burden test and let us know if you have comments or suggestions.

Sunday, February 5, 2017

Introductory guide to genetic association analysis now available

P-values. Odds ratios and betas. GWAS. Linkage disequilibrium. What does it all mean?

Human geneticists are, of course, intimately familiar with these concepts. But for people who are not human geneticists, just getting past the terminology can be frustrating. So we’ve written a basic primer and reference guide that can help users of the T2D Knowledge Portal understand the information presented in our interfaces and tools.

Our Introduction to genetic association analysis guide is available from our Resources page. Or download it here (PDF).

This guide provides a basic introduction to the rationale behind applying human genetic association studies to complex diseases like T2D, explains some of the parameters of genetic associations such as p-values and odds ratios, and describes the different types of experiment used to determine genetic associations.
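As a small taste of these parameters, the sketch below computes an odds ratio, its logarithm (the scale on which logistic-regression betas for a binary trait like T2D are reported), a 95% confidence interval, and a two-sided p-value from a single 2x2 table of counts. It uses the standard Woolf (log) standard error and a Wald test, which is a simplification. Real association studies fit regression models with covariates. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_summary(a, b, c, d, z_crit=1.96):
    """2x2 counts (all > 0): a = cases with the effect allele, b = cases without,
    c = controls with, d = controls without."""
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)                  # effect size on the log-odds scale
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error of log(OR)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided Wald p-value
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    return odds_ratio, log_or, ci, p_value


# Hypothetical counts: 30 of 100 cases and 20 of 100 controls carry the effect allele.
print(odds_ratio_summary(30, 70, 20, 80))
```

Here the odds ratio is about 1.7, but with only 200 observations the confidence interval spans 1 and the p-value is far from genome-wide significance, which illustrates why very large samples are needed to detect modest effects.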

Many thanks to Andrew Morris, University of Oxford, for his thoughtful review and helpful comments on this guide.

We would be happy to hear your suggestions for improvements and additions!