Wednesday, November 15, 2017

T2DKP Fall Newsletter

The latest issue of our quarterly newsletter is now available. Download it here to find out what we've been up to!

Tuesday, November 14, 2017

Announcing the Cardiovascular Disease Knowledge Portal

We are pleased to announce the launch of the Cardiovascular Disease Knowledge Portal (CVDKP). Our collaboration with Dr. Patrick Ellinor, Dr. Sek Kathiresan, and their colleagues in the Atrial Fibrillation, Global Lipids Genetics, Myocardial Infarction Genetics, and CARDIoGRAMPlusC4D consortia has created a resource that offers world-wide open access to genetic and genomic information about atrial fibrillation, myocardial infarction, and related traits, with the goal of democratizing access to genomic data and accelerating cardiovascular genomics research.

CVDKP home page

The CVDKP is constructed on a software architecture originally developed for the Type 2 Diabetes Knowledge Portal (T2DKP), which is the central product of the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D). AMP T2D is a public-private partnership between the National Institutes of Health, the U.S. Food and Drug Administration, biopharmaceutical companies, and non-profit organizations that is managed through the Foundation for the NIH. AMP seeks to harness collective capabilities, scale, and resources toward improving current efforts to develop new therapies for complex, heterogeneous diseases.

The ultimate goal of AMP T2D is to increase the number of new diagnostics and therapies for patients while reducing the time and cost of developing them, by jointly identifying and validating promising biological targets for type 2 diabetes. The T2DKP furthers that goal by aggregating, harmonizing, and displaying genetic association and epigenomic results along with user-friendly analysis tools, allowing research biologists who would not individually be able to amass and manipulate these large datasets to glean insights from the data.

We are working towards these same goals for other complex diseases, by extending the platform and analysis tools constructed for the T2DKP. In partnership with the International Stroke Genetics Consortium, we recently created a Knowledge Portal for cerebrovascular disease (CDKP) based on the same infrastructure. Now, with the advent of the Cardiovascular Disease Knowledge Portal, we have a three-member Knowledge Portal Network for the genetics of cardiometabolic and cerebrovascular disease.

Data in the CVDKP directly relevant to heart disease include genetic associations with atrial fibrillation, electrocardiogram traits, plasma lipid levels, and myocardial infarction. Additional association datasets are available for type 2 diabetes and glycemic traits, anthropometric traits, measures of kidney function, and psychiatric traits. You may browse the complete list of datasets and their descriptions on the CVDKP Data page.

As we do for the Cerebrovascular Disease Knowledge Portal, for the CVDKP we continue to work with the American Heart Association Precision Medicine Platform (PMP) to provide an additional avenue for accessing cardiovascular genetic data. Currently, summary statistics from the AFGen GWAS and AFGen exome chip analysis datasets are deposited in the PMP.

We welcome all suggestions, comments, questions, and submission of relevant datasets for the CVDKP. Please contact us!

Wednesday, October 25, 2017

New phenotypes and physical activity stratification available in the T2DKP

We’ve recently updated one dataset and added another in the Type 2 Diabetes Knowledge Portal. Associations with multiple new phenotypes are now available for the BioMe AMP T2D GWAS dataset, and the new dataset "GIANT GWAS - stratified by physical activity" adds associations with anthropometric traits for cohorts stratified by sex and physical activity levels.

The BioMe AMP T2D GWAS dataset was first added to the T2DKP in early 2017, initially with three phenotypes (T2D, fasting glucose levels, and HbA1c levels). Deposition and analysis of these data were funded by the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D), a collaboration between multiple stakeholders that aims to catalyze the clinical translation of genetic discoveries by producing and aggregating data, developing and implementing novel analytical methods and tools, and building infrastructure for data storage and presentation. This dataset was the first to be entirely produced within the AMP T2D project, including the deposition, analysis, quality control, and presentation of the data.

The data were generated at the Charles Bronfman Institute for Personalized Medicine BioMe BioBank, a biorepository located at the Mount Sinai Medical Center (MSMC) in the upper Manhattan area of New York City. MSMC serves a diverse population of over 800,000 outpatients each year. Importantly, since many BioMe participants are African American or Hispanic Latino, this dataset adds significant ethnic diversity to the Portal’s genetic association data.

The data were subjected to quality control and association analysis by the Analysis Team at the AMP Data Coordinating Center (DCC) at the Broad Institute. In this second phase of analysis, associations with seven traits were calculated: systolic and diastolic blood pressure; HDL and LDL cholesterol levels; creatinine levels and eGFR-creat; and BMI. A detailed analysis report for these associations may be downloaded from the BioMe AMP T2D GWAS section of our Data page.

The new GIANT dataset was generated by the GIANT (Genetic Investigation of Anthropometric Traits) consortium via a meta-analysis of genetic associations for BMI, waist-hip ratio, and waist circumference from more than 200,000 adults. Samples are stratified by sex, ancestry, and physical activity level (active or inactive). This work was published in a recent paper by Graff et al.

Data from both the BioMe and GIANT studies are available at these locations in the Portal:
  • On Gene pages (see an example) in the Common variants and High-impact variants tables and in LocusZoom static plots
  • On Variant pages (see an example) in the Associations at a glance section and in the Association statistics across traits table, and in LocusZoom static plots
  • Via the Variant Finder tool
  • "Manhattan plots" of associations across the genome may be seen by selecting one of the phenotypes analyzed in these datasets in the View full genetic association results for a phenotype scroll box on the Portal home page
  • Additionally, the BioMe data are available for sample filtering and custom association analysis via the Genetic Association Interactive Tool (GAIT) on Variant pages.

Please check out the new data and contact us with any questions, comments, or suggestions.

Monday, October 16, 2017

Learn about complex disease knowledge portals at ASHG 2017

Members of the Knowledge Portal team will be attending the American Society of Human Genetics meeting this week in Orlando, FL.

We'll be talking about the continuing progress of the Type 2 Diabetes Knowledge Portal, which has grown dramatically since ASHG 2016, with loads of new data and many new features. We'll also present our work towards expanding the T2DKP framework to other complex diseases, with the recent release of a new sibling portal for stroke genetics, the Cerebrovascular Disease Knowledge Portal.

You can catch us nearly every day of the meeting:

Wednesday 10/18

10 AM - 5 PM: Find us in the exhibit hall at booth #863. We’ll be there to answer your questions and give tours and tutorials on the Knowledge Portal Network.

10-10:30 AM: Demonstration of the Type 2 Diabetes Knowledge Portal at our booth, #863.
10:30-11 AM: Demonstration of the Cerebrovascular Disease Knowledge Portal at our booth, #863.

2 PM - 4 PM: Ben Alexander will present poster #1186: The Type 2 Diabetes Knowledge Portal: Clearing a path from genetic associations to disease biology.

Thursday 10/19

10 AM - 5 PM: We will again be in the exhibit hall at booth #863.

10:30 - 11:30 AM: Portal team members will be available at the Broad Institute booth (#1037) for demonstrations and tutorials.

2-2:30 PM: Demonstration of the Type 2 Diabetes Knowledge Portal at our booth, #863.
2:30-3 PM: Demonstration of the Cerebrovascular Disease Knowledge Portal at our booth, #863.

4:15 PM–6:15 PM: Portal team members will be participating in Concurrent Invited Session #49:

Data Sharing, Analysis, and Tools to Catalyze Translation from Genomic to Clinical Knowledge
Room 330C, Level 3, Convention Center
Moderators: Benjamin Neale and Noël Burtt
Serving genetic data and tools to the world - Jason Flannick.
The EGA as a platform for effective data sharing of human genetic and phenotype data - Thomas Keane.
Converting sequence data from over 140,000 people into rare disease diagnoses - Daniel MacArthur.
Assessing the phenome-wide consequences of genetically regulated molecular traits - Hae Kyung Im.

Friday 10/20

10 AM - 2:30 PM: This is our last day in the exhibit hall at booth #863.

We look forward to meeting you at ASHG! If you have questions and cannot meet us at any of these times, or if you won’t be at ASHG, our mailbox is always open.

Thursday, October 5, 2017

Portal team presents at Festival of Genomics

The Festival of Genomics was held in Boston on October 3-4, and the Type 2 Diabetes Knowledge Portal was well represented. This meeting brings together multiple stakeholders in genomics and health care, using innovative formats to educate, promote connections, and spark conversations.

On the first day of the meeting, Noël Burtt presented a talk focusing on the T2DKP: "The Type 2 Diabetes Knowledge Portal: accelerating action from data."

Noël Burtt speaking at the Festival of Genomics

The next day, T2DKP team members participated in a session with an unusual format, designed to stimulate interaction and conversation: "In the Loop."  Jason Flannick, Senior Group Leader for the Portal project, chaired the session, which was entitled "Complex data, complex solutions: tackling the challenges of genetic data sharing, interpretation, and representation." "Loop leaders," all from the Broad Institute, included:

  • Sean Simmons, speaking on "Technological solutions to privacy concerns around data access and utilization";
  • Moran Cabili, speaking on "Ethical and technical challenges to genetic data access";
  • David Siedzik, speaking on "Analysis tools for genetic data and their scalability"; and
  • Maria Costanzo, speaking on "Representing and democratizing access to genetic association results."

Left to right: Jason Flannick, Sean Simmons, Moran Cabili, David Siedzik, and Maria Costanzo participate in an "In the Loop" session

The session began with an introduction from the chair and speakers, and then the audience broke into four groups for discussion with each of the Loop leaders. Each leader facilitated discussions with two different groups, and then the entire group convened again to sum up the discussions. The group may not have come up with any breakthrough ideas, but participants definitely emerged with a clearer picture of the many challenges around aggregating and representing genetic data.

The T2DKP team is headed next to the American Society of Human Genetics meeting. Stay tuned for a preview of our participation there in the next blog post!

Monday, September 18, 2017

All for one (population) and one for all

Type 2 diabetes (T2D) is a world-wide health problem, but it hits especially hard in Latin America, where incidence is higher than in many other parts of the world. To investigate the genetic basis for this difference, researchers from the U.S., Mexico, and Spain teamed up to look for genetic coding variants associated with T2D risk that are more common in people of Hispanic descent. In their recent paper (Mercader et al. 2017, Diabetes), the researchers discovered such variants and uncovered the molecular details of how one in particular affects T2D risk. Their results suggest a new avenue for drug development that could benefit diabetics of all ancestries. And surprisingly, although Hispanics have higher T2D risk, this variant actually protects against T2D.

In designing the study, Mercader and colleagues decided to focus on variants located within protein-coding sequences, whose effects can be more direct and more straightforward to test than those of variants outside genes. They used exome chip analysis, which considers only variants in protein-coding regions of the genome, to genotype both diabetics and non-diabetics of Hispanic descent from Mexico and the U.S. Their dataset, SIGMA exome chip analysis, is accessible in the T2D Knowledge Portal and described on our Data page.

To find variants that might differentially affect the Hispanic population, the researchers looked for T2D-associated variants that were common in Hispanics, but rare or low-frequency in people of European ancestry. The most significant variant in this category, rs149483638, is present at a minor allele frequency (MAF) of 17% in people of Hispanic ancestry, but has a MAF of only 1%, 0.1%, and 0.02% in East Asian, African, and European ancestries, respectively.

Surprisingly, although enriched in this population that is more vulnerable to T2D, the rs149483638 effect allele is protective against T2D. People who are heterozygous for the effect allele (a T at position 2161530 of chromosome 11 rather than a C) have 22% decreased risk of T2D, while homozygous carriers have 40% decreased risk.
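For readers who like to check the arithmetic, here is a minimal Python sketch of what these numbers imply. It rests on two assumptions that are not stated in the post: that genotypes follow Hardy-Weinberg proportions, and that each copy of the protective allele multiplies risk by the same factor. The function names are hypothetical and chosen only for illustration.

```python
def genotype_frequencies(maf):
    """Expected genotype frequencies, ASSUMING Hardy-Weinberg equilibrium."""
    major = 1.0 - maf
    return {
        "non_carrier": major * major,
        "heterozygous": 2 * major * maf,
        "homozygous": maf * maf,
    }


def multiplicative_two_copy_reduction(one_copy_reduction):
    """Risk reduction for two copies, ASSUMING each copy scales risk equally."""
    per_allele_ratio = 1.0 - one_copy_reduction
    return 1.0 - per_allele_ratio ** 2


# MAF of 17% and a 22% one-copy risk reduction are the figures quoted in the post.
freqs = genotype_frequencies(0.17)
predicted = multiplicative_two_copy_reduction(0.22)

print(f"heterozygous carriers: {freqs['heterozygous']:.1%}")
print(f"homozygous carriers: {freqs['homozygous']:.1%}")
print(f"predicted two-copy risk reduction: {predicted:.1%}")
```

Under these assumptions, roughly 28% of people of Hispanic ancestry would carry one copy of the protective allele and about 3% would carry two. The simple multiplicative model predicts a reduction of about 39% for two copies, close to the reported 40%.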

After the initial discovery, the investigators performed more analyses to verify whether rs149483638 was the causal variant in the region, and replicated the T2D association in independent datasets. All the results supported the hypothesis that this particular variant directly reduces T2D risk.

The variant is located in the IGF2 gene, which encodes a peptide similar to insulin that has previously been linked to growth disorders, obesity, and T2D. Alternative splicing generates two different isoforms of IGF2, and the protective allele disrupts a predicted acceptor site for the splicing event that would generate isoform 2. Could the absence of IGF2 isoform 2 be protective against T2D?

Mercader and colleagues performed further experiments to address the questions of whether the rs149483638 effect allele blocks the production of isoform 2 and whether this has an impact on T2D risk. In human cell culture, the protective allele did indeed block splicing at that site.

To see whether this happens in humans, the researchers tested tissue samples for the presence of isoform 2, and found that its expression was lower in people carrying the protective allele. Furthermore, among people who lacked the protective allele, those with T2D showed higher expression of isoform 2 in their visceral fat tissue than did those without T2D. Levels of isoform 2 in non-diabetics were also positively correlated with levels of HbA1c, which is an indicator of elevated blood glucose levels. No such correlations were seen for levels of IGF2 isoform 1.

Taken together, these results support the involvement of isoform 2 in the elevation of T2D risk, suggesting an intriguing possibility: could lowering levels of isoform 2 be an effective way to lower T2D risk?

If lowering isoform 2 levels were to be used as a T2D therapeutic, it would be important to know that this reduction had no adverse effects. Genetic data can shed light on this question as well. The authors looked in the Exome Aggregation Consortium (ExAC) database and in the clinical records of their study subjects, and saw no health effects other than lowered T2D risk in carriers of the protective variant. They also performed a phenome-wide association study (PheWAS) in the Genetic Epidemiology Research on Aging (GERA) cohort, and saw no association of the T2D-protective allele with any of 18 medical conditions.

Thus it seems likely that loss of IGF2 isoform 2 would not be harmful, setting the stage for research into drugs that could specifically inhibit isoform 2 or block its production as a way to delay or treat the development of T2D.

These fascinating results have opened multiple avenues for future research. What is the specific biological role of IGF2 isoform 2 in T2D? It differs from isoform 1 only in that it carries an extra 56 N-terminal amino acids. Isoform 1 predominates, while isoform 2 is expressed at very low levels—although its highest expression is seen in pancreatic islets, liver, and fat, all tissues that are relevant for T2D. Elucidating the molecular details of this role will increase our understanding of the biological mechanisms in T2D. And from an evolutionary perspective, the question of how this protective variant came to be enriched in this population is an interesting one.

The motto of the Three Musketeers was "All for one and one for all," meaning that the group supports each member and each member supports the group. As this paper illustrates, this theme is also emerging in human genetics. By investigating distinct populations, we can not only learn about those specific populations but also gain knowledge to benefit all humankind.

Wednesday, August 30, 2017

Bringing the power of epigenomics to the T2DKP

Until recently, all of the results displayed in the Type 2 Diabetes Knowledge Portal (T2DKP) were based on genetic association data: the significance with which variants, or SNPs, occur in people’s genomes in conjunction with a disease or trait.

This information is hugely important for pinpointing regions of the genome that contribute to disease risk. It is now relatively straightforward to identify these regions, but it is still a large challenge to discover the mechanisms by which they act—especially for variants that are outside of coding sequences, without an obvious effect on the sequence of a particular protein. These non-coding variants, the most commonly seen in genetic association studies, are likely to affect tissue-specific gene regulation that could potentially be important to the disease process.

How can we overcome this challenge to find clues about the effects of these non-coding variants? Epigenomic data to the rescue!

Dr. Kyle Gaulton of the University of California at San Diego researches the transcriptional regulatory networks involved in type 2 diabetes by using epigenomic data in concert with genetic association data. He explains, "Regulatory elements control gene production and function, and are often highly specialized across cell and tissues and located far away from the genes they regulate. Molecular epigenomic hallmarks of gene regulation such as histone and DNA modifications, nucleosome depletion, chromatin conformation and DNA-protein interactions can pinpoint the precise genomic locations of regulatory elements. High-resolution epigenome maps of regulatory elements in pancreatic islets, liver, muscle, adipose and many other human tissues can then enable annotation of non-coding genetic variants and their potential gene regulatory functions. These maps are thus an invaluable component of determining how type 2 diabetes associated non-coding variants influence disease pathogenesis."

A recent paper from Dr. Gaulton and colleagues (Gaulton, KJ, et al. (2015) Nat Genet. 47:1415) illustrates the power of integrating these two data types. By combining information on transcription factor binding sites and tissue-specific chromatin states with genetic fine-mapping of T2D-associated loci, the authors elucidated the molecular mechanisms behind the effects of some T2D-associated variants, uncovering the role of the FOXA2 transcription factor in glucose homeostasis in T2D-relevant tissues.

Now, the T2DKP facilitates this type of analysis by presenting both genetic association and epigenomic data on Gene and Variant pages. We described the display of epigenomic data on Variant pages in a recent blog post. On Gene pages, epigenomic data are integrated into the LocusZoom display.

Locations of variants associated with T2D and chromatin states in pancreatic islets, across the SLC30A8 gene (partial view)

Below the plot of variant associations, chromatin states are displayed by default for the major T2D-relevant tissues. Using the pull-down menu at the top of the plot, you can choose other tissues and cell types to display from a diverse set. All of the details on how to use this interactive plot are included in our Gene Page guide.

This is only the first step for epigenomic data in the T2DKP. In the future, we plan to include additional types of epigenomic data that indicate chromatin accessibility and conformation. We will also add functionality; for example, for any given variant, you will be able to search for the tissues in which enhancer regions overlap the location of that variant.

As we actively develop this aspect of the T2DKP, we welcome your suggestions!

Thursday, August 17, 2017

New member of the Knowledge Portal family: the Cerebrovascular Disease Knowledge Portal

We are pleased to announce today’s launch of the Cerebrovascular Disease Knowledge Portal (CDKP), an open-access resource for the genetics of stroke built on the framework and infrastructure of the Type 2 Diabetes Knowledge Portal (T2DKP). The CDKP aggregates data from five large genome-wide association studies for stroke, and presents them along with GWAS results for T2D and other cardiometabolic and biometric phenotypes as well as epigenomic data from a wide range of tissues.

CDKP home page

Users of the T2DKP will find familiar interfaces in the CDKP, which offers the same three major entry points for exploring the data: Gene and Variant pages; the Variant Finder tool; and pages displaying genome-wide association results for each phenotype. Summary-level data are presented for browsing and searching, and researchers may perform custom analyses using individual-level data via the Genetic Association Interactive Tool (GAIT) or LocusZoom. Using the CDKP, T2D researchers can now check their favorite variants and genes for associations with a range of phenotypes related to cerebrovascular health and disease.

The CDKP has two additional layers of functionality relative to the T2DKP, addressing particular needs of the stroke research community. A Downloads page provides files of summary statistics from recent stroke genetic association studies. And a home page link leads to the Precision Medicine Platform (PMP) of the American Heart Association Institute for Precision Cardiovascular Medicine, where authorized researchers may work with selected sets of individual-level data in a secure computing environment.

The Knowledge Portal (KP) framework was designed and built by a team at the Broad Institute as part of the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D), a public-private partnership that seeks to speed up the translation of genetic association data for T2D and related traits into actionable knowledge for new T2D treatments. In a collaboration with the International Stroke Genetics Consortium, funded by the National Institute of Neurological Disorders and Stroke, the Broad team incorporated stroke genetic data into the KP framework and customized the user interface for the stroke genetics research community.

This first application of the scalable, open-source KP software platform to a complex disease area other than T2D has paved the way for future collaborations to extend this platform to additional diseases, facilitating the translation of genetic data into actionable knowledge to improve human health.

Tuesday, July 11, 2017

Inaugural issue of the T2DKP quarterly newsletter

We've started a quarterly newsletter to keep you informed of the latest developments at the T2D Knowledge Portal. Download our Summer 2017 issue!

Monday, June 19, 2017

T2D Portal team at ADA 2017

Members of the T2D Knowledge Portal team returned last week from the 77th Scientific Sessions of the American Diabetes Association, inspired and invigorated by many great discussions with T2D researchers, educators, and clinicians.

In preparation for the conference, we set ourselves goals to add several new features to the Portal:

  • incorporate several new datasets and implement a new interactive Data page for exploring all datasets (see details)
  • add epigenomic data to shed light on the potential regulatory roles of genomic regions (see details)
  • implement a complete redesign of the Gene page that integrates multiple datasets to summarize the significance of each gene to T2D and related phenotypes (see details)
  • connect with the new Federated Node of the Portal at EBI to provide seamless access to data housed there alongside data housed at the AMP T2D Data Coordinating Center at the Broad Institute (see details)

On the first day of the conference, Noël Burtt and Jason Flannick presented a mini-symposium focusing on the Portal to several hundred attendees.

This clearly generated a lot of interest, because our exhibit booth was a busy place for the next three days. 

T2D Portal team members at our exhibit booth

Multiple conversations happened at the booth!

We handed out a general guide to the Portal (download), and also presented a moderated poster (download).

At the booth, we especially enjoyed talking with people in the T2D field who are not geneticists but are simply curious about the genetics of T2D and the mission of the Portal. We encourage everyone to explore the Portal and to feel free to ask us any questions, even if they seem elementary. Please contact us any time with questions or feedback!

Monday, June 12, 2017

T2D Knowledge Portal now distills and summarizes genetic information for individual genes

The Type 2 Diabetes (T2D) Knowledge Portal presents genetic data relevant to T2D on two major types of page: Variant pages for individual variants, or SNPs; and Gene pages focusing on individual genes. Visual displays on Variant pages provide an immediate indication of the possible significance of each variant for T2D. But until now, Gene pages have presented large amounts of information from disparate sources without much integration or interpretation to guide the viewer.

Now, that has all changed with our release of the new Gene page. It guides researchers through an organized workflow that can help them take advantage of the aggregated data in the Portal to move from a variant of interest, to a gene of interest, to an assessment of the potential involvement of that gene’s product in T2D.

The central feature of the new Gene page is an at-a-glance display that summarizes the strength of the evidence for associations of the gene with T2D or related traits. An algorithm scans the comprehensive collection of datasets within the Portal to find data on variants in the gene, and the overall conclusion is shown by a “traffic light” icon. A green light indicates that there is strong evidence for association of at least one variant in the gene with at least one phenotype; a yellow light indicates that there is suggestive evidence; and a red light indicates that the data aggregated in the Portal contain no evidence for associations of variants within this gene.
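The traffic light logic can be pictured with a small Python sketch. This is an illustration only, not the Portal's actual algorithm: the post does not say which cutoffs define "strong" or "suggestive" evidence, so both thresholds below are assumptions, as are the function name and the idea of summarizing a gene by its single best p-value.

```python
# ASSUMED thresholds; the post does not state the cutoffs the Portal uses.
STRONG_P = 5e-8      # conventional genome-wide significance level
SUGGESTIVE_P = 5e-4  # illustrative placeholder for "suggestive" evidence


def traffic_light(p_values, strong=STRONG_P, suggestive=SUGGESTIVE_P):
    """Classify a gene from the p-values of all its variant-phenotype associations.

    One strong association with any phenotype is enough for green;
    a gene with no associations at all is red.
    """
    best = min(p_values, default=1.0)
    if best <= strong:
        return "green"
    if best <= suggestive:
        return "yellow"
    return "red"


# Made-up p-values, one per variant-phenotype pair, for three hypothetical genes.
print(traffic_light([1e-9, 0.2, 0.03]))
print(traffic_light([1e-5, 0.4]))
print(traffic_light([0.3, 0.8]))
```

Taking the minimum over all variant-phenotype pairs mirrors the "at least one variant with at least one phenotype" wording: a single strong signal anywhere in the gene turns the light green.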

Figure 1. Traffic light display for MTNR1B

Several sections of the page below the traffic light allow the user to drill down to much more information about the variants within the gene, their individual associations, and their collective impact on the disease burden of the gene. An interactive LocusZoom plot allows users to view the linkage disequilibrium relationships and associations from multiple datasets, with a wide variety of phenotypes, for common variants. The plot also displays the location of chromatin states, which can indicate the regulatory role of a region, in multiple tissues.

Figure 2. LocusZoom plot of the credible set of T2D-associated variants in MTNR1B (above) and chromatin state annotations for the region (below).

In the example shown above, the traffic light (Fig. 1) shows that variants in the MTNR1B gene encoding the melatonin receptor have one or more strong phenotypic associations (view the MTNR1B Gene page in the T2D Knowledge Portal). The table of common variants for MTNR1B (not shown) tells us that the most significantly associated variant is rs10830963. And a view of the LocusZoom plot for the credible set of variants associated with T2D (Fig. 2, top) shows that in fact the credible set for this region contains only rs10830963, further supporting its significance. The chromatin state annotations for this region (Fig. 2, bottom) provide evidence for a regulatory effect in pancreatic islets, consistent with a potential role in T2D. This information, easily found in the Portal today, replicates the results of a 2015 genetic analysis that required over 100 authors (Gaulton, KJ, et al. (2015) Nature Genetics 47:1415).

The new Gene page presents a lot of information and we can't cover it all in this space. But don't worry, we've created a guide to the page that explains every feature in detail. It's linked from the top of the page, or you can download it here.

With the inclusion of the new Gene page, the Portal now enables the rapid generation of testable hypotheses, by integrating, interpreting, and presenting information that previously could only be generated by coordinated research across a consortium. This new development brings the T2D Knowledge Portal project one step closer to informing the discovery of new targets and treatments for T2D.

Friday, June 9, 2017

Providing data access, ensuring data protection

Readers of this post probably don’t need to be convinced that genetic association data have enormous potential for helping us to understand and treat complex diseases like type 2 diabetes. Significant associations between variants and diseases can suggest genes, or regions of the genome, that could be important for disease risk or progression—and this knowledge could help us identify new drug targets.

The Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D) is a pre-competitive partnership among the National Institutes of Health, industry and not-for-profit organizations, which is managed by the Foundation for the National Institutes of Health. Its mission is to make genetic association data accessible to the worldwide biomedical research community via the Type 2 Diabetes Knowledge Portal, in order to facilitate discovery of new targets for T2D treatment. But it can be a challenge to aggregate genetic data. The privacy of the individuals who contributed their health status and genomic sequences must always be protected, and there are many layers of regulation to ensure this. Restrictions at the institutional, regional, and national levels determine how data are handled and whether they can be transferred.

Until now, all of the results displayed in the Portal have been derived from data housed at the AMP T2D Data Coordinating Center (DCC) at the Broad Institute, where the Portal website resides. But some of the valuable data generated outside the U.S. cannot be transferred to the DCC. To address this issue, AMP T2D funded the development of a mechanism that enables researchers to interact with all of the data: federation. 

Federation means that data are housed at a site (a “federated node”) that meets their specific privacy requirements, but are made available for remote queries via the Portal. Results from such queries are served up alongside results from all of the datasets housed in the AMP T2D DCC. Researchers may browse and query data from any location without even needing to know where they reside.
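The idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Portal's implementation: the `Node` class, the `federated_query` function, the dataset names, and the p-values are all hypothetical placeholders, and a real federated node would be queried over a network rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A data site that answers queries locally and returns only summary results."""
    name: str
    records: dict = field(default_factory=dict)  # variant id -> list of result rows

    def query(self, variant_id):
        # Return a copy of the matching rows; the stored data stay at the node.
        return list(self.records.get(variant_id, []))


def federated_query(nodes, variant_id):
    """Send the same query to every node and merge the answers into one list."""
    merged = []
    for node in nodes:
        merged.extend(node.query(variant_id))
    return merged


# Placeholder data: one central site and one federated node.
dcc = Node("central_dcc", {"rs_example": [{"dataset": "dataset_A", "p_value": 0.01}]})
remote = Node("federated_node", {"rs_example": [{"dataset": "dataset_B", "p_value": 0.2}]})
nodes = [dcc, remote]

results = federated_query(nodes, "rs_example")
for row in results:
    print(row["dataset"], row["p_value"])
```

The caller sees one merged list and never needs to know which site supplied each row, which is the property the paragraph above describes.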

A federated node has now been created at the European Bioinformatics Institute (EBI) and may be accessed via the T2D Knowledge Portal. Today, Portal tools and interfaces can query both data housed at the AMP T2D DCC at the Broad Institute and data at the EBI federated node. 

According to Paul Flicek, a Senior Scientist and Team Leader of Vertebrate Genomics at EMBL-EBI, “A key mission of EMBL-EBI is to make data available to the widest possible community. Seamlessly accessing data stored in multiple locations via a single portal helps ensure that the data we store from many projects are maximally useful for additional research.”

The first dataset to be incorporated into the Portal via the EBI federated node is the Oxford BioBank exome chip analysis dataset, which contains association data for glycemic, lipid, and blood pressure traits from over 7,100 healthy subjects in Oxfordshire, U.K. The dataset is described on our Data page. Portal users can interact with this dataset in the same way (and with the same speed) as with other datasets. 

“Diabetes is a global problem, and it will take research and innovation on a global scale if we are to tackle it effectively,” says Mark McCarthy, Robert Turner Professor of Diabetic Medicine at University of Oxford. “The success of our research on the genetics of diabetes depends on access to data generated by groups around the world. The federated portal provides an additional set of tools that will allow us to jointly analyse those data sets wherever they happen to be based.” 

Federation represents both an important technical advance in handling and protecting data, and a significant step forward in democratizing and improving access to genetic association results. And because it is generally applicable to any kind of genetic association data, it has the potential to have an impact beyond T2D research, facilitating the study of other complex diseases and traits.

Wednesday, June 7, 2017

New clues about variant effects: epigenomic data now available in the Portal

The T2D Knowledge Portal aggregates a wealth of genetic association data identifying variants that are associated with type 2 diabetes and related traits. These identifications show us that something within these genomic regions contributes to the risk of developing T2D. That’s an important first step, but in order to make use of this information to develop new T2D treatments, we need to figure out exactly what is causing the effect and how it relates to the disease process.

If a variant lies within a gene and changes a protein sequence, it can be relatively straightforward to formulate testable hypotheses about its effects. But most of the variants that are significantly associated with T2D—and with complex diseases in general—lie within noncoding regions of the genome and are likely to affect regulation of genes that could be far removed from the chromosomal position of the variant. It can be difficult to find clues about which genes are affected by these distant, noncoding changes, but now, we present a new type of data in the T2DKP that can help address this challenge.

The pattern of epigenetic modifications within a genomic region can provide important clues about its regulatory role. The distribution of these position-specific and tissue-specific marks—for example, covalent modifications of the histone proteins that package DNA—is characteristic of elements such as enhancers or transcription start sites. The Roadmap Epigenomics Consortium has developed methods for detecting these modifications genome-wide (hence the term “epigenomics”) and integrating their positional data, using the ChromHMM algorithm, to categorize genomic regions into “chromatin states”. The presence of these states in a given genomic region in different tissue types can give hints about whether that region might be involved in regulation of specific genes or pathways.

Now, you can view the tissue-specific chromatin states spanning the position of each variant on Variant pages within the Portal. We have incorporated epigenomic data from a study (Varshney et al., 2017) in which the locations of 13 distinct chromatin states were determined across a diverse set of cell lines and tissues, including pancreatic islets. The new “Epigenomic annotations” section of each Variant page (see an example) presents information about chromatin states in three different ways.

1. An interactive table listing chromatin states in this region, the tissue or cell line in which they were observed, and their genomic coordinates. Filter the table by chromatin state or by tissue to find states of particular interest.

2. A matrix displaying chromatin states by tissue type. This graphic gives a quick indication of chromatin states that are present in this region, across the whole panel of tissues.

3. A graphic showing the positions of chromatin states relative to the position of the variant.
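The three views above can all be derived from the same underlying rows. The sketch below shows one way to do that; it is only an illustration, and the row layout, state names, tissues, and coordinates are hypothetical rather than taken from the Portal or from the study.

```python
from collections import defaultdict

# Hypothetical annotation rows: (chromatin state, tissue, start, end).
ANNOTATIONS = [
    ("Active enhancer", "Islets", 1000, 1800),
    ("Active TSS", "Liver", 1500, 2100),
    ("Active enhancer", "Adipose", 3000, 3500),
]


def filter_rows(rows, state=None, tissue=None):
    """View 1: the table, optionally filtered by state and/or tissue."""
    return [
        r for r in rows
        if (state is None or r[0] == state) and (tissue is None or r[1] == tissue)
    ]


def states_by_tissue(rows):
    """View 2: which states are present in each tissue."""
    matrix = defaultdict(set)
    for state, tissue, _start, _end in rows:
        matrix[tissue].add(state)
    return dict(matrix)


def states_at(position, rows):
    """View 3: states whose interval spans a given variant position."""
    return [r for r in rows if r[2] <= position <= r[3]]


print(filter_rows(ANNOTATIONS, tissue="Islets"))
print(states_by_tissue(ANNOTATIONS))
print(states_at(1600, ANNOTATIONS))
```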

These new features represent only the first phase of incorporating this new data type into the Portal. In the future, we will be adding more of these data along with more versatile interfaces for exploring them. Please check out our new epigenomic annotations and send us your feedback!

Wednesday, May 31, 2017

See you in San Diego!

Members of the T2D Knowledge Portal team are gearing up for the 77th Scientific Sessions of the American Diabetes Association, June 9-13 in San Diego, CA. We'll be releasing exciting new features of the Portal just before the conference, and we have a wide variety of presentations planned for each day.

On the opening day of the conference (Friday, June 9), join us for a mini-symposium that will present a comprehensive guide to the T2D Knowledge Portal and how you can use it to further your type 2 diabetes research. We will be exhibiting at booth #2452 on Saturday, Sunday, and Monday, and each day, genetics experts will be available at the booth to answer questions and discuss both the Portal and the genetics of T2D. On Saturday, members of the Portal team will participate in a moderated poster session, and posters will also be displayed on Monday. And on Sunday morning, our principal investigator, Dr. Jose C. Florez, will give a symposium presentation on "Mining the Genome for Therapeutic Targets."

Find the full details in the schedule below and follow us on Twitter (@T2DKP) for up-to-the-minute news throughout the conference. We're looking forward to meeting you!

Friday, June 9, 2017

Mini-Symposium: A Researcher’s Guide to Exploring Diabetes Genetic Data in the Type 2 Diabetes Knowledge Portal
Chair: Mark McCarthy
11:30am - 12:30pm, Room 28

11:30-11:50am Noël Burtt: Data, Analysis, and Tools in the Type 2 Diabetes Knowledge Portal
11:50am-12:10pm Jason Flannick: Demonstration of Questions that Can Be Addressed Using the Portal
12:10-12:30pm Question and Discussion Period

Saturday, June 10, 2017

  • Exhibiting at booth #2452, 10am - 4pm
  • Moderated Poster Session: Genetic Data, Pathways, and Variants for Type 2 Diabetes and Related Traits. 12:30-1:30pm, Hall B
Poster 1765-P
The Type 2 Diabetes Knowledge Portal: Accelerating Type 2 Diabetes Research through Community Access to Human Genetic Information and Tools
Presenter: Maria C. Costanzo

Poster 1766-P
Key Biological Pathways for Type 2 Diabetes Determined by Genetic Cluster Analysis on Related Traits
Presenter: Miriam S. Udler

Sunday, June 11, 2017

  • Symposium presentation: Mining the Genome for Therapeutic Targets. 
Dr. Jose C. Florez
9:20-9:55am, Ballroom 20D

  • Exhibiting at booth #2452, 10am - 4pm

Monday, June 12, 2017

  • Exhibiting at booth #2452, 10am - 2pm
  • Poster session, 12-1pm, Hall B
Poster 1765-P
The Type 2 Diabetes Knowledge Portal: Accelerating Type 2 Diabetes Research through Community Access to Human Genetic Information and Tools
Presenter: Maria C. Costanzo

Poster 1795-P
Type 2 Diabetes Gene Bioinformatically Identified by Variants Mapping to Amino-Acid Changes in Three-Dimensional Protein Space
Presenter: Marcin von Grotthuss

Wednesday, May 3, 2017

Explore new datasets and phenotypes in the T2D Knowledge Portal

We are releasing multiple new datasets in the Portal and have updated existing sets with associations for new phenotypes. To make it even easier to browse and explore these sets, we've also updated our Data page and added new functionality. Here's an overview of what's new in the Portal today.

17K exome sequence analysis dataset has grown to 19K

The Data Coordinating Center (DCC) of the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D) analyzes exome sequence data contributed by AMP T2D consortium members to find variant associations with T2D and related traits. The exome sequencing dataset available in the Portal has until now consisted of exome sequences from about 17,000 individuals. Today, we have added exome sequencing performed on 2,000 Danish subjects by the LuCamp (Lundbeck Foundation Centre for Applied Medical Genomics in Personalised Disease Prediction, Prevention and Care) consortium, making a total of nearly 19,000 exomes. This is just a taste of things to come: at the AMP T2D DCC we are currently analyzing additional exome sequences that will bring the total up to 52,000!

New community-contributed datasets: GENESIS GWAS and 70KforT2D GWAS

We are grateful to two groups from the larger T2D research community who have shared data that will make the T2D Knowledge Portal even more valuable to worldwide T2D researchers.

The GENEticS of Insulin Sensitivity (GENESIS) consortium performed GWAS on over 2,700 nondiabetic participants, finding genetic associations with direct measures of insulin sensitivity.

The 70KforT2D project collected, harmonized, and re-analyzed public GWAS data from over 70,000 individuals to find T2D genetic associations.

New public dataset: VATGen GWAS

The VATGen GWAS consortium performed meta-analysis of GWAS data from a mixed-ancestry group of more than 18,000 people to identify genetic associations with the localization of body fat deposition, leading to insights into adipocyte development.

Updated dataset: glucose-stimulated insulin secretion phenotypes in MAGIC GWAS

A study by Prokopenko et al. analyzed genetic associations with insulin secretion. Associations of variants with nine different measures of insulin secretion, among them corrected insulin response (CIR) and disposition index (DI), have now been added to the MAGIC GWAS dataset.

ExAC updated to gnomAD exomes and whole genomes

The Exome Aggregation Consortium (ExAC) has more than doubled in size and has morphed into the Genome Aggregation Database (gnomAD). More than 120,000 exome sequences and 15,000 whole genome sequences are now available, and these data are accessible via several tools and interfaces in the T2D Knowledge Portal.

New Data page: explore datasets using new filters

As our collection of data grows, it becomes more difficult to understand the differences between datasets and to find those of interest. To address this challenge, we've reorganized and streamlined our Data page.

A section of the Data page, expanded to show phenotype selection.

At the top of the Data page, you can choose to filter the dataset table by data type, phenotype category, or both. When you click on a phenotype category, the phenotypes within that category are available for selection. Clicking on the name of any dataset expands a section with details and references for that dataset.

In the coming days, watch this space for more details about each of these new developments. And as always, please contact us if you have any comments or questions.

Wednesday, March 15, 2017

The Portal’s interactive burden test: now more versatile than ever

Significant associations between genes and T2D or related phenotypes can provide powerful insights into disease mechanisms and possible therapies. The T2D Knowledge Portal includes results from pre-computed analyses of genetic associations for a large, and growing, number of datasets. But what if you want to do a more fine-grained analysis? You might want to test whether the disease burden for a gene differs between groups of people with specific characteristics—for example, lean people with T2D versus obese people without T2D. Or you might want to test the aggregate effect of a specific subset of variants, such as those that are likely to knock out the function of a protein of interest.

Our interactive burden test on Gene pages, powered by the Genetic Association Interactive Tool (GAIT), allows you to do all that and more. The burden test considers a gene as the unit of inquiry, including all the variants it contains in a statistical test of disease association. We described the basics of the burden test and GAIT in a recent blog post. Now, we’ve added some options for selecting variants in the interactive burden test that make this tool even more versatile.

The variant selection step of the burden test on a Gene page is pre-populated with all of the variants present in the selected dataset that are located within the gene and its 100 kb up- and downstream flanking regions. You can create a specific subset of these by checking or un-checking individual variants. The table may be sorted by multiple criteria in order to find variants of interest: chromosomal coordinate; minor allele count; predictions of the effect allele’s impact on the encoded protein; and the protein change or type of mutation caused by the effect allele.
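As a rough sketch of that default selection, the snippet below picks out variants within a gene plus 100 kb on either side and then sorts them. It is an illustration under assumptions: the record fields (`chrom`, `pos`, `mac`), the toy coordinates, and the choice to treat the window boundaries as inclusive are ours, not a description of the Portal's code.

```python
FLANK_BP = 100_000  # 100 kb on each side of the gene, as described above


def variants_in_window(variants, chrom, gene_start, gene_end, flank=FLANK_BP):
    """Return variants on `chrom` that fall within the gene or its flanks.

    Boundaries are treated as inclusive (an assumption for this sketch).
    """
    lo = max(1, gene_start - flank)
    hi = gene_end + flank
    return [v for v in variants if v["chrom"] == chrom and lo <= v["pos"] <= hi]


# Toy variants around a made-up gene spanning 1,000,000-1,050,000 on chromosome 8.
TOY_VARIANTS = [
    {"id": "a", "chrom": "8", "pos": 899_999, "mac": 12},
    {"id": "b", "chrom": "8", "pos": 900_000, "mac": 3},
    {"id": "c", "chrom": "8", "pos": 1_020_000, "mac": 40},
    {"id": "d", "chrom": "8", "pos": 1_150_000, "mac": 7},
    {"id": "e", "chrom": "8", "pos": 1_150_001, "mac": 1},
    {"id": "f", "chrom": "9", "pos": 1_020_000, "mac": 5},
]

selected = variants_in_window(TOY_VARIANTS, "8", 1_000_000, 1_050_000)

# Two of the sort orders mentioned above: coordinate and minor allele count.
by_position = sorted(selected, key=lambda v: v["pos"])
by_allele_count = sorted(selected, key=lambda v: v["mac"], reverse=True)

print([v["id"] for v in by_position])
print([v["id"] for v in by_allele_count])
```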

Section of the interactive burden test interface showing the default list of variants for the SLC30A8 gene. Options for customizing the list are located above the variant table.

The table of variants may be filtered so that the test considers only certain categories of variants, with varying predicted impacts on the encoded protein. Previously, the burden test offered filters based on an unpublished method. Now, we have replaced those filters with the set that was used in a recent major publication: The genetic architecture of type 2 diabetes, by Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton, et al.

Variant filters in the interactive burden test

All coding variants--selects variants within the coding sequence, from the dataset that was initially selected for the burden test

Protein-truncating + missense with MAF<1%--selects variants in either of these categories:
  • protein-truncating variants (predicted to cause a truncated protein to be generated, either by creating a premature stop codon or by causing a frameshift)
  • missense variants that have a minor allele frequency (MAF) of less than 1%. The MAF limit eliminates common variants, which would not be expected to have very deleterious effects.

Protein-truncating + possibly deleterious missense with MAF<1%--selects protein-truncating variants, plus missense variants that are predicted to be possibly deleterious and have MAF of less than 1%.

Protein-truncating + probably deleterious missense--selects protein-truncating variants, plus missense variants that are predicted to be probably deleterious.

Protein-truncating only--selects variants predicted to cause a truncated protein to be generated, either by creating a premature stop codon or by causing a frameshift.
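The logic of these filters can be sketched as a set of simple predicates. This is an illustration only: the record fields (`consequence`, `maf`, `prediction`) and the toy variants are hypothetical, and the sketch assumes that a "probably deleterious" missense variant also satisfies the "possibly deleterious" filter, which is not something stated above.

```python
# Hypothetical variant records; field names and values are illustrative only.
VARIANTS = [
    {"id": "v1", "consequence": "protein_truncating", "maf": 0.0005, "prediction": None},
    {"id": "v2", "consequence": "missense", "maf": 0.004, "prediction": "probably_deleterious"},
    {"id": "v3", "consequence": "missense", "maf": 0.008, "prediction": "possibly_deleterious"},
    {"id": "v4", "consequence": "missense", "maf": 0.20, "prediction": "probably_deleterious"},
    {"id": "v5", "consequence": "missense", "maf": 0.002, "prediction": None},
    {"id": "v6", "consequence": "synonymous", "maf": 0.03, "prediction": None},
    {"id": "v7", "consequence": "intronic", "maf": 0.10, "prediction": None},
]

CODING = {"protein_truncating", "missense", "synonymous"}


def is_ptv(v):
    return v["consequence"] == "protein_truncating"


def is_missense(v):
    return v["consequence"] == "missense"


def is_rare(v):
    return v["maf"] < 0.01  # MAF < 1%


def possibly_deleterious(v):
    # Assumption: "probably" is treated as a stricter subset of "possibly".
    return v["prediction"] in {"possibly_deleterious", "probably_deleterious"}


def probably_deleterious(v):
    return v["prediction"] == "probably_deleterious"


FILTERS = {
    "All coding variants": lambda v: v["consequence"] in CODING,
    "Protein-truncating + missense with MAF<1%":
        lambda v: is_ptv(v) or (is_missense(v) and is_rare(v)),
    "Protein-truncating + possibly deleterious missense with MAF<1%":
        lambda v: is_ptv(v) or (is_missense(v) and possibly_deleterious(v) and is_rare(v)),
    "Protein-truncating + probably deleterious missense":
        lambda v: is_ptv(v) or (is_missense(v) and probably_deleterious(v)),
    "Protein-truncating only": is_ptv,
}


def apply_filter(name, variants):
    """Return the IDs of the variants kept by the named filter."""
    keep = FILTERS[name]
    return [v["id"] for v in variants if keep(v)]


for name in FILTERS:
    print(name, "->", apply_filter(name, VARIANTS))
```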

Using these filters, you can tailor the list of variants to those with specific impact on the encoded protein. If you would like to customize the list even further by adding variants that were not present in the default list, there is now an option to add single or multiple variants, using dbSNP IDs (e.g., rs112881768) or identifiers in the format “chromosome_coordinate_reference-nucleotide_variant-nucleotide” (e.g., 8_112881768_G_A).

When “single variant” is selected, once you begin typing, variant IDs that match your entry are suggested. When “multiple” is selected, you may type or paste in a list of variant IDs, separated by commas or returns. Note that any added variants are not subject to the filters, which act only on the default list of variants for a gene.
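For readers preparing a list to paste in, the following sketch shows how the two identifier formats could be recognized and how a pasted list could be split on commas or returns. The regular expressions are our own assumptions about what counts as valid (for example, chromosomes 1-22, X, or Y, and alleles written with A, C, G, T); the Portal's own validation may differ.

```python
import re

# dbSNP-style ID, e.g. rs112881768
RSID = re.compile(r"^rs\d+$")
# chromosome_coordinate_reference-nucleotide_variant-nucleotide, e.g. 8_112881768_G_A
POSITIONAL = re.compile(r"^(\d{1,2}|X|Y)_(\d+)_([ACGT]+)_([ACGT]+)$")


def split_ids(text):
    """Split pasted text on commas or line breaks, dropping empty entries."""
    return [token.strip() for token in re.split(r"[,\r\n]+", text) if token.strip()]


def classify(variant_id):
    """Label an identifier as 'dbSNP', 'positional', or 'unrecognized'."""
    if RSID.match(variant_id):
        return "dbSNP"
    if POSITIONAL.match(variant_id):
        return "positional"
    return "unrecognized"


pasted = "rs112881768, 8_112881768_G_A\nnot-a-variant"
for variant_id in split_ids(pasted):
    print(variant_id, "->", classify(variant_id))
```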

Our GAIT User Guide (download PDF), which summarizes all the details of the interface, has been updated with the latest changes. Please check out our new, improved interactive burden test and let us know if you have comments or suggestions.

Sunday, February 5, 2017

Introductory guide to genetic association analysis now available

P-values. Odds ratios and betas. GWAS. Linkage disequilibrium. What does it all mean?

Human geneticists are, of course, intimately familiar with these concepts. But for people who are not human geneticists, just getting past the terminology can be frustrating. So we’ve written a basic primer and reference guide that can help users of the T2D Knowledge Portal understand the information presented in our interfaces and tools.

Our Introduction to genetic association analysis guide is available from our Resources page. Or download it here (PDF).

This guide provides a basic introduction to the rationale behind applying human genetic association studies to complex diseases like T2D, explains some of the parameters of genetic associations such as p-values and odds ratios, and describes the different types of experiment used to determine genetic associations.

Many thanks to Andrew Morris, University of Oxford, for his thoughtful review and helpful comments on this guide.

We would be happy to hear your suggestions for improvements and additions!

Monday, January 23, 2017

Insulin Sensitivity Index data added to the Portal

The loss of sensitivity to insulin, often termed insulin resistance, is characteristic of type 2 diabetes. Since this sensitivity is difficult to measure directly, researchers have developed an index that reflects it: the modified Stumvoll Insulin Sensitivity Index (ISI). The index is derived by a formula that combines fasting insulin levels with glucose and insulin levels measured two hours after a glucose load.

Now, the results of a study of genetic associations of variants with ISI are available in the T2D Knowledge Portal. These results are from a recent paper in Diabetes by co-first authors Geoffrey Walford, Stefan Gustafsson, Denis Rybin, and fellow members of the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC). (For an overview of the results, see our blog post about the paper.)

In this study, ISI was calculated for 16,753 non-diabetic individuals, and associations of their variants with ISI values were analyzed. The associations were adjusted in one of three ways: for age and sex; for age, sex, and body mass index (BMI); or according to a model that analyzed the combined influence of the genotype effect adjusted for BMI and the interaction effect between the genotype and BMI on ISI. More details about this data set and others from MAGIC may be found on our Data page.

ISI associations are a subset of the MAGIC GWAS data set. They may be viewed in the Portal by selecting one of these phenotypes:
  • ISI adjusted for age-sex
  • ISI adjusted for age-sex-BMI
  • ISI adjusted for genotype-BMI interaction
Associations with these phenotypes can be found in these locations on Portal pages:
  • On Gene Pages (see an example) in the Variants & Associations table
  • On Variant Pages (see an example) in the Associations at a glance section and in the Association statistics across traits table
  • Via the Variant Finder tool, for the phenotypes listed above
  • A "Manhattan plot" of associations across the genome may be seen by selecting one of the phenotypes listed above in the View full genetic association results for a phenotype scroll box on the Portal home page.

Thursday, January 19, 2017

CAMP GWAS data set moves to Early Access Phase 2

Three months ago, we incorporated a data set from the MGH Cardiology and Metabolic Patient Cohort (CAMP) into the T2D Knowledge Portal. These data were contributed by Pfizer, Inc. as part of a public-private partnership to generate genotype data for a cardiometabolic and prediabetic cohort; they add individual-level genetic association data for type 2 diabetes (T2D), fasting glucose levels, and fasting insulin levels from more than 3,500 samples to the Portal knowledgebase. Now, the CAMP GWAS data set has transitioned to Early Access Phase 2 status in the Portal.

The CAMP GWAS data set was the first to be included in the Portal with “Early Access” status, which is assigned to new data. As described on our Policies page, all newly added data sets have Early Access status for the first six months that they are in the Portal. In the first three months, Phase 1 of the Early Access period, the data have undergone quality control checks but they are not considered to be in their final form. The purpose of Phase 1 is to allow Portal users to review and analyze the data in order to identify any potential problems or areas needing further analysis. After this three-month period, data sets move to Phase 2, indicating that the data are in final form and are fully integrated into the Portal.

Portal users must not submit manuscripts concerning new data until both Phase 1 and Phase 2 of the Early Access period have passed, and any results of analyses or proposed publications are subject to the "Fort Lauderdale Principles" articulated for the sharing of genomic data.

In three months, the CAMP GWAS data set will become Open Access, meaning that it may be freely used for research as long as Portal users comply with our guidelines on user responsibilities and proper citation. It is important to note that in order to protect patient privacy, individual-level data in the Portal are never directly accessible to users. Rather, the Portal makes available summary statistics derived from the data, and also provides tools (such as the Genetic Association Interactive Tool (GAIT) and the Interactive Burden Test) that allow users to perform custom analyses based on individual-level data while protecting the security and privacy of those data.
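The release timeline described above (three months in Phase 1, three months in Phase 2, then Open Access) can be sketched as a small date calculation. The function names and the example date are hypothetical, and the sketch assumes that each phase changes on the calendar-month anniversary of the date a data set was added; the Policies page remains the authoritative description.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` calendar months after `d`, clamping the day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def access_status(added, today):
    """Return the access status of a data set added on `added`, as of `today`."""
    if today < add_months(added, 3):
        return "Early Access Phase 1"
    if today < add_months(added, 6):
        return "Early Access Phase 2"
    return "Open Access"


# Hypothetical example: a data set added on October 19, 2016.
added = date(2016, 10, 19)
for day in (date(2016, 12, 1), date(2017, 1, 19), date(2017, 4, 19)):
    print(day.isoformat(), access_status(added, day))
```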

Find CAMP data at all of these locations in the Portal:

  • On Gene Pages (e.g., HLA-C) in the Variants & Associations table.
  • On Variant Pages (e.g., rs9468919) in the Associations at a glance section and in the Association statistics across traits table.
  • Via the Variant Finder tool, for the phenotypes T2D, fasting glucose, and fasting insulin.
  • Via the Genetic Association Interactive Tool (GAIT), which enables custom association analysis for either single variants (available on Variant Pages) or for the set of variants in and near a gene (Interactive burden test, available on Gene Pages).
  • A "Manhattan plot" of genetic associations across the genome may be accessed by selecting the phenotype T2D, fasting glucose, or fasting insulin in the "View full genetic association results for a phenotype" selection box on our home page, and then choosing the CAMP GWAS data set.

Find many more details about the CAMP GWAS data set on our Data page, or read a summary in this blog post.

Tuesday, January 17, 2017

New Year, New Data: BioMe AMP T2D GWAS

We’re happy to announce the first addition of data to the Type 2 Diabetes Knowledge Portal in 2017: the BioMe AMP T2D GWAS data set. The generation of these data was funded by the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D), a collaboration between multiple stakeholders that aims to catalyze the clinical translation of genetic discoveries by producing and aggregating data, developing and implementing novel analytical methods and tools, and building infrastructure for data storage and presentation.

The BioMe AMP T2D GWAS data set is the first set to be entirely produced by the AMP T2D project, which supplied the funding and carried out every step of its production, from data generation to analysis, quality control, and presentation. Its immediate availability in the Portal, prior to publication, fulfills the mission of AMP T2D to speed up access to and utilization of new data.

These data were generated at the Charles Bronfman Institute for Personalized Medicine BioMe BioBank, a biorepository located at the Mount Sinai Medical Center (MSMC) in the upper Manhattan area of New York City. MSMC serves a diverse population of over 800,000 outpatients each year. Importantly, since many BioMe participants are African American or Hispanic/Latino, this data set adds significant ethnic diversity to the Portal’s genetic association data.

The BioMe AMP T2D GWAS data set comprises about 13,000 unique individuals, 41.5% of whom are admixed American, 38% African American, and 20% European. Subjects were genotyped using at least one of three platforms: the Illumina Exome Array, the Illumina GWAS array, or the Affymetrix GWAS array. Their T2D status was assessed by an algorithm, and many additional traits were also measured.

The data were subjected to quality control and association analysis by the Analysis Team at the AMP Data Coordinating Center (DCC) at the Broad Institute. Variant associations with T2D, fasting glucose levels, and HbA1c levels were analyzed. The top results included both previously known and novel variants, with only a single variant reaching genome-wide significance: T2D association of the variant rs7903146, within the well-established T2D risk gene TCF7L2. Now that these results are available in the T2D Knowledge Portal, the ability to analyze them further in the context of all other available T2D association data may lead to additional insights.

The BioMe AMP T2D GWAS data set currently has the “Early Access Phase 1” status that is assigned to new data. This status denotes that although analysis and quality control checks have been performed, the data are not yet considered to be in their final state. During the early access period, users may analyze the data but may not submit the results of these analyses for publication. Find the full details about the different phases of data release on our Policies page. More information about the data set, along with links to download even more detailed reports on its quality control and analysis, may be found in the BioMe AMP T2D GWAS section of our Data page.

BioMe AMP T2D GWAS data are available at these locations in the Portal:

  • On Gene Pages (see an example) in the Variants & Associations table and the Minor allele frequencies across data sets table
  • On Variant Pages (see an example) in the Associations at a glance section and in the Association statistics across traits table
  • Via the Variant Finder tool, for these phenotypes: type 2 diabetes; fasting glucose adjusted for age and sex; HbA1c adjusted for age and sex; and HbA1c adjusted for age, sex, and body mass index
  • A "Manhattan plot" of associations across the genome may be seen by selecting one of the phenotypes above in the View full genetic association results for a phenotype scroll box on the Portal home page, and then selecting the BioMe AMP T2D GWAS data set.

As always, please contact us with any questions, comments, or suggestions.