Wednesday, November 15, 2017

T2DKP Fall Newsletter

The latest issue of our quarterly newsletter is now available. Download it here to find out what we've been up to!

Tuesday, November 14, 2017

Announcing the Cardiovascular Disease Knowledge Portal

We are pleased to announce the launch of the Cardiovascular Disease Knowledge Portal (CVDKP). Our collaboration with Dr. Patrick Ellinor, Dr. Sek Kathiresan, and their colleagues in the Atrial Fibrillation, Global Lipids Genetics, Myocardial Infarction Genetics, and CARDIoGRAMPlusC4D consortia has created a resource that offers world-wide open access to genetic and genomic information about atrial fibrillation, myocardial infarction, and related traits, with the goal of democratizing access to genomic data and accelerating cardiovascular genomics research.

CVDKP home page

The CVDKP is constructed on a software architecture originally developed for the Type 2 Diabetes Knowledge Portal (T2DKP), which is the central product of the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D). AMP T2D is a public-private partnership between the National Institutes of Health, the U.S. Food and Drug Administration, biopharmaceutical companies, and non-profit organizations that is managed through the Foundation for the NIH. AMP seeks to harness collective capabilities, scale, and resources toward improving current efforts to develop new therapies for complex, heterogeneous diseases.

The ultimate goal of AMP T2D is to increase the number of new diagnostics and therapies for patients while reducing the time and cost of developing them, by jointly identifying and validating promising biological targets for type 2 diabetes. The T2DKP furthers that goal by aggregating, harmonizing, and displaying genetic association and epigenomic results along with user-friendly analysis tools, allowing research biologists who would not individually be able to amass and manipulate these large datasets to glean insights from the data.

We are working towards these same goals for other complex diseases, by extending the platform and analysis tools constructed for the T2DKP. In partnership with the International Stroke Genetics Consortium, we recently created a Knowledge Portal for cerebrovascular disease (CDKP) based on the same infrastructure. Now, with the advent of the Cardiovascular Disease Knowledge Portal, we have a three-member Knowledge Portal Network for the genetics of cardiometabolic and cerebrovascular disease.

Data in the CVDKP directly relevant to heart disease include genetic associations with atrial fibrillation, electrocardiogram traits, plasma lipid levels, and myocardial infarction. Additional association datasets are available for type 2 diabetes and glycemic traits, anthropometric traits, measures of kidney function, and psychiatric traits. You may browse the complete list of datasets and their descriptions on the CVDKP Data page.

As with the Cerebrovascular Disease Knowledge Portal, we are also working with the American Heart Association Precision Medicine Platform (PMP) to provide an additional avenue for accessing cardiovascular genetic data through the CVDKP. Currently, summary statistics from the AFGen GWAS and AFGen exome chip analysis datasets are deposited in the PMP.

We welcome all suggestions, comments, questions, and submission of relevant datasets for the CVDKP. Please contact us at!

Wednesday, October 25, 2017

New phenotypes and physical activity stratification available in the T2DKP

We’ve recently updated one dataset and added another in the Type 2 Diabetes Knowledge Portal. Associations with multiple new phenotypes are now available for the BioMe AMP T2D GWAS dataset, and the new dataset "GIANT GWAS - stratified by physical activity" adds associations with anthropometric traits for cohorts stratified by gender and physical activity levels.

The BioMe AMP T2D GWAS dataset was first added to the T2DKP in early 2017, initially with three phenotypes (T2D, fasting glucose levels, and HbA1c levels). Deposition and analysis of these data were funded by the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D), a collaboration between multiple stakeholders that aims to catalyze the clinical translation of genetic discoveries by producing and aggregating data, developing and implementing novel analytical methods and tools, and building infrastructure for data storage and presentation. This dataset was the first to be entirely produced within the AMP T2D project, including the deposition, analysis, quality control, and presentation of the data.

The data were generated at the Charles Bronfman Institute for Personalized Medicine BioMe BioBank, a biorepository located at the Mount Sinai Medical Center (MSMC) in the upper Manhattan area of New York City. MSMC serves a diverse population of over 800,000 outpatients each year. Importantly, since many BioMe participants are African American or Hispanic/Latino, this dataset adds significant ethnic diversity to the Portal’s genetic association data.

The data were subjected to quality control and association analysis by the Analysis Team at the AMP Data Coordinating Center (DCC) at the Broad Institute. In this second phase of analysis, associations with seven traits were calculated: systolic and diastolic blood pressure; HDL and LDL cholesterol levels; creatinine levels and eGFR-creat; and BMI. A detailed analysis report for these associations may be downloaded from the BioMe AMP T2D GWAS section of our Data page.

The new GIANT dataset was generated by the GIANT (Genetic Investigation of Anthropometric Traits) consortium via a meta-analysis of genetic associations for BMI, waist-hip ratio, and waist circumference from more than 200,000 adults. Samples are stratified by sex, ancestry, and physical activity level (active or inactive). This work was published in a recent paper by Graff et al.

Data from both the BioMe and GIANT studies are available at these locations in the Portal:
  • On Gene pages (see an example) in the Common variants and High-impact variants tables and in LocusZoom static plots
  • On Variant pages (see an example) in the Associations at a glance section and in the Association statistics across traits table, and in LocusZoom static plots
  • Via the Variant Finder tool
  • "Manhattan plots" of associations across the genome may be seen by selecting one of the phenotypes analyzed in these datasets in the View full genetic association results for a phenotype scroll box on the Portal home page
  • Additionally, the BioMe data are available for sample filtering and custom association analysis via the Genetic Association Interactive Tool (GAIT) on Variant pages.
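For readers new to Manhattan plots: each point is one variant, placed by its genomic position on the x-axis and by -log10 of its association p-value on the y-axis, so the strongest associations rise highest. The sketch below shows only that transformation; the 5 × 10^-8 genome-wide significance cutoff is the conventional GWAS threshold and is assumed here, not taken from the Portal's settings, and the input tuples are hypothetical.

```python
import math

# Conventional genome-wide significance threshold for GWAS (an assumption here;
# the Portal's own display may use different settings).
GENOME_WIDE_P = 5e-8

def manhattan_points(results):
    """Convert (chromosome, position, p_value) tuples into Manhattan-plot
    coordinates: y = -log10(p), so smaller p-values plot higher."""
    points = []
    for chrom, pos, p in results:
        y = -math.log10(p)
        points.append((chrom, pos, y, p < GENOME_WIDE_P))
    return points

# Hypothetical association results, invented for illustration only.
example = [
    ("1", 1000000, 3e-9),
    ("2", 100000, 0.04),
]
for chrom, pos, y, significant in manhattan_points(example):
    print(f"chr{chrom}:{pos}  -log10(p) = {y:.2f}  genome-wide significant: {significant}")
```

On this scale the 5 × 10^-8 threshold sits at a height of about 7.3, which is why a horizontal line near that value is commonly drawn on Manhattan plots.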

Please check out the new data and contact us with any questions, comments, or suggestions.

Monday, October 16, 2017

Learn about complex disease knowledge portals at ASHG 2017

Members of the Knowledge Portal team will be attending the American Society of Human Genetics meeting this week in Orlando, FL.

We'll be talking about the continuing progress of the Type 2 Diabetes Knowledge Portal, which has grown dramatically since ASHG 2016, with loads of new data and many new features. We'll also present our work towards expanding the T2DKP framework to other complex diseases, with the recent release of a new sibling portal for stroke genetics, the Cerebrovascular Disease Knowledge Portal.

You can catch us nearly every day of the meeting:

Wednesday 10/18

10 AM - 5 PM: Find us in the exhibit hall at booth #863. We’ll be there to answer your questions and give tours and tutorials on the Knowledge Portal Network.

10-10:30 AM: Demonstration of the Type 2 Diabetes Knowledge Portal at our booth, #863.
10:30-11 AM: Demonstration of the Cerebrovascular Disease Knowledge Portal at our booth, #863.

2 PM - 4 PM: Ben Alexander will present poster #1186: The Type 2 Diabetes Knowledge Portal: Clearing a path from genetic associations to disease biology.

Thursday 10/19

10 AM - 5 PM: We will again be in the exhibit hall at booth #863.

10:30 - 11:30 AM: Portal team members will be available at the Broad Institute booth (#1037) for demonstrations and tutorials.

2-2:30 PM: Demonstration of the Type 2 Diabetes Knowledge Portal at our booth, #863.
2:30-3 PM: Demonstration of the Cerebrovascular Disease Knowledge Portal at our booth, #863.

4:15 PM–6:15 PM: Portal team members will be participating in Concurrent Invited Session #49:

Data Sharing, Analysis, and Tools to Catalyze Translation from Genomic to Clinical Knowledge
Room 330C, Level 3, Convention Center
Moderators: Benjamin Neale and Noël Burtt
Serving genetic data and tools to the world - Jason Flannick.
The EGA as a platform for effective data sharing of human genetic and phenotype data - Thomas Keane.
Converting sequence data from over 140,000 people into rare disease diagnoses - Daniel MacArthur.
Assessing the phenome-wide consequences of genetically regulated molecular traits - Hae Kyung Im.

Friday 10/20

10 AM - 2:30 PM: This is our last day in the exhibit hall at booth #863.

We look forward to meeting you at ASHG! If you have questions and cannot meet us at any of these times, or if you won’t be at ASHG, our mailbox is always open at

Thursday, October 5, 2017

Portal team presents at Festival of Genomics

The Festival of Genomics was held in Boston on October 3-4, and the Type 2 Diabetes Knowledge Portal was well represented. This meeting brings together multiple stakeholders in genomics and health care, using innovative formats to educate, promote connections, and spark conversations.

On the first day of the meeting, Noël Burtt presented a talk focusing on the T2DKP: "The Type 2 Diabetes Knowledge Portal: accelerating action from data."

Noël Burtt speaking at the Festival of Genomics

The next day, T2DKP team members participated in a session with an unusual format, designed to stimulate interaction and conversation: "In the Loop." Jason Flannick, Senior Group Leader for the Portal project, chaired the session, which was entitled "Complex data, complex solutions: tackling the challenges of genetic data sharing, interpretation, and representation." "Loop leaders," all from the Broad Institute, included:

  • Sean Simmons, speaking on "Technological solutions to privacy concerns around data access and utilization";
  • Moran Cabili, speaking on "Ethical and technical challenges to genetic data access";
  • David Siedzik, speaking on "Analysis tools for genetic data and their scalability"; and
  • Maria Costanzo, speaking on "Representing and democratizing access to genetic association results."

Left to right: Jason Flannick, Sean Simmons, Moran Cabili, David Siedzik, and Maria Costanzo participate in an "In the Loop" session

The session began with an introduction from the chair and speakers, and then the audience broke into four groups for discussion with each of the Loop leaders. Each leader facilitated discussions with two different groups, and then everyone reconvened to sum up the discussions. The group may not have come up with any breakthrough ideas, but participants definitely emerged with a clearer picture of the many challenges around aggregating and representing genetic data.

The T2DKP team is headed next to the American Society of Human Genetics meeting. Stay tuned for a preview of our participation there in the next blog post!

Monday, September 18, 2017

All for one (population) and one for all

Type 2 diabetes (T2D) is a world-wide health problem, but it hits especially hard in Latin America, where incidence is higher than in many other parts of the world. To investigate the genetic basis for this difference, researchers from the U.S., Mexico, and Spain teamed up to look for genetic coding variants associated with T2D risk that are more common in people of Hispanic descent. In their recent paper (Mercader et al. 2017, Diabetes), the researchers discovered such variants and uncovered the molecular details of how one in particular affects T2D risk. Their results suggest a new avenue for drug development that could benefit diabetics of all ancestries. And surprisingly, although Hispanics have higher T2D risk, this variant actually protects against T2D.

In designing the study, Mercader and colleagues decided to focus on variants located within protein-coding sequences, whose effects can be more direct and more straightforward to test than those of variants outside genes. They used exome chip analysis, which considers only variants in protein-coding regions of the genome, to genotype both diabetics and non-diabetics of Hispanic descent from Mexico and the U.S. Their dataset, SIGMA exome chip analysis, is accessible in the T2D Knowledge Portal and described on our Data page.

To find variants that might differentially affect the Hispanic population, the researchers looked for T2D-associated variants that were common in Hispanics, but rare or low-frequency in people of European ancestry. The most significant variant in this category, rs149483638, is present at a minor allele frequency (MAF) of 17% in people of Hispanic ancestry, but has a MAF of only 1%, 0.1%, and 0.02% in people of East Asian, African, and European ancestry, respectively.
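To get a feel for what these allele frequencies mean in terms of people, the sketch below converts each MAF quoted above into expected genotype frequencies. It assumes Hardy-Weinberg equilibrium, which the post does not state and which real populations may not satisfy exactly, so the numbers are rough illustrations rather than reported results.

```python
# Expected genotype frequencies for a biallelic variant under Hardy-Weinberg
# equilibrium (an assumption made for illustration; real populations may deviate).
def genotype_frequencies(maf):
    q = maf          # frequency of the minor (effect) allele
    p = 1.0 - q      # frequency of the other allele
    return {
        "non-carrier": p * p,
        "heterozygous": 2 * p * q,
        "homozygous": q * q,
    }

# Minor allele frequencies for rs149483638 quoted in the post.
MAFS = {
    "Hispanic": 0.17,
    "East Asian": 0.01,
    "African": 0.001,
    "European": 0.0002,
}

for ancestry, maf in MAFS.items():
    g = genotype_frequencies(maf)
    carriers = g["heterozygous"] + g["homozygous"]
    print(f"{ancestry:>10}: carriers {carriers:.4%}, homozygous {g['homozygous']:.4%}")
```

Under this assumption, roughly 31% of people of Hispanic ancestry would carry at least one copy of the allele (about 3% would carry two), compared with about 0.04% of people of European ancestry.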

Surprisingly, although enriched in this population that is more vulnerable to T2D, the rs149483638 effect allele is protective against T2D. People who are heterozygous for the effect allele (a T at position 2161530 of chromosome 11 rather than a C) have 22% decreased risk of T2D, while homozygous carriers have 40% decreased risk.
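The two figures above are roughly consistent with each copy of the allele having a similar, multiplicative effect. The post does not say whether the 22% and 40% reductions are odds ratios or relative risks, or which genetic model was fitted; this minimal sketch assumes they are odds ratios and applies a log-additive model purely as an illustrative check.

```python
import math

# Reported reductions in T2D risk for carriers of the rs149483638 T allele,
# interpreted here (an assumption) as odds ratios relative to non-carriers.
OR_HET = 1 - 0.22   # one copy of the protective allele
OR_HOM = 1 - 0.40   # two copies

def log_additive_prediction(or_per_allele, copies):
    """Under a log-additive (multiplicative) model each extra copy multiplies
    the odds by the same factor, so k copies give or_per_allele ** k."""
    return or_per_allele ** copies

predicted_hom = log_additive_prediction(OR_HET, 2)
print(f"Predicted homozygous OR: {predicted_hom:.3f} "
      f"({1 - predicted_hom:.0%} decrease); reported: {OR_HOM:.2f}")
# Per-allele effect on the log-odds scale, the form many association tests report
print(f"log(OR) per allele: {math.log(OR_HET):.3f}")
```

Squaring the heterozygous odds ratio of 0.78 gives about 0.61, a 39% decrease, close to the reported 40%. That agreement is consistent with, but does not prove, a per-allele multiplicative effect.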

After the initial discovery, the investigators performed more analyses to verify whether rs149483638 was the causal variant in the region, and replicated the T2D association in independent datasets. All the results supported the hypothesis that this particular variant directly reduces T2D risk.

The variant is located in the IGF2 gene, which encodes a peptide similar to insulin that has previously been linked to growth disorders, obesity, and T2D. Alternative splicing generates two different isoforms of IGF2, and the protective allele disrupts a predicted acceptor site for the splicing event that would generate isoform 2. Could the absence of IGF2 isoform 2 be protective against T2D?

Mercader and colleagues performed further experiments to address the questions of whether the rs149483638 effect allele blocks the production of isoform 2 and whether this has an impact on T2D risk. In human cell culture, the protective allele did indeed block splicing at that site.

To see whether this happens in humans, the researchers tested tissue samples for the presence of isoform 2, and found that its expression was lower in people carrying the protective allele. Furthermore, among people who lacked the protective allele, those with T2D showed higher expression of isoform 2 in their visceral fat tissue than did those without T2D. Levels of isoform 2 in non-diabetics were also positively correlated with levels of HbA1c, a measure of average blood glucose levels over the preceding two to three months. No such correlations were seen for levels of IGF2 isoform 1.

Taken together, these results support the involvement of isoform 2 in the elevation of T2D risk, suggesting an intriguing possibility: could lowering levels of isoform 2 be an effective way to lower T2D risk?

If lowering isoform 2 levels were to be used as a T2D therapeutic, it would be important to know that this reduction had no adverse effects. Genetic data can shed light on this question as well. The authors looked in the Exome Aggregation Consortium (ExAC) database and in the clinical records of their study subjects, and saw no health effects other than lowered T2D risk in carriers of the protective variant. They also performed a phenome-wide association study (PheWAS) in the Genetic Epidemiology Research on Aging (GERA) cohort, and saw no association of the T2D-protective allele with any of 18 medical conditions.

Thus it seems likely that loss of IGF2 isoform 2 would not be harmful, setting the stage for research into drugs that could specifically inhibit isoform 2 or block its production as a way to delay or treat the development of T2D.

These fascinating results have opened multiple avenues for future research. What is the specific biological role of IGF2 isoform 2 in T2D? It differs from isoform 1 only in that it carries an extra 56 N-terminal amino acids. Isoform 1 predominates, while isoform 2 is expressed at very low levels—although its highest expression is seen in pancreatic islets, liver, and fat, all tissues that are relevant for T2D. Elucidating the molecular details of this role will increase our understanding of the biological mechanisms in T2D. And from an evolutionary perspective, the question of how this protective variant came to be enriched in this population is an interesting one.

The motto of the Three Musketeers was "All for one and one for all," meaning that the group supports each member and each member supports the group. As this paper illustrates, this theme is also emerging in human genetics. By investigating distinct populations, we can not only learn about those specific populations but also gain knowledge to benefit all humankind.

Wednesday, August 30, 2017

Bringing the power of epigenomics to the T2DKP

Until recently, all of the results displayed in the Type 2 Diabetes Knowledge Portal (T2DKP) were based on genetic association data: the significance with which variants, or SNPs, occur in people’s genomes in conjunction with a disease or trait.
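In its simplest textbook form, this "significance" comes from comparing how often a variant's allele appears in people with a disease versus people without it. The sketch below runs an allelic 2x2 chi-square test on hypothetical allele counts; it is an illustration of the idea only. The analyses behind the Portal's results typically use regression models with covariates, which this sketch does not attempt to reproduce.

```python
import math

def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic 2x2 test comparing allele counts in cases vs. controls.
    Returns (odds_ratio, chi_square, p_value) using Pearson's chi-square
    with 1 degree of freedom (no continuity correction)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value

# Hypothetical allele counts (two alleles per person), for illustration only.
or_, chi2, p = allelic_association(case_alt=300, case_ref=1700, ctrl_alt=400, ctrl_ref=1600)
print(f"OR = {or_:.2f}, chi2 = {chi2:.2f}, p = {p:.2e}")
```

An odds ratio below 1 means the allele is less common in cases than in controls (a protective association); the p-value measures how surprising that difference would be if the variant had no effect.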

This information is hugely important for pinpointing regions of the genome that contribute to disease risk. It is now relatively straightforward to identify these regions, but discovering the mechanisms by which they act remains a large challenge, especially for variants that lie outside coding sequences and have no obvious effect on the sequence of a particular protein. These non-coding variants, which are the type most commonly seen in genetic association studies, are likely to affect tissue-specific gene regulation that could be important to the disease process.

How can we overcome this challenge to find clues about the effects of these non-coding variants? Epigenomic data to the rescue!

Dr. Kyle Gaulton of the University of California at San Diego researches the transcriptional regulatory networks involved in type 2 diabetes by using epigenomic data in concert with genetic association data. He explains, "Regulatory elements control gene production and function, and are often highly specialized across cell and tissues and located far away from the genes they regulate. Molecular epigenomic hallmarks of gene regulation such as histone and DNA modifications, nucleosome depletion, chromatin conformation and DNA-protein interactions can pinpoint the precise genomic locations of regulatory elements. High-resolution epigenome maps of regulatory elements in pancreatic islets, liver, muscle, adipose and many other human tissues can then enable annotation of non-coding genetic variants and their potential gene regulatory functions. These maps are thus an invaluable component of determining how type 2 diabetes associated non-coding variants influence disease pathogenesis."

A recent paper from Dr. Gaulton and colleagues (Gaulton, KJ, et al. (2015) Nat Genet. 47:1415) illustrates the power of integrating these two data types. By combining information on transcription factor binding sites and tissue-specific chromatin states with genetic fine-mapping of T2D-associated loci, the authors elucidated the molecular mechanisms behind the effects of some T2D-associated variants, uncovering the role of the FOXA2 transcription factor in glucose homeostasis in T2D-relevant tissues.

Now, the T2DKP facilitates this type of analysis by presenting both genetic association and epigenomic data on Gene and Variant pages. We described the display of epigenomic data on Variant pages in a recent blog post. On Gene pages, epigenomic data are integrated into the LocusZoom display.

Locations of variants associated with T2D and chromatin states in pancreatic islets, across the SLC30A8 gene (partial view)

Below the plot of variant associations, chromatin states are displayed by default for the major T2D-relevant tissues. Using the pull-down menu at the top of the plot, you can choose to display chromatin states for other tissues and cell types from a diverse set. All of the details on how to use this interactive plot are included in our Gene Page guide.

This is only the first step for epigenomic data in the T2DKP. In the future, we plan to include additional types of epigenomic data that indicate chromatin accessibility and conformation. We will also add functionality; for example, for any given variant, you will be able to search for the tissues in which enhancer regions overlap the location of that variant.
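The planned variant-to-enhancer search boils down to an interval-overlap query: given a variant's chromosome and position, find every tissue whose enhancer-state region contains that position. Below is a minimal sketch of that idea, not the Portal's implementation. The `ChromatinRegion` record, the state labels, the BED-style 0-based half-open coordinates, and the example regions are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChromatinRegion:
    tissue: str
    chrom: str
    start: int   # 0-based, inclusive (BED-style convention assumed)
    end: int     # 0-based, exclusive
    state: str   # hypothetical chromatin-state label, e.g. "Active enhancer"

def tissues_with_enhancer_at(regions, chrom, pos, enhancer_states):
    """Return the sorted tissues whose enhancer-state regions overlap the
    1-based variant position `pos` on chromosome `chrom`."""
    zero_based = pos - 1  # convert a 1-based variant position to 0-based
    hits = {
        r.tissue
        for r in regions
        if r.chrom == chrom
        and r.start <= zero_based < r.end
        and r.state in enhancer_states
    }
    return sorted(hits)

# Hypothetical annotations, invented for illustration only.
regions = [
    ChromatinRegion("Pancreatic islets", "11", 1000, 2000, "Active enhancer"),
    ChromatinRegion("Liver", "11", 1500, 1800, "Weak enhancer"),
    ChromatinRegion("Skeletal muscle", "11", 1500, 1800, "Quiescent"),
    ChromatinRegion("Adipose", "11", 2500, 3000, "Active enhancer"),
]
ENHANCERS = {"Active enhancer", "Weak enhancer"}
print(tissues_with_enhancer_at(regions, "11", 1600, ENHANCERS))  # → ['Liver', 'Pancreatic islets']
```

A linear scan like this is fine for a handful of regions; genome-wide annotation sets would normally be indexed (for example, sorted by start position or stored in an interval tree) so each lookup does not touch every region.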

As we actively develop this aspect of the T2DKP, we welcome your suggestions!