Thursday, November 17, 2016

Collaborate with us!

One of the goals of the Type 2 Diabetes Knowledge Portal project is to bring together the world-wide T2D and genetics research communities to share data, knowledge, methods, and tools. In keeping with that goal, we welcome contributions of data to the Portal and we are also open to collaboration as we develop new and better ways to analyze and display data.

We’ve added a new page to the Portal, "Collaborate," that answers frequently asked questions about how to get involved. It includes links to our Data Submitter’s Guide and Data Transfer Agreement, gives an overview of the kinds of data we’re looking for, and tells you how to get in touch with our team.

The “Collaborate” page also links to information about funding opportunities offered by the Foundation for the NIH. Check this out if you’re interested in starting a new project to generate data for the Portal!

Monday, November 7, 2016

New MGH Cardiology and Metabolic Patient Cohort data in the T2D Knowledge Portal

We are pleased to announce a new data set in the T2D Knowledge Portal, from the MGH Cardiology and Metabolic Patient Cohort (CAMP). These data were contributed by Pfizer, Inc. as part of a public-private partnership to generate genotype data for a cardiometabolic and prediabetic cohort. This data set adds individual-level genetic association data for type 2 diabetes (T2D), fasting glucose levels, and fasting insulin levels from more than 3,500 samples to the Portal knowledgebase. Association data for additional phenotypes from this cohort will be incorporated in the future.

The inclusion of this data set in the T2D Knowledge Portal illustrates what is distinctive about the Accelerating Medicines Partnership, which brings together pharmaceutical companies and non-profit institutions with the goal of speeding up the discovery of new targets for treatment of T2D. The pharmaceutical partners in this collaboration have committed not only to providing funding, but also to sharing the data they generate. The CAMP data set contributed by Pfizer is the first set from a pharmaceutical partner to be made available in the Portal.

Another unique aspect of this data set is that it is the first to be included in the Portal with “Early Access Phase 1” status, which is assigned to new data. This status denotes that although analysis and quality control checks have been performed, the data are not yet considered to be in their final state. During the early access period, users may analyze the data but may not submit the results of these analyses for publication. Find the full details about the different phases of data release on our Policies page.

The CAMP cohort consists of 3,857 subjects who were recruited at the Massachusetts General Hospital Heart Center between 2008 and 2012. In addition to genotyping, the subjects had either vascular reactivity measurements (for T2D patients) or an oral glucose tolerance test (for patients not known to have T2D), and samples of their plasma and serum were analyzed. Most of the subjects were of European ancestry; about 10% were African American.

The analysis and quality control processes for this data set were performed by the Analysis Team of the Accelerating Medicines Partnership Data Coordinating Center (AMP-DCC) at the Broad Institute, and are completely transparent and fully documented. The experiment design and analysis are summarized on our Data page, and detailed reports are available for download. Going forward, all new data sets added to the Portal will be fully documented in this manner.

One intriguing—and somewhat puzzling—result from the analysis highlights the utility of incorporating data sets like this one into the Portal. The variant most strongly associated with T2D (at genome-wide significance) in this set is located in the major histocompatibility complex region near the HLA-C gene.

Known associations of genes in this region with type 1 diabetes, along with a high local recombination rate, make it challenging to interpret the meaning of this association. However, it certainly merits further investigation because of its genome-wide significance. The inclusion of this data set in the Portal, in the context of all other available data about T2D associations in the region, greatly facilitates the further analysis of this and other associations in the set.

The CAMP data may be accessed via multiple interfaces in the Portal. They are shown in tables of summary statistics and accessible in variant searches using the Variant Finder. Importantly, since the data are individual-level, samples may be filtered by various parameters and used for custom association analysis in our Genetic Association Interactive Tool (GAIT).

Find CAMP data at all of these locations in the Portal:

  • On Gene Pages (e.g., HLA-C) in the Variants & Associations table.
  • On Variant Pages (e.g., rs9468919) in the Associations at a glance section and in the Association statistics across traits table.
  • Via the Variant Finder tool, for the phenotypes T2D, fasting glucose, and fasting insulin.
  • Via the Genetic Association Interactive Tool (GAIT), which enables custom association analysis for either single variants (available on Variant Pages) or for the set of variants in and near a gene (Interactive burden test, available on Gene Pages).

Tuesday, November 1, 2016

View ASHG posters from the T2D Knowledge Portal team

Did you miss the American Society of Human Genetics 2016 Annual Meeting last month in Vancouver? Or did you attend, but weren’t able to get to our posters among the hundreds that were there? 

Now you can catch up on everything you missed from the Portal team. We’ve uploaded our posters to the open access publishing platform F1000Research, where you can view or download them. The Portal team presented four posters:

1. Automated, scalable quality control of heterogeneous exome sequence data. This poster, presented by Ryan Koesterer, a member of the Analysis Team of the Accelerating Medicines Partnership Data Coordinating Center (AMP-DCC), describes a new, scalable method for quality control of exome sequence data, which is applied to data before they are incorporated into the Portal.

Citation: Koesterer R, von Grotthuss M, Flannick J et al. Automated, scalable quality control of heterogeneous exome sequence data [v1; not peer reviewed]. F1000Research 2016, 5:2609 (poster) (doi: 10.7490/f1000research.1113354.1)

2. The Type 2 Diabetes Knowledge Portal: a paradigm for the democratization of human genetic information. This poster, from Portal content and community manager Maria Costanzo, presents an introduction to the Portal: its purpose, its content, and what kinds of questions it allows you to ask.

Citation: Costanzo MC and Accelerating Medicines Partnership: Type 2 Diabetes. The type 2 diabetes knowledge portal: a paradigm for the democratization of human genetic information [v1; not peer reviewed]. F1000Research 2016, 5:2607 (poster) (doi: 10.7490/f1000research.1113352.1)

3. A software platform facilitating community analyses of genetic datasets for complex disease. This poster from Benjamin Alexander, on the Portal software engineering team, describes the tools in the Portal that allow you to do both forward and reverse genetic analysis and even perform custom association analysis.

Citation: Alexander B, Duby M, Sanders M et al. A software platform facilitating community analyses of genetic datasets for complex disease [v1; not peer reviewed]. F1000Research 2016, 5:2608 (poster) (doi: 10.7490/f1000research.1113353.1)

4. Mapping variants to amino-acid changes in three-dimensional protein space improves aggregate association test power and suggests mechanisms of action. This poster, presented by Portal computational biologist Marcin von Grotthuss, illustrates a new method for evaluating the significance of variants by considering the protein structural context of the amino acids they encode. A long-term goal is to incorporate this analysis into the Portal.

Citation: von Grotthuss M, Florez JC, Flannick J et al. Mapping variants to amino-acid changes in three-dimensional protein space improves aggregate association test power and suggests mechanisms of action [v1; not peer reviewed]. F1000Research 2016, 5:2610 (poster) (doi: 10.7490/f1000research.1113355.1)

We hope you find these posters informative! Please let us know if you have any questions or suggestions.

Tuesday, October 25, 2016

Design your own association analysis with our Genetic Association Interactive Tool (GAIT)

Genetic association analysis—identifying polymorphisms in the human genome that are correlated with altered risk of disease—is a powerful method for discovering disease mechanisms. These polymorphisms can indicate what goes wrong at the cellular level in the disease process, knowledge that is critically important for developing better diagnostics and therapies.
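
To make the idea concrete, here is a minimal sketch of the simplest kind of case-control association test: an allelic chi-square test on a 2×2 table of allele counts for one variant. It is a textbook illustration, not the method used to compute the Portal's pre-calculated results, and the function name `allelic_association` and the counts in the example are hypothetical.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic (2x2) association test for one variant in a case-control study.

    Arguments are allele counts: each diploid person contributes two alleles.
    Returns (odds_ratio, chi2, p_value), using Pearson's chi-square with 1 df.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    # Sum of (observed - expected)^2 / expected over the four cells
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of carrying the alternate allele in cases relative to controls
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls (2,000 alleles each)
print(allelic_association(600, 1400, 500, 1500))
```

An odds ratio above 1 means the alternate allele is more common in cases; real analyses typically use regression instead so that covariates such as ancestry can be included.
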

The Type 2 Diabetes Knowledge Portal offers a wealth of pre-calculated information on genetic associations between variants and type 2 diabetes (T2D) or other related traits. These results are computed using broadly defined groups of samples: either an entire sample set from a project, or ancestry-specific cohorts. This approach generates very valuable results, but it can mask effects that are detectable only in more narrowly defined groups: for example, individuals within a certain range of age, body mass index, or cholesterol level.

Until now, analysis of such fine-grained subsets of individual-level data has only been possible for expert geneticists with access to protected data. But our new Genetic Association Interactive Tool (GAIT) offers everyone an unprecedented amount of access to individual-level data along with an easy-to-use interface for analyzing genetic associations using custom subsets of samples and variants.

Two versions of GAIT are available in the Portal. One, on Variant pages (see an example), computes association statistics for the single variant featured on that page. The other, accessible on Gene pages (see an example), powers an interactive burden test that considers the collection of variants in or near a gene, or a selected subset of those variants.

Where to find GAIT on Gene pages (left) and Variant pages (right)

The GAIT interface offers incredible flexibility for designing custom analyses. In the interactive burden test, you can filter variants by their predicted effects, or pick and choose individual variants to include. When creating sample sets for either single-variant association analysis or a gene burden test, you can specify a gender, set ranges for the values for multiple phenotypes, and choose principal components or phenotypes to use as covariates. And all these parameters may be set differently for different ethnic groups.

The GAIT interface displays phenotype values within the sample set and allows you to filter samples by multiple criteria
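
The interactive burden test described above collapses the variants in a gene into a single per-sample signal. The sketch below assumes one common collapsing approach (a sample counts as a "carrier" if it has at least one alternate allele at any retained variant) followed by Fisher's exact test on carriers in cases versus controls. GAIT's actual statistic is not described in this post and may differ, for example by adjusting for covariates; `Variant`, `carriers`, `burden_test`, and the effect labels are hypothetical.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class Variant:
    variant_id: str
    predicted_effect: str  # hypothetical labels, e.g. "missense", "synonymous"


def carriers(genotypes, variants, keep_effects):
    """Flag each sample carrying >= 1 alternate allele at any retained variant.

    genotypes: one dict per sample mapping variant_id -> alt allele count (0, 1, 2).
    """
    kept = {v.variant_id for v in variants if v.predicted_effect in keep_effects}
    return [any(g.get(vid, 0) > 0 for vid in kept) for g in genotypes]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x carriers among the row1 cases
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def burden_test(genotypes, affected, variants, keep_effects):
    """Collapse retained variants into carrier status; compare cases vs. controls."""
    pairs = list(zip(carriers(genotypes, variants, keep_effects), affected))
    a = sum(1 for carrier, is_case in pairs if carrier and is_case)          # case carriers
    b = sum(1 for carrier, is_case in pairs if not carrier and is_case)      # case non-carriers
    c = sum(1 for carrier, is_case in pairs if carrier and not is_case)      # control carriers
    d = sum(1 for carrier, is_case in pairs if not carrier and not is_case)  # control non-carriers
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)


# Hypothetical toy data: keep only missense variants
variants = [Variant("V1", "missense"), Variant("V2", "synonymous")]
genotypes = [{"V1": 1}, {"V1": 1}, {"V1": 2}, {}, {"V1": 1}, {"V2": 1}, {"V2": 1}, {}]
affected = [True] * 4 + [False] * 4
print(burden_test(genotypes, affected, variants, {"missense"}))
```

Changing `keep_effects` mirrors filtering variants by predicted effect in the interface: including the synonymous variant here turns two control non-carriers into carriers.
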

Once you set parameters of your choice, GAIT computes associations on the fly, based on individual-level data. To protect patient confidentiality, GAIT will not display results from sample sets consisting of fewer than 100 individuals.
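
A comparable sketch for the single-variant case, assuming a quantitative trait (such as fasting glucose) and simple linear regression of the trait on allele dosage with no covariates, although GAIT also lets you add covariates. The 100-individual cutoff comes from the paragraph above; the `Sample` fields, filter parameters, and function names are hypothetical.

```python
import math
from dataclasses import dataclass

MIN_SAMPLES = 100  # the post states GAIT will not show results for < 100 individuals


@dataclass
class Sample:
    dosage: float  # alternate allele count at the variant (0, 1, or 2)
    trait: float   # quantitative phenotype, e.g. fasting glucose
    bmi: float
    age: float


def in_range(value, bounds):
    return bounds is None or bounds[0] <= value <= bounds[1]


def filter_samples(samples, bmi_range=None, age_range=None):
    """Keep samples whose BMI and age fall inside the (inclusive) ranges given."""
    return [s for s in samples
            if in_range(s.bmi, bmi_range) and in_range(s.age, age_range)]


def single_variant_test(samples):
    """Linear regression trait ~ dosage; returns (beta, se, p), or None if withheld."""
    n = len(samples)
    if n < MIN_SAMPLES:
        return None  # too few samples: withhold results to protect confidentiality
    xs = [s.dosage for s in samples]
    ys = [s.trait for s in samples]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # variant is monomorphic in this subset; nothing to test
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, se, 0.0  # perfect fit
    # Normal approximation to the t distribution, reasonable once n >= 100
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p


# Hypothetical data: 200 samples, trait rises 0.3 units per alternate allele
samples = [Sample(dosage=i % 3,
                  trait=5.0 + 0.3 * (i % 3) + (0.1 if i % 2 == 0 else -0.1),
                  bmi=20 + i % 20, age=30 + i % 40)
           for i in range(200)]
print(single_variant_test(filter_samples(samples, bmi_range=(25, 35))))  # 110 samples
print(single_variant_test(filter_samples(samples, bmi_range=(25, 28))))  # 40 samples: None
```
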

To help you get familiar with this versatile tool, we’ve created a User Guide (download PDF) that summarizes all the details of the interface. Please give GAIT a try and let us know what you think!

Tuesday, October 18, 2016

New and updated data in the T2D Knowledge Portal

As members of the T2D Knowledge Portal team arrive in Vancouver for the American Society of Human Genetics meeting, we are pleased to announce that we have added a new data set to the Portal and made extensive updates to existing data sets. 

The new data set, named “CAMP GWAS” in the Portal, comes from the MGH Cardiology and Metabolic Patient Cohort (CAMP). These data were contributed by Pfizer, Inc. as part of a public-private partnership to generate genotype data for a cardiometabolic and prediabetic cohort, and were analyzed by the Analysis Team of the Accelerating Medicines Partnership Data Coordinating Center (AMP-DCC) at the Broad Institute. The set adds individual-level genetic association data for type 2 diabetes (T2D), fasting glucose levels, and fasting insulin levels from nearly 3,500 samples to the Portal knowledgebase, and association data for more phenotypes will be added in the future.

CAMP data may be accessed on Gene and Variant pages in the Portal and via the Variant Finder, and may also be filtered and queried using the Genetic Association Interactive Tool (GAIT).

Several other data sets in the Portal have been updated and improved:
  • The size of the CARDIoGRAM GWAS data set has nearly doubled, now consisting of 184,305 samples, and the data analysis has been updated.
  • The size of the CKDGen GWAS data set has also nearly doubled, to 133,814 samples; the data analysis has been updated; new subsets have been added that stratify serum creatinine associations by African American ancestry and stratify both serum creatinine and urinary albumin-to-creatinine ratio by the presence or absence of T2D.
  • The data set previously named “DIAGRAM GWAS” in the Portal has been updated and renamed “DIAGRAM Trans-ethnic meta-analysis”; its sample size has increased to 149,821. Several new subsets have been added, including gender-stratified, MetaboChip, and fine-mapping data.
  • The GIANT GWAS data have been updated and European cohorts have been added for BMI and height traits.
  • The GLGC GWAS data set has increased in size to 188,577 samples and has been updated.
  • The number of samples in the MAGIC GWAS data set has more than doubled, to 133,010; the data have been updated, and associations with 2-hour glucose, fasting glucose, and fasting insulin have been added for MetaboChip data.
Full details about all of these data sets are available on our Data page.

Because of compatibility issues with the updated data, we have temporarily removed the “GWAS results summary” section from Gene pages of the Portal. This feature will be restored within the next week.

As always with major updates, issues or bugs may have been introduced and we may not have found all of them during our routine testing. We encourage you to let us know of any problems that you encounter in using the Portal, and we welcome your questions and suggestions.

Friday, October 14, 2016

See you at ASHG 2016!

Members of the Type 2 Diabetes Knowledge Portal team will be attending the American Society of Human Genetics meeting next week in Vancouver, BC. You can catch us nearly every day of the meeting:

Tuesday 10/18

3 PM: Noël Burtt will be one of the speakers in an informational session on the T2D Knowledge Portal and new funding opportunities offered by the Foundation for the NIH. Complimentary snacks, beer, and wine will be served! Please pre-register here.

Wednesday 10/19

10 AM - 4 PM: Find us in the exhibit hall at booth #428. We’ll be there to answer your questions and give tours and tutorials on the Portal.

Thursday 10/20

10 AM - 4 PM: We will again be in the exhibit hall at booth #428.

2 - 3 PM: Ryan Koesterer will present his poster on an automated, scalable quality control method for genetic association data that improves on current “gold-standard” methods (program #1943T).

2 - 3 PM: Maria Costanzo will present her poster giving an overview of data in the Portal and the global collaborative efforts behind its aggregation (program #329T).

Friday 10/21

10 AM - 4 PM: This is our last day in the exhibit hall at booth #428.

2 - 3 PM: Marcin von Grotthuss will present his poster on improving predictions of significant variants by taking protein structure into account (program #489F).

3 - 4 PM: Ben Alexander will present his poster on the software platform that powers the T2D Knowledge Portal user interface and custom analysis tools (program #1650F).

T2D Knowledge Portal staff attending ASHG

We look forward to meeting you at ASHG! If you have questions and cannot meet us at any of these times, or if you won’t be at ASHG, our mailbox is always open at

Monday, October 3, 2016

Come to a T2D Knowledge Portal information session at ASHG

The American Society of Human Genetics meeting is happening in Vancouver, B.C. in a little over two weeks! The Portal team will be presenting and exhibiting at multiple venues at ASHG, and the first event will take place immediately before the conference starts: an information session including an overview of the Accelerating Medicines Partnership in Type 2 Diabetes, a progress update on T2D Knowledge Portal functionality, and information on new funding opportunities. Complimentary hors d'oeuvres, beer and wine will be served!

Information session
Tuesday, October 18, 2016
3:00 pm - 4:00 pm PDT
Fairmont Waterfront
900 Canada Place Way
Vancouver, British Columbia

Please register here for this free event, hosted by FNIH.  Contact Nicole Spear at with any questions.

Watch this space over the next two weeks for a complete listing of opportunities to learn about the Portal and talk with the Portal team at ASHG!

Thursday, September 15, 2016

New funding opportunities for T2D genetic research

The Foundation for the National Institutes of Health (FNIH) has released three new funding opportunities that aim to add to the growing body of data housed in the T2D Knowledge Portal. The new Requests for Proposals (RFPs) solicit data on T2D-related complications, individual-level data, and whole-exome sequencing data related to T2D.

FNIH awards will provide successful applicants with up to $200,000 per individual award for proposals to harmonize and transfer existing data sets, and up to $500,000 per individual award for proposals that include the generation of new genotyping data. Awards will be made over two years and aim to enhance the NIH-funded T2D Knowledge Portal hosted by the Broad Institute of MIT and Harvard.

Responses to the FNIH Requests for Proposals are due by December 31, 2016. Details on the new funding opportunities can be found here.

Wednesday, August 10, 2016

Insulin sensitivity comes into focus

Many different things can be seen in any landscape, depending on your focal point.
Image by Nicooo76 via Pixabay.
When photographing a landscape, different photographers choose different perspectives. Some capture a wide-angle view, while others focus on particular details.

It’s no different for researchers who use genome-wide association studies (GWAS) to investigate the genetic landscape of type 2 diabetes (T2D). A common perspective is to study the wide range of variants that are significantly associated with the presence of T2D in patients. But it can also be very informative to concentrate on individual traits related to the physiology of T2D. In a new paper in Diabetes, co-first authors Geoffrey Walford, Stefan Gustafsson, Denis Rybin, and fellow members of the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) took this focused perspective to discover associations of genetic variants with insulin sensitivity.

Along with reduced insulin levels, the loss of insulin sensitivity (often termed insulin resistance) is a major hallmark of T2D. When muscle, liver, and fat cells become less able to respond to insulin, blood glucose levels rise. Since this can contribute to development of T2D and exacerbate its symptoms, knowing which genetic variants are associated with sensitivity to insulin could be informative for understanding pathways that contribute to T2D risk.

But insulin sensitivity is difficult to measure. Earlier GWAS have used simple estimates of insulin sensitivity, such as fasting levels of insulin, and have discovered a handful of genetic variants that influence insulin sensitivity. The “gold standard” test, the euglycemic clamp, involves giving patients continuous infusions of insulin and glucose and monitoring their blood glucose every few minutes. It’s expensive and time-consuming—not a test that is practical to perform on the tens of thousands of subjects that are commonly used in GWAS.

The authors wondered whether they could instead use an index that combines several measurements, each relatively easy to make. It’s an index with a long name: the modified Stumvoll Insulin Sensitivity Index (ISI). Developed by Stumvoll and colleagues in 2001, this index can be derived in a variety of ways. The authors chose the ISI requiring just three measurements: fasting insulin levels; glucose levels two hours after a glucose load; and insulin levels two hours after a glucose load. This ISI is as good as or better than other estimates of insulin sensitivity and correlates well with the euglycemic clamp.
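
The three-measurement index can be written as a simple linear formula. The sketch below uses coefficients that are our understanding of the published modified Stumvoll ISI; they are not given in this post, so treat them as an assumption and check them against Walford et al. (Diabetes, 2016) before reuse. The dictionary and function names are hypothetical.

```python
# ASSUMPTION: coefficients as we understand the three-measurement modified
# Stumvoll ISI; verify against the original publication before relying on them.
ISI_COEFFICIENTS = {
    "intercept": 0.156,
    "insulin_120": 0.0000459,  # per pmol/L of 2-hour insulin
    "insulin_0": 0.000321,     # per pmol/L of fasting insulin
    "glucose_120": 0.00541,    # per mmol/L of 2-hour glucose
}


def modified_stumvoll_isi(fasting_insulin, insulin_120, glucose_120,
                          coeffs=ISI_COEFFICIENTS):
    """Estimate insulin sensitivity from three OGTT measurements.

    fasting_insulin and insulin_120 are in pmol/L; glucose_120 is in mmol/L.
    Higher values indicate greater insulin sensitivity.
    """
    return (coeffs["intercept"]
            - coeffs["insulin_120"] * insulin_120
            - coeffs["insulin_0"] * fasting_insulin
            - coeffs["glucose_120"] * glucose_120)


# Hypothetical participant: fasting insulin 60 pmol/L,
# 2-hour insulin 300 pmol/L, 2-hour glucose 7.0 mmol/L
print(round(modified_stumvoll_isi(60, 300, 7.0), 4))
```

Because every measurement enters with a negative sign, higher insulin or glucose after the glucose load lowers the index, which matches the intuition that a person who needs more insulin to handle the same glucose load is less insulin sensitive.
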

So the researchers looked for variants associated with the Stumvoll ISI in nearly 17,000 participants in the discovery phase of the work. They added another 13,300 in the replication phase, for a total of about 30,000 in the combined meta-analysis. Since obesity, measured by body mass index (BMI), can affect insulin sensitivity, the authors added BMI to some of their statistical models.

First, the authors found associations between the ISI and variants already known to affect simple measures of insulin sensitivity. This provided reassurance that the ISI was properly detecting genetic influences on insulin sensitivity. After discovery, replication, and meta-analysis, two novel genetic variants were associated with ISI at genome-wide significance (P < 5.0 × 10⁻⁸) in a model that included the variant, age, sex, and the interaction between the variant and BMI: variant rs12454712, near the gene BCL2, and variant rs10506418, near the gene FAM19A2.
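
The 5 × 10⁻⁸ cutoff is conventionally explained as a Bonferroni correction: an overall error rate of 0.05 divided by roughly one million independent common-variant tests across the genome. That rationale is a widely used convention rather than something stated in the paper, and the P-values and variant labels below are hypothetical.

```python
ALPHA = 0.05
# Conventional approximation: about one million independent common-variant
# tests genome-wide (an assumption behind the threshold, not a study result).
N_INDEPENDENT_TESTS = 1_000_000


def genome_wide_threshold(alpha=ALPHA, n_tests=N_INDEPENDENT_TESTS):
    """Bonferroni-corrected per-variant significance threshold."""
    return alpha / n_tests


def genome_wide_significant(p_values, threshold=None):
    """Return the variant labels whose P-values fall below the threshold."""
    if threshold is None:
        threshold = genome_wide_threshold()
    return [vid for vid, p in p_values.items() if p < threshold]


# Hypothetical P-values for illustration only (not results from the study)
example = {"variant_a": 3.0e-9, "variant_b": 2.0e-7, "variant_c": 4.9e-8}
print(genome_wide_threshold())
print(genome_wide_significant(example))
```
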

How might these variants affect insulin sensitivity? There’s a lot more work to be done before that question can be answered. Additional studies will need to clarify whether these variants, which are near BCL2 and FAM19A2, affect these or other genes, and then how these variants actually cause changes in insulin sensitivity. 

There are some clues already in the published literature. The variant rs12454712 near BCL2 has previously been found to be associated with T2D, supporting the hypothesis that this region of the genome contributes to T2D risk through reducing insulin sensitivity. And the gene itself (BCL2) has already been implicated in glycemic metabolism: inhibiting bcl2 improves glucose tolerance in a mouse model, while a drug that inhibits the protein product of the gene (BCL2) increases blood glucose levels in certain chronic lymphocytic leukemia patients. So there’s even more reason to suspect that the rs12454712 variant might affect insulin sensitivity via BCL2.

There is as yet no evidence linking the function of the FAM19A2 protein to glucose metabolism, so the jury is out on whether the variant rs10506418 affects FAM19A2 or some other nearby gene.

By focusing on a detail of the T2D-related genetic landscape, this study has teased out two variants that may give us clues about the physiology of insulin sensitivity and the development of T2D. And that’s a valuable addition to our overall picture of T2D genetics!

Monday, July 11, 2016

World-wide cooperation to address a world-wide problem

If you’re reading this post, you’re likely well aware that type 2 diabetes (T2D) is one of the biggest health problems we face and that its incidence is rising. Clearly, we need a better understanding of how T2D develops and what the risk factors are, along with more effective treatments.

Along with environmental and behavioral factors, variation in the human genome plays an important role in susceptibility to T2D. Mutations that alter gene expression or affect the function of proteins and noncoding RNAs can lead to differences in physiology and, ultimately, to differences in T2D risk. To begin to understand this, we first need to know which variants contribute to T2D and by how much. And for that, we need genetic association data—lots of it. Large amounts of data allow us to refine the genetic association map: reconfirming some previous signals, establishing that others are not significant, and adding evidence for or against the causal roles of variants.

Addressing this need, a study published today in Nature (Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton et al.) presents the results of an international collaboration that has generated an unprecedented amount of T2D genetic data. As befits an approach to a huge problem, everything about this study is huge: the number of collaborators (more than 300, from 22 countries), the number of individual genomes sampled (120,000), the number of variants analyzed (tens of millions), and the number of funding organizations (more than 60). The result is the most comprehensive look at the genetics of T2D available to date.

One of the major projects described in the paper, led by the Genetics of Type 2 Diabetes (GoT2D) Consortium, was whole-genome sequencing for 2,657 people, half T2D cases and half controls. Whole-genome sequence analysis is the only way in which the influence of rare variants can be assessed comprehensively.

An open question in the T2D genetics community has been whether most T2D risk is due to rare variants or to many common variants of small effect. This study begins to answer this question. It shows that most T2D risk can be ascribed to the modest effects of a large number of common alleles, and that there is likely no treasure trove of rare variants of large effect waiting to be found.

This project uncovered more than a dozen loci that were associated with T2D at genome-wide significance. Most were common variants, and some, such as the variant rs11759026 near CENPW, had not been seen before in genome-wide association studies. This study also called into question the previously identified associations of some variants and supplied better candidates for the actual T2D risk variant. For example, the noncoding variant rs10401969 had been associated with the CILP2 locus, but the additional data from this project now point to a linked missense variant in TM6SF2 as causal. This is an exciting finding, since TM6SF2 is involved in fat metabolism and could have a direct role in the development of T2D.

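
Reassignments like the TM6SF2 example rest on linkage disequilibrium (LD): a noncoding variant can show association simply because it tends to be inherited together with a nearby causal variant. Here is a minimal sketch of one common way to quantify that correlation, r² computed as the squared Pearson correlation of genotype dosages, which approximates haplotype-based r² when phase is unknown. The Portal's own analyses may compute LD differently (for example, from a reference panel); the function name and genotypes are hypothetical.

```python
def ld_r2(dosages_a, dosages_b):
    """Squared Pearson correlation of genotype dosages at two variants.

    Each argument lists alternate allele counts (0, 1, 2) for the same
    samples, in the same order.
    """
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two equal-length dosage lists with at least 2 samples")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return cov * cov / (var_a * var_b)


# Hypothetical genotypes for eight people at a noncoding "tag" variant and a
# nearby coding variant that is usually, but not always, inherited with it
tag = [0, 1, 1, 2, 0, 1, 2, 0]
coding = [0, 1, 1, 2, 0, 0, 2, 0]
print(round(ld_r2(tag, coding), 3))
```

A high r² means the two variants are hard to tell apart statistically, which is why additional data or functional evidence is needed to decide which one is causal.
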
In another project reported by Fuchsberger and colleagues, combining exome sequence data from the T2D-GENES (Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples) Consortium with the exome sequences obtained by the GoT2D project resulted in a data set of sequences from nearly 13,000 individuals from five different ethnic groups. Data sets stratified by ancestry allow investigation of population-specific associations that might otherwise be obscured. The larger sample size and the focus on coding variation, which presumably has larger effects on protein function, offered another way to maximize discovery of rare variants if any were present. Another benefit was to help implicate specific genes in previously associated genomic regions.

One variant identified by this approach has an immediately understandable relationship to diabetes: the rs2233580 variant causes a missense change in the PAX4 gene, which encodes a transcription factor that has been implicated in pancreatic islet differentiation. Interestingly, this variant is common in East Asian populations but nearly absent in the other ancestries studied. Other variants in the same gene have previously been associated with early-onset monogenic diabetes, so this result is a reminder that different mutations in the same gene can have very different effects on the disease process. Other work in this study reaffirmed this conclusion for other genes.

The scale of this study is unprecedented, and we’ve only touched upon a small piece of it here. But something else is unprecedented about these data: they are available for anyone to explore, right now, in the T2D Knowledge Portal. Researchers don’t need to go to various sites to gather bits and pieces of the data, harmonize them, and analyze them; the data sets are globally accessible in the Portal along with pre-computed analyses and sophisticated tools for custom analyses.

The data sets from this study in the Portal are:

  • GoT2D WGS - whole-genome sequence data
  • GoT2D WGS + replication – whole-genome sequence data plus imputed genotypes
  • 13K exome sequence analysis
  • 82K exome chip analysis

All of these are described in more detail on our Data page. You can see a list of the cohorts and even view their case/control selection criteria. Our Variant Finder tool may be applied to all of these sets, and the Genetic Association Interactive Tool (GAIT) accesses the 17K exome sequence analysis data set that includes the 13K exome sequence analysis data from this study along with additional data from the SIGMA Consortium, previously published by Estrada et al. in JAMA. You’ll also see results from these data sets in various tables and displays on the Gene and Variant pages of the Portal.

In a review article that was also published today in Nature Reviews Genetics, Flannick and Florez advocate for the aggregation of genetic data in general, and the T2D Knowledge Portal in particular, as a way to democratize the study of T2D and accelerate discoveries that will improve patient care.

“Data from human genetics is highly valuable in identifying and validating the role of specific targets for development of new medicines,” said David Altshuler, who was previously the principal investigator at Broad for the T2D genetics studies and the Portal, and is now Chief Scientific Officer at Vertex Pharmaceuticals. “When government, non-profits and companies work together with patients to increase our knowledge of the genetic causes of disease, everyone benefits.”

The Accelerating Medicines Partnership in Type 2 Diabetes funds the T2D Knowledge Portal as a means to facilitate collaboration, with the goal of benefitting patients with T2D world-wide. “Whether you are a biologist exploring a specific pathway in a model system, a pharmaceutical investigator examining an appealing drug target, or a clinician pondering whether a newly identified variant is the cause of a patient’s symptoms, having well curated human genetic data matched to carefully defined phenotypes at your fingertips should provide rapid insight and accelerate discovery,” said Jose Florez, the Chief of the Diabetes Unit at the Massachusetts General Hospital and a human geneticist at the Broad Institute, who leads one of the groups developing the Knowledge Portal.

The deposition of the huge data sets from the Fuchsberger et al. study into the Portal has demonstrated that the processes in place for data intake, harmonization, and quality control are functional and can work at scale. We hope that other researchers and consortia will follow suit and help to make the Portal an even more powerful catalyst for new insights into T2D.

Monday, June 20, 2016

Report from New Orleans: 76th Scientific Sessions of the American Diabetes Association

Members of the T2D Knowledge Portal team braved extreme heat and humidity, as well as icy air conditioning, to attend the American Diabetes Association conference in New Orleans, LA. Our booth in the conference exhibit hall was a great way to interact personally with conference attendees and showcase the Portal. Many genetics researchers stopped by for one-on-one tutorials on our new tools and features. And clinicians and diabetes patients, even if they had no immediate use for genetic information, were happy to hear the goals of the project—to accelerate the identification of genes involved in T2D and, ultimately, to find new treatments and better understand the disease mechanism. 

We were pleased to welcome some special visitors to our booth: National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Director Griffin Rodgers and Deputy Division Director Philip Smith. NIDDK is a major supporter of the T2D Knowledge Portal project.

Drs. Philip Smith (left) and Griffin Rodgers visit the Portal booth

Dr. Smith also made a video statement as part of the media coverage at ADA, eloquently explaining the rationale behind the Portal and the needs that it can address.

If you missed us at ADA, come visit us at our booth at the American Society of Human Genetics meeting next October! And if you can’t meet us in person, please feel free to email us at any time. We’re happy to answer questions or provide help in understanding the Portal data and tools.

Type 2 Diabetes Knowledge Portal team at ADA

Friday, June 10, 2016

Come meet the Portal team at ADA, booth #1762!

Today’s news comes to you from the Big Easy—New Orleans, LA, where the 76th Scientific Sessions of the American Diabetes Association are in full swing this weekend. Members of the Knowledge Portal team have traveled here to talk to researchers about how the Portal can become even more useful in helping to generate hypotheses that spark insights into the mechanism of T2D and the development of new therapies. Starting at 10am on Saturday, June 11, we’ll be at booth #1762 in the exhibit hall, ready to hear your suggestions and give you an individual tutorial on the Portal’s tools and features. There just might be a gift waiting for you, too!

We’ve been working hard and we have an incredible number of new features to show off at #2016ADA. We’ll be featuring them individually in this space in the coming weeks, with in-depth explanation of each. To list some of the highlights:

  • a collaborative project between software engineers at the University of Michigan and the Broad Institute has come to fruition with the integration of LocusZoom into the Portal. This interactive visualization looks, superficially, like a Manhattan plot—but it’s so much more. It shows the significance of variant associations with any of several phenotypes and also displays linkage disequilibrium among nearby variants, and you can choose to do conditional analysis based on any variant.
  • engineers at the Broad Institute have developed a completely new tool, called Genetic Association Interactive Tool (GAIT), that offers a multitude of options allowing you to compute custom association statistics for a variant. You can specify the phenotype to test for association, stratify samples by ancestry, choose a subset of samples to analyze based on specific phenotypic criteria, and control for specific covariates. 
  • we’ve also redesigned and augmented many of the displays of pre-computed information that are available in the Portal.
  • finally, we’ve added a lot of new, informative content: a Data page with a complete description of each data set in the Portal, more background about the AMP-T2D project that supports the Portal, and more help text to guide you as you use the Portal’s interfaces.
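LocusZoom conveys linkage disequilibrium by coloring nearby variants according to their r² with a chosen reference variant. As a rough illustration of what that number means, here is a minimal sketch of the standard r² calculation for two biallelic variants from phased haplotypes. The function name and input format are hypothetical; this is not LocusZoom’s code or the Portal’s source of LD values.

```python
def ld_r_squared(haplotypes):
    """Compute r^2 between two biallelic variants A and B.

    Each haplotype is a pair (a, b): 1 means the alternate allele is present
    at that variant on this haplotype, 0 means the reference allele.
    (Hypothetical input format, for illustration only.)
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # alt-allele frequency at A
    p_b = sum(b for _, b in haplotypes) / n          # alt-allele frequency at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either variant is monomorphic")
    return d * d / denom


# Alleles always travel together: perfect LD.
print(ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)]))  # 1.0
# Alleles combine independently: no LD.
print(ld_r_squared([(1, 0), (0, 1), (1, 1), (0, 0)]))  # 0.0
```

An r² near 1 means the two variants are almost always inherited together, which is why a strongly associated variant tends to "drag along" its neighbors in the plot and why conditional analysis is useful for telling them apart.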

Come to the booth and let us give you a tour of these new features. If you're not at ADA, take a look and let us know what you think. And check out this great press release from NIH about the project!

Wednesday, May 18, 2016

Expanding the landscape of human genetic variation data in the Type 2 Diabetes Knowledge Portal

With the addition of four new sequence data sets to our database, the number of variants and associations accessible via the Portal pages and tools has increased by millions.

Two of the new data sets are from projects that have obtained sequence data from a wide range of individuals. The ExAC data set, comprising exome sequences collected and harmonized by the Exome Aggregation Consortium, includes sequence data from 60,706 unrelated people of multiple ancestries. The 1000 Genomes data set, from the International Genome Sample Resource project (IGSR), is composed of whole-genome sequences from 2,504 individuals in four different ethnic groups. 

The allele frequencies of variants in the different ethnic groups surveyed in the 1000 Genomes data set can be seen in the “How common is…?” section on the Variant pages (view an example). And both the ExAC and 1000 Genomes data sets can be queried using the Variant Finder tool. You can select them via a new tab on the interface, “Additional search options”, where you can choose these data sets and also add more criteria to your search. 

The Data set pull-down menu on the "Additional Search Options" tab of the Variant Finder lets you specify 1000 Genomes or ExAC data.

Available selections in the Data set pull-down menu.

The other two new data sets in the Portal were both generated by the GoT2D consortium. A whole-genome sequence data set (GoT2D WGS) adds data from 2,657 individuals, including the associations of noncoding variants that were not present in the previous whole-exome sequence data set from the GoT2D project. This new data set brings T2D association data across 30 million variants to the Portal. The GoT2D WGS + replication data set extends that set with imputed genotypes from additional replication samples, bringing the sample size to over 47,000 and covering most low-frequency and common variants.

The new GoT2D data can be seen in multiple sections of the Portal’s Gene and Variant pages, and may also be accessed by selecting these data sets in the Variant Finder.

In addition to these major new data sets, today’s release also includes some bug fixes and further data harmonization.

Get out there and explore the new data landscape in the Portal, and let us know what you think!

Monday, May 9, 2016

Better summaries of variant information convey the most important information at a glance

We’ve made significant improvements to the information we display on the Variant pages of the T2D Knowledge Portal. The summary at the top of each Variant page (view an example) now shows the reference nucleotide and the variant nucleotide at that position. Transcripts covering the variant are listed, along with several important details for each transcript: the change caused by the variant in the encoded protein sequence (if applicable); the Sequence Ontology term describing the consequence of the variation (for example, “missense variant”); and the expected effect of the variant on protein function, as predicted by the PolyPhen and SIFT algorithms.
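To make the structure of that summary concrete, here is a minimal sketch of one way the per-variant and per-transcript details could be modeled. All class and field names, the placeholder transcript IDs, and the text layout are hypothetical and do not describe the Portal’s actual schema; the prediction labels shown are examples of PolyPhen and SIFT categories.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TranscriptAnnotation:
    transcript_id: str              # hypothetical placeholder IDs below
    protein_change: Optional[str]   # None when the variant is outside coding sequence
    consequence: str                # Sequence Ontology term, e.g. "missense_variant"
    polyphen: Optional[str] = None  # e.g. "probably_damaging"
    sift: Optional[str] = None      # e.g. "deleterious"


@dataclass
class VariantSummary:
    chrom: str
    position: int
    ref: str                        # reference nucleotide
    alt: str                        # variant nucleotide
    transcripts: List[TranscriptAnnotation] = field(default_factory=list)

    def summary_lines(self):
        """Render a plain-text summary: one header line, one line per transcript."""
        lines = [f"{self.chrom}:{self.position} {self.ref}>{self.alt}"]
        for t in self.transcripts:
            parts = [t.transcript_id, t.consequence]
            if t.protein_change:
                parts.append(t.protein_change)
            if t.polyphen:
                parts.append(f"PolyPhen: {t.polyphen}")
            if t.sift:
                parts.append(f"SIFT: {t.sift}")
            lines.append("  " + ", ".join(parts))
        return lines


example = VariantSummary("chr1", 1000, "C", "T", [
    TranscriptAnnotation("TX1", "p.Arg10Trp", "missense_variant",
                         "probably_damaging", "deleterious"),
    TranscriptAnnotation("TX2", None, "intron_variant"),
])
print("\n".join(example.summary_lines()))
```

Fields that don’t apply (such as a protein change for an intronic variant) are simply omitted, which matches the “if applicable” wording above.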

Summary section of the Variant page

Just below the summary on the Variant page, we’ve also improved the graphic showing the association of the variant with T2D and related traits. We’ve re-named this section “associations at a glance” because it immediately shows the most important information about these associations. 

At-a-glance section of the Variant page.

The boxes in this graphic represent the associations of this variant with T2D (at the top) and with other traits (below, in an expandable section). Under the hood, the software is now pulling up information more quickly so that the display is more responsive. We’ve also made it more pleasant to look at, tidying up the shape of the boxes and the alignment of the information they contain.

But beyond the style improvements, we’ve added a lot of substance. Where available, each association now includes the odds ratio (for dichotomous traits) or the effect size (for continuous traits) and the direction of effect. Positive effects are shown in blue, and negative effects in purple. 
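Note that "no effect" sits at a different value for the two kinds of statistic: an odds ratio of 1 means no association, while an effect size of 0 means no association. The sketch below shows how the direction of effect could be classified under that convention. The function name, the trait-type strings, and the color dictionary are illustrative assumptions, not the Portal’s code.

```python
def effect_direction(value, trait_type):
    """Classify the direction of an association's effect.

    For dichotomous traits, `value` is an odds ratio (no effect at 1.0).
    For continuous traits, `value` is an effect size (no effect at 0.0).
    """
    if trait_type == "dichotomous":
        null = 1.0
    elif trait_type == "continuous":
        null = 0.0
    else:
        raise ValueError(f"unknown trait type: {trait_type!r}")
    if value > null:
        return "positive"
    if value < null:
        return "negative"
    return "none"


# Colors as described in the text; the mapping itself is illustrative.
DIRECTION_COLORS = {"positive": "blue", "negative": "purple"}

print(effect_direction(0.8, "dichotomous"))   # negative (OR below 1)
print(effect_direction(0.12, "continuous"))   # positive
```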

We’ve also added the sample size for each data set, in black text in the bottom-left corner of the box; this is the total number of individuals in the study. Where available, the frequency and count of the variant in the data set are shown in red and blue text in the bottom-middle and bottom-right corners of the box, respectively. The count is the number of haplotypes (individual copies of the chromosome) in the data set that carry the variant allele, and the frequency is the fraction of all sampled haplotypes that carry it.
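Because each person carries two copies of every autosome, a study of N individuals samples 2N haplotypes at an autosomal site. Here is a minimal sketch of the relationship between count and frequency under that assumption. It ignores complications such as missing genotypes or sex chromosomes, and the function name is hypothetical.

```python
def allele_frequency(allele_count, sample_size):
    """Frequency of a variant allele at a diploid, autosomal site.

    Assumes every individual was genotyped, so the number of sampled
    haplotypes is exactly 2 * sample_size.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    n_haplotypes = 2 * sample_size
    if not 0 <= allele_count <= n_haplotypes:
        raise ValueError("allele_count must be between 0 and 2 * sample_size")
    return allele_count / n_haplotypes


# 150 copies of the variant among 1,000 individuals (2,000 haplotypes)
print(allele_frequency(150, 1000))  # 0.075
```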

This additional information can help you evaluate the significance of associations. The sample size and variant count determine the power of the data set to detect an association: the more individuals and the more copies of the variant a data set contains, the more precisely it can estimate the variant’s effect.
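One way to see why counts matter is Woolf’s standard error for a log odds ratio computed from a 2×2 table of allele counts. The error is the square root of the sum of the reciprocals of the four counts, so it shrinks as the counts grow. The sketch below assumes a simple allelic case/control table. The Portal’s association statistics come from the contributing studies’ own analyses, which may use different models and covariates, so this is an illustration of the principle only.

```python
import math


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) ~95% confidence interval.

    Arguments are allele counts in cases and controls. The standard error of
    log(OR) is sqrt(1/a + 1/b + 1/c + 1/d), so larger counts give a narrower
    interval around the same estimate.
    """
    counts = (case_alt, case_ref, ctrl_alt, ctrl_ref)
    if min(counts) <= 0:
        raise ValueError("all counts must be positive")
    log_or = math.log((case_alt * ctrl_ref) / (case_ref * ctrl_alt))
    se = math.sqrt(sum(1 / c for c in counts))
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Same odds ratio (about 1.71), but the second table has ten times the counts.
small = odds_ratio_ci(60, 140, 40, 160)
large = odds_ratio_ci(600, 1400, 400, 1600)
print(small)  # wide interval
print(large)  # much narrower interval around the same estimate
```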

Finally, when a variant is associated with other traits in addition to T2D, those traits in the same category are labeled with the same color. For example, in the display above, proinsulin levels, fasting glucose, HOMA-B, and two-hour glucose—all glycemic phenotypes—are labeled in orange, while triglycerides, LDL cholesterol, and cholesterol—lipid phenotypes—are labeled in red. This lets you see easily when a variant is linked to multiple traits that could reflect a common process or pathway, possibly offering a clue to the mechanism by which it affects physiology.
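The color labeling amounts to looking up each associated trait’s category and giving all traits in a category the same color. Here is a minimal sketch of that grouping. The category and color tables below are hypothetical and contain only the examples mentioned above, not the Portal’s full trait list.

```python
from collections import defaultdict

# Hypothetical lookup tables, populated only with the examples from the text.
TRAIT_CATEGORY = {
    "proinsulin levels": "glycemic",
    "fasting glucose": "glycemic",
    "HOMA-B": "glycemic",
    "two-hour glucose": "glycemic",
    "triglycerides": "lipid",
    "LDL cholesterol": "lipid",
    "cholesterol": "lipid",
}
CATEGORY_COLOR = {"glycemic": "orange", "lipid": "red"}


def group_by_category(associated_traits):
    """Group a variant's associated traits by category ("other" if unknown)."""
    groups = defaultdict(list)
    for trait in associated_traits:
        groups[TRAIT_CATEGORY.get(trait, "other")].append(trait)
    return dict(groups)


def trait_colors(associated_traits):
    """Map each trait to its category's color (None if uncategorized)."""
    return {t: CATEGORY_COLOR.get(TRAIT_CATEGORY.get(t)) for t in associated_traits}


print(group_by_category(["fasting glucose", "HOMA-B", "triglycerides"]))
```

A category holding several traits for the same variant is the visual cue described above: a hint that the variant may act through a shared process or pathway.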

So this improved graphic now gives you an idea, literally at a single glance, of how strongly a variant is associated with T2D, how significant that association is, and whether it is also associated with other traits. 

We made these improvements in response to suggestions from scientists who use the T2D Knowledge Portal. We hope to hear your feedback too!

Friday, May 6, 2016

T2D Knowledge Portal in the news

The poster that we presented at the Biocuration 2016 conference was selected by F1000Research as the featured poster or slide of the month! As an organization promoting open access to publications and data, they were particularly interested in the challenge we face at the Portal in designing tools that allow researchers to gain valuable insights from the data while still protecting confidential patient information. Read their take on it in their blog post.

Thursday, April 28, 2016

Variant Finder results may be saved, shared, and bookmarked

You may have noticed that our Variant Finder tool has a cleaner look and clearer instructions. But did you know that you can also save your search parameters, to re-create your search later or share it with a colleague?

First, construct your search. Here’s an example:


After you click “Submit search request” you’ll be taken to the results page:


And here’s the URL of the results page for this example search:

It isn’t pretty, but it encodes the search. You can bookmark it, save it, or email it and you’ll get back the same result next time you enter it in a browser.
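The general idea is that the search filters are serialized into the URL’s query string, so the URL alone is enough to rebuild the search. Here is a minimal sketch of such a round trip using Python’s standard `urllib.parse`. The base URL, parameter names, and example values are hypothetical and are not the Portal’s actual URL scheme.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL, for illustration only.
BASE_URL = "https://portal.example.org/variantSearch/results"


def encode_search(filters):
    """Serialize search filters (name -> list of values) into a bookmarkable URL."""
    return BASE_URL + "?" + urlencode(filters, doseq=True)


def decode_search(url):
    """Recover the filters from a URL produced by encode_search."""
    return parse_qs(urlsplit(url).query)


# Hypothetical filter names and values.
filters = {"gene": ["TCF7L2"], "dataset": ["GoT2D WGS"], "max_pvalue": ["5e-8"]}
url = encode_search(filters)
print(url)
print(decode_search(url) == filters)  # True
```

Only what is placed in the query string survives the round trip. This mirrors the caveat below: changes made to the results table after the search runs are not part of the encoded parameters.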

There’s one small caveat here. On the results page, you can modify the results table by clicking on the + signs in the table header to see options for adding more data to the table. But if you do this, those changes will not be encoded in the URL (we plan to enable this in the future); only the original search is encoded.

Let us know how you like this feature and what other features might be useful to you. And check out our mini-tutorial on the Variant Finder to see full instructions on how to use this tool.