Wednesday, May 18, 2016

Expanding the landscape of human genetic variation data in the Type 2 Diabetes Knowledge Portal

With the addition of four new sequence data sets to our database, the number of variants and associations accessible via the Portal pages and tools has increased by millions.

Two of the new data sets are from projects that have obtained sequence data from a wide range of individuals. The ExAC data set, comprising exome sequences collected and harmonized by the Exome Aggregation Consortium, includes sequence data from 60,706 unrelated people of multiple ancestries. The 1000 Genomes data set, from the International Genome Sample Resource project (IGSR), is composed of whole-genome sequences from 2,504 individuals in four different ethnic groups. 


The allele frequencies of variants in the different ethnic groups surveyed in the 1000 Genomes data set can be seen in the “How common is…?” section on the Variant pages (view an example). Both the ExAC and 1000 Genomes data sets can also be queried using the Variant Finder tool: a new tab on the interface, “Additional search options”, lets you choose these data sets and add more criteria to your search.

The Data set pull-down menu on the "Additional Search Options" tab of the Variant Finder lets you specify 1000 Genomes or ExAC data.

Available selections in the Data set pull-down menu.


The other two new data sets in the Portal were both generated by the GoT2D consortium. A whole-genome sequence data set (GoT2D WGS) adds data from 2,657 individuals, including the associations of noncoding variants that were not present in the previous whole-exome sequence data set from the GoT2D project. This new data set brings T2D association data across 30 million variants to the Portal. The GoT2D WGS + replication data set adds imputation to that set, bringing the sample size to over 47,000 and including most low-frequency and common variants.  

The new GoT2D data can be seen in multiple sections of the Portal’s Gene and Variant pages, and may also be accessed by selecting these data sets in the Variant Finder.

Beyond these major additions, today’s data release also includes some bug fixes and data harmonization.

Get out there and explore the new data landscape in the Portal, and let us know what you think!

Monday, May 9, 2016

Better variant summaries convey the most important information at a glance

We’ve made significant improvements to the information we display on the Variant pages of the T2D Knowledge Portal. The summary at the top of each Variant page (view an example) now shows the reference nucleotide and the variant nucleotide at that position. Transcripts covering the variant are listed, along with several important details for each transcript: the change caused by the variant in the encoded protein sequence (if applicable); the Sequence Ontology term describing the consequence of the variation (for example, “missense variant”); and the expected effect of the variant on protein function, as predicted by the PolyPhen and SIFT algorithms.


Summary section of the Variant page

Just below the summary on the Variant page, we’ve also improved the graphic showing the association of the variant with T2D and related traits. We’ve re-named this section “associations at a glance” because it immediately shows the most important information about these associations. 


At-a-glance section of the Variant page.


The boxes in this graphic represent the associations of this variant with T2D (at the top) and with other traits (below, in an expandable section). Under the hood, the software is now pulling up information more quickly so that the display is more responsive. We’ve also made it more pleasant to look at, tidying up the shape of the boxes and the alignment of the information they contain.

But beyond the style improvements, we’ve added a lot of substance. Where available, each association now includes the odds ratio (for dichotomous traits) or the effect size (for continuous traits) and the direction of effect. Positive effects are shown in blue, and negative effects in purple. 

We’ve also added the sample size, in black text in the bottom left corner of the box, for each data set. This indicates the total number of individuals involved in the study. And if available, the frequency and count of the variant in the data set are shown in red and blue text at the bottom middle and bottom right corner of the box, respectively. The count indicates the number of haplotypes in the set that contain the variant, while the frequency indicates the occurrence of the variant allele in the sampled population.

This additional information can help you evaluate the significance of associations. The sample size and variant count determine the power of the data set to detect the association. The higher the power, the more reliable the detection and the more precise the estimate of the variant’s effect.
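As a rough illustration of how the sample size, count, and frequency relate: if every sampled individual contributes two haplotypes at an autosomal position and no genotypes are missing, the frequency is approximately the count divided by twice the sample size. The sketch below rests on exactly those assumptions. The function name and the example numbers are made up for illustration and are not taken from the Portal.

```python
def allele_frequency(variant_count: int, sample_size: int) -> float:
    """Estimate variant allele frequency from a haplotype count.

    Assumes an autosomal variant and two haplotypes per individual,
    with no missing genotypes (an illustrative simplification).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    haplotypes = 2 * sample_size
    if not 0 <= variant_count <= haplotypes:
        raise ValueError("variant_count must be between 0 and 2 * sample_size")
    return variant_count / haplotypes


# Hypothetical numbers: 150 variant haplotypes among 5,000 individuals.
print(allele_frequency(150, 5000))
```

A data set with a larger sample size can hold many copies of a variant even at low frequency, which is one reason the sample size and count are worth reading together.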

Finally, when a variant is associated with other traits in addition to T2D, those traits in the same category are labeled with the same color. For example, in the display above, proinsulin levels, fasting glucose, HOMA-B, and two-hour glucose—all glycemic phenotypes—are labeled in orange, while triglycerides, LDL cholesterol, and cholesterol—lipid phenotypes—are labeled in red. This lets you see easily when a variant is linked to multiple traits that could reflect a common process or pathway, possibly offering a clue to the mechanism by which it affects physiology.

So this improved graphic now gives you an idea, literally at a single glance, of how strongly a variant is associated with T2D, how significant that association is, and whether it is also associated with other traits. 

We made these improvements in response to suggestions from scientists who use the T2D Knowledge Portal. We hope to hear your feedback too!

Friday, May 6, 2016

T2D Knowledge Portal in the news

The poster that we presented at the Biocuration 2016 conference was selected by F1000Research as the featured poster or slide of the month! As an organization promoting open access to publications and data, they were particularly interested in the challenge we face at the Portal in designing tools that allow researchers to gain valuable insights from the data while still protecting confidential patient information. Read their take on it in their blog post.


Thursday, April 28, 2016

Variant Finder results may be saved, shared, and bookmarked

You may have noticed that our Variant Finder tool has a cleaner look and clearer instructions. But did you know that you can also save your search parameters, to re-create your search later or share it with a colleague?

First, construct your search. Here’s an example:


After you click “Submit search request” you’ll be taken to the results page:



And here’s the URL of the results page for this example search:


It isn’t pretty, but it encodes the search. You can bookmark it, save it, or email it and you’ll get back the same result next time you enter it in a browser.
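To illustrate the general idea of a URL that encodes a search, here is a minimal sketch using Python’s standard library. The base URL and the parameter names are hypothetical and do not reflect the Portal’s actual URL scheme. The point is only that search criteria stored in a query string can be recovered exactly when the URL is opened again.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address; the Portal's real results URL is different.
BASE_URL = "https://portal.example.org/variantSearch/results"


def build_search_url(criteria: dict) -> str:
    """Encode search criteria into the query string of a URL."""
    return f"{BASE_URL}?{urlencode(criteria)}"


def read_search_url(url: str) -> dict:
    """Recover the search criteria from a previously saved URL."""
    query = urlsplit(url).query
    # parse_qs returns a list per key; each key appears once here.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Hypothetical criteria names, chosen only for this example.
criteria = {"phenotype": "T2D", "dataset": "GoT2D WGS", "pvalue_lt": "5e-8"}
url = build_search_url(criteria)
print(url)
print(read_search_url(url) == criteria)
```

Because the whole search lives in the address itself, nothing needs to be stored on a server for a bookmark or an emailed link to work.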

There’s one small caveat here. On the results page, you can modify the results table by clicking on the + signs in the table header to see options for adding more data to the table. But if you do this, those changes will not be encoded in the URL (we plan to enable this in the future); only the original search is encoded.

Let us know how you like this feature and what other features might be useful to you. And check out our mini-tutorial on the Variant Finder to see full instructions on how to use this tool. 

Thursday, April 21, 2016

Type 2 Diabetes Knowledge Portal represented at Biocuration 2016 conference

Last week, the International Society for Biocuration held its 9th annual conference in Geneva, Switzerland. You might ask, “What is biocuration, anyway?” In a nutshell, it’s all about organizing biological data and making it accessible and understandable. It can be as small-scale as capturing the fine details about the function and role of a particular protein, or as large-scale as designing interfaces to analyze and explore genomes or huge genomic data sets. (See this article if you’re interested in the nitty-gritty details about what it’s like to be a biocurator.) The conference covered major topics in biocuration such as the visualization and integration of data, controlled vocabularies and ontologies, functional annotation, community curation, text mining, and more.

Our Manager of Content and Community attended the conference, since many of these issues are relevant to the T2D Knowledge Portal. As we tackle a relatively new challenge in biocuration—the integration of human genetic association data sets—it’s important for our project to be part of the biocuration community, to get feedback and become aware of others’ work in this area. And as we consider adding more biological information about human genes to the Portal, it’s important that whatever we do is consistent with ongoing efforts in the biocuration of human genes; we don’t want to reinvent the wheel or duplicate work.

Attending a fascinating and energizing conference in a beautiful setting was reward enough, but the icing on the cake was that our poster on the Portal received one of five “Best poster” awards! We’re honored and pleased that our project had such a warm welcome into the biocuration community.



Friday, April 1, 2016

It’s no April Fool’s joke: we’ve rolled out big improvements to the T2D Knowledge Portal today

The first thing you’ll see when you visit our home page is that it has a fresh new look. We’ve refined our mission statement and clarified the other text, and added a “What’s new” section featuring our latest news items. There are also new links to sign up for our email list (more on that below) and to see our Twitter feed.



Another major change is that we’ve redesigned the interface to our Variant Finder tool to make it much more user-friendly. We even gave it a shorter name that’s easier to remember! This tool lets you build simple or complex queries to retrieve sets of variants that meet your custom criteria. You can specify association with any of 25 phenotypes, significance, genomic location, effect on the encoded protein, and much more. For some extra help with this tool, we’ve created a tutorial (download PDF) that leads you step by step through the interface.


Finally, we’re reaching out to you, the Portal user, in a variety of ways. If you sign up for email updates, we’ll notify you when new features and new data are added. You can also follow us on Twitter, and join our LinkedIn group where you can ask questions about the Portal or suggest new features. And as always, you can contact us anytime at help@type2diabetesgenetics.org - no fooling!

Tuesday, March 29, 2016

New graphics and table summarize variant associations at a glance

Our variant information pages now contain two new sections that make it easy to see quickly whether a variant is associated with type 2 diabetes or related traits, and just how significant those associations are.

At the top of each Variant page, the section titled “Is (variant name) associated with disease?” opens to show the associations of that variant with T2D in all data sets that are currently available via the Portal (view an example). Click the link “expand associations for all traits” to see significant associations with other T2D-related traits.

Each box represents an association between this variant and a trait as detected in one data set, and the color of the box indicates the significance of the association. Dark green shows genome-wide significance (p-value < 5 x 10^-8); medium green shows locus-wide significance (p-value < 5 x 10^-4); and light green denotes nominal significance (p-value < 0.05). Associations that do not reach nominal significance are shown in a white box.
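The color scheme can be sketched as a simple threshold check. This is an illustration only, not the Portal’s code: it assumes the conventional genome-wide threshold of 5e-8 and a locus-wide threshold of 5e-4, and the function and constant names are hypothetical.

```python
# Thresholds assumed for this sketch; names are hypothetical.
GENOME_WIDE = 5e-8
LOCUS_WIDE = 5e-4
NOMINAL = 0.05


def significance_color(p_value: float) -> str:
    """Map an association p-value to the box color described in the post."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < GENOME_WIDE:
        return "dark green"    # genome-wide significance
    if p_value < LOCUS_WIDE:
        return "medium green"  # locus-wide significance
    if p_value < NOMINAL:
        return "light green"   # nominal significance
    return "white"             # not significant


# Made-up p-values, one per category.
for p in (3e-12, 2e-5, 0.01, 0.4):
    print(p, significance_color(p))
```

The checks run from the strictest threshold to the loosest, so each p-value falls into the most significant category it qualifies for.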


These new graphics make it easy to see quickly that the variant rs13266634 is strongly linked to T2D, fasting glucose levels, and proinsulin levels.

Just below this section, the “Association statistics across traits” table gives complete details about the associations between the variant and multiple traits. The same shades of green show the most significant associations.


In this table with more details about the associations of this variant, the consistent color scheme highlights significance levels.

Information shown in this table for the variant-trait associations may include p-value, direction of effect, odds ratio, minor allele frequency, and effect size. The table can be sorted by trait name. Where a variant-trait association was detected in more than one study, the most significant result is shown; plus signs allow you to expand the table and view results from additional studies. Some data sets can also be expanded to show associations in different ancestries or cohorts.
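The rule of showing the most significant result per trait can be sketched as follows. This is a minimal illustration, not the Portal’s implementation: the record type, the study names, and the p-values are all invented for the example.

```python
from typing import Iterable, List, NamedTuple


class Association(NamedTuple):
    """One variant-trait association from one study (hypothetical record)."""
    trait: str
    study: str
    p_value: float


def most_significant_per_trait(rows: Iterable[Association]) -> List[Association]:
    """Keep the lowest p-value per trait, sorted by trait name."""
    best = {}
    for row in rows:
        current = best.get(row.trait)
        if current is None or row.p_value < current.p_value:
            best[row.trait] = row
    return sorted(best.values(), key=lambda row: row.trait)


# Made-up data: two studies report the same trait.
rows = [
    Association("fasting glucose", "Study A", 2e-6),
    Association("T2D", "Study B", 4e-9),
    Association("T2D", "Study C", 1e-3),
]
for row in most_significant_per_trait(rows):
    print(row.trait, row.study, row.p_value)
```

The rows that are not selected are the ones an expandable table would reveal behind a plus sign.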

We’re still developing these new features, and your feedback could help us make them even better. Please explore them and let us know what you think!