Wednesday, March 15, 2017

The Portal’s interactive burden test: now more versatile than ever

Significant associations between genes and T2D or related phenotypes can provide powerful insights into disease mechanisms and possible therapies. The T2D Knowledge Portal includes results from pre-computed analyses of genetic associations for a large, and growing, number of datasets. But what if you want to do a more fine-grained analysis? You might want to test whether the disease burden for a gene differs between groups of people with specific characteristics—for example, lean people with T2D versus obese people without T2D. Or you might want to test the aggregate effect of a specific subset of variants, such as those that are likely to knock out the function of a protein of interest.

Our interactive burden test on Gene pages, powered by the Genetic Association Interactive Tool (GAIT), allows you to do all that and more. The burden test considers a gene as the unit of inquiry, including all the variants it contains in a statistical test of disease association. We described the basics of the burden test and GAIT in a recent blog post. Now, we’ve added some options for selecting variants in the interactive burden test that make this tool even more versatile.
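To make the idea of treating a gene as the unit of inquiry concrete, here is a minimal sketch of a collapsing-style burden comparison. It is not the method GAIT uses; it assumes the simplest possible approach, in which each person is flagged as a carrier if they have at least one effect allele at any selected variant, and carriers are then compared between cases and controls with an odds ratio. The `Sample` class, the function names, and the 0.5 continuity correction are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Sample:
    """Hypothetical record for one person (illustration only)."""
    is_case: bool              # True for a T2D case, False for a control
    genotypes: Dict[str, int]  # variant ID -> number of effect alleles (0, 1, or 2)


def carrier_table(samples: List[Sample], variant_ids: Iterable[str]) -> Dict[str, int]:
    """Collapse the selected variants into one carrier flag per sample
    and count the resulting 2x2 table."""
    selected = set(variant_ids)
    table = {"case_carrier": 0, "case_noncarrier": 0,
             "control_carrier": 0, "control_noncarrier": 0}
    for sample in samples:
        # A sample is a carrier if it has an effect allele at any selected variant.
        carrier = any(sample.genotypes.get(v, 0) > 0 for v in selected)
        group = "case" if sample.is_case else "control"
        status = "carrier" if carrier else "noncarrier"
        table[f"{group}_{status}"] += 1
    return table


def odds_ratio(table: Dict[str, int], correction: float = 0.5) -> float:
    """Odds ratio of being a carrier in cases versus controls.
    Adding 0.5 to every cell (an assumption here) avoids division by zero."""
    a = table["case_carrier"] + correction
    b = table["case_noncarrier"] + correction
    c = table["control_carrier"] + correction
    d = table["control_noncarrier"] + correction
    return (a * d) / (b * c)
```

A real analysis would normally use a regression framework with covariates and report a p-value. The sketch only shows how a list of selected variants turns into a single gene-level comparison.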

The variant selection step of the burden test on a Gene page is pre-populated with all of the variants present in the selected dataset that are located within the gene and its 100 kb up- and downstream flanking regions. You can create a specific subset of these by checking or un-checking individual variants. The table may be sorted by multiple criteria in order to find variants of interest: chromosomal coordinate; minor allele count; predictions of the effect allele’s impact on the encoded protein; and the protein change or type of mutation caused by the effect allele.


Section of the interactive burden test interface showing the default list of variants for the SLC30A8 gene. Options for customizing the list are located above the variant table.

The table of variants may be filtered so that the test considers only certain categories of variants, with varying predicted impacts on the encoded protein. Previously, the burden test offered filters based on an unpublished method. Now, we have replaced those filters with the set that was used in a recent major publication: The genetic architecture of type 2 diabetes, by Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton, et al.

Variant filters in the interactive burden test

All coding variants--selects variants within the coding sequence, from the dataset that was initially selected for the burden test

Protein-truncating + missense with MAF<1%--selects variants in either of these categories:
  • protein-truncating (predicted to cause a truncated protein to be generated, either by creating a premature stop codon or by causing a frameshift)
  • missense AND with a minor allele frequency (MAF) of less than 1%. The MAF limit eliminates common variants, which would not be expected to have very deleterious effects.

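Protein-truncating + possibly deleterious missense with MAF<1%--selects variants in either of these categories:
  • protein-truncating (as defined above)
  • missense, predicted to be possibly deleterious, AND with MAF of less than 1%

Protein-truncating + probably deleterious missense--selects variants in either of these categories:
  • protein-truncating (as defined above)
  • missense, predicted to be probably deleterious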

Protein-truncating only--selects variants predicted to cause a truncated protein to be generated, either by creating a premature stop codon or by causing a frameshift.
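As a rough illustration of how such filters combine categories, the sketch below expresses two of them as predicates over a variant record. It is an assumption-laden example rather than the Portal's code: the `Variant` class, the consequence labels, and the function names are hypothetical, and the only facts taken from the text are the 1% MAF limit and the rule that protein-truncating variants are kept regardless of frequency.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Variant:
    """Hypothetical variant record (illustration only)."""
    variant_id: str
    consequence: str  # assumed labels: "protein_truncating", "missense", "synonymous"
    maf: float        # minor allele frequency, as a fraction (0.01 means 1%)


MAF_LIMIT = 0.01  # the "MAF<1%" threshold named in the filter descriptions


def protein_truncating_only(v: Variant) -> bool:
    """Keep variants predicted to truncate the protein."""
    return v.consequence == "protein_truncating"


def protein_truncating_plus_rare_missense(v: Variant) -> bool:
    """Keep protein-truncating variants, plus missense variants with MAF below 1%."""
    rare_missense = v.consequence == "missense" and v.maf < MAF_LIMIT
    return protein_truncating_only(v) or rare_missense


def apply_filter(variants: List[Variant], keep: Callable[[Variant], bool]) -> List[Variant]:
    """Return the variants for which the predicate is true, preserving order."""
    return [v for v in variants if keep(v)]
```

Writing each filter as a predicate keeps the "either category" logic explicit. The filters that depend on deleteriousness predictions would need an extra field that this sketch does not attempt to model.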

Using these filters, you can tailor the list of variants to those with specific impact on the encoded protein. If you would like to customize the list even further by adding variants that were not present in the default list, there is now an option to add single or multiple variants, using dbSNP IDs (e.g., rs112881768) or identifiers in the format “chromosome_coordinate_reference-nucleotide_variant-nucleotide” (e.g., 8_112881768_G_A).

When “single variant” is selected, once you begin typing, variant IDs that match your entry are suggested. When “multiple” is selected, you may type or paste in a list of variant IDs, separated by commas or returns. Note that any added variants are not subject to the filters, which act only on the default list of variants for a gene.
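If you are preparing a list of variants to paste in, a small script can check that each entry matches one of the two accepted formats. The sketch below is only an illustration of the formats described above, not the Portal's own validation. The regular expressions, the accepted chromosome names, and the function names are assumptions.

```python
import re
from typing import Dict, List, Union

# dbSNP IDs such as rs112881768
RSID_PATTERN = re.compile(r"^rs\d+$")
# chromosome_coordinate_reference-nucleotide_variant-nucleotide, e.g. 8_112881768_G_A
# (the set of chromosome names accepted here is an assumption)
COORD_PATTERN = re.compile(r"^(\d{1,2}|X|Y|MT)_(\d+)_([ACGT]+)_([ACGT]+)$")


def split_ids(text: str) -> List[str]:
    """Split pasted text on commas or line breaks and drop empty entries."""
    return [token.strip() for token in re.split(r"[,\r\n]+", text) if token.strip()]


def parse_variant_id(token: str) -> Dict[str, Union[str, int]]:
    """Classify one identifier, raising ValueError if it matches neither format."""
    if RSID_PATTERN.match(token):
        return {"type": "dbsnp", "id": token}
    match = COORD_PATTERN.match(token)
    if match:
        return {
            "type": "coordinate",
            "chromosome": match.group(1),
            "position": int(match.group(2)),
            "reference": match.group(3),
            "variant": match.group(4),
        }
    raise ValueError(f"Unrecognized variant ID: {token!r}")
```

For example, `split_ids("rs112881768, 8_112881768_G_A")` yields two tokens, and each can then be passed to `parse_variant_id`.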

Our GAIT User Guide (download PDF), which summarizes all the details of the interface, has been updated with the latest changes. Please check out our new, improved interactive burden test and let us know if you have comments or suggestions.

Sunday, February 5, 2017

Introductory guide to genetic association analysis now available

P-values. Odds ratios and betas. GWAS. Linkage disequilibrium. What does it all mean?

Human geneticists are, of course, intimately familiar with these concepts. But for people who are not human geneticists, just getting past the terminology can be frustrating. So we’ve written a basic primer and reference guide that can help users of the T2D Knowledge Portal understand the information presented in our interfaces and tools.

Our Introduction to genetic association analysis guide is available from our Resources page. Or download it here (PDF).

This guide provides a basic introduction to the rationale behind applying human genetic association studies to complex diseases like T2D, explains some of the parameters of genetic associations such as p-values and odds ratios, and describes the different types of experiment used to determine genetic associations.

Many thanks to Andrew Morris, University of Oxford, for his thoughtful review and helpful comments on this guide.

We would be happy to hear your suggestions for improvements and additions!

Monday, January 23, 2017

Insulin Sensitivity Index data added to the Portal

The loss of sensitivity to insulin, often termed insulin resistance, is characteristic of type 2 diabetes. Since this sensitivity is difficult to measure directly, researchers have developed an index that reflects it: the modified Stumvoll Insulin Sensitivity Index (ISI). The index is derived by a formula that combines fasting insulin levels with glucose and insulin levels measured two hours after a glucose load.

Now, the results of a study of genetic associations of variants with ISI are available in the T2D Knowledge Portal. These results are from a recent paper in Diabetes by co-first authors Geoffrey Walford, Stefan Gustafsson, Denis Rybin, and fellow members of the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC). (For an overview of the results, see our blog post about the paper.)

In this study, ISI was calculated for 16,753 non-diabetic individuals, and associations of their variants with ISI values were analyzed. The associations were adjusted in one of three ways: for age and sex; for age, sex, and body mass index (BMI); or according to a model that analyzed the combined influence of the genotype effect adjusted for BMI and the interaction effect between the genotype and BMI on ISI. More details about this data set and others from MAGIC may be found on our Data page.

ISI associations are a subset of the MAGIC GWAS data set. They may be viewed in the Portal by selecting one of these phenotypes:
  • ISI adjusted for age-sex
  • ISI adjusted for age-sex-BMI
  • ISI adjusted for genotype-BMI interaction
Associations with these phenotypes can be found in these locations on Portal pages:
  • On Gene Pages (see an example) in the Variants & Associations table
  • On Variant Pages (see an example) in the Associations at a glance section and in the Association statistics across traits table
  • Via the Variant Finder tool, for the phenotypes listed above
  • A "Manhattan plot" of associations across the genome may be seen by selecting one of the phenotypes listed above in the View full genetic association results for a phenotype scroll box on the Portal home page.

Thursday, January 19, 2017

CAMP GWAS data set moves to Early Access Phase 2

Three months ago, we incorporated a data set from the MGH Cardiology and Metabolic Patient Cohort (CAMP) into the T2D Knowledge Portal. These data were contributed by Pfizer, Inc. as part of a public-private partnership to generate genotype data for a cardiometabolic and prediabetic cohort; they add individual-level genetic association data for type 2 diabetes (T2D), fasting glucose levels, and fasting insulin levels from more than 3,500 samples to the Portal knowledgebase. Now, the CAMP GWAS data set has transitioned to Early Access Phase 2 status in the Portal.

The CAMP GWAS data set was the first to be included in the Portal with “Early Access” status, which is assigned to new data. As described on our Policies page, all newly added data sets have Early Access status for the first six months that they are in the Portal. In the first three months, Phase 1 of the Early Access period, the data have undergone quality control checks but they are not considered to be in their final form. The purpose of Phase 1 is to allow Portal users to review and analyze the data in order to identify any potential problems or areas needing further analysis. After this three-month period, data sets move to Phase 2, indicating that the data are in final form and are fully integrated into the Portal.

Portal users must not submit manuscripts concerning new data until both Phase 1 and Phase 2 of the Early Access period have passed, and any results of analyses or proposed publications are subject to the "Fort Lauderdale Principles" articulated for the sharing of genomic data.

In three months, the CAMP GWAS data set will become Open Access, meaning that it may be freely used for research as long as Portal users comply with our guidelines on user responsibilities and proper citation. It is important to note that in order to protect patient privacy, individual-level data in the Portal are never directly accessible to users. Rather, the Portal makes available summary statistics derived from the data, and also provides tools, such as the Genetic Association Interactive Tool (GAIT) and the Interactive Burden Test, that allow users to perform custom analyses based on individual-level data while protecting the security and privacy of those data.
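The release timeline described above (three months in Phase 1, three months in Phase 2, then Open Access) can be sketched as a small date calculation. This is an illustration only: the function names are hypothetical, the exact handling of boundary days is an assumption, and the authoritative rules are those on the Policies page.

```python
import calendar
from datetime import date

PHASE1_MONTHS = 3        # Phase 1 covers the first three months
EARLY_ACCESS_MONTHS = 6  # Early Access as a whole lasts six months


def add_months(start: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def access_status(added: date, today: date) -> str:
    """Return the release phase of a data set that was added on `added`."""
    if today < added:
        raise ValueError("Data set has not been added yet")
    if today < add_months(added, PHASE1_MONTHS):
        return "Early Access Phase 1"
    if today < add_months(added, EARLY_ACCESS_MONTHS):
        return "Early Access Phase 2"
    return "Open Access"


def may_submit_manuscript(added: date, today: date) -> bool:
    """Manuscripts may be submitted only after both Early Access phases have passed."""
    return access_status(added, today) == "Open Access"
```

For a hypothetical data set added on 19 October 2016, `access_status(date(2016, 10, 19), date(2017, 1, 19))` returns "Early Access Phase 2".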

Find CAMP data at all of these locations in the Portal:

  • On Gene Pages (e.g., HLA-C) in the Variants & Associations table.
  • On Variant Pages (e.g., rs9468919) in the Associations at a glance section and in the Association statistics across traits table.
  • Via the Variant Finder tool, for the phenotypes T2D, fasting glucose, and fasting insulin.
  • Via the Genetic Association Interactive Tool (GAIT), which enables custom association analysis for either single variants (available on Variant Pages) or for the set of variants in and near a gene (Interactive burden test, available on Gene Pages).
  • A "Manhattan plot" of genetic associations across the genome may be accessed by selecting the phenotype T2D, fasting glucose, or fasting insulin in the "View full genetic association results for a phenotype" selection box on our home page, and then choosing the CAMP GWAS data set.

Find many more details about the CAMP GWAS data set on our Data page, or read a summary in this blog post.

Tuesday, January 17, 2017

New Year, New Data: BioMe AMP T2D GWAS

We’re happy to announce the first addition of data to the Type 2 Diabetes Knowledge Portal in 2017: the BioMe AMP T2D GWAS data set. The generation of these data was funded by the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D), a collaboration between multiple stakeholders that aims to catalyze the clinical translation of genetic discoveries by producing and aggregating data, developing and implementing novel analytical methods and tools, and building infrastructure for data storage and presentation.

The BioMe AMP T2D GWAS data set is the first set to be entirely produced by the AMP T2D project, which supplied the funding and carried out every step of its production, from data generation to analysis, quality control, and presentation. Its immediate availability in the Portal, prior to publication, fulfills the mission of AMP T2D to speed up access to and utilization of new data.

These data were generated at the Charles Bronfman Institute for Personalized Medicine BioMe BioBank, a biorepository located at the Mount Sinai Medical Center (MSMC) in the upper Manhattan area of New York City. MSMC serves a diverse population of over 800,000 outpatients each year. Importantly, since many BioMe participants are African American or Hispanic/Latino, this data set adds significant ethnic diversity to the Portal’s genetic association data.

The BioMe AMP T2D GWAS data set comprises about 13,000 unique individuals, 41.5% of whom are admixed American, 38% African American, and 20% European. Subjects were genotyped using at least one of three platforms: the Illumina Exome Array, the Illumina GWAS array, or the Affymetrix GWAS array. Their T2D status was assessed by an algorithm, and many additional traits were also measured.

The data were subjected to quality control and association analysis by the Analysis Team at the AMP Data Coordinating Center (DCC) at the Broad Institute. Variant associations with T2D, fasting glucose levels, and HbA1c levels were analyzed. The top results included both previously known and novel variants, but only one association reached genome-wide significance: the association with T2D of the variant rs7903146, which lies within the well-established T2D risk gene TCF7L2. Now that these results are available in the T2D Knowledge Portal, the ability to analyze them further in the context of all other available T2D association data may lead to additional insights.

The BioMe AMP T2D GWAS data set currently has the “Early Access Phase 1” status that is assigned to new data. This status denotes that although analysis and quality control checks have been performed, the data are not yet considered to be in their final state. During the early access period, users may analyze the data but may not submit the results of these analyses for publication. Find the full details about the different phases of data release on our Policies page. More information about the data set, along with links to download more detailed reports on its quality control and analysis, may be found in the BioMe AMP T2D GWAS section of our Data page.

BioMe AMP T2D GWAS data are available at these locations in the Portal:

  • On Gene Pages (see an example) in the Variants & Associations table and the Minor allele frequencies across data sets table
  • On Variant Pages (see an example) in the Associations at a glance section and in the Association statistics across traits table
  • Via the Variant Finder tool, for these phenotypes: type 2 diabetes; fasting glucose adjusted for age and sex; HbA1c adjusted for age and sex; and HbA1c adjusted for age, sex, and body mass index
  • A "Manhattan plot" of associations across the genome may be seen by selecting one of the phenotypes above in the View full genetic association results for a phenotype scroll box on the Portal home page, and then selecting the BioMe AMP T2D GWAS data set.

As always, please contact us with any questions, comments, or suggestions.

Thursday, November 17, 2016

Collaborate with us!

One of the goals of the Type 2 Diabetes Knowledge Portal project is to bring together the world-wide T2D and genetics research communities to share data, knowledge, methods, and tools. In keeping with that goal, we welcome contributions of data to the Portal and we are also open to collaboration as we develop new and better ways to analyze and display data.

We’ve added a new page to the Portal, "Collaborate," that answers frequently asked questions about how to get involved. It includes links to our Data Submitter’s Guide and Data Transfer Agreement, gives an overview of the kinds of data we’re looking for, and tells you how to get in touch with our team.

The “Collaborate” page also links to information about funding opportunities offered by the Foundation for the NIH. Check this out if you’re interested in starting a new project to generate data for the Portal!

Monday, November 7, 2016

New MGH Cardiology and Metabolic Patient Cohort data in the T2D Knowledge Portal

We are pleased to announce a new data set in the T2D Knowledge Portal, from the MGH Cardiology and Metabolic Patient Cohort (CAMP). These data were contributed by Pfizer, Inc. as part of a public-private partnership to generate genotype data for a cardiometabolic and prediabetic cohort. This data set adds individual-level genetic association data for type 2 diabetes (T2D), fasting glucose levels, and fasting insulin levels from more than 3,500 samples to the Portal knowledgebase. Association data for additional phenotypes from this cohort will be incorporated in the future.

The inclusion of this data set in the T2D Knowledge Portal illustrates the uniqueness of the Accelerating Medicines Partnership, which brings together pharmaceutical companies and non-profit institutions with the goal of speeding up the discovery of new targets for treatment of T2D. The pharmaceutical partners in this collaboration have committed not only to providing funding, but also to sharing the data they generate. The CAMP data set contributed by Pfizer is the first set from a pharmaceutical partner to be made available in the Portal.

Another unique aspect of this data set is that it is the first to be included in the Portal with “Early Access Phase 1” status, which is assigned to new data. This status denotes that although analysis and quality control checks have been performed, the data are not yet considered to be in their final state. During the early access period, users may analyze the data but may not submit the results of these analyses for publication. Find the full details about the different phases of data release on our Policies page.

The CAMP cohort consists of 3,857 subjects who were recruited at the Massachusetts General Hospital Heart Center between 2008 and 2012. In addition to genotyping, the subjects had either vascular reactivity measurements (for T2D patients) or an oral glucose tolerance test (for patients not known to have T2D), and samples of their plasma and serum were analyzed. Most of the subjects were of European ancestry; about 10% were African American.

The analysis and quality control processes for this data set were performed by the Analysis Team of the Accelerating Medicines Partnership Data Coordinating Center (AMP-DCC) at the Broad Institute, and are completely transparent and fully documented. The experiment design and analysis are summarized on our Data page, and detailed reports are available for download. Going forward, all new data sets added to the Portal will be fully documented in this manner.

One intriguing—and somewhat puzzling—result from the analysis highlights the utility of incorporating data sets like this one into the Portal. The variant most strongly associated with T2D (at genome-wide significance) in this set is located in the major histocompatibility complex region near the HLA-C gene.

Known associations of genes in this region with type 1 diabetes, along with a high local recombination rate, make it challenging to interpret the meaning of this association. However, it certainly merits further investigation because of its genome-wide significance. The inclusion of this data set in the Portal, in the context of all other available data about T2D associations in the region, greatly facilitates the further analysis of this and other associations in the set.

The CAMP data may be accessed via multiple interfaces in the Portal. They are shown in tables of summary statistics and accessible in variant searches using the Variant Finder. Importantly, since the data are individual-level, samples may be filtered by various parameters and used for custom association analysis in our Genetic Association Interactive Tool (GAIT).

Find CAMP data at all of these locations in the Portal:

  • On Gene Pages (e.g., HLA-C) in the Variants & Associations table.
  • On Variant Pages (e.g., rs9468919) in the Associations at a glance section and in the Association statistics across traits table.
  • Via the Variant Finder tool, for the phenotypes T2D, fasting glucose, and fasting insulin.
  • Via the Genetic Association Interactive Tool (GAIT), which enables custom association analysis for either single variants (available on Variant Pages) or for the set of variants in and near a gene (Interactive burden test, available on Gene Pages).