Wednesday, March 15, 2017

The Portal’s interactive burden test: now more versatile than ever

Significant associations between genes and T2D or related phenotypes can provide powerful insights into disease mechanisms and possible therapies. The T2D Knowledge Portal includes results from pre-computed analyses of genetic associations for a large, and growing, number of datasets. But what if you want to do a more fine-grained analysis? You might want to test whether the disease burden for a gene differs between groups of people with specific characteristics—for example, lean people with T2D versus obese people without T2D. Or you might want to test the aggregate effect of a specific subset of variants, such as those that are likely to knock out the function of a protein of interest.

Our interactive burden test on Gene pages, powered by the Genetic Association Interactive Tool (GAIT), allows you to do all that and more. The burden test treats a gene as the unit of inquiry, combining all the variants it contains in a single statistical test of disease association. We described the basics of the burden test and GAIT in a recent blog post. Now, we’ve added some options for selecting variants in the interactive burden test that make this tool even more versatile.

The variant selection step of the burden test on a Gene page is pre-populated with all of the variants present in the selected dataset that are located within the gene and its 100 kb up- and downstream flanking regions. You can create a specific subset of these by checking or un-checking individual variants. The table may be sorted by multiple criteria in order to find variants of interest: chromosomal coordinate; minor allele count; predictions of the effect allele’s impact on the encoded protein; and the protein change or type of mutation caused by the effect allele.


Section of the interactive burden test interface showing the default list of variants for the SLC30A8 gene. Options for customizing the list are located above the variant table.

The table of variants may be filtered so that the test considers only certain categories of variants, with varying predicted impacts on the encoded protein. Previously, the burden test offered filters based on an unpublished method. Now, we have replaced those filters with the set that was used in a recent major publication: The genetic architecture of type 2 diabetes, by Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton, et al.

Variant filters in the interactive burden test

All coding variants: selects all variants within the coding sequence that are present in the dataset initially selected for the burden test.

Protein-truncating + missense with MAF<1%: selects variants in either of these categories:
  • protein-truncating variants (predicted to cause a truncated protein to be generated, either by creating a premature stop codon or by causing a frameshift)
  • missense variants with a minor allele frequency (MAF) of less than 1%. The MAF limit eliminates common variants, which would not be expected to have very deleterious effects.

Protein-truncating + possibly deleterious missense with MAF<1%: selects variants in either of these categories:
  • protein-truncating variants, as defined above
  • missense variants that are predicted to be possibly deleterious and have MAF of less than 1%

Protein-truncating + probably deleterious missense: selects variants in either of these categories:
  • protein-truncating variants, as defined above
  • missense variants that are predicted to be probably deleterious

Protein-truncating only: selects variants predicted to cause a truncated protein to be generated, either by creating a premature stop codon or by causing a frameshift.
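To make the filter logic concrete, here is a minimal Python sketch of how three of these filters could be expressed as simple rules. It is an illustration only, not the Portal's actual implementation: the `Variant` record, the consequence labels (`stop_gained`, `frameshift`, `missense`, `synonymous`), and the function names are all hypothetical. The filters that depend on deleteriousness predictions are left out because the prediction method is not described here.

```python
from dataclasses import dataclass

MAF_LIMIT = 0.01  # the 1% minor allele frequency cutoff described above


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are assumptions for illustration."""
    var_id: str
    consequence: str       # e.g. "stop_gained", "frameshift", "missense", "synonymous"
    maf: float             # minor allele frequency, as a fraction between 0 and 0.5
    is_coding: bool = True


def all_coding(v: Variant) -> bool:
    """'All coding variants': anything within the coding sequence."""
    return v.is_coding


def is_protein_truncating(v: Variant) -> bool:
    """'Protein-truncating only': premature stop codon or frameshift."""
    return v.consequence in {"stop_gained", "frameshift"}


def ptv_plus_rare_missense(v: Variant) -> bool:
    """'Protein-truncating + missense with MAF<1%': either category qualifies."""
    rare_missense = v.consequence == "missense" and v.maf < MAF_LIMIT
    return is_protein_truncating(v) or rare_missense


def apply_filter(variants, predicate):
    """Keep only the variants for which the chosen filter returns True."""
    return [v for v in variants if predicate(v)]


# Made-up example variants, not real data.
variants = [
    Variant("v1", "stop_gained", 0.20),
    Variant("v2", "missense", 0.005),
    Variant("v3", "missense", 0.05),
    Variant("v4", "synonymous", 0.001),
]
selected = apply_filter(variants, ptv_plus_rare_missense)
```

In this sketch the common missense variant `v3` is dropped by the MAF limit, while the protein-truncating variant `v1` is kept regardless of its frequency, since the MAF limit applies only to the missense category.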

Using these filters, you can tailor the list of variants to those with a specific predicted impact on the encoded protein. If you would like to customize the list even further by adding variants that were not present in the default list, there is now an option to add single or multiple variants, using dbSNP IDs (e.g., rs112881768) or identifiers in the format “chromosome_coordinate_reference-nucleotide_variant-nucleotide” (e.g., 8_112881768_G_A).

When “single variant” is selected, once you begin typing, variant IDs that match your entry are suggested. When “multiple” is selected, you may type or paste in a list of variant IDs, separated by commas or returns. Note that any added variants are not subject to the filters, which act only on the default list of variants for a gene.
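If you are preparing a list of variant IDs to paste in, a short script can help you check it first. The Python sketch below splits a pasted list on commas or line breaks and labels each entry as a dbSNP ID or a positional identifier. The function names and the exact patterns are assumptions made for illustration (for example, the set of chromosome names accepted and the restriction to A, C, G, and T); the Portal's own validation may differ.

```python
import re

# Assumed patterns, based only on the two example formats given above.
RSID_PATTERN = re.compile(r"rs\d+")
POSITIONAL_PATTERN = re.compile(r"(\d{1,2}|X|Y|MT)_(\d+)_([ACGT]+)_([ACGT]+)")


def split_ids(text: str) -> list[str]:
    """Split pasted text on commas or line breaks, dropping empty entries."""
    return [part.strip() for part in re.split(r"[,\r\n]+", text) if part.strip()]


def classify_id(var_id: str) -> str:
    """Label an entry as 'dbsnp', 'positional', or 'unrecognized'."""
    if RSID_PATTERN.fullmatch(var_id):
        return "dbsnp"
    if POSITIONAL_PATTERN.fullmatch(var_id):
        return "positional"
    return "unrecognized"


def parse_positional(var_id: str) -> dict:
    """Break a chromosome_coordinate_ref_alt identifier into its four parts."""
    match = POSITIONAL_PATTERN.fullmatch(var_id)
    if match is None:
        raise ValueError(f"not a positional identifier: {var_id!r}")
    chrom, pos, ref, alt = match.groups()
    return {"chromosome": chrom, "position": int(pos), "ref": ref, "alt": alt}


pasted = "rs112881768, 8_112881768_G_A\nnot-a-variant"
entries = [(var_id, classify_id(var_id)) for var_id in split_ids(pasted)]
```

Entries labeled `unrecognized` are worth correcting before you paste the list into the interface.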

Our GAIT User Guide (download PDF), which summarizes all the details of the interface, has been updated with the latest changes. Please check out our new, improved interactive burden test and let us know if you have comments or suggestions.