Showing posts with label Burden test. Show all posts

Wednesday, January 3, 2018

Complete data description now available for T2DKP WES and WGS datasets

A new Data Descriptor publication from Jason Flannick, Christian Fuchsberger, Anubha Mahajan, and colleagues (Scientific Data 4, Article number: 170179 (2017) doi:10.1038/sdata.2017.179) presents a comprehensive description of four large, important datasets that are included in the Type 2 Diabetes Knowledge Portal. These datasets are the product of the GoT2D and T2D-GENES consortia, large international groups that seek to uncover the genetic basis of type 2 diabetes.

The investigators took a variety of approaches to generate the most complete view of the genetic architecture of T2D available to date. They performed whole-exome sequencing on a group of 12,940 individuals of multiple ancestries (6,504 T2D cases and 6,436 controls) and whole-genome sequencing on 2,657 individuals of European descent, and tested the association of variants with T2D. They also used an exome chip to test coding variants in more than 80,000 people, and used imputation to test non-coding variants in an additional 44,000.

In total, the researchers sampled more than 120,000 genomes and identified more than 27 million single nucleotide polymorphisms, indels, and structural variants, testing their association with T2D. The new publication documents the experimental and analytical methods and results in complete detail. Analysis and interpretation of these data were also discussed in a previous publication (Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton et al., 2016).

This comprehensive catalog of T2D associations is available for you to search and explore via the T2D Knowledge Portal. The datasets from this study are named as follows in the T2DKP:

  • GoT2D WGS (whole-genome sequence data)
  • GoT2D WGS + replication (whole-genome sequence data plus imputed genotypes)
  • 13K exome sequence analysis
  • GoT2D exome chip analysis

All of these sets are described in more detail on our Data page, including lists of the cohorts studied and case/control selection criteria for each. Our Variant Finder tool searches all of these sets, and results from these datasets are displayed in various tables and interfaces on the Gene and Variant pages of the T2DKP.

The individual-level data in the 13K exome sequence set are also available for custom analysis via the Genetic Association Interactive Tool (GAIT) on Variant pages and the custom burden test on Gene pages. These tools allow researchers to interact with the individual-level data while protecting patient privacy. They access the 19K exome sequence analysis dataset, which includes the 13K exome sequence data from this study along with 6,000 additional exome sequences from the SIGMA and LuCamp consortia. Both tools allow you to filter samples by multiple criteria (for example, age, BMI, cholesterol levels of the subjects) and to choose covariates before running on-the-fly association analysis. The custom burden test also offers the ability to select the set of variants to consider in the analysis.

Please explore these datasets and, as always, let us know what you think!

Wednesday, March 15, 2017

The Portal’s interactive burden test: now more versatile than ever

Significant associations between genes and T2D or related phenotypes can provide powerful insights into disease mechanisms and possible therapies. The T2D Knowledge Portal includes results from pre-computed analyses of genetic associations for a large, and growing, number of datasets. But what if you want to do a more fine-grained analysis? You might want to test whether the disease burden for a gene differs between groups of people with specific characteristics—for example, lean people with T2D versus obese people without T2D. Or you might want to test the aggregate effect of a specific subset of variants, such as those that are likely to knock out the function of a protein of interest.

Our interactive burden test on Gene pages, powered by the Genetic Association Interactive Tool (GAIT), allows you to do all that and more. The burden test considers a gene as the unit of inquiry, including all the variants it contains in a statistical test of disease association. We described the basics of the burden test and GAIT in a recent blog post. Now, we’ve added some options for selecting variants in the interactive burden test that make this tool even more versatile.

The variant selection step of the burden test on a Gene page is pre-populated with all of the variants present in the selected dataset that are located within the gene and its 100 kb up- and downstream flanking regions. You can create a specific subset of these by checking or un-checking individual variants. The table may be sorted by multiple criteria in order to find variants of interest: chromosomal coordinate; minor allele count; predictions of the effect allele’s impact on the encoded protein; and the protein change or type of mutation caused by the effect allele.
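To make the idea of a gene-level test concrete, here is a minimal sketch in Python. It is not the Portal's implementation: the post does not describe GAIT's statistical method, so this example swaps in a simple collapsing approach (one carrier flag per sample, then a one-sided Fisher's exact test) and ignores covariates. The `Variant` class, the function names, and the data layout are all hypothetical.

```python
from dataclasses import dataclass
from math import comb

FLANK_BP = 100_000  # 100 kb flanking region, as described in the post


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; frozen so it can be used as a dict key."""
    chrom: str
    pos: int
    ref: str
    alt: str


def variants_near_gene(variants, chrom, gene_start, gene_end, flank=FLANK_BP):
    """Return variants inside the gene or its flanks, sorted by coordinate."""
    lo, hi = max(1, gene_start - flank), gene_end + flank
    hits = [v for v in variants if v.chrom == chrom and lo <= v.pos <= hi]
    return sorted(hits, key=lambda v: v.pos)


def carrier_table(genotypes, is_case, selected):
    """Collapse the selected variants into one carrier flag per sample.

    genotypes: {sample_id: {Variant: alt_allele_count}}  (assumed layout)
    is_case:   {sample_id: bool}
    Returns (case carriers, case non-carriers,
             control carriers, control non-carriers).
    """
    a = b = c = d = 0
    for sample, calls in genotypes.items():
        carrier = any(calls.get(v, 0) > 0 for v in selected)
        if is_case[sample]:
            a, b = a + carrier, b + (not carrier)
        else:
            c, d = c + carrier, d + (not carrier)
    return a, b, c, d


def fisher_one_sided(a, b, c, d):
    """Probability of seeing at least `a` carriers among cases by chance."""
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    total = comb(n, n_cases)
    tail = sum(
        comb(n_carriers, k) * comb(n - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    )
    return tail / total
```

Checking or un-checking variants in the interface corresponds to changing the `selected` list passed to `carrier_table`. A regression-based test would be needed to account for covariates, which this sketch leaves out.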


Section of the interactive burden test interface showing the default list of variants for the SLC30A8 gene. Options for customizing the list are located above the variant table.

The table of variants may be filtered so that the test considers only certain categories of variants, with varying predicted impacts on the encoded protein. Previously, the burden test offered filters based on an unpublished method. Now, we have replaced those filters with the set that was used in a recent major publication: The genetic architecture of type 2 diabetes, by Fuchsberger, Flannick, Teslovich, Mahajan, Agarwala, Gaulton, et al.

Variant filters in the interactive burden test

All coding variants--selects variants within the coding sequence, from the dataset that was initially selected for the burden test

Protein-truncating + missense with MAF<1%--selects variants in either of these categories:
  • protein-truncating (predicted to cause a truncated protein to be generated, either by creating a premature stop codon or by causing a frameshift)
  • missense with a minor allele frequency (MAF) of less than 1%. The MAF limit eliminates common variants, which would not be expected to have very deleterious effects.

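Protein-truncating + possibly deleterious missense with MAF<1%--selects variants in either of these categories:
  • protein-truncating
  • missense, predicted to be possibly deleterious, with MAF of less than 1%

Protein-truncating + probably deleterious missense--selects variants in either of these categories:
  • protein-truncating
  • missense, predicted to be probably deleterious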

Protein-truncating only--selects variants predicted to cause a truncated protein to be generated, either by creating a premature stop codon or by causing a frameshift.
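The five filters can be read as predicates over annotated variants. The Python sketch below shows one way to express them. Everything about the data model is an assumption made for illustration: the `Annotated` class, the consequence labels, and the `deleteriousness` field are hypothetical, and the sketch assumes that a "probably deleterious" variant also counts as "possibly deleterious". The post does not say how the Portal derives these predictions.

```python
from dataclasses import dataclass

# Hypothetical consequence labels; the Portal's own vocabulary may differ.
PROTEIN_TRUNCATING = {"stop_gained", "frameshift"}
CODING = PROTEIN_TRUNCATING | {"missense", "synonymous"}
RARE_MAF = 0.01  # the 1% MAF threshold named in the filter labels


@dataclass(frozen=True)
class Annotated:
    variant_id: str
    consequence: str
    maf: float
    deleteriousness: str = "none"  # assumed values: none, possibly, probably


def is_ptv(v):
    return v.consequence in PROTEIN_TRUNCATING


def is_rare_missense(v):
    return v.consequence == "missense" and v.maf < RARE_MAF


# Each filter keeps a variant if it falls in EITHER listed category.
FILTERS = {
    "all_coding": lambda v: v.consequence in CODING,
    "ptv_plus_rare_missense": lambda v: is_ptv(v) or is_rare_missense(v),
    "ptv_plus_possibly_deleterious_rare_missense": lambda v: is_ptv(v) or (
        is_rare_missense(v) and v.deleteriousness in {"possibly", "probably"}
    ),
    "ptv_plus_probably_deleterious_missense": lambda v: is_ptv(v) or (
        v.consequence == "missense" and v.deleteriousness == "probably"
    ),
    "ptv_only": is_ptv,
}


def apply_filter(name, variants):
    """Return the variants from the default list that pass the named filter."""
    return [v for v in variants if FILTERS[name](v)]
```

Note that only the first two missense filters carry the MAF limit, matching the filter names above.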

Using these filters, you can tailor the list of variants to those with specific impact on the encoded protein. If you would like to customize the list even further by adding variants that were not present in the default list, there is now an option to add single or multiple variants, using dbSNP IDs (e.g., rs112881768) or identifiers in the format “chromosome_coordinate_reference-nucleotide_variant-nucleotide” (e.g., 8_112881768_G_A).

When “single variant” is selected, once you begin typing, variant IDs that match your entry are suggested. When “multiple” is selected, you may type or paste in a list of variant IDs, separated by commas or returns. Note that any added variants are not subject to the filters, which act only on the default list of variants for a gene.
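If you are preparing a list of variant IDs to paste in, a quick check of the two accepted formats can save a round trip. The following Python sketch splits a pasted list on commas or line breaks and sorts entries into recognized and unrecognized IDs. The regular expressions are assumptions based only on the two examples above; the Portal's actual validation rules are not documented in this post, and `parse_variant_list` is a hypothetical helper.

```python
import re

# Assumed patterns, inferred from the examples rs112881768 and 8_112881768_G_A.
RSID = re.compile(r"rs\d+")
POSITIONAL = re.compile(r"(\d{1,2}|X|Y|MT)_(\d+)_([ACGT]+)_([ACGT]+)")


def parse_variant_list(text):
    """Split pasted text on commas or returns.

    Returns (valid_ids, rejected_entries). Duplicates are dropped and the
    original order of first appearance is kept.
    """
    valid, rejected = [], []
    for entry in re.split(r"[,\r\n]+", text):
        entry = entry.strip()
        if not entry:
            continue  # skip blanks left by trailing commas or empty lines
        if RSID.fullmatch(entry) or POSITIONAL.fullmatch(entry):
            if entry not in valid:
                valid.append(entry)
        else:
            rejected.append(entry)
    return valid, rejected


pasted = "rs112881768, 8_112881768_G_A\nnot-a-variant"
valid, rejected = parse_variant_list(pasted)
print(valid, rejected)
```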

Our GAIT User Guide (download PDF), which summarizes all the details of the interface, has been updated with the latest changes. Please check out our new, improved interactive burden test and let us know if you have comments or suggestions.

Tuesday, October 25, 2016

Design your own association analysis with our Genetic Association Interactive Tool (GAIT)

Genetic association analysis—identifying polymorphisms in the human genome that are correlated with altered risk of disease—is a powerful method for discovering disease mechanisms. These polymorphisms can indicate what goes wrong at the cellular level in the disease process, knowledge that is critically important for developing better diagnostics and therapies.

The Type 2 Diabetes Knowledge Portal offers a wealth of pre-calculated information on genetic associations between variants and type 2 diabetes (T2D) or other related traits. These results are computed using broadly defined groups of samples: either an entire sample set from a project, or ancestry-specific cohorts. This approach, while it generates very valuable results, masks effects that could only be detected in even more narrowly defined groups: for example, individuals within a certain range of age, body mass index, or cholesterol level. 

Until now, analysis of such fine-grained subsets of individual-level data has only been possible for expert geneticists with access to protected data. But our new Genetic Association Interactive Tool (GAIT) offers everyone an unprecedented amount of access to individual-level data along with an easy-to-use interface for analyzing genetic associations using custom subsets of samples and variants.

Two versions of GAIT are available in the Portal. One, on Variant pages (see an example), computes association statistics for the single variant featured on that page. The other, accessible on Gene pages (see an example), powers an interactive burden test that considers the collection of variants in or near a gene, or a selected subset of those variants.

Where to find GAIT on Gene pages (left) and Variant pages (right)


The GAIT interface offers incredible flexibility for designing custom analyses. In the interactive burden test, you can filter variants by their predicted effects, or pick and choose individual variants to include. When creating sample sets for either single-variant association analysis or a gene burden test, you can specify a gender, set ranges for the values for multiple phenotypes, and choose principal components or phenotypes to use as covariates. And all these parameters may be set differently for different ethnic groups.

The GAIT interface displays phenotype values within the sample set and allows you to filter samples by multiple criteria


Once you set parameters of your choice, GAIT computes associations on the fly, based on individual-level data. To protect patient confidentiality, GAIT will not display results from sample sets consisting of fewer than 100 individuals.
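The combination of sample filtering and a minimum sample-set size can be sketched as follows in Python. This is an illustration of the idea only, not GAIT's code: the sample records, field names such as `sex` and `bmi`, and the `guarded_analysis` helper are hypothetical. The threshold of 100 comes from the paragraph above.

```python
MIN_SAMPLES = 100  # results are withheld for smaller sample sets


class SampleSetTooSmall(Exception):
    """Raised instead of returning results for a small sample set."""


def filter_samples(samples, ranges, sex=None):
    """Keep samples whose phenotype values fall inside every requested range.

    samples: list of dicts, e.g. {"id": 1, "sex": "F", "bmi": 24.0}
    ranges:  {phenotype_name: (low, high)}, bounds inclusive
    Samples missing a requested phenotype are excluded.
    """
    kept = []
    for s in samples:
        if sex is not None and s.get("sex") != sex:
            continue
        if all(
            s.get(name) is not None and low <= s[name] <= high
            for name, (low, high) in ranges.items()
        ):
            kept.append(s)
    return kept


def guarded_analysis(samples, analysis):
    """Run `analysis` only when the sample set is large enough."""
    if len(samples) < MIN_SAMPLES:
        raise SampleSetTooSmall(
            f"{len(samples)} samples selected; at least {MIN_SAMPLES} required"
        )
    return analysis(samples)
```

Here `analysis` stands in for whatever association test is run on the filtered set. Refusing to report on small groups makes it harder to narrow filters until the results describe only a handful of individuals.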

To help you get familiar with this versatile tool, we’ve created a User Guide (download PDF) that summarizes all the details of the interface. Please give GAIT a try and let us know what you think!