Friday, June 22, 2018

New data release brings new phenotypes and huge sample sizes to the T2DKP

Progressing towards the goal of the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D) to aggregate, analyze, and present comprehensive genetic data relative to T2D in order to speed up the validation of new drug targets, today we release 10 new datasets to the Type 2 Diabetes Knowledge Portal. These datasets contain variant associations for 17 phenotypes, including 7 that are new to the T2DKP, from over 1.4 million samples.

Four of the new datasets were generated by collaborators in AMP T2D, the parent organization of the T2DKP. AMP T2D is a pre-competitive partnership among the National Institutes of Health, industry, and not-for-profit organizations, managed by the Foundation for the National Institutes of Health. It supports the generation of genetic association data and many other kinds of genomic data, and provides access to these data in the T2DKP to facilitate their translation into biological knowledge about T2D.

For all four of these datasets, quality control and association analysis were performed by the Analysis Team of the AMP Data Coordinating Center (AMP DCC) at the Broad Institute, using standard, state-of-the-art methods. These processes are completely transparent and fully documented: the experimental design and analysis are summarized on our Data page, and detailed reports are available for download. In this first phase of analysis, associations were determined for type 2 diabetes, fasting glucose levels, and fasting insulin levels--both unadjusted, and adjusted for body mass index. Future analyses will add more phenotypes.

One of these datasets, Diabetic Cohort - Singapore Prospective Study GWAS, was contributed by collaborators at the National University of Singapore. Consisting of 3,864 samples, it is a T2D case-control study to identify genetic and environmental risk factors for diabetes in Singapore Chinese. The other three new sets that were analyzed at the AMP DCC, contributed by collaborators at the University of Michigan, are from the Finland-United States Investigation of NIDDM Genetics (FUSION) Study, which seeks to map and identify genetic variants that predispose to type 2 diabetes or affect variability in diabetes-related traits. The three FUSION datasets are FUSION GWAS, with 1,681 samples; FUSION Metabochip, with 2,163 samples; and FUSION exome chip analysis, with 3,485 samples.

All four of these datasets now have “Early Access Phase 1” status, which is assigned to new data. This status denotes that although analysis and quality control checks have been performed, the data are not yet considered to be in their final state. During the early access period, users may analyze the data but may not submit the results of these analyses for publication. Find the full details about the different phases of data release on our Policies page.

In addition to the datasets from AMP T2D partners, we have also added or updated 6 sets of publicly available association summary statistics for phenotypes relevant to T2D:

  • The previous CKDGen GWAS dataset for chronic kidney disease has been replaced with a newer study from the CKDGen consortium, imputed to the 1000 Genomes reference set (Gorski et al., 2017), with 110,517 samples;
  • Early Growth Genetics Consortium GWAS associations for childhood obesity (Bradfield et al., 2012), with 13,848 samples;
  • Body fat distribution associations (Shungin et al., 2015), with 245,749 samples, have been added to the existing GIANT GWAS dataset;

Results from all the new datasets may be viewed at these locations in the T2D Knowledge Portal:

• On Gene Pages (e.g., GCKR) in the Common variants and High-impact variants tables and in LocusZoom plots;

• On Variant Pages (e.g., rs1260326) in the Associations at a glance section, the Association statistics across traits table, and in LocusZoom static plots;

• From the View full genetic association results for a phenotype search on the home page: select a phenotype and view the top variants in a Manhattan plot and table;

• Using the Variant Finder tool: specify multiple criteria and retrieve the set of variants meeting those criteria from any of these datasets.

Additionally, individual-level data from the Diabetic Cohort - Singapore Prospective Study GWAS and FUSION GWAS datasets are available for secure custom interactive analyses using these tools in the T2DKP:

• Using the Genetic Association Interactive Tool (GAIT) on Variant Pages, you may choose a phenotype for association analysis, choose custom covariates, filter the sample pool by specifying a range of values for one or more phenotypes, then run on-the-fly analysis.

• Dynamic LocusZoom plots on Gene and Variant pages allow you to run association analysis using one or more variants of your choice as covariates, in order to test whether associations are independent.

With today's release, the T2DKP includes genetic associations for 68 phenotypes from a total of 35 datasets. We welcome submissions of new datasets for incorporation into the T2DKP. Find information about collaboration here, and please contact us with questions.

Monday, June 18, 2018

See you at ADA!

The 78th Scientific Sessions of the American Diabetes Association are coming up in just a few days, and the T2D Knowledge Portal team will be there!

As usual, we'll have a booth in the exhibit hall. We'll be at booth #1075 from 10am to 4pm on Saturday and Sunday 6/23-24, and from 10am to 2pm on Monday 6/25. Come say hello, get a demonstration of the T2D, Cardiovascular Disease, or Cerebrovascular Disease Knowledge Portals, and pick up some of the T2DKP sticky notes that we'll be giving away!

There will also be presentations from several members of our group on Saturday, June 23:
  • Jason Flannick, PhD, will give a talk on "The Type 2 Diabetes Knowledge Portal" at 11:30am.
Session: Quantifying Diabetes: Genomics, Electronic Health Records, and Automated Control
Location: W312
  • Jose C. Florez, MD, PhD, will moderate an interactive poster session, "Delving into Type 2 Diabetes Genetics", at 12:30 pm.
Location: Poster hall
  • Miriam Udler, MD, PhD, will present "Genetic testing for Monogenic Diabetes--Whom to Test, What and How to Order?" at 2:15pm.
Session: Monogenic Diabetes Testing is Ready for Prime Time--Integrating Genetics into Your Practice
Location: W304E-H

We hope to meet you in Orlando!

Friday, June 1, 2018

New T2DKP features help distill knowledge from data

We are pleased to announce four new features in the Type 2 Diabetes Knowledge Portal that simplify the interpretation of genetic association data, making it easier to pinpoint variants and datasets that are informative for a disease or phenotype of interest.

"Clumping" variants by linkage disequilibrium

The first step in getting an overview of the results of a particular experiment is typically to plot variant associations vs. chromosomal location, in a so-called "Manhattan plot." These plots are available from the T2DKP home page after choosing a phenotype from the list:

After selecting a phenotype, you may select a dataset, and the Manhattan plot is displayed above a table of the top variants:

Now, in addition to selecting a dataset to view associations, you may select a threshold for linkage disequilibrium (LD) in order to reduce the number of linked variants that represent a single association signal. For example, without "clumping" variants by LD (r² = 1), when viewing the DIAGRAM 1000G GWAS dataset there are 70 significantly associated variants in the IGFBP2 gene; but setting the most stringent LD threshold (r² = 0.1) reduces that number to just 8 variants by displaying only the most significant associations after clumping variants by LD. Intermediate LD thresholds of r² = 0.2, 0.4, 0.6, or 0.8 may also be set, allowing more versatility in this analysis.
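To make the idea of clumping concrete, here is a minimal sketch of one common greedy approach, written in Python. It is an illustration only, not the T2DKP's implementation: the function name `clump_by_ld`, the variant IDs, the p-values, and the pairwise r² values are all hypothetical. The sketch assumes that a variant is dropped when its r² with an already-kept, more significant variant exceeds the chosen threshold, so a threshold of 1 leaves every variant in place.

```python
from typing import Dict, FrozenSet, List


def clump_by_ld(
    p_values: Dict[str, float],
    r2: Dict[FrozenSet[str], float],
    threshold: float,
) -> List[str]:
    """Greedy LD clumping: return the retained variants, most significant first."""
    kept: List[str] = []
    # Visit variants from the smallest p-value to the largest.
    for variant in sorted(p_values, key=lambda v: p_values[v]):
        # A variant is "linked" if its r2 with any kept variant exceeds the
        # threshold; pairs missing from the table are treated as r2 = 0.
        linked = any(
            r2.get(frozenset((variant, index)), 0.0) > threshold
            for index in kept
        )
        if not linked:
            kept.append(variant)
    return kept


# Hypothetical toy data (made up for illustration, not taken from any dataset).
P_VALUES = {"varA": 1e-12, "varB": 3e-10, "varC": 2e-9, "varD": 5e-8}
R2 = {
    frozenset(("varA", "varB")): 0.95,
    frozenset(("varA", "varC")): 0.30,
    frozenset(("varB", "varC")): 0.25,
    frozenset(("varC", "varD")): 0.05,
}

if __name__ == "__main__":
    for threshold in (1.0, 0.8, 0.1):
        print(threshold, clump_by_ld(P_VALUES, R2, threshold))
```

In this toy example, lowering the threshold shrinks the list of retained variants, which mirrors the behavior described above: the more stringent the threshold, the fewer variants are shown for each association signal.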

New Region page

The Gene page of the T2DKP (see an example) integrates and summarizes information about the associations of variants across the region of a gene. Now, you can see this integration and summation for any region of the genome, not just the areas surrounding protein-coding genes. Simply enter a chromosome and coordinates in the home page search box:

The resulting page resembles a Gene page. The traffic light integrates all associations across the region to give you an immediate indication of whether there are significant associations found in any of the datasets in the T2DKP. Further down the page, tools and displays let you drill down to the specifics for a phenotype or variant of interest. This new Region page provides a way to explore any part of the genome in great detail.

PheWAS graphic on the Variant page

Previously, the Variant page of the T2DKP displayed significant associations for each variant in a graphic that showed a color-coded box for each phenotype-dataset combination. But the rapidly increasing number of phenotypes becoming available from biobank studies has made this view unsustainably large. In its place, we have incorporated a phenome-wide association study (PheWAS) visualization developed at the University of Michigan. The graphic shows at a glance which phenotype associations are most significant for a particular variant. Mouse over a point to see more details.

All Associations graphic on the Variant page

The PheWAS graphic distills variant associations in order to highlight the most significant ones. But suppose you want to drill down to the details and explore associations in every dataset, viewing parameters like sample size, odds ratio, and more? There's a graphic for that too: our new All Associations interactive graphic, located in the "Associations across all datasets" section of the Variant page. Start by using keywords to filter phenotypes. Filtering allows you to view one specific phenotype, several related phenotypes, or phenotypes in a broad category, such as glycemic phenotypes; both the graphic and the table below it change in response to phenotype filtering. There are also options to filter by setting ranges of p-values and/or sample sizes.

The graph plots p-value (vertical axis) vs. dataset sample size (horizontal axis) for each association. Points in the graph are triangular; whether the triangle points up or down indicates a positive or negative direction of effect, respectively. Mousing over a point shows you more details about the association and the dataset. This graphic can help you evaluate whether an association is likely to be real. As shown in the illustration below, a genuine signal should increase in significance (i.e., decrease in p-value) with increasing sample size.
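Why should a genuine signal become more significant as sample size grows? A rough back-of-the-envelope sketch in Python shows the reasoning. It rests on simplifying assumptions that are ours, not the portal's: a fixed true effect on a standardized scale, a standard error that shrinks as 1/sqrt(n), and a two-sided normal test. The function names and the effect size of 0.02 are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def expected_p_value(effect: float, n_samples: int) -> float:
    """Approximate p-value for a fixed true effect at a given sample size.

    Assumption: with standardized genotype and phenotype, the standard
    error of the effect estimate is roughly 1 / sqrt(n).
    """
    standard_error = 1.0 / math.sqrt(n_samples)
    return two_sided_p(effect / standard_error)


if __name__ == "__main__":
    # Hypothetical small effect, evaluated at increasing sample sizes.
    for n in (1_000, 10_000, 100_000):
        print(n, expected_p_value(0.02, n))
```

Under these assumptions the test statistic grows with the square root of the sample size, so the p-value of a real effect keeps falling as datasets get larger. An association that looks strong in a small dataset but fails to strengthen in larger ones deserves more caution.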

Stay in touch!

Like the rest of the T2DKP, these features are under continuous development. Please give them a try and let us know what you think.

Friday, May 11, 2018

T2DKP Spring Newsletter

The latest issue of our quarterly newsletter is now available. Download it here and get the latest!

Tuesday, May 8, 2018

NIDDK Workshop: Towards a Functional Understanding of the Diabetic Genome 2018

Recently, members of the T2D Knowledge Portal team were fortunate to participate in a fascinating workshop hosted by the NIDDK, Towards a Functional Understanding of the Diabetic Genome. Speakers highlighted the diversity of ongoing research projects that aim to translate disease-associated variants into functional insights in type 2 diabetes.

The workshop featured presentations on multiple data types that can provide clues about the mechanisms by which sequence variants affect T2D risk. Many of these offer insights into transcriptional regulation: epigenomic chromatin modifications; tissue-specific RNA levels; eQTLs; transcription factor binding sites; long-range interactions between chromosomes that bring promoters and enhancers into proximity; and regulatory pathways. Others focus on downstream processes such as protein-protein interactions, biochemical pathways, and metabolomics.

It will be crucial to integrate all of these data types with genetic association data in order to get a complete picture of how particular genomic regions influence T2D biology, and at the T2DKP we are working towards incorporating as many of these data types as possible.

Although the presentations in this workshop were diverse, some common themes were evident. One was that although the insulin-secreting beta cells in pancreatic islets are hugely significant to T2D, and most T2D risk variants influence insulin secretion, current research projects are confirming and underscoring the importance of other tissues. Fat, liver, skeletal muscle (which comprises 40% of human body weight), and brain are all intimately involved in the development of T2D.

Another common theme for ongoing T2D research is that things may often be much more complicated than they first appear. A single genomic region associated with T2D risk may harbor multiple independent causal variants, each potentially having different regulatory effects, possibly affecting different tissues, and causing varied phenotypic consequences. Even if these variants alter a protein-coding sequence, they may not act through their effects on that sequence. These genetically complicated regions, such as those elucidated in FTO or TCF7L2, may be more common than we previously thought.

A third overall conclusion from the workshop is that model organism research can accelerate the investigation of candidate genes. The short life cycles of Drosophila and zebrafish, and the versatile genetic tools available for these systems, allow for rapid and systematic interrogation of gene function. Zebrafish glucose and lipid metabolism have much in common with those processes in human cells, and with their transparent bodies, zebrafish literally give us a window into pancreatic development. In addition to being a well-developed model system, the mouse offers much greater genetic diversity than humans, with about 40 million SNPs in the mouse genome as compared to about 10 million in the human genome.

At the T2DKP, efforts to integrate many of these data types are in progress, and integration of others is being planned. We continue to work towards making the T2DKP a comprehensive resource for the T2D research community, to help accelerate the translation of variant associations into knowledge about disease mechanisms and identification of potential drug targets.

Many of the presentations at the workshop featured web resources of potential interest to T2D researchers, listed below. The T2DKP is connected with the first, the Diabetes Epigenome Atlas. We are interested in providing better connections between the T2DKP and other relevant resources. If you would be particularly interested in seeing links from the T2DKP to one of the resources below, or if you know of a resource that would be informative, we would love to hear your suggestions!

  • HaploReg: explore annotations of the noncoding genome at variants on haplotype blocks
  • ExPecto: tissue-specific gene expression effect predictions for human mutations
  • DeepSEA: predict the cell type-specific epigenetic state of a sequence and the chromatin effects of sequence variants
  • GeNets: unified web platform for network-based analyses of genetic data
  • DCell: a deep neural network simulating cell structure and function

Wednesday, May 2, 2018

Join the Knowledge Portal Network team!

At the Knowledge Portal Network (currently consisting of the Type 2 Diabetes, Cerebrovascular Disease, and Cardiovascular Disease Knowledge Portals), we are looking for energetic, talented people to help us produce web portals that aggregate and serve genetic association results to the world in order to spark insights into complex diseases. There are positions open for a software engineer to help in developing and producing these web portals, and for a technical release manager to manage and coordinate tasks during production and maintenance of the portals.

The positions are located at the Broad Institute in Cambridge, MA, a dynamic and exciting work environment where cutting-edge science is applied to critical biomedical problems.

Find more details and apply for the software engineer or technical release manager positions at the Broad Careers site.

Friday, April 27, 2018

New T2DKP release adds individual-level data for interactive analysis

With the April release of the Type 2 Diabetes Knowledge Portal, we are increasing the number of datasets and samples available for interactive analysis via the LocusZoom and GAIT tools. These tools now access individual-level data from three additional datasets, all of which were quality controlled and analyzed at the Accelerating Medicines Partnership in Type 2 Diabetes (AMP T2D) Data Coordinating Center (DCC):
  • CAMP GWAS: 3,628 multi-ancestry samples from the MGH Cardiology and Metabolic Patient cohort, generated by a public-private partnership between Pfizer Inc. and Massachusetts General Hospital;
  • METSIM GWAS: 8,791 European ancestry samples from the Metabolic Syndrome in Men study.
These individual-level data are available as "dynamic" datasets, powered by Hail software, in LocusZoom on Gene pages and Variant pages of the T2DKP, for the following phenotypes: 
  • BioMe AMP T2D GWAS: type 2 diabetes, BMI, diastolic blood pressure, fasting glucose, HbA1c, HDL cholesterol, LDL cholesterol, systolic blood pressure
  • CAMP GWAS: type 2 diabetes, BMI, fasting glucose, fasting insulin
  • METSIM GWAS: type 2 diabetes, BMI, diastolic blood pressure, fasting glucose, fasting insulin, HbA1c, HDL cholesterol, LDL cholesterol, systolic blood pressure
To perform interactive analyses on these data in LocusZoom, select one of the available phenotypes in step 1 and then choose a "dynamic" dataset in step 2.

When you click on a variant in the resulting LocusZoom plot, the option to condition on that variant appears in the tooltip:

Clicking on that link starts on-the-fly association analysis for the region while conditioning on that variant, which can reveal whether association signals are independent of each other. You can choose to condition on multiple variants. The variants of your choice are listed in the upper left-hand corner of the plot, and the list may be edited:

Individual-level data from these three datasets are also available for interactive analysis via the Genetic Association Interactive Tool (GAIT) on Variant Pages. After selecting one of the datasets, you will be able to choose a phenotype for association analysis, filter the sample pool by specifying a range of values for one or more phenotypes, choose custom covariates, and then run on-the-fly association analysis for your chosen subset of samples. Find all of the details about how to use this tool in our GAIT guide.

We hope that the increased ability to interact with individual-level data in the T2DKP will be helpful to your research. As always, we are happy to answer any questions about these or other data and tools; please contact us for help.