With the addition of four new sequence data sets to our database, the number of variants and associations accessible through the Portal's pages and tools has increased by millions.
Two of the new data sets are from projects that have obtained sequence data from a wide range of individuals. The ExAC data set, comprising exome sequences collected and harmonized by the Exome Aggregation Consortium, includes sequence data from 60,706 unrelated people of multiple ancestries. The 1000 Genomes data set, from the International Genome Sample Resource project (IGSR), is composed of whole-genome sequences from 2,504 individuals in four different ethnic groups.
The allele frequencies of variants in each of the ethnic groups surveyed in the 1000 Genomes data set are shown in the “How common is…?” section of the Variant pages. Both the ExAC and 1000 Genomes data sets can also be queried with the Variant Finder tool. To select them, use the new “Additional Search Options” tab on the interface, where you can choose these data sets and add further criteria to your search.
The Data set pull-down menu on the "Additional Search Options" tab of the Variant Finder lets you specify 1000 Genomes or ExAC data.
Available selections in the Data set pull-down menu.
The other two new data sets in the Portal were both generated by the GoT2D consortium. The whole-genome sequence data set (GoT2D WGS) adds data from 2,657 individuals, including associations for noncoding variants that were absent from the earlier GoT2D whole-exome sequence data set. This new data set brings T2D association data for 30 million variants to the Portal. The GoT2D WGS + replication data set extends that set with imputed data, bringing the sample size to over 47,000 and covering most low-frequency and common variants.
The new GoT2D data can be seen in multiple sections of the Portal’s Gene and Variant pages, and may also be accessed by selecting these data sets in the Variant Finder.
Beyond these major additions, today’s data release also includes some bug fixes and data harmonization.
Get out there and explore the new data landscape in the Portal, and let us know what you think!