Decomposed matrices used for the analysis described in 'Components of genetic associations across 2,138 phenotypes in the UK Biobank highlight adipocyte biology'

2019-08-01T10:00:38Z (GMT) by Yosuke Tanigawa, Manuel Rivas

The dataset deposited here contains decomposed matrices of GWAS summary statistics across 2,138 phenotypes described in the following publication:

Y. Tanigawa*, J. Li*, et al., Components of genetic associations across 2,138 phenotypes in the UK Biobank highlight adipocyte biology. Nature Communications (2019). doi:10.1038/s41467-019-11953-9.

The data are provided as three Python NumPy (npz) files, each corresponding to one of the three datasets used in the computational analysis described in our manuscript.

- "all" dataset: dev_allNonMHC_z_center_p0001_100PCs_20180129.npz

- "Coding only" dataset: dev_codingNonMHC_z_center_p0001_100PCs_20180129.npz

- "PTVs only" dataset: dev_PTVsNonMHC_z_center_p0001_100PCs_20180129.npz

These files can be loaded with the Python NumPy package and were used in our analysis scripts and notebook (

Please read our publication for more information regarding this dataset.
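As a minimal sketch of how one of the files listed above might be inspected, the snippet below opens an npz archive with numpy.load and reports the name, shape, and dtype of each stored array. The local path and the helper name summarize_npz are assumptions made for illustration, and the array names inside the archive are not documented here, so the code discovers them rather than hard-coding any. If some arrays turn out to be stored as Python objects, numpy.load may require allow_pickle=True, which should only be used on files you trust.

```python
import os

import numpy as np

# File name taken from the list above; where it lives on disk is an assumption.
NPZ_PATH = "dev_allNonMHC_z_center_p0001_100PCs_20180129.npz"


def summarize_npz(path, allow_pickle=False):
    """Return {array_name: (shape, dtype)} for every array in an npz file.

    Hypothetical helper for illustration; not part of the authors' scripts.
    """
    summary = {}
    # np.load on an .npz returns a lazy, dict-like NpzFile. Using it as a
    # context manager closes the underlying zip archive afterwards.
    with np.load(path, allow_pickle=allow_pickle) as data:
        for name in data.files:
            arr = data[name]  # the array is read from disk only here
            summary[name] = (arr.shape, str(arr.dtype))
    return summary


# Only runs when the file has actually been downloaded next to this script.
if __name__ == "__main__" and os.path.exists(NPZ_PATH):
    for name, (shape, dtype) in summarize_npz(NPZ_PATH).items():
        print(f"{name}: shape={shape}, dtype={dtype}")
```

Listing the contents first is a cautious way to start, since it shows which arrays hold the decomposed matrices and which hold labels before any analysis code depends on them.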


Population-based biobanks with genomic and dense phenotype data provide opportunities for generating effective therapeutic hypotheses and understanding the genomic role in disease predisposition. To characterize latent components of genetic associations, we applied truncated singular value decomposition (DeGAs) to matrices of summary statistics derived from genome-wide association analyses across 2,138 phenotypes measured in 337,199 White British individuals in the UK Biobank study. We systematically identified key components of genetic associations and the contributions of variants, genes, and phenotypes to each component. As an illustration of the utility of the approach to inform downstream experiments, we report putative loss of function variants, rs114285050 (GPR151) and rs150090666 (PDE3B), that substantially contribute to obesity-related traits, and experimentally demonstrate the role of these genes in adipocyte biology. Our approach to dissect components of genetic associations across the human phenome will accelerate biomedical hypothesis generation by providing insights on previously unexplored latent structures.
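To make the idea of truncated singular value decomposition concrete, here is a small sketch on a random toy matrix. It is not the authors' pipeline: the matrix orientation (phenotypes in rows, variants in columns), the choice of k, and the reading of squared singular-vector entries as fractional contributions are assumptions made for illustration. The exact score definitions are given in the publication.

```python
import numpy as np


def truncated_svd(matrix, k):
    """Return the top-k singular triplets (U_k, s_k, Vt_k) of a 2-D array."""
    # full_matrices=False gives the economy-size decomposition; singular
    # values are returned in descending order, so slicing keeps the largest k.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


rng = np.random.default_rng(0)
# Toy stand-in for a (phenotypes x variants) summary-statistic matrix.
# Random numbers only, not real data.
toy = rng.standard_normal((6, 20))

u_k, s_k, vt_k = truncated_svd(toy, k=3)

# Each singular vector has unit length, so its squared entries sum to 1 and
# can be read as the share each phenotype (or variant) has in a component.
phenotype_share = u_k ** 2  # shape (6, 3); each column sums to 1
variant_share = vt_k ** 2   # shape (3, 20); each row sums to 1

# Rank-k approximation of the toy matrix from the retained components.
approx = (u_k * s_k) @ vt_k
```

Keeping only the leading components trades exact reconstruction for a compact description of the matrix, which is the sense in which the decomposition exposes latent structure.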