1- Overview
This is a guide to an introductory analysis that will 1) construct a polygenic risk score (PRS) from the base data (GWAS summary statistics, particularly effect sizes and P-values, which are generally publicly available) via the clumping and thresholding method (C + T); and 2) test the constructed PRS for prediction using the target data (in PLINK binary data format; in general, this is the ‘user’ data).
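The thresholding and scoring part of C + T described above can be illustrated with a minimal Python sketch. It is not the workshop's actual pipeline: the SNP IDs, effect sizes, P-values, genotypes and the helper name `prs` are all made up for illustration, and the base SNPs are assumed to have been clumped already.

```python
# Toy illustration of the thresholding and scoring steps of C + T.
# All SNP IDs, effect sizes, P-values and genotypes below are hypothetical.

# Base data: SNP -> (effect size, P-value); assumed to be already clumped.
base = {
    "snp1": (0.10, 1e-8),
    "snp2": (-0.05, 0.03),
    "snp3": (0.20, 0.40),
}

# Target data: individual -> {SNP: effect-allele count (0, 1 or 2)}.
target = {
    "ind1": {"snp1": 2, "snp2": 0, "snp3": 1},
    "ind2": {"snp1": 0, "snp2": 2, "snp3": 2},
}


def prs(genotypes, base, p_threshold):
    """Sum effect size x allele count over SNPs with P below the threshold."""
    return sum(
        beta * genotypes[snp]
        for snp, (beta, p) in base.items()
        if p < p_threshold and snp in genotypes
    )


# Score every individual using only SNPs with P < 0.05.
scores = {ind: prs(geno, base, p_threshold=0.05) for ind, geno in target.items()}
```

With a threshold of 0.05, only `snp1` and `snp2` contribute to each score; raising the threshold to 1.0 would include `snp3` as well. In practice several thresholds are usually tried and compared.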
2- Learning Objectives
- Apply quality control measures to the base and target samples prior to PRS analysis;
- Perform PRS analysis (hands-on);
- Understand the graphs and outputs (hands-on).
3- Material
Ideally, a PRS analysis would use genome-wide genotype data. Here we use base data containing summary statistics and target data containing genotypes for chromosome 16 as an example to demonstrate the workflow for predicting simulated body mass index (BMI) data. The procedure described below is the same for a genome-wide dataset. All the materials required for this workshop are attached here. The relevant materials are as follows:
BMI_1kgph3_chr16_snps_summarystat.txt
Base Data
As base data, we will use part of the GWAS Anthropometric 2015 BMI summary statistics (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4382211/), which were made available by the GIANT consortium and extracted from their online portal.