...

3.1- Base Data

As base data, we will use part of the GWAS Anthropometric 2015 BMI summary statistics (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4382211/), which were made available by the GIANT consortium and extracted from their online portal.

...

  1. Check if your Target data is in the same genome build as Base data.
  2. Basic QC: e.g., genotyping rate >0.99, sample missingness <0.02, HWE P >1×10⁻⁶, heterozygosity within 3 SD of the mean, MAF >0.01, INFO >0.8. Also, remove indels and multi-allelic SNPs.
  3. Avoid sample overlap, as well as a high degree of relatedness between individuals of Base and Target data.
  4. Check strand and allele calls.
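
As a rough illustration of the SNP-level part of the QC step, the sketch below filters a small table in R. The data frame, its column names (MAF, INFO, HWE_P), and all values are made up for this example; the thresholds are the ones listed above, and sample-level filters (missingness, heterozygosity) are not covered.

```r
# Hypothetical per-SNP QC metrics (made-up values and column names).
qc <- data.frame(SNP   = c("rs1", "rs2", "rs3", "rs4"),
                 A1    = c("A", "AT", "C", "G"),
                 A2    = c("G", "A", "T", "A"),
                 MAF   = c(0.25, 0.10, 0.005, 0.30),
                 INFO  = c(0.95, 0.99, 0.90, 0.60),
                 HWE_P = c(0.5, 0.2, 0.3, 0.4),
                 stringsAsFactors = FALSE)

# Single-nucleotide alleles only; longer allele strings are treated as indels.
is_snv <- nchar(qc$A1) == 1 & nchar(qc$A2) == 1

# Apply the SNP-level thresholds from the QC list.
keep <- is_snv & qc$MAF > 0.01 & qc$INFO > 0.8 & qc$HWE_P > 1e-6
qc_pass <- qc[keep, ]
print(qc_pass$SNP)
```

In this toy table, rs2 is dropped as an indel, rs3 for low MAF, and rs4 for low INFO.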

...

  1. SNP IDs (for this workshop, both base and target datasets have rsIDs)
    • chromosome and map positions (we can double-check base vs target datasets if that information is available)
  2. allele 1 and allele 2 calls
    • ambiguous SNPs
    • strand
    • allele 1 vs allele 2


For the workshop, we are using simulated data (see above). For this dataset, we will only check strand and allele calls.
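
The three allele checks listed above (ambiguous SNPs, strand, allele 1 vs allele 2) can be sketched in R as follows. This is a minimal illustration on made-up data, not the workshop's own code: the data frames `base` and `target`, the helper `complement`, and the status labels are all hypothetical. SNPs that are both flipped and switched are not handled separately here and would be labelled "mismatch".

```r
# Toy base and target allele calls; all values are made up for illustration.
base <- data.frame(SNP = c("rs1", "rs2", "rs3", "rs4"),
                   A1  = c("A", "A", "G", "A"),
                   A2  = c("G", "T", "A", "C"),
                   stringsAsFactors = FALSE)
target <- data.frame(SNP = c("rs1", "rs2", "rs3", "rs4"),
                     A1  = c("A", "A", "A", "T"),
                     A2  = c("G", "T", "G", "G"),
                     stringsAsFactors = FALSE)

# Complement of each allele, used to detect strand flips.
complement <- function(x) unname(c(A = "T", T = "A", C = "G", G = "C")[x])

m <- merge(base, target, by = "SNP", suffixes = c(".base", ".target"))

# Ambiguous SNPs: the two alleles are complements of each other (A/T or C/G).
m$ambiguous <- m$A1.base == complement(m$A2.base)
# Same alleles in the same order.
m$match <- m$A1.base == m$A1.target & m$A2.base == m$A2.target
# Same alleles, but allele 1 and allele 2 are swapped (beta changes sign).
m$switched <- m$A1.base == m$A2.target & m$A2.base == m$A1.target
# Alleles agree only after taking the complement (opposite strand).
m$flipped <- complement(m$A1.base) == m$A1.target &
             complement(m$A2.base) == m$A2.target

m$status <- ifelse(m$ambiguous, "ambiguous",
            ifelse(m$match, "match",
            ifelse(m$switched, "switched",
            ifelse(m$flipped, "flipped", "mismatch"))))
print(m[, c("SNP", "status")])
```

Ambiguous SNPs are classified first because, for A/T and C/G pairs, a strand flip cannot be told apart from an allele switch using the allele calls alone.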

...

### To add column “B” for the beta to be used in polygenic risk scoring (note: “b” is the original beta)

...

### B=-b for SNPs with switched alleles above (beta will have an opposite sign)


Code Block
bmikg1b$B=0-bmikg1b$b

...

Tip: Effect sizes given as odds ratios (OR) must be converted to beta (B) by taking the natural logarithm of the OR, so that the PRS can be computed by summation. (The effect estimates can be transformed from B back to OR afterward.)
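
A minimal R sketch of this conversion, using a made-up data frame `gwas` with hypothetical odds ratios:

```r
# Hypothetical odds ratios for three SNPs (made-up values).
gwas <- data.frame(SNP = c("rs1", "rs2", "rs3"),
                   OR  = c(1.20, 0.85, 1.00),
                   stringsAsFactors = FALSE)

# Beta is the natural logarithm of the odds ratio.
gwas$B <- log(gwas$OR)

# Exponentiating the beta recovers the odds ratio.
gwas$OR_back <- exp(gwas$B)
print(gwas)
```

An OR above 1 gives a positive beta, an OR below 1 gives a negative beta, and an OR of exactly 1 gives a beta of 0.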


5.1- CLUMPING using PLINK

### The most commonly used method for computing PRS is clumping and thresholding (C+T). Before the PRS is calculated, the variants are clumped so that only variants weakly correlated (r²) with one another are retained. The clumping step prunes redundant correlated effects caused by linkage disequilibrium (LD) between variants. Thresholding removes variants with a p-value larger than a chosen level of significance (default: 0.0001).
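
To make the idea concrete, here is a toy R sketch of greedy clumping with a p-value threshold. It is not PLINK's implementation: the function `clump`, its parameters `r2_max` and `p_max`, and the LD matrix are hypothetical, and the sketch ignores the physical distance window that PLINK's clumping also applies.

```r
# Toy summary statistics and pairwise LD (r^2) matrix; all values are made up.
snps <- c("rs1", "rs2", "rs3", "rs4")
p    <- c(rs1 = 1e-8, rs2 = 1e-5, rs3 = 0.03, rs4 = 1e-6)
r2   <- matrix(c(1.00, 0.80, 0.00, 0.05,
                 0.80, 1.00, 0.00, 0.00,
                 0.00, 0.00, 1.00, 0.00,
                 0.05, 0.00, 0.00, 1.00),
               nrow = 4, dimnames = list(snps, snps))

clump <- function(p, r2, r2_max = 0.1, p_max = 1e-4) {
  # Thresholding: only SNPs with p-value at or below p_max are considered,
  # visited from most to least significant.
  candidates <- names(sort(p[p <= p_max]))
  kept <- character(0)
  for (s in candidates) {
    # Keep a SNP only if it is weakly correlated with every SNP kept so far.
    if (all(r2[s, kept] <= r2_max)) kept <- c(kept, s)
  }
  kept
}

kept <- clump(p, r2)
print(kept)
```

In this toy example, rs3 fails the p-value threshold and rs2 is removed because it is in strong LD with the more significant rs1.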

...

In the commands below, we are combining the base-cleaned-alleles file (BMI_unambiguousSNPsposstrand-B_AllelesV5V6_20200226.txt) with SNPs retained after the clumping step.

...

Code Block
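# Keep clumped SNPs with a p-value below 0.5
bmikg3a=bmikg3[bmikg3$P<0.5,]
nrow(bmikg3a)    ##3254
# Keep the SNP ID, allele (V5), and aligned beta (B) columns, renamed for scoring
bmikg3a1=subset(bmikg3a,select=c("SNP","V5","B"))
colnames(bmikg3a1)<-c("SNP","A1","Score")
# Write a tab-separated score file with a header row
write.table(bmikg3a1,"1kgph3_chr16_test_clumped_1_threshold_5.raw",col.names=TRUE,row.names=FALSE,quote=FALSE,sep='\t')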


### Example (b) using a range of p-value thresholds in PLINK

#sample range list txt: The columns of this file are ‘Name of the threshold’, ‘Lower bound p-value’, and ‘Upper bound p-value’. 
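
A range list file with these three columns could be written from R as sketched below. The threshold names (S1 to S3), the bounds, and the file name are hypothetical, and the sketch assumes that the file is space-separated with no header line.

```r
# Hypothetical thresholds; names and bounds are illustrative only.
ranges <- data.frame(name  = c("S1", "S2", "S3"),
                     lower = c(0, 0, 0),
                     upper = c(0.001, 0.05, 0.5),
                     stringsAsFactors = FALSE)

# One row per threshold: name, lower bound p-value, upper bound p-value.
range_file <- file.path(tempdir(), "range_list_example.txt")
write.table(ranges, range_file, col.names = FALSE, row.names = FALSE,
            quote = FALSE, sep = " ")

# Show the file contents.
cat(readLines(range_file), sep = "\n")
```

Each row defines one PRS: only SNPs whose p-value falls between the lower and upper bound contribute to the score for that threshold.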

...