...
- SNP IDs (for this workshop, both base and target datasets have rsIDs)
- Chromosome and map positions (we can double-check base vs. target datasets if that information is available)
- Allele 1 and Allele 2 calls
- Ambiguous SNPs to be excluded
- Allele 1 and allele 2 calls between base and target data, with 4 scenarios:
- (i) perfect allele matches with same strand
- (ii) perfect allele matches with opposite strands
- (iii) switched allele 1 and allele 2 calls with same strand
- (iv) switched allele 1 and allele 2 calls with opposite strand
- ambiguous SNPs
- strand
- allele 1 vs allele 2
For the workshop, we are using simulated data (see above). For this dataset, we will only check strand and allele calls.
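The four scenarios listed above can be illustrated on a small toy example. This is only a sketch: the SNP IDs and alleles are invented, the helper `complement()` is hypothetical, and the column names (A1/A2 for the base data, V5/V6 for the target data) simply follow the ones used in the workshop code.

```r
# Toy data: A1/A2 are the base alleles, V5/V6 the target alleles (invented values)
toy <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  A1  = c("A", "A", "A", "A"),
  A2  = c("G", "G", "G", "G"),
  V5  = c("A", "T", "G", "C"),
  V6  = c("G", "C", "A", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: complementary base on the opposite strand (A<->T, C<->G)
complement <- function(x) chartr("ACGT", "TGCA", x)

toy$scenario <- NA_character_
# (i) same alleles, same strand
toy$scenario[toy$A1 == toy$V5 & toy$A2 == toy$V6] <- "i"
# (ii) same alleles, opposite strand
toy$scenario[toy$A1 == complement(toy$V5) & toy$A2 == complement(toy$V6)] <- "ii"
# (iii) switched alleles, same strand
toy$scenario[toy$A1 == toy$V6 & toy$A2 == toy$V5] <- "iii"
# (iv) switched alleles, opposite strand
toy$scenario[toy$A1 == complement(toy$V6) & toy$A2 == complement(toy$V5)] <- "iv"

print(toy)
```

Because ambiguous SNPs (A/T and C/G) are excluded first, each remaining SNP can satisfy at most one of these four conditions.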
...
Code Block |
---|
bmikg1=bmikg[is.na(bmikg$ambiguousSNPs),] ## to keep only non-ambiguous SNPs
nrow(bmikg1) ## 55,752 non-ambiguous SNPs remaining |
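A SNP is ambiguous when its two alleles are complementary to each other (A/T or C/G), so the strand cannot be determined from the allele calls alone. The sketch below shows one way such a flag could be built on toy data; how the `ambiguousSNPs` column was actually created in the workshop is not shown here, so the data frame, the SNP IDs, and the label "ambiguous" are all assumptions for illustration.

```r
# Toy base data (invented SNPs and alleles)
toy <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  A1  = c("A", "C", "A", "G"),
  A2  = c("T", "G", "G", "T"),
  stringsAsFactors = FALSE
)

# A/T and C/G pairs read the same on both strands, so flag them;
# non-ambiguous SNPs get NA, mirroring the is.na() filter used above
pair <- paste0(toy$A1, toy$A2)
toy$ambiguousSNPs <- ifelse(pair %in% c("AT", "TA", "CG", "GC"), "ambiguous", NA)

# Keep only the non-ambiguous SNPs
toy_clean <- toy[is.na(toy$ambiguousSNPs), ]
nrow(toy_clean)
```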
### Scenario (i): To find perfect allele matches between base GWAS summary statistics file and target GWAS genotype data (i.e., A1 (column “A1”) in GWAS summary statistics file is the same as A1 (column “V5”) in target GWAS data, and A2 (column “A2”) in base GWAS summary statistics file is the same as A2 (column “V6”) in target GWAS data) -- beta will be unchanged
...
Code Block |
---|
bmikg1aa=bmikg1[as.character(bmikg1$A1) == bmikg1$V5 & as.character(bmikg1$A2) == bmikg1$V6,]
nrow(bmikg1aa) ##28,842 |
### Scenario (ii): To find allele matches for flipped strand between base GWAS summary statistics file and target GWAS data -- beta will be unchanged
...
### B=b for SNPs with matching alleles above (beta will remain unchanged for Scenarios (i) and (ii))
Code Block |
---|
bmikg1a$B=bmikg1a$b |
### Scenario (iii): To find perfect allele switches between A1 and A2 (i.e., A1 in base GWAS summary statistics file is the same as A2 in target GWAS data, and vice versa) -- beta will have opposite sign
...
Code Block |
---|
bmikg1ba=bmikg1[as.character(bmikg1$A1) == bmikg1$V6 & as.character(bmikg1$A2) == bmikg1$V5,]
nrow(bmikg1ba) ##26,883 |
### Scenario (iv): To find flipped strands but perfect allele switches between A1 and A2 (i.e., A1 (column “A1”) in base GWAS summary statistics file is the same as the complementary allele for A2 (column “V6”) in target GWAS data, and vice versa) -- beta will have opposite sign
...
### B=-b for SNPs with switched alleles above (beta will have an opposite sign for Scenarios (iii) and (iv))
Code Block |
---|
bmikg1b$B=0-bmikg1b$b |
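The sign change follows from the definition of the effect size: if b is the effect per copy of the base A1 allele, then the effect per copy of the other allele is -b. The toy sketch below shows how the matched and switched subsets might be aligned and stacked; the data frames `matched` and `switched` and all values are invented for illustration and are not the workshop objects.

```r
# Scenarios (i) and (ii): base A1 corresponds to target A1 (V5), beta unchanged
matched <- data.frame(SNP = c("rs1", "rs2"), V5 = c("A", "T"),
                      b = c(0.10, -0.05), stringsAsFactors = FALSE)
matched$B <- matched$b

# Scenarios (iii) and (iv): base A1 corresponds to target A2, beta changes sign
switched <- data.frame(SNP = c("rs3", "rs4"), V5 = c("G", "C"),
                       b = c(0.20, -0.30), stringsAsFactors = FALSE)
switched$B <- -switched$b

# Stack both subsets; B is now the effect per copy of the target A1 allele (V5)
aligned <- rbind(matched, switched)
print(aligned)
```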
...
Code Block |
---|
bmikg3a=bmikg3[bmikg3$P<0.5,]
bmikg3a1=subset(bmikg3a,select=c("SNP","V5","B"))
nrow(bmikg3a) ##3254
# Renaming column headings from "SNP", "V5", and "B" to "SNP", "A1", and "Score"
colnames(bmikg3a1)<-c("SNP","A1","Score")
write.table(bmikg3a1,"1kgph3_chr16_test_clumped_1_threshold_5.raw",col.names=T,row.names=F,quote=F,sep='\t') |
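To see what a score file with the columns "SNP", "A1", and "Score" represents, the sketch below computes a polygenic score by hand as a weighted sum of A1 allele counts. The individuals, dosages, and weights are invented, and this plain sum is only an illustration of the arithmetic, not the workshop's scoring step; scoring software may scale the result differently (for example, by reporting an average per allele).

```r
# Toy score file with the same three columns as the file written above
score <- data.frame(
  SNP   = c("rs1", "rs2", "rs3"),
  A1    = c("A", "T", "G"),
  Score = c(0.10, -0.05, -0.20),
  stringsAsFactors = FALSE
)

# Toy genotypes: rows are individuals, columns are SNPs,
# entries are the number of copies of A1 (0, 1, or 2)
dosage <- matrix(c(0, 1, 2,
                   2, 0, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("ind1", "ind2"), score$SNP))

# Polygenic score = sum over SNPs of (A1 copies x weight)
prs <- as.vector(dosage %*% score$Score)
names(prs) <- rownames(dosage)
print(prs)
```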
...
- Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J; Schizophrenia Working Group of the Psychiatric Genomics Consortium, Patterson N, Daly MJ, Price AL, Neale BM. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics, 2015; 47:291–295. PMID: 25642630 PMCID: PMC4495769 DOI: 10.1038/ng.3211
- Choi SW, Mak TS, O’Reilly PF. Tutorial: a guide to performing polygenic risk score analyses. Nature Protocols. 2020; 15:2759–2772. DOI: 10.1038/s41596-020-0353-1
- Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, Powell C, Vedantam S, Buchkovich ML, Yang J, Croteau-Chonka DC, Esko T et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015; 518:197-206. PMID: 25673413 PMCID: PMC4382211 DOI: 10.1038/nature14177
- Ni G, Zeng J, Revez JA, Wang Y, Zheng Z, Ge T, Restuadi R, Kiewa J, Nyholt DR, Coleman JRI, Smoller JW, Schizophrenia Working Group of the Psychiatric Genomics Consortium, Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, Yang J, Visscher PM, Wray NR. A comparison of ten polygenic score methods for psychiatric disorders applied across multiple cohorts. medRxiv. May 5, 202. URL: https://www.medrxiv.org/content/10.1101/2020.09.10.20192310v2.full.pdf
- Speed D, Holmes J, Balding DJ. Evaluating and improving heritability models using summary statistics. Nature Genetics, 2020; 52: 458–462. PMID: 32203469 DOI: 10.1038/s41588-020-0600-y