...

3.1- Base Data

As base data we will use part of the GWAS Anthropometric 2015 BMI summary statistics (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4382211/), which were made available by the GIANT consortium and extracted from their online portal:

https://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files#BMI_and_Height_GIANT_and_UK_BioBank_Meta-analysis_Summary_Statistics

3.1.1- QC for Base data:

1)- Check SNP heritability: h2SNP > 0.05

...

Tip: The base data may come in different formats. For example, if marker IDs (rs IDs) are not available, we may have to derive them based on available chromosome and map positions. The effect estimate may come in the form of an odds ratio (OR), in which case we will calculate beta using the formula beta=ln(OR). It is also ideal to have the allele calls based on the positive strand. Make sure to identify the effect allele.
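
The conversion beta=ln(OR) mentioned in the tip can be sketched in a few lines of Python. This is only an illustration: the function name `or_to_beta` and the odds ratios below are made up for the example and do not come from the GIANT summary statistics.

```python
import math


def or_to_beta(odds_ratio: float) -> float:
    """Convert an odds ratio to a log-odds effect size: beta = ln(OR)."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return math.log(odds_ratio)


# Hypothetical odds ratios, for illustration only (not real GWAS results).
example_ors = [0.8, 1.0, 1.25]

for odds_ratio in example_ors:
    beta = or_to_beta(odds_ratio)
    print(f"OR={odds_ratio:.2f} -> beta={beta:+.4f}")
```

An OR below 1 gives a negative beta, an OR of exactly 1 gives a beta of 0, and an OR above 1 gives a positive beta, always with respect to the effect allele.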

3.2- Target Data

Our target data for this tutorial consist of genotype data, in PLINK format, for chromosome 16 on the subset of European samples from the 1000 Genomes datasets. The phenotype consists of simulated BMI data. For demonstration purposes, the BMI in the target dataset was simulated randomly from a normal distribution with no reference to the genotype data.

Tip: Base and target data should share the same ancestry background.

3.2.1- QC for Target data:

1)- Check that your Target data are on the same genome build as the Base data. If they are not, the coordinates can be converted with LiftOver:

LiftOver: https://genome.ucsc.edu/cgi-bin/hgLiftOver

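2)- Basic QC: e.g., genotyping rate (geno) > 0.99, sample missingness (mind) < 0.02, Hardy-Weinberg equilibrium (HWE) P > 1 x 10^-6, heterozygosity (HET) within 3 SD of the mean, MAF > 0.01, imputation INFO > 0.8. Also, remove indels and multi-allelic SNPs.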

3)- Avoid sample overlap, as well as a high degree of relatedness, between individuals in the Base and Target data.

4)- Check strand and allele calls
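
As a rough illustration of the strand and allele check in step 4, the Python sketch below compares the allele pair of one SNP in the Base data with the pair for the same SNP in the Target data. The function name `compare_alleles` and the category labels are hypothetical choices made for this example; they are not part of PLINK or of any particular package.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def compare_alleles(base_a1, base_a2, target_a1, target_a2):
    """Classify how the Target allele pair relates to the Base allele pair.

    Returns one of: "match", "swap", "flip", "flip_swap", "ambiguous",
    "mismatch", or "unsupported" (indels and other non-ACGT codes).
    """
    base = (base_a1.upper(), base_a2.upper())
    target = (target_a1.upper(), target_a2.upper())

    # Only single-base A/C/G/T alleles are handled in this sketch.
    if any(allele not in COMPLEMENT for allele in base + target):
        return "unsupported"

    # A/T and C/G SNPs read the same on both strands, so a strand flip
    # cannot be told apart from an allele swap.
    if COMPLEMENT[base[0]] == base[1]:
        return "ambiguous"

    if target == base:
        return "match"       # same alleles, same order
    if target == base[::-1]:
        return "swap"        # same strand, effect allele reversed
    flipped = tuple(COMPLEMENT[allele] for allele in target)
    if flipped == base:
        return "flip"        # opposite strand, same order
    if flipped == base[::-1]:
        return "flip_swap"   # opposite strand and reversed order
    return "mismatch"        # alleles cannot be reconciled


print(compare_alleles("A", "G", "A", "G"))
print(compare_alleles("A", "G", "T", "C"))
print(compare_alleles("A", "T", "T", "A"))
```

In this sketch, SNPs classified as "swap" or "flip_swap" would need the sign of the effect estimate reversed before scoring, while "ambiguous", "mismatch" and "unsupported" SNPs are candidates for removal. How strictly to treat ambiguous SNPs is a judgment call that the sketch does not settle.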

# 1kgph3_chr16.bed:		
# binary plink pedigree genotype file; do not try to open this file
 
# 1kgph3_chr16.bim (as appearing in R):
# with 277,663 biallelic markers (SNPs or Ins/Del markers) – 6 columns without column names: “V1”=chromosome, “V2”=marker ID, “V3”=genetic distance, “V4”=chromosomal position in bp, “V5”=Allele1, “V6”=Allele2
 
  V1          V2 V3    V4 V5       V6
1 16 rs185537431  0 60778  A        G
2 16 rs377548396  0 62569  A        G
3 16 rs368745239  0 66640  T        G
4 16 rs187053456  0 70765  A        C
5 16 rs193118147  0 70767  A        G
6 16 rs201639477  0 75246  C CTTTTTTT
 
# 1kgph3_chr16.fam (as appearing in R):
# with 476 individuals – 6 columns without column names: “V1”=Family ID, “V2”=Individual ID, “V3”=Father ID, “V4”=Mother ID, “V5”=Sex (Male=1, Female=2), “V6”=Affected status (Yes=2, No=1, Unknown=-9)
 
       V1      V2 V3 V4 V5 V6
1 HG00096 HG00096  0  0  1 -9
2 HG00097 HG00097  0  0  2 -9
3 HG00099 HG00099  0  0  2 -9
4 HG00101 HG00101  0  0  1 -9
5 HG00102 HG00102  0  0  2 -9
6 HG00103 HG00103  0  0  1 -9
 
# 1kgph3_dummybmi20200804.csv (as appearing in R):
# with column names: “V1”=Family ID, “V2”=Individual ID, “V3”=Father ID, “V4”=Mother ID, “V5”=Sex (Male=1, Female=2), “V6”=Affected status, “dummybmi”=phenotype of interest (simulated BMI), “sex”=Sex recoded (male=1, female=0)
 
V1        V2        V3   V4   V5   V6   dummybmi     sex
HG00096   HG00096   0    0    1    -9   13.42457034    1
HG00097   HG00097   0    0    2    -9   11.29648997    0
HG00099   HG00099   0    0    2    -9   10.98411446    0
HG00101   HG00101   0    0    1    -9   8.268308741    1
HG00102   HG00102   0    0    2    -9   11.1241505     0
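
The .bim listing above includes one insertion/deletion marker (rs201639477, alleles C and CTTTTTTT), which the basic QC step asks us to remove. The Python sketch below shows one possible way to flag such markers. It assumes the raw whitespace-delimited .bim layout (without the row numbers that R prints), and it treats several rows at the same position as a sign of a multi-allelic site, which is an assumption of this example rather than a rule from the tutorial. The helper names `parse_bim` and `ids_to_exclude` are hypothetical.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}

# The six rows shown above, in raw .bim layout (no header, no row numbers).
BIM_TEXT = """\
16 rs185537431 0 60778 A G
16 rs377548396 0 62569 A G
16 rs368745239 0 66640 T G
16 rs187053456 0 70765 A C
16 rs193118147 0 70767 A G
16 rs201639477 0 75246 C CTTTTTTT
"""


def parse_bim(text):
    """Parse .bim-style text into a list of dictionaries, one per marker."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, snp_id, _cm, bp, a1, a2 = line.split()
        rows.append({"chrom": chrom, "id": snp_id, "bp": int(bp),
                     "a1": a1.upper(), "a2": a2.upper()})
    return rows


def ids_to_exclude(rows):
    """Return IDs of markers that are indels or share a position with another row."""
    site_counts = Counter((r["chrom"], r["bp"]) for r in rows)
    excluded = []
    for r in rows:
        is_snv = r["a1"] in BASES and r["a2"] in BASES
        is_unique_site = site_counts[(r["chrom"], r["bp"])] == 1
        if not (is_snv and is_unique_site):
            excluded.append(r["id"])
    return excluded


print(ids_to_exclude(parse_bim(BIM_TEXT)))  # ['rs201639477']
```

The resulting IDs could be written to a text file, one per line, and passed to PLINK 1.9 with its `--exclude` option.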

3.3- Required Software

PLINK 1.9 (http://www.cog-genomics.org/plink2/). Use the stable version (19 Oct 2020 (b6.21)).

R (version 3.2.3+) (https://cran.r-project.org/)
