Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

## All materials are located in the following links: haplocheck_results, haplogroups_workshopsamples.txt, HG0096c_test.bam, HG0097c_test.bam, HG0099c_test.bam, HG0100c_test.bam, HG0101c_test.bam, HG0102c_test.bam, HG0103c_test.bam, HG0105c_test.bam, HG0106c_test.bam, HG0107c_test.bam, Merged.txt, Merged.vcf.gz, Workshop_samples_05-17-23_nocont_homo_common.bim, AnnotatedVariants.txt.

## ***Powerpoint slides for this workshop: Workshop_mtDNA_QC_analysis.pptx

...

  • Download reference files.
  • Module load Java (e.g. module load lang/Java/11.0.6), Singularity, and install nextflow and mutserve. 
  • Clone your pipeline into your work directory: e.g. git clone pipeline_link_depository

...

  1. Continue the next steps using the Merged.txt file.
  2. First, call Create the first excel Excel tab asnamed "Merged_raw_data" using the Merged.txt file.
  3. Second, create Create a 2nd tab and call it as named "Merged_nocont". Copy the entire data from the raw results (1st tab - Merged_raw_data) and paste it into the 2nd tab. Using the Sample IDs from the samples_to_remove.txt file you will identify the variants from the each one of the contaminated samples and manually remove their respective rows on the Merged_nocont tab.
  4. Create a 3rd tab and call it as named "coverage>200_both_strand". Copy the entire data from 2nd sheet (Merge_nocont tab) and paste it into the 3rd tab. After that, remove all the rows containing variants with coverage lower than 200x in both strands (verify for both CoverageFWD and CoverageREV columns).
  5. Create a 4th tab and call it as named "Fwd-rev_ratio". Copy the entire data from the 3rd tab (overage>200_both_strand tab) and paste it into the 4th. Next, create a new column called FWD-Rev_ratio and calculate the ratio between the CoverageFWD and CoverageREV values (columns L and M). After that, filter out all rows with variants showing Fwd/Rev ratio below 0.5 or higher than 1.5.
  6. Create a 5th tab and call it as remove_del. Copy the entire data from the 4th tab (Fwd-rev_ratio tab) and paste it into the 5th . Check the Ref column and exclude all the rows containing the letter N (which means deletion) on this column.
  7. Create a 6th tab and call it as named "remove_primer_phantom". Copy the entire data from the 5th tab (remove_del tab) and paste it into the 6th . After that, remove all the rows containing variants in the primer regions (e.g. 0-500 bp and 16000-16655 bp). Also check if there is any variant at the known phantom mutation sites (72':['G','T'], 257':['A','C'], '414':['G','T'], 3492':['A','C'], 3511':['A','C'], 4774':['T','A'], 5290':['A','T'], '9801':['G','T'], 10306':['A','C'], '10792':['A','C'], '11090':['A','C']). If yes, you should remove the rows containing these variants as well.
  8. Create a 7th tab and call it as named "homoplasmy_only". Copy the entire data from the 6th tab (remove_primer_phantom tab) and paste it into the 7th . Check the VariantLevel column and remove all the rows containing values lower than 95%. 

    #NOTE: Here you can decide for other values such as 97% or 99%, depending You have the flexibility to adjust the threshold values based on the quality of your sequencing data and the sample sizespecific requirements of your analysis.

  9. Create a 8th sheet in excel and name it as heteroplasmy tab named "heteroplasmy_only". Copy the entire data fromthe 6th tab (remove_primer_phantom tab) and paste it into the 8th tab. Check the VariantLevel column and remove all the rows containing values lower than 3% and higher than 95%.

...

The effect of mutations-caused amino acid changes on protein function was predicted by a combination of tools that use sequence homology, evolutionary conservation, and protein structural information.The

  1. To run the functional analysis, first you need to create the variantsfile.txt

...

  1. . For that, you will utilize the information from the Workshop_samples_05-17-23_nocont_homo_common.bim file obtained

...

  1. in the previous step.

...

  1. The variantsfile.txt file should

...

  1. be structured with two columns

...

  1. , namely Pos and Variant

...

  1. . For further reference, please consult Figure 5.

Image Added

Figure 5 - variantsfile.txt example

2. Upload the variantsfile.txt into your work directory in the SCC cluster and run the command below.

./mutserve annotate --input variantsfile.txt --annotation rCRS_annotation_2020-08-20.txt --output AnnotatedVariants.txt

...