
...

## All materials are available at the following links: haplocheck_results, haplogroups_workshopsamples.txt, HG0096c_test.bam, HG0097c_test.bam, HG0099c_test.bam, HG0100c_test.bam, HG0101c_test.bam, HG0102c_test.bam, HG0103c_test.bam, HG0105c_test.bam, HG0106c_test.bam, HG0107c_test.bam, Merged.txt, Merged.vcf.gz, Workshop_samples_05-17-23_nocont_homo_common.bim, AnnotatedVariants.txt.

## PowerPoint slides for this workshop: Workshop_mtDNA_QC_analysis.pptx

...

Download reference files.
Load the Java module (e.g. module load lang/Java/11.0.6) and the Singularity module, then install Nextflow and mutserve.
Clone the pipeline into your work directory, e.g. git clone pipeline_link_depository

...

Continue the next steps using the Merged.txt file.
Create the first Excel tab, named "Merged_raw_data", using the Merged.txt file.
Create a 2nd tab named "Merged_nocont". Copy all the data from the raw results (1st tab, Merged_raw_data) and paste it into the 2nd tab. Using the sample IDs listed in the samples_to_remove.txt file, identify the variants belonging to each contaminated sample and manually remove their rows from the Merged_nocont tab.
Create a 3rd tab named "coverage>200_both_strand". Copy all the data from the 2nd tab (Merged_nocont) and paste it into the 3rd tab. Then remove all rows with coverage below 200x on either strand (check both the CoverageFWD and CoverageREV columns), so that the remaining variants are covered at least 200x on both strands.
Create a 4th tab named "Fwd-rev_ratio". Copy all the data from the 3rd tab (coverage>200_both_strand) and paste it into the 4th tab. Next, create a new column called FWD-Rev_ratio and calculate CoverageFWD divided by CoverageREV (columns L and M). Then filter out all rows whose Fwd/Rev ratio is below 0.5 or above 1.5.
Create a 5th tab named "remove_del". Copy all the data from the 4th tab (Fwd-rev_ratio) and paste it into the 5th tab. Check the Ref column and remove all rows containing the letter N (which indicates a deletion) in this column.
Create a 6th tab named "remove_primer_phantom". Copy all the data from the 5th tab (remove_del) and paste it into the 6th tab. Then remove all rows containing variants in the primer regions (e.g. 0-500 bp and 16000-16655 bp). Also check whether any variant falls at one of the known phantom mutation sites ('72':['G','T'], '257':['A','C'], '414':['G','T'], '3492':['A','C'], '3511':['A','C'], '4774':['T','A'], '5290':['A','T'], '9801':['G','T'], '10306':['A','C'], '10792':['A','C'], '11090':['A','C']). If so, remove the rows containing these variants as well.
Create a 7th tab named "homoplasmy_only". Copy all the data from the 6th tab (remove_primer_phantom) and paste it into the 7th tab. Check the VariantLevel column and remove all rows with values lower than 95%.
#NOTE: You can adjust this threshold (for example to 97% or 99%) depending on the quality of your sequencing data, the sample size, and the specific requirements of your analysis.
Create an 8th tab named "heteroplasmy_only". Copy all the data from the 6th tab (remove_primer_phantom) and paste it into the 8th tab. Check the VariantLevel column and remove all rows with values lower than 3% or higher than 95%.
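For larger datasets, the same tab-by-tab filtering can be scripted rather than done by hand in Excel. The sketch below is not part of the workshop pipeline; it is a minimal illustration under some assumptions: that Merged.txt is tab-separated, that the sample and position columns are named ID and Pos, and that VariantLevel is stored as a fraction (0.95 for 95%). The function names are hypothetical. Phantom sites are matched by position only, so the allele pairs listed above are not checked.

```python
import csv

# Positions of the known phantom mutation sites listed in the text above.
PHANTOM_SITES = {72, 257, 414, 3492, 3511, 4774, 5290, 9801, 10306, 10792, 11090}
# Example primer regions from the text above (inclusive bounds).
PRIMER_REGIONS = ((0, 500), (16000, 16655))


def read_variants(path):
    """Load a tab-separated variant table into a list of dict rows."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def passes_qc(row, contaminated_ids, min_cov=200, ratio_range=(0.5, 1.5)):
    """Return True if a row survives the filters of tabs 2 to 6."""
    if row["ID"] in contaminated_ids:                  # tab 2: contaminated samples
        return False
    fwd, rev = float(row["CoverageFWD"]), float(row["CoverageREV"])
    if fwd < min_cov or rev < min_cov:                 # tab 3: coverage on both strands
        return False
    if not ratio_range[0] <= fwd / rev <= ratio_range[1]:  # tab 4: Fwd/Rev ratio
        return False
    if "N" in row["Ref"]:                              # tab 5: deletions
        return False
    pos = int(row["Pos"])
    if any(start <= pos <= end for start, end in PRIMER_REGIONS):  # tab 6: primers
        return False
    return pos not in PHANTOM_SITES                    # tab 6: phantom sites


def split_by_level(rows, hetero_min=0.03, homo_min=0.95):
    """Split QC-passing rows into homoplasmic (tab 7) and heteroplasmic (tab 8) sets."""
    homo = [r for r in rows if float(r["VariantLevel"]) >= homo_min]
    hetero = [r for r in rows
              if hetero_min <= float(r["VariantLevel"]) <= homo_min]
    return homo, hetero
```

A typical call would load the rows with read_variants("Merged.txt"), keep those for which passes_qc(row, contaminated_ids) is true, and pass the result to split_by_level. The thresholds are keyword arguments so that they can be adjusted as described in the note above.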

...

The effect of mutation-induced amino acid changes on protein function was predicted by a combination of tools that use sequence homology, evolutionary conservation, and protein structural information.

1. To run the functional analysis, first create the variantsfile.txt file. For that, you will use the information from the Workshop_samples_05-17-23_nocont_homo_common.bim file obtained in the previous step. The variantsfile.txt file should be structured with two columns, namely Pos and Variant. For further reference, please consult Figure 5.


Figure 5 - variantsfile.txt example
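The two-column file can also be generated from the .bim file with a short script. The sketch below is only an illustration with hypothetical function names. It relies on the standard PLINK .bim layout (chromosome, variant ID, genetic distance, base-pair position, allele 1, allele 2). Two things are assumptions: that the variant allele is the one in the fifth column (allele 1), which should be checked against your data and changed through allele_column if needed, and that the output is tab-separated with a Pos/Variant header line.

```python
def bim_to_variant_rows(bim_lines, allele_column=4):
    """Convert PLINK .bim lines into (Pos, Variant) pairs.

    allele_column is the 0-based index of the allele to report:
    4 is allele 1, 5 is allele 2. Which one holds the variant
    allele is an assumption to verify against your own data.
    """
    rows = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rows.append((fields[3], fields[allele_column]))
    return rows


def write_variantsfile(rows, out_path="variantsfile.txt"):
    """Write the pairs as a tab-separated file with a Pos/Variant header."""
    with open(out_path, "w") as out:
        out.write("Pos\tVariant\n")
        for pos, variant in rows:
            out.write(f"{pos}\t{variant}\n")
```

For example, reading the .bim file with open(...) and passing the file handle to bim_to_variant_rows, then passing the result to write_variantsfile, would produce the file in the work directory. Compare the output with Figure 5 before annotating.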

2. Upload the variantsfile.txt file to your work directory on the SCC cluster and run the command below.

./mutserve annotate --input variantsfile.txt --annotation rCRS_annotation_2020-08-20.txt --output AnnotatedVariants.txt

...
