Continue the next steps using the Merged.txt file.
First, create the first Excel tab, named "Merged_raw_data", using the Merged.txt file.
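If you prefer to script this step rather than import the file by hand, the sketch below reads the table into memory. It assumes Merged.txt is tab-delimited with a header row; the helper name read_merged and the columns in the toy data are illustrative, not taken from the actual file.

```python
import csv
import io


def read_merged(handle):
    """Parse a tab-delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Stand-in for open("Merged.txt", newline=""); only a few assumed columns shown.
sample = io.StringIO("ID\tPos\tRef\tVariant\nS1\t73\tA\tG\nS2\t263\tA\tG\n")
merged_raw = read_merged(sample)
print(len(merged_raw), "rows loaded")
```

All values are read as text, so numeric columns need converting before any comparison.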
Second, create a 2nd tab named "Merged_nocont". Copy all the data from the raw results (1st tab, Merged_raw_data) and paste it into the 2nd tab. Using the sample IDs listed in the samples_to_remove.txt file, identify the variants belonging to each contaminated sample and manually remove their rows from the Merged_nocont tab.
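The manual removal can be cross-checked with a short script. This is a minimal sketch that assumes the sample identifier is stored in a column called "ID" and that samples_to_remove.txt holds one ID per line; both are assumptions, so adjust them to your files.

```python
def remove_contaminated(rows, contaminated_ids, id_column="ID"):
    """Return only the rows whose sample ID is not in contaminated_ids."""
    flagged = {s.strip() for s in contaminated_ids if s.strip()}
    return [row for row in rows if row[id_column] not in flagged]


# Toy rows standing in for the Merged_raw_data tab.
merged_raw = [
    {"ID": "S1", "Pos": "73"},
    {"ID": "S2", "Pos": "263"},
    {"ID": "S1", "Pos": "750"},
]
# Lines as they might be read from samples_to_remove.txt (assumed one ID per line).
to_remove = ["S2\n", "\n"]

merged_nocont = remove_contaminated(merged_raw, to_remove)
print([row["ID"] for row in merged_nocont])
```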
Create a 3rd tab named "coverage>200_both_strand". Copy all the data from the 2nd tab (Merged_nocont) and paste it into the 3rd tab. Then remove all rows containing variants with coverage lower than 200x, checking both strands (the CoverageFWD and CoverageREV columns).
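The coverage filter can be sketched as follows. It assumes a row is kept only when both CoverageFWD and CoverageREV are at least 200, which is how the tab name reads; the "Pos" column in the toy data is illustrative.

```python
MIN_COVERAGE = 200  # threshold from this step, in reads per strand


def passes_coverage(row, min_cov=MIN_COVERAGE):
    """True when both strands reach the minimum coverage (assumed rule)."""
    return (float(row["CoverageFWD"]) >= min_cov
            and float(row["CoverageREV"]) >= min_cov)


rows = [
    {"Pos": "73", "CoverageFWD": "850", "CoverageREV": "910"},
    {"Pos": "263", "CoverageFWD": "150", "CoverageREV": "400"},
    {"Pos": "750", "CoverageFWD": "200", "CoverageREV": "199"},
]
kept = [row for row in rows if passes_coverage(row)]
print([row["Pos"] for row in kept])
```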
Create a 4th tab named "Fwd-rev_ratio". Copy all the data from the 3rd tab (coverage>200_both_strand) and paste it into the 4th tab. Next, create a new column called FWD-Rev_ratio and calculate the ratio between the CoverageFWD and CoverageREV values (columns L and M). Then filter out all rows whose Fwd/Rev ratio is below 0.5 or above 1.5.
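As an illustration of the ratio step, the sketch below computes CoverageFWD divided by CoverageREV and keeps rows whose ratio lies between 0.5 and 1.5 inclusive. The zero-coverage guard and the "Pos" column are additions for the example only.

```python
def fwd_rev_ratio(row):
    """Return CoverageFWD / CoverageREV, or None if reverse coverage is zero."""
    fwd = float(row["CoverageFWD"])
    rev = float(row["CoverageREV"])
    return fwd / rev if rev else None


def balanced(row, low=0.5, high=1.5):
    """True when the strand ratio is defined and within [low, high]."""
    ratio = fwd_rev_ratio(row)
    return ratio is not None and low <= ratio <= high


rows = [
    {"Pos": "73", "CoverageFWD": "850", "CoverageREV": "910"},    # ratio ~0.93
    {"Pos": "263", "CoverageFWD": "300", "CoverageREV": "900"},   # ratio ~0.33
    {"Pos": "750", "CoverageFWD": "900", "CoverageREV": "500"},   # ratio 1.8
    {"Pos": "1438", "CoverageFWD": "300", "CoverageREV": "600"},  # ratio 0.5
]
kept = [row for row in rows if balanced(row)]
print([row["Pos"] for row in kept])
```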
Create a 5th tab named "remove_del". Copy all the data from the 4th tab (Fwd-rev_ratio) and paste it into the 5th tab. Check the Ref column and exclude all rows containing the letter N (which means deletion) in this column.
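A minimal sketch of the same check in code, assuming the Ref column holds plain text and that any occurrence of an uppercase "N" marks the row for removal:

```python
def is_deletion(row):
    """True when the Ref column contains the letter N."""
    return "N" in row["Ref"]


rows = [
    {"Pos": "73", "Ref": "A"},
    {"Pos": "310", "Ref": "N"},
    {"Pos": "750", "Ref": "G"},
]
kept = [row for row in rows if not is_deletion(row)]
print([row["Pos"] for row in kept])
```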
Create a 6th tab named "remove_primer_phantom". Copy all the data from the 5th tab (remove_del) and paste it into the 6th tab. Then remove all rows containing variants in the primer regions (e.g. 0-500 bp and 16000-16655 bp). Also check whether any variant falls at one of the known phantom mutation sites ('72':['G','T'], '257':['A','C'], '414':['G','T'], '3492':['A','C'], '3511':['A','C'], '4774':['T','A'], '5290':['A','T'], '9801':['G','T'], '10306':['A','C'], '10792':['A','C'], '11090':['A','C']). If so, remove the rows containing these variants as well.
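The primer and phantom checks can be sketched as below. Several points are assumptions: the position and alternate base are taken from columns named "Pos" and "Variant", region bounds are treated as inclusive, and each phantom pair is read as (reference base, variant base). If you would rather drop every variant at a phantom position regardless of base, compare on position alone.

```python
PRIMER_REGIONS = [(0, 500), (16000, 16655)]  # example ranges from this step

# Phantom sites from this step; pairs are assumed to be (Ref, Variant).
PHANTOM_SITES = {
    72: ("G", "T"), 257: ("A", "C"), 414: ("G", "T"), 3492: ("A", "C"),
    3511: ("A", "C"), 4774: ("T", "A"), 5290: ("A", "T"), 9801: ("G", "T"),
    10306: ("A", "C"), 10792: ("A", "C"), 11090: ("A", "C"),
}


def in_primer_region(pos):
    """True when pos falls inside any primer range (inclusive bounds)."""
    return any(start <= pos <= end for start, end in PRIMER_REGIONS)


def is_phantom(pos, ref, variant):
    """True when the change at pos matches a listed phantom mutation."""
    return PHANTOM_SITES.get(pos) == (ref, variant)


def keep_row(row):
    pos = int(row["Pos"])
    return not in_primer_region(pos) and not is_phantom(pos, row["Ref"], row["Variant"])


rows = [
    {"Pos": "150", "Ref": "C", "Variant": "T"},    # primer region
    {"Pos": "3492", "Ref": "A", "Variant": "C"},   # phantom mutation
    {"Pos": "3492", "Ref": "A", "Variant": "G"},   # same site, different change
    {"Pos": "8860", "Ref": "A", "Variant": "G"},
    {"Pos": "16100", "Ref": "T", "Variant": "C"},  # primer region
]
kept = [row for row in rows if keep_row(row)]
print([(row["Pos"], row["Variant"]) for row in kept])
```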
Create a 7th tab named "homoplasmy_only". Copy all the data from the 6th tab (remove_primer_phantom) and paste it into the 7th tab. Check the VariantLevel column and remove all rows containing values lower than 95%.
#NOTE: You have the flexibility to adjust this threshold (for example, to 97% or 99%) based on the quality of your sequencing data and the specific requirements of your analysis.
Create an 8th tab named "heteroplasmy_only". Copy all the data from the 6th tab (remove_primer_phantom) and paste it into the 8th tab. Check the VariantLevel column and remove all rows containing values lower than 3% or higher than 95%.
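The two final tabs both start from the remove_primer_phantom data, so the split can be sketched in one helper with adjustable thresholds. This assumes VariantLevel is stored as a fraction between 0 and 1; if your file stores percentages, pass low=3 and high=95 instead. The function name split_by_level is illustrative.

```python
def split_by_level(rows, low=0.03, high=0.95):
    """Return (homoplasmic, heteroplasmic) row lists based on VariantLevel.

    Homoplasmic rows have a level of at least `high`; heteroplasmic rows
    have a level from `low` to `high` inclusive. A level exactly equal to
    `high` therefore lands in both lists, following the wording of the steps.
    """
    homoplasmic = [r for r in rows if float(r["VariantLevel"]) >= high]
    heteroplasmic = [r for r in rows if low <= float(r["VariantLevel"]) <= high]
    return homoplasmic, heteroplasmic


rows = [
    {"Pos": "73", "VariantLevel": "0.998"},
    {"Pos": "3010", "VariantLevel": "0.40"},
    {"Pos": "8860", "VariantLevel": "0.01"},
]
homoplasmy_only, heteroplasmy_only = split_by_level(rows)
print([r["Pos"] for r in homoplasmy_only], [r["Pos"] for r in heteroplasmy_only])
```

Raising the homoplasmy cutoff is then a matter of calling split_by_level(rows, high=0.97) or high=0.99.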
