...
- Continue the next steps using the Merged.txt file.
- Create the first Excel tab, named "Merged_raw_data", using the Merged.txt file.
- Create a 2nd tab named "Merged_nocont". Copy all the data from the raw results (1st tab - Merged_raw_data) and paste it into the 2nd tab. Using the sample IDs in the samples_to_remove.txt file, identify the variants from each of the contaminated samples and manually remove their rows from the Merged_nocont tab.
- Create a 3rd tab named "coverage>200_both_strand". Copy all the data from the 2nd tab (Merged_nocont tab) and paste it into the 3rd tab. After that, remove all the rows containing variants with coverage lower than 200x on either strand, so that the remaining variants have at least 200x coverage on both strands (check both the CoverageFWD and CoverageREV columns).
- Create a 4th tab named "Fwd-rev_ratio". Copy all the data from the 3rd tab (coverage>200_both_strand tab) and paste it into the 4th. Next, create a new column called FWD-Rev_ratio and calculate the ratio of CoverageFWD to CoverageREV (columns L and M). After that, filter out all rows with variants showing a Fwd/Rev ratio below 0.5 or above 1.5.
- Create a 5th tab named "remove_del". Copy all the data from the 4th tab (Fwd-rev_ratio tab) and paste it into the 5th. Check the Ref column and exclude all the rows containing the letter N (which indicates a deletion) in this column.
- Create a 6th tab named "remove_primer_phantom". Copy all the data from the 5th tab (remove_del tab) and paste it into the 6th. After that, remove all the rows containing variants in the primer regions (e.g. 0-500 bp and 16000-16655 bp). Also check whether there is any variant at the known phantom mutation sites ('72':['G','T'], '257':['A','C'], '414':['G','T'], '3492':['A','C'], '3511':['A','C'], '4774':['T','A'], '5290':['A','T'], '9801':['G','T'], '10306':['A','C'], '10792':['A','C'], '11090':['A','C']). If so, remove the rows containing these variants as well.
- Create a 7th tab named "homoplasmy_only". Copy all the data from the 6th tab (remove_primer_phantom tab) and paste it into the 7th. Check the VariantLevel column and remove all the rows containing values lower than 95%.
#NOTE: You can adjust this threshold (for example, to 97% or 99%) depending on the quality of your sequencing data, the sample size, and the specific requirements of your analysis.
- Create an 8th tab named "heteroplasmy_only". Copy all the data from the 6th tab (remove_primer_phantom tab) and paste it into the 8th tab. Check the VariantLevel column and remove all the rows containing values lower than 3% or higher than 95%.
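
For readers who want to cross-check the manual Excel filtering, the tab-by-tab steps above can be sketched in Python. This is only an illustrative sketch, not part of the protocol itself, and it rests on several assumptions: each row of Merged.txt has already been loaded as a dictionary; the column names "ID", "Pos" and "Variant" are hypothetical (only CoverageFWD, CoverageREV, Ref and VariantLevel are named in the steps); VariantLevel is stored as a fraction (0.95 for 95%); and each phantom-site pair is read as [Ref, Variant].

```python
# Phantom mutation sites from the step list: position -> allele pair.
# Assumption: each pair is interpreted as (Ref, Variant).
PHANTOM_SITES = {
    72: ("G", "T"), 257: ("A", "C"), 414: ("G", "T"), 3492: ("A", "C"),
    3511: ("A", "C"), 4774: ("T", "A"), 5290: ("A", "T"), 9801: ("G", "T"),
    10306: ("A", "C"), 10792: ("A", "C"), 11090: ("A", "C"),
}

# Example primer regions from the step list (inclusive bounds, in bp).
PRIMER_REGIONS = [(0, 500), (16000, 16655)]


def filter_variants(rows, contaminated_ids, min_cov=200, ratio_range=(0.5, 1.5)):
    """Apply the filters of tabs 2 to 6 in order and return the surviving rows.

    The keys "ID", "Pos" and "Variant" are hypothetical column names.
    """
    kept = []
    for row in rows:
        # Tab 2 (Merged_nocont): drop variants from contaminated samples.
        if row["ID"] in contaminated_ids:
            continue
        # Tab 3 (coverage>200_both_strand): both strands need enough coverage.
        fwd, rev = row["CoverageFWD"], row["CoverageREV"]
        if fwd < min_cov or rev < min_cov or rev == 0:
            continue
        # Tab 4 (Fwd-rev_ratio): keep ratios inside the allowed range.
        ratio = fwd / rev
        if not ratio_range[0] <= ratio <= ratio_range[1]:
            continue
        # Tab 5 (remove_del): N in the Ref column marks a deletion.
        if "N" in row["Ref"]:
            continue
        # Tab 6 (remove_primer_phantom): primer regions and phantom sites.
        pos = row["Pos"]
        if any(start <= pos <= end for start, end in PRIMER_REGIONS):
            continue
        if PHANTOM_SITES.get(pos) == (row["Ref"], row["Variant"]):
            continue
        kept.append({**row, "FWD-Rev_ratio": ratio})
    return kept


def split_by_level(rows, low=0.03, high=0.95):
    """Build tabs 7 and 8 from the tab 6 rows.

    Assumes VariantLevel is a fraction; scale low/high if it is a percentage.
    """
    homoplasmy = [r for r in rows if r["VariantLevel"] >= high]
    heteroplasmy = [r for r in rows if low <= r["VariantLevel"] <= high]
    return homoplasmy, heteroplasmy
```

Calling filter_variants(rows, contaminated_ids) on the raw rows and then split_by_level on its result should give the same row sets as the "homoplasmy_only" and "heteroplasmy_only" tabs, provided the assumptions above match your file. The thresholds are parameters so that the 95% cut-off can be changed as described in the note above.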
...