QC Test Set - Methods

GWAS method: REGENIE with dosages (regenie_dosages)
Visualization: gwas_viz_combined.R (Manhattan plots)

Contingency QC:
- Binary phenotypes: MIN_CELL = min(carrier_case, carrier_ctrl)
  Only alternative allele carriers are considered (reference allele excluded).
- Continuous phenotypes: MIN_CELL = total carrier count
  All samples are treated as "cases" (no case/control split).
- Variants with MIN_CELL <= 3 are flagged with black rings on Manhattan plots.

Recommended filtering thresholds (based on test set analysis):
- Binary: MIN_CELL >= 5 (removes 93% of LOF/CNV artifacts, keeps 99% of SNV/CYP signals)
- Continuous: MIN_CELL >= 10 (higher threshold for total carrier count)
- No separate treatment needed for dosage vs hardcall genotypes

Threshold impact (tested on positive/negative controls):
  simvastatin__prescribed (POSITIVE): 294/296 sig variants survive at MIN_CELL >= 5 (99.3%)
  warfarin__prescribed (POSITIVE): 94/131 sig survive (72%; removed are mostly CNV)
  zopiclone__M796 (NEGATIVE): 1/14 sig survive (93% removed - LOF artifacts)
  propranolol__prescribed (NEGATIVE): 32/46 sig survive (30% removed - CNV/LOF)

What gets removed (MIN_CELL < 5):
- LOF: carrier_case=1, extreme BETA (20-60). Always artifacts in ADR phenotypes.
- CNV regions: carrier_case=2-4, moderate BETA. Mostly artifacts.
- SNV: almost never removed (MIN_CELL typically > 100).
- Notable edge case: PROS1_lof in warfarin (genuine gene, MIN_CELL=4) is borderline.
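
Illustrative sketch (Python) of the MIN_CELL rules and thresholds described above. This is not taken from the pipeline: the function names (min_cell, passes_filter, is_flagged) and the constants are hypothetical, and the continuous case assumes that carriers arrive either as a single total or as a case/control pair that can simply be summed.

```python
# Hypothetical helper names; thresholds are the ones recommended in these notes.
BINARY_MIN_CELL = 5        # binary phenotypes: keep variants with MIN_CELL >= 5
CONTINUOUS_MIN_CELL = 10   # continuous phenotypes: keep variants with MIN_CELL >= 10
FLAG_MAX_CELL = 3          # MIN_CELL <= 3 is ringed in black on Manhattan plots


def min_cell(phenotype_type: str, carrier_case: int, carrier_ctrl: int = 0) -> int:
    """Return MIN_CELL for one variant, counting alternative-allele carriers only."""
    if phenotype_type == "binary":
        # Smallest carrier cell of the case/control contingency table.
        return min(carrier_case, carrier_ctrl)
    if phenotype_type == "continuous":
        # No case/control split: every carrier counts, so use the total
        # (assumption: carrier_ctrl is 0 or absent for continuous traits).
        return carrier_case + carrier_ctrl
    raise ValueError(f"unknown phenotype type: {phenotype_type!r}")


def passes_filter(phenotype_type: str, mc: int) -> bool:
    """True if a variant survives the recommended MIN_CELL threshold."""
    if phenotype_type == "binary":
        return mc >= BINARY_MIN_CELL
    if phenotype_type == "continuous":
        return mc >= CONTINUOUS_MIN_CELL
    raise ValueError(f"unknown phenotype type: {phenotype_type!r}")


def is_flagged(mc: int) -> bool:
    """True if the variant would get a black ring on the Manhattan plot."""
    return mc <= FLAG_MAX_CELL
```

Under this sketch a variant with MIN_CELL=4 (the PROS1_lof case) fails the binary threshold but is not ringed, because the plotting flag uses <= 3 while the recommended filter uses >= 5.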

Test set: 38 phenotypes (15 negative, 21 positive, 2 mixed)
- negative: top variants are artifacts, should be removed by filtering
- positive: top variants are genuine signals, should survive filtering
- mixed: unclear or mixed results

Phenotype types:
- binary: case/control (ADR ICD codes, prescribed yes/no, diagnoses)
- continuous: quantitative trait (optimal dose, biomarkers, measurements)

Variant classes:
- SNV: single nucleotide variants (chr1-22, X)
- CYP: CYP pharmacogene variants (cypmicro hardcall + cypdosage dosage)
- HLA: HLA variants (chr6)
- LOF: loss-of-function variants (WES)
- MPC: missense variants with MPC scores (WES, dosage)
- CNV: copy number variants (genes + regions)
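
Illustrative sketch (Python) for checking the survival percentages quoted for the four control phenotypes against their negative/positive labels. The counts are copied from the threshold-impact list in these notes. The CONTROLS table layout and the survival_pct helper are hypothetical, and the sketch assumes all four counts were taken at the binary MIN_CELL >= 5 threshold.

```python
# (phenotype, label, significant variants before filtering, survivors after filtering)
# Counts come from the notes; the tuple layout itself is an assumption.
CONTROLS = [
    ("simvastatin__prescribed", "positive", 296, 294),
    ("warfarin__prescribed", "positive", 131, 94),
    ("zopiclone__M796", "negative", 14, 1),
    ("propranolol__prescribed", "negative", 46, 32),
]


def survival_pct(n_sig: int, n_survive: int) -> float:
    """Percentage of significant variants that survive the MIN_CELL filter."""
    return 100.0 * n_survive / n_sig


for name, label, n_sig, n_survive in CONTROLS:
    kept = survival_pct(n_sig, n_survive)
    # Positive controls should keep most hits; negative controls should lose them.
    print(f"{name} ({label}): {n_survive}/{n_sig} survive, "
          f"{kept:.1f}% kept, {100.0 - kept:.1f}% removed")
```

A positive control is expected to show a high kept percentage and a negative control a high removed percentage. On that reading, propranolol__prescribed stands out: it is labelled negative yet keeps most of its significant variants.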