Structured, reproducible local pipeline for personal WGS analysis with two stages:
- Monogenic triage from ClinVar-enriched VEP output.
- Polygenic risk scoring (PRS) via pgscatalog/pgsc_calc.
This repository is publish-safe by default:
- Real secrets and local machine paths are ignored by git.
- Example config files with fake values are tracked.
- workflow/run_pipeline.sh: single entrypoint orchestrating stages.
- workflow/monogenic/: monogenic analysis scripts.
- workflow/prs/: PRS prep and execution scripts.
- workflow/utils/: utility helpers.
- config/*.example*: tracked templates.
- docs/: architecture and runbooks.
- results/, logs/, work/, .nextflow/: generated artifacts (ignored).
- Bootstrap local config files:
  make bootstrap
- Edit local files (ignored by git):
  - config/pipeline.env
  - config/secrets.env
  - .env
  - config/prs_samplesheet.csv (or set auto-build)
  - config/prs_pgs_ids.txt
- Run validation checks:
  make validate
  make prepublish
- Run pipeline:
  # full pipeline
  make pipeline
  # monogenic only
  make monogenic
  # PRS only
  make prs

Primary runtime config: config/pipeline.env
Key variables:
- SAMPLE_ID: logical sample identifier for outputs.
- ANNOTATED_VCF_GZ: path to ClinVar-enriched VEP VCF (.vcf.gz).
- RUN_MONOGENIC / RUN_PRS: enable stages in --stage all mode.
- MONOGENIC_MODE, MONOGENIC_MAX_AF: triage strictness.
- PRS_AUTOBUILD_SAMPLESHEET: set 1 to build from config/paths.env.
- PRS_SAMPLESHEET, PRS_PGS_IDS_FILE: PRS inputs.
- PGSC_PROFILE, PGSC_MAX_MEMORY, PGSC_MIN_OVERLAP, PGSC_RESUME: pgsc_calc runtime controls.
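
How the stage selector and the RUN_MONOGENIC / RUN_PRS toggles might interact can be sketched as follows. This is an illustration only, not the actual logic of workflow/run_pipeline.sh: the function name resolve_stages, the stage names monogenic and prs, the convention that 1 means enabled, and the default of 1 when a toggle is unset are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: decide which stages run for a given --stage value.
# Assumption: RUN_MONOGENIC / RUN_PRS use 1 for "enabled" and default to 1.

resolve_stages() {
  local stage="$1"
  local out=""
  case "$stage" in
    all)
      # In "all" mode the per-stage toggles decide what actually runs.
      if [ "${RUN_MONOGENIC:-1}" = "1" ]; then
        out="monogenic"
      fi
      if [ "${RUN_PRS:-1}" = "1" ]; then
        out="${out:+$out }prs"
      fi
      ;;
    monogenic|prs)
      # An explicitly requested stage runs regardless of the toggles.
      out="$stage"
      ;;
    *)
      echo "unknown stage: $stage" >&2
      return 2
      ;;
  esac
  echo "$out"
}
```

Under these assumptions, calling resolve_stages all with RUN_PRS=0 would print only the monogenic stage, while resolve_stages prs would still select PRS.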
Optional secret files:
- .env (from .env.example)
- config/secrets.env (from config/secrets.env.example)
Both are auto-loaded if present.
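
One common way to implement "load if present" in shell is sketched below. The helper name load_env_if_present and the load order are assumptions for illustration; the real entrypoint may do this differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source an env file only when it exists.

load_env_if_present() {
  local env_file="$1"
  if [ -f "$env_file" ]; then
    set -a          # export every variable assigned while sourcing
    # shellcheck disable=SC1090
    . "$env_file"
    set +a
  fi
}

# Assumed order: a later file overrides values set by an earlier one.
# The ./ prefix keeps the shell from searching PATH for the file name.
load_env_if_present "./.env"
load_env_if_present "./config/secrets.env"
```

Because missing files are skipped silently, a fresh clone with no secrets still runs; the trade-off is that a misspelled file name also goes unnoticed.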
Ignored by default:
- Real env/secrets files (.env, config/secrets.env, config/pipeline.env, etc.)
- Local path configs (config/paths.env, config/prs_*.csv|txt|tsv local variants)
- Runtime/cache/output artifacts
Before publishing:
- Confirm git status does not include private/local files.
- Keep only *.example* templates for configuration.
- Do not commit generated results/ or logs/.
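
The manual checks above could be approximated with a small script. The sketch below is not the repository's make prepublish target; the function names and the exact patterns are assumptions derived from the ignore rules listed earlier.

```shell
#!/usr/bin/env bash
# Hypothetical pre-publish check: flag paths that should never be committed.

is_private_path() {
  case "$1" in
    *.example*) return 1 ;;                      # tracked templates are fine
    .env|config/*.env) return 0 ;;               # real env/secrets files
    config/prs_*.csv|config/prs_*.txt|config/prs_*.tsv) return 0 ;;
    results/*|logs/*|work/*|.nextflow/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print every changed or untracked path that looks private.
# Paths with spaces or renames are not handled in this sketch.
find_private_changes() {
  git status --porcelain | while IFS= read -r line; do
    path="${line#???}"   # drop the two status characters and the space
    if is_private_path "$path"; then
      echo "$path"
    fi
  done
}
```

Running find_private_changes inside the repository and getting no output would suggest nothing private is pending, although correctly ignored files never appear in git status in the first place.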
GitHub Actions workflow at .github/workflows/ci.yml runs:
- Python syntax checks
- Shell lint and syntax checks
- Pipeline help/entrypoint check
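
The syntax checks can be approximated locally before pushing. The sketch below does not reproduce .github/workflows/ci.yml; the function names are hypothetical, and each function only inspects the top level of the directory it is given.

```shell
#!/usr/bin/env bash
# Hypothetical local approximation of the CI syntax checks.

check_shell_syntax() {
  local dir="$1" failed=0 f
  for f in "$dir"/*.sh; do
    [ -e "$f" ] || continue
    # bash -n parses the script without executing it.
    if ! bash -n "$f" 2>/dev/null; then
      echo "shell syntax error: $f" >&2
      failed=1
    fi
  done
  return "$failed"
}

check_python_syntax() {
  local dir="$1" failed=0 f
  for f in "$dir"/*.py; do
    [ -e "$f" ] || continue
    # py_compile exits non-zero when the file does not compile.
    if ! python3 -m py_compile "$f"; then
      failed=1
    fi
  done
  return "$failed"
}
```

For example, check_shell_syntax workflow would report any top-level script that fails to parse; linting with a tool such as shellcheck would be a separate step.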
This project is for research/hobby analysis and is not a medical diagnostic system.
MIT. See LICENSE.