Getting started

To run MPRAOligoDesign, first activate the snakemake environment, e-g., via:

conda activate snakemake

All inputs of the workflow are configured via a config file. For more information on the parameters see Config File.

Pre required files

Before running the workflow we need the human reference genome from UCSC (GRCh38 build) and the genome size file. Be aware this is around 3GB large after extraction. You can download them with:

wget -P resources/reference https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.fa.gz
wget -P resources/reference https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.fa.fai
wget -P resources/reference https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes

gunzip resources/reference/hg38.fa.gz

You need to set the correct path in the config file. Here you have to modify the entries genome and fasta under reference. We will use the config file config/config_quickstart.yml in this example.

Running the workflow

We can combine 4 different designing routines. variants_only, regions_only, variants_regions, and sequences_only. See Config File for more information on them. In this quickstart we will use all four routines. The input files are already provided in the data/quickstart folder. You can run the workflow with:

snakemake --cores 1 --configfile config/config_quickstart.yml

This should design oligogs for all 3 routines. The results will be stored in the results/final_design folder.