Workflows: Make/Ant/Snakemake
Make
Basic example
Remember to use TABS before each command!
target [target ...]: [component ...]
	[command 1]
	[command 2]
	...
Line continuation uses a backslash
Variables
Use $() or ${} to reference variables.
Grouped targets
Here is an example of a grouped target given in the documentation. Note the use
of the &: to separate the targets from the prerequisites
foo bar biz &: baz boz
	echo $^ > foo
	echo $^ > bar
	echo $^ > biz
Splitting long lines
- Split long lines of dependencies using \ at the end of each line.
Debugging
Echo recipe without execution
make <target> --just-print
will echo the commands without executing them. This is particularly useful for making sure variables have the expected values.
Ant
- Apache Ant is a build tool that can be used in place of make but is geared
towards use with Java projects.
- There is a StackOverflow answer demonstrating how to convert a makefile to an Ant XML.
Snakemake
The Snakemake developers strongly recommend the use of Conda. I have some notes on how to use Conda. Conda can be a bit heavy though; venv is a lighter-weight alternative that is part of the Python standard library and seems to work fine.
- Start with snakemake -pn (the n is for a dry run and the p prints the shell commands); when you are happy with it, remove the n to get a live run.
- snakemake --keep-going will keep going with independent jobs if some rules fail.
Configuration
There is strong support for configuring pipelines with JSON (YAML works too). Snakemake has some syntactic sugar for working with the configuration object that gets created, which behaves like a dictionary named config. There is also support for validating the configuration and other data against a schema. There is an example here.
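To make the idea of a configuration object and a schema check concrete, here is a rough sketch in plain Python that does not use Snakemake at all. The key names, CONFIG_TEXT, REQUIRED and check_config are all hypothetical; Snakemake has its own validation mechanism, and this hand-written check only illustrates the kind of thing it does for you.

```python
import json

# Hypothetical configuration text; in a real workflow this would live in a file.
CONFIG_TEXT = '{"N": 99, "out_dir": "out"}'

# Hypothetical hand-rolled "schema": required key -> expected type.
REQUIRED = {"N": int, "out_dir": str}


def check_config(config, required=REQUIRED):
    """Return a list of problems found in config (empty if it looks fine)."""
    problems = []
    for key, expected_type in required.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    return problems


config = json.loads(CONFIG_TEXT)
print(config["N"])                # values are read like any dictionary entry
print(check_config(config))       # a well-formed configuration
print(check_config({"N": "99"}))  # wrong type for N and no out_dir
```

Failing early on a malformed configuration is the main benefit: the pipeline stops before any expensive rule runs.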
Example: the expand function
from snakemake.io import expand

expand("foo/{bar}.{ext}", bar=["aaa","bbb"], ext=["json","csv"])
# produces the list of strings
# ['foo/aaa.json', 'foo/aaa.csv', 'foo/bbb.json', 'foo/bbb.csv']
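If it helps to see what is going on, the behaviour in this example can be imitated with the standard library. This is only a sketch of the default "every combination" behaviour, not Snakemake's implementation, and expand_sketch is a hypothetical name.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill pattern with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


paths = expand_sketch("foo/{bar}.{ext}", bar=["aaa", "bbb"], ext=["json", "csv"])
print(paths)
```

The last keyword argument varies fastest, which is why both aaa files appear before the bbb ones.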
Example: Visualization
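You can generate a DAG of either the rules (with --rulegraph) or the
individual jobs (with --dag) in a snakefile. Both are written in the
dot language, so they can be piped to Graphviz. The rule graph provides
a higher-level overview of the pipeline.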
snakemake --rulegraph | dot -Tpng > foobar.png
Example: Hello Snakemake!
Create a directory for the project and touch a Snakefile.
mkdir snakemake_example
cd snakemake_example
touch Snakefile
In the Snakefile, specify the task, which is just to echo "Hello, Snakemake!" to a text file.
# Snakefile
rule hello_world:
    params:
        greeting="Hello"
    output:
        "hello.txt"
    shell:
        "echo '{params.greeting}, Snakemake!' > {output}"
Run the workflow on a single core and print out the results to check it worked.
snakemake --cores 1
cat hello.txt
Note that you can abbreviate --cores <n> to -c<n>, e.g. -c5 to
run on five cores, and if you need to specify the name of the
snakefile to use there is --snakefile.
Example: input/output aliases
You can give files aliases but these need to come at the end of the list of files.
rule run_stan_model:
    input:
        CONFIGURATION_YAML,
        "stan-renewal-model.stan",
        script = "src/stan-renewal-runner.R"
    output:
        posterior_csv,
        stan_data,
    shell:
        "Rscript {input.script}"
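As far as I understand it, the ordering rule is the same one Python applies to function calls: positional arguments first, keyword arguments last. The sketch below is plain Python rather than Snakemake, collect_inputs is a hypothetical helper, and "config.yaml" stands in for CONFIGURATION_YAML.

```python
def collect_inputs(*positional, **named):
    """Mimic how a rule's input list separates plain and aliased files."""
    return list(positional), dict(named)


files, aliases = collect_inputs(
    "config.yaml",
    "stan-renewal-model.stan",
    script="src/stan-renewal-runner.R",
)
print(files)
print(aliases["script"])

# Putting the alias first is rejected by Python before anything runs.
try:
    compile('collect_inputs(script="a.R", "b.stan")', "<example>", "eval")
    alias_first_ok = True
except SyntaxError as err:
    alias_first_ok = False
    print("SyntaxError:", err.msg)
```

The aliased file is then available by name, which is what {input.script} relies on in the shell command.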
Example: Linting code
Python
Make sure that you have black installed, then add the following rule
to your snakemake file.
rule lint_code:
    shell:
        "black src"
Note that this assumes your source code is in src/.
Example: Minimal simulation study
The output of this is a histogram in a PNG.
Create a directory for the project (and some useful subdirectories and scripts) and touch a Snakefile.
mkdir snakemake_example
cd snakemake_example
touch Snakefile
touch config.yaml
mkdir src
touch src/simulate.py
touch src/compute_mean.py
touch src/generate_histogram.py
mkdir data
mkdir out
Since it is reasonable to consider the number of simulations as a
configuration parameter we will create a config.yaml file to hold
this.
N: 99
The Snakefile picks this up through the configfile directive, which
hard-codes the file name into the snakefile. Note that you can also
specify the configuration file at the command line using the
--configfile command line argument.
# Snakefile
configfile: "config.yaml"

histogram_png = "out/histogram.png"

rule all:
    input:
        histogram_png

rule simulate:
    output:
        "out/sim_data_{index}.csv"
    shell:
        "python src/simulate.py {output}"

rule compute_mean:
    input:
        "out/sim_data_{index}.csv"
    output:
        "out/mean_{index}.txt"
    shell:
        "python src/compute_mean.py {input} {output}"

rule plot_histogram:
    input:
        expand("out/mean_{index}.txt", index=[f"{i:02d}" for i in range(1, config['N']+1)])
    output:
        histogram_png
    shell:
        "python src/generate_histogram.py {input} {output}"
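The list comprehension in plot_histogram is worth unpacking, since it decides which files the whole workflow has to build. A small sketch in plain Python, with N hard-coded here as an assumption instead of being read from config.yaml:

```python
N = 99  # assumed to match the value in config.yaml

# Zero-padded labels "01" ... "99", one per simulation.
indices = [f"{i:02d}" for i in range(1, N + 1)]

# The files plot_histogram asks for; each one triggers compute_mean and simulate.
means = [f"out/mean_{index}.txt" for index in indices]

print(indices[0], indices[-1], len(indices))
print(means[:2])
```

Note that the 02d format pads to two digits only, so with N above 99 the labels stop having a uniform width (100 becomes "100").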
Python scripts
# simulate.py
import sys
import csv
import random


def main():
    num_samples = 10
    csv_file = sys.argv[1]

    # newline='' avoids blank rows between records on some platforms
    with open(csv_file, 'w', newline='') as file:
        writer = csv.writer(file)
        # one uniform random number per row
        for _ in range(num_samples):
            writer.writerow([random.random()])


if __name__ == "__main__":
    main()
# compute_mean.py
import sys
import csv


def main():
    csv_file = sys.argv[1]
    txt_file = sys.argv[2]

    with open(csv_file, 'r') as file:
        reader = csv.reader(file)
        data = [float(row[0]) for row in reader]

    mean = sum(data) / len(data)

    with open(txt_file, 'w') as file:
        file.write(str(mean))


if __name__ == "__main__":
    main()
# generate_histogram.py
import sys
import matplotlib.pyplot as plt


def main():
    png_file = sys.argv[-1]
    txt_files = sys.argv[1:-1]

    data = []
    for txt_file in txt_files:
        with open(txt_file, 'r') as file:
            data.append(float(file.read()))

    plt.figure()
    plt.hist(data)
    plt.savefig(png_file)


if __name__ == "__main__":
    main()