Workflows: Make/Ant/Snakemake
Make
Basic example
Remember to use TABS before each command!
target [target ...]: [component ...]
	[command 1]
	[command 2]
	...
Line continuation uses a backslash
Variables
Use $() or ${} to reference variables.
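To make the two reference forms concrete, here is a minimal Python sketch of how a recursively expanded variable resolves. It is only an illustration, not how Make is implemented: the function name expand_make_vars and the example variables are made up for this note, and it ignores functions, automatic variables and the difference between = and := assignments.

```python
import re

# Matches $(NAME) or ${NAME}; both forms are treated identically.
_REFERENCE = re.compile(r"\$\((\w+)\)|\$\{(\w+)\}")


def expand_make_vars(text, variables):
    """Expand $(NAME) and ${NAME} references in text (hypothetical helper)."""

    def substitute(match):
        name = match.group(1) or match.group(2)
        # An undefined variable expands to the empty string, as in Make.
        value = variables.get(name, "")
        # Values may themselves contain references, so expand them too.
        return expand_make_vars(value, variables)

    return _REFERENCE.sub(substitute, text)


# Example variables, invented for illustration.
variables = {
    "CC": "gcc",
    "CFLAGS": "-O2 -Wall",
    "COMPILE": "$(CC) ${CFLAGS}",
}

print(expand_make_vars("$(COMPILE) -c main.c", variables))
```

The nested reference in COMPILE is resolved when it is used, which is the behaviour of variables defined with a plain = in a makefile.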
Grouped targets
Here is an example of a grouped target given in the documentation (grouped targets need GNU Make 4.3 or later). Note the use of &: to separate the targets from the prerequisites.
foo bar biz &: baz boz
	echo $^ > foo
	echo $^ > bar
	echo $^ > biz
Splitting long lines
- Split long lines of dependencies using a trailing backslash (\).
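As a rough picture of what Make sees after a split prerequisite list is joined, the Python sketch below collapses each backslash-newline (and the whitespace around it) into a single space. The helper name join_continuations and the example rule are made up for this note, and the sketch only covers lines outside recipes; inside a recipe the backslash-newline is passed on to the shell instead.

```python
import re


def join_continuations(text):
    """Join backslash-continued lines into one logical line (hypothetical helper)."""
    # Backslash-newline plus any surrounding spaces or tabs becomes one space.
    return re.sub(r"[ \t]*\\\n[ \t]*", " ", text)


# Example rule header, invented for illustration.
makefile_rule = "app: main.o \\\n     parser.o \\\n     util.o\n"

print(join_continuations(makefile_rule), end="")
```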
Debugging
Echo recipe without execution
make <target> --just-print will echo the commands without executing them. This is particularly useful for making sure variables have the expected values.
Ant
- Apache Ant is a build tool that can be used in place of make but is geared
towards use with Java projects.
- There is a StackOverflow answer demonstrating how to convert a makefile to an Ant XML.
Snakemake
The Snakemake developers strongly recommend the use of Conda. I have some notes on how to use Conda.
- snakemake -pn is short for snakemake -n --printshellcmds (the n is for a dry run). This is a good way to check which shell commands would run, and then when you are happy with it, you can remove the n to get a live run.
- snakemake --keep-going will keep going with independent jobs if some fail.
Example: Hello Snakemake!
Create a directory for the project and touch a Snakefile.
mkdir snakemake_example
cd snakemake_example
touch Snakefile
In the Snakefile, specify the task, which is just to echo "Hello, Snakemake!" to a text file.
# Snakefile
rule hello_world:
    params:
        greeting="Hello"
    output:
        "hello.txt"
    shell:
        "echo '{params.greeting}, Snakemake!' > {output}"
Run the workflow on a single core and print out the results to check it worked.
snakemake --cores 1
cat hello.txt
Note that you can abbreviate --cores <n> to -c<n>, e.g. -c5 to run on five cores, and if you need to specify the name of the snakefile to use there is --snakefile.
Example: Linting code
Python
Make sure that you have black installed, then add the following rule to your snakemake file.
rule lint_code:
    shell:
        "black src"
Note that this assumes your source code is in src/.
Example: Minimal simulation study
The output of this is a histogram in a PNG.
Create a directory for the project (and some useful subdirectories and scripts) and touch a Snakefile.
mkdir snakemake_example
cd snakemake_example
touch Snakefile
touch config.yaml
mkdir src
touch src/simulate.py
touch src/compute_mean.py
touch src/generate_histogram.py
mkdir data
mkdir out
Since it is reasonable to consider the number of simulations as a configuration parameter we will create a config.yaml file to hold this.
N: 99
The Snakefile makes use of this value through config['N'].
# Snakefile
configfile: "config.yaml"

histogram_png = "out/histogram.png"

rule all:
    input:
        histogram_png

rule simulate:
    output:
        "out/sim_data_{index}.csv"
    shell:
        "python src/simulate.py {output}"

rule compute_mean:
    input:
        "out/sim_data_{index}.csv"
    output:
        "out/mean_{index}.txt"
    shell:
        "python src/compute_mean.py {input} {output}"

rule plot_histogram:
    input:
        expand("out/mean_{index}.txt", index=[f"{i:02d}" for i in range(1, config['N']+1)])
    output:
        histogram_png
    shell:
        "python src/generate_histogram.py {input} {output}"
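To see which files the expand() call in plot_histogram asks for, the list can be rebuilt in plain Python. This is only a sketch of the resulting file names, not Snakemake's own implementation; the config dictionary here is a stand-in for what Snakemake loads from config.yaml.

```python
# Stand-in for the dictionary Snakemake builds from config.yaml.
config = {"N": 99}

# Zero-padded indices "01", "02", ..., "99", as in the Snakefile.
indices = [f"{i:02d}" for i in range(1, config["N"] + 1)]

# One mean file per index; this mirrors the pattern given to expand().
mean_files = [f"out/mean_{index}.txt" for index in indices]

print(len(mean_files))  # 99
print(mean_files[0])    # out/mean_01.txt
print(mean_files[-1])   # out/mean_99.txt
```

Each of these names matches the output pattern of compute_mean with a different value of the index wildcard, and each in turn needs the matching out/sim_data_{index}.csv from simulate, which is how a single target in rule all fans out into 99 simulations.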
Python scripts
# simulate.py
import sys
import csv
import random


def main():
    num_samples = 10
    csv_file = sys.argv[1]  # output path supplied by the Snakefile
    # newline='' stops the csv module writing blank rows on Windows
    with open(csv_file, 'w', newline='') as file:
        writer = csv.writer(file)
        for _ in range(num_samples):
            writer.writerow([random.random()])


if __name__ == "__main__":
    main()
# compute_mean.py
import sys
import csv


def main():
    csv_file = sys.argv[1]  # input: one simulated value per row
    txt_file = sys.argv[2]  # output: the mean as plain text
    with open(csv_file, 'r', newline='') as file:
        reader = csv.reader(file)
        data = [float(row[0]) for row in reader]
    mean = sum(data) / len(data)
    with open(txt_file, 'w') as file:
        file.write(str(mean))


if __name__ == "__main__":
    main()
# generate_histogram.py
import sys
import matplotlib.pyplot as plt


def main():
    png_file = sys.argv[-1]     # last argument is the output image
    txt_files = sys.argv[1:-1]  # everything before it is a mean file
    data = []
    for txt_file in txt_files:
        with open(txt_file, 'r') as file:
            data.append(float(file.read()))
    plt.figure()
    plt.hist(data)
    plt.savefig(png_file)


if __name__ == "__main__":
    main()
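Before running the whole workflow it can help to check the simulate and mean steps in memory. The sketch below repeats the logic of simulate.py and compute_mean.py without any files or plotting; the function names, the fixed seed and the hard-coded 99 repetitions are assumptions made for this note rather than part of the workflow.

```python
import random


def simulate(rng, num_samples=10):
    """Draw num_samples uniform values, as simulate.py does."""
    return [rng.random() for _ in range(num_samples)]


def compute_mean(samples):
    """Average a list of samples, as compute_mean.py does."""
    return sum(samples) / len(samples)


# A fixed seed is an assumption here so that repeated runs agree.
rng = random.Random(0)

# 99 repetitions, matching N in config.yaml.
means = [compute_mean(simulate(rng)) for _ in range(99)]

print(f"{len(means)} means, min={min(means):.3f}, max={max(means):.3f}")
```

Since each value is an average of ten uniform draws on [0, 1), every mean must land in that interval, and the histogram in out/histogram.png should be roughly centred on 0.5.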