Workflows: Make/Ant/Snakemake

Make
Ant
Snakemake

Make

Basic example

Remember to use TABS before each command!

target [target ...]: [prerequisite ...]
	[command 1]
	[command 2]
	...

Line continuation uses a backslash (\) at the end of the line.

Variables

Use $(NAME) or ${NAME} to reference a variable; the two forms are equivalent.

Grouped targets

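Here is an example of a grouped target given in the GNU Make documentation. Note the use of &: to separate the targets from the prerequisites; it tells make that a single run of the recipe produces all of the targets. Grouped targets require GNU Make 4.3 or later.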

foo bar biz &: baz boz
        echo $^ > foo
        echo $^ > bar
        echo $^ > biz

Splitting long lines

Split long lines of dependencies using \.

Debugging

Echo recipe without execution

make <target> --just-print

will echo the commands without executing them. This is particularly useful for making sure variables have the expected values.
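
To tie the Make notes together, here is a minimal sketch in Python that writes a small makefile and asks make for a dry run. The makefile and its names (GREETING, OUT, greeting.txt) and the helper dry_run are hypothetical, chosen only for illustration. It assumes a make on the PATH that understands --just-print (GNU Make does) and returns None otherwise.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical makefile; GREETING, OUT and greeting.txt are illustrative names.
# The recipe line starts with "\t" because make requires a literal tab there.
MAKEFILE = (
    "GREETING = hello \\\n"          # a trailing backslash continues the line
    "           world\n"
    "OUT = greeting.txt\n"
    "\n"
    "$(OUT):\n"
    "\techo ${GREETING} > $(OUT)\n"  # both reference styles appear here
)


def dry_run(makefile_text, target):
    """Return the output of `make <target> --just-print` for the given
    makefile text, or None if make is not installed."""
    if shutil.which("make") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "Makefile").write_text(makefile_text)
        result = subprocess.run(
            ["make", target, "--just-print"],
            cwd=tmp, capture_output=True, text=True, check=True,
        )
        return result.stdout
```

Calling dry_run(MAKEFILE, "greeting.txt") should report the echo command with both variables already substituted, and no greeting.txt is created because nothing is executed.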

Ant

Apache Ant is a build tool that can be used in place of make but is geared towards use with Java projects.
- There is a Stack Overflow answer demonstrating how to convert a makefile to an Ant XML build file.

Snakemake

The Snakemake developers strongly recommend the use of Conda. I have some notes on how to use Conda. Conda can be a bit heavy though; venv is a lighter-weight alternative and is part of the Python standard library.

snakemake -pn does a dry run (-n) and prints the shell commands (-p); it is shorthand for snakemake -n --printshellcmds. When you are happy with the plan, remove the n to get a live run.
snakemake --keep-going will continue with independent jobs if some jobs fail.

Example: Visualization

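You can generate a DAG of either the rules (with --rulegraph) or the individual jobs that produce the files (with --dag) from a Snakefile. The rule graph provides a higher-level overview of the pipeline. Both options write the graph in Graphviz dot format, so dot needs to be installed to render it.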

snakemake --rulegraph | dot -Tpng > foobar.png

Example: Hello Snakemake!

Create a directory for the project and touch a Snakefile.

mkdir snakemake_example
cd snakemake_example
touch Snakefile

In the Snakefile, specify the task, which is just to echo "Hello, Snakemake!" to a text file.

# Snakefile
rule hello_world:
    params:
        greeting="Hello"
    output:
        "hello.txt"
    shell:
        "echo '{params.greeting}, Snakemake!' > {output}"

Run the workflow on a single core and print out the results to check it worked.

snakemake --cores 1
cat hello.txt

Note that you can abbreviate --cores <n> to -c<n>, e.g. -c5 to run on five cores, and if you need to specify the name of the snakefile to use there is --snakefile.

Example: input/output aliases

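You can give input and output files aliases (names), but named entries must come after all of the unnamed ones, just as keyword arguments follow positional arguments in a Python function call. A named file is then available in the shell command as {input.<name>} or {output.<name>}.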

rule run_stan_model:
    input:
        CONFIGURATION_YAML,
        "stan-renewal-model.stan",
        script = "src/stan-renewal-runner.R"
    output:
        posterior_csv,
        stan_data,
    shell:
        "Rscript {input.script}"

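The ordering rule for aliases mirrors plain Python, which can be checked without Snakemake. The following is a rough sketch only: collect_inputs is a hypothetical stand-in for how a rule gathers its input list, not Snakemake's actual implementation, and "config.yaml" stands in for the CONFIGURATION_YAML variable.

```python
def collect_inputs(*positional, **named):
    """Hypothetical stand-in for a rule's input section: unnamed files keep
    their order, named files can be looked up by their alias."""
    return {"positional": list(positional), "named": dict(named)}


# Unnamed files first, then the aliased one, as in the rule above.
inputs = collect_inputs(
    "config.yaml",
    "stan-renewal-model.stan",
    script="src/stan-renewal-runner.R",
)

# Putting the named entry first is rejected by Python before anything runs.
try:
    compile('collect_inputs(script="a.R", "b.stan")', "<example>", "eval")
    named_first_ok = True
except SyntaxError:
    named_first_ok = False
```

Here inputs["named"]["script"] plays the role of {input.script}, and named_first_ok ends up False because Python refuses a positional argument after a keyword argument.
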
Example: Linting code

Python

Make sure that you have black installed, then add the following rule to your Snakefile.

rule lint_code:
    shell:
        "black src"

Note that this assumes your source code is in src/.

Example: Minimal simulation study

The output of this is a histogram in a PNG.

Create a directory for the project (and some useful subdirectories and scripts) and touch a Snakefile.

mkdir snakemake_example
cd snakemake_example
touch Snakefile
touch config.yaml
mkdir src
touch src/simulate.py
touch src/compute_mean.py
touch src/generate_histogram.py
mkdir data
mkdir out

Since it is reasonable to treat the number of simulations as a configuration parameter, we will create a config.yaml file to hold it.

N: 99

The Snakefile reads this value as config['N'] to decide how many simulations to run.

# Snakefile
configfile: "config.yaml"
histogram_png = "out/histogram.png"

rule all:
    input:
        histogram_png

rule simulate:
    output:
        "out/sim_data_{index}.csv"
    shell:
        "python src/simulate.py {output}"

rule compute_mean:
    input:
        "out/sim_data_{index}.csv"
    output:
        "out/mean_{index}.txt"
    shell:
        "python src/compute_mean.py {input} {output}"

rule plot_histogram:
    input:
        expand("out/mean_{index}.txt", index=[f"{i:02d}" for i in range(1, config['N']+1)])
    output:
        histogram_png
    shell:
        "python src/generate_histogram.py {input} {output}"

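To see which files plot_histogram asks for, the expand() call can be reproduced in plain Python. This is a sketch under stated assumptions: the list comprehension stands in for Snakemake's expand() with a single wildcard, and the regular expression is only an approximation of how the {index} wildcard is matched (by default a wildcard matches ".+").

```python
import re

N = 99  # same value as in config.yaml

# What the list comprehension passed to expand() evaluates to.
indices = [f"{i:02d}" for i in range(1, N + 1)]

# With one wildcard, expand("out/mean_{index}.txt", index=indices) gives one
# path per value, equivalent to this comprehension.
mean_files = [f"out/mean_{index}.txt" for index in indices]

# Approximation of wildcard matching in rule compute_mean: the requested
# output file determines the value of {index}.
pattern = re.compile(r"out/mean_(?P<index>.+)\.txt")
wildcard_index = pattern.fullmatch("out/mean_07.txt").group("index")
```

The zero padding keeps file names the same width only up to N = 99; the format 02d sets a minimum width, so index 100 would be written as "100".
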
Python scripts

# simulate.py
import sys
import csv
import random

def main():
    num_samples = 10
    csv_file = sys.argv[1]

    # newline='' stops the csv module writing blank rows on some platforms
    with open(csv_file, 'w', newline='') as file:
        writer = csv.writer(file)
        for _ in range(num_samples):
            writer.writerow([random.random()])

if __name__ == "__main__":
    main()

# compute_mean.py
import sys
import csv

def main():
    csv_file = sys.argv[1]
    txt_file = sys.argv[2]

    with open(csv_file, 'r') as file:
        reader = csv.reader(file)
        data = [float(row[0]) for row in reader]

    mean = sum(data) / len(data)
    with open(txt_file, 'w') as file:
        file.write(str(mean))

if __name__ == "__main__":
    main()

# generate_histogram.py
import sys
import matplotlib.pyplot as plt

def main():
    png_file = sys.argv[-1]
    txt_files = sys.argv[1:-1]

    data = []
    for txt_file in txt_files:
        with open(txt_file, 'r') as file:
            data.append(float(file.read()))

    plt.figure()
    plt.hist(data)
    plt.savefig(png_file)

if __name__ == "__main__":
    main()