Workflows: Make/Ant/Snakemake

Make

Basic example

Remember to use TABS before each command!

target [target ...]: [component ...]
        [command 1]
        [command 2]
        ...

Line continuation uses a backslash (\) at the end of the line.

Variables

Use $(NAME) or ${NAME} to reference a variable called NAME; the two forms are equivalent.

Grouped targets

Here is an example of a grouped target given in the GNU Make documentation. Note the use of &: to separate the targets from the prerequisites; it tells make that a single run of the recipe produces all of the targets.

foo bar biz &: baz boz
        echo $^ > foo
        echo $^ > bar
        echo $^ > biz

Splitting long lines

  • Split long lines of dependencies using \.

Debugging

Echo recipe without execution

make <target> --just-print

will echo the commands without executing them (-n is the short form). This is particularly useful for making sure variables have the expected values.

Ant

  • Apache Ant is a build tool that can be used in place of make but is geared towards use with Java projects.

Snakemake

The Snakemake developers strongly recommend the use of Conda. I have some notes on how to use Conda. Conda can be a bit heavy, though; venv is a lighter-weight alternative that is part of the Python standard library and seems to work fine.

  • snakemake -pn prints the shell commands (-p) without running them (the n is for a dry run); when you are happy with the output, remove the n to get a live run.
  • snakemake --keep-going will keep going if some rules fail.

Configuration

There is strong support for configuring pipelines with JSON or YAML. Snakemake provides some syntactic sugar for working with the configuration object that gets created. There is also support for validating configuration and other data against a schema. There is an example here.
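
Inside a Snakefile the loaded configuration is available as a dictionary called config. The following is a minimal standard-library sketch of that idea, not Snakemake's own mechanism: CONFIG_TEXT and load_config are hypothetical names, and the hand-written check only stands in for proper schema validation.

```python
import json

# Hypothetical configuration text; in a real workflow this would live in a file.
CONFIG_TEXT = '{"N": 99}'


def load_config(text):
    """Parse JSON text and apply a hand-written check on the N entry."""
    config = json.loads(text)
    n = config.get("N")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(n, bool) or not isinstance(n, int) or n < 1:
        raise ValueError("N must be a positive integer")
    return config


config = load_config(CONFIG_TEXT)
print(config["N"])
```

Failing early on a malformed configuration is the point of validation: the error appears before any rule runs rather than halfway through the pipeline.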

Example: the expand function

from snakemake.io import expand
expand("foo/{bar}.{ext}", bar=["aaa","bbb"], ext=["json","csv"])
# produces the list of strings
# ['foo/aaa.json', 'foo/aaa.csv', 'foo/bbb.json', 'foo/bbb.csv']
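
To make the behaviour above concrete, here is a pure-Python sketch that reproduces the same list with itertools.product. It assumes the default behaviour of combining every value of every wildcard; expand_sketch is a hypothetical name and not part of Snakemake.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill the pattern with every combination of the wildcard values."""
    names = list(wildcards)
    # product varies the last wildcard fastest, like nested for loops.
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


paths = expand_sketch("foo/{bar}.{ext}", bar=["aaa", "bbb"], ext=["json", "csv"])
print(paths)
# ['foo/aaa.json', 'foo/aaa.csv', 'foo/bbb.json', 'foo/bbb.csv']
```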

Example: Visualization

You can generate a DAG of either the rules (with --rulegraph) or the files (with --dag) in a Snakefile. The rule graph provides a higher-level overview of the pipeline.

snakemake --rulegraph | dot -Tpng > foobar.png

Example: Hello Snakemake!

Create a directory for the project and touch a Snakefile.

mkdir snakemake_example
cd snakemake_example
touch Snakefile

In the Snakefile, specify the task, which is just to echo "Hello, Snakemake!" to a text file.

# Snakefile
rule hello_world:
    params:
        greeting="Hello"
    output:
        "hello.txt"
    shell:
        "echo '{params.greeting}, Snakemake!' > {output}"

Run the workflow on a single core and print out the results to check it worked.

snakemake --cores 1
cat hello.txt

Note that you can abbreviate --cores <n> to -c<n>, e.g. -c5 to run on five cores, and if you need to specify the name of the snakefile to use there is --snakefile.
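
The placeholders in the shell string of the hello_world rule look much like the fields used by Python's str.format. The sketch below imitates the substitution with plain Python; it is only an analogy, and using SimpleNamespace as a stand-in for the params object is an assumption made for illustration.

```python
from types import SimpleNamespace

# Stand-in for the rule's params section.
params = SimpleNamespace(greeting="Hello")

# The same template string as in the rule's shell section.
template = "echo '{params.greeting}, Snakemake!' > {output}"
command = template.format(params=params, output="hello.txt")
print(command)
# echo 'Hello, Snakemake!' > hello.txt
```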

Example: input/output aliases

You can give files aliases (names), but the named files need to come after all of the unnamed files in the list.

rule run_stan_model:
    input:
        CONFIGURATION_YAML,
        "stan-renewal-model.stan",
        script = "src/stan-renewal-runner.R"
    output:
        posterior_csv,
        stan_data,
    shell:
        "Rscript {input.script}"
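
The ordering restriction mirrors Python's own rule that positional arguments come before keyword arguments in a call. The sketch below illustrates it with an ordinary function; collect_inputs and is_valid_call are hypothetical helpers, and "config.yaml" is an assumed stand-in for the value of CONFIGURATION_YAML.

```python
def collect_inputs(*unnamed, **named):
    """Gather unnamed files into a list and named files into a dict."""
    return list(unnamed), dict(named)


unnamed, named = collect_inputs(
    "config.yaml",  # assumed value, for illustration only
    "stan-renewal-model.stan",
    script="src/stan-renewal-runner.R",  # the named entry comes last
)
print(unnamed)
print(named["script"])


def is_valid_call(source):
    """Report whether the source text parses as a Python expression."""
    try:
        compile(source, "<sketch>", "eval")
        return True
    except SyntaxError:
        return False


# A named entry followed by an unnamed one does not even parse.
print(is_valid_call('collect_inputs(script="a.R", "b.stan")'))
print(is_valid_call('collect_inputs("b.stan", script="a.R")'))
```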

Example: Linting code

Python

Make sure that you have black installed, then add the following rule to your Snakefile.

rule lint_code:
    shell:
        "black src"

Note that this assumes your source code is in src/.

Example: Minimal simulation study

The output of this workflow is a histogram, saved as a PNG, of the means of the simulated data sets.

Create a directory for the project (and some useful subdirectories and scripts) and touch a Snakefile.

mkdir snakemake_example
cd snakemake_example
touch Snakefile
touch config.yaml
mkdir src
touch src/simulate.py
touch src/compute_mean.py
touch src/generate_histogram.py
mkdir data
mkdir out

Since it is reasonable to consider the number of simulations as a configuration parameter we will create a config.yaml file to hold this.

N: 99

The Snakefile picks this file up through a hard-coded configfile directive. Note that you can also specify the configuration file at the command line using the --configfile argument.

# Snakefile
configfile: "config.yaml"
histogram_png = "out/histogram.png"

rule all:
    input:
        histogram_png

rule simulate:
    output:
        "out/sim_data_{index}.csv"
    shell:
        "python src/simulate.py {output}"

rule compute_mean:
    input:
        "out/sim_data_{index}.csv"
    output:
        "out/mean_{index}.txt"
    shell:
        "python src/compute_mean.py {input} {output}"

rule plot_histogram:
    input:
        expand("out/mean_{index}.txt", index=[f"{i:02d}" for i in range(1, config['N']+1)])
    output:
        histogram_png
    shell:
        "python src/generate_histogram.py {input} {output}"
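
The list comprehension in plot_histogram determines which files the whole workflow has to produce. As a quick check outside Snakemake, the sketch below rebuilds that list in plain Python, with N hard-coded to the value from config.yaml.

```python
N = 99  # the value stored in config.yaml

# Zero-padded indices, as in the plot_histogram rule.
indices = [f"{i:02d}" for i in range(1, N + 1)]
mean_files = [f"out/mean_{index}.txt" for index in indices]

print(len(mean_files))
print(mean_files[0], mean_files[-1])
```

Snakemake matches each of these names against the output pattern of compute_mean, which in turn requests the matching sim_data file from simulate. Note that the 02d format pads to two digits only, so an N of 100 or more would give indices of mixed width.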

Python scripts

# simulate.py
import sys
import csv
import random

def main():
    num_samples = 10
    csv_file = sys.argv[1]

    # newline='' lets the csv module control line endings itself
    with open(csv_file, 'w', newline='') as file:
        writer = csv.writer(file)
        for _ in range(num_samples):
            writer.writerow([random.random()])

if __name__ == "__main__":
    main()

# compute_mean.py
import sys
import csv

def main():
    csv_file = sys.argv[1]
    txt_file = sys.argv[2]

    with open(csv_file, 'r', newline='') as file:
        reader = csv.reader(file)
        data = [float(row[0]) for row in reader]

    mean = sum(data) / len(data)
    with open(txt_file, 'w') as file:
        file.write(str(mean))

if __name__ == "__main__":
    main()

# generate_histogram.py
import sys
import matplotlib.pyplot as plt

def main():
    png_file = sys.argv[-1]
    txt_files = sys.argv[1:-1]

    data = []
    for txt_file in txt_files:
        with open(txt_file, 'r') as file:
            data.append(float(file.read()))

    plt.figure()
    plt.hist(data)
    plt.savefig(png_file)

if __name__ == "__main__":
    main()

Author: Alexander E. Zarebski

Created: 2026-01-20 Tue 11:13
