AANN 06/09/2025


rsync


Overview

In this post, we will visit an awesome tool from the 90's that hasn't aged a day. It is rsync (remote sync), a tool that takes care of copying files between locations (often across different machines on a network) in a way that avoids unnecessary copies and preserves information about the files.

When doing computational work, we often have to do bigger tasks on a server somewhere, and git isn't always the best solution for larger files. I often find myself working on a non-trivial dataset and code which will generate a substantial volume of output. I can work on the code locally and analyse a subset of the data, but when I want to analyse the whole thing properly, it all needs to go on the server. Then I need to pull some (but not all) of the output back from the server. Do this a few times and it gets annoying, so of course I'm going to script it.

Wait! Before you write yet another shell script to do the copying to-and-from, or click and drag a bunch of files, remember rsync can help.

Info

rsync is available on most Linux machines. There is a useful introduction to rsync here (and a simpler one here). Commands take the following form:

rsync [OPTIONS] SRC DEST

Example

My use case for this is keeping some code, data and results synced up between my local machine and a server (in this case, the server is called brahms). I have one script to sync my code and some data to the server, and another to sync the important contents of the output directory back to my local machine (without all the intermediate stuff).

Important: The --dry-run flag does exactly what it says and is a nice way to check that you aren't accidentally going to mess things up!
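A quick local sketch of what a dry run looks like (the paths here are made up for illustration; swap in your own source and destination):

```shell
# Set up a toy source directory.
mkdir -p /tmp/dryrun_demo/src
touch /tmp/dryrun_demo/src/data.csv

# --dry-run prints what *would* be transferred, but changes nothing:
# the destination directory is not even created.
rsync --archive --dry-run --verbose \
    /tmp/dryrun_demo/src/ /tmp/dryrun_demo/dst
```

Once the file list looks right, drop --dry-run and run the command for real.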

Copying to the server

Here I wanted to send all the code and data from the current directory (./, which is the root of the project) to the server (brahms:~/projects/derp-simulation), but exclude existing output and the git repository I'm working in.

rsync --archive --progress --compress \
        --exclude "out/" \
        --exclude "/.git/" \
        ./ brahms:~/projects/derp-simulation

The --archive flag tells rsync to copy recursively while also preserving file information. The --progress flag tells rsync to print its progress, and --compress tells it to compress the diffs before sending them. The --exclude options provide a way to ignore files matching those patterns.

Copying from the server

This script also runs from the root of the project, ./, on my local machine, but pulls the output back from the server. There are a lot of intermediate files (e.g. the pickle, tree and XML files) that get generated, and I don't need them, so there are exclude flags to prevent them being copied over.

rsync --archive --progress --compress \
        --exclude "*pickle" \
        --exclude "*time" \
        --exclude "*traj" \
        --exclude "*tree" \
        --exclude "*xml" \
        brahms:~/projects/derp-simulation/out/ ./out
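The exclude patterns above match against filenames, so "*pickle" skips anything ending in pickle. A local sketch of that behaviour (the filenames are made up for illustration):

```shell
# A toy output directory with one file we want and one we don't.
mkdir -p /tmp/excl_demo/out
touch /tmp/excl_demo/out/results.csv /tmp/excl_demo/out/state.pickle

# Only results.csv survives the transfer; state.pickle is skipped.
rsync --archive --exclude "*pickle" \
    /tmp/excl_demo/out/ /tmp/excl_demo/local_out
```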

Discussion

In this post, we have looked at an example of how rsync has helped me simplify the process of moving (large and numerous) files between my laptop and a server. Perhaps not the most exciting topic, but it is something that I find far less annoying thanks to rsync.

If you're not sold on the command line interface, there are graphical interfaces: Grsync looks like a good option, but I haven't really used it. Also, I have heard good things about Unison but that feels a bit heavy for my needs.

So remember, before you write yet another shell script to move files around or click and drag a bunch of stuff, keep rsync in mind.

Thanks

Thank you to the developers and maintainers of rsync and to David Pascall for helpful comments on a draft of this post.

Author: Alexander E. Zarebski

Created: 2025-09-08 Mon 17:54
