AANN 06/09/2025

rsync
- Overview
- Background
- Example
- Discussion
- Thanks

`rsync`

Overview

In this post, we will visit an awesome tool from the 1990s that hasn't aged a day. It is rsync (remote sync), a tool that takes care of copying files between locations (often across different machines on a network) in a way that avoids unnecessary copies and preserves information about the files.

In computational work, we usually do bigger tasks on a server, and git isn't always the best way to share larger files. I often find myself working on a non-trivial dataset and code which will generate a substantial volume of output. I can work on the code locally and analyse a subset of the data, but when I want to analyse the whole thing properly, it all needs to go on the server. Then I need to pull some (but not all) of the output back from the server. Do this a few times and it gets annoying remembering which files are involved in which steps, so of course I'm going to script it.

Wait! Before you write yet another shell script to do the copying to-and-from, or click and drag a bunch of files, remember rsync can help.

Background

rsync is available on most Linux machines. There is a useful introduction to rsync here (and a simpler one here). Commands look like the following:

rsync [OPTIONS] SRC DEST
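One detail of SRC that is easy to trip over is the trailing slash. The following is a small sketch using throwaway local directories (the names src, with_slash and without_slash are made up for illustration) to show the difference:

```shell
# Scratch area under the system temp directory, so nothing real is touched.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/with_slash" "$work/without_slash"
echo "hello" > "$work/src/a.txt"

# Trailing slash on SRC: copy the *contents* of src into the destination.
rsync --archive "$work/src/" "$work/with_slash"

# No trailing slash on SRC: copy the directory itself, creating without_slash/src/.
rsync --archive "$work/src" "$work/without_slash"

# Show where a.txt ended up in each case.
find "$work/with_slash" "$work/without_slash" -type f

# rm -rf "$work"   # clean up when finished
```

This is why the examples in this post write the source as ./ and out/ with a trailing slash.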

Example

My use case for this is keeping code, data and results synced up between my local machine and a server (in this case, the server is called brahms). I have one script to sync my code and data to the server, and another to sync the important output back to my local machine (without all the intermediate output that I don't really care about).

Important: The --dry-run flag does exactly what it says: rsync reports what it would transfer without actually changing anything. It is a nice way to check that you aren't accidentally going to mess things up!
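As a minimal sketch of a dry run, here is one between two throwaway local directories (src, dest and results.csv are hypothetical names). Adding --verbose makes rsync list the files it would transfer:

```shell
# Scratch area under the system temp directory.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"
echo "data" > "$work/src/results.csv"

# Report what would be copied, but do not write anything to dest.
rsync --archive --verbose --dry-run "$work/src/" "$work/dest"

# dest is still empty after the dry run, so this prints nothing.
ls -A "$work/dest"

# rm -rf "$work"   # clean up when finished
```

Once the listing looks right, the same command without --dry-run does the real copy.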

Copying to the server

Here I wanted to send all the code and data from the current directory (./ which is the root of the project) to the server (brahms:~/projects/derp-simulation), but to exclude existing output and the git repository I'm working in.

rsync --archive --progress --compress \
        --exclude "out/" \
        --exclude "/.git/" \
        ./ brahms:~/projects/derp-simulation

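The --archive flag tells rsync to copy recursively while preserving file information such as permissions, modification times and symbolic links. The --progress flag tells rsync to print progress as it goes, and --compress tells it to compress file data in transit (to reduce network usage). Each --exclude flag gives a pattern for files to skip: "out/" matches any directory named out, while the leading slash in "/.git/" anchors the pattern to the top of the transfer, so only the project's own .git directory is skipped.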

There is also an --include flag, which tells rsync to include files that would otherwise be excluded by a later --exclude flag. rsync acts on the first rule that matches each file, so the order of the flags matters; be careful. Dry runs will help in setting this up correctly…
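To make the ordering concrete, here is a small local sketch (the file names summary.csv, scratch.tmp and main.py are invented for illustration) that keeps one file from out/ while skipping everything else in that directory:

```shell
# Scratch area with a tiny fake project.
work=$(mktemp -d)
mkdir -p "$work/src/out" "$work/dest"
echo "keep" > "$work/src/out/summary.csv"
echo "skip" > "$work/src/out/scratch.tmp"
echo "code" > "$work/src/main.py"

# The include comes first, so summary.csv matches it before it can
# reach the broader exclude that covers the rest of out/.
rsync --archive \
        --include "out/summary.csv" \
        --exclude "out/*" \
        "$work/src/" "$work/dest"

# List what was copied: main.py and out/summary.csv, but not scratch.tmp.
find "$work/dest" -type f | sort

# rm -rf "$work"   # clean up when finished
```

With the two flags swapped, summary.csv would match the exclude first and be skipped along with everything else in out/.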

Copying from the server

This script also runs from the root of the project, ./, on my local machine, but pulls the output back from the server. There are a lot of intermediate files (e.g. the pickle, tree and XML files) that get generated, and I don't need them, so there are exclude flags to prevent them being copied over.

rsync --archive --progress --compress \
        --exclude "*pickle" \
        --exclude "*time" \
        --exclude "*traj" \
        --exclude "*tree" \
        --exclude "*xml" \
        brahms:~/projects/derp-simulation/out/ ./out
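If you want to preview this pull before running it for real, one option is to wrap the command in a small shell function that passes any extra options through to rsync. This is only a sketch: the function name pull_results is made up for illustration, and the exclude patterns are simply the ones from the command above.

```shell
# pull_results SRC DEST [extra rsync options...]
# Hypothetical helper: copies SRC to DEST, skipping the intermediate files.
pull_results() {
    src=$1
    dest=$2
    shift 2
    # Any remaining arguments (e.g. --dry-run --verbose) go straight to rsync.
    rsync --archive --compress \
        --exclude "*pickle" \
        --exclude "*time" \
        --exclude "*traj" \
        --exclude "*tree" \
        --exclude "*xml" \
        "$@" "$src" "$dest"
}

# Preview first, then do the real transfer:
# pull_results brahms:~/projects/derp-simulation/out/ ./out --dry-run --verbose
# pull_results brahms:~/projects/derp-simulation/out/ ./out --progress
```

Note that these patterns match on the end of the file name, so "*time" would also skip a file called, say, runtime.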

Discussion

In this post, we have looked at an example of how rsync has helped to simplify moving (large and numerous) files between my laptop and a server. Perhaps not the most exciting topic, but it is something that I find far less annoying when using rsync.

If you're not sold on the command line interface, there are graphical interfaces: Grsync looks like a good option, and there is a pure Python clone, but I haven't used either. Also, I have heard good things about Unison, but that feels a bit heavy for my needs.

So remember, before you write yet another shell script to move files around or click and drag a bunch of stuff, keep rsync in mind.

Thanks

Thank you to the developers and maintainers of rsync and to David Pascall for helpful comments on a draft of this post.