AANN 06/09/2025
rsync
Overview
In this post, we will visit an awesome tool from the 90's that hasn't aged a day. It is rsync (remote sync), a tool that takes care of copying files between locations (often across different machines on a network) in a way that avoids unnecessary copies and preserves information about the files.
When doing computational work, we often have to run bigger tasks on a server somewhere, and git isn't always the best solution for larger files. I often find myself working with a non-trivial dataset and code that will generate a substantial volume of output. I can work on the code locally and analyse a subset of the data, but when I want to analyse the whole thing properly, it all needs to go on the server. Then I need to pull some (but not all) of the output back from the server. Do this a few times and it gets annoying, so of course I'm going to script it.
Wait! Before you write yet another shell script to do the copying to-and-from, or click and drag a bunch of files, remember rsync can help.
Example
My use case for this is keeping some code, data and results synced up between my local machine and a server (in this case, the server is called brahms). I have one script to sync my code and some data to the server, and another to sync the important contents of the output directory back to my local machine (without all the intermediate stuff).
Important: The --dry-run flag does exactly what it says and is a nice way to check that you aren't accidentally going to mess things up!
Copying to the server
Here I wanted to send all the code and data from the current directory (./ which is the root of the project) to the server (brahms:~/projects/derp-simulation), but exclude existing output and the git repository I'm working in.
rsync --archive --progress --compress \
    --exclude "out/" \
    --exclude "/.git/" \
    ./ brahms:~/projects/derp-simulation
The --archive flag tells rsync to copy recursively while also preserving file information such as permissions, modification times and symbolic links. The --progress flag tells rsync to print progress as it goes, and --compress tells it to compress file data during the transfer. The --exclude options tell rsync to skip files and directories matching those patterns.
Copying from the server
This script also runs from the root of the project, ./, on my local machine, but pulls the output back from the server. A lot of intermediate files (e.g. the pickle, tree and XML files) get generated, and I don't need them, so there are exclude flags to prevent them being copied over.
rsync --archive --progress --compress \
    --exclude "*pickle" \
    --exclude "*time" \
    --exclude "*traj" \
    --exclude "*tree" \
    --exclude "*xml" \
    brahms:~/projects/derp-simulation/out/ ./out
Discussion
In this post, we have looked at an example of how rsync has helped me simplify the process of moving (large and numerous) files between my laptop and a server. Perhaps not the most exciting topic, but it is something that I find far less annoying thanks to rsync.
If you're not sold on the command line interface, there are graphical interfaces: Grsync looks like a good option, but I haven't really used it. Also, I have heard good things about Unison but that feels a bit heavy for my needs.
So remember, before you write yet another shell script to move files around or click and drag a bunch of stuff, keep rsync in mind.
Thanks
Thank you to the developers and maintainers of rsync and to David Pascall for helpful comments on a draft of this post.