One of my projects at university requires that I upload 600-1200 files to an external development server every few days. Originally, I did this by hand, using Filezilla. For some reason, Filezilla doesn’t work well for me with the particular server(s) I’m using, so out of 600 files, about 30-40 would fail, repeatedly. Then I had issues where Filezilla would skip files, trying to be smart about modification dates. That would be fine and dandy if my server weren’t set up for a completely different timezone to mine (on the order of 15 hours behind me).
I’d been doing this for months, and it finally hit its breaking point on Monday, when a demo to the client failed miserably. The new functionality wasn’t visible, and the old functionality was completely broken. To say the least, it was embarrassing. Later, in a fury, I investigated what went wrong. Lo and behold, Filezilla had silently failed to upload about 30-40 files. The failures weren’t too bad before, because at least Filezilla told you about them. But silent failures? No way. So I propose this wonderful script:
#!/usr/bin/python
import argparse
import os
import tarfile
import tempfile

parser = argparse.ArgumentParser(
    description='Upload a directory to a server via ssh.')
parser.add_argument('--version', action='version', version='%(prog)s 1.0')
parser.add_argument('-v', '--verbose', action='store_true', default=False,
                    dest='verbose', help='Give verbose output')
parser.add_argument('-s', '--source', action='store', default=os.getcwd(),
                    dest='source', help='Source directory to archive')
parser.add_argument('-d', '--destination', action='store', dest='destination',
                    help='Destination for the archive on the remote machine',
                    required=True)
parser.add_argument('-c', '--connection-string', action='store',
                    dest='connection', required=True,
                    help=('Connection string to use for connecting to the '
                          'remote machine. E.g. "user@domain.com"'))
parser.add_argument('-i', '--ignore', action='store', dest='ignore',
                    help='Comma-separated custom filenames to ignore')
args = parser.parse_args()

verbose = args.verbose
host = args.connection
destination = args.destination

if verbose:
    print("Chdir'ing to {}".format(args.source))
os.chdir(args.source)
# Capture the working directory *after* the chdir, so relative paths
# are computed against the source directory, not wherever we started.
cwd = os.getcwd()

bad_files = ['.svn', '.hg', '.git']
if args.ignore:  # args.ignore is None when -i isn't given
    bad_files.extend(args.ignore.split(','))

def file_filter(name):
    """Return True if the path should be excluded from the archive."""
    for bad in bad_files:
        if bad in name:
            return True
    return False

if verbose:
    print('Creating temporary file...')
with tempfile.NamedTemporaryFile(delete=False, suffix='.tar.gz') as _fileobj:
    filename = _fileobj.name

if verbose:
    print('Archiving directory...')
with tarfile.open(filename, mode='w:gz') as tar:
    for dirpath, dirnames, filenames in os.walk(cwd):
        for f in filenames:
            # we don't want the real path, just relative.
            directory = dirpath[len(cwd) + 1:]
            path = os.path.join(directory, f)
            if not file_filter(path):
                tar.add(path)

size = os.path.getsize(filename)
if verbose:
    print('Uploading {} bytes to {}...'.format(size, host))
execute_string = 'cat {} | ssh {} tar xz -C {}'.format(filename, host,
                                                       destination)
if verbose:
    print('executing {}'.format(execute_string))
os.system(execute_string)

if verbose:
    print('Removing temporary file...')
os.remove(filename)
Using Python, tar and ssh, it uploads a gzipped version of the current folder, ignoring SVN, Mercurial and Git repositories. It smashes 1200 files (about 7 MB) into 900 KB, uploads it, and uncompresses it on the other end automatically.
This has a few benefits for me, namely:
- I can automate this.
- It ignores version control files, cutting down a huge amount of useless transfers (on the order of 12 MB, or 2700 files, for my current project).
- It has powerful filters, so if I don’t want to upload images, I can skip those with a few extra characters.
- It **doesn’t fail miserably** like Filezilla. With the exception of real networking problems, it can’t skip files, and it won’t fail mysteriously on random files either.
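For the curious, the “powerful filters” are just substring matching against each path: anything passed via `-i` joins the built-in exclusion list. A minimal sketch of that logic, with the value of a hypothetical `-i .png,.jpg` invocation hard-coded:

```python
# The same exclusion logic the script uses, with -i '.png,.jpg' hard-coded
# in place of the parsed command-line argument.
bad_files = ['.svn', '.hg', '.git']
bad_files.extend('.png,.jpg'.split(','))

def file_filter(name):
    # True means "skip this path".
    return any(bad in name for bad in bad_files)

print(file_filter('assets/logo.png'))  # True: matches '.png'
print(file_filter('src/app.py'))       # False: nothing matches
print(file_filter('.git/config'))      # True: matches '.git'
```

Because it matches anywhere in the path, excluding a directory name like `.git` excludes everything under it too.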
I hope this is useful to anyone who has to do a lot of tedious reuploading of the same thing! If you think that using Python was silly, and that I could have written this with a shell script, then I agree. But I needed powerful filtering, and I wasn’t going to muck around with Bash for a script like this.
The fact that I’m using os.system feels bad, but let’s look at the alternative:
- I read the entire file into memory, with a single `open('filename').read()`.
- I then use one of the eleven million SSH wrappers for Python to connect to the remote client.
- I write that single, massive binary-blob holding variable to the SSH wrapper as input.
Yeah, that’s okay, but hey, I’d rather depend on the integrity of my operating system than on a heap of third party libraries. I’ve never used a system that didn’t have SSH and cat. Adding in the SSH wrapper is a needless dependency.
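(For completeness: the standard library’s subprocess module can also stream a file straight to a child process without reading it into memory, which sidesteps both the wrapper and the giant blob. This is a sketch, not what the script above does; `cat` stands in here for the real `ssh host tar xz -C dest` command so it runs anywhere.)

```python
import subprocess
import tempfile

# Hand the archive's file object to the child process as stdin, so
# Python never holds the file contents in memory. 'cat' is a local
# stand-in for ['ssh', host, 'tar', 'xz', '-C', destination].
with tempfile.NamedTemporaryFile() as archive:
    archive.write(b'pretend this is a gzipped tarball')
    archive.flush()
    archive.seek(0)
    result = subprocess.run(['cat'], stdin=archive, capture_output=True)

# Unlike os.system's string-built pipeline, the exit status is easy
# to inspect here.
print(result.returncode)  # 0 on success
```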
Final, completely unrelated note: when I started learning Python, I learnt getopt for my command line argument parsing. Boy, what a crappy decision. I learnt argparse for this script, and it is awesome. Generated usage statements? Hell yes. Automatic error handling? Double hell yes.
**Update:** I’ve uploaded the above script and another useful script, ss, to Google Code as networktools.