Python Script for Automatically Copying Modified Files

I had a situation at work where I needed to copy static resources from one directory to another any time they changed. A simple symlink wouldn’t do, because the destination was on a shared drive, and not all users would have access to my machine. Although there are existing solutions for this problem, it seemed simple enough to write my own Python script. Today’s article describes my Python script for monitoring files for changes and copying them to another location.

How to do it…

The complete script is available as a gist. It is a command-line script that uses Python’s built-in argparse module to parse three arguments: source, destination, and source_map:

import argparse
import os
import time

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Listen for file changes and mirror changed files to a second location.')
    parser.add_argument('-s', '--source', dest='source', action='store',
        default=None, help='The source file', required=False)
    parser.add_argument('-d', '--destination', dest='destination',
        action='store', default=None, help='The destination file',
        required=False)
    parser.add_argument('-m', '--source_map', dest='source_map', action='store',
        default=None, help='A CSV file mapping multiple source files to '
                           'multiple targets', required=False)
    args = parser.parse_args()

    if not (args.source_map or (args.source and args.destination)):
        raise ValueError(
            "You must provide either a source_map of files or "
            "a source and destination file")

Then it creates a list of files that need to be monitored for changes:

    file_list = []

    if args.source and args.destination:
        file_list.append((args.source, args.destination,))

    if args.source_map:
        source_map = os.path.normpath(args.source_map)
        with open(source_map, 'rb') as f:
            for line in f:
                line = line.strip()
                # skip blank lines so they do not produce bogus entries
                if not line:
                    continue
                file_list.append(tuple(line.split(",")))

The meat of the script is an indefinite loop that checks whether the modification time of any source file has changed and updates the corresponding destination file when it has:

    last_checked_map = {}

    # currently only ctrl+c will terminate
    while True:
        for source, destination in file_list:
            source_file = os.path.normpath(source)
            destination_file = os.path.normpath(destination)
            try:
                stat = os.stat(source_file)
            except OSError as e:
                # the source file is missing or unreadable; skip it this pass
                print "Encountered an OSError, skipping file:"
                print e
                continue
            last_time = last_checked_map.get(source_file)

            # copy on the first pass (no recorded time) or when the
            # modification time is newer than the last copy we made
            if not last_time or stat.st_mtime > last_time:
                with open(source_file, 'rb') as f:
                    filedata = f.read()
                with open(destination_file, 'wb') as f:
                    f.write(filedata)
                last_checked_map[source_file] = stat.st_mtime
                print "File %s changed, updated %s" % (
                    source_file, destination_file)

        time.sleep(1)

How it works…

The argparse module defines and checks the arguments, then returns a namespace object used to reference them. We raise an exception when neither a source/destination pair nor a source_map is provided, since these define which files to monitor and where to copy them when they change.
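
For example, here is a quick sketch that reuses the parser defined above; the script name and file paths are placeholders:

# Equivalent to running the script as:
#   python mirror_files.py -s local/style.css -d /shared/static/style.css
args = parser.parse_args(['-s', 'local/style.css',
                          '-d', '/shared/static/style.css'])
assert args.source == 'local/style.css'
assert args.destination == '/shared/static/style.css'
assert args.source_map is None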

The provided files are appended to a list of tuples, where the first value is the source and the second value is the destination. The file specified by source_map allows for many files to be monitored, and should be a CSV with source_path,destination_path on each line.
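
For example, a source_map file with two entries (the paths are purely illustrative) would produce a file_list like the following:

# Hypothetical source_map.csv contents:
#
#   static/css/site.css,/shared/static/css/site.css
#   static/js/app.js,/shared/static/js/app.js
#
# After parsing, file_list holds one (source, destination) tuple per line:
file_list = [
    ('static/css/site.css', '/shared/static/css/site.css'),
    ('static/js/app.js', '/shared/static/js/app.js'),
]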

The infinite while loop iterates over the files, using os.stat to see if the modification time has changed. The last modified time for each source file is stored in the last_checked_map dictionary, keyed by the path to the source file. If for some reason a source file does not exist, we print the error but continue. Because last_checked_map starts out empty, every source file is copied to its destination on the first pass after the script starts.

To copy the source file, we read its data completely and then write it to the destination file, using basic Python file functions. The loop then sleeps for a second to avoid pegging the processor.
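
As an aside, the standard library's shutil module could replace the manual read/write with a single call; here is a minimal sketch of that alternative, assuming the same source_file and destination_file variables as above:

import shutil

# copy2 copies the file contents and also preserves metadata
# such as the modification time
shutil.copy2(source_file, destination_file)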

For now, I just use ctrl+c to stop the script.
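
If a cleaner exit is wanted, the polling loop could be wrapped so that ctrl+c stops the script without a traceback; a minimal sketch, not part of the script above:

import time

try:
    while True:
        # the polling work from the loop above would go here
        time.sleep(1)
except KeyboardInterrupt:
    print "Stopping file mirror."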

There’s more…

This is a basic script and probably a good homework assignment for an introductory Python class, but it is also really useful. There is a lot of room for improvement, such as monitoring all files in an entire directory tree, listening for standard input to stop the script, or running a command against the source file before populating the destination file. The source code is on GitHub, so please clone, use, and improve.
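
As a sketch of the first improvement, building the file list from an entire directory tree might look something like this; build_file_list and its directory arguments are hypothetical and not part of the current script:

import os

def build_file_list(source_dir, destination_dir):
    """Pair every file under source_dir with its mirrored path
    under destination_dir (hypothetical helper)."""
    file_list = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            source_path = os.path.join(root, name)
            relative = os.path.relpath(source_path, source_dir)
            file_list.append(
                (source_path, os.path.join(destination_dir, relative)))
    return file_list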

Encrypting Settings Files

When I was developing for Votizen, I always felt uneasy that our settings files, with many important passwords and auth keys, were stored unencrypted on a third-party service. Sure, SSL was used for fetching and committing code, and only a few SSH public keys were allowed access to our repository, but it was still one more opportunity for the site to be hacked. I recently read John Resig’s post Keeping Passwords in Source ...

Django Template Tags for JavaScript Deferment

I have been slowly working to improve my Django Shared project, which I use as the basis for all my Django projects. Recently, I added several new templatetags for deferring content and scripts: async_script, defer_html, defer_script, and render_deferred_html. Today’s article will cover how to use these templatetags in your own projects.

Getting ready

You will need to include django-shared in your own project, or extract the parts from templatetags/common.py. If you use pip, ...

A Python JSON Client for the LinkedIn API

The LinkedIn API is fairly robust and well documented, but it lacks a good JSON-based Python client for interacting with it. I recently open-sourced LinkedIn-API-JSON-Client to fill this gap. It currently implements all the user profile related API calls, and is used in production by Votizen.com. This is a simple tutorial for how you can use it in your application as well.

Getting ready

You will need Python 2.x running in a ...

Using Sphinx to Easily Manage Engineering Documents

As an engineering organization grows, eventually there is a need to document more than just code comments. When this happens, there are many solutions for handling documentation, and most of them are equally bad. Previously, at Votizen.com we used a wiki, but it was difficult to organize and search, and links and documents rotted. Recently, we chose to ditch the wiki, converting our documents to reStructuredText (.rst), and instead use the tool Sphinx to compile our documents ...

Django Foreign Key Object Patcher

This article may be helpful for optimizing query performance when fetching foreign key objects from large tables in Django. Most of the time, using the select_related queryset function is enough to group the population of foreign key objects into the original query. However, if you are using a relational database and the tables used in select_related become sufficiently large (1m+ rows), then select_related will begin to perform very poorly. Today, we will discuss a generic ...

Deployment/Monitoring Strategies

This article finishes the series on building a continuous deployment environment using Python and Django.

If you have been following along, hopefully you're already on your way to building a continuous deployment environment. The final touch is to set up a deployment and monitoring ...

Using Celery to Handle Asynchronous Processes

This article continues the series on building a continuous deployment environment using Python and Django.

Those of you following along now have the tools to set up a Python/Django project, fully test it, and deploy it. Today we will be discussing the Celery package, ...

Using Fabric for Painless Scripting

This article continues the series on building a continuous deployment environment using Python and Django.

If you have been following along, you now have the tools to set up a Python/Django project and fully test it. Today we will be discussing the Fabric package, ...

Coverage and Mock

This article continues the series on building a continuous deployment environment using Python and Django.

So far we have covered the basics of setting up a Django project and testing it. Today we will discuss how to ensure your tests fully cover the ...