Updating a File from Multiple Threads in Python

Use locking to properly share resources across multiple threads

“You’re not obligated to win. You’re obligated to keep trying. To the best you can do everyday.” ― Jason Mraz

1. Introduction

Managing shared resources in multi-threaded programs is tricky. Say, for example, you have a counter variable which is read and modified by multiple threads. Without proper care, one thread can read a stale value or overwrite another thread's update. This leads to subtle and hard-to-debug errors.

One way to manage access to a shared resource across threads is to use a lock. A thread must acquire the lock before accessing the shared resource. If another thread already holds the lock, the requesting thread waits until the lock is released. When multiple threads are waiting on a lock and it is released, one of them is woken up and obtains the lock.
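As a minimal sketch of this idea, here is the counter example from the previous paragraph protected by a lock. The names counter, counter_lock and increment are made up for illustration, and the thread and iteration counts are arbitrary.

```python
import threading

counter = 0                      # shared resource (illustrative)
counter_lock = threading.Lock()  # guards every update to counter

def increment(times):
    global counter
    for _ in range(times):
        counter_lock.acquire()       # blocks while another thread holds the lock
        try:
            counter += 1             # read-modify-write, one thread at a time
        finally:
            counter_lock.release()   # lets one waiting thread proceed

threads = [threading.Thread(target=increment, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

Because every increment happens while the lock is held, the final value is 4 * 10000 = 40000 on every run.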

Let us examine how this works with regard to reading and writing a file from multiple threads. We have an application with multiple worker threads, each of which needs to read and update a file.

2. Read and Write a File Without Locking

For simplicity's sake, let us assume that the worker function needs to read the last line of a file, increment the number found there, and append the new number as a new line. As explained below, this arrangement lets us detect when threads trample on each other's updates.

Here is the function. It does not use any locking when accessing the shared resource.

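def runner(fname):
    # read through the file; n ends up holding the value on the last line
    with open(fname, 'r') as f:
        for ln in f:
            n = int(ln)
    n += 1
    # append the incremented value as a new last line
    with open(fname, 'a') as f:
        f.write(str(n) + '\n')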

3. Run Worker Function in Multiple Threads

And here is the main part of the program to create and start multiple threads to run the function above. As explained above, each thread tries to read and increment the value on the last line. This is set up so that we can track multiple threads interfering with each other.

For example, one thread might read the value 20. Before it increments the value to 21 and writes it back, another thread might come in and read the same value, 20. The second thread then also writes the value 21. So if duplicate values are found in the output file, it means that threads interfered with each other while updating the shared resource.

import sys
import threading

# filename to use
fname = sys.argv[1] if len(sys.argv) > 1 else 'counter.txt'

# re-initialize file
with open(fname, 'w') as f:
    f.write('0\n')

# create and start 50 worker threads
threads = []
for i in range(50):
    t = threading.Thread(target=runner, args=(fname,))
    threads.append(t)
    t.start()

# wait for all workers to finish
for t in threads:
    t.join()

When this program was run repeatedly, the values were written correctly only about 9 times out of 10. Intermittent failures of this kind are what make such bugs hard to track down.
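One way to check a run for interference is to look for repeated values in the output file. The helper below is only a sketch: the name find_duplicates is hypothetical, and it assumes the file contains one integer per line, as written by the worker function.

```python
from collections import Counter

def find_duplicates(fname):
    """Return the values that appear more than once in the file, sorted."""
    with open(fname, 'r') as f:
        # count how often each value occurs, skipping blank lines
        counts = Counter(int(ln) for ln in f if ln.strip())
    return sorted(v for v, c in counts.items() if c > 1)
```

Calling find_duplicates('counter.txt') after a run returns an empty list when no two threads wrote the same value, and the list of repeated values otherwise.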

4. Locking Code Section for File Update

Let us now update the worker function to obtain a lock before performing any other operation. (This is somewhat similar to using the synchronized keyword in Java.)

Here is the updated function. It uses a global variable lck and calls its acquire() and release() methods around the critical section.

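def runner(fname):
    # lck is the global lock created in the main program
    lck.acquire()
    try:
        with open(fname, 'r') as f:
            for ln in f:
                n = int(ln)
        n += 1
        with open(fname, 'a') as f:
            f.write(str(n) + '\n')
    finally:
        # release the lock even if reading or writing raises an exception
        lck.release()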
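The same critical section can also be written with the lock used as a context manager, which threading.Lock supports. This is a sketch of an alternative spelling, not a change in behavior: entering the with block acquires the lock, and leaving it releases the lock even if an exception is raised inside. The lock is created here only so that the snippet runs on its own.

```python
import threading

lck = threading.Lock()  # global lock shared by all worker threads

def runner(fname):
    # the lock is held for the whole read-increment-append sequence
    with lck:
        with open(fname, 'r') as f:
            for ln in f:
                n = int(ln)
        n += 1
        with open(fname, 'a') as f:
            f.write(str(n) + '\n')
```

This form is usually preferred because there is no way to forget the matching release() call.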

5. Creating the Lock

The lock itself is created in the main program using threading.Lock(). Apart from that, there are no other changes.

import sys
import threading

# lock shared by all worker threads
lck = threading.Lock()

# filename to use
fname = sys.argv[1] if len(sys.argv) > 1 else 'counter.txt'

# re-initialize file
with open(fname, 'w') as f:
    f.write('0\n')

# create and start 50 worker threads
threads = []
for i in range(50):
    t = threading.Thread(target=runner, args=(fname,))
    threads.append(t)
    t.start()

# wait for all workers to finish
for t in threads:
    t.join()

When we run this updated program, the race condition is gone: only one thread at a time can read and update the file, so each value is written exactly once and no thread interferes with another's update.
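A stricter check than looking for repeated values is to confirm that the file holds exactly the expected sequence. The function below is a sketch under the assumptions of this article: the file starts with 0 and each of the 50 threads appends one line. The name is_consistent is hypothetical.

```python
def is_consistent(fname, nthreads=50):
    """True if the file holds 0, 1, ..., nthreads, one value per line, in order."""
    with open(fname, 'r') as f:
        values = [int(ln) for ln in f if ln.strip()]
    return values == list(range(nthreads + 1))
```

With the locked worker function, is_consistent('counter.txt') should return True after every run.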

6. Conclusion

This article shows one approach for dealing with thread contention for critical resources. Without locks, threads step on each other and the results are unpredictable. Using a lock serializes access to the shared resource, so the threads can no longer interfere with each other's updates.