How to Avoid Deadlock when Calling an External Command from Python

Use pipes from multiple threads for bidirectional communication.

“The truth will set you free, but first it will piss you off.” ― Joe Klaas, Twelve Steps to Happiness

1. Introduction

Invoking an external process from Python and interacting with it can be quite tricky. This is especially true if the interaction is duplex, i.e. it involves both writing to the process and reading from it. Such an interaction can deadlock: the parent blocks writing to a full pipe while the child blocks writing output that the parent is not yet reading, so each process ends up waiting for the other. One way to avoid such deadlocks is to move the reading and writing into separate threads. We demonstrate that approach in this article.

We use subprocess.Popen() to call an external command from Python. We covered the basics of this process in an earlier article which you can review.
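The deadlock described above can be reproduced safely with a timeout. The following is a minimal sketch, assuming Python 3 and using a small Python one-liner as a stand-in child that echoes stdin to stdout; the names ECHO and write_without_reading are made up for this illustration. The parent writes to the child but never reads the child's output, and a helper thread lets us detect that the write has stalled instead of hanging the whole program.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in child: echoes stdin to stdout line by line.
ECHO = "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line)\n"

def write_without_reading(payload, timeout=2.0):
    """Return True if writing payload stalls because nobody reads the child's output."""
    child = subprocess.Popen([sys.executable, "-c", ECHO],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                             bufsize=0)

    def push():
        try:
            child.stdin.write(payload)  # blocks once both pipe buffers are full
        except OSError:
            pass                        # the pipe broke when the child was killed

    t = threading.Thread(target=push)
    t.daemon = True
    t.start()
    t.join(timeout)                     # give the write a chance to finish
    stalled = t.is_alive()
    child.kill()                        # break the stall so we can clean up
    child.wait()
    t.join()
    child.stdin.close()
    child.stdout.close()
    return stalled

line = b"x" * 79 + b"\n"
small_stalled = write_without_reading(line)           # fits in the pipe buffers
large_stalled = write_without_reading(line * 100000)  # about 8 MB, does not fit
print(small_stalled, large_stalled)
```

The small write completes because the operating system buffers it; the large one stalls because the child cannot write its output anywhere and therefore stops reading its input.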

2. Adding Line Numbers to a Text File

Let us look at a simple way to add line numbers to a text file by having an external command do the numbering. The parent process reads the file line by line and writes each line to the child process, which reads the line and writes it back with the line number added. This process illustrates repeated duplex communication between the parent and the child process.

We use the awk program on Linux as the child process and invoke it as follows to read stdin and write the output to stdout.

awk '{printf "[%5d] %s\n", NR, $0}'
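For reference, here is a pure-Python sketch of what that awk program does to each line: NR is the 1-based record number and $0 is the line without its trailing newline. The function name number_lines is an assumption made for this illustration; it is handy for checking the output of the pipeline, not a replacement for it.

```python
def number_lines(lines):
    # Mimics the awk program: right-align the record number in a field of
    # width 5, then print the original line.
    numbered = []
    for nr, line in enumerate(lines, start=1):
        record = line.rstrip("\n")  # awk's $0 excludes the newline
        numbered.append("[%5d] %s\n" % (nr, record))
    return numbered

sample = ["PRIDE AND PREJUDICE\n", "\n", "By Jane Austen\n"]
for out in number_lines(sample):
    print(out, end="")
```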

3. Multi-threaded Reader

A thread is spawned for reading the input file line-by-line and writing it to the external process started via a pipe using subprocess.Popen. The function is invoked with two arguments: a file from which text is read, and a pipe with the child process at the other end.

The function reads text line-by-line till it hits EOF.

while True:
    ln = f.readline()
    if not ln: break

It writes the line to the stdin of the pipe (at the other end of which is the awk process).

pipe.stdin.write(ln)

Most important: to signal end-of-file (EOF) to the child process, you must close the pipe’s stdin. Until that happens, awk keeps waiting for more input; once it sees EOF, it can finish writing its output and exit.

pipe.stdin.close()
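The effect of closing stdin can be observed directly. This is a small sketch, assuming Python 3 and a Python one-liner as a stand-in for awk (the ECHO script is hypothetical): the child is still running after it has received a line, and exits only once its stdin is closed.

```python
import subprocess
import sys

# Hypothetical stand-in child: echoes stdin to stdout until it sees EOF.
ECHO = "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line)\n"

child = subprocess.Popen([sys.executable, "-c", ECHO],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)
child.stdin.write("hello\n")
child.stdin.flush()
still_running = child.poll() is None  # the child is waiting for more input

child.stdin.close()                   # EOF: the child's read loop ends
echoed = child.stdout.read()          # returns once the child closes stdout
exit_code = child.wait()
child.stdout.close()
print(still_running, repr(echoed), exit_code)
```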

Here is the complete reader function.

def reader(fname, pipe):
    with open(fname, 'r') as f:
        while True:
            ln = f.readline()
            if not ln: break
            pipe.stdin.write(ln)
        pipe.stdin.close()

4. Multi-threaded Writer

Another thread is created with the writer function to read output from the awk process and write it to the output file. It is invoked with the output file path and the pipe as arguments.

The function reads the output from the child in a loop, and exits when it sees EOF.

while True:
    ln = pipe.stdout.readline()
    if not ln: break

Write the line to the output file (opened previously).

f.write(ln)

Here is the complete writer function.

def writer(oname, pipe):
    with open(oname, 'w') as f:
        while True:
            ln = pipe.stdout.readline()
            if not ln: break
            f.write(ln)
        pipe.stdout.close()

5. Starting the Child Process and Creating Threads

A pipe is created using subprocess.Popen with the awk process as the child.

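# universal_newlines=True opens the pipes in text mode, so the str lines read
# from the file can also be written to the pipe under Python 3.
p = subprocess.Popen(['/usr/bin/awk', '{printf "[%5d] %s\\n", NR, $0}'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, universal_newlines=True)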

Create and start the reader and writer threads. We picked a large input file to add line numbers to, so that the example involves many rounds of reading and writing. The input file is called pride.txt and is about 13500 lines long. The output with the line numbers added is called output.txt.

threads = []
threads.append(threading.Thread(target = reader, args=('pride.txt',p,)))
threads.append(threading.Thread(target = writer, args=('output.txt',p,)))
for t in threads:
    t.start()

6. Waiting for the Threads to Complete

Once the reader and writer threads are off doing their jobs, the main thread can wait for them to complete.

print('joining ..')
for t in threads:
    t.join()
p.wait()  # reap the awk process once both threads have finished
print('all done.')
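The steps in sections 3 to 6 can also be packaged as a reusable function. The sketch below assumes Python 3; the names feed, drain, run_filter and NUMBERER are made up for this illustration, and a Python one-liner stands in for awk so that the example runs on systems without /usr/bin/awk. The input is made large enough that a single-threaded version would stall on a full pipe.

```python
import os
import subprocess
import sys
import tempfile
import threading

def feed(src_path, child):
    # Copy the input file into the child's stdin, then close it to signal EOF.
    with open(src_path, "r") as src:
        for line in src:
            child.stdin.write(line)
    child.stdin.close()

def drain(dst_path, child):
    # Copy everything the child prints into the output file.
    with open(dst_path, "w") as dst:
        for line in child.stdout:
            dst.write(line)
    child.stdout.close()

def run_filter(cmd, src_path, dst_path):
    child = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                             universal_newlines=True)
    workers = [threading.Thread(target=feed, args=(src_path, child)),
               threading.Thread(target=drain, args=(dst_path, child))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return child.wait()  # exit status of the child

# Stand-in for: ['/usr/bin/awk', '{printf "[%5d] %s\\n", NR, $0}']
NUMBERER = [sys.executable, "-c",
            "import sys\n"
            "for n, line in enumerate(sys.stdin, 1):\n"
            "    sys.stdout.write('[%5d] %s' % (n, line))\n"]

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "input.txt")
dst = os.path.join(workdir, "output.txt")
with open(src, "w") as f:
    for i in range(50000):
        f.write("line %d\n" % i)  # several hundred KB, more than a pipe buffer

status = run_filter(NUMBERER, src, dst)
with open(dst) as f:
    numbered = f.readlines()
print(status, len(numbered), numbered[0].rstrip(), numbered[-1].rstrip())
```

Passing the command as a parameter keeps the threading logic independent of the particular filter being run.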

7. Task Completed – Input and Output

Here is a part of the input text to which we were attempting to add line numbers.

...
PRIDE AND PREJUDICE

By Jane Austen



Chapter 1


It is a truth universally acknowledged, that a single man in possession
of a good fortune, must be in want of a wife.

However little known the feelings or views of such a man may be on his
first entering a neighbourhood, this truth is so well fixed in the minds
of the surrounding families, that he is considered the rightful property
of some one or other of their daughters.
...

And here is the output with the line numbers added:

...
[   32] PRIDE AND PREJUDICE
[   33]
[   34] By Jane Austen
[   35]
[   36]
[   37]
[   38] Chapter 1
[   39]
[   40]
[   41] It is a truth universally acknowledged, that a single man in possession
[   42] of a good fortune, must be in want of a wife.
[   43]
[   44] However little known the feelings or views of such a man may be on his
[   45] first entering a neighbourhood, this truth is so well fixed in the minds
[   46] of the surrounding families, that he is considered the rightful property
[   47] of some one or other of their daughters.
...

8. The Complete Program

Here is the complete program for review.

import subprocess, threading

def reader(fname, pipe):
    with open(fname, 'r') as f:
        while True:
            ln = f.readline()
            if not ln: break
            pipe.stdin.write(ln)
        pipe.stdin.close()

def writer(oname, pipe):
    with open(oname, 'w') as f:
        while True:
            ln = pipe.stdout.readline()
            if not ln: break
            f.write(ln)
        pipe.stdout.close()

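# universal_newlines=True opens the pipes in text mode (needed on Python 3).
p = subprocess.Popen(['/usr/bin/awk', '{printf "[%5d] %s\\n", NR, $0}'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, universal_newlines=True)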
threads = []
threads.append(threading.Thread(target = reader, args=('pride.txt',p,)))
threads.append(threading.Thread(target = writer, args=('output.txt',p,)))
for t in threads:
    t.start()

print('joining ..')
for t in threads:
    t.join()
p.wait()  # reap the awk process once both threads have finished
print('all done.')

9. Review

In this article, we showed how to use multi-threading to read from and write to an external command simultaneously from Python. Keeping the reads and writes in separate threads ensures that neither pipe fills up and blocks both processes, which is what causes the deadlock.