
Tutorial: Scientific Python for Distributed / Parallel Computing

Seminar given at CMM by Juan-Carlos Maureira. The slides are available in Spanish only.


This tutorial introduces concepts of parallel and distributed computing in Python. While it gives hints on how to improve Python's performance on particular problems, keep in mind that Python is an "interpreted" language, so it will not always perform fast. Remember that the CPython interpreter executes only one thread of Python bytecode at a time (because of the Global Interpreter Lock), and that dynamic typing will generally make Python slower than compiled, statically typed languages.

Basic Concepts

Distributed and Shared Memory

An important starting point is to distinguish a shared memory system from a distributed memory system, in order to understand how Python can take advantage of each. The first is nothing more than a fancy name for a single computer (multi-core included). The key point is that each core (or processor) can access all the memory directly through the memory controllers; interconnects such as QPI link the processors and their memory banks, giving rise to NUMA architectures. The second (a distributed memory system) is the formal name for a set of computers interconnected by a fast network and a shared filesystem. Here, each "computer" has its own private memory and shares information via the network or the filesystem.

Parallel and Distributed

These concepts are a bit tricky to use, since the literature uses them to describe both parallel code and distributed code running on HPC systems. So, purely for the sake of agreeing on vocabulary, I propose the following definitions: let us say parallel when the code (the same program or different ones) requires communication among processors to fulfill its goal, whether it runs on a shared or a distributed memory system; and let us say distributed when the code requires no communication among processors to fulfill its goal. Therefore, if a code uses MPI only to communicate results to a master process, while each result is computed by an individual and independent process, we say the computation is distributed; and if a code uses files or IPC to communicate variable states and to synchronize a set of processes computing a result, we say the computation is parallel. Note that the two cases, using MPI merely to gather partial results and using files or IPC for process communication and synchronization, are opposite examples of how appropriately each technique is used.

Processes and Threads

The difference between them is as simple as this: processes run in separate memory spaces, while threads run in the same memory space. However, a shared memory page can be mapped by several processes so that they share variable states and values, and each thread has its own private memory segment (its stack) inside the process's memory space. Both statements hold on a shared memory system, but on a distributed memory one the story is different. Two cases: (i) when processes (or threads) are on different computers, memory segments must be transported over the network; and (ii) when processes (or threads) are on the same computer, shared or global memory can be used to communicate between them. Of course you can handle both cases yourself, but why reinvent the wheel when MPI does this automatically?

Mutex and Semaphores

Frameworks for distributed and parallel computing in Python

(only a few of them)

There are several frameworks that provide Python with parallel/distributed processing. Here we cover only the most common ones (IMHO), discussing only their most important aspects. For a deeper review, please refer to each framework's homepage. Note that we are talking about Python 2; in Python 3 the story is a bit different (and not covered in this document).

MultiProcessing (import multiprocessing)

The Python multiprocessing library is a module that gives Python 2 the ability to run an algorithm in several processes on the same machine (i.e. a shared memory system). It is based on the worker paradigm, which is nothing more than a process running a predefined function with different arguments. It provides abstractions such as Pool, Process, Lock, Queue, Array and Manager to handle data among processes.


The Pool maps a function f(x) over a set of values of x. You define the number of workers you want in the pool, and then map the function over the arguments you want to run. The following code illustrates a Pool of 5 workers that evaluates the function f(x) for x=1,2,3 at the same time, since the pool uses a different process to execute the body of f for each given value of x.

$ cat
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))


The output of this code is

$ python
[1, 4, 9]

(Work in Progress)


Threads (import threading)

Parallel Python (import pp)

Guided Example: PowerCouples


Pure Python Implementation

C-function Implementation

Shared Memory: Implementation with Multiprocessing (MP)

Shared Memory: Implementation with Threading (threading)

Distributed Memory: Implementation with Parallel Python (pp)