How to set up a private IPython parallel cluster

IPython Notebook (now Jupyter Notebook) is frickin’ awesome. With the parallel extensions, it’s even awesomer.

I want to use the spare desktops around my house to speed up my parallel jobs. There is plenty of documentation on how to do this, but it is very long. This is the short version.

My setup is:

MacBook running OS X 10.8
Two desktops running Ubuntu

On the MacBook, you need ipython+notebook+parallel. I use MacPorts, so you can install this with:

sudo port install py-ipython +notebook +parallel +scientific

On the Ubuntu machines, you just need the ipython-notebook package, and the rest of the dependencies will install automatically:

sudo aptitude install ipython-notebook

To start some workers (‘engines’) on your local machine, run:

ipcluster start --n=4

To start a Notebook instance, run:

ipython notebook

You should get a Notebook instance in your web browser. Fire it up and run:

from IPython.parallel import Client

c = Client()
c.ids
c[:].apply_sync(lambda: "Hello, world!")

You should get back:

[0, 1, 2, 3]
['Hello, world!', 'Hello, world!', 'Hello, world!', 'Hello, world!']

The first list holds the engine IDs; the second holds one greeting per engine.

Shut down the cluster on your local computer by pressing Ctrl-C in the terminal window running it.

Set up your laptop

Cluster configuration is described in a ‘profile’. On your local machine, run:

ipython profile create --parallel --profile=home

This creates a profile called ‘home’. Modify ~/.ipython/profile_home/ipcluster_config.py:

c = get_config()

c.IPClusterEngines.engine_launcher_class = 'SSH'
c.LocalControllerLauncher.controller_args = ["--ip='*'"]

c.SSHEngineSetLauncher.engines = {
    'localhost': 4,
    'tyler': 4,
    'par': 4,
}

# FIXME NASTY HACK We need to use non-system-default Python on the Mac
# (i.e. the /opt path in the default config below) but we want default
# Python on the Linux machine. I couldn't figure out a way to specify it
# on a per-host basis, and profile/bashrc/whatever are not executed for
# ssh login, and the ~ alias doesn't seem to work, so... I created a
# symlink in / for ipengine (i.e. `ln -s /opt/local/bin/ipengine
# /ipengine`). Horrible, but it works!
c.SSHEngineSetLauncher.engine_cmd = ['/ipengine'] # works on Linux (though nothing special is necessary there)
#c.SSHEngineSetLauncher.engine_cmd = ['/opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python', '-c', 'from IPython.parallel.apps.ipengineapp import launch_new_instance; launch_new_instance()'] # works on Mac

Note that --ip='*' makes the controller listen on every network interface, so its ports on your MacBook are open to the whole local network. If you’re on an untrusted network segment, don’t do this! A future revision of this guide might deal with doing everything across SSH port forwarding.

Set up the cluster nodes

You need to be able to log in to the remote Linux machines via SSH without being prompted for a password (i.e. with SSH keys), ideally using the same username as on the frontend machine. Check here if you’re not sure how.

Per the above dodgy hack, you also need to link /ipengine to the relevant ipengine binary on every machine in the cluster. For the Macs:

sudo ln -s /opt/local/bin/ipengine /ipengine

Also make sure you enable SSH (System Preferences -> Sharing -> Remote Login). You also need passwordless login on your local machine; test with ssh localhost.

For the Ubuntu machines:

sudo ln -s /usr/bin/ipengine /ipengine

ipcluster will try to share config files with the engines. By default, the directories do not exist on the Linux hosts. On each of them, run:

mkdir -p .ipython/profile_home/security/
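
Before starting the cluster, it can save some head-scratching to check all of the above in one go. Here is a minimal sketch of a pre-flight check, not part of IPython: the names HOSTS, CHECKS, ssh_command and preflight are my own, and the host list assumes the engines dict from the config above. It relies on ssh's BatchMode option, which makes ssh fail rather than prompt for a password.

```python
import subprocess

# Hostnames taken from the engines dict in ipcluster_config.py (assumption:
# adjust to match your own config).
HOSTS = ["localhost", "tyler", "par"]

# Each check is a shell command that exits with status 0 on success.
CHECKS = {
    "ipengine symlink": "test -x /ipengine",
    "security directory": "test -d .ipython/profile_home/security",
}


def ssh_command(host, remote_cmd):
    """Build an ssh invocation that fails instead of asking for a password."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            host, remote_cmd]


def preflight(hosts=HOSTS, run=subprocess.call):
    """Return the (host, check name) pairs that failed."""
    failures = []
    for host in hosts:
        for name, remote_cmd in sorted(CHECKS.items()):
            # A non-zero exit status means the check (or the login) failed.
            if run(ssh_command(host, remote_cmd)) != 0:
                failures.append((host, name))
    return failures
```

Calling preflight() returns an empty list when everything is in place. If passwordless login itself is broken for a host, every check for that host fails, so fix SSH first and re-run.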

Running the thing

To start the cluster, on your local machine, run:

ipcluster start --profile=home
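
Engines launched over SSH can take a little while to register with the controller, so a client that connects straight away may see fewer engines than expected. A rough sketch of a helper that polls until they have all turned up follows. The function wait_for_engines is hypothetical (it is not part of IPython) and takes any callable that returns the current engine IDs.

```python
import time


def wait_for_engines(get_ids, expected, timeout=60.0, poll=1.0):
    """Poll get_ids() until at least `expected` engines have registered."""
    deadline = time.time() + timeout
    while True:
        ids = list(get_ids())
        if len(ids) >= expected:
            return ids
        if time.time() >= deadline:
            raise RuntimeError("only %d of %d engines registered"
                               % (len(ids), expected))
        time.sleep(poll)
```

In the notebook, assuming the ‘home’ profile above with 4 engines on each of three hosts, you would create c = Client(profile='home') and then call wait_for_engines(lambda: c.ids, 12).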

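In the notebook, connect to this cluster with c = Client(profile='home') rather than the bare Client() used earlier, which connects to the default profile. If you add new machines to the cluster, you need to re-run the Client(profile='home') line to get access to them. This also fixes things if you break the cluster (e.g. by exhausting RAM).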
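
To confirm that a newly added machine really joined, you can ask every engine for its hostname and count the answers. This is a sketch under the assumptions of this guide: the helper names are my own, and the import path is the IPython.parallel one used above (newer releases ship the same API as the separate ipyparallel package).

```python
from collections import Counter


def engines_per_host(hostnames):
    """Count how many engines reported each hostname."""
    return dict(Counter(hostnames))


def cluster_hostnames(profile="home"):
    """Ask every engine in the given profile for its hostname."""
    import socket
    # Imported here so the helper above works without a running cluster.
    from IPython.parallel import Client

    client = Client(profile=profile)
    # socket.gethostname runs on each engine, not on the local machine.
    return client[:].apply_sync(socket.gethostname)
```

Running engines_per_host(cluster_hostnames('home')) in the notebook should give a dict with one entry per machine, each mapped to the number of engines running there.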

Ian Howson