IPython Notebook (now Jupyter Notebooks) is frickin’ awesome. With the parallel extensions, it’s even awesomer.
I want to use spare desktops around my house to speed up my parallel jobs. There is lots of documentation on how to do this. It is very long.
My setup is a MacBook as the frontend, with Ubuntu machines on the local network as extra workers.
On the MacBook, you need ipython+notebook+parallel. I use MacPorts, so you can install this with:
sudo port install py-ipython +notebook +parallel +scientific
On the Ubuntu machines, you just need the ipython-notebook package, and the rest of the dependencies will install automatically:
sudo aptitude install ipython-notebook
To start some workers (‘engines’) on your local machine, run:
ipcluster start --n=4
To start a Notebook instance, run:
ipython notebook
You should get a Notebook instance in your web browser. Fire it up and run:
from IPython.parallel import Client
c = Client()
c.ids
c[:].apply_sync(lambda: "Hello, world!")
You should get back:
[0, 1, 2, 3]
['Hello, world!', 'Hello, world!', 'Hello, world!', 'Hello, world!']
You get one ID and one greeting for each engine.
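Once that works, the same view can spread a list of inputs over the engines with map_sync. Here is a minimal sketch. The names square, parallel_squares and LocalView are mine, not part of IPython; LocalView is only a stand-in with the same call shape so you can try the function without a cluster running.

```python
def square(x):
    return x * x

def parallel_squares(view, values):
    """Apply square() to each value via view.map_sync, keeping input order."""
    return view.map_sync(square, values)

class LocalView(object):
    """Hypothetical stand-in for a DirectView: runs everything in this process."""
    def map_sync(self, func, *sequences):
        return list(map(func, *sequences))

# With a running cluster you would pass c[:] from the snippet above instead.
print(parallel_squares(LocalView(), range(8)))
```

With the stand-in this prints [0, 1, 4, 9, 16, 25, 36, 49]; with c[:] the work should be split across the engines and come back in the same order.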
Shut down the cluster on your local computer by hitting Ctrl-C in the terminal window running it.
Cluster configuration is described in a ‘profile’. On your local machine, run:
ipython profile create --parallel --profile=home
This creates a profile called ‘home’. Modify ~/.ipython/profile_home/ipcluster_config.py:
c = get_config()
c.IPClusterEngines.engine_launcher_class = 'SSH'
c.LocalControllerLauncher.controller_args = ["--ip='*'"]
c.SSHEngineSetLauncher.engines = {
    'localhost': 4,
    'tyler': 4,
    'par': 4,
}
# FIXME NASTY HACK We need to use non-system-default Python on the Mac
# (i.e. the /opt path in the default config below) but we want default
# Python on the Linux machine. I couldn't figure out a way to specify it
# on a per-host basis, and profile/bashrc/whatever are not executed for
# ssh login, and the ~ alias doesn't seem to work, so... I created a
# symlink in / for ipengine (i.e. `ln -s /opt/local/bin/ipengine
# /ipengine`). Horrible, but it works!
c.SSHEngineSetLauncher.engine_cmd = ['/ipengine'] # works on Linux (thought nothing necessary)
#c.SSHEngineSetLauncher.engine_cmd = ['/opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python', '-c', 'from IPython.parallel.apps.ipengineapp import launch_new_instance; launch_new_instance()'] # works on Mac
Note that --ip='*' makes the controller listen on every network interface, so its ports on your MacBook are open to the whole local network. If you’re on an untrusted network segment, don’t do this! A future revision of this guide might deal with doing everything across SSH port forwarding.
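You need to be able to log in to the remote Linux machines automatically via SSH (i.e. key-based login with no password prompt), ideally with the same username as on the frontend machine. If you’re not sure how, the usual route is to generate a key with ssh-keygen and add the public key to ~/.ssh/authorized_keys on each remote machine.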
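If you want to check the passwordless login from Python rather than by hand, here is a rough sketch. The function names are mine; the only real tool involved is the ssh client, run with BatchMode=yes so it fails instead of prompting for a password. The run argument is there so you can swap in a fake for testing.

```python
import subprocess

def ssh_check_command(host):
    """Build an ssh command that exits 0 only if login works without a prompt."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]

def passwordless_ssh_works(host, run=subprocess.call):
    """Return True if `ssh host true` succeeds non-interactively."""
    return run(ssh_check_command(host)) == 0
```

With the hosts from my config, [h for h in ['localhost', 'tyler', 'par'] if not passwordless_ssh_works(h)] should list the ones that still need their keys set up.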
Per the above dodgy hack, you also need to link /ipengine to the relevant ipengine binary on that machine. For the Macs:
sudo ln -s /opt/local/bin/ipengine /ipengine
Also make sure you enable SSH (System Preferences -> Sharing -> Remote Login). You also need passwordless login on your local machine; test with ssh localhost.
For the Ubuntu machines:
sudo ln -s /usr/bin/ipengine /ipengine
ipcluster will try to share config files with the engines. By default, the directories do not exist on the Linux hosts. On each of them, run:
mkdir -p .ipython/profile_home/security/
To start the cluster, on your local machine, run:
ipcluster start --profile=home
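In the Notebook, connect with c = Client(profile='home') rather than the bare Client() used earlier. If you add new machines to the cluster, you need to re-run the Client(profile='home') line to get access to them. This also fixes things if you break the cluster (e.g. by exhausting RAM).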
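Engines on remote hosts can take a little while to register, so I find it handy to poll until the expected number shows up. This is just a sketch under my own assumptions: the helper names are made up, ENGINES mirrors the dict in my ipcluster_config.py, and make_client is any callable returning an object with an ids attribute (normally lambda: Client(profile='home')). It builds a fresh client on every poll, which matches the "re-run the Client line" advice above but is not necessarily the tidiest way to do it.

```python
import time

# Same host -> engine-count mapping as in ipcluster_config.py.
ENGINES = {'localhost': 4, 'tyler': 4, 'par': 4}

def expected_engine_count(engines):
    """Total number of engines the config asks for."""
    return sum(engines.values())

def wait_for_engines(make_client, expected, timeout=60.0, poll=1.0,
                     sleep=time.sleep, clock=time.time):
    """Re-create the client until `expected` engines are visible or time runs out.

    Returns the last client created, so check len(client.ids) afterwards.
    """
    deadline = clock() + timeout
    while True:
        client = make_client()
        if len(client.ids) >= expected or clock() >= deadline:
            return client
        sleep(poll)
```

In the Notebook that would look like c = wait_for_engines(lambda: Client(profile='home'), expected_engine_count(ENGINES)), after which len(c.ids) should be 12 if every host came up.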