bulkem: Background

The inverse Gaussian distribution is an exponential-family probability distribution with the density function:

$$ f(x)=\left[\frac{\lambda}{2\pi x^{3}}\right]^{1/2}\exp\frac{-\lambda(x-\mu)^{2}}{2\mu^{2}x} $$

for $ x>0 $, mean $ \mu>0 $ and shape $ \lambda>0 $ (Seshadri, 1993, p. 1).

The EM algorithm (Dempster et al., 1977) iteratively refines a maximum likelihood estimate in the presence of missing data. Here, we use it to fit mixture models, as described in Bilmes (1998). Two characteristics are of note:

As it is an iterative algorithm, the execution time can be quite long. Hence our desire to find a faster way to perform model fitting.
EM is sensitive to the choice of initial parameters (Aitkin et al., 1980, p. 327). Therefore, we must think about how best to select them.

A GPU is a component of a computer that accelerates graphics operations. All modern computers and mobile devices include a GPU. GPUs can be programmed to perform general-purpose computations in the same way as a CPU; this is referred to as ‘general purpose GPU computing’.

GPUs are similar to CPUs in that they run user-defined software. The key difference is that while a CPU generally has a small number of execution units (two or four are common for consumer hardware), a GPU may have thousands of execution units operating in parallel. The CPU is optimised for serial operations – performing a sequence of instructions as quickly as possible. The GPU is optimised for parallel operations where the same instructions are performed many times on different data. Each execution unit of the GPU is simple and more restricted, but there are many more of them. The peak computational output (measured in instructions per second) is far greater than for a CPU. Redesigning the problem to take advantage of this structure is the major challenge of the programmer working on a GPU computing problem.

There are two major standards for GPU computing: OpenCL and CUDA. OpenCL is supported by most GPU vendors. CUDA is only supported by NVIDIA hardware but it is a mature standard with excellent tools and documentation. We will only consider CUDA from this point on.

There are significant barriers to widespread adoption of GPU computing.

Not everyone has access to ‘large’ GPU hardware. Most CPUs now ship with on-board GPUs which are sufficient for graphics, but do not provide significant performance advantages in the GPU computing context.
The effort required to port a given algorithm to a GPU is large. Programming GPUs is significantly more difficult than CPUs
CPUs will always be the first to get any new algorithm
Most problems do not require a large amount of computing power. There is no incentive to speed up something that is already fast.
Not all algorithms run faster when executed on a GPU. For an algorithm to be a good candidate for GPU execution, it must generally:
- require many iterations (thousands), each of which is independent of the others
- require a large amount of computation time relative to the amount of memory access
For many problems and institutions, a cluster of general purpose PCs is a better fit. It has the advantages of running all available software, requiring minimal rework and being ‘familiar’ (programming a cluster is very similar to that of programming a single PC).

M Aitkin and GT Wilson. Mixture models, outliers, and the EM algorithm. Technometrics, 22(3): 325–331, Aug 1980.

JA Bilmes. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report TR-97-021, International Computer Science Institute, 1998.

AP Dempster, NM Laird, and DB Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1–38, 1977.

V Seshadri. The inverse gaussian distribution: a case study in exponential families. Oxford Science Publications, 1993.

Ian Howson