One disadvantage of the Expectation-Maximization (EM) algorithm is its long computation time, which becomes burdensome when many models need to be fit. The bulkem R package addresses this problem by running the fits on CUDA hardware.
CUDA is a parallel computing platform available on NVIDIA graphics processing units, and it has been widely explored in the computational statistics literature. On the author’s hardware, bulkem runs around 30 times faster on CUDA hardware than on a CPU when many small datasets need to be fit. For very large datasets (one million observations), the CUDA implementation is around 36 times faster.
At current market prices for CPU and GPU compute time, the GPU implementation is slightly more cost-effective than the CPU implementation for small datasets and twice as cost-effective for large datasets.
To achieve this level of performance even for small datasets, bulkem uses task parallelism and a novel parallel summation algorithm. It is also the only CUDA implementation of EM fitting for mixture models based on the inverse Gaussian distribution.