GPU-Accelerated R in the Cloud with Teraproc Cluster-as-a-Service

Analysis of statistical algorithms can generate workloads that run for hours, if not days, tying up a single computer. Many statisticians and data scientists write complex simulations and statistical analyses using the R statistical computing environment, and these programs often have very long run times. Given the amount of time R programmers can spend waiting for results, it makes sense to take advantage of parallelism in the computation and the available hardware. In a previous post on the Te...

Scaling R clusters? AWS Spot Pricing is your new best friend

An elastic infrastructure for distributed R Most of us recall the notion of elasticity from Economics 101. Markets are about supply and demand, and when there is an abundance of supply, prices usually go down. Elasticity is a measure of how responsive one economic variable is to another, and in an elastic market the response is proportionately greater than the change in input. It turns out that cloud pricing, on the margin at least, is pretty elastic. Like bananas in a supermarket, CPU cyc...

Accelerating R with multi-node parallelism – Rmpi, BatchJobs and OpenLava

Gord Sissons, Feng Li In a previous blog we showed how we could use the R BatchJobs package with OpenLava to accelerate a single-threaded k-means calculation by breaking the workload into chunks and running them as serial jobs. R users frequently need solutions for parallelizing workloads, and while approaches like multicore and socket-level parallelism are good for some problems, when it comes to large problems there is nothing like a distributed cluster. The message passing inter...
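To give a flavor of the message-passing approach the post discusses, here is a minimal sketch using Rmpi. It assumes Rmpi is installed against a working MPI library; submission of the MPI job through OpenLava is omitted.

```r
# Minimal Rmpi sketch: spawn R workers under MPI and scatter a task
# across them. Assumes a working MPI installation; the OpenLava job
# submission wrapper from the post is not shown here.
library(Rmpi)

mpi.spawn.Rslaves(nslaves = 4)                   # start 4 R worker processes
squares <- mpi.parSapply(1:8, function(i) i^2)   # farm the work out to workers
print(squares)

mpi.close.Rslaves()                              # shut the workers down cleanly
mpi.quit()
```

In a real deployment the script above would itself be submitted as an MPI job to the scheduler, which decides which hosts the slaves land on.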

Seeing the Forest and the Trees – a parallel machine learning example

Parallelizing Random Forests in R with BatchJobs and OpenLava By: Gord Sissons and Feng Li In his series of blogs about machine learning, Trevor Stephens focuses on a survival model from the Titanic disaster and provides a tutorial explaining how decision trees tend to over-fit models, yielding anomalous predictions. How do we build a better predictive model? The answer, as Trevor observes, is to grow a whole forest of decision trees, let the models grow as deep as they will, and let these ...
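Because the trees in a random forest are grown independently, the ensemble splits naturally into parallel jobs. The sketch below assumes BatchJobs is configured with an OpenLava (or other) cluster backend, and uses the built-in iris data as a stand-in for the Titanic data discussed in the post.

```r
# Sketch: grow four sub-forests as independent BatchJobs jobs, then
# merge them with randomForest::combine(). Assumes a configured
# BatchJobs cluster backend; iris stands in for the Titanic data.
library(BatchJobs)
library(randomForest)

grow_forest <- function(seed, ntree) {
  library(randomForest)                 # load on the worker node
  set.seed(seed)                        # distinct seed per sub-forest
  randomForest(Species ~ ., data = iris, ntree = ntree)
}

reg <- makeRegistry(id = "rf_parallel")
batchMap(reg, grow_forest, seed = 1:4, more.args = list(ntree = 125))
submitJobs(reg)
waitForJobs(reg)

# Combine 4 x 125 trees into a single 500-tree forest
forest <- do.call(combine, loadResults(reg))
```

The key design point is that each job returns a complete randomForest object, so the results can be merged without any shared state between workers.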

Parallel R with BatchJobs

Parallelizing R with BatchJobs - An example using k-means Gord Sissons, Feng Li Many simulations in R are long running. Analysis of statistical algorithms can generate workloads that run for hours, if not days, tying up a single computer. Given the amount of time R programmers can spend waiting for results, getting acquainted with parallelism makes sense. In this first of a series of blogs, we describe an approach to achieving parallelism in R using BatchJobs, a framework that provides Map, Re...
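The chunk-and-submit pattern from the post can be sketched in a few lines. This is a minimal illustration, assuming a configured BatchJobs backend (OpenLava in the post) and using iris as stand-in data; chunk counts and k are arbitrary.

```r
# Sketch of chunked k-means with BatchJobs: each data chunk becomes
# an independent serial job. Assumes a configured BatchJobs cluster
# backend; iris is illustrative stand-in data.
library(BatchJobs)

# Split row indices into 4 roughly equal chunks
chunks <- split(seq_len(nrow(iris)), rep(1:4, length.out = nrow(iris)))

run_kmeans <- function(rows) {
  kmeans(iris[rows, 1:4], centers = 3, nstart = 10)
}

reg <- makeRegistry(id = "kmeans_chunks")
batchMap(reg, run_kmeans, chunks)   # one job per chunk
submitJobs(reg)                     # jobs dispatched to the cluster
waitForJobs(reg)
results <- loadResults(reg)         # one kmeans fit per chunk
```

Each job runs single-threaded, so the speedup comes purely from running the chunks concurrently on the cluster rather than from any change to the k-means code itself.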