Analysis of statistical algorithms can generate workloads that run for hours, if not days, tying up a single computer. Many statisticians and data scientists write complex simulations and statistical analysis using the R statistical computing environment. Often these programs have a very long run time. Given the amount of time R programmers can spend waiting for results, it makes sense to take advantage of parallelism in the computation and the available hardware.
In a previous post on the Te...

# Author: Gord Sissons

# Scaling R clusters? AWS Spot Pricing is your new best friend

An elastic infrastructure for distributed R
Most of us recall the notion of elasticity from Economics 101. Markets are about supply and demand, and when there is an abundance of supply, prices usually go down. Elasticity is a measure of how responsive one economic variable is to another, and in an elastic market the response is proportionately greater than the change in input.
It turns out that cloud pricing, on the margin at least, is pretty elastic. Like bananas in a supermarket, CPU cyc...

# Why HPC Clusters are like Bananas

Realizing a more cost-efficient infrastructure
Most of us recall the notion of elasticity from Economics 101. Markets are about supply and demand, and when there is an abundance of supply, prices usually go down. Elasticity is a measure of how responsive one economic variable is to another, and in an elastic market the response is proportionately greater than the change in input.
What does this have to do with HPC or analytic clusters, you ask? It turns out that cloud pricing, on the margin a...

# Parallel R with BatchJobs

Parallelizing R with BatchJobs: an example using k-means
Gord Sissons, Feng Li
Many simulations in R are long-running. Analysis of statistical algorithms can generate workloads that run for hours, if not days, tying up a single computer. Given the amount of time R programmers can spend waiting for results, getting acquainted with parallelism makes sense.
In this, the first in a series of blogs, we describe an approach to achieving parallelism in R using BatchJobs, a framework that provides Map, Re...
