Teraproc Application Autoscaler

As the Big Data explosion continues, data scientists increasingly need to run a mix of traditional batch-oriented and service-oriented workloads on Hadoop clusters to process and analyze massive amounts of data and turn that data into knowledge. While the Hadoop resource manager supports dynamic allocation of resources for batch workloads, computing resources are statically allocated to workloads requiring long-running services (non-batch workloads), resulting in ...

GPU-Accelerated R in the Cloud with Teraproc Cluster-as-a-Service

Analysis of statistical algorithms can generate workloads that run for hours, if not days, tying up a single computer. Many statisticians and data scientists write complex simulations and statistical analyses in the R statistical computing environment, and these programs often have very long run times. Given the amount of time R programmers can spend waiting for results, it makes sense to take advantage of parallelism in the computation and the available hardware. In a previous post on the Te...

Scaling R clusters? AWS Spot Pricing is your new best friend

An elastic infrastructure for distributed R
Most of us recall the notion of elasticity from Economics 101. Markets are about supply and demand, and when there is an abundance of supply, prices usually go down. Elasticity is a measure of how responsive one economic variable is to another, and in an elastic market the response is proportionately greater than the change in input. It turns out that cloud pricing, on the margin at least, is pretty elastic. Like bananas in a supermarket, CPU cyc...

Why HPC Clusters are like Bananas

Realizing a more cost-efficient infrastructure
Most of us recall the notion of elasticity from Economics 101. Markets are about supply and demand, and when there is an abundance of supply, prices usually go down. Elasticity is a measure of how responsive one economic variable is to another, and in an elastic market the response is proportionately greater than the change in input. What does this have to do with HPC or analytic clusters, you ask? It turns out that cloud pricing, on the margin a...

Accelerating R with multi-node parallelism – Rmpi, BatchJobs and OpenLava

Gord Sissons, Feng Li
In a previous blog we showed how we could use the R BatchJobs package with OpenLava to accelerate a single-threaded k-means calculation by breaking the workload into chunks and running them as serial jobs. R users frequently need ways to parallelize workloads, and while approaches like multicore and socket-level parallelism are good for some problems, when it comes to large problems there is nothing like a distributed cluster. The message passing inter...

Seeing the Forest and the Trees – a parallel machine learning example

Parallelizing Random Forests in R with BatchJobs and OpenLava
By: Gord Sissons and Feng Li
In his series of blogs about machine learning, Trevor Stephens focuses on a survival model from the Titanic disaster and provides a tutorial explaining how decision trees tend to over-fit models, yielding anomalous predictions. How do we build a better predictive model? The answer, as Trevor observes, is to grow a whole forest of decision trees, let the models grow as deep as they will, and let these ...

Parallel R with BatchJobs

Parallelizing R with BatchJobs - An example using k-means
Gord Sissons, Feng Li
Many simulations in R are long running. Analysis of statistical algorithms can generate workloads that run for hours, if not days, tying up a single computer. Given the amount of time R programmers can spend waiting for results, getting acquainted with parallelism makes sense. In this first in a series of blogs, we describe an approach to achieving parallelism in R using BatchJobs, a framework that provides Map, Re...

Teraproc Cloud Manager

Teraproc Cloud Manager is a complete solution for managing a heterogeneous cloud computing environment. Whether you have traditional or cloud workloads, Teraproc Cloud Manager helps you manage all your applications in one cloud environment. While other cloud management solutions only support applications running in a virtualized environment, Teraproc Cloud Manager is flexible, providing your choice of deployment models. Teraproc Cloud Manager is easy to use, designed for users with little experienc...

Why Teraproc for your HPC or Analytic cluster

Guided by a belief that compute- and data-intensive distributed clusters should be simple, scalable, affordable and open, Teraproc delivers unique, turnkey cluster-as-a-service offerings tailored to design, simulation and scaled-out data science applications. It improves analyst productivity, speeds time to deployment, provides user self-service in the cloud, and manages infrastructure costs with enhanced flexibility.

Early access for R CaaS

Teraproc announces early registration for our R Cluster-as-a-Service offering. It's the eleventh hour, so hurry up and secure your space! Learn more about the service here. As data scientists and statisticians know, R is an excellent language for analytic problems. For large-scale problems, configuring distributed Hadoop or compute clusters can be a challenge. Talented technical people can spend days or weeks building out distributed clusters, assembling all the needed software components a...

Teraproc HPC Cluster-as-a-Service

Teraproc HPC Cluster-as-a-Service is a simple, scalable and affordable way to quickly deliver an HPC cluster in the cloud with a built-in workload scheduler. Teraproc customers can focus on their own applications and data, and we do the rest, delivering turnkey, elastic, pre-integrated clusters that are production-proven on the industry’s leading public cloud platform.
Deploys in minutes with one-step cluster creation
No infrastructure required by leveraging public cloud
Deploys in minutes
Sc...

Teraproc R Analytics Cluster-as-a-Service

The Teraproc R Analytics Cluster-as-a-Service is a simple and affordable way for R users to quickly get a ready-to-use R environment in the cloud. Whether you need just a single RStudio instance, or a scaled-out R supercomputer to tackle those big gnarly problems, Teraproc has a solution for you. Pre-configured with a wealth of R tools and parallel frameworks, clusters can be deployed in minutes and can take advantage of Amazon spot instances to keep costs low. It is comprised of 100% open-sourc...