As the Big Data explosion continues, data scientists increasingly need to run a mix of traditional batch-oriented workloads and service-oriented workloads on Hadoop clusters in order to process and analyze massive amounts of data and turn that data into knowledge. While the Hadoop resource manager supports dynamic allocation of resources for batch workloads, computing resources are statically allocated to workloads that require long-running services (non-batch workloads), resulting in reduced performance, underutilization, and increased infrastructure costs.
Teraproc Application Autoscaler dynamically grows and shrinks the number of long-running services on Hadoop clusters. Through integration with the Hadoop resource manager YARN and the Hadoop application manager Slider, it enables a Hadoop environment to dynamically allocate resources to workloads that require long-running services. At its core are an analysis engine that analyzes application business logic and metrics to determine resource demand, and a policy engine that allows configuration of thresholds, scaling rules, and actions to meet service level agreements. Together they enhance both YARN and Slider so that resources and applications are scheduled more intelligently.
Teraproc Application Autoscaler has an agent-based architecture. Agents are developed to collect system metrics from the underlying monitoring system and to introspect application managers.
Teraproc Application Autoscaler supports both static, on-premises Hadoop clusters and dynamic Hadoop clusters on cloud infrastructure. Through integration with the Amazon cloud, it can dynamically expand the size of a Hadoop cluster when the resources within the cluster are insufficient to meet workload demand.
Teraproc Application Autoscaler supports Hadoop MapReduce, HBase, Storm, Spark, and HPC workload management engines on the same shared cluster infrastructure, while guaranteeing the following service levels:
- The average CPU or memory utilization is above 80%
- The average response time for servicing a request in HBase is less than 5 seconds
Teraproc Application Autoscaler can also configure and monitor the SLA and auto-scaling policies of applications in Ambari, the Hadoop operations manager.
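To make the idea of thresholds, scaling rules, and actions more concrete, the following is a minimal Python sketch of how a policy engine of this kind might evaluate the two service levels listed above. It is not Teraproc's implementation or API. The class `ScalingRule`, the function `desired_instances`, the metric names (`avg_cpu_pct`, `avg_mem_pct`, `hbase_avg_response_s`), the instance limits, and the choice to let scale-up rules take priority over scale-down rules are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical rule: a breach condition plus the action it triggers."""
    name: str
    breached: Callable[[Metrics], bool]  # True when the service level is violated
    action: str                          # "scale_up" or "scale_down"
    step: int = 1                        # instances to add or remove


def desired_instances(rules: List[ScalingRule], metrics: Metrics, current: int,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Return the instance count the rules ask for, clamped to the limits.

    Assumption: a breached scale-up rule always wins over a scale-down rule,
    so a latency violation is never answered by shrinking the service.
    """
    breached = [rule for rule in rules if rule.breached(metrics)]
    ups = [rule.step for rule in breached if rule.action == "scale_up"]
    downs = [rule.step for rule in breached if rule.action == "scale_down"]
    if ups:
        target = current + max(ups)
    elif downs:
        target = current - max(downs)
    else:
        target = current
    return max(min_instances, min(max_instances, target))


# Illustrative rules mirroring the two service levels in the text.
RULES = [
    # HBase average response time must stay under 5 seconds.
    ScalingRule("hbase_latency",
                lambda m: m["hbase_avg_response_s"] >= 5.0, "scale_up"),
    # Average CPU or memory utilization should stay above 80%;
    # if neither does, the service is treated as over-provisioned.
    ScalingRule("low_utilization",
                lambda m: max(m["avg_cpu_pct"], m["avg_mem_pct"]) <= 80.0,
                "scale_down"),
]

if __name__ == "__main__":
    sample = {"hbase_avg_response_s": 6.2, "avg_cpu_pct": 91.0, "avg_mem_pct": 70.0}
    print(desired_instances(RULES, sample, current=3))
```

In a real deployment, the returned count would presumably be handed to the application manager (for example, as a request to resize a Slider component), and the metrics would come from the monitoring agents rather than a hard-coded dictionary.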