In our previous blogs and webinars we have discussed the significant improvements and architectural changes coming to Apache Hadoop .Next (0.23). To recap, the major ones are:
- Federation for Scaling HDFS – HDFS has undergone a transformation to separate Namespace management from the Block (storage) management to allow for significant scaling of the filesystem. In previous architectures, they were intertwined in the NameNode.
- NextGen MapReduce (aka YARN) – MapReduce has undergone a complete overhaul in hadoop-0.23, including a fundamental change to split up the major functionalities of the JobTracker, resource management and job scheduling/monitoring into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs. Thus, Hadoop becomes a general purpose data-processing platform that can support MapReduce as well as other application execution frameworks such as MPI, Graph processing, Iterative processing etc.
As we have discussed previously, delivering a major Apache Hadoop release takes a significant amount of effort to meet very strict reliability, scalability and performance requirements. Since Apache Hadoop (HDFS & MapReduce) are the core parts of the ecosystem, compatibility and integration of components in the upper layers of the stack (HBase, Pig, Hive, Oozie etc.) are critical for success of the new release.
In the tradition that we’ve followed for every single major (stable) release of Apache Hadoop, Hortonworks partnered with Yahoo! to benchmark and certify hadoop-0.23.1 on a performance cluster of 350 machines. Although performance improvements have been a continuous process since the beginning, it became the principle focus after the alpha release of Hadoop .Next (0.23.0).
We are pleased to report that almost all of the benchmarks perform significantly better on Hadoop .Next (0.23.1) compared to the current stable hadoop-1.0 release. Even those that don’t perform significantly better are on par with hadoop-1.0.
The performance benchmarks are the same ones that we’ve been using to harden & stabilize major Hadoop releases throughout the lifetime of the project.
The aim of this process is to verify every single aspect of core Hadoop – to validate that there are no regressions at scale. These include the core HDFS and MapReduce (i.e. NextGen MapReduce, or YARN) and the applications that run on top of this framework.
Here are some details on the benchmark tests:
- The dfsio benchmark for measuring HDFS I/O (read/write) performance.
- The slive benchmark for measuring NameNode operations.
- The scan benchmark to measure HDFS I/O performance for MapReduce jobs.
- The shuffle benchmark to calibrate how fast the map-outputs are shuffled
- The famous sort benchmark which measures time for sorting data with MapReduce.
- The compression benchmark to validate how fast we compress intermediate and the final outputs of MapReduce jobs.
- The gridmix-V3 to measure the throughput of the cluster using a production trace of thousands of actual user jobs.
We also started using a couple of new benchmarks to cater to the architectural changes due to YARN:
- The ApplicationMaster Scalability benchmark to figure out how fast task/container scheduling happens at the MapReduce ApplicationMaster. Compared to hadoop-1.0, this benchmark ran twice as fast with hadoop-0.23.1.
- The ApplicationMaster Recoverability benchmark for measuring how fast jobs recover on restart.
- The ResourceManager Scalability to evaluate the central master’s scalability by simulating lots of nodes in a cluster.
- The Small Jobs benchmark to measure performance for very small jobs also runs more than twice as fast due to improvements made where the tasks execute within the ApplicationMaster itself (as opposed to launching small number of tasks for the job).
Many of the performance improvements can be attributed to the new architecture itself. Stay tuned for additional blogs on this topic.
Leaving YARN aside, i.e. the resource-management layer, the MapReduce runtime (map task, sort, shuffle, merge etc.) itself has many improvements when compared to hadoop-1.0. Some examples are: MAPREDUCE-64, MAPREDUCE-318, MAPREDUCE-240.
More information is available on MAPREDUCE-3561, which is the umbrella Apache Hadoop JIRA where we were tracking all our benchmarking efforts.
Benchmarking distributed systems is a very challenging task. It involves debugging, constant focus on one problem at a time, knowing which threads of investigation to follow and which to ignore and last, but not the least, patience and persistence. We had so much fun doing it and learnt some valuable lessons along the way. The process itself merits its own post.
Summary & Acknowledgements
We thank the Yahoo! Performance team for the cluster resources, development & performance teams for all the help along the way!
We are very excited to be delivering on the promise of Hadoop .Next and hope you can derive even better value from your Hadoop clusters.
- Vinod Kumar Vavilapalli a.k.a @tshooter