Profiling and performance-tuning your Hadoop pipelines

Speaker:

Aaron Beppu

In the Hadoop ecosystem, there are now several tools that let developers quickly produce pipelines of MapReduce jobs without descending to the verbose level of the Java MapReduce APIs. Unfortunately, these concise, higher-level tools often produce pipelines that are initially slow and difficult to optimize. This talk will describe Etsy's pipeline of hundreds of Cascading flows (and thousands of daily Hadoop jobs) and our approach to profiling and performance-tuning them. Concrete examples will include speeding up our initial log parsing by 10x, streamlining our serialization and deserialization, and producing so much JVM snapshot data from our Hadoop jobs that we needed more Hadoop jobs to summarize it all.

Watch the video of Aaron Beppu's talk here.

Schedule info

Time slot:

5 June 11:25 - 11:45

Room:

Loft

Track:

scale

Experience level:

intermediate

Presentation Format:

Short (20 min)

Slides:

hadoop tuning-abeppu-bbuzz12.pdf
