Real-time Analytics with HBase
HBase can store massive amounts of data and allow random access to it - great. MapReduce jobs can be used to perform data analytics on a large scale - great. MapReduce jobs are batch jobs - not so great if you are after Real-time Analytics. Meet the append-only writes approach, which makes real-time analytics possible where it wasn't before.
In this talk we'll explain how we implemented "update-less updates" (not a typo!) for HBase using an append-only approach. This approach relies on HBase's core strengths, such as fast range scans and the recently added Coprocessors, to enable Real-time Analytics. It shines in situations where high data volume and velocity make random updates (aka Get+Put) prohibitively expensive. Apart from making Real-time Analytics possible, we'll show how the append-only approach to updates makes it possible to A) roll back data changes, and B) avoid data inconsistency caused by MapReduce tasks that fail after only partially updating data in HBase.
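To make the idea of "update-less updates" more concrete, here is a minimal in-memory sketch, assuming a simple scheme that is not taken from the talk or from HBaseHUT: every change is written as a new row keyed by the original key plus a unique sequence suffix, and the current value is computed at read time by range-scanning that key prefix and merging the deltas. Because nothing is overwritten, a rollback (or cleanup after a failed MapReduce task) amounts to dropping the rows written by one writer. The class, the method names, the `#` separator and the `writer_id` tag are all hypothetical; a sorted Python list stands in for HBase's lexicographically ordered row keys.

```python
import itertools
from bisect import bisect_left, insort


class AppendOnlyTable:
    """Hypothetical stand-in for an HBase table: row keys are kept sorted,
    so all records for one logical key can be read with a prefix range scan."""

    def __init__(self):
        self._keys = []               # sorted row keys (like HBase's key order)
        self._rows = {}               # row key -> (delta, writer_id)
        self._seq = itertools.count() # unique suffix so appends never collide

    def append(self, key, delta, writer_id):
        # Instead of Get+Put, blindly write a new row; no read is needed.
        # Assumption: logical keys never contain the '#' separator.
        row_key = f"{key}#{next(self._seq):012d}"
        insort(self._keys, row_key)
        self._rows[row_key] = (delta, writer_id)

    def _scan_prefix(self, key):
        # Range scan over all row keys starting with "<key>#".
        prefix = key + "#"
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i]
            i += 1

    def read(self, key, ignore_writers=frozenset()):
        # Merge on read: sum every delta appended for this key, optionally
        # skipping records from writers known to have failed.
        return sum(
            self._rows[k][0]
            for k in self._scan_prefix(key)
            if self._rows[k][1] not in ignore_writers
        )

    def rollback(self, writer_id):
        # Rollback: remove only the rows this writer appended; earlier
        # values were never overwritten, so nothing needs to be restored.
        doomed = {k for k in self._keys if self._rows[k][1] == writer_id}
        self._keys = [k for k in self._keys if k not in doomed]
        for k in doomed:
            del self._rows[k]
```

For example, after two jobs append counts for `"page1"`, `read("page1")` returns their sum; if the second job turns out to have failed halfway, `rollback` on its ID (or passing that ID in `ignore_writers`) restores a consistent value without any Get+Put. In a real deployment the merge step could run in a Coprocessor or in a periodic job that folds appended rows into one; this sketch leaves both out.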
The talk is based on Sematext's success story of building a highly scalable, general-purpose data aggregation framework, which was used to build Search Analytics and Performance Monitoring services. Most of the generic code needed for the append-only approach described in this talk is implemented in our open-source HBaseHUT project.
Watch the video of Alex Baranau's talk here.