Near real time processing of time series data with HBase

Speaker:

Christian Richter

A known problem when storing time series data in HBase is that using timestamps as row keys creates hot regions. A common solution is to prefix each key with a salt so that writes are distributed over multiple regions. This makes it harder to process the data in timestamp order in a Map/Reduce job, because currently only one Scan object can serve as input. One approach is to start a separate Map/Reduce job for each prefix. Another is to allow multiple Scan objects, one per prefix, to serve as input to a single Map/Reduce job by implementing a MultiSegmentTableInputFormat. This keeps the salt prefix that avoids hot regions on write while allowing the data to be processed in timestamp order in a single Map/Reduce job, thereby improving performance. This talk will provide details on how we use HBase for near real time processing of time series data in a real-world application.
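
The key layout and the "one scan range per prefix" idea can be illustrated with a small sketch. It is not the MultiSegmentTableInputFormat from the talk and does not use the HBase API: an in-memory dict stands in for the table, and the bucket count, the key layout (1-byte salt, 8-byte timestamp, series id), the hash-based salt, and all function names are assumptions made for illustration only.

```python
import hashlib
import heapq
import struct

NUM_BUCKETS = 4  # hypothetical number of salt prefixes


def salted_key(series_id: str, ts_millis: int) -> bytes:
    """Assumed row key layout: 1-byte salt + 8-byte big-endian timestamp + series id."""
    salt = hashlib.md5(series_id.encode()).digest()[0] % NUM_BUCKETS
    return bytes([salt]) + struct.pack(">Q", ts_millis) + series_id.encode()


def scan_ranges(start_ts: int, stop_ts: int):
    """One (start_row, stop_row) pair per salt prefix; stop_row is exclusive."""
    return [
        (bytes([s]) + struct.pack(">Q", start_ts),
         bytes([s]) + struct.pack(">Q", stop_ts))
        for s in range(NUM_BUCKETS)
    ]


def scan(table: dict, start_row: bytes, stop_row: bytes):
    """Stand-in for a single range scan: rows in key order within [start_row, stop_row)."""
    return [(k, v) for k, v in sorted(table.items()) if start_row <= k < stop_row]


def scan_by_time(table: dict, start_ts: int, stop_ts: int):
    """Run one range scan per prefix and merge the results in timestamp order."""
    segments = [scan(table, a, b) for a, b in scan_ranges(start_ts, stop_ts)]
    # Each segment shares one salt byte, so it is already sorted by the
    # remainder of the key; merging on key[1:] ignores the salt.
    return list(heapq.merge(*segments, key=lambda kv: kv[0][1:]))


if __name__ == "__main__":
    table = {salted_key(f"sensor-{i}", 1000 + i): i for i in range(20)}
    for key, value in scan_by_time(table, 1005, 1015):
        print(key[0], struct.unpack(">Q", key[1:9])[0], value)
```

Writes land in different key ranges depending on the salt, while a reader that knows the bucket count can still cover a time window with one range per prefix. In a Map/Reduce setting, each range would become its own set of input splits rather than being merged in a single process as done here.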

Watch the video of Christian Richter's talk here.

Schedule info

Time slot:

5 June 14:45 - 15:05

Room:

Humboldtsaal

Track:

store

Experience level:

intermediate

Presentation Format:

Short (20min)
