From Batch to Realtime with Hadoop
In the early days of web applications, sites were designed to serve users and gather information along the way. With the proliferation of data sources and growing user bases, the amount of data generated required new ways for storage and processing. Hadoop's HDFS and its batch oriented MapReduce opened new possibilities, yet it falls short of instant delivery of aggregate data to end users. Adding HBase and other layers, such as stream processing using Twitter's Storm, can overcome this delay and bridge the gap to realtime aggregation and reporting. This presentation takes the audience from the beginning of web application design to the current architecture, which combines multiple technologies to be able to process vast amounts of data, while still being able to react timely and report near realtime statistics.
Watch the video of Lars George's talk here.
- Login to post comments