HCatalog, Table Management for Hadoop


HCatalog is a table and storage management system for Hadoop that makes it easy for multiple data processing tools to interact with the same data. Because it provides a single data model, users can pick the best tool for their job, whether that is Pig, Hive, or MapReduce, and still share data with other users who make different choices. Its table abstraction lets users read and write data without needing to know where on HDFS the data is stored or what format it uses (text, sequence file, RCFile, etc.). The table abstraction and shared data model also allow schemas to be known and checked as part of program development. As the data's format or storage requirements change over time, those changes can be managed without requiring changes to existing data consumers' programs. This talk will cover the interfaces for Pig, Hive, and MapReduce users, and will give an overview of current work and the roadmap for future development.

Watch the video of Alan Gates's talk here.

Schedule info
Time slot: 4 June 13:30 - 14:10
Experience level: 
Presentation Format: Long (40min)