You know, for search. Querying 24 Billion Records in 900ms.
Who doesn't love building highly available, scalable systems holding multiple terabytes of data? Recently we had the pleasure of cracking some tough nuts, and we'd love to share with the community our findings from designing, building, and operating a 120-node, 6TB Elasticsearch (and Hadoop) cluster:

- Dynamically increasing and decreasing cluster size
- Amazon Web Services vs. dedicated hardware
- I/O performance on solid state disks, Amazon Elastic Block Store (EBS), or instance store
- Choosing the right EC2 instance type, dimensioning your hardware
- Tuning the Elasticsearch configuration
- Out of memory: implementing custom facets
- Keeping the cluster responsive while heavily indexing
- Automated deployment (e.g. Puppet), version updates
- Monitoring/tools (Ganglia, Zabbix, elasticsearch-head, and BigDesk)
- Costs (EC2 vs. dedicated), how to save money
- Integration with Hadoop: using Hadoop/MapReduce/Hive to fill the search cluster, HDFS for backups
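To give a flavor of two of those topics (tuning the Elasticsearch configuration, and keeping the cluster responsive while heavily indexing), here is a minimal `elasticsearch.yml` sketch. These are standard settings from the 0.x-era Elasticsearch the talk predates 1.0, not the cluster's actual configuration, and the values are illustrative assumptions to be tuned per hardware:

```yaml
# elasticsearch.yml -- illustrative sketch, not the production config from the talk

# Lock the JVM heap into RAM so the OS never swaps it out
# (pair with "ulimit -l unlimited" for the elasticsearch user)
bootstrap.mlockall: true

# Default refresh interval for new indices. Raising it (or setting it to -1
# via the index-settings API during a bulk load) trades search freshness for
# indexing throughput, which helps keep the cluster responsive while indexing.
index.refresh_interval: 30s

# Share of the JVM heap reserved for in-memory indexing buffers (default: 10%);
# a heavy-indexing cluster typically benefits from more.
indices.memory.index_buffer_size: 30%
```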
Watch the video of Jodok Batlogg's talk here.