Lightning Talk

Please note: This is a wiki page - log yourself in (or create an account first), then edit this page directly to submit your lightning talk (don't remove any existing ones of course).

Lightning Talks - Berlin Buzzwords

At the end of the second day we have reserved space in the schedule for a lightning talk session. Please list your talk proposals below. Make sure to be on-site when your slot is up. Talks are moderated by one of our session chairs.

There will be NO laptop switching between sessions - make sure you bring your slides in some widely supported format, e.g. pdf or odp, and give them to the session chair 20min before the first Lightning Talk session starts. We'll provide a laptop for you. If you happen to need longer than 5min, the session chair will make sure to get you off-stage.

Schedule

When Who What
2:20pm Jonathan Wisler Big Data, Internet Scale – Building a Global Object Storage Platform
2:30pm blocked for Jonathan Wisler
2:40pm Lukas Kahwe Smith travis-ci.org - social continuous integration
2:50pm Jukka Zitting Jackrabbit Oak - git for content (https://s.apache.org/oak-bb12)
3:00pm - 4:10pm Blocked for break and regular talk
4:10pm Michael Hunger Query A Graph with ASCII ART
4:20pm Ingo Renner Apache Solr for TYPO3, Search Meets CMS
4:30pm Doug Judd Hypertable - Big Data. Big Performance.
4:40pm Tim Lossen Lightning IO - the lightning talk conference
4:50pm Max Jakob Named Entity statistics using Apache Pig
5:00pm Lukas Kahwe Smith PHPCR - PHP Content Repository Specification
5:10pm Karel Minarik ElasticSearch on Rails
5:20pm Iván de Prado Pangool: Hadoop API made easy!

Further info on talks

Apache Pig for Named Entity Disambiguation

Abstract: Named Entity Recognition and Disambiguation is the task of spotting names of people, organizations, places etc. in natural language text and disambiguating them to unambiguous identifiers. Several probabilities and context similarity measures are typically employed to solve this problem. Apache Pig is a framework for analyzing large datasets using a high-level dataflow language on top of Apache Hadoop.

This talk focuses on a concrete case of using Apache Pig to efficiently estimate probabilities related to Named Entity Recognition and Disambiguation, with Wikipedia as input. Even though development time was short, the performance gain compared to a previous single-machine implementation is remarkable, enabling more frequent updates and more flexible evaluation and tuning.
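The abstract does not say which probabilities are estimated. As a rough illustration only, one quantity often used for disambiguation is P(entity | surface form), estimated from how often a piece of link text points to each target page. The sketch below computes it in plain Python on a single machine; the function name and the toy link data are hypothetical, and this is not the speaker's Pig script.

```python
from collections import Counter, defaultdict


def estimate_entity_given_surface_form(pairs):
    """Estimate P(entity | surface form) from (surface_form, entity) pairs.

    `pairs` is a list of tuples, e.g. one tuple per link found in the input
    text (link text, link target). This input shape is an assumption made
    for the illustration.
    """
    # How often each (surface form, entity) combination occurs.
    pair_counts = Counter(pairs)
    # How often each surface form occurs overall.
    surface_totals = Counter(surface_form for surface_form, _ in pairs)

    probabilities = defaultdict(dict)
    for (surface_form, entity), count in pair_counts.items():
        probabilities[surface_form][entity] = count / surface_totals[surface_form]
    return dict(probabilities)


# Hypothetical toy data standing in for links extracted from Wikipedia.
links = [
    ("Berlin", "Berlin"),
    ("Berlin", "Berlin"),
    ("Berlin", "Berlin"),
    ("Berlin", "Berlin,_New_Hampshire"),
    ("Pig", "Apache_Pig"),
]

probs = estimate_entity_given_surface_form(links)
```

The two counting passes are a group-and-count followed by a join on the surface form, which is the kind of dataflow a framework such as Pig distributes across a cluster.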

About the speaker: Max Jakob works as a software developer at Neofonie R&D and is currently involved in the Dicode EU project. He acquired his Master of Science degree in Language & Communication Technologies, focusing on different semantic representations for natural language processing. Recently, he was involved in the DBpedia project and co-authored DBpedia Spotlight, a tool for Named Entity Recognition and Disambiguation using DBpedia concepts.

Exploring data on Google App-Engine

Abstract: This talk is aimed primarily at practitioners wishing to summarise and explore data held in Google App Engine's non-relational datastore.

Cupple (www.cupple.mobi) is a social media iPhone app that supports an individual "coupling" with a single partner, to share messages and photos. The architecture of the system is an iPhone client backed by a Google App Engine web service. The system has been live for 5 months and has over 10,000 active users, generating significant traffic on a daily basis. This talk discusses how we analysed the data generated by the system, including initial efforts to visualise key metrics using App Engine's remote-api and ggplot2 in R, and then leveraging the App Engine map-reduce framework. The results of the analysis led to some startling conclusions that significantly affected the development direction of the product.

About the speaker: I've been in and around computers since I was 7 - I must be the youngest person to have still used punchcards :) Really started doing interesting university-based things in '94, first Neural Networks in Dundee, then a bit of image processing at Northumbria University, which led to researching and teaching data analysis (in R) and visualisation work, mainly on really big image databases (millions of images). Started a company (see-fish technology) looking at 3D visualisation of databases, which is still going. Latterly worked on social applications, mainly supported by Google App Engine.

Big Data, Internet Scale – Building a Global Object Storage Platform

Abstract: The design and implementation of a global object storage platform with intelligent indexing and search. Learn how SoftLayer architected a platform to deliver object storage at Internet scale, and how Cloudant is delivering a scalable and globally distributed data layer to further extend the platform. Technologies discussed will include OpenStack Swift and Cloudant BigCouch.

About the speaker: Jonathan Wisler, SoftLayer General Manager EMEA, is responsible for spearheading and managing the growth of SoftLayer’s EMEA operations, including running the day-to-day operations of the company’s data centre in Amsterdam. An IT veteran, Mr. Wisler has been driving innovation and international growth for over a decade. He started his career in technology working for one of the pioneers of interactive advertising, Red Sky Interactive, which became a part of Agency.com. Jonathan was one of the key members of the management team that grew Kodak Gallery (formerly Ofoto) from a start-up to an international market leader. Most recently he helped Blurb, the self-publishing platform, expand into the key European markets. Mr. Wisler holds a BA in Economics from the University of California in Santa Cruz.

Jackrabbit Oak

"We want to implement a scalable and performant hierarchical content repository for use as the foundation of modern world-class web sites and other demanding content applications. The repository should implement standards like JCR, WebDAV and CMIS, and be easily accessible from various platforms, especially from JavaScript clients running in modern browser environments. The implementation should provide more out-of-the-box functionality than typical NoSQL databases while achieving comparable levels of scalability and performance." -- https://wiki.apache.org/jackrabbit/Jackrabbit%203%20Strategic%20Plan

This lightning talk introduces the Oak effort and gives an update on the current state of the tree after the first three months of development. Slides at https://s.apache.org/oak-bb12.