Searching Japanese with Lucene and Solr
My company donated Kuromoji (LUCENE-3305) to the Apache Software Foundation last year with the goal of bringing easy-to-use and high-quality Japanese language support to Lucene and Solr.
This talk covers the following topics:
1) Introduce the basic challenges when searching Japanese text
2) Present common techniques for dealing with these challenges
3) Demonstrate how Lucene and Solr now support sophisticated Japanese search out of the box, explore common configuration options and best practices, and show some of the very cool things we can do.
The talk is meant for beginners and doesn't require any prior knowledge of Japanese.
The key takeaways I'd like to give listeners are a) an awareness of the basic challenges when searching Japanese; and b) the ability to build Japanese search solutions using the now-standard features in Lucene/Solr 3.6/4.0.
(It's possible to deliver the talk in 20 to 40 minutes, but 40 minutes probably works best.)