Implementing a Text Classifier based on Lucene and LIBSVM

Speaker:

Christoph Goller

Over the last ten years, statistical text classification has become an important application area. After briefly presenting some interesting applications, I will sketch how a statistical text classification system can be implemented using Lucene to store training and test data and LIBSVM as the machine learning library. The classification rules produced by the support vector machines can be represented as Lucene queries. This allows very efficient classification of large document collections, provided they are already indexed (batch-mode classification). Conversely, classifying a single document against thousands of classification rules (online classification, e.g. patent classification) may best be implemented by storing all classification rules in a Lucene index (with the boosts kept in payloads) and using the document to be classified as the query.
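The two modes can be illustrated with a minimal Python sketch. This is not the speaker's implementation. It assumes linear SVM models (one weight per term plus a bias). With a linear kernel, such a weight vector can be recovered from a LIBSVM model's support vectors and coefficients. Plain dicts stand in for Lucene's inverted index, and scoring is assumed to reduce to a plain sum of per-term boosts. In real Lucene that would need a custom similarity. All names (`RULES`, `DOCS`, `batch_classify`, `online_classify`, ...) and all weights are hypothetical toy values.

```python
from collections import defaultdict

# Hypothetical linear rules: category -> (term weights, bias).
# Toy values; in practice they would be derived from trained LIBSVM models.
RULES = {
    "sports":   ({"match": 1.2, "goal": 1.5, "election": -0.8}, -1.0),
    "politics": ({"election": 1.7, "vote": 1.1, "goal": -0.3}, -1.0),
}

DOCS = {
    "d1": "late goal decides the match",
    "d2": "election results and vote counts",
    "d3": "weather forecast for the weekend",
}

def tokenize(text):
    """Lowercase whitespace tokenizer; binary (present/absent) term features."""
    return set(text.lower().split())

def build_doc_index(docs):
    """Inverted index over the document collection: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def batch_classify(rule, doc_index, all_doc_ids):
    """Batch mode: evaluate one rule like a boosted OR query over the doc index.

    Only the postings of the rule's terms are visited; a document matching
    none of them scores exactly the bias.
    """
    weights, bias = rule
    scores = defaultdict(lambda: bias)
    for term, w in weights.items():
        for doc_id in doc_index.get(term, ()):
            scores[doc_id] += w
    accepted = {d for d, s in scores.items() if s > 0}
    if bias > 0:  # unmatched documents would score positive as well
        accepted |= set(all_doc_ids) - set(scores)
    return accepted

def build_rule_index(rules):
    """Online mode: invert the rules themselves: term -> [(category, weight)].

    The weight plays the role of the per-term boost kept in a payload.
    """
    index = defaultdict(list)
    for category, (weights, _bias) in rules.items():
        for term, w in weights.items():
            index[term].append((category, w))
    return index

def online_classify(text, rule_index, rules):
    """Use the document's terms as the query against the rule index."""
    scores = {cat: bias for cat, (_w, bias) in rules.items()}
    for term in tokenize(text):
        for category, w in rule_index.get(term, ()):
            scores[category] += w
    return {cat for cat, s in scores.items() if s > 0}

if __name__ == "__main__":
    doc_index = build_doc_index(DOCS)
    for cat, rule in RULES.items():
        print(cat, sorted(batch_classify(rule, doc_index, DOCS)))
    # prints: sports ['d1']  then  politics ['d2']
    rule_index = build_rule_index(RULES)
    print(online_classify("a vote after the election", rule_index, RULES))
    # prints: {'politics'}
```

The asymmetry is the point of the design. Batch mode runs once per rule and touches only that rule's postings across the whole collection. Online mode runs once per document and touches only the rules that share terms with it. That stays cheap even when there are thousands of rules.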

A video recording of Christoph Goller's talk is available.

Schedule info

Time slot:

4 June 16:35 - 16:55

Room:

Kleistsaal

Experience level:

advanced

Presentation Format:

Short (20min)

Slides:

Text Classifier-cgoller-bbuzz12.pdf
