
Featured Items

News Bites

There's More Lucene in Solr than You Think!

There is an interesting blog post about Lucene & Solr consultancy and training services and how these two technologies are perceived by different companies and their technical teams. The blog post highlights how many Solr users do not realize the importance of understanding the concepts of Lucene and provides some interesting examples too.

Apache Solr and Lucene 3.6.0 released

The Lucene PMC is pleased to announce the release of Apache Solr and Lucene 3.6.0. As this may be the last release from the 3.x line of releases, it is highly recommended that users upgrade. The releases include the new Kuromoji morphological analysis framework for Japanese, improvements to suggester implementations and query-time joining, a new SolrJ client connector based on Apache HttpComponents, and a wide variety of bug fixes. Solr and Lucene 3.6.0 can be downloaded from here and here respectively.

Blogs

Finite State Automata in Lucene

Lucene Revolution 2012 is now done, and the talk Robert and I gave went well! We showed how we are using automata (FSAs and FSTs) to make great improvements throughout Lucene. You can view the slides here. This was the first time I used Google Docs exclusively for a talk, and I was impressed! The real-time collaboration was awesome: we each could see the edits the other was...

Lucene's TokenStreams are actually graphs!

Lucene's TokenStream class produces the sequence of tokens to be indexed for a document's fields. The API is an iterator: you call incrementToken to advance to the next token, and then query specific attributes to obtain the details for that token. For example, CharTermAttribute holds the text of the token; OffsetAttribute has the character start and end offset into the original string...
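The iterator pattern described in this teaser can be illustrated with a small self-contained Python model. It is only a sketch: `ToyTokenStream`, `increment_token`, and `collect_tokens` are hypothetical names that mimic the shape of the API, not Lucene's Java classes, and the whitespace tokenization is an assumption made for brevity.

```python
import re
from dataclasses import dataclass


@dataclass
class ToyTokenStream:
    """Hypothetical stand-in for a token stream; not Lucene's actual classes."""
    text: str
    term: str = ""          # plays the role of CharTermAttribute
    start_offset: int = 0   # plays the role of OffsetAttribute (start)
    end_offset: int = 0     # plays the role of OffsetAttribute (end)

    def __post_init__(self):
        # Lazily find whitespace-delimited tokens; one is consumed per call.
        self._matches = re.finditer(r"\S+", self.text)

    def increment_token(self) -> bool:
        """Advance to the next token; return False once the stream is exhausted."""
        match = next(self._matches, None)
        if match is None:
            return False
        self.term = match.group()
        self.start_offset, self.end_offset = match.span()
        return True


def collect_tokens(text):
    """Drive the stream the way a consumer would: advance, then read attributes."""
    stream = ToyTokenStream(text)
    tokens = []
    while stream.increment_token():
        tokens.append((stream.term, stream.start_offset, stream.end_offset))
    return tokens
```

The point of the design is that the consumer never receives a token object: it advances the stream and then reads whichever attributes it cares about, which are overwritten on each step.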

Lucene has two Google Summer of Code students!

I'm happy to announce that two Lucene Google Summer of Code projects were accepted for this summer! The first project (LUCENE-3312), proposed by Nikola Tanković, will separate StorableField out of IndexableField, and also fix the longstanding confusing trap that one can retrieve a Document at search time and re-index it, without losing anything. It's unfortunately not true! ...

Berlin Buzzwords is back and will take place on 4th & 5th June 2012! It's a conference for developers and users of open source software projects, focusing on the issues of scalable search, data analysis in the cloud and NoSQL databases. Berlin Buzzwords presents more than 30 talks and presentations by international speakers, organized around the three tags "search", "store" and "scale". It goes without saying that many of the contributors from our SearchWorkings.org community site will be present too.

Registration is open so get your tickets now! More info via berlinbuzzwords.de

Furthermore, based on last year's success, there will also be an opportunity to participate in search-related training sessions (with a massive discount for Berlin Buzzwords attendees), organized by some of our contributors. For more information click here.

Lucene Revolution 2012 will take place in Boston on May 7-10. It's the largest conference for the Apache Lucene / Solr open source search community. A large contingent of the project committers will be there, as well as most of the 400+ fellow Lucene / Solr enthusiasts. The two-day agenda consists of 40 sessions, workshops, panels and keynotes dedicated to all things related to open source search. We're proud that many of our contributors have been invited as speakers; more information is available here. Furthermore, gain deeper insights into Lucene, Solr and Big Data by attending a two-day training workshop, which will take place May 7-8. Register now and take advantage of special savings! Visit lucenerevolution.org for more information.

Training & Presentations

Lucene Today, Tomorrow & Beyond

This video presentation introduces the current state of the Lucene ecosystem from a technical perspective and tries to provide a future vision of the project, even beyond the next revolutionary major release. Download presentation slides here.

Apache Tika: 1 point Oh!

Apache Tika, an ASF top-level project since April 2010, and its thriving Apache community have made tremendous strides over the past 4 years to grow and mature into a leading text extraction library and content detection framework. Tika is used in a number of search projects, in a number of data...

Configuring Mahout Clustering Jobs

Apache Mahout is a framework for scalable machine learning on top of Apache Hadoop and can be used for large scale document clustering. This presentation introduces clustering in general and shows you step-by-step how to configure Mahout clustering jobs to create a tag cloud from a document...
