Searchworkings.org feed

Finite State Automata in Lucene
Mike McCandless
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=467982
Updated: 2012-05-15T10:51:25Z; Published: 2012-05-15T10:46:47Z
Lucene Revolution 2012 is now done, and the talk Robert and I gave went well! We showed how we are using automata (FSAs and FSTs) to make great improvements throughout Lucene. You can view the slides...

Spatial Solr Plugin 1.0-RC4
Chris Male
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=23956
Updated: 2012-05-15T09:21:49Z; Published: 2011-06-03T12:50:17Z
I am pleased to announce the latest release of our Spatial Solr Plugin, v1.0-RC4. This release is backwards compatible with RC3 and contains the following changes: PDF documentation has been improved to remove inconsistencies in request parameter and source code package names; SpatialFilter now includes hashCode and equals implementations, facilitating storage of the filter in...

Lucene's TokenStreams are actually graphs!
Mike McCandless
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=463129
Updated: 2012-05-03T14:47:33Z; Published: 2012-05-03T14:45:52Z
Lucene's TokenStream class produces the sequence of tokens to be indexed for a document's fields. The API is an iterator: you call incrementToken to advance to the next token, and then query specific attributes to obtain the details for that token. For example,...

Lucene has two Google Summer of Code students!
Mike McCandless
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=463110
Updated: 2012-05-03T14:44:43Z; Published: 2012-05-03T14:43:12Z
I'm happy to announce that two Lucene Google Summer of Code projects were accepted for this summer!
The first project (LUCENE-3312), proposed by Nikola Tanković, will separate StorableField out of IndexableField, and also fix the longstanding...

Indexing your Samba/Windows network shares using Solr
Martijn van Groningen
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=24058
Updated: 2012-04-24T10:54:19Z; Published: 2011-06-03T12:50:21Z
Many of JTeam's clients want to search the content of their existing network shares as part of their Enterprise Search infrastructure. Over the last couple of years, more and more people have been switching to Apache Lucene / Solr as their preferred open source search solution. However, many still have the misconception that it is not possible to index the content of other enterprise content systems, like Microsoft SharePoint and Samba / Windows...

There's More Lucene in Solr than You Think!
sejal korenromp
Updated: 2012-04-19T13:48:50Z; Published: 2012-04-19T13:03:03Z
There is an interesting blog post about Lucene & Solr consultancy and training services and how these two technologies are perceived by different companies and their technical teams. The blog post highlights how many Solr users do not realize the importance of understanding the concepts of Lucene, and it provides some interesting examples too.

On Schemas and Lucene
Chris Male
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=450623
Updated: 2012-04-17T08:04:36Z; Published: 2012-04-04T08:35:14Z
One of the very first things users encounter when using Apache Solr is its schema. Here they configure the fields that their Documents will contain and the field types, which define, among other things, how field data will be analyzed. Solr’s schema is often touted as one of its major features and you will find it used in almost every Solr component. Yet at the same time, users of Apache Lucene won’t encounter a schema.
Lucene is schemaless, letting users index Documents with any fields they...

Apache Solr and Lucene 3.6.0 released
Chris Male
Updated: 2012-04-17T07:53:31Z; Published: 2012-04-16T04:30:34Z
The Lucene PMC is pleased to announce the release of Apache Solr and Lucene 3.6.0. As this may be the last release from the 3.x line of releases, it is highly recommended that users upgrade. The releases include the new Kuromoji morphological analysis framework for Japanese, improvements to suggester implementations and query-time joining, a new SolrJ client connector based on Apache HttpComponents, and a wide variety of bug fixes. Solr and Lucene 3.6.0 can be downloaded from here and here...

Faceting & result grouping
Martijn van Groningen
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=446861
Updated: 2012-04-10T18:43:26Z; Published: 2012-03-27T11:10:47Z
Result grouping and faceting are in essence two different search features. Faceting counts the number of hits for specific field values matching the current query. Result grouping collects documents that share a common property and places them under a group. These groups are used as the hits in the search result. Result grouping and faceting are usually used together, and the results are often misunderstood. The main reason is that when using grouping people...

Lucene is participating in GSoC 2012
Luca Cavanna
Updated: 2012-04-04T10:48:33Z; Published: 2012-04-04T08:16:37Z
The Google Summer of Code is a global program that offers students stipends to write code for open source projects. They work with the open source community to identify and fund exciting projects for the upcoming summer. As in previous years, Apache Lucene will be involved, so it's once again a great chance for students to participate in exciting open source projects. The application deadline is drawing near!
If you are a student, don't miss out on this opportunity and have a look at the Lucene...

Result grouping made easier
Martijn van Groningen
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=446409
Updated: 2012-04-03T09:45:18Z; Published: 2012-03-26T07:59:52Z
Lucene has had result grouping for a while now, as a contrib in Lucene 3.x and as a module in the upcoming 4.0 release. In both releases the actual grouping is performed with Lucene Collectors. As a Lucene user you need to use several of these Collectors in searches. However, these Collectors have many constructor arguments, so using grouping in pure Lucene apps can become quite cumbersome. The example below illustrates this. Result grouping using the...

Lucene Versions - Stable, Development, 3.x and 4.0
Chris Male
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=446076
Updated: 2012-04-02T04:34:03Z; Published: 2012-03-25T05:11:30Z
With Solr and Lucene 3.6 soon becoming the last featureful 3.x release and the release of 4.0 slowly drawing near, I thought it might be useful to recap what all the various versions mean to you, the user, and why two very different versions are soon going to be made available. A Brief History of Time: Prior to Solr and Lucene 3.1 and the merger of the developments of both projects, both were developed using single paths. This meant that all development was done on...

Solr will check on startup for index locks as of 3.6
Luca Cavanna
Updated: 2012-03-23T14:35:45Z; Published: 2012-03-23T14:35:45Z
As of version 3.6, Solr will check on startup whether the index is locked. While the unlockOnStartup option allows the index to be unlocked automatically when it is locked, the SOLR-3156 issue was about checking on startup in order to raise an error and prevent the web application from starting in that case.
In fact, if you don't use the unlockOnStartup option, you don't know that the index is locked until someone tries to add a document to the index. Thanks to this improvement, which has been committed, you'll...

Challenges in maintaining a high performance search engine written in Java
Simon Willnauer
http://www.searchworkings.org/login?p_p_id=58&p_p_lifecycle=0&p_p_mode=view&_58_redirect=http%3A%2F%2Fwww.searchworkings.org%2Fdownload%2F-%2Fcontent%2Fpremium-download%2F444870&p_p_state=normal
Updated: 2012-03-21T14:10:46Z; Published: 2012-03-21T14:10:46Z
During the last decade Apache Lucene became the de facto standard in open source search technology. Thousands of applications, from Twitter-scale web services to computers playing Jeopardy, rely on Lucene, a rock-solid, scalable and fast information-retrieval library entirely written in Java. Maintaining and improving such a popular software library reveals tough challenges in testing, API design, data structures, concurrency and optimizations. This talk presents the most demanding technical...

Document Frequency Limited MultiTermQuerys
Chris Male
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=443956
Updated: 2012-03-19T16:28:06Z; Published: 2012-03-19T08:52:57Z
If you've ever looked at user generated data such as tweets, forum comments or even SMS text messages, you'll have noticed that there are many variations in the spelling of words. In some cases they are intentional, such as omissions of vowels to reduce message length; in other cases they are unintentional typos and spelling mistakes. Querying this kind of data is difficult, since matching only the traditional spelling of a word can lead to many valid results being missed.
One way to...

New index statistics in Lucene 4.0
Mike McCandless
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=440120
Updated: 2012-03-15T10:16:58Z; Published: 2012-03-15T10:09:40Z
In the past, Lucene recorded only the bare minimum of aggregate index statistics necessary to support its hard-wired classic vector space scoring model. Fortunately, this situation is wildly improved in trunk (to be 4.0), where we...

Using your Lucene index as input to your Mahout job - Part I
Frank Scholten
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=425600
Updated: 2012-03-05T19:37:06Z; Published: 2012-02-17T10:45:12Z
This blog shows you how to use an upcoming Mahout feature, the lucene2seq program (https://issues.apache.org/jira/browse/MAHOUT-944). This program reads the contents of stored fields in your Lucene index and converts them into text sequence files, to be used by a Mahout text clustering job. The tool contains both a sequential and a MapReduce implementation and can be run from the command line or from Java using a bean configuration object. In this blog I demonstrate how to use...

Transactional Lucene
Mike McCandless
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=435353
Updated: 2012-03-04T20:49:23Z; Published: 2012-03-04T20:45:57Z
Many users don't appreciate the transactional semantics of Lucene's APIs and how this can be useful in search applications.
For starters, Lucene implements ACID properties. Atomicity: when you make changes (adding or removing documents) in an IndexWriter session and then commit, either all (if the commit succeeds) or none (if the commit fails) of your changes will be visible, never...

Different ways to make auto suggestions with Solr
Luca Cavanna
http://www.searchworkings.org/c/blogs/find_entry?noSuchEntryRedirect=null&entryId=420845
Updated: 2012-02-15T16:21:41Z; Published: 2012-02-06T15:03:38Z
Nowadays almost every website has a full text search box as well as an auto suggestion feature to help users find what they are looking for by typing the fewest possible characters. The example below shows what this feature looks like in Google. It progressively suggests how to complete the current word and/or phrase, and corrects typos. That's a meaningful example which contains multi-term suggestions depending on the most popular queries, combined...

Query time join will be included in Lucene 3.6!
Martijn van Groningen
Updated: 2012-02-08T13:51:33Z; Published: 2012-02-08T08:41:15Z
Yesterday the query time joining feature was added to the stable 3x branch. This means that query time joining will be available in the join contrib when Lucene 3.6 is released. The query time joining in the stable 3x branch isn't quite the same as what is committed to trunk (Lucene 4.0). The version that is in trunk is about 3 times faster and supports joining on fields that have multiple values per document. Nonetheless the query time joining that will be included in Lucene 3.6 will...