Apache Solr
Solr is an open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is highly scalable.
Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
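As a rough illustration of the HTTP/JSON interface described above, the sketch below builds a search URL and reads a JSON reply using only the Python standard library. The host, port, and `/solr/select` path are assumptions about a local installation, the field names are hypothetical, and the sample reply is hand-written in the general shape of a Solr JSON response rather than captured from a real server.

```python
import json
from urllib.parse import urlencode

# Assumed location of a Solr instance; host, port, and path are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, rows=10, fields=None):
    """Return a URL for a Solr search request that asks for JSON output."""
    params = [("q", query), ("wt", "json"), ("rows", str(rows))]
    if fields:
        # fl limits which stored fields are returned for each document.
        params.append(("fl", ",".join(fields)))
    return SOLR_SELECT_URL + "?" + urlencode(params)


def extract_docs(response_text):
    """Pull the hit count and the document list out of a JSON response body."""
    body = json.loads(response_text)
    return body["response"]["numFound"], body["response"]["docs"]


# Hand-written sample in the shape of a Solr JSON response (not real output).
SAMPLE_RESPONSE = """
{"responseHeader": {"status": 0, "QTime": 1},
 "response": {"numFound": 1, "start": 0,
              "docs": [{"id": "doc1", "title": "Hello Solr"}]}}
"""

if __name__ == "__main__":
    print(build_select_url("title:solr", rows=5, fields=["id", "title"]))
    print(extract_docs(SAMPLE_RESPONSE))
```

Because the request is an ordinary URL and the reply is ordinary JSON, the same exchange could be written in any language with an HTTP client; sending the request (for example with `urllib.request.urlopen`) is left out here so the sketch runs without a server.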
Apache Lucene and Apache Solr have been produced by the same Apache Software Foundation (ASF) development team since the two projects merged in 2010. The technology or products are commonly referred to as Lucene/Solr or Solr/Lucene.
History
In January 2006, CNET Networks decided to openly publish the Solr source code by donating it to the Apache Software Foundation under the Lucene top-level project. Like any new project at the Apache Software Foundation, it entered an incubation period, which helped resolve organizational, legal, and financial issues.
In January 2007, Solr graduated from incubation status. It grew steadily, accumulating features and attracting a robust community of users, contributors, and committers. Although quite new as a public project, it is already used by several high-traffic websites.
In September 2008, Solr 1.3 was released with many enhancements, including distributed search capabilities and performance improvements.
November 2009 saw the release of Solr 1.4. This version introduced enhancements in indexing, searching, and faceting, along with many other improvements such as rich document processing (PDF, Word, HTML), search results clustering based on Carrot2, and improved database integration. The release also featured many additional plug-ins.
In March 2010, the Lucene and Solr projects merged. Separate downloads will continue, but the products are now jointly developed by a single set of committers.
Features
- Uses the Lucene library for full-text search
- Faceted navigation
- Hit highlighting
- Query language supports structured as well as textual search
- JSON, XML, PHP, Ruby, Python, XSLT, Velocity and custom Java binary output formats over HTTP
- Replication to other Solr servers - enables scaling queries per second (QPS)
- Distributed search through sharding - enables scaling content volume
- Search results clustering based on Carrot2
- Extensible through plugins
- Pluggable relevance - boost through formula
- Caching
- Embeddable in a Java Application
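Several of the features listed above are switched on per request through query parameters. The sketch below, offered as an illustration rather than a complete client, assembles a query string using the `facet`, `facet.field`, `hl`, `hl.fl`, and `shards` request parameters, and regroups a facet result that arrives as a flat list of alternating values and counts, which is assumed here to be the layout of the JSON output. The function names, field names, and host addresses are hypothetical.

```python
from urllib.parse import urlencode


def feature_params(query, facet_field=None, highlight_field=None, shards=None):
    """Build the query string for a search that uses optional Solr features."""
    params = [("q", query), ("wt", "json")]
    if facet_field:
        # Faceted navigation: ask for value counts on one field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    if highlight_field:
        # Hit highlighting: ask for matching snippets from one field.
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    if shards:
        # Distributed search: comma-separated list of shard addresses.
        params.append(("shards", ",".join(shards)))
    return urlencode(params)


def pair_facet_counts(flat):
    """Turn a flat list such as ["a", 3, "b", 1] into [("a", 3), ("b", 1)]."""
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    print(feature_params("ipod", facet_field="cat", highlight_field="name"))
    print(pair_facet_counts(["electronics", 3, "music", 1]))
```

Keeping each feature behind its own optional argument mirrors how the server treats them: a request that omits the faceting or highlighting parameters simply gets a plain result list back.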
Because no more code is written than necessary to pass a failing test case, automated tests tend to cover every code path. For example, in order for a TDD developer to add an else branch to an existing if statement, the developer would first have to write a failing test case that motivates the branch. As a result, the automated tests resulting from TDD tend to be very thorough: they will detect any unexpected changes in the code's behaviour. This detects problems that can arise where a change later in the development cycle unexpectedly alters other functionality.