Solr
Solr is an open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is highly scalable.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
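
As a rough illustration of the HTTP API described above, the following Python sketch builds the URL for a search request that asks for a JSON response. The base URL (host, port, and path) is a hypothetical deployment detail, and the helper name build_query_url is invented for this example; q, rows, and wt are standard Solr request parameters. The sketch only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Assumption: a Solr instance reachable at this address. The host, port,
# and path vary by deployment and Solr version.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(query, rows=10, response_format="json"):
    """Build a search request URL.

    q    -- the query string
    rows -- maximum number of documents to return
    wt   -- response writer, for example "json" or "xml"
    """
    params = [("q", query), ("rows", rows), ("wt", response_format)]
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Example: search a hypothetical "title" field for the term "lucene".
url = build_query_url("title:lucene", rows=5)
print(url)
```

Because the request is an ordinary HTTP GET, any language with an HTTP client can issue it and parse the XML or JSON that comes back.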

Apache Lucene and Apache Solr have been produced by the same Apache Software Foundation development team since the two projects merged in 2010. The technology or products are commonly referred to as Lucene/Solr or Solr/Lucene.

History

In 2004, Solr was created by Yonik Seeley at CNET Networks as an in-house project to add search capability for the company website. Yonik Seeley, along with Grant Ingersoll and Erik Hatcher, went on to launch Lucid Imagination, a company providing commercial support, consulting and training for Apache Solr search technologies.

In January 2006, CNET Networks decided to openly publish the source code by donating it to the Apache Software Foundation under the Lucene top-level project. Like any new project at the Apache Software Foundation, it entered an incubation period which helped solve organizational, legal, and financial issues.

In January 2007, Solr graduated from incubation status and grew steadily, accumulating features and attracting a robust community of users, contributors, and committers. Although quite new as a public project, it is already used for several high-traffic websites.

In September 2008, Solr 1.3 was released with many enhancements, including distributed search capabilities and performance improvements.

November 2009 saw the release of Solr 1.4. This version introduced enhancements in indexing, searching and faceting, along with many other improvements such as rich document processing (PDF, Word, HTML), search results clustering based on Carrot2, and improved database integration. The release also included many additional plug-ins.

In March 2010, the Lucene and Solr projects merged. Separate downloads will continue, but the products are now jointly developed by a single set of committers.

Features

  • Uses the Lucene library for full-text search
  • Faceted navigation
  • Hit highlighting
  • Query language supports structured as well as textual search
  • JSON, XML, PHP, Ruby, Python, XSLT, Velocity and custom Java binary output formats over HTTP
  • HTML administration interface
  • Replication to other Solr servers - enables scaling QPS
  • Distributed Search through Sharding - enables scaling content volume
  • Search results clustering based on Carrot2
  • Extensible through plugins
  • Pluggable relevance - boost through formula
  • Caching
  • Embeddable in a Java Application
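
Several of the features listed above are switched on per request through query parameters. The Python sketch below combines faceted navigation, hit highlighting, and distributed search across shards in one query string. The helper name build_feature_params, the field names "cat" and "name", and the shard addresses are hypothetical; facet, facet.field, hl, hl.fl, and shards are Solr request parameters. The code only builds the query string and sends nothing.

```python
from urllib.parse import urlencode


def build_feature_params(query, facet_field, highlight_field, shards=None):
    """Build a query string that enables faceting and highlighting.

    If a list of shard addresses is given, it is passed as a single
    comma-separated "shards" parameter for distributed search.
    """
    params = [
        ("q", query),
        ("facet", "true"),              # turn on faceted navigation
        ("facet.field", facet_field),   # field to compute facet counts on
        ("hl", "true"),                 # turn on hit highlighting
        ("hl.fl", highlight_field),     # field to highlight matches in
        ("wt", "json"),                 # ask for a JSON response
    ]
    if shards:
        params.append(("shards", ",".join(shards)))
    return urlencode(params)


# Example with hypothetical field names and shard hosts.
params = build_feature_params(
    "ipod", "cat", "name",
    shards=["host1:8983/solr", "host2:8983/solr"],
)
print(params)
```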

Community and future

Solr has an active development community of individuals and companies who contribute new features and bug fixes.

Some of the features available in version 3.1 (which was the first version after merging with Apache Lucene) are:
  • Geo-spatial search
  • Automated management of large clusters through ZooKeeper
  • More function queries
  • Field Collapsing
  • A new auto-suggest component
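
To give a sense of how geo-spatial search in the list above is typically expressed, the Python sketch below builds a filter query that restricts results to documents within a given distance of a point. It assumes the geofilt query parser with its sfield, pt, and d local parameters, and a hypothetical location field named "store"; the helper name build_geofilt_params is also invented here. Treat it as a sketch under those assumptions and not as the definitive request format for a given Solr version.

```python
from urllib.parse import urlencode, parse_qs


def build_geofilt_params(field, lat, lon, distance_km, query="*:*"):
    """Build a query string with a distance filter.

    field       -- name of the location field (assumed to exist in the schema)
    lat, lon    -- center point of the search
    distance_km -- radius of the search, in kilometers
    """
    geofilt = "{!geofilt sfield=%s pt=%s,%s d=%s}" % (field, lat, lon, distance_km)
    return urlencode([("q", query), ("fq", geofilt)])


# Example: everything within 5 km of a point, using a hypothetical field.
params = build_geofilt_params("store", 45.15, -93.85, 5)

# Decode the query string again to show the filter in readable form.
print(parse_qs(params)["fq"][0])
```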

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.