January 29, 2020

What is Apache Lucene?

First written in 1999 by Doug Cutting, still going strong...

By CBR Staff Writer

Apache Lucene, the full-text search library, has been developed and maintained for more than 20 years, and for many developers it is an integral part of their website and application builds. At its core, Lucene is a full-text search engine library written in Java that provides indexing and search.

Lucene lets Java developers add search capabilities to websites or applications. It takes content, adds it to a full-text index, and then uses that index to answer queries. The content can be ingested from any number of sources, such as SQL or NoSQL databases, or even from the website itself.
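The following minimal sketch shows the index-then-query workflow in plain Java, with no Lucene dependency. The class `TinyIndex` and its methods are hypothetical names chosen for illustration. It is not Lucene's API. Lucene's real components (such as `Analyzer`, `IndexWriter` and `IndexSearcher`) are far richer and persist the index to disk.

The sketch ingests each document, splits it into lowercase terms, and records which documents contain each term. A single-term query then becomes one map lookup rather than a scan over every document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index, not Lucene's API: maps each term to the IDs of documents containing it.
final class TinyIndex {
    private final List<String> docs = new ArrayList<>();               // stored originals, by ID
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // term -> sorted doc IDs

    // Ingests one piece of content (a database row, a web page, ...) and returns its ID.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Answers a single-term query with one map lookup instead of scanning every document.
    Set<Integer> search(String term) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of()));
    }

    String document(int id) {
        return docs.get(id);
    }

    // Naive stand-in for an analyzer: lowercase, then split on runs of non-letter/non-digit chars.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}
```

As an example, suppose you add "Lucene is a search library" (ID 0) and "Solr is built on Lucene" (ID 1). Then `search("Lucene")` yields a set containing both IDs. The expensive work of tokenizing and recording postings happens once, at indexing time, rather than on every query.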

Doug Cutting wrote the first version of the software in Java in 1999, and the project joined the Apache Software Foundation in 2001. It remains one of the most active projects in the Apache Software Foundation.

In 2019 alone, nine versions were released, four committers became project management committee (PMC) members, and seven community members became official committers. Versions of Lucene are also available in the following programming languages: Perl, C++, Python, Object Pascal, Ruby and PHP.

Forks include one from Benipal Technologies, which states: “We are heavy Lucene users and have forked the Lucene / SOLR source code to create a high volume, high performance search cluster with MapReduce, HBase and katta integration, achieving indexing speeds as high as 3000 Documents per second with sub 20ms response times on 100 Million + indexed documents.”

One of the main reasons Apache Lucene is held in such high regard is that it returns search results quickly.

It does so by searching an index built from the content, rather than searching the text directly. This structure, known as an inverted index, works much like the index at the back of a book: it maps each term to the documents that contain it. The engine is also flexible in how it uses threads. Applications commonly run each query on a single thread, but Lucene can also execute a single query concurrently across multiple threads.
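The book-index analogy extends naturally to multi-word queries. To find pages that mention two words, you look both words up and keep only the page numbers that appear under both. The stdlib-only Java sketch below does this for two ascending lists of document IDs (postings lists).

`PostingIntersect` is a hypothetical name used for illustration. Lucene's own conjunction logic is considerably more elaborate: for example, its iterators can jump ahead through long lists with `advance()` instead of stepping one entry at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: intersects two ascending postings lists, as an AND query requires.
final class PostingIntersect {
    static List<Integer> and(int[] left, int[] right) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int j = 0;
        // Walk both sorted lists in lockstep, advancing whichever side holds the smaller ID.
        while (i < left.length && j < right.length) {
            if (left[i] == right[j]) {
                hits.add(left[i]); // document contains both terms
                i++;
                j++;
            } else if (left[i] < right[j]) {
                i++;
            } else {
                j++;
            }
        }
        return hits;
    }
}
```

Because both lists are sorted, the intersection takes time proportional to the lengths of the two postings lists. It never has to touch the documents themselves, which is where the speed of an inverted index comes from.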

Michael McCandless, a committer and PMC member on the Apache Lucene project, explains this in detail in a blog post: “Lucene’s IndexSearcher class, responsible for executing incoming queries to find their top matching hits from your index, accepts an optional Executor (e.g. a thread pool) during construction.

“If you pass an Executor and your CPUs are idle enough (i.e. your server is well below its red-line QPS throughput capacity), Lucene will use multiple concurrent threads to find the top overall hits for each query.”
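The idea McCandless describes can be sketched with the Java standard library alone. This is a conceptual illustration, not Lucene's implementation: in Lucene itself, you supply the `Executor` when constructing `IndexSearcher`, as the quote says.

The sketch assumes the following:

- The index is pre-split into slices. Lucene divides work roughly along index segments.
- Each document already has a relevance score, held in a plain `float[]`.
- `ScoredDoc` and `ConcurrentTopK` are hypothetical names.

Each slice computes its own top k hits with a small min-heap, possibly on a different thread, and the partial results are then merged into the overall top k. Passing `null` for the executor runs every slice on the calling thread, mirroring the "optional" part of the quote.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// A scored hit: global document ID plus relevance score (hypothetical type).
final class ScoredDoc {
    final int doc;
    final float score;

    ScoredDoc(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// Conceptual sketch of intra-query concurrency, not Lucene's implementation.
final class ConcurrentTopK {
    private static final Comparator<ScoredDoc> BY_SCORE =
            Comparator.comparingDouble((ScoredDoc d) -> d.score);

    // Collects the k best hits of one slice; slice-local position i maps to global ID base + i.
    static List<ScoredDoc> searchSlice(float[] scores, int base, int k) {
        PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(BY_SCORE); // min-heap: weakest kept hit on top
        for (int i = 0; i < scores.length; i++) {
            heap.offer(new ScoredDoc(base + i, scores[i]));
            if (heap.size() > k) {
                heap.poll(); // evict the weakest so at most k hits remain
            }
        }
        return new ArrayList<>(heap);
    }

    // Searches every slice (concurrently if an executor is given), then merges into the global top k.
    static List<ScoredDoc> search(List<float[]> slices, int k, Executor executor) {
        Executor exec = (executor != null) ? executor : Runnable::run; // null => run on caller thread
        List<CompletableFuture<List<ScoredDoc>>> pending = new ArrayList<>();
        int base = 0;
        for (float[] slice : slices) {
            final int sliceBase = base;
            pending.add(CompletableFuture.supplyAsync(() -> searchSlice(slice, sliceBase, k), exec));
            base += slice.length;
        }
        List<ScoredDoc> merged = new ArrayList<>();
        for (CompletableFuture<List<ScoredDoc>> f : pending) {
            merged.addAll(f.join()); // wait for each slice's partial result
        }
        merged.sort(BY_SCORE.reversed()); // best score first
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }
}
```

The merge step is cheap because each slice contributes at most k candidates. This is also why the approach pays off only when spare CPU is available: on a saturated server, the extra threads simply compete with other queries.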

Apache Lucene 8.4.1 is available to download from the Apache Lucene website.
