January 29, 2020

What is Apache Lucene?

First written in 1999 by Doug Cutting, still going strong...

By CBR Staff Writer

Apache Lucene, the full-text search library, has been actively maintained for more than 20 years, and for many developers it is an integral part of their website and application builds. At its core, it is a software library that provides a Java-based search and indexing platform.

Written in Java, it lets you add search capabilities to websites or applications. It takes content and adds it to a full-text index, which can then be queried. That content can be ingested from any number of sources, such as SQL or NoSQL databases, or even from the website itself.

Doug Cutting first wrote the software in Java in 1999, and the project joined the Apache Software Foundation in 2001. To this day it is still one of the most active projects within the foundation.

Last year alone, nine versions were released, four committers were made project management committee members and seven community members became official committers. Ports of the library are currently available in the following programming languages: Perl, C++, Python, Object Pascal, Ruby and PHP.

Forks include a version from Benipal Technologies, which states: “We are heavy Lucene users and have forked the Lucene / SOLR source code to create a high volume, high performance search cluster with MapReduce, HBase and katta integration, achieving indexing speeds as high as 3000 Documents per second with sub 20ms response times on 100 Million + indexed documents.”

Apache Lucene

One of the main reasons that Apache Lucene is held in such high regard is that it can return search responses quickly.

It does so because, rather than searching text or content directly, it searches an index built from that content. Known as an inverted index, it works in much the same way as the index of a book. The engine is also robust: although it is commonly used in a one-thread-per-query manner, it can execute a single query concurrently using multiple threads.
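To make the book-index analogy concrete, here is a minimal sketch of an inverted index in plain Java. It is an illustration only and does not use Lucene or reflect its internal format: the class name `ToyInvertedIndex`, its methods and the naive tokenizer are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted set of document IDs that contain it.
// Hypothetical illustration; this is not Lucene's API or its storage format.
class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Indexes a document and returns the ID assigned to it.
    int add(String text) {
        int docId = nextDocId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Looks up a single term in the index; no document text is scanned at query time.
    SortedSet<Integer> search(String term) {
        SortedSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        if (hits == null) {
            return new TreeSet<>();
        }
        return new TreeSet<>(hits); // defensive copy
    }

    // Naive tokenizer: lowercase, then split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("Lucene is a search library");
        index.add("An inverted index works like the index of a book");
        index.add("Search the index, not the text");
        System.out.println(index.search("index"));
        System.out.println(index.search("search"));
    }
}
```

The work of reading the text happens once, when a document is added. A query is then a map lookup on the term, which is why searching the index is faster than scanning the content itself.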

Michael McCandless, a PMC member and committer for the Apache Lucene project, explains this in detail in a blog post: “Lucene’s IndexSearcher class, responsible for executing incoming queries to find their top matching hits from your index, accepts an optional Executor (e.g. a thread pool) during construction.

“If you pass an Executor and your CPUs are idle enough (i.e. your server is well below its red-line QPS throughput capacity), Lucene will use multiple concurrent threads to find the top overall hits for each query.”
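The general idea McCandless describes can be sketched in plain Java without Lucene itself: split the index into slices, search each slice on a thread pool, then merge the per-slice winners into the overall top hits. Everything named here (`ConcurrentTopHits`, `Hit`, `topOfSlice`, `search`) is hypothetical and exists only for this example; it assumes hits have already been scored and is not Lucene's `IndexSearcher` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of one query executed concurrently across index slices.
// Hypothetical illustration; these are not Lucene classes.
class ConcurrentTopHits {
    // A scored match within the index.
    static final class Hit {
        final int docId;
        final double score;

        Hit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }

        @Override
        public String toString() {
            return docId + ":" + score;
        }
    }

    // Highest score first; ties broken by lower docId so results are deterministic.
    static final Comparator<Hit> BEST_FIRST =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparingInt(h -> h.docId);

    // Returns the best k hits from one list of hits.
    static List<Hit> topOfSlice(List<Hit> slice, int k) {
        List<Hit> sorted = new ArrayList<>(slice);
        sorted.sort(BEST_FIRST);
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    // Searches every slice on the executor, then merges the per-slice winners.
    static List<Hit> search(List<List<Hit>> slices, int k, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        List<Future<List<Hit>>> pending = new ArrayList<>();
        for (List<Hit> slice : slices) {
            pending.add(executor.submit(() -> topOfSlice(slice, k)));
        }
        List<Hit> merged = new ArrayList<>();
        for (Future<List<Hit>> result : pending) {
            merged.addAll(result.get()); // blocks until that slice has finished
        }
        return topOfSlice(merged, k);
    }

    public static void main(String[] args) throws Exception {
        List<List<Hit>> slices = Arrays.asList(
                Arrays.asList(new Hit(0, 1.5), new Hit(1, 0.2)),
                Arrays.asList(new Hit(2, 3.0), new Hit(3, 0.9)),
                Arrays.asList(new Hit(4, 2.1)));
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(search(slices, 3, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Each slice only needs to return its own best k hits, because no hit outside a slice's top k can appear in the overall top k. That keeps the merge step small regardless of how many documents each slice holds.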

Apache Lucene 8.4.1 can be downloaded from the Apache Lucene project website.
