Lucene Index Maintenance (Insert, Update and Delete)

Lucene is a powerful search engine library with a great indexing API that can be used to build search features for your web application.
In this post I address the maintenance aspects of a Lucene index.
Once the initial index has been created, these are the operations you may need to perform:

1. Inserting new documents
2. Updating existing documents
3. Deleting existing documents

Inserting new documents:
Lucene doesn't check for duplicate documents, so you need to build your own logic to decide whether a document is new or already exists in the index.
Once that is decided, inserting is no different from creating the index. We just need to take care of the recreate flag,
which needs to be false so that the index directory is not overwritten (recreated). Add each document with a call to IndexWriter.addDocument().
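
The "new or existing" decision can be sketched in plain Java. This is only a minimal illustration, assuming the application tracks a unique key (for example a file path or database id) for every document it has indexed. The class name InsertOrUpdateRouter and the key set are hypothetical and are not part of Lucene.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: decides whether a document key is new or already indexed. */
class InsertOrUpdateRouter {
    enum Action { INSERT, UPDATE }

    // Assumption: the application remembers the keys of all documents it has indexed.
    private final Set<String> knownKeys = new HashSet<>();

    /** Returns INSERT the first time a key is seen, UPDATE on every later call. */
    Action route(String key) {
        // Set.add returns true only if the key was not present yet.
        return knownKeys.add(key) ? Action.INSERT : Action.UPDATE;
    }
}
```

A document routed to INSERT would go straight to IndexWriter.addDocument(), while one routed to UPDATE would take the update path described next.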

Updating existing documents:
As mentioned above, Lucene does not check for duplicates, so adding a changed document again would leave both versions in the index.
An existing document is therefore updated in two steps:
first delete the old document, then add the new version to the index.
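
The delete-then-add pattern can be sketched with a toy in-memory stand-in for the index. This is not Lucene code: ToyIndex and its methods are hypothetical names that only mimic the behaviour described here, namely an add that never checks for duplicates and a delete that matches a field value exactly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Toy stand-in for an index: each document is a map of field name to value. */
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Appends blindly, with no duplicate check. */
    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    /** Removes every document whose field equals the value exactly; returns how many were removed. */
    int deleteDocuments(String field, String value) {
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    /** Update in two steps: delete the old version by its key, then add the new one. */
    void updateDocument(String keyField, Map<String, String> doc) {
        String key = Objects.requireNonNull(doc.get(keyField), "document has no key field");
        deleteDocuments(keyField, key);
        addDocument(doc);
    }

    /** Counts documents whose field equals the value exactly. */
    long count(String field, String value) {
        return docs.stream().filter(d -> value.equals(d.get(field))).count();
    }
}
```

Calling addDocument twice with the same id leaves two copies, while updateDocument leaves exactly one. Newer Lucene releases also provide IndexWriter.updateDocument(Term, Document), which performs the delete and the add in a single call.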

Deleting existing documents:
Deleting can be done with IndexReader.deleteDocuments(Term) or IndexWriter.deleteDocuments(Term), both of which delete every document that contains the given term.

Problem deleting documents:

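What I have observed is that these methods work well if the document field is UN_TOKENIZED, but the delete fails if the field is TOKENIZED. The likely reason is that the term passed to the delete call is matched exactly against the indexed terms, and a TOKENIZED field is indexed as separate tokens rather than as its whole original value, so the full value finds no match.

One good way around this is to index the same value a second time under a different field name as an UN_TOKENIZED field, and to build the delete Term from that field.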
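
To make the difference concrete, here is a toy inverted index in plain Java. It is a simplified model and not Lucene itself: the class ToyInvertedIndex, the crude lowercase-and-split tokenizer, and the field names path and path_key are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy inverted index: maps "field:term" to the ids of documents containing that term. */
class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> live = new HashSet<>();
    private int nextId = 0;

    int newDocument() {
        live.add(nextId);
        return nextId++;
    }

    /** Tokenized field: the value is lowercased and split, and each piece becomes its own term. */
    void addTokenized(int doc, String field, String value) {
        for (String token : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) index(doc, field, token);
        }
    }

    /** Untokenized field: the whole value is stored as one term. */
    void addUntokenized(int doc, String field, String value) {
        index(doc, field, value);
    }

    private void index(int doc, String field, String term) {
        postings.computeIfAbsent(field + ":" + term, k -> new HashSet<>()).add(doc);
    }

    /** Deletes by exact term match, with no analysis of the input; returns how many documents were removed. */
    int deleteDocuments(String field, String term) {
        Set<Integer> hits = postings.getOrDefault(field + ":" + term, Collections.emptySet());
        int removed = 0;
        for (int doc : hits) {
            if (live.remove(doc)) removed++;
        }
        return removed;
    }

    int liveCount() {
        return live.size();
    }
}
```

If a document is indexed with a tokenized path field and an untokenized path_key field holding the same value, deleting by the full value on path removes nothing, because only the individual tokens exist as terms there. Deleting by the same value on path_key removes the document.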

Deleting index documents by document number is also simple, and it works well if you already know the document number.

After any of these changes, you need to reopen all the searchers that were already open so that they see the updated index.
