Lucene.net DeleteDocuments deleting too much? - lucene.net

I have, for example, two docs in the index. Both of them have an "Id" field.
Now I call DeleteDocuments on the IndexWriter, giving it the Id of one of the items.
This is how I create the index:
var document = new global::Lucene.Net.Documents.Document();
document.Add(new Field("Content", "content", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
document.Add(new Field("Id", "vladanstrigo", Field.Store.YES, Field.Index.NOT_ANALYZED));
var document2 = new global::Lucene.Net.Documents.Document();
document2.Add(new Field("Content", "content second", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
document2.Add(new Field("Id", "ivanstrigo", Field.Store.YES, Field.Index.NOT_ANALYZED));
var directory = FSDirectory.Open("...directory...");
var analyzer = ...GetAnalyzer();
var indexWriter = ...GetWriter();
indexWriter.AddDocument(document);
indexWriter.AddDocument(document2);
This works great: I get two documents in the index and both can be searched.
But when I do:
indexWriter.DeleteDocuments(new Term("Id", "ivanstrigo"));
The IndexWriter deleted ALL documents in index...not only the one matching this term...which I don't know how to stop. I only want to delete this one!
What am I doing wrong?

Found the answer: when creating the IndexWriter, I was passing "true" for the create flag, which recreated a new, empty index every time the writer was opened.
It works now.
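A minimal sketch of the fix, assuming the Lucene.Net 3.0.3 API (in 2.9.x the writer is closed with Close() rather than disposed). The class name, method name, and the indexPath parameter are hypothetical; only the "Id" field comes from the question.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class DeleteByIdSketch
{
    // Hypothetical helper: deletes the single document whose "Id" term matches.
    public static void DeleteById(string indexPath, string id)
    {
        using (var directory = FSDirectory.Open(new DirectoryInfo(indexPath)))
        {
            var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

            // create: false opens the existing index for appending.
            // create: true would replace it with an empty index first.
            using (var writer = new IndexWriter(directory, analyzer, false,
                                                IndexWriter.MaxFieldLength.UNLIMITED))
            {
                // "Id" was indexed NOT_ANALYZED, so the term must match the stored value exactly.
                writer.DeleteDocuments(new Term("Id", id));
                writer.Commit();
            }
        }
    }
}
```

With create set to false the constructor throws if no index exists yet at that path. The three-argument overload without the create flag creates the index when it is missing and appends otherwise.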

Related

Lucene.net 4.8: Search by facet field

I am trying to write a filter in Lucene.net that matches all blog posts that have at least one tag from a set of tags.
I'm trying the following, but it always returns 0 results:
var bq = new BooleanQuery();
var termsQuery = new BooleanQuery();
foreach (var tag in tags)
{
    termsQuery.Add(new TermQuery(new Term("TagSlugs", tag)), Occur.SHOULD);
}
bq.Add(termsQuery, Occur.MUST);
var hits = searcher.Search(bq, page * pageSize);
What am I doing wrong?
My document looks like this:
var doc = new Document
{
    new StoredField("Id", blogPost.Id),
    new Int32Field("ModuleId", blogPost.ModuleId, Field.Store.YES),
    new TextField("Title", blogPost.Title, Field.Store.YES),
    new StringField("Slug", blogPost.Slug, Field.Store.YES),
    new StoredField("ImagePath", blogPost.ImagePath),
    new TextField("Intro", blogPost.Intro, Field.Store.YES),
    new TextField("Html", blogPost.Title, Field.Store.YES),
    new Int64Field("PublishDate", blogPost.PublishDate.Ticks, Field.Store.YES),
    new FacetField("PublishDateTag", blogPost.PublishDate.Year.ToString(), blogPost.PublishDate.Month.ToString(), blogPost.PublishDate.Year.ToString())
};
foreach (var tag in blogPost.TagObjects)
{
    doc.Add(new FacetField("Tags", tag.Name));
    doc.Add(new FacetField("TagSlugs", tag.Slug));
}
You don't need facets in this specific case; you can index the tag as a simple string field and filter on that.
But if you do want to use facets, try reading this demo: https://lucene.apache.org/core/4_8_0/demo/org/apache/lucene/demo/facet/SimpleFacetsExample.html
It's in Java, but it is readable if you are a C# developer.
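A minimal sketch of the string-field approach, assuming Lucene.NET 4.8 with the Lucene.Net.Analysis.Common package. The field name "TagSlug", the sample values, and the class name are made up for illustration. The likely reason the original query finds nothing is that a FacetField does not index a plain term under its dimension name, so a TermQuery on "TagSlugs" has nothing to match.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class TagFilterSketch
{
    public static void Main()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(version);

        using (var writer = new IndexWriter(directory, new IndexWriterConfig(version, analyzer)))
        {
            // One untokenized StringField per tag; the same field name may repeat.
            var doc = new Document
            {
                new StringField("Slug", "hello-world", Field.Store.YES),
                new StringField("TagSlug", "lucene", Field.Store.NO),
                new StringField("TagSlug", "dotnet", Field.Store.NO)
            };
            writer.AddDocument(doc);
            writer.Commit();
        }

        // With only SHOULD clauses, at least one of them must match.
        var anyTag = new BooleanQuery();
        foreach (var tag in new[] { "lucene", "search" })
        {
            anyTag.Add(new TermQuery(new Term("TagSlug", tag)), Occur.SHOULD);
        }

        using var reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);
        TopDocs hits = searcher.Search(anyTag, 10);
        Console.WriteLine(hits.TotalHits);
    }
}
```

StringField values are indexed verbatim, so the slugs passed to the query must match the indexed slugs exactly, including case.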

Columns Priority while searching with Lucene.NET

Team,
I have 6 indexed columns to search as below.
Name
Description
SKU
Category
Price
SearchCriteria
Now, while searching, I need to search the "SearchCriteria" column first and then the rest of the columns.
In short, the products with a match in "SearchCriteria" should be displayed at the top of the search results.
var parser = new MultiFieldQueryParser(Version.LUCENE_30,
    new[] { "SearchCriteria",
            "Name",
            "Description",
            "SKU",
            "Category",
            "Price"
    }, analyzer);
var query = parseQuery(searchQuery, parser);
var finalQuery = new BooleanQuery();
finalQuery.Add(parser.Parse(searchQuery), Occur.SHOULD);
var hits = searcher.Search(finalQuery, null, hits_limit, Sort.RELEVANCE);
There are 2 ways to do it.
The first method is using field boosting:
During indexing set a boost to the fields by their priority:
Field name = new Field("Name", strName, Field.Store.NO, Field.Index.ANALYZED);
name.Boost = 1;
Field searchCriteria = new Field("SearchCriteria", strSearchCriteria, Field.Store.NO, Field.Index.ANALYZED);
searchCriteria.Boost = 2;
doc.Add(name);
doc.Add(searchCriteria);
This way, terms matched in the SearchCriteria field are boosted twice as much as terms matched in the Name field.
This method is better if you always want SearchCriteria to be more important than Name.
The second method is to use MultiFieldQueryParser boosting at search time:
Dictionary<string,float> boosts = new Dictionary<string,float>();
boosts.Add("SearchCriteria",2);
boosts.Add("Name",1);
MultiFieldQueryParser parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, new[] { "SearchCriteria", "Name" }, analyzer, boosts);
This method is better if you want the boosting to work only in some scenarios of your application.
Try different boost values to see whether they give the priority you are looking for, and adjust them as needed.
To keep the example short and readable I used only two of your fields, but you should of course use all of them.

Lucene query with field dependency

I have a lucene index of documents that have an _IsPrivate field. I need to query the index to retrieve all documents that are either _IsPrivate == false or _IsPrivate == true and _Owner == me. I've been trying the following lucene query, but I'm not getting the expected results...
_IsPrivate:false OR (_IsPrivate:true AND _Owner:me)
The result is that I'm only getting documents that I own (public and private).
Any thoughts on how I can rewrite my query?
I would use a "BooleanQuery" to perform that kind of operation. You build two queries, one for each complete search statement, and then combine them with the "SHOULD" operator.
var bq = new BooleanQuery();
var bq1 = new BooleanQuery();
bq1.Add(new TermQuery(new Term("_IsPrivate", "false")), BooleanClause.Occur.MUST);
var bq2 = new BooleanQuery();
bq2.Add(new TermQuery(new Term("_IsPrivate", "true")), BooleanClause.Occur.MUST);
bq2.Add(new TermQuery(new Term("_Owner", "me")), BooleanClause.Occur.MUST);
bq.Add(bq1, BooleanClause.Occur.SHOULD);
bq.Add(bq2, BooleanClause.Occur.SHOULD);
It might be a bit cumbersome, but I really like to organise my queries this way.
Hope it helps.

Lucene.Net BooleanClause issue

I'm having an issue with Lucene.Net and a BooleanQuery. This is my code:
BooleanQuery query = new BooleanQuery();
String[] types = searchTypes.Split(',');
foreach (string t in types)
query.Add(new TermQuery(new Term("document type", t.ToLower())), BooleanClause.Occur.SHOULD);
This should basically be an OR statement going through documents that have a certain type, which works on its own. However, I also have this query:
Query documentTitleQuery = new WildcardQuery(new Term("title", "*" + documentTitle.ToLower() + "*"));
query.Add(documentTitleQuery, BooleanClause.Occur.MUST);
Which searches for words in a title. Both of these queries work fine on their own. When they are used together, it seems Lucene treats only the documentTitleQuery as required. So both queries together should return documents of a specific type AND containing specific words in the title, but the result contains documents of all types that have the specific words in the title.
Use one more layer of Boolean query to group both:
BooleanQuery topQuery = new BooleanQuery();
...
BooleanQuery query1 = new BooleanQuery();
...
BooleanQuery query2 = new BooleanQuery();
...
topQuery.Add(query1, BooleanClause.Occur.MUST);
topQuery.Add(query2, BooleanClause.Occur.MUST);
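The grouping matters because, in a BooleanQuery that contains a MUST clause, the SHOULD clauses become optional: they influence the score but are not required for a match. A minimal sketch of the nested form, assuming the Lucene.Net 2.9 API used in the question; the class and method names are hypothetical, while the field names come from the question.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class TypeAndTitleQuerySketch
{
    // Hypothetical helper: (type1 OR type2 OR ...) AND title wildcard.
    public static BooleanQuery Build(string searchTypes, string documentTitle)
    {
        // Inner query: any one of the requested types may match.
        var typeQuery = new BooleanQuery();
        foreach (string t in searchTypes.Split(','))
        {
            typeQuery.Add(new TermQuery(new Term("document type", t.ToLower())),
                          BooleanClause.Occur.SHOULD);
        }

        Query titleQuery = new WildcardQuery(
            new Term("title", "*" + documentTitle.ToLower() + "*"));

        // Outer query: both groups are required.
        var topQuery = new BooleanQuery();
        topQuery.Add(typeQuery, BooleanClause.Occur.MUST);
        topQuery.Add(titleQuery, BooleanClause.Occur.MUST);
        return topQuery;
    }
}
```

Because typeQuery contains only SHOULD clauses, at least one type must match for the group to match, and the outer MUST then makes that group mandatory alongside the title.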

Lucene.Net IndexSearcher not working with BooleanQuery

I have the following code snippet:
QueryParser parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, new string[] { Field1, Field2, Field3 }, _analyzer);
parser.SetDefaultOperator(QueryParser.Operator.AND);
Query queryOrig= parser.Parse(queryString);
var query = new BooleanQuery();
query.Add(queryOrig, BooleanClause.Occur.MUST);
if (itemId.HasValue)
    query.Add(new TermQuery(new Term("Field3", NumericUtils.IntToPrefixCoded(itemId.Value))), BooleanClause.Occur.MUST);
Hits hits;
if (sortField != null)
{
    var sort = new Sort(new SortField(sortField, isDescending));
    hits = Searcher.Search(query, null, sort);
}
else
    hits = Searcher.Search(query);
This piece of code is always returning 0 hits no matter what.
If I do a direct search using the queryOrig without the boolean, it works fine.
I'm quite sure the data is correct.
Thanks,
Leonardo
Well... it was a data problem! :D
Lucene works just fine.
Thanks,
Leo!