Lucene Duplicated results / spaces in text search - lucene.net

I’m using Lucene 2.9.4.1, and everything works fine when the string I search for occurs only once in a given line.
However, if Lucene finds the string more than once in the same line, I get duplicated (or more) results.
I’m currently using the following BooleanQuery:
booleanQuery.Add(new TermQuery(new Term(propertyInfo.Name, textSearch)), BooleanClause.Occur.SHOULD);
The second issue is about searching for text that contains spaces, like “hello world”: it never works.
Can anyone advise me or help me with these two malfunctioning features, please?
Thank you so much in advance,
Best regards,

Well, I just found the answer that solved both of my issues =)
I was using this:
BooleanQuery booleanQuery = new BooleanQuery();
PropertyInfo[] propertyInfos = typeof(T).GetProperties();
foreach (PropertyInfo propertyInfo in propertyInfos)
{
booleanQuery.Add(new TermQuery(new Term(propertyInfo.Name, textSearch)), BooleanClause.Occur.SHOULD);
}
And now I use this:
var booleanQuery = new BooleanQuery();
textSearch = QueryParser.Escape(textSearch.Trim().ToLower());
string[] properties = typeof(T).GetProperties().Select(x => x.Name).ToArray();
Analyzer analyzer = new StandardAnalyzer(global::Lucene.Net.Util.Version.LUCENE_29);
MultiFieldQueryParser titleParser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, properties, analyzer);
Query titleQuery = titleParser.Parse(textSearch);
booleanQuery.Add(titleQuery, BooleanClause.Occur.SHOULD);
It seems that Analyzer and MultiFieldQueryParser solved my problems: no more duplicated results, I can search for text with spaces, and performance has improved significantly (faster results) =)
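For anyone wondering why the TermQuery version could never match “hello world”, here is a minimal sketch in plain Java. It is a toy model, not the Lucene API: ToyIndex, analyze, termQuery and parsedQuery are hypothetical names, and the "analyzer" only lowercases and splits on non-alphanumeric characters. The point is that the index stores individual tokens, so a lookup of the raw text as one term finds nothing, while analyzing the query text first does.

```java
import java.util.*;

// Toy inverted index: maps each term to the ids of the documents containing it.
class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in for an analyzer: lowercase, then split on non-alphanumeric characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Index time: the text is analyzed and each token becomes a term.
    void add(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // TermQuery-style lookup: the text is used verbatim as a single term.
    Set<Integer> termQuery(String rawTerm) {
        return postings.getOrDefault(rawTerm, Collections.emptySet());
    }

    // Parser-style lookup: analyze the text first, then OR the per-term matches.
    Set<Integer> parsedQuery(String text) {
        Set<Integer> hits = new TreeSet<>();
        for (String token : analyze(text)) hits.addAll(termQuery(token));
        return hits;
    }
}
```

With documents 1 ("Hello world") and 2 ("Goodbye world") added, termQuery("hello world") returns an empty set because no single term "hello world" exists, while parsedQuery("hello world") returns documents 1 and 2.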

Related

Applying the same Analyzer to Queries and Fields

I am trying to build a basic search for my API backend. Users pass arbitrary queries and the backend is supposed to return results (obviously). I would prefer a solution that works with a local index as well as Elasticsearch.
On my entity I defined an analyzer like this:
@AnalyzerDef(name = "ngram",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class),
        @TokenFilterDef(factory = NGramFilterFactory.class,
            params = {
                @Parameter(name = "minGramSize", value = "2"),
                @Parameter(name = "maxGramSize", value = "3") })
    }
)
For the query, I tried the following:
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(this.entityManager);
Analyzer analyzer = fullTextEntityManager.getSearchFactory().getAnalyzer("ngram");
QueryParser queryParser = new MultiFieldQueryParser(ALL_FIELDS, analyzer);
queryParser.setDefaultOperator(QueryParser.AND_OPERATOR);
org.apache.lucene.search.Query query = queryParser.parse(queryString);
javax.persistence.Query persistenceQuery =
fullTextEntityManager.createFullTextQuery(query, MyEntity.class);
List<MyEntity> result = persistenceQuery.getResultList();
As far as I understand, I need to provide an analyzer for the query so that the search query is "ngram-tokenized" and a match can be found. Before, I used SimpleAnalyzer, and as a result the search only matched full words, which (I think) backs my theory. (Sorry, I am still learning this.)
The above code gives me a NullPointerException:
java.lang.NullPointerException: null
at org.hibernate.search.engine.impl.ImmutableSearchFactory.getAnalyzer(ImmutableSearchFactory.java:370) ~[hibernate-search-engine-5.11.1.Final.jar:5.11.1.Final]
at org.hibernate.search.engine.impl.MutableSearchFactory.getAnalyzer(MutableSearchFactory.java:203) ~[hibernate-search-engine-5.11.1.Final.jar:5.11.1.Final]
at org.hibernate.search.impl.SearchFactoryImpl.getAnalyzer(SearchFactoryImpl.java:50) ~[hibernate-search-orm-5.11.1.Final.jar:5.11.1.Final]
in the line
Analyzer analyzer = fullTextEntityManager.getSearchFactory().getAnalyzer("ngram");
You cannot retrieve the analyzer from Hibernate Search when using the Elasticsearch integration, because in that case there is no analyzer locally: the analyzer only exists remotely, in the Elasticsearch cluster.
If you only need a subset of the query syntax, try out the "simple query string" query: it's a query that can be built using the DSL (so it will work the same with Lucene and Elasticsearch) and that provides the most common features (boolean queries, fuzziness, phrases, ...). For example:
Query luceneQuery = queryBuilder.simpleQueryString()
.onFields("name", "history", "description")
.matching("war + (peace | harmony)")
.createQuery();
The syntax is a bit different, but only because it's aiming at end users and tries to be simpler.
EDIT: If simple query strings are not an option, you can create an analyzer manually: this should work even when using the Elasticsearch integration.
org.apache.lucene.analysis.custom.CustomAnalyzer#builder() should be a good starting point. There are several examples in the javadoc of that class.
Make sure you only create the analyzer once and store it somewhere, e.g. in a static constant: creating an analyzer may be costly.
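To illustrate why the query side needs the same n-gram analysis as the indexed field, here is a small plain-Java sketch. It is a toy stand-in for the lowercase + n-gram part of the "ngram" definition from the question (minGramSize 2, maxGramSize 3), not the Lucene or Hibernate Search API; ToyNGram and ngrams are hypothetical names, and the order of the grams is not meant to mirror what Lucene emits.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Toy stand-in for a lowercase + n-gram filter chain.
class ToyNGram {
    // Returns every substring of the lowercased token whose length is between min and max.
    static Set<String> ngrams(String token, int min, int max) {
        String t = token.toLowerCase(Locale.ROOT);
        Set<String> grams = new LinkedHashSet<>();
        for (int size = min; size <= max; size++) {
            for (int start = 0; start + size <= t.length(); start++) {
                grams.add(t.substring(start, start + size));
            }
        }
        return grams;
    }
}
```

If "Peace" is indexed this way, the stored terms are the 2- and 3-character grams, so the un-analyzed query term "peac" is not among them. Running the query text through the same function yields grams that are all present in the index, which is what lets a partial word match.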

Can't delete document on Lucene.Net

I am trying to delete a document, but nothing I try removes it. Specific to my setup: I am using a RAMDirectory as the directory, with Lucene.Net 3.0.3. My example is below.
public void DeleteIndex(IndexWriter writer,IndexSearcher searcher)
{
var boolQuery = new BooleanQuery();
boolQuery.Add(new TermQuery(new Term("Id", "2")), Occur.MUST);
boolQuery.Add(new TermQuery(new Term("Type", "Product")), Occur.MUST);
writer.DeleteDocuments(boolQuery);
writer.Optimize(true);
//writer.Flush(true, true, true);//even this line doesn't help
writer.Commit();
var result = searcher.Search(boolQuery,1); // I can access deleted doc in search results
}
After writer.Commit(); you need to reopen your searcher.
IndexReader newReader = YOURIndexReader.Reopen(true);
searcher= new IndexSearcher(newReader );
...
The code here is only an illustration, not working code(!); I'm sure you can continue from here...
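The reason the reopen matters is that a reader (and the searcher built on it) sees the index as it was when the reader was opened. Here is a toy model of that behaviour in plain Java; ToyWriter and ToyReader are hypothetical classes, not Lucene.Net types, and they only track document ids.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Toy writer: changes are buffered until commit() publishes a new immutable state.
class ToyWriter {
    private final Set<Integer> pending = new TreeSet<>();
    private Set<Integer> committed = Collections.emptySet();

    void addDocument(int id) { pending.add(id); }
    void deleteDocument(int id) { pending.remove(id); }

    // Publishes a fresh copy; states handed out earlier are left untouched.
    void commit() { committed = Collections.unmodifiableSet(new TreeSet<>(pending)); }

    // A reader captures the committed state at the moment it is opened.
    ToyReader openReader() { return new ToyReader(committed); }
}

// Toy reader: a point-in-time view that never changes after creation.
class ToyReader {
    private final Set<Integer> snapshot;
    ToyReader(Set<Integer> snapshot) { this.snapshot = snapshot; }
    boolean contains(int id) { return snapshot.contains(id); }
}
```

A reader opened before the delete and commit still reports the deleted document; only a reader opened afterwards no longer sees it. That matches the symptom in the question, where the old searcher kept returning the deleted document.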

Lucene.Net - weird behaviour in different servers

I was writing a search for one of our sites: (SITE A)
BooleanQuery booleanQuery = new BooleanQuery();
foreach (var field in fields)
{
QueryParser qp = new QueryParser(field, new StandardAnalyzer());
Query query = qp.Parse(search.ToLower() + "*");
if (field.Contains("Title")) { query.SetBoost((float)1.8); }
booleanQuery.Add(query, BooleanClause.Occur.SHOULD);
}
// CODE DIFFERENCE IS HERE
Query query2 = new TermQuery(new Term("StateProperties.IsActive", "True"));
booleanQuery.Add(query2, BooleanClause.Occur.MUST);
// END CODE DIFFERENCE
Lucene.Net.Search.TopScoreDocCollector collector = Lucene.Net.Search.TopScoreDocCollector.create(21, true);
searcher.Search(booleanQuery, collector);
hits = collector.TopDocs().scoreDocs;
This was working as expected.
Since we own a few sites that use the same skeleton,
I uploaded the search to another site (SITE B),
but the search stopped returning results.
After playing around a bit with the code, I managed to make it work like so (showing only the rewritten lines of code):
QueryParser qp2 = new QueryParser("StateProperties.IsActive", new StandardAnalyzer());
Query query2 = qp2.Parse("True");
booleanQuery.Add(query2, BooleanClause.Occur.MUST);
Does anyone know why this is happening?
I have checked the Lucene DLL version, and it is the same on both sites (2.9.2.2).
Is the code I wrote for SITE A wrong? Is the SITE B code wrong?
Is this my fault at all? Can the production server influence something like this?
Don't they have individual indexes on disk? If they have been indexed differently, they would also return different results. One thing that comes to mind is case sensitivity: a TermQuery looks for an EXACT match, whereas the parser tokenizes and filters the search term according to the analyzer (and so probably searches for "true" instead of "True").
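A small sketch of that hypothesis in plain Java, assuming (this is a guess, not something the question confirms) that SITE A indexed the IsActive field without analysis and SITE B indexed it through a lowercasing analyzer. IsActiveLookup and its methods are hypothetical names, and analyze only models the lowercasing step.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

class IsActiveLookup {
    // Stand-in for the analyzer's lowercasing step (toy, not the Lucene API).
    static String analyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Terms stored for the field, depending on whether it was analyzed at index time.
    static Set<String> indexField(String value, boolean analyzed) {
        return Collections.singleton(analyzed ? analyze(value) : value);
    }

    // TermQuery-style lookup: the raw term must match a stored term exactly.
    static boolean termQueryMatches(Set<String> terms, String raw) {
        return terms.contains(raw);
    }

    // QueryParser-style lookup: the query text goes through the analyzer first.
    static boolean parsedQueryMatches(Set<String> terms, String raw) {
        return terms.contains(analyze(raw));
    }
}
```

Under that assumption, the un-analyzed index stores "True", so only the TermQuery-style lookup for "True" matches; the analyzed index stores "true", so only the parser-style lookup matches. That would explain why each site needs a different form of the query.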

Lucene.NET stemming problem

I'm running into a problem using the SnowBallAnalyzer in Lucene.NET. It works great for some words, but others it doesn't find any results on at all, and I'm not sure how to dig into this further to find out what is happening. I am testing the search on the USDA Food Description file which can be found here (http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR23/asc/FOOD_DES.txt). I'm using the English stemming algorithm. I get the following results when searching for "eggs":
Bagels, egg
Bread, egg
Egg, whole, raw, fresh
Egg, white, raw, fresh
Egg, yolk, raw, fresh
Egg, yolk, raw, frozen
Egg, whole, cooked, fried
...
Those results are great. However I get no results at all when searching for "apple". When I use the StandardAnalyzer, and search for "apple" I get the following results.
Croissants, apple
Strudel, apple,
Babyfood, juice, apple
Babyfood, apple-banana juice
...
Not the best results, but at least it's showing something. Anyone know why the stemming analyzer would be filtering in such a way that I would not get any results?
Edit: Here is my prototype code that I'm working with.
static string[] Search(string searchTerm)
{
//Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Snowball.SnowballAnalyzer("English");
Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer();
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", analyzer);
Lucene.Net.Search.Query query = parser.Parse(searchTerm);
Lucene.Net.Search.Searcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Store.FSDirectory.Open(new DirectoryInfo("./index/")), true);
var topDocs = searcher.Search(query, null, 10);
List<string> results = new List<string>();
foreach(var scoreDoc in topDocs.scoreDocs)
{
results.Add(searcher.Doc(scoreDoc.doc).Get("raw"));
}
return results.ToArray();
}
Are you sure you used Lucene.Net.Analysis.Snowball.SnowballAnalyzer("English") to write your index? You have to use the same analyzer to write and query the index.
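One plausible reading of the symptoms, sketched in plain Java: if the index was written without stemming, a stemmed query only matches when the stem happens to equal an indexed word. The English Snowball stemmer reduces "eggs" to "egg" and "apple" to "appl"; toyStem below is a crude, hypothetical stand-in that reproduces just those two cases and is not the real algorithm.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StemMismatch {
    // Crude stand-in for an English stemmer: strips one trailing "s" or "e" from longer words.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 3 && (w.endsWith("s") || w.endsWith("e"))) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Builds the set of indexed terms, with or without stemming.
    static Set<String> index(List<String> words, boolean stem) {
        Set<String> terms = new LinkedHashSet<>();
        for (String w : words) {
            terms.add(stem ? toyStem(w) : w.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // Looks up the query term, stemming it first if requested.
    static boolean search(Set<String> terms, String query, boolean stemQuery) {
        return terms.contains(stemQuery ? toyStem(query) : query.toLowerCase(Locale.ROOT));
    }
}
```

Against an unstemmed index of "Egg" and "apple", a stemmed search for "eggs" looks up "egg" and succeeds, while a stemmed search for "apple" looks up "appl" and finds nothing. When the index is stemmed as well, both searches match.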

How do I disable some entities based on a few properties in NHibernate Search?

I'm still pretty new to NHibernate.Search, so please bear with me if this is a stupid question :)
Say, I have indexed some entities of type BlogPost, which has a property called IsDeleted. If IsDeleted is set to true, I don't want my queries to show this particular blogpost.
Is this possible? And if it is - How? :P
Thanks in advance
- cwap
// Using NHibernate.Linq:
var result = Session.Linq<BlogPost>().Where(post => !post.IsDeleted).ToList();
// Using HQL:
var hql = "from BlogPost bp where bp.IsDeleted = false";
var result = Session.CreateQuery(hql).List<BlogPost>();
// Using Criteria API:
var result = Session.CreateCriteria(typeof(BlogPost))
.Add(Restrictions.Eq("IsDeleted", false))
.List<BlogPost>();
NHibernate.Linq
HQL: Hibernate Query Language
Found the solution myself. I added the [Field(Index.Tokenized, Store = Store.Yes)] attribute to the IsDeleted property and appended this clause to every incoming query:
string q = "(" + userQuery + ") AND IsDeleted:False";
I knew it was something simple :)