ASP.NET and Lucene proximate search - lucene.net

In Lucene's proximity search the order of the words searched is not maintained.
Is there any way to filter the search so that word1 will always come before word2 in resulted docs.

There is the SpanNearQuery for that.
http://lucene.apache.org/core/old_versioned_docs/versions/2_9_4/api/all/org/apache/lucene/search/spans/SpanNearQuery.html
SpanTermQuery tq1 = new SpanTermQuery(new Term("field", "word1"));
SpanTermQuery tq2 = new SpanTermQuery(new Term("field", "word2"));
SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[]{tq1,tq2}, 2, true);

Related

Lucene Duplicated results / spaces in text search

I’m actually using Lucene 2.9.4.1 and everything works just fine if I search for something that exists just once in the same line.
Per instance, if Lucene find the same string that I’m looking for in the same line, I have duplicated (or more) results.
I’m actually using the following BooleanQuery:
booleanQuery.Add(new TermQuery(new Term(propertyInfo.Name, textSearch)), BooleanClause.Occur.SHOULD);
The second issue is about searching by something with spaces like “hello world”: never works.
Can anyone advise me or help me with these two malfunctioning features, please?
Thank you so much in advance,
Best regards,
Well, I just found the answer that solved both of my issues =)
I was using this:
BooleanQuery booleanQuery = new BooleanQuery();
PropertyInfo[] propertyInfos = typeof(T).GetProperties();
foreach (PropertyInfo propertyInfo in propertyInfos)
{
booleanQuery.Add(new TermQuery(new Term(propertyInfo.Name, textSearch)), BooleanClause.Occur.SHOULD);
}
And now I use this:
var booleanQuery = new BooleanQuery();
textSearch = QueryParser.Escape(textSearch.Trim().ToLower());
string[] properties = typeof(T).GetProperties().Select(x => x.Name).ToArray();
Analyzer analyzer = new StandardAnalyzer(global::Lucene.Net.Util.Version.LUCENE_29);
MultiFieldQueryParser titleParser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, properties, analyzer);
Query titleQuery = titleParser.Parse(textSearch);
booleanQuery.Add(titleQuery, BooleanClause.Occur.SHOULD);
It seems that Analyzer and MultiFieldQueryParser are the solution for my problems: no more duplicated results, I can search by something with spaces and … the performance as significantly raised up (faster results) =)

BooleanFilter with empty FilterClause in Lucene.NET 3.0.3

I am using BooleanFilter for performing filter in lucene index.
code:
BooleanFilter _filter = new BooleanFilter();
var locationFilter = new TermsFilter();
locationFilter.AddTerm(new Term("Location", "Dhaka"));
_filter.Add(new FilterClause(locationFilter, Occur.MUST));
And in my Search code snippet
var hits = searcher.Search(query, _filter, hits_limit, Sort.RELEVANCE).ScoreDocs;
This code works fine;
But sometimes i don't need to filter with location then i just put a empty BooleanFilter and perform search like this:
BooleanFilter _filter = new BooleanFilter();
var hits = searcher.Search(query, _filter, hits_limit, Sort.RELEVANCE).ScoreDocs;
Now why do hits not return me no search result?;
I think that your empty BooleanFilter is matching nothing..
Try using the other overload of search search(Query query, int n, Sort sort)

Lucene.Net - weird behaviour in different servers

I was writing a search for one of our sites: (SITE A)
BooleanQuery booleanQuery = new BooleanQuery();
foreach (var field in fields)
{
QueryParser qp = new QueryParser(field, new StandardAnalyzer());
Query query = qp.Parse(search.ToLower() + "*");
if (field.Contains("Title")) { query.SetBoost((float)1.8); }
booleanQuery.Add(query, BooleanClause.Occur.SHOULD);
}
// CODE DIFFERENCE IS HERE
Query query2 = new TermQuery(new Term("StateProperties.IsActive", "True"));
booleanQuery.Add(query2, BooleanClause.Occur.MUST);
// END CODE DIFFERENCE
Lucene.Net.Search.TopScoreDocCollector collector = Lucene.Net.Search.TopScoreDocCollector.create(21, true);
searcher.Search(booleanQuery, collector);
hits = collector.TopDocs().scoreDocs;
this was working as expected.
since we own a few sites, and they use the same skeleton,
i uploaded the search to another site ( SITE B )
but the search stopped returning results.
after playing a round a bit with the code, i managed to make it work like so: (showing only the rewriten lines of code)
QueryParser qp2 = new QueryParser("StateProperties.IsActive", new StandardAnalyzer());
Query query2 = qp2.Parse("True");
booleanQuery.Add(query2, BooleanClause.Occur.MUST);
anyone knows why this is happening ?
i have checked the Lucene dll version, and its the same version in both sites (2.9.2.2)
is the code i have written in SITE A wrong ? is SITE B code wrong ?
is this my fault at all ? can production server infulance something like this ?
Doesn't they have individual indexes on disk? If they have been indexed differently, they would also return different results. One thing that comes to mind is if there is some sort of case sensitivity that matters, becayse a TermQuery will look for an EXACT match, where as the parser will try to tokenize/filter the search term according to the analyzer (and probably search for "true" instead of "True".

Facing Issue in zend_search_lucene

I am using Zend Lucene Search:
......
$results = $test->fetchAll();
setlocale(LC_CTYPE, 'de_DE.iso-8859-1');
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());
foreach ($results as $result) {
$doc = new Zend_Search_Lucene_Document();
// add Fields
$doc->addField(
Zend_Search_Lucene_Field::Text('testid', $result->id));
$doc->addField(
Zend_Search_Lucene_Field::Keyword('testemail', strtolower(($result->email))));
$doc->addField(
Zend_Search_Lucene_Field::Text('testconfirmdate', $result->confirmdate));
$doc->addField(
Zend_Search_Lucene_Field::Text('testcreateddate', $result->createddate));
// Add document to the index
$index->addDocument($doc);
}
// Optimize index.
$index->optimize();
// Search by query
setlocale(LC_CTYPE, 'de_DE.iso-8859-1');
if(strlen($Data['name']) > 2){
//$query = Zend_Search_Lucene_Search_QueryParser::parse($Data['name'].'*');
$pattern = new Zend_Search_Lucene_Index_Term($Data['name'].'*');
$query = new Zend_Search_Lucene_Search_Query_Wildcard($pattern);
$this->view->hits = $index->find(strtolower($query));
}
else{
$query = $Data['name'];
$this->view->hits = $index->find($query);
}
............
Works fine here:
It works when I give complete word, first 3 character, case insensitive words
My issues are:
When I search for email, i got error like "Wildcard search is supported only for non-multiple word terms "
When I search for number/date like "1234" or 09/06/2011, I got error like "At least 3 non-wildcard characters are required at the beginning of pattern"
I want to search date, email, number here.
In file zend/search/Lucene/search/search/query/wildcard a parameter is set,
private static $_minPrefixLength = 3;
chnage it and it may work..!
Based on NaanuManu's suggestion, I did a little more digging to figure this out - I posted my answer on a related question here, but repeating for convenience:
Taken directly from the Zend Reference documentation, you can use:
Zend_Search_Lucene_Search_Query_Wildcard::getMinPrefixLength() to
query the minimum required prefix length and
use Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength() to
set it.
So my suggestion would be either of two things:
Set the prefixMinLength to 0 using Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength(0)
Validate all search queries using javascript or otherwise to ensure there is a minimum of Zend_Search_Lucene_Search_Query_Wildcard::getMinPrefixLength() before any wildcards used (I recommend querying that instead of assuming the default of "3" so the validation is flexible)

Lucene.NET stemming problem

I'm running into a problem using the SnowBallAnalyzer in Lucene.NET. It works great for some words, but others it doesn't find any results on at all, and I'm not sure how to dig into this further to find out what is happening. I am testing the search on the USDA Food Description file which can be found here (http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR23/asc/FOOD_DES.txt). I'm using the English stemming algorithm. I get the following results when searching for "eggs":
Bagels, egg
Bread, egg
Egg, whole, raw, fresh
Egg, white, raw, fresh
Egg, yolk, raw, fresh
Egg, yolk, raw, frozen
Egg, whole, cooked, fried
...
Those results are great. However I get no results at all when searching for "apple". When I use the StandardAnalyzer, and search for "apple" I get the following results.
Croissants, apple
Strudel, apple,
Babyfood, juice, apple
Babyfood, apple-banana juice
...
Not the best results, but at least it's showing something. Anyone know why the stemming analyzer would be filtering in such a way that I would not get any results?
Edit: Here is my prototype code that I'm working with.
static string[] Search(string searchTerm)
{
//Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Snowball.SnowballAnalyzer("English");
Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer();
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", analyzer);
Lucene.Net.Search.Query query = parser.Parse(searchTerm);
Lucene.Net.Search.Searcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Store.FSDirectory.Open(new DirectoryInfo("./index/")), true);
var topDocs = searcher.Search(query, null, 10);
List<string> results = new List<string>();
foreach(var scoreDoc in topDocs.scoreDocs)
{
results.Add(searcher.Doc(scoreDoc.doc).Get("raw"));
}
return results.ToArray();
}
Are you sure you used Lucene.Net.Analysis.Snowball.SnowballAnalyzer("English") to write your index ? You have to use the same analyzer to write and query the index.