Can we get match items' position at Lucence.net search result? - lucene.net

I'm using Lucene.net to implement fulltext search feature in an Asp.net application. The search result page should high light the match items. I got the instance of Lucene.Net.Search.Hits and used .Doc(int i) method to get Lucene Document.
But I don't know how to get the position of match item by existing method or property of some Lucene class. Does Lucene.net provide any feature to support high light query string?

You can use Highlighter or FastVectorHighlighter which can be found in contrib

As the previous answerer said, you should use either Highlighter or FastVectorHighlighter from contrib.
Here's an example of using Highlighter lib to get highlighted fragments:
Formatter formatter = new SimpleHTMLFormatter("<span><b>", "</b></span>");
Lucene.Net.Highlight.Scorer scorer = new QueryScorer(query, field);
Lucene.Net.Highlight.Encoder encoder = new SimpleHTMLEncoder();
var highlighter = new Highlighter(formatter, encoder, scorer);
highlighter.SetTextFragmenter(new SimpleFragmenter(100));
string[] fragments =
highlighter.GetBestFragments(DefaultAnalyzer, field, doc.Get(field), 3);
Some Highlighter-related gotchas:
To highlight a field, it should be added to index with Field.Store.YES option
Your query should be rewritten before passing it to highlighter
The analyzer you pass to highlighter should be the same you use for indexing and searching

Related

Hibernate Search Turkish Character

I am trying to search a field filled with Turkish characters.
For example: Müdürlük.
But when I try to query this field not able to get results. I am new to hibernate search. What should I do?
There may be other problem (I can't tell without the code), but one thing you'll probably want is to use a language-specific analyzer, so that "mudurluk" will match "Müdürlük" for instance, or so that "istanbul" will match "İstanbul" (capital dotted "i").
To do that with Hibernate Search using annotations, fill in the analyzer attribute of your #Field annotation:
#Field(analyzer = #Analyzer(impl = org.apache.lucene.analysis.tr.TurkishAnalyzer.class))
String myProperty;
If you mapped your entity using the programmatic API, the process should be fairly similar.
Please see the official documentation for more details about analyzers in Hibernate Search: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_analyzer
EDIT: don't forget to reindex your data after you changed the analyzer.

How to search across all the fields?

In Lucene, we can use TermQuery to search a text with a field. I am wondering how to search a keyword across a bunch of fields or all the searchable fields?
Another approach, which doesn't require to index anything more than what you already have, nor to combine different queries, is using the MultiFieldQueryParser.
You can provide a list of fields where you want to search on and your query, that's all.
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
Version.LUCENE_41,
new String[]{"title", "content", "description"},
new StandardAnalyzer(Version.LUCENE_41));
Query query = queryParser.parse("here goes your query");
This is how I would do it with the original lucene library written in Java. I'm not sure whether the MultiFieldQueryParser is available in lucene.net too.
Two approaches
1) Index-time approach: Use a catch-all field. This is nothing but appending all the text from all the fields (total text from your input doc) and place that resulting huge text in a single field. You've to add an additional field while indexing to act as a catch-all field.
2) Search-time approach: Use a BooleanQuery to combine multiple queries, for example TermQuery instances. Those multiple queries can be formed to cover all the target fields.
Example check at the end of the article.
Use approach 2 if you know the target-field list at runtime. Otherwise, you've got to use the 1st approach.
Another easy approach to search across all fields using "MultifieldQueryParser" is use IndexReader.FieldOption.ALL in your query.
Here is example in c#.
Directory directory = FSDirectory.Open(new DirectoryInfo(HostingEnvironment.MapPath(VirtualIndexPath)));
//get analyzer
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
//get index reader and searcher
IndexReader indexReader__1 = IndexReader.Open(directory, true);
Searcher indexSearch = new IndexSearcher(indexReader__1);
//add all possible fileds in multifieldqueryparser using indexreader getFieldNames method
var queryParser = new MultiFieldQueryParser(Version.LUCENE_29, indexReader__1.GetFieldNames(IndexReader.FieldOption.ALL).ToArray(), analyzer);
var query = queryParser.Parse(Criteria);
TopDocs resultDocs = null;
//perform search
resultDocs = indexSearch.Search(query, indexReader__1.MaxDoc());
var hits = resultDocs.scoreDocs;
click here to check out my pervious answer to same quesiton in vb.net

handling wildcard search with Special character in Lucene.net search

how do i perform wildcard search a word in lucene that contain special character. for example i have a word like "91-95483534" if i search like "91*" it works and if i search like "91-95483534" also works fine. but my senario is that to search "91-9548*". if i perform like this "91-9548*". i got no output. am i missing anything. my actual code is given below:
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(new string[] {"column1","column2"}, new StandardAnalyzer());
queryParser.SetAllowLeadingWildcard(true);
Query query = queryParser.Parse(QueryParser.Escape(strKeyWord) + "*");
As you used StandardAnalyzer, that indexed your word as 91 and 95483534 if you used INDEX_ANALYZED when you index....
if you want to search as 91-9548* , use INDEX_NOT_ANALYZED when you index that specified field which have "91-95483534" as terms
http://lucene.apache.org/core/old_versioned_docs/versions/3_0_3/api/core/org/apache/lucene/document/Field.Index.html

Make Lucene index a value and store another

I want Lucene.NET to store a value while indexing a modified, stripped-down version of the stored value. e.g. Consider the value:
this_example-has some/weird (chars) 100%
I want it stored right like that (so that I can retrieve exactly that for showing in the results list), but I want lucene to index it as:
this example has some weird chars 100
(you see, like a "sanitized" version of the original value) for a simplified search.
I figure this would be the job of an analyzer, but I don't want to mess with rolling my own. Ideally, the solution should remove everything that is not a letter, a number or quotes, replacing the removed chars by a white-space before indexing.
Any suggestions on how to implement that?
This is because I am indexing products for an e-commerce search, and some have realy creepy names. I think this would improve search assertiveness.
Thanks in advance.
If you don't want a custom analyzer, try storing the value as a separate non-indexed field, and use a simple regex to generate the sanitized version.
var input = "this_example-has some/weird (chars) 100%";
var output = Regex.Replace(input, #"[\W_]+", " ");
You mention that you need another Analyzer for some searching functionality. Dont forget the PerFieldAnalyzerWrapper which will allow you to use different analyzers within the same document.
public static void Main() {
var wrapper = new PerFieldAnalyzerWrapper(defaultAnalyzer: new StandardAnalyzer(Version.LUCENE_29));
wrapper.AddAnalyzer(fieldName: "id", analyzer: new KeywordAnalyzer());
IndexWriter writer = null; // TODO: Retrieve these.
Document document = null;
writer.AddDocument(document, analyzer: wrapper);
}
You are correct that this is the work of the analyzer. And I'd start by using a tool like luke to see what the standard analyzer does with your term before getting into what to use -- it tends to do a good job stripping noise characters and words.

Lucene.net Fuzzy Phrase Search

I have tried this myself for a considerable period and looked everywhere around the net - but have been unable to find ANY examples of Fuzzy Phrase searching via Lucene.NET 2.9.2. ( C# )
Is something able to advise how to do this in detail and/or provide some example code - I would seriously seriously appreciate any help as I am totally stuck ?
I assume that you have Lucene running and created a search index with some fields in it. So let's assume further that:
var fields = ... // a string[] of the field names you wish to search in
var version = Version.LUCENE_29; // your Lucene version
var queryString = "some string to search for";
Once you have all of these you can go ahead and define a search query on multiple fields like this:
var analyzer = LuceneIndexProvider.CreateAnalyzer();
var query = new MultiFieldQueryParser(version, fields, analyzer).Parse(queryString);
Maybe you already got that far and are only missing the fuzzy part. I simply add a tilde ~ to every word in the queryString to tell Lucene to do a fuzzy search for all words in the queryString:
if (fuzzy && !string.IsNullOrEmpty(queryString)) {
// first escape the queryString so that e.g. ~ will be escaped
queryString = QueryParser.Escape(queryString);
// now split, add ~ and join the queryString back together
queryString = string.Join("~ ",
queryString.Split(' ', StringSplitOptions.RemoveEmptyEntries)) + "~";
// now queryString will be "some~ string~ to~ search~ for~"
}
The key point here is that Lucene uses fuzzy search only for terms that end with a ~. That and some more helpful info was found on
http://scatteredcode.wordpress.com/2011/05/26/performing-a-fuzzy-search-with-multiple-terms-through-multiple-lucene-net-document-fields/.