How to search across all the fields? - lucene.net

In Lucene, we can use TermQuery to search for text in a single field. How can I search for a keyword across several fields, or across all the searchable fields?

Another approach, which requires neither indexing anything beyond what you already have nor combining different queries, is to use the MultiFieldQueryParser.
You provide the list of fields you want to search and your query, and that's all.
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
Version.LUCENE_41,
new String[]{"title", "content", "description"},
new StandardAnalyzer(Version.LUCENE_41));
Query query = queryParser.parse("here goes your query");
This is how I would do it with the original lucene library written in Java. I'm not sure whether the MultiFieldQueryParser is available in lucene.net too.

Two approaches
1) Index-time approach: Use a catch-all field. Append the text from all the fields (the total text of your input doc) and place the resulting large text in a single field. You have to add this additional field while indexing so that it acts as the catch-all field.
2) Search-time approach: Use a BooleanQuery to combine multiple queries, for example TermQuery instances. Those multiple queries can be formed to cover all the target fields.
Example check at the end of the article.
Use approach 2 if you know the target-field list at runtime. Otherwise, you've got to use the 1st approach.
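The two approaches can be sketched at the level of plain strings, without any Lucene dependency. This is only an illustration in Java: catchAll and acrossFields are hypothetical helper names, and the field:term / OR form assumes the classic Lucene query-parser syntax and a single keyword with no special characters.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class AllFieldsSketch {
    // Approach 1 (index time): concatenate every field value into one
    // string that would be indexed as an extra catch-all field.
    static String catchAll(Map<String, String> fields) {
        return String.join(" ", fields.values());
    }

    // Approach 2 (search time): expand one keyword into an OR over the
    // target fields, written in classic query-parser syntax (assumed).
    static String acrossFields(List<String> fieldNames, String keyword) {
        return fieldNames.stream()
                .map(f -> f + ":" + keyword)
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Lucene basics");
        doc.put("content", "Searching several fields");
        System.out.println(catchAll(doc));
        System.out.println(acrossFields(Arrays.asList("title", "content"), "lucene"));
    }
}
```

In a real index, the catch-all string would go into an additional indexed field, and the expanded query would more likely be built as a BooleanQuery of TermQuery clauses than as a string.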

Another easy approach to search across all fields with MultiFieldQueryParser is to pass it every field name in the index, obtained from IndexReader.GetFieldNames(IndexReader.FieldOption.ALL).
Here is an example in C#.
Directory directory = FSDirectory.Open(new DirectoryInfo(HostingEnvironment.MapPath(VirtualIndexPath)));
//get analyzer
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
//get index reader and searcher
IndexReader indexReader__1 = IndexReader.Open(directory, true);
Searcher indexSearch = new IndexSearcher(indexReader__1);
//add all possible fields to the MultiFieldQueryParser using the IndexReader GetFieldNames method
var queryParser = new MultiFieldQueryParser(Version.LUCENE_29, indexReader__1.GetFieldNames(IndexReader.FieldOption.ALL).ToArray(), analyzer);
var query = queryParser.Parse(Criteria);
TopDocs resultDocs = null;
//perform search
resultDocs = indexSearch.Search(query, indexReader__1.MaxDoc());
var hits = resultDocs.scoreDocs;
My previous answer to the same question covers this in VB.NET.

Related

Why is EWS search filter returning fewer emails than I see from outlook?

I am using the following code to retrieve emails whose subject contains "MS" and "QW". I see more than 8 emails satisfying the search criteria, but the code returns only two emails. Can anyone help me take a look at what the problem is in here?
var filter1 = new SearchFilter.ContainsSubstring(ItemSchema.Subject, "MS", ContainmentMode.Substring, ComparisonMode.IgnoreCase);
var filter2 = new SearchFilter.ContainsSubstring(ItemSchema.Subject, "QW");
var sf = new SearchFilter.SearchFilterCollection(LogicalOperator.And, filter1, filter2);
var findResults = service.FindItems(WellKnownFolderName.Inbox, sf, view);
A few things I can see: in the second search filter you haven't specified the ContainmentMode or ComparisonMode. You also don't seem to have added the filters to the SearchFilterCollection, e.g. you should have
sf.Add(filter1);
sf.Add(filter2);
That search will yield pretty poor performance on a folder with a large number of items. I would suggest you look at AQS (Advanced Query Syntax) instead; then you can do
service.FindItems(WellKnownFolderName.Inbox, "Subject:MS AND Subject:QW", view);
That will search against the Content Indexes and yield better performance.

Lucene get documents containing specific field name

I am using Lucene in my project and I ran into an issue: I need to find documents that contain a field with a specific name. I was only able to find solutions where you create a search term from a name/value pair, like this:
IndexSearcher searcher = new IndexSearcher(directoryReader);
TermQuery query = new TermQuery(new Term("name", "value"));
TopDocs topdocs = searcher.search(query, numberToReturn);
but as I stated, I need to find documents only by the provided field name, and then access the value of that field in the selected documents.
Although I am working with Lucene.NET, I will be thankful for a solution in any language.
Thank you in advance.
I found this solution and made it work with a small change:
var queryParser = new QueryParser(Version.LUCENE_30, "content", analyzer);
queryParser.AllowLeadingWildcard = true;
var query = queryParser.Parse( "*" );

Can we get match items' position in Lucene.net search results?

I'm using Lucene.net to implement a full-text search feature in an ASP.NET application. The search result page should highlight the matched items. I got the instance of Lucene.Net.Search.Hits and used the .Doc(int i) method to get the Lucene Document.
But I don't know how to get the position of a matched item through an existing method or property of some Lucene class. Does Lucene.net provide any feature to support highlighting the query string?
You can use Highlighter or FastVectorHighlighter, which can be found in contrib.
As the previous answerer said, you should use either Highlighter or FastVectorHighlighter from contrib.
Here's an example of using Highlighter lib to get highlighted fragments:
Formatter formatter = new SimpleHTMLFormatter("<span><b>", "</b></span>");
Lucene.Net.Highlight.Scorer scorer = new QueryScorer(query, field);
Lucene.Net.Highlight.Encoder encoder = new SimpleHTMLEncoder();
var highlighter = new Highlighter(formatter, encoder, scorer);
highlighter.SetTextFragmenter(new SimpleFragmenter(100));
string[] fragments =
highlighter.GetBestFragments(DefaultAnalyzer, field, doc.Get(field), 3);
Some Highlighter-related gotchas:
To highlight a field, it should be added to the index with the Field.Store.YES option.
Your query should be rewritten before you pass it to the highlighter.
The analyzer you pass to the highlighter should be the same one you use for indexing and searching.

Make Lucene index a value and store another

I want Lucene.NET to store a value while indexing a modified, stripped-down version of the stored value. e.g. Consider the value:
this_example-has some/weird (chars) 100%
I want it stored right like that (so that I can retrieve exactly that for showing in the results list), but I want lucene to index it as:
this example has some weird chars 100
(you see, like a "sanitized" version of the original value) for a simplified search.
I figure this would be the job of an analyzer, but I don't want to mess with rolling my own. Ideally, the solution should remove everything that is not a letter, a number or quotes, replacing the removed chars by a white-space before indexing.
Any suggestions on how to implement that?
This is because I am indexing products for an e-commerce search, and some have really odd names. I think this would improve search accuracy.
Thanks in advance.
If you don't want a custom analyzer, try storing the value as a separate non-indexed field, and use a simple regex to generate the sanitized version.
var input = "this_example-has some/weird (chars) 100%";
var output = Regex.Replace(input, @"[\W_]+", " ");
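For comparison, here is the same regex idea as a small Java sketch that can be run on its own. The class and method names are hypothetical. The UNICODE_CHARACTER_CLASS flag is an assumption on my part, added so that accented letters count as word characters (Java's \W is ASCII-only by default), and the trim() is an extra step that drops the trailing space left by the final "%".

```java
import java.util.regex.Pattern;

class SanitizeSketch {
    // Runs of non-word characters or underscores collapse to one space.
    private static final Pattern NOISE =
            Pattern.compile("[\\W_]+", Pattern.UNICODE_CHARACTER_CLASS);

    // Returns the "sanitized" text to index; the original value would be
    // kept unchanged in a separate stored, non-indexed field.
    static String sanitize(String input) {
        return NOISE.matcher(input).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // prints: this example has some weird chars 100
        System.out.println(sanitize("this_example-has some/weird (chars) 100%"));
    }
}
```

Note that this pattern also removes quotes, which the question wanted to keep; the character class would need adjusting for that.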
You mention that you need another Analyzer for some searching functionality. Don't forget the PerFieldAnalyzerWrapper, which allows you to use different analyzers for different fields within the same document.
public static void Main() {
var wrapper = new PerFieldAnalyzerWrapper(defaultAnalyzer: new StandardAnalyzer(Version.LUCENE_29));
wrapper.AddAnalyzer(fieldName: "id", analyzer: new KeywordAnalyzer());
IndexWriter writer = null; // TODO: Retrieve these.
Document document = null;
writer.AddDocument(document, analyzer: wrapper);
}
You are correct that this is the work of the analyzer. I'd start by using a tool like Luke to see what the standard analyzer does with your term before deciding what to use; it tends to do a good job of stripping noise characters and words.

Lucene.net Fuzzy Phrase Search

I have tried this myself for a considerable period and looked everywhere around the net, but have been unable to find ANY examples of fuzzy phrase searching via Lucene.NET 2.9.2 (C#).
Is someone able to advise how to do this in detail and/or provide some example code? I would seriously appreciate any help, as I am totally stuck.
I assume that you have Lucene running and created a search index with some fields in it. So let's assume further that:
var fields = ... // a string[] of the field names you wish to search in
var version = Version.LUCENE_29; // your Lucene version
var queryString = "some string to search for";
Once you have all of these you can go ahead and define a search query on multiple fields like this:
var analyzer = LuceneIndexProvider.CreateAnalyzer();
var query = new MultiFieldQueryParser(version, fields, analyzer).Parse(queryString);
Maybe you already got that far and are only missing the fuzzy part. I simply add a tilde ~ to every word in the queryString to tell Lucene to do a fuzzy search for all words in the queryString:
if (fuzzy && !string.IsNullOrEmpty(queryString)) {
// first escape the queryString so that e.g. ~ will be escaped
queryString = QueryParser.Escape(queryString);
// now split, add ~ and join the queryString back together
queryString = string.Join("~ ",
queryString.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)) + "~";
// now queryString will be "some~ string~ to~ search~ for~"
}
The key point here is that Lucene uses fuzzy search only for terms that end with a ~. That and some more helpful info was found on
http://scatteredcode.wordpress.com/2011/05/26/performing-a-fuzzy-search-with-multiple-terms-through-multiple-lucene-net-document-fields/.
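The escape-split-append steps above can also be sketched in Java without any Lucene dependency. This is an illustration only: FuzzyTerms, escape and fuzzify are hypothetical names, and escape is a hand-rolled stand-in for QueryParser.Escape whose list of special characters is an assumption, not the library's actual implementation.

```java
class FuzzyTerms {
    // Characters assumed to be special in the classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Stand-in for QueryParser.Escape: prefix each special char with '\'.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Escape first, then split on whitespace and append '~' to each term.
    static String fuzzify(String queryString) {
        String trimmed = queryString == null ? "" : queryString.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (String term : escape(trimmed).split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(term).append('~');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: some~ string~ to~ search~ for~
        System.out.println(fuzzify("some string to search for"));
    }
}
```

Escaping before appending the tilde matters: a ~ typed by the user becomes a literal character, and only the appended ~ is read as the fuzzy operator.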