handling wildcard search with Special character in Lucene.net search - special-characters

how do i perform wildcard search a word in lucene that contain special character. for example i have a word like "91-95483534" if i search like "91*" it works and if i search like "91-95483534" also works fine. but my senario is that to search "91-9548*". if i perform like this "91-9548*". i got no output. am i missing anything. my actual code is given below:
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(new string[] {"column1","column2"}, new StandardAnalyzer());
queryParser.SetAllowLeadingWildcard(true);
Query query = queryParser.Parse(QueryParser.Escape(strKeyWord) + "*");

As you used StandardAnalyzer, that indexed your word as 91 and 95483534 if you used INDEX_ANALYZED when you index....
if you want to search as 91-9548* , use INDEX_NOT_ANALYZED when you index that specified field which have "91-95483534" as terms
http://lucene.apache.org/core/old_versioned_docs/versions/3_0_3/api/core/org/apache/lucene/document/Field.Index.html

Related

Possible to get words matched fuzzily by MongoDB full text search?

I'm writing a UI that presents the results of a MongoDB full text search query, visually highlighting the matched search terms in each result; this works well enough for full word or phrase matches, but not for partial/fuzzy matches.
For example, if I search for "delete" a will get a search result that contains "deletion", which does not contain the full word "delete" and therefore won't be highlighted if I merely highlight the full search term matches. I do want the partial matches, though.
Is there any way to project the set of matched words/substrings when I execute the query?
I've so far been unable to find anything in the docs that hints at this being possible, but I thought it worth asking around. Any help would be greatly appreciated.
You can use the Mongo DB Atlas feature where you can search your text based on different Analyzers that MongoDB provides. And you can then do a search like this: Without the fuzzy object, it would do a full-text-match search.
$search:{
{
index: 'analyzer_name_created_from_atlas_search',
text: {
query: 'Text to do a full match or fuzzy match with',
path: 'sentence',
fuzzy:{
maxEdits: 2 #max 2 is allowed
}
}
}
}

How to allow leading wild cards in custom smart search web part (Kentico 10)

I have a custom index for my products and I am using the Subset Analyzer. This Analyzer works great, but if you do field searches, it does not work.
For example, I have a document with the following fields:
"documentname", "My-Document-Name"
"tags", "1234,5678,9101"
"documentdescription", "This is a great Document, My-Document-Name."
When I just search "name AND tags:(1234)", I get this document in my results because it searches +_content:name.
-- However:
When I search "documentname:(name)^3.0 AND tags:(1234)", I do not get this document in my results.
Of course, when I do "documentname:(*name*)^3.0" I get a parse error saying: '*' or '?' not allowed as first character in WildcardQuery.
How can I either enable wildcard query in my custom CMS.Search webpart?
First of all you have to make sure that a field you checking is in the index with proper name. documentname might not be in the index it can be called _title, depends how you index is set up. Get lukeall and check your index (it should be in \CMS\App_Data\CMSModules\SmartSearch\YourIndexName). You can use luke to test your searches as well.
For examples there is no tags but there is documenttags field.
P.S. Wildcards are working and you are right you can't use them as a first character by default (lucene documentation says: You cannot use a * or ? symbol as the first character of a search), but there is a way to set it up in lucene.net, although i dont know if there are setting for that in Kentico. But i dont think you need wildcards, so your query should be (assuming you have documentname and documenttags in the index):
+(documentname:"My-Name" AND documenttags:"tag1")

Multiple Keyword matching problems

I have a product description of the following text:
"DODGE H4C14S03706G-2G ILH48 37.06 W/ BALDOR VEM3558T"
I attempt a search "H4C1" and Algolia produces relevant results however
if I perform a search of "H4C1 VEM35" Algolia produces no results.
Is there a way to get Algolia to produce relevant results on this search?
By default, Algolia is only doing prefix matching for the last word of the search query, which means it would only match records containing H4C1 as a full word, and VEM35 as a full word or a prefix. However H4C1 is only a prefix of H4C14S03706G-2G, not a full word.
You can change the behavior by tweaking the queryType setting (the value you want is prefixAll), you can find more info in the FAQ and the documentation

whoosh doesn't search for short words like "C#"

i am using whoosh to index over 200,000 books. but i have encountered some problems with it.
the whoosh query parser returns NullQuery for words like "C#", "C++" with meta-characters in them and also for some other short words. this words are used in the title and body of some documents so i am not using keyword type for them. i guess the problem is in the analysis or query-parsing phase of searching or indexing but i can't touch my data blindly. can anyone help me to correct this issue. Tnx.
i fixed the problem by creating a StandardAnalyzer with a regex pattern that meets my requirements,here is the regex pattern:
'\w+[#+.\w]*'
this will make tokenizing of fields to be done successfully, and also the searching goes well.
but when i use queries like "some query++*" or "some##*" the parsed query will be a single Every query, just the '*'. also i found that this is not related to my analyzer and this is the Whoosh's default behavior. so here is my new question: is this behavior correct or it is a bug??
note: removing the WildcardPlugin from the query-parser solves this problem but i also need the WildcardPlugin.
now i am using the following code:
from whoosh.util import rcompile
#for matching words like: '.NET', 'C++' and 'C#'
word_pattern = rcompile('(\.|[\w]+)(\.?\w+|#|\+\+)*')
#i don't need words shorter that two characters so i don't change the minsize default
analyzer = analysis.StandardAnalyzer(expression=word_pattern)
... now in my schema:
...
title = fields.TEXT(analyzer=analyzer),
...
this will solve my first problem, yes. but the main problem is in searching. i don't want to let users to search using the Every query or *. but when i parse queries like C++* i end up an Every(*) query. i know that there is some problem but i can't figure out what it is.
I had the same issue and found out that StandardAnalyzer() uses minsize=2 by default. So in your schema, you have to tell it otherwise.
schema = whoosh.fields.Schema(
name = whoosh.fields.TEXT(stored=True, analyzer=whoosh.analysis.StandardAnalyzer(minsize=1)),
# ...
)

Lucene.net Fuzzy Phrase Search

I have tried this myself for a considerable period and looked everywhere around the net - but have been unable to find ANY examples of Fuzzy Phrase searching via Lucene.NET 2.9.2. ( C# )
Is something able to advise how to do this in detail and/or provide some example code - I would seriously seriously appreciate any help as I am totally stuck ?
I assume that you have Lucene running and created a search index with some fields in it. So let's assume further that:
var fields = ... // a string[] of the field names you wish to search in
var version = Version.LUCENE_29; // your Lucene version
var queryString = "some string to search for";
Once you have all of these you can go ahead and define a search query on multiple fields like this:
var analyzer = LuceneIndexProvider.CreateAnalyzer();
var query = new MultiFieldQueryParser(version, fields, analyzer).Parse(queryString);
Maybe you already got that far and are only missing the fuzzy part. I simply add a tilde ~ to every word in the queryString to tell Lucene to do a fuzzy search for all words in the queryString:
if (fuzzy && !string.IsNullOrEmpty(queryString)) {
// first escape the queryString so that e.g. ~ will be escaped
queryString = QueryParser.Escape(queryString);
// now split, add ~ and join the queryString back together
queryString = string.Join("~ ",
queryString.Split(' ', StringSplitOptions.RemoveEmptyEntries)) + "~";
// now queryString will be "some~ string~ to~ search~ for~"
}
The key point here is that Lucene uses fuzzy search only for terms that end with a ~. That and some more helpful info was found on
http://scatteredcode.wordpress.com/2011/05/26/performing-a-fuzzy-search-with-multiple-terms-through-multiple-lucene-net-document-fields/.