How to allow leading wild cards in custom smart search web part (Kentico 10) - lucene.net

I have a custom index for my products and I am using the Subset Analyzer. This Analyzer works great, but if you do field searches, it does not work.
For example, I have a document with the following fields:
"documentname", "My-Document-Name"
"tags", "1234,5678,9101"
"documentdescription", "This is a great Document, My-Document-Name."
When I just search "name AND tags:(1234)", I get this document in my results because it searches +_content:name.
-- However:
When I search "documentname:(name)^3.0 AND tags:(1234)", I do not get this document in my results.
Of course, when I do "documentname:(*name*)^3.0" I get a parse error saying: '*' or '?' not allowed as first character in WildcardQuery.
How can I either enable wildcard query in my custom CMS.Search webpart?

First of all you have to make sure that a field you checking is in the index with proper name. documentname might not be in the index it can be called _title, depends how you index is set up. Get lukeall and check your index (it should be in \CMS\App_Data\CMSModules\SmartSearch\YourIndexName). You can use luke to test your searches as well.
For examples there is no tags but there is documenttags field.
P.S. Wildcards are working and you are right you can't use them as a first character by default (lucene documentation says: You cannot use a * or ? symbol as the first character of a search), but there is a way to set it up in lucene.net, although i dont know if there are setting for that in Kentico. But i dont think you need wildcards, so your query should be (assuming you have documentname and documenttags in the index):
+(documentname:"My-Name" AND documenttags:"tag1")

Related

Possible to get words matched fuzzily by MongoDB full text search?

I'm writing a UI that presents the results of a MongoDB full text search query, visually highlighting the matched search terms in each result; this works well enough for full word or phrase matches, but not for partial/fuzzy matches.
For example, if I search for "delete" a will get a search result that contains "deletion", which does not contain the full word "delete" and therefore won't be highlighted if I merely highlight the full search term matches. I do want the partial matches, though.
Is there any way to project the set of matched words/substrings when I execute the query?
I've so far been unable to find anything in the docs that hints at this being possible, but I thought it worth asking around. Any help would be greatly appreciated.
You can use the Mongo DB Atlas feature where you can search your text based on different Analyzers that MongoDB provides. And you can then do a search like this: Without the fuzzy object, it would do a full-text-match search.
$search:{
{
index: 'analyzer_name_created_from_atlas_search',
text: {
query: 'Text to do a full match or fuzzy match with',
path: 'sentence',
fuzzy:{
maxEdits: 2 #max 2 is allowed
}
}
}
}

How can you filter search by matching String to a field in Algolia?

I'm trying to create filters for a search on an Android app where a specific field in Algolia must exactly match the given String in order to come up as a hit. For example if Algolia has a field like "foo" and I only want to return hits where "foo" is equal to "bar", then I would expect that I would have to use a line of code like this:
query.setFilters("foo: \"bar\"");
Any guesses as to why this isn't working like I see in the examples or how to do so?
Ah, I thought that attributesForFaceting was done by setting what was searchable or not. It was on a different page within the dashboard than I was previously using. Thanks #pixelastic.

Multiple Keyword matching problems

I have a product description of the following text:
"DODGE H4C14S03706G-2G ILH48 37.06 W/ BALDOR VEM3558T"
I attempt a search "H4C1" and Algolia produces relevant results however
if I perform a search of "H4C1 VEM35" Algolia produces no results.
Is there a way to get Algolia to produce relevant results on this search?
By default, Algolia is only doing prefix matching for the last word of the search query, which means it would only match records containing H4C1 as a full word, and VEM35 as a full word or a prefix. However H4C1 is only a prefix of H4C14S03706G-2G, not a full word.
You can change the behavior by tweaking the queryType setting (the value you want is prefixAll), you can find more info in the FAQ and the documentation

whoosh doesn't search for short words like "C#"

i am using whoosh to index over 200,000 books. but i have encountered some problems with it.
the whoosh query parser returns NullQuery for words like "C#", "C++" with meta-characters in them and also for some other short words. this words are used in the title and body of some documents so i am not using keyword type for them. i guess the problem is in the analysis or query-parsing phase of searching or indexing but i can't touch my data blindly. can anyone help me to correct this issue. Tnx.
i fixed the problem by creating a StandardAnalyzer with a regex pattern that meets my requirements,here is the regex pattern:
'\w+[#+.\w]*'
this will make tokenizing of fields to be done successfully, and also the searching goes well.
but when i use queries like "some query++*" or "some##*" the parsed query will be a single Every query, just the '*'. also i found that this is not related to my analyzer and this is the Whoosh's default behavior. so here is my new question: is this behavior correct or it is a bug??
note: removing the WildcardPlugin from the query-parser solves this problem but i also need the WildcardPlugin.
now i am using the following code:
from whoosh.util import rcompile
#for matching words like: '.NET', 'C++' and 'C#'
word_pattern = rcompile('(\.|[\w]+)(\.?\w+|#|\+\+)*')
#i don't need words shorter that two characters so i don't change the minsize default
analyzer = analysis.StandardAnalyzer(expression=word_pattern)
... now in my schema:
...
title = fields.TEXT(analyzer=analyzer),
...
this will solve my first problem, yes. but the main problem is in searching. i don't want to let users to search using the Every query or *. but when i parse queries like C++* i end up an Every(*) query. i know that there is some problem but i can't figure out what it is.
I had the same issue and found out that StandardAnalyzer() uses minsize=2 by default. So in your schema, you have to tell it otherwise.
schema = whoosh.fields.Schema(
name = whoosh.fields.TEXT(stored=True, analyzer=whoosh.analysis.StandardAnalyzer(minsize=1)),
# ...
)

Mongoid, find object by searching by part of the Id?

I want to be able to search for my objects by searching for the last 4 characters of the id. How can I do that?
Book.where(_id: params[:q])
Where the param would be something like a3f4, and in this case the actual id for the object that I want to be found would be:
bc313c1f5053b66121a8a3f4
Notice the last for characters are what we searched for. How can I search for just "part" of my objects id? instead of having my user search manually by typing in the entire id?
I found in MongoDB's help docs, that I can provide a regex:
db.x.find({someId : {$regex : "123\\[456\\]"}}) // use "\\" to escape
Is there a way for me to search using the regular mongo ruby driver and not using Mongoid?
Usually, in Mongoid you can search with a regexp like you normally would with a string in your call to where() ie:
Book.where(:title => /^Alice/) # returns all books with titles starting with 'Alice'
However this doesn't work in your case, because the _id field is not stored as a string, but as an ObjectID. However, you could add (and index) a field on your models which could provide this functionality for you, which you can populate in an after_create callback.
<shameless_plug>
Alternatively, if you're just looking for a shorter solution to the default Mongoid IDs, I could suggest something like mongoid_token which makes it pretty easy to add shorter tokens/ids to your Mongoid documents.
</shameless_plug>