MongoDB - Using regex wildcards for search that properly filter results

I have a Mongo search set up that goes through my entries based on numerous criteria.
Currently the easiest way (I know it's not performance-friendly due to using wildcards, but I can't figure out a better way to do this due to case insensitivity and users not putting in whole words) is to use regex wildcards in the search. The search ends up looking like this:
{ gender: /Womens/i, designer: /Voodoo Girl/i } // Should return ~200 results
{ gender: /Mens/i, designer: /Voodoo Girl/i } // Should return 0 results
In the example above, both searches are returning ~200 results ("Voodoo Girl" is a womenswear label and all corresponding entries have a gender: "Womens" field). Bizarrely, when I do other searches, like:
{ designer: /Voodoo Girl/i, store: /Store XYZ/i } // should return 0 results
I get the correct number of results (0). Is this an order thing? How can I ensure that my search only returns results that match all of my wildcarded queries?
For reference, the queries are being made in nodeJS through a simple db.products.find({criteria}) lookup.

To answer the aside real fast, something like ElasticSearch is a wonderful way to get more powerful, performant searching capabilities in your app.
Now, the reason that both of your searches are returning results is that "mens" is a substring of "womens"! You probably want either /^Mens/i and /^Womens/i (if Mens starts the gender field), or /\bMens\b/ if it can appear in the middle of the field. The first form only matches when the field starts with the given text, while the second looks for the given word surrounded by word boundaries (that is, not as a substring of another word).
If you can use the /^Mens/ form (note the lack of the /i), it's advisable, as anchored case-sensitive regex queries can use indexes, while other regex forms cannot.
$regex can only use an index efficiently when the regular expression has an anchor for the beginning (i.e. ^) of a string and is a case-sensitive match.
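For illustration, here is a minimal Node.js sketch of the corrected queries (the db handle and the products collection come from the question; the surrounding connection and async setup are assumed):
// Inside an async function, with `db` already connected as in the question.
const products = db.collection('products');

// Anchored form: matches only when the gender field starts with the given text.
// The case-sensitive variant (/^Mens/) can additionally use an index.
const womens = await products.find({ gender: /^Womens/i, designer: /Voodoo Girl/i }).toArray(); // ~200 results
const mens = await products.find({ gender: /^Mens/i, designer: /Voodoo Girl/i }).toArray();     // 0 results

// Word-boundary form: matches "Mens" as a whole word anywhere in the field,
// but no longer as a substring of "Womens".
const mensWord = await products.find({ gender: /\bMens\b/i, designer: /Voodoo Girl/i }).toArray();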

Related

Possible to get words matched fuzzily by MongoDB full text search?

I'm writing a UI that presents the results of a MongoDB full text search query, visually highlighting the matched search terms in each result; this works well enough for full word or phrase matches, but not for partial/fuzzy matches.
For example, if I search for "delete" a will get a search result that contains "deletion", which does not contain the full word "delete" and therefore won't be highlighted if I merely highlight the full search term matches. I do want the partial matches, though.
Is there any way to project the set of matched words/substrings when I execute the query?
I've so far been unable to find anything in the docs that hints at this being possible, but I thought it worth asking around. Any help would be greatly appreciated.
You can use MongoDB Atlas Search, which lets you search your text with the different analyzers that MongoDB provides. You can then run a search like the one below; without the fuzzy object, it performs a full-text match.
$search: {
  index: 'analyzer_name_created_from_atlas_search',
  text: {
    query: 'Text to do a full match or fuzzy match with',
    path: 'sentence',
    fuzzy: {
      maxEdits: 2 // a maximum of 2 is allowed
    }
  }
}
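For context, $search runs as the first stage of an aggregation pipeline, so from the Node.js driver the snippet above would be invoked roughly like this (the collection name content and the query text are placeholders):
const results = await db.collection('content').aggregate([
  {
    $search: {
      index: 'analyzer_name_created_from_atlas_search',
      text: {
        query: 'deletion', // fuzzy matching would also catch misspellings such as 'deltion'
        path: 'sentence',
        fuzzy: { maxEdits: 2 }
      }
    }
  }
]).toArray();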

Firestore security rules - wildcarding Collection Names?

I have a set of Collections whose names all start with ABC and I want to write a single rule that applies to all of them regardless of what follows ABC. Something like:
match /ABC*/{anyid} {
allow read, write;
}
Is this possible? In the Rules Console there are no syntax errors highlighted, but the Simulator won't allow me to access the table with:
GET /ABC123/456
Any ideas?
As far as I know it is not currently possible to match on a partial collection (or document) name. It sounds like an interesting feature though, so I recommend filing a feature request.
In the meantime, the only thing I can think of is matching all collections, and then testing the path through resource['__name__']:
match /databases/{database}/documents {
  match /{col}/{doc} {
    allow read: if resource['__name__'][5].matches('ABC.*')
  }
}
The resource['__name__'] expression returns a Path, which can be indexed as an array to get the path segments. It has the form /databases/(default)/documents/collection/document, so the subcollection is at index 5. Since that segment is just a string, we can call matches on it. In this case I allow reading from any subcollection whose name starts with ABC.
Update: it turns out that you can also simply access the col wildcard, instead of looking it up from the path. So this would work the same:
allow read: if col.matches('ABC.*')
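Put together with the rule structure above, the shortcut version would look like this (same wildcard names as before):
match /databases/{database}/documents {
  match /{col}/{doc} {
    allow read: if col.matches('ABC.*')
  }
}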

How to allow leading wild cards in custom smart search web part (Kentico 10)

I have a custom index for my products and I am using the Subset Analyzer. This Analyzer works great, but if you do field searches, it does not work.
For example, I have a document with the following fields:
"documentname", "My-Document-Name"
"tags", "1234,5678,9101"
"documentdescription", "This is a great Document, My-Document-Name."
When I just search "name AND tags:(1234)", I get this document in my results because it searches +_content:name.
-- However:
When I search "documentname:(name)^3.0 AND tags:(1234)", I do not get this document in my results.
Of course, when I do "documentname:(*name*)^3.0" I get a parse error saying: '*' or '?' not allowed as first character in WildcardQuery.
How can I enable wildcard queries in my custom CMS.Search web part?
First of all, you have to make sure that the field you are checking is in the index under the proper name. documentname might not be in the index; it can be called _title, depending on how your index is set up. Get lukeall and check your index (it should be in \CMS\App_Data\CMSModules\SmartSearch\YourIndexName). You can use Luke to test your searches as well.
For example, there is no tags field, but there is a documenttags field.
P.S. Wildcards do work, and you are right that you can't use them as the first character by default (the Lucene documentation says: you cannot use a * or ? symbol as the first character of a search), but there is a way to set this up in Lucene.Net, although I don't know whether there is a setting for that in Kentico. I don't think you need wildcards here, though, so your query should be (assuming you have documentname and documenttags in the index):
+(documentname:"My-Name" AND documenttags:"tag1")

how to find partial search in Mongodb?

How do I find partial matches?
Right now I'm trying:
db.content.find({$text: {$search: "Customer london"}})
It finds all records matching customer, and all records matching london.
If I am searching for part of a word, for example lond or custom:
db.content.find({$text: {$search: "lond"}})
It returns an empty result. How can I modify the query to get the same results as when I search for london?
You can use regex to get around this (https://docs.mongodb.com/manual/reference/operator/query/regex/); a query sketch follows the list below. However, it will only work for cases like the following:
If you have the word Cooking, the following queries may give you a result:
cooking (exact match)
coo (part of the word)
cooked (a word sharing the English root of the document word, where cook is the root from which cooking and cooked are derived)
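As a sketch, a case-insensitive regex version of the query from the question would look like this (the field name city is a placeholder; use whichever field actually holds the text):
// Unanchored, case-insensitive substring match: finds "London", "londoner", etc.
db.content.find({ city: /lond/i })
// Anchored variant, which can use an index but only matches from the start of the value:
db.content.find({ city: /^lond/i })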
If you would like to go one step further and get a result document containing cooking when you type vooking (misspelled with V instead of C), go for Elasticsearch.
Elasticsearch is easy to set up and has an extremely powerful edge-ngram analyzer, which breaks each word into smaller weighted tokens. Hence, when you misspell a word, you will still get a document based on the score Elasticsearch gives to that document.
You can read about it here : https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
Text search will always return an empty array for partial words, for example when you search for lond hoping to match text like london,
because it takes full words and searches for them exactly as they are.
It will not match partial forms like:
LO LON LOND LONDO LONDON
Here ElasticSearch may help you. It is quite good for full-text search when implemented alongside MongoDB.
Reference: ElasticSearch
Thanks
The find({}) result is converted to an array with toArray(). I then used String.prototype.startsWith in a for loop to pick out my record:
clientDB.collection('details').find({}).toArray().then((docs) => {
  // Find the first document whose name starts with 'U'.
  for (let i = 0; i < docs.length; i++) {
    if (docs[i].name.startsWith('U')) {
      return console.log(docs[i].name);
    }
  }
  console.log('Record not found!!!');
});
This may not be efficient, but it works for now
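As a side note, a sketch of letting MongoDB do the filtering server-side instead of scanning every document in application code (same collection as above; an anchored regex like /^U/ can also use an index on name):
// Only documents whose name starts with 'U' are returned by the server.
clientDB.collection('details').find({ name: /^U/ }).toArray().then((docs) => {
  docs.forEach((doc) => console.log(doc.name));
});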

whoosh doesn't search for short words like "C#"

I am using Whoosh to index over 200,000 books, but I have encountered some problems with it.
The Whoosh query parser returns NullQuery for words like "C#" and "C++" that have meta-characters in them, and also for some other short words. These words are used in the titles and bodies of some documents, so I am not using the KEYWORD field type for them. I guess the problem is in the analysis or query-parsing phase of indexing or searching, but I can't touch my data blindly. Can anyone help me correct this issue? Thanks.
I fixed the problem by creating a StandardAnalyzer with a regex pattern that meets my requirements. Here is the regex pattern:
'\w+[#+.\w]*'
This makes tokenizing of the fields work correctly, and searching also goes well.
But when I use queries like "some query++*" or "some##*", the parsed query becomes a single Every query, just '*'. I also found that this is not related to my analyzer; it is Whoosh's default behavior. So here is my new question: is this behavior correct, or is it a bug?
Note: removing the WildcardPlugin from the query parser solves this problem, but I also need the WildcardPlugin.
Now I am using the following code:
from whoosh import analysis
from whoosh.util import rcompile

# For matching words like '.NET', 'C++' and 'C#'
word_pattern = rcompile(r'(\.|[\w]+)(\.?\w+|#|\+\+)*')

# I don't need words shorter than two characters, so I don't change the minsize default
analyzer = analysis.StandardAnalyzer(expression=word_pattern)
... now in my schema:
...
title = fields.TEXT(analyzer=analyzer),
...
This solves my first problem, yes, but the main problem is in searching. I don't want to let users search using the Every query or *, but when I parse queries like C++* I end up with an Every(*) query. I know that there is some problem, but I can't figure out what it is.
I had the same issue and found out that StandardAnalyzer() uses minsize=2 by default. So in your schema, you have to tell it otherwise.
schema = whoosh.fields.Schema(
    name=whoosh.fields.TEXT(stored=True, analyzer=whoosh.analysis.StandardAnalyzer(minsize=1)),
    # ...
)