mongodb evaluate if string ends with another substring - mongodb

I would like to project a field with a value if another field ends with a substring but another value if it's not
How can I do that?
Example (I omit what's not important):
Doc 1:
{
'Field1': 'A perfect normal string'
}
Doc N:
{
'Field1': 'This one ends with my substring'
}
The ideal will be:
$project: {
'HasSubstring': {$cond: [{$regex: 'substring$'}, true, false]}
}
But this doesn't work because we can't (????) use $regex inside a $cond
Anyone could point me, please?
Thanks a lot
PS: I can't use the regex filter in the match because I need both docs for groupping them

How about:
Store 'Field1Reversed' field , with characters reversed, then use:
$project: {
'HasSubstring': {$cond:[{$eq:[ 'gnirtsbus', {$substr:["$Field1Reversed",0,9]}] } ,true,false]}
}
There are unicode implications to using substr, but these can be resolved in your app if necessary (measure length of comparand to get the proper substr # of bytes)

Related

How can I use an aggregation pipeline to see which documents have a field with a string that starts with any of the strings in a list?

I am using mongo server version 3.4, so my question pertains to the functionality of that version. I cannot upgrade anytime soon, so please keep that in mind. If have a field in some documents in a MongoDB collection that may contain a string but also have trailing characters, how might I find them when submitting multiple "startsWith" strings to be evaluated in the same query? I may have some difficulty explaining this, so let me show some examples. Let's say that I have a field called "description" in all of my documents. This description might be encoded so that the text is not completely straightforward. Some values might be:
green:A-4_ABC
yellow:C-12_456
red:A-431_ZXCVQ
yellow_green:C-12_999
brown:B-3_R
gray:EN-44_195
EDIT: I think I made a mistake with using words in my keys. The keys are a randomized string of numbers, letters, and underscores, followed by a colon, then one to three letters, followed by a dash, then a couple of numbers, then an underscore, and lastly followed by several alphanumeric characters:
LKEF543SLI54EH2J897FQ_HF234EWOH:ZX-82_FR2
I realize that this sounds arbitrary and stupid, but it is an encoding of information that is intended to result in a unique key. It is in data that I receive, so I cannot change it, unfortunately.
Now, I want to find all of the documents with descriptions that start with any of the following values, and all of these values must be submitted in the same query. I might have hundreds of submitted values, and I need to get all matching documents at once. Here is a short list of what might be submitted in a single query:
green:A-4
red:A-431
gray:EN-44
yellow_green:C-12
Note that it was not accidental that the text is everything prior to the last underscore. And, as with one of the examples, there might be more than one underscore. With my use case, I cannot create a query that hard-codes these strings in the javascript regex format. And the $in filter does not work with "startsWith" functionality, particularly when you pass in a list of strings (though I am familiar with supplying a list of hard-coded javascript regexes). Is there any way to use the $in operator where I can take a list of strings that are passed in from the user who wants to run a query like this? Or is there something equivalent? The cherry on the top of all of this would be to find a way to project the matching document with the string that it matched (either from the query, or by some substring magic that I cannot seem to figure out).
EDIT: Specifically, when I find each document, I want to be able to project everything from they key up until the LAST underscore, like:
LKEF543SLI54EH2J897FQ_HF234EWOH:ZX-82
(along with its value)
Thanks in advance for any nudges in the right direction.
We use $objectToArray to get {k:field_name, v:field_value} array. Then we split by _ token all values and convert to object with $arrayToObject operator.
Next step we apply $match operator to filter documents and exclude data with $unset.
Note: If your document contains array or subdocuments, we may use $filter before we convert $objectToArray.
db.collection.aggregate([
{
$addFields: {
data: {
$arrayToObject: {
$map: {
input: {
$objectToArray: "$$ROOT"
},
in: {
k: "$$this.k",
v: {
$arrayElemAt: [
{
$split: [
{
$toString: "$$this.v"
},
"_"
]
},
0
]
}
}
}
}
}
}
},
{
$match: {
"data.green": "A-4",
"data.red": "A-431",
"data.gray": "EN-44",
"data.yellow_green": "C-12"
}
},
{
$unset: "data"
}
])
MongoPlayground

MongoDB how to use $toUpper with $match [duplicate]

Example:
> db.stuff.save({"foo":"bar"});
> db.stuff.find({"foo":"bar"}).count();
1
> db.stuff.find({"foo":"BAR"}).count();
0
You could use a regex.
In your example that would be:
db.stuff.find( { foo: /^bar$/i } );
I must say, though, maybe you could just downcase (or upcase) the value on the way in rather than incurring the extra cost every time you find it. Obviously this wont work for people's names and such, but maybe use-cases like tags.
UPDATE:
The original answer is now obsolete. Mongodb now supports advanced full text searching, with many features.
ORIGINAL ANSWER:
It should be noted that searching with regex's case insensitive /i means that mongodb cannot search by index, so queries against large datasets can take a long time.
Even with small datasets, it's not very efficient. You take a far bigger cpu hit than your query warrants, which could become an issue if you are trying to achieve scale.
As an alternative, you can store an uppercase copy and search against that. For instance, I have a User table that has a username which is mixed case, but the id is an uppercase copy of the username. This ensures case-sensitive duplication is impossible (having both "Foo" and "foo" will not be allowed), and I can search by id = username.toUpperCase() to get a case-insensitive search for username.
If your field is large, such as a message body, duplicating data is probably not a good option. I believe using an extraneous indexer like Apache Lucene is the best option in that case.
Starting with MongoDB 3.4, the recommended way to perform fast case-insensitive searches is to use a Case Insensitive Index.
I personally emailed one of the founders to please get this working, and he made it happen! It was an issue on JIRA since 2009, and many have requested the feature. Here's how it works:
A case-insensitive index is made by specifying a collation with a strength of either 1 or 2. You can create a case-insensitive index like this:
db.cities.createIndex(
{ city: 1 },
{
collation: {
locale: 'en',
strength: 2
}
}
);
You can also specify a default collation per collection when you create them:
db.createCollection('cities', { collation: { locale: 'en', strength: 2 } } );
In either case, in order to use the case-insensitive index, you need to specify the same collation in the find operation that was used when creating the index or the collection:
db.cities.find(
{ city: 'new york' }
).collation(
{ locale: 'en', strength: 2 }
);
This will return "New York", "new york", "New york" etc.
Other notes
The answers suggesting to use full-text search are wrong in this case (and potentially dangerous). The question was about making a case-insensitive query, e.g. username: 'bill' matching BILL or Bill, not a full-text search query, which would also match stemmed words of bill, such as Bills, billed etc.
The answers suggesting to use regular expressions are slow, because even with indexes, the documentation states:
"Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes."
$regex answers also run the risk of user input injection.
If you need to create the regexp from a variable, this is a much better way to do it: https://stackoverflow.com/a/10728069/309514
You can then do something like:
var string = "SomeStringToFind";
var regex = new RegExp(["^", string, "$"].join(""), "i");
// Creates a regex of: /^SomeStringToFind$/i
db.stuff.find( { foo: regex } );
This has the benefit be being more programmatic or you can get a performance boost by compiling it ahead of time if you're reusing it a lot.
Keep in mind that the previous example:
db.stuff.find( { foo: /bar/i } );
will cause every entries containing bar to match the query ( bar1, barxyz, openbar ), it could be very dangerous for a username search on a auth function ...
You may need to make it match only the search term by using the appropriate regexp syntax as:
db.stuff.find( { foo: /^bar$/i } );
See http://www.regular-expressions.info/ for syntax help on regular expressions
db.company_profile.find({ "companyName" : { "$regex" : "Nilesh" , "$options" : "i"}});
db.zipcodes.find({city : "NEW YORK"}); // Case-sensitive
db.zipcodes.find({city : /NEW york/i}); // Note the 'i' flag for case-insensitivity
TL;DR
Correct way to do this in mongo
Do not Use RegExp
Go natural And use mongodb's inbuilt indexing , search
Step 1 :
db.articles.insert(
[
{ _id: 1, subject: "coffee", author: "xyz", views: 50 },
{ _id: 2, subject: "Coffee Shopping", author: "efg", views: 5 },
{ _id: 3, subject: "Baking a cake", author: "abc", views: 90 },
{ _id: 4, subject: "baking", author: "xyz", views: 100 },
{ _id: 5, subject: "Café Con Leche", author: "abc", views: 200 },
{ _id: 6, subject: "Сырники", author: "jkl", views: 80 },
{ _id: 7, subject: "coffee and cream", author: "efg", views: 10 },
{ _id: 8, subject: "Cafe con Leche", author: "xyz", views: 10 }
]
)
Step 2 :
Need to create index on whichever TEXT field you want to search , without indexing query will be extremely slow
db.articles.createIndex( { subject: "text" } )
step 3 :
db.articles.find( { $text: { $search: "coffee",$caseSensitive :true } } ) //FOR SENSITIVITY
db.articles.find( { $text: { $search: "coffee",$caseSensitive :false } } ) //FOR INSENSITIVITY
One very important thing to keep in mind when using a Regex based query - When you are doing this for a login system, escape every single character you are searching for, and don't forget the ^ and $ operators. Lodash has a nice function for this, should you be using it already:
db.stuff.find({$regex: new RegExp(_.escapeRegExp(bar), $options: 'i'})
Why? Imagine a user entering .* as his username. That would match all usernames, enabling a login by just guessing any user's password.
Suppose you want to search "column" in "Table" and you want case insensitive search. The best and efficient way is:
//create empty JSON Object
mycolumn = {};
//check if column has valid value
if(column) {
mycolumn.column = {$regex: new RegExp(column), $options: "i"};
}
Table.find(mycolumn);
It just adds your search value as RegEx and searches in with insensitive criteria set with "i" as option.
Mongo (current version 2.0.0) doesn't allow case-insensitive searches against indexed fields - see their documentation. For non-indexed fields, the regexes listed in the other answers should be fine.
For searching a variable and escaping it:
const escapeStringRegexp = require('escape-string-regexp')
const name = 'foo'
db.stuff.find({name: new RegExp('^' + escapeStringRegexp(name) + '$', 'i')})
Escaping the variable protects the query against attacks with '.*' or other regex.
escape-string-regexp
The best method is in your language of choice, when creating a model wrapper for your objects, have your save() method iterate through a set of fields that you will be searching on that are also indexed; those set of fields should have lowercase counterparts that are then used for searching.
Every time the object is saved again, the lowercase properties are then checked and updated with any changes to the main properties. This will make it so you can search efficiently, but hide the extra work needed to update the lc fields each time.
The lower case fields could be a key:value object store or just the field name with a prefixed lc_. I use the second one to simplify querying (deep object querying can be confusing at times).
Note: you want to index the lc_ fields, not the main fields they are based off of.
Using Mongoose this worked for me:
var find = function(username, next){
User.find({'username': {$regex: new RegExp('^' + username, 'i')}}, function(err, res){
if(err) throw err;
next(null, res);
});
}
If you're using MongoDB Compass:
Go to the collection, in the filter type -> {Fieldname: /string/i}
For Node.js using Mongoose:
Model.find({FieldName: {$regex: "stringToSearch", $options: "i"}})
The aggregation framework was introduced in mongodb 2.2 . You can use the string operator "$strcasecmp" to make a case-insensitive comparison between strings. It's more recommended and easier than using regex.
Here's the official document on the aggregation command operator: https://docs.mongodb.com/manual/reference/operator/aggregation/strcasecmp/#exp._S_strcasecmp .
You can use Case Insensitive Indexes:
The following example creates a collection with no default collation, then adds an index on the name field with a case insensitive collation. International Components for Unicode
/* strength: CollationStrength.Secondary
* Secondary level of comparison. Collation performs comparisons up to secondary * differences, such as diacritics. That is, collation performs comparisons of
* base characters (primary differences) and diacritics (secondary differences). * Differences between base characters takes precedence over secondary
* differences.
*/
db.users.createIndex( { name: 1 }, collation: { locale: 'tr', strength: 2 } } )
To use the index, queries must specify the same collation.
db.users.insert( [ { name: "Oğuz" },
{ name: "oğuz" },
{ name: "OĞUZ" } ] )
// does not use index, finds one result
db.users.find( { name: "oğuz" } )
// uses the index, finds three results
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 2 } )
// does not use the index, finds three results (different strength)
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 1 } )
or you can create a collection with default collation:
db.createCollection("users", { collation: { locale: 'tr', strength: 2 } } )
db.users.createIndex( { name : 1 } ) // inherits the default collation
I'm surprised nobody has warned about the risk of regex injection by using /^bar$/i if bar is a password or an account id search. (I.e. bar => .*#myhackeddomain.com e.g., so here comes my bet: use \Q \E regex special chars! provided in PERL
db.stuff.find( { foo: /^\Qbar\E$/i } );
You should escape bar variable \ chars with \\ to avoid \E exploit again when e.g. bar = '\E.*#myhackeddomain.com\Q'
Another option is to use a regex escape char strategy like the one described here Javascript equivalent of Perl's \Q ... \E or quotemeta()
Use RegExp,
In case if any other options do not work for you, RegExp is a good option. It makes the string case insensitive.
var username = new RegExp("^" + "John" + "$", "i");;
use username in queries, and then its done.
I hope it will work for you too. All the Best.
If there are some special characters in the query, regex simple will not work. You will need to escape those special characters.
The following helper function can help without installing any third-party library:
const escapeSpecialChars = (str) => {
return str.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
}
And your query will be like this:
db.collection.find({ field: { $regex: escapeSpecialChars(query), $options: "i" }})
Hope it will help!
Using a filter works for me in C#.
string s = "searchTerm";
var filter = Builders<Model>.Filter.Where(p => p.Title.ToLower().Contains(s.ToLower()));
var listSorted = collection.Find(filter).ToList();
var list = collection.Find(filter).ToList();
It may even use the index because I believe the methods are called after the return happens but I haven't tested this out yet.
This also avoids a problem of
var filter = Builders<Model>.Filter.Eq(p => p.Title.ToLower(), s.ToLower());
that mongodb will think p.Title.ToLower() is a property and won't map properly.
I had faced a similar issue and this is what worked for me:
const flavorExists = await Flavors.findOne({
'flavor.name': { $regex: flavorName, $options: 'i' },
});
Yes it is possible
You can use the $expr like that:
$expr: {
$eq: [
{ $toLower: '$STRUNG_KEY' },
{ $toLower: 'VALUE' }
]
}
Please do not use the regex because it may make a lot of problems especially if you use a string coming from the end user.
I've created a simple Func for the case insensitive regex, which I use in my filter.
private Func<string, BsonRegularExpression> CaseInsensitiveCompare = (field) =>
BsonRegularExpression.Create(new Regex(field, RegexOptions.IgnoreCase));
Then you simply filter on a field as follows.
db.stuff.find({"foo": CaseInsensitiveCompare("bar")}).count();
These have been tested for string searches
{'_id': /.*CM.*/} ||find _id where _id contains ->CM
{'_id': /^CM/} ||find _id where _id starts ->CM
{'_id': /CM$/} ||find _id where _id ends ->CM
{'_id': /.*UcM075237.*/i} ||find _id where _id contains ->UcM075237, ignore upper/lower case
{'_id': /^UcM075237/i} ||find _id where _id starts ->UcM075237, ignore upper/lower case
{'_id': /UcM075237$/i} ||find _id where _id ends ->UcM075237, ignore upper/lower case
For any one using Golang and wishes to have case sensitive full text search with mongodb and the mgo godoc globalsign library.
collation := &mgo.Collation{
Locale: "en",
Strength: 2,
}
err := collection.Find(query).Collation(collation)
As you can see in mongo docs - since version 3.2 $text index is case-insensitive by default: https://docs.mongodb.com/manual/core/index-text/#text-index-case-insensitivity
Create a text index and use $text operator in your query.

sort by string length in Mongodb/pymongo

I was wondering if anyone knows how to sort a mongodb find() result by string length.
I have tried something like db.foo.find().sort({item.lenght:-1}) but obviously doesn't work. Can somebody help me and also suggest me a way to do the same thing but in pymongo?
There are lot of things ( and basic API ) I would personally love to see in the aggregation framework such as:
Math functions
log (as in logarithm)
ceil
floor
Array
sum
String
length
Just to name a few.
And that is without resorting to obscure usages of the $mod operator or other means in such cases as "ceil" and "floor". But I digress.
Your "string length" falls into this category. Raise a JIRA issue about it. But for now you you can use mapReduce and the existing JavaScript functionality:
db.collection.mapReduce(
function() {
emit( this.item.length, this.item );
},
function(key,values) {
return values;
},
{ "out": { "inline": 1 } }
)
So while that does actually have the "mapReduce" funky style of returning a re-shaped document and with of course everything matching the same length in an array, what it does do is take advantage of the nature of "mapReduce" ( not just restricted to MongoDB ) and allows the emitted "key" value to be sorted in the response.
There is now a solution for this in MongoDB v3.4+ using the aggregation framework using $strLenBytes. Given the following document:
{_id: 0, name: "Bob"}
We can use
db.mycollection.aggregate([{
$project: {
byteLength: {$strLenBytes: "$name"}
}
}])
Which will return 3 for the number of bytes.
No, actually is not possible.
I was dealing with a similar problem, what I did was to store the string length of every object as a property of the object itself. This bypassed the problem.
If you think that shall be implemented (I do) I recomend you to upvote the issue in JIRA, which, for some reason have not so many votes:
https://jira.mongodb.org/browse/SERVER-5319

Case Insensitive Search in MongoDB - no Regex and no toLower(), toUpper() [duplicate]

Example:
> db.stuff.save({"foo":"bar"});
> db.stuff.find({"foo":"bar"}).count();
1
> db.stuff.find({"foo":"BAR"}).count();
0
You could use a regex.
In your example that would be:
db.stuff.find( { foo: /^bar$/i } );
I must say, though, maybe you could just downcase (or upcase) the value on the way in rather than incurring the extra cost every time you find it. Obviously this wont work for people's names and such, but maybe use-cases like tags.
UPDATE:
The original answer is now obsolete. Mongodb now supports advanced full text searching, with many features.
ORIGINAL ANSWER:
It should be noted that searching with regex's case insensitive /i means that mongodb cannot search by index, so queries against large datasets can take a long time.
Even with small datasets, it's not very efficient. You take a far bigger cpu hit than your query warrants, which could become an issue if you are trying to achieve scale.
As an alternative, you can store an uppercase copy and search against that. For instance, I have a User table that has a username which is mixed case, but the id is an uppercase copy of the username. This ensures case-sensitive duplication is impossible (having both "Foo" and "foo" will not be allowed), and I can search by id = username.toUpperCase() to get a case-insensitive search for username.
If your field is large, such as a message body, duplicating data is probably not a good option. I believe using an extraneous indexer like Apache Lucene is the best option in that case.
Starting with MongoDB 3.4, the recommended way to perform fast case-insensitive searches is to use a Case Insensitive Index.
I personally emailed one of the founders to please get this working, and he made it happen! It was an issue on JIRA since 2009, and many have requested the feature. Here's how it works:
A case-insensitive index is made by specifying a collation with a strength of either 1 or 2. You can create a case-insensitive index like this:
db.cities.createIndex(
{ city: 1 },
{
collation: {
locale: 'en',
strength: 2
}
}
);
You can also specify a default collation per collection when you create them:
db.createCollection('cities', { collation: { locale: 'en', strength: 2 } } );
In either case, in order to use the case-insensitive index, you need to specify the same collation in the find operation that was used when creating the index or the collection:
db.cities.find(
{ city: 'new york' }
).collation(
{ locale: 'en', strength: 2 }
);
This will return "New York", "new york", "New york" etc.
Other notes
The answers suggesting to use full-text search are wrong in this case (and potentially dangerous). The question was about making a case-insensitive query, e.g. username: 'bill' matching BILL or Bill, not a full-text search query, which would also match stemmed words of bill, such as Bills, billed etc.
The answers suggesting to use regular expressions are slow, because even with indexes, the documentation states:
"Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes."
$regex answers also run the risk of user input injection.
If you need to create the regexp from a variable, this is a much better way to do it: https://stackoverflow.com/a/10728069/309514
You can then do something like:
var string = "SomeStringToFind";
var regex = new RegExp(["^", string, "$"].join(""), "i");
// Creates a regex of: /^SomeStringToFind$/i
db.stuff.find( { foo: regex } );
This has the benefit be being more programmatic or you can get a performance boost by compiling it ahead of time if you're reusing it a lot.
Keep in mind that the previous example:
db.stuff.find( { foo: /bar/i } );
will cause every entries containing bar to match the query ( bar1, barxyz, openbar ), it could be very dangerous for a username search on a auth function ...
You may need to make it match only the search term by using the appropriate regexp syntax as:
db.stuff.find( { foo: /^bar$/i } );
See http://www.regular-expressions.info/ for syntax help on regular expressions
db.company_profile.find({ "companyName" : { "$regex" : "Nilesh" , "$options" : "i"}});
db.zipcodes.find({city : "NEW YORK"}); // Case-sensitive
db.zipcodes.find({city : /NEW york/i}); // Note the 'i' flag for case-insensitivity
TL;DR
Correct way to do this in mongo
Do not Use RegExp
Go natural And use mongodb's inbuilt indexing , search
Step 1 :
db.articles.insert(
[
{ _id: 1, subject: "coffee", author: "xyz", views: 50 },
{ _id: 2, subject: "Coffee Shopping", author: "efg", views: 5 },
{ _id: 3, subject: "Baking a cake", author: "abc", views: 90 },
{ _id: 4, subject: "baking", author: "xyz", views: 100 },
{ _id: 5, subject: "Café Con Leche", author: "abc", views: 200 },
{ _id: 6, subject: "Сырники", author: "jkl", views: 80 },
{ _id: 7, subject: "coffee and cream", author: "efg", views: 10 },
{ _id: 8, subject: "Cafe con Leche", author: "xyz", views: 10 }
]
)
Step 2 :
Need to create index on whichever TEXT field you want to search , without indexing query will be extremely slow
db.articles.createIndex( { subject: "text" } )
step 3 :
db.articles.find( { $text: { $search: "coffee",$caseSensitive :true } } ) //FOR SENSITIVITY
db.articles.find( { $text: { $search: "coffee",$caseSensitive :false } } ) //FOR INSENSITIVITY
One very important thing to keep in mind when using a Regex based query - When you are doing this for a login system, escape every single character you are searching for, and don't forget the ^ and $ operators. Lodash has a nice function for this, should you be using it already:
db.stuff.find({$regex: new RegExp(_.escapeRegExp(bar), $options: 'i'})
Why? Imagine a user entering .* as his username. That would match all usernames, enabling a login by just guessing any user's password.
Suppose you want to search "column" in "Table" and you want case insensitive search. The best and efficient way is:
//create empty JSON Object
mycolumn = {};
//check if column has valid value
if(column) {
mycolumn.column = {$regex: new RegExp(column), $options: "i"};
}
Table.find(mycolumn);
It just adds your search value as RegEx and searches in with insensitive criteria set with "i" as option.
Mongo (current version 2.0.0) doesn't allow case-insensitive searches against indexed fields - see their documentation. For non-indexed fields, the regexes listed in the other answers should be fine.
For searching a variable and escaping it:
const escapeStringRegexp = require('escape-string-regexp')
const name = 'foo'
db.stuff.find({name: new RegExp('^' + escapeStringRegexp(name) + '$', 'i')})
Escaping the variable protects the query against attacks with '.*' or other regex.
escape-string-regexp
The best method is in your language of choice, when creating a model wrapper for your objects, have your save() method iterate through a set of fields that you will be searching on that are also indexed; those set of fields should have lowercase counterparts that are then used for searching.
Every time the object is saved again, the lowercase properties are then checked and updated with any changes to the main properties. This will make it so you can search efficiently, but hide the extra work needed to update the lc fields each time.
The lower case fields could be a key:value object store or just the field name with a prefixed lc_. I use the second one to simplify querying (deep object querying can be confusing at times).
Note: you want to index the lc_ fields, not the main fields they are based off of.
Using Mongoose this worked for me:
var find = function(username, next){
User.find({'username': {$regex: new RegExp('^' + username, 'i')}}, function(err, res){
if(err) throw err;
next(null, res);
});
}
If you're using MongoDB Compass:
Go to the collection, in the filter type -> {Fieldname: /string/i}
For Node.js using Mongoose:
Model.find({FieldName: {$regex: "stringToSearch", $options: "i"}})
The aggregation framework was introduced in mongodb 2.2 . You can use the string operator "$strcasecmp" to make a case-insensitive comparison between strings. It's more recommended and easier than using regex.
Here's the official document on the aggregation command operator: https://docs.mongodb.com/manual/reference/operator/aggregation/strcasecmp/#exp._S_strcasecmp .
You can use Case Insensitive Indexes:
The following example creates a collection with no default collation, then adds an index on the name field with a case insensitive collation. International Components for Unicode
/* strength: CollationStrength.Secondary
* Secondary level of comparison. Collation performs comparisons up to secondary * differences, such as diacritics. That is, collation performs comparisons of
* base characters (primary differences) and diacritics (secondary differences). * Differences between base characters takes precedence over secondary
* differences.
*/
db.users.createIndex( { name: 1 }, collation: { locale: 'tr', strength: 2 } } )
To use the index, queries must specify the same collation.
db.users.insert( [ { name: "Oğuz" },
{ name: "oğuz" },
{ name: "OĞUZ" } ] )
// does not use index, finds one result
db.users.find( { name: "oğuz" } )
// uses the index, finds three results
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 2 } )
// does not use the index, finds three results (different strength)
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 1 } )
or you can create a collection with default collation:
db.createCollection("users", { collation: { locale: 'tr', strength: 2 } } )
db.users.createIndex( { name : 1 } ) // inherits the default collation
I'm surprised nobody has warned about the risk of regex injection by using /^bar$/i if bar is a password or an account id search. (I.e. bar => .*#myhackeddomain.com e.g., so here comes my bet: use \Q \E regex special chars! provided in PERL
db.stuff.find( { foo: /^\Qbar\E$/i } );
You should escape bar variable \ chars with \\ to avoid \E exploit again when e.g. bar = '\E.*#myhackeddomain.com\Q'
Another option is to use a regex escape char strategy like the one described here Javascript equivalent of Perl's \Q ... \E or quotemeta()
Use RegExp,
In case if any other options do not work for you, RegExp is a good option. It makes the string case insensitive.
var username = new RegExp("^" + "John" + "$", "i");;
use username in queries, and then its done.
I hope it will work for you too. All the Best.
If there are some special characters in the query, regex simple will not work. You will need to escape those special characters.
The following helper function can help without installing any third-party library:
const escapeSpecialChars = (str) => {
return str.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
}
And your query will be like this:
db.collection.find({ field: { $regex: escapeSpecialChars(query), $options: "i" }})
Hope it will help!
Using a filter works for me in C#.
string s = "searchTerm";
var filter = Builders<Model>.Filter.Where(p => p.Title.ToLower().Contains(s.ToLower()));
var listSorted = collection.Find(filter).ToList();
var list = collection.Find(filter).ToList();
It may even use the index because I believe the methods are called after the return happens but I haven't tested this out yet.
This also avoids a problem of
var filter = Builders<Model>.Filter.Eq(p => p.Title.ToLower(), s.ToLower());
that mongodb will think p.Title.ToLower() is a property and won't map properly.
I had faced a similar issue and this is what worked for me:
const flavorExists = await Flavors.findOne({
'flavor.name': { $regex: flavorName, $options: 'i' },
});
Yes it is possible
You can use the $expr like that:
$expr: {
$eq: [
{ $toLower: '$STRUNG_KEY' },
{ $toLower: 'VALUE' }
]
}
Please do not use the regex because it may make a lot of problems especially if you use a string coming from the end user.
I've created a simple Func for the case insensitive regex, which I use in my filter.
private Func<string, BsonRegularExpression> CaseInsensitiveCompare = (field) =>
BsonRegularExpression.Create(new Regex(field, RegexOptions.IgnoreCase));
Then you simply filter on a field as follows.
db.stuff.find({"foo": CaseInsensitiveCompare("bar")}).count();
These have been tested for string searches
{'_id': /.*CM.*/} ||find _id where _id contains ->CM
{'_id': /^CM/} ||find _id where _id starts ->CM
{'_id': /CM$/} ||find _id where _id ends ->CM
{'_id': /.*UcM075237.*/i} ||find _id where _id contains ->UcM075237, ignore upper/lower case
{'_id': /^UcM075237/i} ||find _id where _id starts ->UcM075237, ignore upper/lower case
{'_id': /UcM075237$/i} ||find _id where _id ends ->UcM075237, ignore upper/lower case
For any one using Golang and wishes to have case sensitive full text search with mongodb and the mgo godoc globalsign library.
collation := &mgo.Collation{
Locale: "en",
Strength: 2,
}
err := collection.Find(query).Collation(collation)
As you can see in mongo docs - since version 3.2 $text index is case-insensitive by default: https://docs.mongodb.com/manual/core/index-text/#text-index-case-insensitivity
Create a text index and use $text operator in your query.

Checking if a field contains a string

I'm looking for an operator, which allows me to check, if the value of a field contains a certain string.
Something like:
db.users.findOne({$contains:{"username":"son"}})
Is that possible?
You can do it with the following code.
db.users.findOne({"username" : {$regex : "son"}});
As Mongo shell support regex, that's completely possible.
db.users.findOne({"username" : /.*son.*/});
If we want the query to be case-insensitive, we can use "i" option, like shown below:
db.users.findOne({"username" : /.*son.*/i});
See: http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-RegularExpressions
https://docs.mongodb.com/manual/reference/sql-comparison/
http://php.net/manual/en/mongo.sqltomongo.php
MySQL
SELECT * FROM users WHERE username LIKE "%Son%"
MongoDB
db.users.find({username:/Son/})
As of version 2.4, you can create a text index on the field(s) to search and use the $text operator for querying.
First, create the index:
db.users.createIndex( { "username": "text" } )
Then, to search:
db.users.find( { $text: { $search: "son" } } )
Benchmarks (~150K documents):
Regex (other answers) => 5.6-6.9 seconds
Text Search => .164-.201 seconds
Notes:
A collection can have only one text index. You can use a wildcard text index if you want to search any string field, like this: db.collection.createIndex( { "$**": "text" } ).
A text index can be large. It contains one index entry for each unique post-stemmed word in each indexed field for each document inserted.
A text index will take longer to build than a normal index.
A text index does not store phrases or information about the proximity of words in the documents. As a result, phrase queries will run much more effectively when the entire collection fits in RAM.
As this is one of the first hits in the search engines, and none of the above seems to work for MongoDB 3.x, here is one regex search that does work:
db.users.find( { 'name' : { '$regex' : yourvalue, '$options' : 'i' } } )
No need to create and extra index or alike.
Here's what you have to do if you are connecting MongoDB through Python
db.users.find({"username": {'$regex' : '.*' + 'Son' + '.*'}})
you may also use a variable name instead of 'Son' and therefore the string concatenation.
Simplest way to accomplish this task
If you want the query to be case-sensitive
db.getCollection("users").find({'username':/Son/})
If you want the query to be case-insensitive
db.getCollection("users").find({'username':/Son/i})
ideal answer its use index
i option for case-insensitive
db.users.findOne({"username" : new RegExp(search_value, 'i') });
This should do the work
db.users.find({ username: { $in: [ /son/i ] } });
The i is just there to prevent restrictions of matching single cases of letters.
You can check the $regex documentation on MongoDB documentation.
Here's a link: https://docs.mongodb.com/manual/reference/operator/query/regex/
I use this code and it work for search substring
db.users.find({key: { $regex: new RegExp(value, 'i')}})
If you need to do the search for more than one attribute you can use the $or. For example
Symbol.find(
{
$or: [
{ 'symbol': { '$regex': input, '$options': 'i' } },
{ 'name': { '$regex': input, '$options': 'i' } }
]
}
).then((data) => {
console.log(data)
}).catch((err) => {
console.log(err)
})
Here you are basing your search on if the input is contained in the symbol attribute or the name attribute.
If the regex is not working in your Aggregate solution and you have nested object. Try this aggregation pipeline: (If your object structure is simple then, just remove the other conditions from below query):
db.user.aggregate({$match:
{$and:[
{"UserObject.Personal.Status":"ACTV"},
{"UserObject.Personal.Address.Home.Type":"HME"},
{"UserObject.Personal.Address.Home.Value": /.*son.*/ }
]}}
)
One other way would be to directly query like this:
db.user.findOne({"UserObject.Personal.Address.Home.Value": /.*son.*/ });
If your regex includes a variable, make sure to escape it.
function escapeRegExp(string) {
return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}
This can be used like this
new RegExp(escapeRegExp(searchString), 'i')
Or in a mongoDb query like this
{ '$regex': escapeRegExp(searchString) }
Posted same comment here
For aggregation framework
Field search
('$options': 'i' for case insensitive search)
db.users.aggregate([
{
$match: {
'email': { '$regex': '#gmail.com', '$options': 'i' }
}
}
]);
Full document search
(only works on fields indexed with a text index
db.articles.aggregate([
{
$match: { $text: { $search: 'brave new world' } }
}
])
How to ignore HTML tags in a RegExp match:
var text = '<p>The <b>tiger</b> (<i>Panthera tigris</i>) is the largest cat species, most recognizable for its pattern of dark vertical stripes on reddish-orange fur with a lighter underside. The species is classified in the genus <i>Panthera</i> with the lion, leopard, jaguar, and snow leopard. It is an apex predator, primarily preying on ungulates such as deer and bovids.</p>';
var searchString = 'largest cat species';
var rx = '';
searchString.split(' ').forEach(e => {
rx += '('+e+')((?:\\s*(?:<\/?\\w[^<>]*>)?\\s*)*)';
});
rx = new RegExp(rx, 'igm');
console.log(text.match(rx));
This is probably very easy to turn into a MongoDB aggregation filter.