Mongo $in query with case-insensitivity - mongodb

I'm using Mongoose.js to perform an $in query, like so:
userModel.find({
'twitter_username': {
$in: friends
}
})
friends is just an array of strings. However, I'm having some case issues, and wondering if I can use Mongo's $regex functionality to make this $in query case-insensitive?

From the docs:
To include a regular expression in an $in query expression, you can
only use JavaScript regular expression objects (i.e. /pattern/ ). For
example:
{ name: { $in: [ /^acme/i, /^ack/ ] } }
One way is to create a regular expression for each name and build the friends array from them:
var friends = [/^name1$/i,/^name2$/i];
or,
var friends = [/^(name1|name2)$/i]
userModel.find({"twitter_username":{$in:friends }})
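If friends arrives as plain strings, the regex array can be generated rather than written by hand. The following is a minimal sketch, not part of the original answer: the helper names escapeRegExp, toExactInsensitive and toSingleAlternation are hypothetical, and the sample names are made up.

```javascript
// Escape regex metacharacters so a username such as "a.b" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: one anchored, case-insensitive regex per name.
function toExactInsensitive(names) {
  return names.map(function (name) {
    return new RegExp('^' + escapeRegExp(name) + '$', 'i');
  });
}

// Hypothetical helper: a single regex with all names as alternatives.
function toSingleAlternation(names) {
  return new RegExp('^(' + names.map(escapeRegExp).join('|') + ')$', 'i');
}

var friends = ['DAvid', 'bob.smith']; // sample input, not from the question
var regexList = toExactInsensitive(friends);
var combined = toSingleAlternation(friends);

// Either form can then go into the query:
// userModel.find({ twitter_username: { $in: regexList } })
// userModel.find({ twitter_username: { $in: [combined] } })
```

Anchoring with ^ and $ keeps the match exact, so 'david' matches but 'davidson' does not.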

It's a little tricky to do something like that.
First, convert friends into an array of regular expressions:
var insensitiveFriends = [];
friends.forEach(function(item)
{
// Anchor with ^ and $ so that "bob" does not also match "bobby"
var re = new RegExp("^" + item + "$", "i");
insensitiveFriends.push(re);
})
Then run the query:
db.test.find(
{
'twitter_username':
{
$in: insensitiveFriends
}
})
I have these sample documents in the test collection:
/* 0 */
{
"_id" : ObjectId("5485e2111bb8a63952bc933d"),
"twitter_username" : "David"
}
/* 1 */
{
"_id" : ObjectId("5485e2111bb8a63952bc933e"),
"twitter_username" : "david"
}
With var friends = ['DAvid','bob']; the query returned both documents.

Sadly, mongodb tends to be pretty case-sensitive at the core. You have a couple of options:
1) Create a separate field that holds a lowercased version of twitter_username, and index that one instead. Your object would have twitter_username and twitter_username_lc. Use the original-case field for display, and index the lowercase field and use it in your query conditions.
This is the route I chose to go for my application.
2) Create a really ugly regex from your string of usernames in a loop prior to your find, then pass it in:
db.users.find({handle:{$regex: /^benhowdle89|^will shaver|^superman/i } })
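Note that an anchored 'starts with' pattern using the ^ caret performs better if the field is indexed, although a case-insensitive regex generally cannot use the index as efficiently as a case-sensitive one.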
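The first option above (a separate lowercase field) could be sketched as follows. This is an illustration only: the helper names withLowercaseUsername and buildLowercaseInFilter are hypothetical, and it assumes every user document has a string twitter_username.

```javascript
// Hypothetical helper: add the lowercase shadow field before saving a user.
function withLowercaseUsername(user) {
  return Object.assign({}, user, {
    twitter_username_lc: user.twitter_username.toLowerCase()
  });
}

// Hypothetical helper: build a plain $in filter against the shadow field.
function buildLowercaseInFilter(names) {
  return {
    twitter_username_lc: {
      $in: names.map(function (n) { return n.toLowerCase(); })
    }
  };
}

var stored = withLowercaseUsername({ twitter_username: 'David' });
var filter = buildLowercaseInFilter(['DAvid', 'bob']);
// userModel.find(filter) is then an ordinary $in query on twitter_username_lc.
```

Because both sides are lowercased, no regex is involved and a normal index on twitter_username_lc can serve the query.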

Related

MongoDB : Match with element in an array

I am working on a collection called Publications. Each publication has an array of objectives, which are ids. I also have a hand-written custom array of objectives. I want to select all the publications that contain at least one element of the custom array in their objectives. How can I do that?
I've been trying to make this work with '$setIntersection' followed by '$count', checking that the count is greater than 0, but I don't know how to implement it.
Example :
publication_1: {
'_id': ObjectId("sdfsdf46543")
'objectives': [ObjectId("1654351456341"), ObjectId("123456789")]
}
publication_2: {
'_id': ObjectId("sdfs216546543")
'objectives': [ObjectId("1654351456341"), ObjectId("46531132")]
}
custom_array = [ObjectId("123456789"), ObjectId("2416315463")]
The mongo query should return publication_1.
You can do it like this:
db.publications.find({
"objectives": {
"$in": [
ObjectId("123456789"),
ObjectId("2416315463")
]
}
})
Notice: "123456789" is not a valid ObjectId so the query itself may not work. Here is the working example
Mongodb playground link: https://mongoplayground.net/p/MbZK99Pd5YR
Since objectives is an array, you can query that field directly:
let custom_array = [ObjectId("123456789"), ObjectId("2416315463")];
// You can search the array with $in property.
let result = await Model.find({ objectives: {$in : custom_array} })
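For completeness, the $setIntersection idea from the question can also be expressed as an aggregation, although the $in query above is simpler. This is a sketch under assumptions: it relies on $expr (MongoDB 3.6 or later), it assumes objectives is an array whenever it is present, and the helper name buildIntersectionPipeline is hypothetical.

```javascript
// Hypothetical helper: keep documents whose `objectives` array shares at
// least one element with `wanted`.
function buildIntersectionPipeline(wanted) {
  return [
    {
      $match: {
        $expr: {
          $gt: [
            {
              $size: {
                // $ifNull guards against documents without `objectives`.
                $setIntersection: [{ $ifNull: ['$objectives', []] }, wanted]
              }
            },
            0
          ]
        }
      }
    }
  ];
}

// Plain strings stand in for ObjectId values here.
var pipeline = buildIntersectionPipeline(['123456789', '2416315463']);
// db.publications.aggregate(pipeline)
```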

How to check if a portion of an _id from one collection appears in another

I have a collection where the _id is of the form [message_code]-[language_code] and another where the _id is just [message_code]. What I'd like to do is find all documents from the first collection where the message_code portion of the _id does not appear in the second collection.
Example:
> db.colA.find({})
{ "_id" : "TRM1-EN" }
{ "_id" : "TRM1-ES" }
{ "_id" : "TRM2-EN" }
{ "_id" : "TRM2-ES" }
> db.colB.find({})
{ "_id" : "TRM1" }
I want a query that will return TRM2-EN and TRM2-ES from colA. Of course, in my live data there are thousands of records in each collection.
According to this question which is trying to do something similar, we have to save the results from a query against colB and use it in an $in condition in a query against colA. In my case, I need to strip the -[language_code] portion before doing this comparison, but I can't find a way to do so.
If all else fails, I'll just create a new field in colA that contains only the message code, but is there a better way to do it?
Edit:
Based on Michael's answer, I was able to come up with this solution:
var arr = db.colB.distinct("_id")
var regexs = arr.map(function(elm){
return new RegExp(elm);
})
var result = db.colA.find({_id : {$nin : regexs}}, {_id : true})
Edit:
Upon closer inspection, the above method doesn't work after all. In the end, I just had to add the new field.
Disclaimer: This is a bit of a hack, and it may not end well.
Get distinct _id using collection.distinct method.
Build a regular expression array using Array.prototype.map()
var arr = db.colB.distinct('_id');
arr.map(function(elm, inx, tab) {
tab[inx] = new RegExp(elm);
});
db.colA.find({ '_id': { '$nin': arr }})
I'd add a new field to colA since you can index it and if you have hundreds of thousands of documents in each collection splitting the strings will be painfully slow.
But if you don't want to do that you could make use of the aggregation framework's $substr operator to extract the [message-code] then do a $match on the result.
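One thing to watch with the regex approach is that an unanchored pattern such as /TRM1/ also matches ids like TRM10-EN. A minimal sketch of an anchored variant follows; the helper names are hypothetical, the id TRM10-EN is an extra sample added for illustration, and with thousands of codes the regex list may become large enough that the extra field is still the better choice.

```javascript
// Escape regex metacharacters in case a message code contains any.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: match ids that start with "<code>-" and nothing looser.
function buildCodePrefixRegexes(codes) {
  return codes.map(function (code) {
    return new RegExp('^' + escapeRegExp(code) + '-');
  });
}

var codes = ['TRM1']; // stand-in for db.colB.distinct('_id')
var regexes = buildCodePrefixRegexes(codes);
var filter = { _id: { $nin: regexes } };
// db.colA.find(filter, { _id: true })

// Local check of the patterns; 'TRM10-EN' shows the effect of anchoring.
var colA = ['TRM1-EN', 'TRM1-ES', 'TRM2-EN', 'TRM2-ES', 'TRM10-EN'];
var unmatched = colA.filter(function (id) {
  return !regexes.some(function (re) { return re.test(id); });
});
```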

MongoDB: filter records by checking if subfield keys include a specified set

I have records in a MongoDB collection with the following structure:
{
'field1': {
'a': 3,
'b': 1,
'c': 4,
...
}
}
I want to find all records for which the keys in field1 are in the following set: ['a','b'].
How can I structure a MongoDB query which will do this?
I found this post describing how to find all records which have a particular subfield. I would like to do the same, but testing for multiple subfields.
Thanks!
EDIT: I am aware I could write a query of the following form:
{'$and': [{'field1.a': {'$exists': true}}, {'field1.b': {'$exists': true}}]}
However, I would like to find a way to pass in a list of the subfield keys I'm looking for, instead of adding another $exists for each additional key.
I don't know of any MongoDB query that would be able to do this automatically. However, if you're willing to take advantage of the JavaScript available in the mongo shell, you can generate the $and query dynamically, which could be helpful. For example, given your sample data, you could do something like the following:
var q = { "$and" : [] };
var arr = ["a", "b", "c"];
// Build one $exists clause per key
arr.forEach(function (key) {
var clause = {};
clause["field1." + key] = { "$exists" : true };
q["$and"].push(clause);
});
db.collection.find(q);
This would definitely be easier to run than editing the query manually every time you add a key.
[EDIT]
Note that you do not need an explicit $and in the query; you can just separate the clauses with commas. From the documentation:
MongoDB provides an implicit AND operation when specifying a comma separated list of expressions. Using an explicit AND with the $and operator is necessary when the same field or operator has to be specified in multiple expressions.
This means that you can generate a simpler query as follows:
var q = {};
var arr = ["a", "b", "c"];
arr.forEach(function (key) {
q["field1." + key] = { "$exists" : true };
});
db.collection.find(q);

How can I use a $elemMatch on first level array?

Consider the following document:
{
"_id" : "ID_01",
"code" : ["001", "002", "003"],
"Others" : "544554"
}
I went through the MongoDB docs for elemmatch-query and elemmatch-projection, but I can't figure out how to apply them to the above document.
Could anyone tell me how I can use $elemMatch for the field code?
You'll want to use the $in operator rather than $elemMatch in this case, since $in matches a field against a list of values, which must be passed as an array. When the field itself holds an array, as code does here, $in matches if any element of that array equals any of the listed values. The entire matching document is returned.
For example, you might use it like this:
db.mycodes.find( { code: { $in: ["001"] } } )
This can be simplified to:
db.mycodes.find({ code: "001" })
This works because MongoDB looks inside an array field for a single matching value, such as "001" above.
Or if you want to search for "001" or "002":
db.mycodes.find( { code: { $in: ["001", "002"] } } )
$in documentation
If you're simply looking to match all documents with an array containing a given value, you can just specify the value on the reference to that array, e.g.
db.mycodes.find( { code: '001' } )
This returns all documents that contain '001' in their code array.
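Where $elemMatch does become useful on a first-level array is when several conditions must hold for the same element. The sketch below assumes the sample document from the question; the range values are illustrative, and the helper anyElementInRange is hypothetical and only mimics the rule in memory.

```javascript
// $elemMatch with operators placed directly inside applies to scalar array
// elements: one single element must satisfy every condition.
var rangeFilter = { code: { $elemMatch: { $gte: '002', $lt: '003' } } };
// db.mycodes.find(rangeFilter)

// In-memory illustration of that rule (not how MongoDB evaluates it).
function anyElementInRange(values, low, high) {
  return values.some(function (v) { return v >= low && v < high; });
}

var doc = { _id: 'ID_01', code: ['001', '002', '003'], Others: '544554' };
var matches = anyElementInRange(doc.code, '002', '003');
var noMatch = anyElementInRange(doc.code, '004', '005');
```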

Checking if a field contains a string

I'm looking for an operator that allows me to check whether the value of a field contains a certain string.
Something like:
db.users.findOne({$contains:{"username":"son"}})
Is that possible?
You can do it with the following code.
db.users.findOne({"username" : {$regex : "son"}});
Since the Mongo shell supports regex, that's entirely possible.
db.users.findOne({"username" : /.*son.*/});
If we want the query to be case-insensitive, we can use the "i" option, as shown below:
db.users.findOne({"username" : /.*son.*/i});
See: http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-RegularExpressions
https://docs.mongodb.com/manual/reference/sql-comparison/
http://php.net/manual/en/mongo.sqltomongo.php
MySQL
SELECT * FROM users WHERE username LIKE "%Son%"
MongoDB
db.users.find({username:/Son/})
As of version 2.4, you can create a text index on the field(s) to search and use the $text operator for querying.
First, create the index:
db.users.createIndex( { "username": "text" } )
Then, to search:
db.users.find( { $text: { $search: "son" } } )
Benchmarks (~150K documents):
Regex (other answers) => 5.6-6.9 seconds
Text Search => .164-.201 seconds
Notes:
A collection can have only one text index. You can use a wildcard text index if you want to search any string field, like this: db.collection.createIndex( { "$**": "text" } ).
A text index can be large. It contains one index entry for each unique post-stemmed word in each indexed field for each document inserted.
A text index will take longer to build than a normal index.
A text index does not store phrases or information about the proximity of words in the documents. As a result, phrase queries will run much more effectively when the entire collection fits in RAM.
As this is one of the first hits in the search engines, and none of the above seems to work for MongoDB 3.x, here is one regex search that does work:
db.users.find( { 'name' : { '$regex' : yourvalue, '$options' : 'i' } } )
No need to create an extra index or the like.
Here's what you have to do if you are connecting MongoDB through Python
db.users.find({"username": {'$regex' : '.*' + 'Son' + '.*'}})
You may also use a variable instead of 'Son', which is why the string concatenation is there.
Simplest way to accomplish this task
If you want the query to be case-sensitive
db.getCollection("users").find({'username':/Son/})
If you want the query to be case-insensitive
db.getCollection("users").find({'username':/Son/i})
The ideal answer is to use an index.
Use the i option for a case-insensitive match:
db.users.findOne({"username" : new RegExp(search_value, 'i') });
This should do the work
db.users.find({ username: { $in: [ /son/i ] } });
The i flag makes the match case-insensitive, so both uppercase and lowercase letters match.
You can check the $regex documentation on MongoDB documentation.
Here's a link: https://docs.mongodb.com/manual/reference/operator/query/regex/
I use this code, and it works for substring searches:
db.users.find({key: { $regex: new RegExp(value, 'i')}})
If you need to do the search for more than one attribute you can use the $or. For example
Symbol.find(
{
$or: [
{ 'symbol': { '$regex': input, '$options': 'i' } },
{ 'name': { '$regex': input, '$options': 'i' } }
]
}
).then((data) => {
console.log(data)
}).catch((err) => {
console.log(err)
})
Here you are basing your search on if the input is contained in the symbol attribute or the name attribute.
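When the list of attributes varies, the $or clauses can be generated. A minimal sketch, assuming the input should be treated as literal text: the helper names escapeRegExp and buildContainsAny are hypothetical.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: one case-insensitive "contains" clause per field.
function buildContainsAny(fields, input) {
  var pattern = escapeRegExp(input);
  return {
    $or: fields.map(function (field) {
      var clause = {};
      clause[field] = { $regex: pattern, $options: 'i' };
      return clause;
    })
  };
}

var query = buildContainsAny(['symbol', 'name'], 'a.b');
// Symbol.find(query).then(...)
```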
If the regex is not working in your aggregation and you have a nested object, try this pipeline (if your object structure is simple, just remove the other conditions from the query below):
db.user.aggregate({$match:
{$and:[
{"UserObject.Personal.Status":"ACTV"},
{"UserObject.Personal.Address.Home.Type":"HME"},
{"UserObject.Personal.Address.Home.Value": /.*son.*/ }
]}}
)
One other way would be to directly query like this:
db.user.findOne({"UserObject.Personal.Address.Home.Value": /.*son.*/ });
If your regex includes a variable, make sure to escape it.
function escapeRegExp(string) {
return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}
This can be used like this
new RegExp(escapeRegExp(searchString), 'i')
Or in a mongoDb query like this
{ '$regex': escapeRegExp(searchString) }
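To see why the escaping matters, compare an escaped and an unescaped pattern on a value containing a dot. This is a small local illustration using plain JavaScript regexes; the sample strings are made up.

```javascript
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var searchString = 'a.b';

// Unescaped: the dot is a wildcard, so unrelated values match as well.
var unescaped = new RegExp(searchString, 'i');
// Escaped: the dot only matches a literal dot.
var escaped = new RegExp(escapeRegExp(searchString), 'i');

var unescapedHitsWrongValue = unescaped.test('aXb');
var escapedHitsWrongValue = escaped.test('aXb');
var escapedHitsRightValue = escaped.test('A.B');
```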
For aggregation framework
Field search
('$options': 'i' for case insensitive search)
db.users.aggregate([
{
$match: {
'email': { '$regex': '@gmail.com', '$options': 'i' }
}
}
]);
Full document search
(only works on fields indexed with a text index)
db.articles.aggregate([
{
$match: { $text: { $search: 'brave new world' } }
}
])
How to ignore HTML tags in a RegExp match:
var text = '<p>The <b>tiger</b> (<i>Panthera tigris</i>) is the largest cat species, most recognizable for its pattern of dark vertical stripes on reddish-orange fur with a lighter underside. The species is classified in the genus <i>Panthera</i> with the lion, leopard, jaguar, and snow leopard. It is an apex predator, primarily preying on ungulates such as deer and bovids.</p>';
var searchString = 'largest cat species';
var rx = '';
searchString.split(' ').forEach(e => {
rx += '('+e+')((?:\\s*(?:<\/?\\w[^<>]*>)?\\s*)*)';
});
rx = new RegExp(rx, 'igm');
console.log(text.match(rx));
This is probably very easy to turn into a MongoDB aggregation filter.
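One way that conversion might look is sketched below. It is an assumption-laden illustration rather than a tested MongoDB recipe: the collection name articles and the field name body are placeholders, the helper buildTagTolerantPattern is hypothetical, and the pattern is only checked locally with a JavaScript RegExp.

```javascript
// Escape regex metacharacters in each search word.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a pattern that tolerates HTML tags and
// whitespace between the words of the search phrase.
function buildTagTolerantPattern(phrase) {
  var gap = '(?:\\s*(?:</?\\w[^<>]*>)?\\s*)*';
  return phrase.split(/\s+/).filter(Boolean).map(escapeRegExp).join(gap);
}

var pattern = buildTagTolerantPattern('largest cat species');
var stage = { $match: { body: { $regex: pattern, $options: 'i' } } };
// db.articles.aggregate([stage])  // `articles` and `body` are placeholder names

// Local check against a sample string with tags between the words.
var sample = '<p>the <b>largest</b> <i>cat</i> species</p>';
var found = new RegExp(pattern, 'i').test(sample);
var notFound = new RegExp(pattern, 'i').test('<p>largest dog species</p>');
```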