MongoDB: filter records by checking if subfield keys include a specified set - mongodb

I have records in a MongoDB collection with the following structure:
{
'field1': {
'a': 3,
'b': 1,
'c': 4,
...
}
}
I want to find all records for which the keys in field1 are in the following set: ['a','b'].
How can I structure a MongoDB query which will do this?
I found this post describing how to find all records which have a particular subfield. I would like to do the same, but testing for multiple subfields.
Thanks!
EDIT: I am aware I could write a query of the following form:
{'$and': [{'field1.a': {'$exists': true}, {'field1.b': {'$exists': true}]}
However, I would like to find a way to pass in a list of the subfield keys I'm looking for, instead of adding another $exists for each additional key.

I don't know of any MongoDB query that would be able to do this automatically. However, if you're willing to take advantage of the JavaScript available in the mongo shell, you can generate the $and query dynamically, which could be helpful. For example, given your sample data, you could do something like the following:
var q = { "$and" : [] };
var arr = ["a", "b", "c"];
for (key in arr) {
var field = "field1." + arr[key];
var clause = {};
clause[field] = { "$exists" : true };
q["$and"].push(clause);
}
db.collection.find(q);
This would definitely be easier to run than editing the query manually every time you add a key.
[EDIT]
Note that you do not need to use an explicit $and in the query, but just separate the clauses with commas. From this page in the documentation.
MongoDB provides an implicit AND operation when specifying a comma separated list of expressions. Using an explicit AND with the $and operator is necessary when the same field or operator has to be specified in multiple expressions.
This means that you can generate a simpler query as follows:
var q = {};
var arr = ["a", "b", "c"];
for (key in arr) {
var field = "field1." + arr[key];
q[field] = { "$exists" : true };
}
db.collection.find(q);

Related

MongoDB : Match with element in an array

I am working on a collection called Publications. Each publication has an array of objectives which are ids. I have also a custom array of objectives hand written. Now, I want to select all the publications that contains at least one element of the custom objectives array in their objectives. How can I do that ?
I've been trying to make this works with '$setIntersection' then '$count' and verify that the count is greater than 0 but I don't know how to implement this.
Example :
publication_1: {
'_id': ObjectId("sdfsdf46543")
'objectives': [ObjectId("1654351456341"), ObjectId("123456789")]
}
publication_2: {
'_id': ObjectId("sdfs216546543")
'objectives': [ObjectId("1654351456341"), ObjectId("46531132")]
}
custom_array = [ObjectId("123456789"), ObjectId("2416315463")]
The mongo query should return publication_1.
You can do like the following:
db.publications.find({
"objectives": {
"$in": [
ObjectId("123456789"),
ObjectId("2416315463")
]
}
})
Notice: "123456789" is not a valid ObjectId so the query itself may not work. Here is the working example
Mongodb playground link: https://mongoplayground.net/p/MbZK99Pd5YR
objectives is an array of objects, I guess you can just query that field directly:
let custom_array = [ObjectId("123456789"), ObjectId("2416315463")];
// You can search the array with $in property.
let result = await Model.find({ objectives: {$in : custom_array} })

How to check if a portion of an _id from one collection appears in another

I have a collection where the _id is of the form [message_code]-[language_code] and another where the _id is just [message_code]. What I'd like to do is find all documents from the first collection where the message_code portion of the _id does not appear in the second collection.
Example:
> db.colA.find({})
{ "_id" : "TRM1-EN" }
{ "_id" : "TRM1-ES" }
{ "_id" : "TRM2-EN" }
{ "_id" : "TRM2-ES" }
> db.colB.find({})
{ "_id" : "TRM1" }
I want a query that will return TRM2-EN and TRM-ES from colA. Of course in my live data, there are thousands of records in each collection.
According to this question which is trying to do something similar, we have to save the results from a query against colB and use it in an $in condition in a query against colA. In my case, I need to strip the -[language_code] portion before doing this comparison, but I can't find a way to do so.
If all else fails, I'll just create a new field in colA that contains only the message code, but is there a better way do it?
Edit:
Based on Michael's answer, I was able to come up with this solution:
var arr = db.colB.distinct("_id")
var regexs = arr.map(function(elm){
return new RegExp(elm);
})
var result = db.colA.find({_id : {$nin : regexs}}, {_id : true})
Edit:
Upon closer inspection, the above method doesn't work after all. In the end, I just had to add the new field.
Disclaimer: This is a little hack it may not end well.
Get distinct _id using collection.distinct method.
Build a regular expression array using Array.prototype.map()
var arr = db.colB.distinct('_id');
arr.map(function(elm, inx, tab) {
tab[inx] = new RegExp(elm);
});
db.colA.find({ '_id': { '$nin': arr }})
I'd add a new field to colA since you can index it and if you have hundreds of thousands of documents in each collection splitting the strings will be painfully slow.
But if you don't want to do that you could make use of the aggregation framework's $substr operator to extract the [message-code] then do a $match on the result.

Mongo $in query with case-insensitivity

I'm using Mongoose.js to perform an $in query, like so:
userModel.find({
'twitter_username': {
$in: friends
}
})
friends is just an array of strings. However, I'm having some case issues, and wondering if I can use Mongo's $regex functionality to make this $in query case-insensitive?
From the docs:
To include a regular expression in an $in query expression, you can
only use JavaScript regular expression objects (i.e. /pattern/ ). For
example:
{ name: { $in: [ /^acme/i, /^ack/ ] } }
One way is to create regular Expression for each Match and form the friends array.
var friends = [/^name1$/i,/^name2$/i];
or,
var friends = [/^(name1|name2)$/i]
userModel.find({"twitter_username":{$in:friends }})
Its a little tricky to do something like that
at first you should convert friends to new regex array list with:
var insesitiveFriends = [];
friends.forEach(function(item)
{
var re = new RegExp(item, "i");
insesitiveFriends.push(re);
})
then run the query
db.test.find(
{
'twitter_username':
{
$in: insesitiveFriends
}
})
I have the sample documents in test collection
/* 0 */
{
"_id" : ObjectId("5485e2111bb8a63952bc933d"),
"twitter_username" : "David"
}
/* 1 */
{
"_id" : ObjectId("5485e2111bb8a63952bc933e"),
"twitter_username" : "david"
}
and with var friends = ['DAvid','bob']; I got both documents
Sadly, mongodb tends to be pretty case-sensitive at the core. You have a couple of options:
1) Create a separate field that is a lowercase'd version of the twitter_username, and index that one instead. Your object would have twitter_username and twitter_username_lc. The non-lowerecase one you can use for display etc, but the lowercase one you index and use in your where clause etc.
This is the route I chose to go for my application.
2) Create a really ugly regex from your string of usernames in a loop prior to your find, then pass it in:
db.users.find({handle:{$regex: /^benhowdle89|^will shaver|^superman/i } })
Note that using the 'starts with' ^ carrot performs better if the field is indexed.

Can MongoDB aggregate "top x" results in this document schema?

{
"_id" : "user1_20130822",
"metadata" : {
"date" : ISODate("2013-08-22T00:00:00.000Z"),
"username" : "user1"
},
"tags" : {
"abc" : 19,
"123" : 2,
"bca" : 64,
"xyz" : 14,
"zyx" : 12,
"321" : 7
}
}
Given the schema example above, is there a way to query this to retrieve the top "x" tags: E.g., Top 3 "tags" sorted descending?
Is this possible in a single document? e.g., top tags for a user on a given day
What if i have multiple documents that need to be combined together before getting the top? e.g., top tags for a user in a given month
I know this can be done by using a "document per user per tag per day" or by making "tags" an array, but I'd like to be able to do this as above, as it makes in place $inc's easier (many more of these happening than reads).
Or do I need to return back the whole document, and defer to the client on the sorting/limiting?
When you use object-keys as tag-names, you are making this kind of reporting very difficult. The aggreation framework has no $unwind-equivalent for objects. But there is always MapReduce.
Have your map-function emit one document for each key/value pair in the tags-subdocument. It should look something like this;
var mapFunction = function() {
for (var key in this.tags) {
emit(key, this.tags[key]);
}
}
Your reduce-function would then sum up the values emitted for the same key.
var reduceFunction = function(key, values) {
var sum = 0;
for (var i = 0; i < values.length; i++) {
sum += values[i];
}
return sum;
}
The complete MapReduce command would look something like this:
db.runCommand(
{
mapReduce: "yourcollection", // the collection where your data is stored
query: { _id : "user1_20130822" }, // or however you want to limit the results
map: mapFunction,
reduce: reduceFunction,
out: "inline", // means that the output is returned directly.
}
)
This will return all tags in unpredictable order. MapReduce has a sort and a limit option, but these only work on a field which has an index in the original collection, so you can't use it on a computed field. To get only the top 3, you would have to sort the results on the application-level. When you insist on doing the sorting and limiting on the database, define an output-collection to store the mapReduce results in (with the out-option set to out: { replace: "temporaryCollectionName" }) and then query that collection with sort and limit afterwards.
Keep in mind that when you use an intermediate collection, you must make sure that no two users run MapReduces with different queries into the same collection. When you have multiple users which want to view your top-3 list, you could let them query the output-collection and do the MapReduce in the background at regular intervales.

$unwind an object in aggregation framework

In the MongoDB aggregation framework, I was hoping to use the $unwind operator on an object (ie. a JSON collection). Doesn't look like this is possible, is there a workaround? Are there plans to implement this?
For example, take the article collection from the aggregation documentation . Suppose there is an additional field "ratings" that is a map from user -> rating. Could you calculate the average rating for each user?
Other than this, I'm quite pleased with the aggregation framework.
Update: here's a simplified version of my JSON collection per request. I'm storing genomic data. I can't really make genotypes an array, because the most common lookup is to get the genotype for a random person.
variants: [
{
name: 'variant1',
genotypes: {
person1: 2,
person2: 5,
person3: 7,
}
},
{
name: 'variant2',
genotypes: {
person1: 3,
person2: 3,
person3: 2,
}
}
]
It is not possible to do the type of computation you are describing with the aggregation framework - and it's not because there is no $unwind method for non-arrays. Even if the person:value objects were documents in an array, $unwind would not help.
The "group by" functionality (whether in MongoDB or in any relational database) is done on the value of a field or column. We group by value of field and sum/average/etc based on the value of another field.
Simple example is a variant of what you suggest, ratings field added to the example article collection, but not as a map from user to rating but as an array like this:
{ title : title of article", ...
ratings: [
{ voter: "user1", score: 5 },
{ voter: "user2", score: 8 },
{ voter: "user3", score: 7 }
]
}
Now you can aggregate this with:
[ {$unwind: "$ratings"},
{$group : {_id : "$ratings.voter", averageScore: {$avg:"$ratings.score"} } }
]
But this example structured as you describe it would look like this:
{ title : title of article", ...
ratings: {
user1: 5,
user2: 8,
user3: 7
}
}
or even this:
{ title : title of article", ...
ratings: [
{ user1: 5 },
{ user2: 8 },
{ user3: 7 }
]
}
Even if you could $unwind this, there is nothing to aggregate on here. Unless you know the complete list of all possible keys (users) you cannot do much with this. [*]
An analogous relational DB schema to what you have would be:
CREATE TABLE T (
user1: integer,
user2: integer,
user3: integer
...
);
That's not what would be done, instead we would do this:
CREATE TABLE T (
username: varchar(32),
score: integer
);
and now we aggregate using SQL:
select username, avg(score) from T group by username;
There is an enhancement request for MongoDB that may allow you to do this in the aggregation framework in the future - the ability to project values to keys to vice versa. Meanwhile, there is always map/reduce.
[*] There is a complicated way to do this if you know all unique keys (you can find all unique keys with a method similar to this) but if you know all the keys you may as well just run a sequence of queries of the form db.articles.find({"ratings.user1":{$exists:true}},{_id:0,"ratings.user1":1}) for each userX which will return all their ratings and you can sum and average them simply enough rather than do a very complex projection the aggregation framework would require.
Since 3.4.4, you can transform object to array using $objectToArray
See:
https://docs.mongodb.com/manual/reference/operator/aggregation/objectToArray/
This is an old question, but I've run across a tidbit of information through trial and error that people may find useful.
It's actually possible to unwind on a dummy value by fooling the parser this way:
db.Opportunity.aggregate(
{ $project: {
Field1: 1, Field2: 1, Field3: 1,
DummyUnwindField: { $ifNull: [null, [1.0]] }
}
},
{ $unwind: "$DummyUnwindField" }
);
This will produce 1 row per document, regardless of whether or not the value exists. You may be able tinker with this to generate the results you want. I had hoped to combine this with multiple $unwinds to (sort of like emit() in map/reduce), but alas, the last $unwind wins or they combine as an intersection rather than union which makes it impossible to achieve the results I was looking for. I am sadly disappointed with the aggregate framework functionality as it doesn't fit the one use case I was hoping to use it for (and seems strangely like a lot of the questions on StackOverflow in this area are asking) - ordering results based on match rate. Improving the poor map reduce performance would have made this entire feature unnecessary.
This is what I found & extended.
Lets create experimental database in mongo
db.copyDatabase('livedb' , 'experimentdb')
Now Use experimentdb & convert Array to object in your experimentcollection
db.getCollection('experimentcollection').find({}).forEach(function(e){
if(e.store){
e.ratings = [e.ratings]; //Objects name to be converted to array eg:ratings
db.experimentcollection.save(e);
}
})
Some nerdy js code to convert json to flat object
var flatArray = [];
var data = db.experimentcollection.find().toArray();
for (var index = 0; index < data.length; index++) {
var flatObject = {};
for (var prop in data[index]) {
var value = data[index][prop];
if (Array.isArray(value) && prop === 'ratings') {
for (var i = 0; i < value.length; i++) {
for (var inProp in value[i]) {
flatObject[inProp] = value[i][inProp];
}
}
}else{
flatObject[prop] = value;
}
}
flatArray.push(flatObject);
}
printjson(flatArray);