Retrieve embedded document in MongoDB

I have a collection with elements like this:
{
  _id: 585b...,
  data: [
    {
      name: "John",
      age: 30
    },
    {
      name: "Jane",
      age: 31
    }
  ]
}
I know how to find the document that contains John:
db.People.find({"data.name", "John"})
But then I get the entire document. How can I get just the embedded document. So I want to return this:
{
name: "John",
age: 30
}
For context: this is part of a larger dataset and I need to check if certain updates are made to this specific document. Due to the way the application is implemented, the embedded document won't always be at the same index.
So how can I query and return an embedded document?

Use a second parameter (the projection) to suppress the _id:
db.People.find({"data.name": "John"}, {_id: 0})
This will output:
{
  data: [
    {
      name: "John",
      age: 30
    },
    {
      name: "Jane",
      age: 31
    }
  ]
}
To get just the embedded documents, use aggregation.
db.People.aggregate([
  {
    $unwind: "$data"
  },
  {
    $match: { "data.name": "John" }
  },
  {
    $project: {
      _id: 0,
      name: "$data.name",
      age: "$data.age"
    }
  }
])
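If you only need the matching element and don't mind it staying wrapped in the data array, a plain find with the positional projection operator may be enough (a small sketch against the same collection):
db.People.find({"data.name": "John"}, {_id: 0, "data.$": 1})
This returns { "data" : [ { "name" : "John", "age" : 30 } ] }, i.e. only the first array element that matched the query condition.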


How to update a document with a reference to its previous state?

Is it possible to reference the root document during an update operation such that a document like this:
{"name":"foo","value":1}
can be updated with new values and have the full (previous) document pushed into a new field (creating an update history):
{"name":"bar","value":2,"previous":[{"name:"foo","value":1}]}
And so on..
{"name":"baz","value":3,"previous":[{"name:"foo","value":1},{"name:"bar","value":2}]}
I figure I'll have to use the new aggregate set operator in Mongo 4.2, but how can I achieve this?
Ideally I don't want to have to reference each field explicitly. I'd prefer to push the root document (minus the _id and previous fields) without knowing what the other fields are.
In addition to the new $set operator, what really makes your use case easier with Mongo 4.2 is the fact that db.collection.update() now accepts an aggregation pipeline, finally allowing a field to be updated based on its current value:
// { name: "foo", value: 1 }
db.collection.update(
  {},
  [{ $set: {
    previous: {
      $ifNull: [
        { $concatArrays: [ "$previous", [{ name: "$name", value: "$value" }] ] },
        [{ name: "$name", value: "$value" }]
      ]
    },
    name: "bar",
    value: 2
  }}],
  { multi: true }
)
// { name: "bar", value: 2, previous: [{ name: "foo", value: 1 }] }
// and if applied again:
// { name: "baz", value: 3, previous: [{ name: "foo", value: 1 }, { name: "bar", value: 2 } ] }
The first part {} is the match query, filtering which documents to update (in our case probably all documents).
The second part [{ $set: { previous: { $ifNull: [ ... } ] is the update aggregation pipeline (note the square brackets signifying the use of an aggregation pipeline):
$set is a new aggregation operator and an alias of $addFields. It's used to add/replace a new field (in our case "previous") with values from the current document.
Using an $ifNull check, we can determine whether "previous" already exists in the document or not (this is not the case for the first update).
If "previous" doesn't exist (is null), then we have to create it and set it with an array of one element: the current document: [ { name: "$name", value: "$value" } ].
If "previous" already exist, then we concatenate ($concatArrays) the existing array with the current document.
Don't forget { multi: true }, otherwise only the first matching document will be updated.
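As an aside, on MongoDB 4.2+ the same pipeline can also be passed to db.collection.updateMany(), which updates all matching documents without needing the multi flag (a sketch, same fields and values as above):
db.collection.updateMany(
  {},
  [{ $set: {
    previous: {
      $ifNull: [
        { $concatArrays: [ "$previous", [{ name: "$name", value: "$value" }] ] },
        [{ name: "$name", value: "$value" }]
      ]
    },
    name: "bar",
    value: 2
  }}]
)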
As you mentioned "root" in your question and if your schema is not the same for all documents (if you can't tell which fields should be used and pushed in the "previous" array), then you can use the $$ROOT variable which represents the current document and filter out the "previous" array. In this case, replace both { name: "$name", value: "$value" } from the previous query with:
{ $arrayToObject: { $filter: {
  input: { $objectToArray: "$$ROOT" },
  as: "root",
  cond: { $ne: [ "$$root.k", "previous" ] }
}}}
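Put together, a sketch of the assembled update (same collection and new values as above; the expression is just built once as a shell variable for readability):
var previousEntry = { $arrayToObject: { $filter: {
  input: { $objectToArray: "$$ROOT" },
  as: "root",
  // to also drop _id from the copies, test against ["previous", "_id"] with $in instead
  cond: { $ne: [ "$$root.k", "previous" ] }
}}};

db.collection.update(
  {},
  [{ $set: {
    previous: {
      $ifNull: [
        { $concatArrays: [ "$previous", [ previousEntry ] ] },
        [ previousEntry ]
      ]
    },
    name: "bar",
    value: 2
  }}],
  { multi: true }
)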
Imho, you are making your life infinitely more complex for no reason with such a complicated data model.
Think of what you really want to achieve. You want to correlate different values in one or more interconnected series which are written to the collection consecutively.
Storing this in one document comes with some strings attached. While it seems to be reasonable in the beginning, let me name a few:
How do you get the most current document if you do not know its value for name?
How do you deal with very large series, which make the document hit the 16MB limit?
What is the benefit of the added complexity?
Simplify first
So, let's assume you have only one series for a moment. It gets as simple as
[{
  "_id": "foo",
  "ts": ISODate("2019-07-03T17:40:00.000Z"),
  "value": 1
}, {
  "_id": "bar",
  "ts": ISODate("2019-07-03T17:45:00.000Z"),
  "value": 2
}, {
  "_id": "baz",
  "ts": ISODate("2019-07-03T17:50:00.000Z"),
  "value": 3
}]
Assuming the name is unique, we can use it as _id, potentially saving an index.
You can actually get the semantic equivalent by simply doing a
> db.seriesa.find().sort({ts:-1})
{ "_id" : "baz", "ts" : ISODate("2019-07-03T17:50:00Z"), "value" : 3 }
{ "_id" : "bar", "ts" : ISODate("2019-07-03T17:45:00Z"), "value" : 2 }
{ "_id" : "foo", "ts" : ISODate("2019-07-03T17:40:00Z"), "value" : 1 }
Say you only want the two latest values; you can use limit():
> db.seriesa.find().sort({ts:-1}).limit(2)
{ "_id" : "baz", "ts" : ISODate("2019-07-03T17:50:00Z"), "value" : 3 }
{ "_id" : "bar", "ts" : ISODate("2019-07-03T17:45:00Z"), "value" : 2 }
Should you really need to have the older values in a queue-ish array:
db.seriesa.aggregate([{
  // sort chronologically so that $push and $last see the documents in order
  $sort: { ts: 1 }
}, {
  $group: {
    _id: "queue",
    name: { $last: "$_id" },
    value: { $last: "$value" },
    previous: {
      $push: { name: "$_id", value: "$value" }
    }
  }
}, {
  $project: {
    name: 1,
    value: 1,
    previous: {
      $slice: ["$previous", { $subtract: [{ $size: "$previous" }, 1] }]
    }
  }
}])
Nail it
Now, let us say you have more than one series of data. Basically, there are two ways of dealing with it: put different series in different collections or put all the series in one collection and make a distinction by a field, which for obvious reasons should be indexed.
So, when to use what? It boils down to whether you want to do aggregations over all series (maybe later down the road) or not. If you do, you should put all the series into one collection. Of course, we have to slightly modify our data model:
[{
  "name": "foo",
  "series": "a",
  "ts": ISODate("2019-07-03T17:40:00.000Z"),
  "value": 1
}, {
  "name": "bar",
  "series": "a",
  "ts": ISODate("2019-07-03T17:45:00.000Z"),
  "value": 2
}, {
  "name": "baz",
  "series": "a",
  "ts": ISODate("2019-07-03T17:50:00.000Z"),
  "value": 3
}, {
  "name": "foo",
  "series": "b",
  "ts": ISODate("2019-07-03T17:40:00.000Z"),
  "value": 1
}, {
  "name": "bar",
  "series": "b",
  "ts": ISODate("2019-07-03T17:45:00.000Z"),
  "value": 2
}, {
  "name": "baz",
  "series": "b",
  "ts": ISODate("2019-07-03T17:50:00.000Z"),
  "value": 3
}]
Note that for demonstration purposes, I fell back to the default ObjectId value for _id.
Next, we create an index over series and ts, as we are going to need it for our query:
> db.series.createIndex({ series: 1, ts: -1 })
And now our simple query looks like this:
> db.series.find({"series":"b"},{_id:0}).sort({ts:-1})
{ "name" : "baz", "series" : "b", "ts" : ISODate("2019-07-03T17:50:00Z"), "value" : 3 }
{ "name" : "bar", "series" : "b", "ts" : ISODate("2019-07-03T17:45:00Z"), "value" : 2 }
{ "name" : "foo", "series" : "b", "ts" : ISODate("2019-07-03T17:40:00Z"), "value" : 1 }
In order to generate the queue-ish document, we need to add a $match stage:
> db.series.aggregate([{
$match: {
"series": "b"
}
},
// other stages omitted for brevity
])
Note that the index we created earlier will be utilized here.
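If you want to verify that yourself, the pipeline can be run with the explain option (a quick check; the winning plan should show an IXSCAN over { series: 1, ts: -1 }):
db.series.aggregate(
  [{ $match: { series: "b" } }],
  { explain: true }
)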
Or, we can generate such a document for every series by simply using series as the _id in the $group stage and replacing _id with name where appropriate:
db.series.aggregate([{
  // again, sort chronologically so that $push and $last are deterministic
  $sort: { ts: 1 }
}, {
  $group: {
    _id: "$series",
    name: { $last: "$name" },
    value: { $last: "$value" },
    previous: {
      $push: { name: "$name", value: "$value" }
    }
  }
}, {
  $project: {
    name: 1,
    value: 1,
    previous: {
      $slice: ["$previous", { $subtract: [{ $size: "$previous" }, 1] }]
    }
  }
}])
Conclusion
Stop Being Clever when it comes to data models in MongoDB. Most of the problems with data models I saw in the wild and the vast majority I see on SO come from the fact that someone tried to be Smart (by premature optimization) ™.
Unless we are talking about ginormous series (which cannot be the case here, since you settled for the 16MB document limit in your approach), the data model and queries above are highly efficient without adding unneeded complexity.
addMultipleData: (req, res, next) => {
  let name = req.body.name ? req.body.name : res.json({ message: "Please enter Name" });
  let value = req.body.value ? req.body.value : res.json({ message: "Please Enter Value" });
  if (!req.body.name || !req.body.value) { return; }
  // Step 1: look for the single existing document (if any)
  models.dynamic.findOne({}, function (findError, findResponse) {
    if (findResponse == null) {
      let insertedValue = {
        name: name,
        value: value
      }
      // Step 2: first insert, there is no history yet
      models.dynamic.create(insertedValue, function (error, response) {
        res.json({
          message: "successfully inserted"
        })
      })
    }
    else {
      let pushedValue = {
        name: findResponse.name,
        value: findResponse.value
      }
      let updateWith = {
        $set: { name: name, value: value },
        $push: { previous: pushedValue }
      }
      let options = { upsert: true }
      // Step 3: push the current values into "previous" and set the new ones
      models.dynamic.updateOne({}, updateWith, options, function (error, updatedResponse) {
        if (updatedResponse.nModified == 1) {
          res.json({
            message: "successfully inserted"
          })
        }
      })
    }
  })
}
// This is the schema
var multipleAddSchema = mongoose.Schema({
  "name": String,
  "value": Number,
  "previous": []
})

Aggregate on array of embedded documents

I have a MongoDB collection with multiple documents. Each document has an array with multiple subdocuments (or embedded documents, I guess?). Each of these subdocuments is in this format:
{
"name": string,
"count": integer
}
Now I want to aggregate these subdocuments to find:
1. The top X counts and their names.
2. Same as 1., but the names have to match a regex before sorting and limiting.
I have tried the following for 1. already - it does return the top X but unordered, so I'd have to order them again, which seems somewhat inefficient.
[{
  $match: {
    _id: id
  }
}, {
  $unwind: {
    path: "$array"
  }
}, {
  $sort: {
    'count': -1
  }
}, {
  $limit: x
}]
Since I'm rather new to MongoDB this is pretty confusing for me. Happy for any help. Thanks in advance.
The sort has to include the array name in order to avoid an additional sort later on.
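Applied to the pipeline from the question, where the array field is simply called array, that means changing the sort stage to:
{ $sort: { "array.count": -1 } }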
Given the following document to work with:
{
  students: [{
    count: 4,
    name: "Ann"
  }, {
    count: 7,
    name: "Brad"
  }, {
    count: 6,
    name: "Beth"
  }, {
    count: 8,
    name: "Catherine"
  }]
}
As an example, the following aggregation query will match any name containing the letter "h" or "e" (the /[he]/ regex). This needs to happen after the $unwind step in order to only keep the ones you need.
db.tests.aggregate([
  {$match: {
    _id: ObjectId("5c1b191b251d9663f4e3ce65")
  }},
  {$unwind: {
    path: "$students"
  }},
  {$match: {
    "students.name": /[he]/
  }},
  {$sort: {
    "students.count": -1
  }},
  {$limit: 2}
])
This is the output given the above mentioned input:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 6, "name" : "Beth" } }
Both names contain the letters "h" and "e", and the output is sorted from high to low.
When setting the limit to 1, the output is limited to:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
In this case only the highest count has been kept after having matched the names.
=====================
Edit for the extra question:
Yes, the first $match can be changed to filter on specific universities.
{$match: {
university: "University X"
}},
That will give one or more matching documents (in case you have a document per year or so) and the rest of the aggregation steps would still be valid.
The following match would retrieve the students for the given university for a given academic year in case that would be needed.
{$match: {
university: "University X",
academic_year: "2018-2019"
}},
That should narrow it down to get the correct documents.

MongoDB partial match on objects

For example, I have a document like this:
{
  name: "Test",
  params: {
    device: "windows",
    gender: "m",
    age: 28
  }
}
Now I get the following params input:
{
device: "windows",
gender: "m"
}
In this case, I want to find all documents that partially match with the input object. Is there an easy way to solve this issue? I've tried $elemMatch but this seems only to work with arrays.
If I understood your question correctly, you have an input object which may contain some of the fields in the main document's params object, in any order, for example:
{
device: "windows",
gender: "m"
}
{
gender: "m",
device: "windows"
}
{
device: "windows",
age: 28
}
and you want to match only if all the fields in the input object exist in the main document:
{
device: "linux", // NO MATCH
gender: "m"
}
{
gender: "m", // MATCH
device: "windows"
}
{
device: "windows", // NO MATCH
age: 29
}
Correct?
Option 1
Does your params object always contain only those three fields (device, gender and age)?
If so, you can manually project each of them at the root level, do your match, and "unproject" them again:
db.my_collection.aggregate([
  {
    $project: {
      name: 1,
      params: 1,
      device: "$params.device", // add these three for your match stage
      gender: "$params.gender",
      age: "$params.age"
    }
  },
  {
    $match: input_params
  },
  {
    $project: { name: 1, params: 1 } // back to original
  }
]);
But I assume you want to do this for any number of fields inside your params object.
Option 2
Do you have a way to manipulate the input object coming in? If so, you could preface all fields with "params.":
let input_params =
{
device: "windows",
gender: "m"
};
let new_input_params =
{
"params.device": "windows",
"params.gender": "m"
};
Then your query would just be:
db.my_collection.find(new_input_params);
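If the input object arrives in application code, the prefixing can also be done generically rather than by hand, for example with a small plain-JavaScript helper (a sketch):
var input_params = { device: "windows", gender: "m" };

var new_input_params = {};
Object.keys(input_params).forEach(function (key) {
  new_input_params["params." + key] = input_params[key];
});
// new_input_params can now be passed to find() as shown above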
Option 3
If you have no way of modifying the input, you can use the $replaceRoot aggregation operator (as of version 3.4, credit to this answer) to flatten your params into the root document. Since this will replace the root document with the embedded one, you need to extract the fields you are interested in first, at least the _id field:
db.my_collection.aggregate([
  {
    // save _id inside the params doc, as you will lose the original one
    $addFields: { "params.id": "$_id" }
  },
  {
    $replaceRoot: { newRoot: "$params" }
  },
  {
    $match: input_params
  },
  ...
]);
This will match the documents and retain the _id field which you can use to get the rest of the document again, through a $lookup for example:
...
{
  $lookup: {
    from: "my_collection",
    localField: "id",
    foreignField: "_id",
    as: "doc"
  }
},
...
This will get your documents in the following format:
{
  "device" : "windows",
  "gender" : "m",
  "age" : 28,
  "id" : ObjectId("XXX"),
  "doc" : [
    {
      "_id" : ObjectId("XXX"),
      "name" : "Test",
      "params" : {
        "device" : "windows",
        "gender" : "m",
        "age" : 28
      }
    }
  ]
}
To go full circle and get your original document format back, you can:
...
{
  // get rid of the [] around the "doc" object ($lookup always results in an array)
  $unwind: "$doc"
},
{
  $replaceRoot: { newRoot: "$doc" } // get your "doc" back to the root
}
...
As I write this, I cannot believe there is no cleaner way of doing this using MongoDB alone, but I cannot think of any.
I hope this helps!
You can use the $or operator for a partial comparison against the fields of the params object, like this:
db.collection.find({
  $or: [
    { "params.device": "windows" },
    { "params.gender": "m" }
  ]
})
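Note that $or matches documents where at least one of the listed conditions holds. If every field of the input has to match, as in the interpretation above, the implicit AND of a plain query already does that:
db.collection.find({
  "params.device": "windows",
  "params.gender": "m"
})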
Using projection
A document may have many fields, but not all of them are necessarily needed or important for a given request. In this case, you can include only the required fields in the result by using a projection.
By default, queries in MongoDB return all fields in matching
documents. To limit the amount of data that MongoDB sends to
applications, you can include a projection document to specify or
restrict fields to return.
For example, you previously added the following documents to the database:
> db.users.insertOne({"name": "Jennifer", "gender": "female", languages: ["english", "spanish"]})
> db.users.insertOne({"name": "Alex", "gender": "male", languages: ["english", "french"]})
> db.users.insertOne({"name": "Maria", "gender": "female", languages: ["english", "german"]})
And now you want to display the documents whose gender field has the value female, returning only their name and gender, so you specify a projection as the second argument:
db.users.find({gender: "female"}, {name: 1, gender: 1, _id: 0})
The operation returns the following documents:
{ "name" : "Jennifer", "gender" : "female" }
{ "name" : "Maria", "gender" : "female" }
A projection document has the form { field1: <value>, field2: <value>, ... }, where <value> can be any of the following:
1 or true to include the field in the returned documents.
0 or false to exclude the field.
An expression using a projection operator (see the example below).
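For example, a projection operator such as $slice returns only part of an array field. With the users collection from above, the following asks for name plus only the first entry of languages (on older server versions the remaining fields may still be included alongside the slice):
db.users.find(
  { gender: "female" },
  { _id: 0, name: 1, languages: { $slice: 1 } }
)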
Moreover, if you don't want to filter the documents at all but want to return all of them, you can leave the first argument (the query document) empty:
db.users.find({}, {name: 1, _id: 0})

MongoDB: Find and then modify the resulting object

Is it possible in MongoDB to find some objects that match a query and then modify the result without modifying the persistent data?
For example, let
students = [
{ name: "Alice", age: 25 },
{ name: "Bob", age: 22 },
{ name: "Carol", age: 19 },
{ name: "Dave", age: 18}
]
Now, I want to query all students that are younger than 20 and in the search result, I just want to replace "age: X" with "under20: 1" resulting in the following:
result = [
{ name: "Carol", under20: 1 },
{ name: "Dave", under20: 1}
]
without changing anything in the database.
Sure, it is possible to get the result and then call forEach on it, but that seems inefficient because I would have to iterate over every object again, so I'm looking for an alternative. Or is there none?
A possible solution would be to use an aggregation pipeline with a $match followed by a $project:
db.students.aggregate([
  {
    $match: { age: { $lt: 20 } }
  },
  {
    $project: {
      _id: false,
      name: true,
      under20: { $literal: 1 }
    }
  }
]);
The $literal: 1 is required because plain under20: 1 is the same as under20: true, i.e. a request that the field under20 be included in the result, which would fail as under20 does not exist in the documents produced by the match.
Or to return all documents in students and conditionally generate the value for under20 a possible solution would be to use $cond:
db.students.aggregate([
  {
    $project: {
      _id: false,
      name: true,
      under20: {
        $cond: { if: { $lt: [ "$age", 20 ] }, then: 1, else: 0 }
      }
    }
  }
]);
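As a side note, if you are on MongoDB 4.4 or newer, find() projections also accept aggregation expressions, so roughly the same result as the first pipeline can be had without aggregate() (a sketch):
db.students.find(
  { age: { $lt: 20 } },
  { _id: 0, name: 1, under20: { $literal: 1 } }
)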

MongoDB/Mongoose orderby date and groupby user

I have a collection where multiple documents may have the same userId field. I would like to group by userId so that I get a list of unique userIds, but also sort by date so that each returned document is the latest document for that userId. I've done queries like this with SQL, and I'm really hoping it's possible with Mongo.
In this example collection:
{ userId: 456, date: 5/16/1988 },
{ userId: 456, date: 5/17/1988 },
{ userId: 789, date: 5/18/1988 },
{ userId: 789, date: 5/17/1988 }
I would want to return:
{ userId: 456, date: 5/17/1988 },
{ userId: 789, date: 5/18/1988 }
Here is how you would do it in Mongo. Note that I got this to work with a date format of yyyy-mm-dd.
db.collection.aggregate({
  $group: {
    _id: '$userId',
    date: { $max: '$date' }
  }
})
Sources: http://docs.mongodb.org/manual/
In your question you say you want the full document returned. You can do this by returning the full document as a field in the $group operator.
db.coll.aggregate([
  { $sort: { date: -1 } },
  { $group: { _id: "$userId", doc: { $first: "$$CURRENT" } } }
])
I created four documents like the ones in your question, with date as an actual Date type (some random dates). This would give you the following result:
{
  "_id" : 789,
  "doc" : {
    "_id" : ObjectId("53d80e246ebc37d0c33321ba"),
    "userId" : 789,
    "date" : ISODate("2014-07-05T04:00:00.000Z")
  }
},
{
  "_id" : 456,
  "doc" : {
    "_id" : ObjectId("53d80e246ebc37d0c33321b8"),
    "userId" : 456,
    "date" : ISODate("2014-07-05T04:00:00.000Z")
  }
}
See http://docs.mongodb.org/manual/reference/operator/aggregation/group/#variables for more info on $$CURRENT
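If you would rather get the original document shape back instead of the { _id, doc } wrapper, a $replaceRoot stage can be appended (assuming MongoDB 3.4 or newer):
db.coll.aggregate([
  { $sort: { date: -1 } },
  { $group: { _id: "$userId", doc: { $first: "$$CURRENT" } } },
  { $replaceRoot: { newRoot: "$doc" } }
])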
Although they probably are the right way to go, I wasn't able to use the db.collection.aggregate methods because I need to use other things like .populate() on a Model.find in this situation. So I came up with a workaround where I sort on userId and then on date in the find options, so that each user's documents are grouped together with the newest first:
{ sort: { userId: -1, updateDate: -1 } }
Then I wrote a function on the front end to extract the latest record for each user:
filterLatest: function(docs) {
  var lastUserId = null;
  var latestDocs = [];
  docs.forEach(function(doc) {
    if (lastUserId != doc.userId) latestDocs.push(doc);
    lastUserId = doc.userId;
  });
  return latestDocs;
}