MongoDB partial match on objects

For example, I have a document like this:
{
  name: "Test",
  params: {
    device: "windows",
    gender: "m",
    age: 28
  }
}
Now I get the following params input:
{
  device: "windows",
  gender: "m"
}
In this case, I want to find all documents that partially match with the input object. Is there an easy way to solve this issue? I've tried $elemMatch but this seems only to work with arrays.

If I understood your question correctly, you have an input object which may contain some of the fields in the main document's params object, in any order, for example:
{
  device: "windows",
  gender: "m"
}
{
  gender: "m",
  device: "windows"
}
{
  device: "windows",
  age: 28
}
and you want to match only if all the fields in the input object exist in the main document:
{
  device: "linux", // NO MATCH
  gender: "m"
}
{
  gender: "m", // MATCH
  device: "windows"
}
{
  device: "windows", // NO MATCH
  age: 29
}
Correct?
Option 1
Does your param object always contain only those three fields (device, gender and age)?
If so, you can manually project each of them at the root level, do your match, and "unproject" them again:
db.my_collection.aggregate([
  {
    $project: {
      name: 1,
      params: 1,
      device: "$params.device", // add these three for your match stage
      gender: "$params.gender",
      age: "$params.age"
    }
  },
  {
    $match: input_params
  },
  {
    $project: {name: 1, params: 1} // back to original
  }
]);
But I assume you want to do this for any number of fields inside your params object.
Option 2
Do you have a way to manipulate the input object coming in? If so, you could preface all fields with "params.":
let input_params = {
  device: "windows",
  gender: "m"
};
let new_input_params = {
  "params.device": "windows",
  "params.gender": "m"
};
Then your query would just be:
db.my_collection.find(new_input_params);
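If the input always arrives as a flat object, the key rewriting can be done generically. A minimal sketch in plain JavaScript (the helper name toParamsQuery is made up for illustration, not part of any driver):

```javascript
// Rewrite a flat input object into dot-notation keys targeting the
// embedded "params" object. Illustrative helper, not a driver API.
function toParamsQuery(input) {
  const query = {};
  for (const [key, value] of Object.entries(input)) {
    query["params." + key] = value;
  }
  return query;
}

const input_params = { device: "windows", gender: "m" };
const new_input_params = toParamsQuery(input_params);
// new_input_params: { "params.device": "windows", "params.gender": "m" }
```

The resulting object can be passed straight to find(), and because MongoDB combines multiple top-level filter fields with an implicit AND, every field of the input has to match.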
Option 3
If you have no way of modifying the input, you can use the $replaceRoot aggregation operator (as of version 3.4, credit to this answer) to flatten your params into the root document. Since this will replace the root document with the embedded one, you need to extract the fields you are interested in first, at least the _id field:
db.my_collection.aggregate([
  {
    $addFields: {"params.id": "$_id"} // save _id inside the params doc,
                                      // as you will lose the original one
  },
  {
    $replaceRoot: {newRoot: "$params"}
  },
  {
    $match: input_params
  },
  ...
]);
This will match the documents and retain the _id field which you can use to get the rest of the document again, through a $lookup for example:
...
{
  $lookup: {
    from: "my_collection",
    localField: "id",
    foreignField: "_id",
    as: "doc"
  }
},
...
This will get your documents in the following format:
{
  "device" : "windows",
  "gender" : "m",
  "age" : 28,
  "id" : ObjectId("XXX"),
  "doc" : [
    {
      "_id" : ObjectId("XXX"),
      "name" : "Test",
      "params" : {
        "device" : "windows",
        "gender" : "m",
        "age" : 28
      }
    }
  ]
}
To go full circle and get your original document format back, you can:
...
{
  $unwind: "$doc" // get rid of the [] around the "doc" object
                  // ($lookup always results in an array)
},
{
  $replaceRoot: {newRoot: "$doc"} // get your "doc" back to the root
}
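For reference, here is one way to assemble all of these stages into a single pipeline array programmatically. This is only a sketch that builds the stage objects (the helper name buildFlattenMatchPipeline is made up); it does not talk to a database:

```javascript
// Build the complete flatten-match-restore pipeline as a plain array that
// could be passed to db.my_collection.aggregate(...).
function buildFlattenMatchPipeline(inputParams, collectionName) {
  return [
    { $addFields: { "params.id": "$_id" } },   // keep the original _id around
    { $replaceRoot: { newRoot: "$params" } },  // flatten params into the root
    { $match: inputParams },                   // match on the flattened fields
    { $lookup: {                               // fetch the full document back
        from: collectionName,
        localField: "id",
        foreignField: "_id",
        as: "doc"
    } },
    { $unwind: "$doc" },                       // $lookup always yields an array
    { $replaceRoot: { newRoot: "$doc" } }      // restore the original shape
  ];
}

const pipeline = buildFlattenMatchPipeline({ device: "windows", gender: "m" }, "my_collection");
```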
As I write this, I cannot believe there is no cleaner way of doing this using MongoDB alone, but I cannot think of any.
I hope this helps!

You can use the $or operator for a partial comparison of the fields in the params object, like this:
db.collection.find({
  $or: [
    { "params.device": "windows" },
    { "params.gender": "m" }
  ]
})
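Note that $or matches a document when any one of the listed conditions holds; if you need all input fields to match (as in the original question), list them in a single filter document instead, e.g. db.collection.find({"params.device": "windows", "params.gender": "m"}), since multiple top-level fields are combined with an implicit AND. The difference, sketched in plain JavaScript:

```javascript
// Client-side illustration of $or (any condition matches) versus MongoDB's
// implicit AND (all conditions must match). Plain JS, no driver involved.
const doc = { params: { device: "windows", gender: "m", age: 28 } };

const matchesAny = (d, input) =>            // behaves like $or
  Object.entries(input).some(([k, v]) => d.params[k] === v);
const matchesAll = (d, input) =>            // behaves like implicit AND
  Object.entries(input).every(([k, v]) => d.params[k] === v);

matchesAny(doc, { device: "linux", gender: "m" }); // true: gender matches
matchesAll(doc, { device: "linux", gender: "m" }); // false: device differs
```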

Using projection
A document may have many fields, but not all of them are necessary or important for a given request. In that case, you can include only the required fields in the result by using a projection.
By default, queries in MongoDB return all fields in matching documents. To limit the amount of data that MongoDB sends to applications, you can include a projection document to specify or restrict fields to return.
For example, you previously added the following documents to the database:
> db.users.insertOne({"name": "Jennifer", "gender": "female", languages: ["english", "spanish"]})
> db.users.insertOne({"name": "Alex", "gender": "male", languages: ["english", "french"]})
> db.users.insertOne({"name": "Maria", "gender": "female", languages: ["english", "german"]})
Now suppose you want to find the documents where the gender field has the value female, returning only each user's name; you can specify that with a query plus a projection:
db.users.find({gender: "female"}, {name: 1})
The operation returns the following documents (with an inclusion projection, _id is returned unless explicitly excluded, and gender is omitted because it was not listed):
{ "_id" : ObjectId("..."), "name" : "Jennifer" }
{ "_id" : ObjectId("..."), "name" : "Maria" }
The projection document has the form:
{ field1: <value>, field2: <value> ... }
where <value> can be any of the following:
1 or true to include the field in the returned documents.
0 or false to exclude the field.
An expression using a projection operator.
Moreover, if you don't want to narrow the selection but want to return all documents, you can leave the first argument empty:
db.users.find({}, {name: 1, _id: 0})
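To see what an inclusion projection does, here is a simplified client-side sketch (top-level fields only; the project helper is made up, and real MongoDB projections have more rules than this):

```javascript
// Simplified model of an inclusion projection: keep only the listed
// top-level fields, plus _id unless it is explicitly excluded with 0.
function project(doc, projection) {
  const out = {};
  for (const [field, include] of Object.entries(projection)) {
    if (include && field in doc) out[field] = doc[field];
  }
  if (projection._id === undefined && "_id" in doc) out._id = doc._id;
  return out;
}

const user = { _id: 1, name: "Jennifer", gender: "female" };
project(user, { name: 1 });         // { name: "Jennifer", _id: 1 }
project(user, { name: 1, _id: 0 }); // { name: "Jennifer" }
```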


How to update a document with a reference to its previous state?

Is it possible to reference the root document during an update operation such that a document like this:
{"name":"foo","value":1}
can be updated with new values and have the full (previous) document pushed into a new field (creating an update history):
{"name":"bar","value":2,"previous":[{"name":"foo","value":1}]}
And so on..
{"name":"baz","value":3,"previous":[{"name":"foo","value":1},{"name":"bar","value":2}]}
I figure I'll have to use the new aggregate set operator in Mongo 4.2, but how can I achieve this?
Ideally I don't want to have to reference each field explicitly. I'd prefer to push the root document (minus the _id and previous fields) without knowing what the other fields are.
In addition to the new $set operator, what makes your use case much easier with Mongo 4.2 is the fact that db.collection.update() now accepts an aggregation pipeline, finally allowing the update of a field based on its current value:
// { name: "foo", value: 1 }
db.collection.update(
  {},
  [{ $set: {
    previous: {
      $ifNull: [
        { $concatArrays: [ "$previous", [{ name: "$name", value: "$value" }] ] },
        [ { name: "$name", value: "$value" } ]
      ]
    },
    name: "bar",
    value: 2
  }}],
  { multi: true }
)
// { name: "bar", value: 2, previous: [{ name: "foo", value: 1 }] }
// and if applied again:
// { name: "baz", value: 3, previous: [{ name: "foo", value: 1 }, { name: "bar", value: 2 } ] }
The first part {} is the match query, filtering which documents to update (in our case probably all documents).
The second part [{ $set: { previous: { $ifNull: [ ... } ] is the update aggregation pipeline (note the square brackets signifying the use of an aggregation pipeline):
$set is a new aggregation operator and an alias of $addFields. It's used to add/replace a new field (in our case "previous") with values from the current document.
Using an $ifNull check, we can determine whether "previous" already exists in the document or not (this is not the case for the first update).
If "previous" doesn't exist (is null), then we have to create it and set it with an array of one element: the current document: [ { name: "$name", value: "$value" } ].
If "previous" already exists, then we concatenate ($concatArrays) the existing array with the current document.
Don't forget { multi: true }, otherwise only the first matching document will be updated.
As you mentioned "root" in your question and if your schema is not the same for all documents (if you can't tell which fields should be used and pushed in the "previous" array), then you can use the $$ROOT variable which represents the current document and filter out the "previous" array. In this case, replace both { name: "$name", value: "$value" } from the previous query with:
{ $arrayToObject: { $filter: {
input: { $objectToArray: "$$ROOT" },
as: "root",
cond: { $ne: [ "$$root.k", "previous" ] }
}}}
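In plain JavaScript terms, that $objectToArray / $filter / $arrayToObject round trip is just "copy the document minus the previous key", roughly:

```javascript
// Plain-JS equivalent of the $$ROOT round trip above: turn the document into
// key/value pairs, drop the "previous" pair, and rebuild the object.
function withoutPrevious(doc) {
  return Object.fromEntries(
    Object.entries(doc).filter(([key]) => key !== "previous")
  );
}

const current = { _id: 1, name: "bar", value: 2, previous: [{ name: "foo", value: 1 }] };
withoutPrevious(current); // { _id: 1, name: "bar", value: 2 }
```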
Imho, you are making your life infinitely more complex for no reason with such a complicated data model.
Think of what you really want to achieve. You want to correlate different values in one or more interconnected series which are written to the collection consecutively.
Storing this in one document comes with some strings attached. While it seems to be reasonable in the beginning, let me name a few:
How do you get the most current document if you do not know its value for name?
How do you deal with very large series, which make the document hit the 16MB limit?
What is the benefit of the added complexity?
Simplify first
So, let's assume you have only one series for a moment. It gets as simple as
[{
  "_id": "foo",
  "ts": ISODate("2019-07-03T17:40:00.000Z"),
  "value": 1
},{
  "_id": "bar",
  "ts": ISODate("2019-07-03T17:45:00.000Z"),
  "value": 2
},{
  "_id": "baz",
  "ts": ISODate("2019-07-03T17:50:00.000Z"),
  "value": 3
}]
Assuming the name is unique, we can use it as _id, potentially saving an index.
You can actually get the semantic equivalent by simply doing a
> db.seriesa.find().sort({ts:-1})
{ "_id" : "baz", "ts" : ISODate("2019-07-03T17:50:00Z"), "value" : 3 }
{ "_id" : "bar", "ts" : ISODate("2019-07-03T17:45:00Z"), "value" : 2 }
{ "_id" : "foo", "ts" : ISODate("2019-07-03T17:40:00Z"), "value" : 1 }
Say you only want to have the two latest values, you can use limit():
> db.seriesa.find().sort({ts:-1}).limit(2)
{ "_id" : "baz", "ts" : ISODate("2019-07-03T17:50:00Z"), "value" : 3 }
{ "_id" : "bar", "ts" : ISODate("2019-07-03T17:45:00Z"), "value" : 2 }
Should you really need to have the older values in a queue-ish array
db.seriesa.aggregate([{
  $group: {
    _id: "queue",
    name: { $last: "$_id" },
    value: { $last: "$value" },
    previous: {
      $push: { name: "$_id", value: "$value" }
    }
  }
}, {
  $project: {
    name: 1,
    value: 1,
    previous: {
      $slice: ["$previous", { $subtract: [{ $size: "$previous" }, 1] }]
    }
  }
}])
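What that $group/$slice pipeline computes can be sketched in plain JavaScript (note that the aggregation relies on document order, so this sketch sorts by ts explicitly to make the "latest" entry well defined):

```javascript
// Build the queue-ish document from a series: latest entry at the top level,
// all older entries pushed into "previous". Illustrative helper.
function toQueueDoc(series) {
  const sorted = [...series].sort((a, b) => a.ts - b.ts);
  const latest = sorted[sorted.length - 1];
  return {
    name: latest._id,
    value: latest.value,
    previous: sorted.slice(0, -1).map(d => ({ name: d._id, value: d.value }))
  };
}

const docs = [
  { _id: "foo", ts: 1, value: 1 },
  { _id: "bar", ts: 2, value: 2 },
  { _id: "baz", ts: 3, value: 3 }
];
toQueueDoc(docs);
// { name: "baz", value: 3,
//   previous: [{ name: "foo", value: 1 }, { name: "bar", value: 2 }] }
```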
Nail it
Now, let us say you have more than one series of data. Basically, there are two ways of dealing with it: put different series in different collections or put all the series in one collection and make a distinction by a field, which for obvious reasons should be indexed.
So, when to use what? It boils down to whether you want to do aggregations over all series (maybe later down the road) or not. If you do, you should put all series into one collection. Of course, we have to slightly modify our data model:
[{
  "name": "foo",
  "series": "a",
  "ts": ISODate("2019-07-03T17:40:00.000Z"),
  "value": 1
},{
  "name": "bar",
  "series": "a",
  "ts": ISODate("2019-07-03T17:45:00.000Z"),
  "value": 2
},{
  "name": "baz",
  "series": "a",
  "ts": ISODate("2019-07-03T17:50:00.000Z"),
  "value": 3
},{
  "name": "foo",
  "series": "b",
  "ts": ISODate("2019-07-03T17:40:00.000Z"),
  "value": 1
},{
  "name": "bar",
  "series": "b",
  "ts": ISODate("2019-07-03T17:45:00.000Z"),
  "value": 2
},{
  "name": "baz",
  "series": "b",
  "ts": ISODate("2019-07-03T17:50:00.000Z"),
  "value": 3
}]
Note that for demonstration purposes, I fell back to the default ObjectId value for _id.
Next, we create an index over series and ts, as we are going to need it for our query:
> db.series.createIndex({series:1,ts:-1})
And now our simple query looks like this
> db.series.find({"series":"b"},{_id:0}).sort({ts:-1})
{ "name" : "baz", "series" : "b", "ts" : ISODate("2019-07-03T17:50:00Z"), "value" : 3 }
{ "name" : "bar", "series" : "b", "ts" : ISODate("2019-07-03T17:45:00Z"), "value" : 2 }
{ "name" : "foo", "series" : "b", "ts" : ISODate("2019-07-03T17:40:00Z"), "value" : 1 }
In order to generate the queue-ish document, we need to add a $match stage:
> db.series.aggregate([{
    $match: { "series": "b" }
  },
  // other stages omitted for brevity
])
Note that the index we created earlier will be utilized here.
Or, we can generate a document like this for every series by simply using series as the _id in the $group stage, replacing _id with name where appropriate:
db.series.aggregate([{
  $group: {
    _id: "$series",
    name: { $last: "$name" },
    value: { $last: "$value" },
    previous: {
      $push: { name: "$name", value: "$value" }
    }
  }
}, {
  $project: {
    name: 1,
    value: 1,
    previous: {
      $slice: ["$previous", { $subtract: [{ $size: "$previous" }, 1] }]
    }
  }
}])
Conclusion
Stop Being Clever when it comes to data models in MongoDB. Most of the problems with data models I saw in the wild and the vast majority I see on SO come from the fact that someone tried to be Smart (by premature optimization) ™.
Unless we are talking of ginormous series (which can not be, since you settled for a 16MB limit in your approach), the data model and queries above are highly efficient without adding unneeded complexity.
addMultipleData: (req, res, next) => {
  // Validate the input up front and respond exactly once if something is missing
  if (!req.body.name || !req.body.value) {
    return res.json({ message: "Please enter both name and value" });
  }
  let name = req.body.name;
  let value = req.body.value;
  // Step 1: look for the existing document
  models.dynamic.findOne({}, function (findError, findResponse) {
    if (findResponse == null) {
      // Step 2: nothing stored yet, create the first document
      let insertedValue = {
        name: name,
        value: value
      };
      models.dynamic.create(insertedValue, function (error, response) {
        res.json({ message: "successfully inserted" });
      });
    } else {
      // Step 3: push the current values into "previous" and set the new ones
      let pushedValue = {
        name: findResponse.name,
        value: findResponse.value
      };
      let updateWith = {
        $set: { name: name, value: value },
        $push: { previous: pushedValue }
      };
      let options = { upsert: true };
      models.dynamic.updateOne({}, updateWith, options, function (error, updatedResponse) {
        if (updatedResponse.nModified == 1) {
          res.json({ message: "successfully inserted" });
        }
      });
    }
  });
}
// This is the schema
var multipleAddSchema = mongoose.Schema({
  "name": String,
  "value": Number,
  "previous": []
});

Aggregate on array of embedded documents

I have a MongoDB collection with multiple documents. Each document has an array with multiple subdocuments (or embedded documents, I guess?). Each of these subdocuments is in this format:
{
  "name": string,
  "count": integer
}
Now I want to aggregate these subdocuments to find
1. The top X counts and their names.
2. Same as 1., but the names have to match a regex before sorting and limiting.
I have tried the following for 1. already - it does return me the top X but unordered, so I'd have to order them again which seems somewhat inefficient.
[{
  $match: { _id: id }
}, {
  $unwind: { path: "$array" }
}, {
  $sort: { 'count': -1 }
}, {
  $limit: x
}]
Since I'm rather new to MongoDB this is pretty confusing for me. Happy for any help. Thanks in advance.
The sort has to include the array name in order to avoid an additional sort later on.
Given the following document to work with:
{
  students: [{
    count: 4,
    name: "Ann"
  }, {
    count: 7,
    name: "Brad"
  }, {
    count: 6,
    name: "Beth"
  }, {
    count: 8,
    name: "Catherine"
  }]
}
As an example, the following aggregation query will match any name containing the letters "h" and "e". This needs to happen after the "$unwind" step in order to only keep the ones you need.
db.tests.aggregate([
  {$match: {
    _id: ObjectId("5c1b191b251d9663f4e3ce65")
  }},
  {$unwind: {
    path: "$students"
  }},
  {$match: {
    "students.name": /[he]/
  }},
  {$sort: {
    "students.count": -1
  }},
  {$limit: 2}
])
This is the output given the above mentioned input:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 6, "name" : "Beth" } }
Both names contain the letters "h" and "e", and the output is sorted from high to low.
When setting the limit to 1, the output is limited to:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
In this case only the highest count has been kept after having matched the names.
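For intuition, the same unwind / match / sort / limit logic can be written as a client-side chain over the embedded array (a sketch, not a replacement for doing the work server-side on large collections):

```javascript
// Client-side equivalent of the pipeline: filter the embedded array by a
// regex on name, sort by count descending, and keep the top x entries.
function topStudents(doc, regex, x) {
  return doc.students
    .filter(s => regex.test(s.name))
    .sort((a, b) => b.count - a.count)
    .slice(0, x);
}

const doc = {
  students: [
    { count: 4, name: "Ann" },
    { count: 7, name: "Brad" },
    { count: 6, name: "Beth" },
    { count: 8, name: "Catherine" }
  ]
};
topStudents(doc, /[he]/, 2);
// [{ count: 8, name: "Catherine" }, { count: 6, name: "Beth" }]
```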
Edit for the extra question:
Yes, the first $match can be changed to filter on specific universities.
{$match: {
  university: "University X"
}},
That will give one or more matching documents (in case you have a document per year or so) and the rest of the aggregation steps would still be valid.
The following match would retrieve the students for the given university for a given academic year in case that would be needed.
{$match: {
  university: "University X",
  academic_year: "2018-2019"
}},
That should narrow it down to get the correct documents.

How to get a distinct sub document list by excluding one named field?

I'm looking for a pre 3.4 solution (i.e. no replaceRoot and project by exclusion option) for how to get a distinct list of sub document values, by excluding one named field.
I have something like:-
{
  "field1": "sadasdasdasdas",
  "field2": "whatever bla dasdasda",
  "source_display_tags" : {
    "name" : "Hot Toast Cafe",
    "updateId" : NumberLong(12345),
    "address" : "2 Burnt Street",
    "postal_code" : "HT1 8RT",
    "town" : "Breadville"
  }
}
And I wish to get a distinct list of source_display_tags by excluding the updateId field, i.e.
{
  "source_display_tags" : {
    "name" : "Hot Toast Cafe",
    "address" : "2 Burnt Street",
    "postal_code" : "HT1 8RT",
    "town" : "Breadville"
  }
},
{
  "source_display_tags" : {
    "name" : "Banana Grove Restaurant",
    etc...
in some way which does not involve naming any of the other fields. i.e. I don't want to do this:-
db.getCollection('updates').aggregate([
  {$unwind: "$source_display_tags"},
  {"$project": {_id: 0, name: "$source_display_tags.name", address: "$source_display_tags.address", town: "$source_display_tags.town", postal_code: "$source_display_tags.postal_code"}},
  {$group: {_id: {name: "$name", address: "$address", town: "$town", postal_code: "$postal_code"}}}
])
Is there any way to produce this output by only naming the "updateId" field?
Add another $project to your existing query and that will do the magic for us. In the $project pipeline, add a tag named source_display_tags, pass the content of _id (from the previous pipeline result) to this tag, and suppress the _id.
Here is the query:
db.getCollection('updates').aggregate([
  {$unwind: "$source_display_tags"},
  {"$project": {
    _id: 0,
    name: "$source_display_tags.name",
    address: "$source_display_tags.address",
    town: "$source_display_tags.town",
    postal_code: "$source_display_tags.postal_code"
  }},
  {"$group": {
    _id: {
      name: "$name",
      address: "$address",
      town: "$town",
      postal_code: "$postal_code"
    }
  }},
  {"$project": {source_display_tags: "$_id", "_id": 0}}
])
If there is no need for grouping, we can simply use the find query below:
db.so.find({},
  {
    "source_display_tags.name": 1,
    "source_display_tags.address": 1,
    "source_display_tags.postal_code": 1,
    "source_display_tags.town": 1,
    _id: 0
  }
);
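If doing this client-side is acceptable, the same "distinct minus one field" idea is straightforward in plain JavaScript (the helper name distinctTags is made up; the canonical-JSON key is one simple way to de-duplicate objects):

```javascript
// Strip updateId from each source_display_tags object, then de-duplicate by
// a canonical JSON representation of the remaining fields.
function distinctTags(docs) {
  const seen = new Map();
  for (const doc of docs) {
    const { updateId, ...rest } = doc.source_display_tags;
    const key = JSON.stringify(Object.entries(rest).sort());
    if (!seen.has(key)) seen.set(key, rest);
  }
  return [...seen.values()];
}

const docs = [
  { source_display_tags: { name: "Hot Toast Cafe", updateId: 1, town: "Breadville" } },
  { source_display_tags: { name: "Hot Toast Cafe", updateId: 2, town: "Breadville" } }
];
distinctTags(docs); // one entry: { name: "Hot Toast Cafe", town: "Breadville" }
```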

(mongo) How could I get the documents that have a value in an array along with size

I have a mongo collection with something like the below:
{
  "_id" : ObjectId("59e013e83260c739f029ee21"),
  "createdAt" : ISODate("2017-10-13T01:16:24.653+0000"),
  "updatedAt" : ISODate("2017-11-11T17:13:52.956+0000"),
  "age" : NumberInt(34),
  "attributes" : [
    {
      "year" : "2017",
      "contest" : [
        { "name" : "Category1", "division" : "Department1" },
        { "name" : "Category2", "division" : "Department1" }
      ]
    },
    {
      "year" : "2016",
      "contest" : [
        { "name" : "Category2", "division" : "Department1" }
      ]
    },
    {
      "year" : "2015",
      "contest" : [
        { "name" : "Category1", "division" : "Department1" }
      ]
    }
  ],
  "name" : {
    "id" : NumberInt(9850214),
    "first" : "john",
    "last" : "afham"
  }
}
Now, how could I get the number of documents that have a contest with the name "Category1" more than one time, or more than two times, and so on?
I tried to use $size and $gt but couldn't form a correct result.
Assuming that a single contest will never contain the same name (e.g. "Category1") value more than once, here is what you can do.
The absence of any unwinds will result in improved performance in particular on big collections or data sets with loads of entries in your attributes arrays.
db.collection.aggregate({
  $project: {
    "numberOfOccurrences": {
      $size: { // count the number of matching contest elements
        $filter: { // get rid of all contest entries that do not contain at least one entry with name "Category1"
          input: "$attributes",
          cond: { $in: [ "Category1", "$$this.contest.name" ] }
        }
      }
    }
  }
}, {
  $match: { // filter the number of documents
    "numberOfOccurrences": {
      $gt: 1 // put your desired min. number of matching contest entries here
    }
  }
}, {
  $count: "numberOfDocuments" // count the number of matching documents
})
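The $filter / $size combination boils down to this plain-JavaScript count (a sketch over the document from the question, ignoring the NumberInt/ISODate wrappers):

```javascript
// Count how many attributes entries contain at least one contest with the
// given name; equivalent to the $filter + $size stage above.
function countOccurrences(doc, contestName) {
  return doc.attributes.filter(attr =>
    attr.contest.some(c => c.name === contestName)
  ).length;
}

const doc = {
  attributes: [
    { year: "2017", contest: [{ name: "Category1" }, { name: "Category2" }] },
    { year: "2016", contest: [{ name: "Category2" }] },
    { year: "2015", contest: [{ name: "Category1" }] }
  ]
};
countOccurrences(doc, "Category1"); // 2
```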
Try this on for size.
db.foo.aggregate([
  // Start with breaking down attributes:
  {$unwind: "$attributes"}
  // Next, extract only name = Category1 from the contest array. This will yield
  // an array of 0 or 1 because I am assuming that contest names WITHIN
  // the contest array are unique. If found and we get an array of 1, turn that
  // into a single doc instead of an array of a single doc by taking arrayElemAt 0.
  // Otherwise, "x" is not set into the doc AT ALL. All other vars in the doc
  // will go away after $project; if you want to keep them, change this to
  // $addFields:
  ,{$project: {x: {$arrayElemAt: [ {$filter: {
      input: "$attributes.contest",
      as: "z",
      cond: {$eq: [ "$$z.name", "Category1" ]}
    }}, 0 ]}
  }}
  // We split up attributes before, creating multiple docs with the same _id. We
  // must now "recombine" these _id (OP said he wants # of docs with name).
  // We now have to capture all the single "x" that we created above; docs without
  // Category1 will have NO "x" and we don't want to include them in the count.
  // Also, we KNOW that name can only be Category1 but division could vary, so
  // let's capture that in the $push in case we might want it:
  ,{$group: {_id: "$_id", x: {$push: "$x.division"}}}
  // One more pass to compute the length of the array:
  ,{$addFields: {len: {$size: "$x"}} }
  // And lastly, the filter for one time or two times or n times:
  ,{$match: {len: {$gt: 2} }}
]);
First, we need to flatten the document by the attributes and contest fields. Then we group by the document's initial _id and the contest name, counting the different contests along the way. Finally, we filter the result.
db.person.aggregate([
  { $unwind: "$attributes" },
  { $unwind: "$attributes.contest" },
  { $group: {
      _id: {initial_id: "$_id", contest: "$attributes.contest.name"},
      count: {$sum: 1}
  }},
  { $match: {$and: [{"_id.contest": "Category1"}, {"count": {$gt: 1}}]}}
]);

Retrieve embedded document in MongoDB

I have a collection with elements like this:
{
  _id: 585b...,
  data: [
    {
      name: "John",
      age: 30
    },
    {
      name: "Jane",
      age: 31
    }
  ]
}
I know how to find the document that contains John:
db.People.find({"data.name": "John"})
But then I get the entire document. How can I get just the embedded document? So I want to return this:
{
  name: "John",
  age: 30
}
For context: this is part of a larger dataset and I need to check if certain updates are made to this specific document. Due to the way the application is implemented, the embedded document won't always be at the same index.
So how can I query and return an embedded document?
Use a second parameter to suppress the ID:
db.People.find({"data.name": "John"}, {_id: 0})
This will output
data: [
  {
    name: "John",
    age: 30
  },
  {
    name: "Jane",
    age: 31
  }
]
To get just the embedded documents, use aggregation.
db.test.aggregate([
  { $unwind : "$data" },
  { $match : {"data.name" : "John"} },
  { $project : {
      _id : 0,
      name : "$data.name",
      age : "$data.age"
  }}
])
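For comparison, here is what that unwind / match / project pipeline computes, written as plain JavaScript over a single document (a sketch; the real pipeline of course runs server-side across the whole collection):

```javascript
// Pick the embedded documents whose name matches, keeping only name and age,
// just like the $unwind / $match / $project pipeline above.
function findEmbedded(doc, name) {
  return doc.data
    .filter(d => d.name === name)
    .map(d => ({ name: d.name, age: d.age }));
}

const person = {
  data: [
    { name: "John", age: 30 },
    { name: "Jane", age: 31 }
  ]
};
findEmbedded(person, "John"); // [{ name: "John", age: 30 }]
```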