MongoDB aggregate framework not keeping what was added with $addFields - mongodb

Field is added but then disappears.
Here is the code from within the mongo shell:
> db.users.aggregate([{$addFields:{totalAge:{$sum:"$age"}}}])
{ "_id" : ObjectId("5acb81b53306361018814849"), "name" : "A", "age" : 1, "totalAge" : 1 }
{ "_id" : ObjectId("5acb81b5330636101881484a"), "name" : "B", "age" : 2, "totalAge" : 2 }
{ "_id" : ObjectId("5acb81b5330636101881484b"), "name" : "C", "age" : 3, "totalAge" : 3 }
> db.users.find().pretty()
{ "_id" : ObjectId("5acb81b53306361018814849"), "name" : "A", "age" : 1 }
{ "_id" : ObjectId("5acb81b5330636101881484a"), "name" : "B", "age" : 2 }
{ "_id" : ObjectId("5acb81b5330636101881484b"), "name" : "C", "age" : 3 }

Aggregation only reads data from your collection; it does not edit the collection too. The best way to think about aggregation is that you read some data and manipulate it for your immediate usage.
If you want change it in main source then you must use the update method.
Or an easier way (Not best but easy)
db.users.aggregate([{$addFields:{totalAge:{$sum:"$age"}}}]).forEach(function (x){
db.users.save(x)
})

Nozar's answer was correct, but .save() is now deprecated.
Instead of using his/her exact answer, modify it by using .updateOne and $set.
Old/deprecated answer:
db.users
.aggregate([{$addFields:{totalAge:{$sum:"$age"}}}])
.forEach(function (x){db.users.save(x)})
New/working answer:
db.users
.aggregate([{$addFields:{totalAge:{$sum:"$age"}}}])
.forEach(function (x){db.users.updateOne({name: x.name}, {$set: {totalAge: x.totalAge}})})
Note: In my example, I'm using 'name' to filter (essentially match, in this case) the documents in 'users' collection, but you can use any unique field (e.g. _id field).
Shoutout to Nozar for leading me to the updated answer I provided, as I just used it in my project! (In my project, I used a $match pipeline stage prior to the $addFields pipeline stage, as I was just looking to do this to a single document in my collection, rather than all documents)

The reason is that, your both approach totally different.In aggregation, you have use $addFields in that query you will get a totalAge. But according to your find query, you can get specific data which you have stored in a database.Here you did not calculate totalAge.
I hope you can understand it.

The aggregation pipeline doesn't alter the original data; what it does is to take a temporary in-memory copy of the data and perform a sequence of manipulations on it (still in-memory) and send it to the client.
It's similar to the way you can do db.collection.find().sort() ; the sorting there only changes what is being returned to the client, it doesn't change what is stored in the database.
The only exception is when you use the $out stage, which saves the result of the aggregation to another collection. You can see that, because they had to add a special type of stage to do this, that a normal aggregation does not write back to the stored data.

Related

Add object to object array if an object property is not given yet

Use Case
I've got a collection band_profiles and I've got a collection band_profiles_history. The history collection is supposed to store a band_profile snapshot every 24 hour and therefore I am using MongoDB's recommended format for historical tracking: Each month+year is it's own document and in an object array I will store the bandProfile snapshot along with the current day of the month.
My models:
A document in band_profiles_history looks like this:
{
"_id" : ObjectId("599e3bc406955db4cbffe0a8"),
"month" : 7,
"tag_lowercased" : "9yq88gg",
"year" : 2017,
"values" : [
{
"_id" : ObjectId("599e3bc41c073a7418fead91"),
"profile" : {
"_id" : ObjectId("5989a65d0f39d9fd70cde1fe"),
"tag" : "9YQ88GG",
"name_normalized" : "example name1",
},
"day" : 1
},
{
"_id" : ObjectId("599e3bc41c073a7418fead91"),
"profile" : {
"_id" : ObjectId("5989a65d0f39d9fd70cde1fe"),
"tag" : "9YQ88GG",
"name_normalized" : "new name",
},
"day" : 2
}
]
}
And a document in band_profiles:
{
"_id" : ObjectId("5989a6190f39d9fd70cddeb1"),
"tag" : "9V9LRGU",
"name_normalized" : "example name",
"tag_lowercased" : "9v9lrgu",
}
This is how I upsert my documents into band_profiles_history at the moment:
BandProfileHistory.update(
{ tag_lowercased: tag, year, month},
{ $push: {
values: { day, profile }
}
},
{ upsert: true }
)
My problem:
I only want to insert ONE snapshot for every day. Right now it would always push a new object into the object array values no matter if I already have an object for that day or not. How can I achieve that it would only push that object if there is no object for the current day yet?
Putting mongoose aside for a moment:
There is an operation addToSet that will add an element to an array if it doesn't already exists.
Caveat:
If the value is a document, MongoDB determines that the document is a duplicate if an existing document in the array matches the to-be-added document exactly; i.e. the existing document has the exact same fields and values and the fields are in the same order. As such, field order matters and you cannot specify that MongoDB compare only a subset of the fields in the document to determine whether the document is a duplicate of an existing array element.
Since you are trying to add an entire document you are subjected to this restriction.
So I see the following solutions for you:
Solution 1:
Read in the array, see if it contains the element you want and if not push it to the values array with push.
This has the disadvantage of NOT being an atomic operation meaning that you could end up would duplicates anyways. This could be acceptable if you ran a periodical clean up job to remove duplicates from this field on each document.
It's up to you to decide if this is acceptable.
Solution 2:
Assuming you are putting the field _id in the subdocuments of your values field, stop doing it. Assuming mongoose is doing this for you (because it does, from what I understand) stop it from doing it like it says here: Stop mongoose from creating _id for subdocument in arrays.
Next you need to ensure that the fields in the document always have the same order, because order matters when comparing documents in the addToSet operation as stated in the citation above.
Solution 3
Change the schema of your band_profiles_history to something like:
{
"_id" : ObjectId("599e3bc406955db4cbffe0a8"),
"month" : 7,
"tag_lowercased" : "9yq88gg",
"year" : 2017,
"values" : {
"1": { "_id" : ObjectId("599e3bc41c073a7418fead91"),
"profile" : {
"_id" : ObjectId("5989a65d0f39d9fd70cde1fe"),
"tag" : "9YQ88GG",
"name_normalized" : "example name1"
}
},
"2": {
"_id" : ObjectId("599e3bc41c073a7418fead91"),
"profile" : {
"_id" : ObjectId("5989a65d0f39d9fd70cde1fe"),
"tag" : "9YQ88GG",
"name_normalized" : "new name"
}
}
}
Notice that the day field became the key for the subdocuments on the values. Notice also that values is now an Object instead of an Array.
No you can run an update query that would update values.<day> only if values.<day> didn't exist.
Personally I don't like this as it is using the fact that JSON doesn't allow duplicate keys to support the schema.
First of all, sadly mongodb does not support uniqueness of a field in an array of a collection. You can see there is major bug opened for 7 years and not closed yet(that is a shame in my opinion).
What you can do from here is limited and all is on application level. I had same problem and solve it in application level. Do something like this:
First read your document with document _id and values.day.
If your reading in step 1 returns null, that means there is no record on values array for given day, so you can push the new value(I assume band_profile_history has record with _id value).
If your reading in step 1 returns a document, that means values array has a record for given day. In that case you can use setoperation with $operator.
Like others said, they will be not atomic but while you are dealing with your problem in application level, you can make whole bunch of code synchronized. There will be 2 queries to run on mongodb among of 3 queries. Like below:
db.getCollection('band_profiles_history').find({"_id": "1", "values.day": 3})
if returns null:
db.getCollection('band_profiles_history').update({"_id": "1"}, {$push: {"values": {<your new band profile history for given day>}}})
if returns not null:
db.getCollection('band_profiles_history').update({"_id": "1", "values.day": 3}, {$set: {"values.$": {<your new band profile history for given day>}}})
To check if object is empty
{ field: {$exists: false} }
or if it is an array
{ field: {$eq: []} }
Mongoose also supports field: {type: Date} so you can use it instead counting a days, and do updates only for current date.

Using $last on Mongo Aggregation Pipeline

I searched for similar questions but couldn't find any. Feel free to point me in their direction.
Say I have this data:
{ "_id" : ObjectId("5694c9eed4c65e923780f28e"), "name" : "foo1", "attr" : "foo" }
{ "_id" : ObjectId("5694ca3ad4c65e923780f290"), "name" : "foo2", "attr" : "foo" }
{ "_id" : ObjectId("5694ca47d4c65e923780f294"), "name" : "bar1", "attr" : "bar" }
{ "_id" : ObjectId("5694ca53d4c65e923780f296"), "name" : "bar2", "attr" : "bar" }
If I want to get the latest record for each attribute group, I can do this:
> db.content.aggregate({$group: {_id: '$attr', name: {$last: '$name'}}})
{ "_id" : "bar", "name" : "bar2" }
{ "_id" : "foo", "name" : "foo2" }
I would like to have my data grouped by attr and then sorted by _id so that only the latest record remains in each group, and that's how I can achieve this. BUT I need a way to avoid naming all the fields that I want in the result (in this example "name") because in my real use case they are not known ahead.
So, is there a way to achieve this, but without having to explicitly name each field using $last and just taking all fields instead? Of course, I would sort my data prior to grouping and I just need to somehow tell Mongo "take all values from the latest one".
See some possible options here:
Do multiple find().sort() queries for each of the attr values you
want to search.
Grab the original _id of the $last doc, then do a findOne() for each of those values (this is the more extensible option).
Use the $$ROOT system variable as shown here.
This wouldn't be the quickest operation, but I assume you're using this more for analytics, not in response to a user behavior.
Edited to add slouc's example posted in comments:
db.content.aggregate({$group: {_id: '$attr', lastItem: { $last: "$$ROOT" }}}).

I have big database on mongodb and can't find and use my info

This my code:
db.test.find() {
"_id" : ObjectId("4d3ed089fb60ab534684b7e9"),
"title" : "Sir",
"name" : {
"_id" : ObjectId("4d3ed089fb60ab534684b7ff"),
"first_name" : "Farid"
},
"addresses" : [
{
"city" : "Baku",
"country" : "Azerbaijan"
},{
"city" : "Susha",
"country" : "Azerbaijan"
},{
"city" : "Istanbul",
"country" : "Turkey"
}
]
}
I want get output only all city. Or I want get output only all country. How can i do it?
I'm not 100% about your code example, because if your 'find' by ID there's no need to search by anything else... but I wonder whether the following can help:
db.test.insert({name:'farid', addresses:[
{"city":"Baku", "country":"Azerbaijan"},
{"city":"Susha", "country":"Azerbaijan"},
{"city" : "Istanbul","country" : "Turkey"}
]});
db.test.insert({name:'elena', addresses:[
{"city" : "Ankara","country" : "Turkey"},
{"city":"Baku", "country":"Azerbaijan"}
]});
Then the following will show all countries:
db.test.aggregate(
{$unwind: "$addresses"},
{$group: {_id:"$country", countries:{$addToSet:"$addresses.country"}}}
);
result will be
{ "result" : [
{ "_id" : null,
"countries" : [ "Turkey", "Azerbaijan"]
}
],
"ok" : 1
}
Maybe there are other ways, but that's one I know.
With 'cities' you might want to take more care (because I know cities with the same name in different countries...).
Based on your question, there may be two underlying issues here:
First, it looks like you are trying to query a Collection called "test". Often times, "test" is the name of an actual database you are using. My concern, then, is that you are trying to query the database "test" to find any collections that have the key "city" or "country" on any of the internal documents. If this is the case, what you actually need to do is identify all of the collections in your database, and search them individually to see if any of these collections contain documents that include the keys you are looking for.
(For more information on how the db.collection.find() method works, check the MongoDB documentation here: http://docs.mongodb.org/manual/reference/method/db.collection.find/#db.collection.find)
Second, if this is actually what you are trying to do, all you need to for each collection is define a query that only returns the key of the document you are looking for. If you get more than 0 results from the query, you know documents have the "city" key. If they don't return results, you can ignore these collections. One caveat here is if data about "city" is in embedded documents within a collection. If this is the case, you may actually need to have some idea of which embedded documents may contain the key you are looking for.

matching fields internally in mongodb

I am having following document in mongodb
{
"_id" : ObjectId("517b88decd483543a8bdd95b"),
"studentId" : 23,
"students" : [
{
"id" : 23,
"class" : "a"
},
{
"id" : 55,
"class" : "b"
}
]
}
{
"_id" : ObjectId("517b9d05254e385a07fc4e71"),
"studentId" : 55,
"students" : [
{
"id" : 33,
"class" : "c"
}
]
}
Note: Not an actual data but schema is exactly same.
Requirement: Finding the document which matches the studentId and students.id(id inside the students array using single query.
I have tried the code like below
db.data.aggregate({$match:{"students.id":"$studentId"}},{$group:{_id:"$student"}});
Result: Empty Array, If i replace {"students.id":"$studentId"} to {"students.id":33} it is returning the second document in the above shown json.
Is it possible to get the documents for this scenario using single query?
If possible, I'd suggest that you set the condition while storing the data so that you can do a quick truth check (isInStudentsList). It would be super fast to do that type of query.
Otherwise, there is a relatively complex way of using the Aggregation framework pipeline to do what you want in a single query:
db.students.aggregate(
{$project:
{studentId: 1, studentIdComp: "$students.id"}},
{$unwind: "$studentIdComp"},
{$project : { studentId : 1,
isStudentEqual: { $eq : [ "$studentId", "$studentIdComp" ] }}},
{$match: {isStudentEqual: true}})
Given your input example the output would be:
{
"result" : [
{
"_id" : ObjectId("517b88decd483543a8bdd95b"),
"studentId" : 23,
"isStudentEqual" : true
}
],
"ok" : 1
}
A brief explanation of the steps:
Build a projection of the document with just studentId and a new field with an array containing just the id (so the first document it would contain [23, 55].
Using that structure, $unwind. That creates a new temporary document for each array element in the studentIdComp array.
Now, take those documents, and create a new document projection, which continues to have the studentId and adds a new field called isStudentEqual that compares the equality of two fields, the studentId and studentIdComp. Remember that at this point there is a single temporary document that contains those two fields.
Finally, check that the comparison value isStudentEqual is true and return those documents (which will contain the original document _id and the studentId.
If the student was in the list multiple times, you might need to group the results on studentId or _id to prevent duplicates (but I don't know that you'd need that).
Unfortunately it's impossible ;(
to solve this problem it is necessary to use a $where statement
(example: Finding embeded document in mongodb?),
but $where is restricted from being used with aggregation framework
db.data.find({students: {$elemMatch: {id: 23}} , studentId: 23});

Using Spring Data Mongodb, is it possible to get the max value of a field without pulling and iterating over an entire collection?

Using mongoTemplate.find(), I specify a Query with which I can call .limit() or .sort():
.limit() returns a Query object
.sort() returns a Sort object
Given this, I can say Query().limit(int).sort(), but this does not perform the desired operation, it merely sorts a limited result set.
I cannot call Query().sort().limit(int) either since .sort() returns a Sort()
So using Spring Data, how do I perform the following as shown in the mongoDB shell? Maybe there's a way to pass a raw query that I haven't found yet?
I would be ok with extending the Paging interface if need be...just doesn't seem to help any. Thanks!
> j = { order: 1 }
{ "order" : 1 }
> k = { order: 2 }
{ "order" : 2 }
> l = { order: 3 }
{ "order" : 3 }
> db.test.save(j)
> db.test.save(k)
> db.test.save(l)
> db.test.find()
{ "_id" : ObjectId("4f74d35b6f54e1f1c5850f19"), "order" : 1 }
{ "_id" : ObjectId("4f74d3606f54e1f1c5850f1a"), "order" : 2 }
{ "_id" : ObjectId("4f74d3666f54e1f1c5850f1b"), "order" : 3 }
> db.test.find().sort({ order : -1 }).limit(1)
{ "_id" : ObjectId("4f74d3666f54e1f1c5850f1b"), "order" : 3 }
You can do this in sping-data-mongodb. Mongo will optimize sort/limit combinations IF the sort field is indexed (or the #Id field). This produces very fast O(logN) or better results. Otherwise it is still O(N) as opposed to O(N*logN) because it will use a top-k algorithm and avoid the global sort (mongodb sort doc). This is from Mkyong's example but I do the sort first and set the limit to one second.
Query query = new Query();
query.with(new Sort(Sort.Direction.DESC, "idField"));
query.limit(1);
MyObject maxObject = mongoTemplate.findOne(query, MyObject.class);
Normally, things that are done with aggregate SQL queries, can be approached in (at least) three ways in NoSQL stores:
with Map/Reduce. This is effectively going through all the records, but more optimized (works with multiple threads, and in clusters). Here's the map/reduce tutorial for MongoDB.
pre-calculate the max value on each insert, and store it separately. So, whenever you insert a record, you compare it to the previous max value, and if it's greater - update the max value in the db.
fetch everything in memory and do the calculation in the code. That's the most trivial solution. It would probably work well for small data sets.
Choosing one over the other depends on your usage of this max value. If it is performed rarely, for example for some corner reporting, you can go with the map/reduce. If it is used often, then store the current max.
As far as I am aware Mongo totally supports sort then limit: see http://www.mongodb.org/display/DOCS/Sorting+and+Natural+Order
Get the max/min via map reduce is going to be very slow and should be avoided at all costs.
I don't know anything about Spring Data, but I can recommend Morphia to help with queries. Otherwise a basic way with the Java driver would be:
DBCollection coll = db.getCollection("...");
DBCursor curr = coll.find(new BasicDBObject()).sort(new BasicDBObject("order", -1))
.limit(1);
if (cur.hasNext())
System.out.println(cur.next());
Use aggregation $max .
As $max is an accumulator operator available only in the $group stage, you need to do a trick.
In the group operator use any constant as _id .
Lets take the example given in Mongodb site only --
Consider a sales collection with the following documents:
{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-01-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-02-03T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 5, "date" : ISODate("2014-02-03T09:05:00Z") }
{ "_id" : 4, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-02-15T08:00:00Z") }
{ "_id" : 5, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-02-15T09:05:00Z") }
If you want to find out the max price among all the items.
db.sales.aggregate(
[
{
$group:
{
_id: "1", //** This is the trick
maxPrice: { $max: "$price" }
}
}
]
)
Please note that the value of "_id" - it is "1". You can put any constant...
Since the first answer is correct but the code is obsolete, I'm replying with a similar solution that worked for me:
Query query = new Query();
query.with(Sort.by(Sort.Direction.DESC, "field"));
query.limit(1);
Entity maxEntity = mongoTemplate.findOne(query, Entity.class);