MongoDB / Morphia - Projection not working on recursive objects? - mongodb

I have a test object which works as nodes on a tree, containing 0 or more children instances of the same type. I'm persisting it on MongoDB and querying it with Morphia.
I perform the following query:
db.TestObject.find( {}, { _id: 1, childrenTestObjects: 1 } ).limit(6).sort( {_id: 1 } ).pretty();
Which results in:
{ "_id" : NumberLong(1) }
{ "_id" : NumberLong(2) }
{ "_id" : NumberLong(3) }
{ "_id" : NumberLong(4) }
{
"_id" : NumberLong(5),
"childrenTestObjects" : [
{
"stringValue" : "6eb887126d24e8f1cd8ad5033482c781",
"creationDate" : ISODate("1997-05-24T00:00:00Z")
"childrenTestObjects" : [
{
"stringValue" : "2ab8f86410b4f3bdcc747699295eb5a4",
"creationDate" : ISODate("2024-10-10T00:00:00Z"),
"_id" : NumberLong(7)
}
],
"_id" : NumberLong(6)
}
]
}
That's awesome, but also a little surprising. I'm having two issues with the results:
1) When I do a projection, it only applies to the top elements. The children elements still return other properties not in the projection (stringValue and creationDate). I'd like the field selection to apply to all documents and sub documents of the same type. This tree has an undermined number of sub items, so I can't specify that in the query explicitly. How to accomplish that?
2) To my surprise, limit applied to sub documents! You see that there was one embedded document with id 6. I was expecting to see 6 top level documents with N sub documents, but instead got just 5. How to tell MongoDB to return 6 top level elements, regardless of what is embedded in them? Without that having a consistent pagination system is impossible.
All your help has made learning MongoDB way faster and I really appreciate it! Thanks!

As for 1), projections retain fields in the results. In this case that field is childrenTestObjects which happens to be a document. So mongo returns that entire field which is, of course, the entire subdocument. Projections are not recursive so you'd have to specify each field explicitly.
As for 2), that doesn't sound right. it would help to see the query results without the projections added (full documents in each return document) and we can take it from there.

Related

query to retrieve multiple objects in an array in mongodb

Suppose I have a array of objects as below.
"array" : [
{
"id" : 1
},
{
"id" : 2
},
{
"id" : 2
},
{
"id" : 4
}
]
If I want to retrieve multiple objects ({id : 2}) from this array, the aggregation query goes like this.
db.coll.aggregate([{ $match : {"_id" : ObjectId("5492690f72ae469b0e37b61c")}}, { $unwind : "$array"}, { $match : { "array.id" : 2}}, { $group : { _id : "$_id", array : { $push : { id : "$array.id"}}}} ])
The output of above aggregation is
{
"_id" : ObjectId("5492690f72ae469b0e37b61c"),
"array" : [
{
"id" : 2
},
{
"id" : 2
}
]
}
Now the question is:
1) Is retrieving of multiple objects from an array possible using find() in MongoDB?
2) With respect to performance, is aggregation is the correct way to do? (Because we need to use four pipeline operators) ?
3) Can we use Java manipulation (looping the array and only keep {id : 2} objects) to do this after
find({"_id" : ObjectId("5492690f72ae469b0e37b61c")}) query? Because find will once retrieve the document and keeps it in RAM. But if we use aggregation four operations need to be performed in RAM to get the output.
Why I asked the 3) question is: Suppose if thousands of clients accessing at the same time, then RAM memory will be overloaded. If it is done using Java, less task on RAM.
4) For how long the workingSet will be in RAM??
Is my understanding correct???
Please correct me if I am wrong.
Please suggest me to have right insight on this..
No. You project the first matching one with $, you project all of them, or you project none of them.
No-ish. If you have to work with this array, aggregation is what will allow you to extract multiple matching elements, but the correct solution, conceptually and for performance, is to design your document structure so this problem does not arise, or arises only for rare queries whose performance is not particularly important.
Yes.
We have no information that would allow us to give a reasonable answer to this question. This is also out of scope relative to the rest of the question and should be a separate question.

Understanding overlapping mongodb projections

I'm struggling to understand how overlapping projections work in mongodb.
Here's a quick example to illustrating my conundrum (results from the in-browser mongodb console).
First, I created and inserted a simple document:
var doc = {
id:"universe",
systems: {
1: {
name:"milky_way",
coords:{x:1, y:1}
}
}
}
db.test.insert(doc);
// doc succesfully inserted
Next, I try a somewhat odd projection:
db.test.find({}, {"systems.1.coords":1, "systems.1":1});
//result
"_id" : ObjectId("537fd3541cdcaf1ba735becb"),
"systems" : {
"1" : {
"coords" : {
"y" : 1,
"x" : 1
}
}
}
}
I expected to see the entirety of "system 1," including the name field. But it appears the deeper path to "systems.1.coords" overwrote the shallower path to just "system.1"?
I decide to test this "deeper path overrides shallower path" theory:
db.test.find({}, {"systems.1.coords.x":1, "systems.1.coords":1});
//result
"_id" : ObjectId("537fd3541cdcaf1ba735becb"),
"systems" : {
"1" : {
"coords" : {
"y" : 1, // how'd this get here, but for the shallower projection?
"x" : 1
}
}
}
}
Here, my deeper projection didn't override the shallower one.
What gives? How is mongodb dealing with overlapping projections? I can't find the logic to it.
EDIT:
My confusion was stemming from what counted as a "top level" path.
This worked like I expected: .findOne({}, {"systems.1":1, "systems":1}) (i.e., a full set of systems is returned, notwithstanding that I started with what appeared to be a "narrower" projection).
However, this did not work like I expected: .findOne({}, {"systems.1.name":1, "systems.1":1}) (i.e., only the name field of system.1 is returned).
In short, going more than "one dot" deep leads to the overwriting discussed in the accepted answer.
You cannot do this sort of projection using .find() as the general projections allowed are basic field selection. What you are talking about is document re-shaping and for that you can use the $project operator with the .aggregate() method.
So by your initial example:
db.test.aggregate([
{ "$project": {
"coords": "$systems.1.coords",
"systems": 1
}}
])
That will give you output like this:
{
"_id" : ObjectId("537fe2127cb762d14e2a1007"),
"systems" : {
"1" : {
"name" : "milky_way",
"coords" : {
"x" : 1,
"y" : 1
}
}
},
"coords" : {
"x" : 1,
"y" : 1
}
}
Note the different field naming there as well, as if for no other reason, the version coming from .find() would result in overlapping paths ("systems" is the same) for the levels of fields you were trying to select and therefore cannot be projected as two fields the way you can do right here.
In much the same way, consider a statement like the following:
db.test.aggregate([
{ "$project": {
"systems": {
"1": {
"coords": "$systems.1.coords"
}
},
"systems": 1
}}
])
So that is not telling you it's invalid, it's just that one of the results in the projection is overwriting the other as essentially at a top level they are both callled "systems".
This is basically what you end up with when trying to do something like this with the projection method available to .find(). So the essential part is you need a different field name, and this is what the aggregation framework ( though not aggregating here ) allows you to do.

matching fields internally in mongodb

I am having following document in mongodb
{
"_id" : ObjectId("517b88decd483543a8bdd95b"),
"studentId" : 23,
"students" : [
{
"id" : 23,
"class" : "a"
},
{
"id" : 55,
"class" : "b"
}
]
}
{
"_id" : ObjectId("517b9d05254e385a07fc4e71"),
"studentId" : 55,
"students" : [
{
"id" : 33,
"class" : "c"
}
]
}
Note: Not an actual data but schema is exactly same.
Requirement: Finding the document which matches the studentId and students.id(id inside the students array using single query.
I have tried the code like below
db.data.aggregate({$match:{"students.id":"$studentId"}},{$group:{_id:"$student"}});
Result: Empty Array, If i replace {"students.id":"$studentId"} to {"students.id":33} it is returning the second document in the above shown json.
Is it possible to get the documents for this scenario using single query?
If possible, I'd suggest that you set the condition while storing the data so that you can do a quick truth check (isInStudentsList). It would be super fast to do that type of query.
Otherwise, there is a relatively complex way of using the Aggregation framework pipeline to do what you want in a single query:
db.students.aggregate(
{$project:
{studentId: 1, studentIdComp: "$students.id"}},
{$unwind: "$studentIdComp"},
{$project : { studentId : 1,
isStudentEqual: { $eq : [ "$studentId", "$studentIdComp" ] }}},
{$match: {isStudentEqual: true}})
Given your input example the output would be:
{
"result" : [
{
"_id" : ObjectId("517b88decd483543a8bdd95b"),
"studentId" : 23,
"isStudentEqual" : true
}
],
"ok" : 1
}
A brief explanation of the steps:
Build a projection of the document with just studentId and a new field with an array containing just the id (so the first document it would contain [23, 55].
Using that structure, $unwind. That creates a new temporary document for each array element in the studentIdComp array.
Now, take those documents, and create a new document projection, which continues to have the studentId and adds a new field called isStudentEqual that compares the equality of two fields, the studentId and studentIdComp. Remember that at this point there is a single temporary document that contains those two fields.
Finally, check that the comparison value isStudentEqual is true and return those documents (which will contain the original document _id and the studentId.
If the student was in the list multiple times, you might need to group the results on studentId or _id to prevent duplicates (but I don't know that you'd need that).
Unfortunately it's impossible ;(
to solve this problem it is necessary to use a $where statement
(example: Finding embeded document in mongodb?),
but $where is restricted from being used with aggregation framework
db.data.find({students: {$elemMatch: {id: 23}} , studentId: 23});

what is the real purpose of $ref (DBRef) in MongoDb

I want to use mongo for my app, and while I was thinking about designing issues, I came up with question, so what are the advantages/purposes of DBRef?
for example:
> names = ['apple', 'banana', 'orange', 'peach', 'pineapple']
[ "apple", "banana", "orange", "peach", "pineapple" ]
> for (i=0; i<5; i++) {
... db.fruits.insert({_id:i, name:names[i]})
... }
> db.fruits.find()
{ "_id" : 0, "name" : "apple" }
{ "_id" : 1, "name" : "banana" }
{ "_id" : 2, "name" : "orange" }
{ "_id" : 3, "name" : "peach" }
{ "_id" : 4, "name" : "pineapple" }
and I want to store those fruits in a basket collection:
> db.basket.insert({_id:1, items:[ {$ref:'fruits', $id:1}, {$ref:'fruits', $id:3} ] })
> db.basket.insert({_id:2, items:[{fruit_id: 1}, {fruit_id: 3}]})
> db.basket.find()
{ "_id" : 1, "items" : [ DBRef("fruits", 1), DBRef("fruits", 3) ] }
{ "_id" : 2, "items" : [ { "fruit_id" : 1 }, { "fruit_id" : 3 } ] }
What are the real difference between those two techniques? For me it looks like using DBRef you just have to insert more data without any advantages.... Please correct me if I'm wrong.
Basically a DBRef is a self describing ObjectID which a client side helper, which exists in all drivers (I think all), provides the ability within your application to get related rows easily.
They are not:
JOINs
Cascadeable relations
Server-side relations
Resolved Server-side
They also are not used within Map Reduce, the functionality was taken out due to complications with sharding.
It is not always great to use these though, for one they take quite a bit of space if you know the collection that is related to that row in comparison to just storing the ObjectID. Not only that but due to how they are resolved each related record needs to be lazy loaded one by one instead if being able to form a range (easily) to query for related rows all in one go, so they can increase the amount of queries you make to the database as well, in turn increasing cursors.
From "MongoDB: The Definitive Guide" DBRefs aren't necessary and storing a MongoID is more lightweight, but DBRefs offer some interesting functionality like the following:
Loading each DBRef in a document:
var note = db.notes.findOne({"_id":20});
note.references.forEach(function(ref) {
printjson(db[ref.$ref].findOne({"_id": ref.$id}));
});
They're also helpful if the references are stored across different collections and databases as the DBRef contains that info. If you use a MongoID you'd have to remember which DB and collection the MongoID is in reference to.
In your example a basket document's items array might contain references in the fruits collection, but also the vegetables collect. A DBRef would actually be handy in this case.

Using Spring Data Mongodb, is it possible to get the max value of a field without pulling and iterating over an entire collection?

Using mongoTemplate.find(), I specify a Query with which I can call .limit() or .sort():
.limit() returns a Query object
.sort() returns a Sort object
Given this, I can say Query().limit(int).sort(), but this does not perform the desired operation, it merely sorts a limited result set.
I cannot call Query().sort().limit(int) either since .sort() returns a Sort()
So using Spring Data, how do I perform the following as shown in the mongoDB shell? Maybe there's a way to pass a raw query that I haven't found yet?
I would be ok with extending the Paging interface if need be...just doesn't seem to help any. Thanks!
> j = { order: 1 }
{ "order" : 1 }
> k = { order: 2 }
{ "order" : 2 }
> l = { order: 3 }
{ "order" : 3 }
> db.test.save(j)
> db.test.save(k)
> db.test.save(l)
> db.test.find()
{ "_id" : ObjectId("4f74d35b6f54e1f1c5850f19"), "order" : 1 }
{ "_id" : ObjectId("4f74d3606f54e1f1c5850f1a"), "order" : 2 }
{ "_id" : ObjectId("4f74d3666f54e1f1c5850f1b"), "order" : 3 }
> db.test.find().sort({ order : -1 }).limit(1)
{ "_id" : ObjectId("4f74d3666f54e1f1c5850f1b"), "order" : 3 }
You can do this in sping-data-mongodb. Mongo will optimize sort/limit combinations IF the sort field is indexed (or the #Id field). This produces very fast O(logN) or better results. Otherwise it is still O(N) as opposed to O(N*logN) because it will use a top-k algorithm and avoid the global sort (mongodb sort doc). This is from Mkyong's example but I do the sort first and set the limit to one second.
Query query = new Query();
query.with(new Sort(Sort.Direction.DESC, "idField"));
query.limit(1);
MyObject maxObject = mongoTemplate.findOne(query, MyObject.class);
Normally, things that are done with aggregate SQL queries, can be approached in (at least) three ways in NoSQL stores:
with Map/Reduce. This is effectively going through all the records, but more optimized (works with multiple threads, and in clusters). Here's the map/reduce tutorial for MongoDB.
pre-calculate the max value on each insert, and store it separately. So, whenever you insert a record, you compare it to the previous max value, and if it's greater - update the max value in the db.
fetch everything in memory and do the calculation in the code. That's the most trivial solution. It would probably work well for small data sets.
Choosing one over the other depends on your usage of this max value. If it is performed rarely, for example for some corner reporting, you can go with the map/reduce. If it is used often, then store the current max.
As far as I am aware Mongo totally supports sort then limit: see http://www.mongodb.org/display/DOCS/Sorting+and+Natural+Order
Get the max/min via map reduce is going to be very slow and should be avoided at all costs.
I don't know anything about Spring Data, but I can recommend Morphia to help with queries. Otherwise a basic way with the Java driver would be:
DBCollection coll = db.getCollection("...");
DBCursor curr = coll.find(new BasicDBObject()).sort(new BasicDBObject("order", -1))
.limit(1);
if (cur.hasNext())
System.out.println(cur.next());
Use aggregation $max .
As $max is an accumulator operator available only in the $group stage, you need to do a trick.
In the group operator use any constant as _id .
Lets take the example given in Mongodb site only --
Consider a sales collection with the following documents:
{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-01-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-02-03T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 5, "date" : ISODate("2014-02-03T09:05:00Z") }
{ "_id" : 4, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-02-15T08:00:00Z") }
{ "_id" : 5, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-02-15T09:05:00Z") }
If you want to find out the max price among all the items.
db.sales.aggregate(
[
{
$group:
{
_id: "1", //** This is the trick
maxPrice: { $max: "$price" }
}
}
]
)
Please note that the value of "_id" - it is "1". You can put any constant...
Since the first answer is correct but the code is obsolete, I'm replying with a similar solution that worked for me:
Query query = new Query();
query.with(Sort.by(Sort.Direction.DESC, "field"));
query.limit(1);
Entity maxEntity = mongoTemplate.findOne(query, Entity.class);