I have a single collection named assets that contains documents in 2+ formats, ParentObject and ChildObject. I am currently associating ParentObject to ChildObject with two queries. Can this be done with an aggregate query?
ParentObject
{
"_id" : {
"oid" : "ParentFooABC",
"brand" : "acme"
},
"type": "com.ParentClass",
"title": "Title1234",
"availableDate": Date,
"expirationDate": Date
}
ChildObject
{
"_id" : {
"oid" : "ChildFoo",
"brand" : "acme"
},
"type": "com.ChildClass",
"parentObject": "ParentFooABC",
"title": "Title1234",
"modelNumber": "8HE56",
"modelLine": "Metro",
"availableDate": Date,
"expirationDate": Date,
"IDRequired": true
}
Currently I filter data like this
val parent = db.asset.findOne(MongoDBObject("_id.brand" -> "acme", "type" -> "com.ParentClass"))
val children = db.asset.find(MongoDBObject("_id.brand" -> "acme", "type" -> "com.ChildClass", "parentObject" -> parent._id.oid))
if (children.nonEmpty) {
// I have confirmed this parent has a child associated and should be returned
val childModelNumbers = children.map(child => child.modelNumber)
val response = ResponseObject(parent, childModelNumbers)
}
Can I do this in an aggregate query?
Updated:
Mongo Version: db version v2.6.11
Language: Scala
Driver: Casbah 2.8.1
Technically yes, but what you're doing now is standard practice with MongoDB. If you need to join data frequently, you should probably use an RDBMS. If you only occasionally need to combine data from two separate collections, there is $lookup (available from MongoDB 3.2, so not in your 2.6.11). In general you'll find yourself populating data into one document from another fairly often; that's typically how a document database works. I wouldn't recommend $lookup as a general substitute for a SQL JOIN. That isn't really its intended purpose, and you'd be better off doing exactly what you're doing now or moving to an RDBMS.
Addition:
Since the data is in the same collection you can use an aggregation to $group that data together. You just have to group them based on _id.brand.
A quick example (in the mongo shell):
db.asset.aggregate([
{ //start with a match to narrow down
$match: {"_id.brand": "acme"}
},
{ //group by `_id.brand`
$group: {
_id: "$_id.brand",
"parentObject": {$first: "$parentObject"},
title: {$first: "$title"},
modelNumber: {$addToSet: "$modelNumber"}, //accumulate child model # as array
modelLine: {$addToSet: "$modelLine"}, //accumulate child model line as array
availableDate: {$first: "$availableDate"},
expirationDate: {$first: "$expirationDate"}
}
}
]);
This should return a document in which the children's properties (modelNumber, modelLine) have been collected into arrays alongside the parent's fields. It's probably still better to keep doing what you're doing above. It might be better yet to put the children into their own collection instead of tracking their type with a type field in each document; that way you know that the documents in each collection share one data structure. However, if you did that, you could no longer perform the aggregation in the example.
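Note that grouping on _id.brand alone folds every parent and child of a brand into a single result. As a rough illustration of pairing each parent with only its own children, here is a plain JavaScript sketch that works on an in-memory array rather than on MongoDB. The helper name attachChildren, the extra sample documents (ChildBar, ParentLonely) and the output shape are assumptions made for this example; they do not come from the question.

```javascript
// Hypothetical in-memory stand-in for the `asset` collection.
const assets = [
  { _id: { oid: "ParentFooABC", brand: "acme" }, type: "com.ParentClass", title: "Title1234" },
  { _id: { oid: "ChildFoo", brand: "acme" }, type: "com.ChildClass", parentObject: "ParentFooABC", modelNumber: "8HE56" },
  { _id: { oid: "ChildBar", brand: "acme" }, type: "com.ChildClass", parentObject: "ParentFooABC", modelNumber: "9XK12" },
  { _id: { oid: "ParentLonely", brand: "acme" }, type: "com.ParentClass", title: "NoChildren" }
];

// attachChildren is an illustrative helper, not a MongoDB or Casbah API.
function attachChildren(docs, brand) {
  const sameBrand = docs.filter(d => d._id.brand === brand);

  // Index child model numbers by the parent oid they point at.
  const modelsByParent = new Map();
  for (const d of sameBrand) {
    if (d.type !== "com.ChildClass") continue;
    const list = modelsByParent.get(d.parentObject) || [];
    list.push(d.modelNumber);
    modelsByParent.set(d.parentObject, list);
  }

  // Keep only parents that have at least one child.
  return sameBrand
    .filter(d => d.type === "com.ParentClass" && modelsByParent.has(d._id.oid))
    .map(p => ({ parent: p, childModelNumbers: modelsByParent.get(p._id.oid) }));
}

const paired = attachChildren(assets, "acme");
console.log(JSON.stringify(paired, null, 2));
```

This is the same association the two find calls make, done in one pass over the brand's documents; parents without children are dropped, as in the original nonEmpty check.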
I have multiple documents(3 documents in this example) in one collection that looks like this:
{
_id:123,
bizs:[{_id:'',name:'a'},{_id:'',name:'b'}]
},
{
_id:456,
bizs:[{_id:'',name:'e'},{_id:'',name:'f'}]
}
{
_id:789,
bizs:[{_id:'',name:'x'},{_id:'',name:'y'}]
}
Now, I want to update the bizs subdocument by matching with my array of ids.
That is to say, my array filter for update query is [123,789], which will match against the _id fields of each document.
I have tried using findByIdAndUpdate(), but that doesn't accept an array of ids in the update query.
How can I update the 2 matching documents (as in my example above) without putting findByIdAndUpdate inside a for loop that matches each array element against the _id?
You cannot use findByIdAndUpdate to update multiple documents. findByIdAndUpdate comes from Mongoose and is a wrapper around native MongoDB's findOneAndUpdate. When you pass a single string as the filter, like Collection.findByIdAndUpdate('5e179dac627ef7823643cd97', {}), Mongoose internally converts the string to an ObjectId and builds the filter { _id: ObjectId('5e179dac627ef7823643cd97') } to execute findOneAndUpdate. That means you can only update one document at a time. If you have multiple documents to update, use update with the option {multi: true}, or updateMany.
Assuming you want to push a new object to bizs, this is what the query looks like:
collection.updateMany({ _id: { $in: [123, 789] } }, {
$push: {
bizs: {
"_id": "",
"name": "new"
}
}
})
Note: Update operations don't return the documents in the response; they return a write result that reports how many documents were matched and how many were modified.
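To make the semantics concrete, here is a small in-memory sketch in plain JavaScript of what an $in filter combined with $push does. It never talks to MongoDB; pushToMatching is a hypothetical helper, and the returned object only imitates the idea of a write result.

```javascript
// In-memory stand-in for the collection from the question.
const docs = [
  { _id: 123, bizs: [{ _id: "", name: "a" }, { _id: "", name: "b" }] },
  { _id: 456, bizs: [{ _id: "", name: "e" }, { _id: "", name: "f" }] },
  { _id: 789, bizs: [{ _id: "", name: "x" }, { _id: "", name: "y" }] }
];

// pushToMatching is a hypothetical helper that imitates `$in` + `$push`.
function pushToMatching(collection, ids, newBiz) {
  const wanted = new Set(ids);
  let matched = 0;
  for (const doc of collection) {
    if (!wanted.has(doc._id)) continue; // `$in`: skip ids not in the list
    doc.bizs.push({ ...newBiz });       // `$push`: append a copy to the array
    matched += 1;
  }
  // Like a write result, this reports counts rather than the documents.
  return { matchedCount: matched, modifiedCount: matched };
}

const result = pushToMatching(docs, [123, 789], { _id: "", name: "new" });
console.log(result);
```

Only the documents with _id 123 and 789 gain a third bizs entry; 456 is left untouched.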
I have no experience with NoSQL. So, I think, if I just try to ask about the code, my question can be incorrect. Instead, let me explain my problem.
Suppose I have an e-store. I have catalogs
Catalogs = new Mongo.Collection('catalogs');
and products in those catalogs
Products = new Mongo.Collection('products');
Then, people add their orders to a temporary collection
Order = new Mongo.Collection();
Then, people submit their comments, phone, etc. along with the order. I save it to the Operations collection:
Operations.insert({
phone: "phone",
comment: "comment",
etc: "etc",
savedOrder: Order //<- Array, right? Or Object will be better?
});
Nice, but now I want to get stats for every product: in which Operations was a product used? How can I search through my Operations and find every operation containing that product?
Or is this approach bad? How do real pros do this in the real world?
If I understand correctly, here are sample documents as stored in your Operations collection:
{
clientRef: "john-001",
phone: "12345678",
other: "etc.",
savedOrder: {
"someMetadataAboutOrder": "...",
"lines" : [
{ qty: 1, itemRef: "XYZ001", unitPriceInCts: 1050, desc: "USB Pen Drive 8G" },
{ qty: 1, itemRef: "ABC002", unitPriceInCts: 19995, desc: "Entry level motherboard" },
]
}
},
{
clientRef: "paul-002",
phone: null,
other: "etc.",
savedOrder: {
"someMetadataAboutOrder": "...",
"lines" : [
{ qty: 3, itemRef: "XYZ001", unitPriceInCts: 950, desc: "USB Pen Drive 8G" },
]
}
},
Given that, to find all operations having item reference XYZ001 you simply have to query:
> db.operations.find({"savedOrder.lines.itemRef":"XYZ001"})
This will return the whole document. If instead you are only interested in the client reference (and operation _id), you will use a projection as an extra argument to find:
> db.operations.find({"savedOrder.lines.itemRef":"XYZ001"}, {"clientRef": 1})
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c179"), "clientRef" : "john-001" }
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c17a"), "clientRef" : "paul-002" }
If you need to perform multi-documents (incl. multi-embedded documents) operations, you should take a look at the aggregation framework:
For example, to calculate the total of an order:
> db.operations.aggregate([
{$match: { "_id" : ObjectId("556f07b5d5f2fb3f94b8c179") }},
{$unwind: "$savedOrder.lines" },
{$group: { _id: "$_id",
total: {$sum: {$multiply: ["$savedOrder.lines.qty",
"$savedOrder.lines.unitPriceInCts"]}}
}}
])
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c179"), "total" : 21045 }
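For comparison, the same arithmetic can be sketched in plain JavaScript on one in-memory document. orderTotal is a hypothetical helper; the reduce plays the role of $unwind followed by $sum over $multiply.

```javascript
// The "john-001" sample operation, trimmed to the fields the total needs.
const operation = {
  clientRef: "john-001",
  savedOrder: {
    lines: [
      { qty: 1, itemRef: "XYZ001", unitPriceInCts: 1050 },
      { qty: 1, itemRef: "ABC002", unitPriceInCts: 19995 }
    ]
  }
};

// orderTotal is an illustrative helper, not part of any MongoDB API.
function orderTotal(op) {
  // Sum qty * unit price over every order line.
  return op.savedOrder.lines.reduce(
    (sum, line) => sum + line.qty * line.unitPriceInCts,
    0
  );
}

const total = orderTotal(operation);
console.log(total); // 21045, matching the aggregation result above
```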
I'm an eternal newbie, but since no answer is posted, I'll give it a try.
First, start by installing Robomongo or similar software; it will let you look at your collections directly in MongoDB (by the way, with Meteor's development server the default MongoDB port is 3001).
The way I deal with your kind of problem is by using the _id field. It is a field automatically generated by mongoDB, and you can safely use it as an ID for any item in your collections.
Your catalog collection should have a string array field called product that holds the _id of each item from your products collection. The same goes for operations: if an order is an array of product _id values, you can store that array in your savedOrder field. Feel free to add more fields to savedOrder if necessary, e.g. make it an array of product objects with additional fields such as discount.
Concerning your queries code, I assume you will find all you need on the web as soon as you figure out what your structure is.
For example, if you have a products array in your savedOrder, you can pull it out like this:
Operations.find({_id: "your operation ID"},{"savedOrder.products":1})
Basically, you ask for all the product _id values in a specific operation. If you have several savedOrders in a single operation, you can also specify the savedOrder _id, if you kept the one you had in your local collection.
Operations.find({_id: "your_operation_ID", "savedOrder._id": "your_savedOrder_ID"},{"savedOrder.products":1})
ps: to bad-ass coders here, if I'm doing it wrong, please tell me.
I found an answer :) Of course, this is no revelation for real professionals, but it is a big step for me. Maybe someone will find my experience useful. All the magic is in using the correct Mongo operators. Let's solve this problem in pseudocode.
We have a structure like this:
Operations:
1. Operation: {
_id: <- Mongo creates this unique id for us
phone: "phone1",
comment: "comment1",
savedOrder: [
{
_id: <- and again
productId: <- we should save our product ID from 'products'
name: "Banana",
quantity: 100
},
{
_id:,
productId: <- another ID that we should save with the order
name: "apple",
quantity: 50
}
]
}
And if we want to know in which Operation the user took "banana", we should use the MongoDB operator $elemMatch (see the Mongo docs):
db.getCollection('operations').find({savedOrder: {$elemMatch: {productId: "f5mhs8c2pLnNNiC5v"}}});
Put simply, we get the documents whose saved order contains a product with the id we want to find. I don't know if it is the best way, but it works for me :) Thank you!
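As a rough in-memory illustration of what "match on an element of the array" means, here is a plain JavaScript sketch. operationsWithProduct is a hypothetical helper, and the second product id is made up for the example.

```javascript
// In-memory stand-in for the Operations collection.
const operations = [
  {
    _id: "op1",
    phone: "phone1",
    savedOrder: [
      { productId: "f5mhs8c2pLnNNiC5v", name: "Banana", quantity: 100 },
      { productId: "hypothetical-apple-id", name: "apple", quantity: 50 }
    ]
  },
  {
    _id: "op2",
    phone: "phone2",
    savedOrder: [
      { productId: "hypothetical-apple-id", name: "apple", quantity: 5 }
    ]
  }
];

// Keep an operation when at least one savedOrder entry has the product id.
function operationsWithProduct(ops, productId) {
  return ops.filter(op =>
    op.savedOrder.some(line => line.productId === productId)
  );
}

const withBanana = operationsWithProduct(operations, "f5mhs8c2pLnNNiC5v");
console.log(withBanana.map(op => op._id));
```

The some() call is the in-memory counterpart of testing array elements one by one: an operation qualifies as soon as one entry matches.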
I have a schema with subdocs.
// Schema
var company = {
_id: ObjectId,
publish: Boolean,
divisions: {
employees: [ObjectId]
}
};
I need to find all the subdocs (divisions) that match my query. It appears that I have to use 2 matches - one to filter out initial docs and a second one to filter out the matching subdocs from the resulting $unwind operation. Is there a more efficient way?
// Query
this.aggregate({
$match: {
'publish': 1,
'divisions.employees': new ObjectId(userid)
}
}, {
$unwind: '$divisions'
}, {
$match: {
'divisions.employees': new ObjectId(userid)
}
});
I found this ticket but I am unsure this does what I need.
Doing both matches is the right thing here. You could eliminate the first match stage and just unwind, but an initial $match narrows the pipeline down to exactly those documents that will produce at least one output document (i.e. those documents for which publish : true and some ObjectId in employees matches the given ObjectId). You will also be able to use indexes, such as an index on { "publish" : 1, "divisions.employees" : 1 }, to perform the first match stage quickly.
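The match, unwind, match sequence can be sketched in plain JavaScript on in-memory data to show why both filters are needed. This assumes divisions is an array of division documents (as the $unwind implies); the sample companies, the name field, and the helper divisionsForEmployee are made up for illustration.

```javascript
// In-memory stand-in for the companies collection.
const companies = [
  { _id: "c1", publish: true, divisions: [
    { name: "sales", employees: ["u1", "u2"] },
    { name: "ops", employees: ["u3"] }
  ] },
  { _id: "c2", publish: true, divisions: [
    { name: "hr", employees: ["u4"] }
  ] },
  { _id: "c3", publish: false, divisions: [
    { name: "lab", employees: ["u1"] }
  ] }
];

// divisionsForEmployee is an illustrative helper, not a Mongoose API.
function divisionsForEmployee(docs, userId) {
  return docs
    // First match: published companies that mention the employee at all.
    .filter(c => c.publish && c.divisions.some(d => d.employees.includes(userId)))
    // Unwind: one row per division, keeping the company fields.
    .flatMap(c => c.divisions.map(d => ({ _id: c._id, publish: c.publish, divisions: d })))
    // Second match: drop sibling divisions that do not contain the employee.
    .filter(row => row.divisions.employees.includes(userId));
}

const rows = divisionsForEmployee(companies, "u1");
console.log(rows);
```

Without the second filter, the "ops" division of c1 would slip through, because the first filter only tests whole companies.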
However, you should ask yourself why you are using the aggregation pipeline here and if it is appropriate. Will you commonly be querying for a given employee that's part of a company with publish : 1? Is this one of the main queries for your use case? If it's infrequent or not critical then the aggregation is a good way to do it. Otherwise, you should reconsider the schema. You could make this query easy with a schema like
{
"_id" : ObjectId,
"publish" : Boolean,
"company" : (unique identifier, possibly a String or ObjectId)
}
that models employees as documents and denormalizes company information into the employee document. Then your query is as easy as
db.employees.find({ "_id" : ObjectId(userid), "publish" : true })
which will be much quicker than the aggregation (not that the aggregation will be slow; this is just relatively quicker). I'm not telling you to do it this way - use your own knowledge of your use case to make the right call.
I have different types of data that would be difficult to model and scale with a relational database (e.g., a product type)
I'm interested in using Mongodb to solve this problem.
I am referencing the documentation at mongodb's website:
http://docs.mongodb.org/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
For the data type that I am storing, I need to also maintain a relational list of id's where this particular product is available (e.g., store location id's).
In their example regarding "one-to-many relationships with embedded documents", they have the following:
{
name: "O'Reilly Media",
founded: 1980,
location: "CA",
books: [12346789, 234567890, ...]
}
I am currently importing the data with a spreadsheet, and want to use a batchInsert.
To avoid duplicates, I assume that:
1) I need to do an ensure index on the ID, and ignore errors on the insert?
2) Do I then need to loop through all the ID's to insert a new related ID to the books?
Your question could possibly be defined a little better, but let's consider the case that you have rows in a spreadsheet or other source that are all de-normalized in some way. So in a JSON representation the rows would be something like this:
{
"publisher": "O'Reilly Media",
"founded": 1980,
"location": "CA",
"book": 12346789
},
{
"publisher": "O'Reilly Media",
"founded": 1980,
"location": "CA",
"book": 234567890
}
So in order to get those sorts of rows into the structure you want, one way would be to use the "upsert" functionality of the .update() method.
Assuming you have some way of looping over the input values and they are identified with some structure, an analog would be something like:
books.forEach(function(book) {
db.publishers.update(
{
"name": book.publisher
},
{
"$setOnInsert": {
"founded": book.founded,
"location": book.location,
},
"$addToSet": { "books": book.book }
},
{ "upsert": true }
);
})
This essentially simplifies the code so that MongoDB does all of the data collection work for you. Where the "name" of the publisher is considered unique, the statement first searches the collection for a document matching the given query condition, the "name".
In the case where that document is not found, then a new document is inserted. So either the database or driver will take care of creating the new _id value for this document and your "condition" is also automatically inserted to the new document since it was an implied value that should exist.
The usage of the $setOnInsert operator is to say that those fields will only be set when a new document is created. The final part uses $addToSet in order to "push" the book values that have not already been found into the "books" array (or set).
The reason for the separation is for when a document is actually found to exist with the specified "publisher" name. In this case, all of the fields under the $setOnInsert will be ignored as they should already be in the document. So only the $addToSet operation is processed and sent to the server in order to add the new entry to the "books" array (set) and where it does not already exist.
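The insert-versus-update split described above can be imitated in plain JavaScript with a Map standing in for the collection. This is only a sketch of the semantics; upsertBook and the third, duplicate input row are assumptions made for the example.

```javascript
// Map keyed by publisher name, standing in for the publishers collection.
const publishers = new Map();

// upsertBook is a hypothetical helper imitating upsert + $setOnInsert + $addToSet.
function upsertBook(store, row) {
  let doc = store.get(row.publisher);
  if (!doc) {
    // Insert path: the query field plus the $setOnInsert fields are written once.
    doc = { name: row.publisher, founded: row.founded, location: row.location, books: [] };
    store.set(row.publisher, doc);
  }
  // $addToSet: append the book only if it is not already in the array.
  if (!doc.books.includes(row.book)) {
    doc.books.push(row.book);
  }
  return doc;
}

const inputRows = [
  { publisher: "O'Reilly Media", founded: 1980, location: "CA", book: 12346789 },
  { publisher: "O'Reilly Media", founded: 1980, location: "CA", book: 234567890 },
  { publisher: "O'Reilly Media", founded: 1980, location: "CA", book: 12346789 } // duplicate row
];

inputRows.forEach(row => upsertBook(publishers, row));
console.log(publishers.get("O'Reilly Media"));
```

Running the rows twice would leave the result unchanged, which is the property that makes re-importing the spreadsheet safe.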
So that would be simplified logic compared to aggregating the new records in code before sending a new insert operation. However it is not very "batch" like as you are still performing some operation to the server for each row.
This is fixed in MongoDB version 2.6 and above as there is now the ability to do "batch" updates. So with a similar analog:
var batch = [];
books.forEach(function(book) {
batch.push({
"q": { "name": book.publisher },
"u": {
"$setOnInsert": {
"founded": book.founded,
"location": book.location,
},
"$addToSet": { "books": book.book }
},
"upsert": true
});
if ( ( batch.length % 500 ) == 0 ) {
db.runCommand({ "update": "publishers", "updates": batch });
batch = [];
}
});
if ( batch.length > 0 ) {
db.runCommand({ "update": "publishers", "updates": batch });
}
What this does is set up all of the constructed update statements into a single call to the server with a sensible number of operations per batch, in this case once every 500 items processed. The actual limit is the BSON document maximum of 16MB, so this can be adjusted as appropriate to your data.
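The batching step on its own can be sketched as a small chunking helper in plain JavaScript. chunk is a hypothetical name, and the 1200 placeholder statements exist only to show the batch sizes; nothing here is sent to a server.

```javascript
// Split a list of update statements into batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 1200 placeholder update statements in the "q"/"u"/"upsert" shape.
const statements = Array.from({ length: 1200 }, (_, i) => ({
  q: { name: "publisher-" + i },
  u: { $addToSet: { books: i } },
  upsert: true
}));

const batches = chunk(statements, 500);
console.log(batches.map(b => b.length)); // [ 500, 500, 200 ]
```

Each inner array would then be passed as the "updates" value of one update command, so the last, partial batch is never empty.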
If your MongoDB version is lower than 2.6 then you either use the first form or do something similar to the second form using the existing batch insert functionality. But if you choose to insert then you need to do all the pre-aggregation work within your code.
All of the methods are of course supported with the PHP driver, so it is just a matter of adapting this to your actual code and which course you want to take.
I have a mongo collection 'books'. Here's a typical book:
BOOK
name: 'Test Book'
author: 'Joe Bloggs'
print_runs: [
{publisher: 'OUP', year: 1981},
{publisher: 'Penguin', year: 1987},
{publisher: 'Harper-Collins', year: 1992}
]
I'd like to be able to filter books to return only books whose last print run was after a given date, and/or before a given date...and I've been struggling to find a feasible query. Any suggestions appreciated.
There are a few options, as getting access to the "last" element in the array and filtering only on that is difficult or impossible with the normal find options in MongoDB queries. (Unfortunately, you can't filter on $slice in find; it only works as a projection.)
1. Store the most recent publisher and year both in the print_runs array and in a denormalized copy directly on the book object, Book.last_published_by and Book.last_published_date for example. Queries would be simple and super fast.
2. MapReduce. This would be simple enough to emit the last element in the array and then "reduce" it to just that. You'd need to do incremental updates on the MapReduce to keep it accurate.
3. Write a relatively complex aggregation framework expression.
The aggregation might look like:
db.so.aggregate(
{ $project : { _id: 1, "print_run_year" : "$print_runs.year" } },
{ $unwind: "$print_run_year" },
{ $group : { _id : "$_id", "newest" : { $max : "$print_run_year" } } },
{ $match : { "newest" : { $gt : 1991, $lt: 2000 } } }
)
As it may require a bit of explanation:
It projects and unwinds the year of the print runs for each book.
Then, it groups on the _id (of the book) and creates a new computed field called newest, which contains the highest print run year (from the projection).
Then, it filters on newest using $gt and $lt.
I'd suggest option #1 above would be the best from an efficiency perspective, followed by the MapReduce, and then a distant third, option #3.
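For a feel of what the aggregation computes, here is an in-memory sketch in plain JavaScript. Like the pipeline, it treats the highest year as the last print run; lastPrintedBetween and the second sample book are assumptions made for the example.

```javascript
// In-memory stand-in for the books collection.
const books = [
  { name: "Test Book", author: "Joe Bloggs", print_runs: [
    { publisher: "OUP", year: 1981 },
    { publisher: "Penguin", year: 1987 },
    { publisher: "Harper-Collins", year: 1992 }
  ] },
  { name: "Older Book", author: "Jane Doe", print_runs: [
    { publisher: "OUP", year: 1975 },
    { publisher: "Penguin", year: 1980 }
  ] }
];

// Keep books whose newest print run year lies strictly between the bounds.
function lastPrintedBetween(list, after, before) {
  return list.filter(book => {
    // Equivalent of $unwind + $group with $max on the year.
    const newest = Math.max(...book.print_runs.map(run => run.year));
    return newest > after && newest < before;
  });
}

const hits = lastPrintedBetween(books, 1991, 2000);
console.log(hits.map(book => book.name));
```

Storing that newest value on the book itself whenever a print run is added is what option #1 amounts to; the filter then becomes a plain range query on one field.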