duplicate a set of objects instances that match the specific field - mongodb

I am new to MongoDB (this is my second day playing with it) and would like some help with the question below.
I would like to duplicate a set of documents that match a specific field value and save the copies with new ObjectIds.
Example
db.getCollection('food').find( { food_type: "fruit" } )
The query above will show me 8 documents with 8 fields each: food_type, name, description, color, origin, import_price, export_price, margin.
I have 8 types of fruits, and the query
totalFoodTypes = db.getCollection('food').count( { food_type: "fruit" } )
will return 8. I would like to copy all these 8 documents into another brand new 8 documents with new ObjectIds and food_type field value changed to Vegetable. Is there an easy way for doing that?
My current best thinking is to store the result from
db.getCollection('food').find( { food_type: "fruit" } )
and then loop totalFoodTypes documents and insert them one by one with Java.
I would really appreciate it if MongoDB has a shortcut for doing this.
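One possible approach, sketched in JavaScript (the mongo shell's language), is to strip _id from each matching document, change food_type, and insert the copies so that the server assigns fresh ObjectIds. The helper name cloneWithNewType and the sample data are hypothetical; the in-memory array stands in for the result of find(...).toArray().

```javascript
// Hypothetical helper: copy documents, drop _id, and overwrite food_type.
// A document inserted without an _id gets a newly generated ObjectId.
function cloneWithNewType(docs, newType) {
  return docs.map(function (doc) {
    var copy = Object.assign({}, doc); // shallow copy; enough for flat documents
    delete copy._id;
    copy.food_type = newType;
    return copy;
  });
}

// Sample data standing in for
// db.getCollection('food').find({ food_type: "fruit" }).toArray()
var fruits = [
  { _id: "a1", food_type: "fruit", name: "apple", color: "red" },
  { _id: "a2", food_type: "fruit", name: "banana", color: "yellow" }
];
var copies = cloneWithNewType(fruits, "Vegetable");

// In the mongo shell the copies could then be written back, for example:
//   db.getCollection('food').insert(copies);
```

The originals are left untouched, so the collection would end up holding both the fruit documents and their Vegetable copies.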

Related

Adding to a non-existent array in a collection with $addToSet

This should be very simple. I have two collections, one of which holds two types of data (name, age), and the other should simply add the age values to an array (with no duplicates).
I "start" my collections like usual:
People = new Mongo.Collection('people')
Ages = new Mongo.Collection('ages')
Right now I'm working with seed data, but the question could easily extend to when I actually want to dynamically add data to the array. I seed it like so:
Meteor.startup(function() {
if (People.find().count() === 0) {
[
{
name: 'John',
age: '24' //Yes, I want to store it as strings.
},
{ ... } //more data
]
.forEach(function(person) {
People.insert(person)
Ages.update({ $addToSet: {age: person.age}}) //Not working
})
}
})
That last part there is what's not working. I guess I figured $addToSet would fix things for me, since the docs say:
If the field is absent in the document to update, $addToSet creates
the array field with the specified value as its element.
Now I suppose I have to create the field first, but I'm not sure where or how. I have a strong, strong feeling that I'm overlooking something ridiculously simple here...
If I got it right, your db should look like that when filled
Persons (_id, name, age)
1, John, 24
2, Pete, 21
3, Michele, 27
4, Sandy, 21
Ages (_id, ageset)
?, [ 24, 21, 27 ]
Solution 1: Just insert one record with a fixed key and then only update that one.
Have a look at this MeteorPad
Solution 2: Use a local Meteor.Collection that is synced by the server and gets DISTINCT field values from the package mrt:mongodb-aggregation.
Have a look at this MeteorPad
Solution 3: Use a server-side synced Mongo.Collection to hold the distinct ages list.
Have a look at this MeteorPad
Remark: Check out the log info on the server process. There are timeouts to add, change and remove a record for testing and updates (5 sec, 10 sec, 15 sec).
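The first solution can be sketched roughly as follows. The update in the question has no selector, so there is no document for $addToSet to act on; with a fixed key and an upsert, the document is created on first use. The in-memory addToSet helper below only imitates that behaviour for illustration, and the key 'all' and the field name ageset are assumptions.

```javascript
// In Meteor the call might look like this (key and field names are assumptions):
//   Ages.upsert({ _id: 'all' }, { $addToSet: { ageset: person.age } });

// In-memory imitation of an upsert with $addToSet on a fixed key.
function addToSet(store, key, field, value) {
  if (!store[key]) store[key] = { _id: key };           // upsert: create when absent
  var doc = store[key];
  if (doc[field] === undefined) doc[field] = [];        // absent field becomes an array
  if (doc[field].indexOf(value) === -1) doc[field].push(value); // skip duplicates
  return doc;
}

var ages = {};
['24', '21', '27', '21'].forEach(function (age) {
  addToSet(ages, 'all', 'ageset', age);
});
// ages.all.ageset holds each age once, in first-seen order
```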
Right now, I see that you're defining your People collection, but I don't see you actually defining "person" or "Age" anywhere. Maybe that's just due to how you've formatted your question.
Either way though, I'm not entirely sure you'd be getting anything to happen. As far as I know, you'll need to select the documents each time through the loop, as you want to update them.
This is how I'm doing something similar in an app I'm working on:
Meteor.users.update({ _id: Meteor.userId() }, { $addToSet: { 'profile.viewedRequests' : this._id }});
The key there being that I'm selecting an individual document, before attempting to update it.
It's either that, or you need to switch to People.update.

Mongo -Select parent document with maximum child documents count, Faster way?

I'm quite new to mongo and am trying to get the following query to work. It works fine, but it takes more time than I'd expect, so I think I'm doing something wrong.
There are about 6000 documents in the collection parents. Each document has a certain number of children (childs is another collection with 40000 documents in it). parents and childs are associated with each other by an attribute in the document called parent_id. Please see the following code. It takes approximately 1 minute to execute the queries. I don't think mongo should take that much time.
function getChildMaxDocCount(){
var maxLen = 0;
var bigSizeParent = null;
db.parents.find().forEach(function (parent){
var currcount = db.childs.count({parent_id:parent._id});
if(currcount > maxLen){
maxLen = currcount;
bigSizeParent = parent._id;
}
});
printjson({"maxLen":maxLen, "bigSizeParent":bigSizeParent });
}
Is there any feasible/optimal way to achieve this?
If I got you right, you want to find the parent with the most children. This is easy to accomplish using the aggregation framework. When each child can only have one parent, the aggregation query would look like this
db.childs.aggregate([
{ $group: { _id:"$parent_id", children:{$sum:1} } },
{ $sort: { "children":-1 } },
{ $limit : 1 }
]);
Which should return a document like:
{ _id:"SomeParentId", children:15}
If a child can have more than one parent, what the query would look like depends heavily on the data modeling.
Have a look at the aggregation framework documentation for details.
Edit: Some explanation
The aggregation pipeline passes every document it is told to process through a series of steps: all documents are first processed by the first step, and the resulting documents are fed into the next step.
Step 1: Grouping
We group all documents into new documents (virtual ones, if you want) and tell mongod to increment the field children by one for each document which has the same parent_id. Since we are referring to a field of the current document, we need to add a $ sign.
Step 2: Sorting
Now that we have a bunch of documents which hold the parent_id and the number of children this parent has, we sort it by the children field in descending (-1) order.
Step 3: Limiting
Since we are only interested in the parent_id which has the most children, we only let mongod return the first document after sorting.
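The three steps can be imitated on a plain array to see what each one contributes. This is only an illustration with made-up sample data; the real pipeline runs inside mongod.

```javascript
// Made-up child documents; only parent_id matters here.
var childs = [
  { _id: 1, parent_id: "p1" },
  { _id: 2, parent_id: "p2" },
  { _id: 3, parent_id: "p1" },
  { _id: 4, parent_id: "p3" },
  { _id: 5, parent_id: "p1" }
];

// Step 1 ($group): one counter per parent_id.
var counts = {};
childs.forEach(function (c) {
  counts[c.parent_id] = (counts[c.parent_id] || 0) + 1;
});
var grouped = Object.keys(counts).map(function (id) {
  return { _id: id, children: counts[id] };
});

// Step 2 ($sort): descending by children.
grouped.sort(function (a, b) { return b.children - a.children; });

// Step 3 ($limit): keep only the first document.
var top = grouped.slice(0, 1);
```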

How to store an ordered set of documents in MongoDB without using a capped collection

What's a good way to store a set of documents in MongoDB where order is important? I need to easily insert documents at an arbitrary position and possibly reorder them later.
I could assign each item an increasing number and sort by that, or I could sort by _id, but I don't know how I could then insert another document in between other documents. Say I want to insert something between an element with a sequence of 5 and an element with a sequence of 6?
My first guess would be to increment the sequence of all of the following elements so that there would be space for the new element using a query something like db.items.update({"sequence":{$gte:6}}, {$inc:{"sequence":1}}). My limited understanding of Database Administration tells me that a query like that would be slow and generally a bad idea, but I'm happy to be corrected.
I guess I could set the new element's sequence to 5.5, but I think that would get messy rather quickly. (Again, correct me if I'm wrong.)
I could use a capped collection, which has a guaranteed order, but then I'd run into issues if I needed to grow the collection. (Yet again, I might be wrong about that one too.)
I could have each document contain a reference to the next document, but that would require a query for each item in the list. (You'd get an item, push it onto the results array, and get another item based on the next field of the current item.) Aside from the obvious performance issues, I would also not be able to pass a sorted mongo cursor to my {#each} spacebars block expression and let it live update as the database changed. (I'm using the Meteor full-stack javascript framework.)
I know that everything has its advantages and disadvantages, and I might just have to use one of the options listed above, but I'd like to know if there is a better way to do things.
Based on your requirement, one approach could be to design your schema in such a way that each document can hold more than one document and in itself acts as a capped container.
{
"_id":Number,
"doc":Array
}
Each document in the collection will act as a capped container, and the documents will be stored as array in the doc field. The doc field being an array, will maintain the order of insertion.
You can limit the number of documents to n. So the _id field of each container document will be incremental by n, indicating the number of documents a container document can hold.
By doing this you avoid adding extra fields to the document, extra indices, and unnecessary sorts.
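Assuming every container before the last one is full, the container _id and the array index for a given overall position follow from simple arithmetic. The helper locate below is hypothetical and not part of MongoDB; it only illustrates the numbering scheme.

```javascript
// Hypothetical helper: map an overall 0-based position to a container.
// Assumes container _ids are 0, n, 2n, ... and all earlier containers are full.
function locate(position, n) {
  return {
    containerId: Math.floor(position / n) * n,
    index: position % n
  };
}

var second = locate(1, 5);      // second document overall
var thirteenth = locate(12, 5); // thirteenth document overall
```

With n = 5, the second document lands in the container with _id 0 at array index 1, and the thirteenth in the container with _id 10 at index 2.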
Inserting the very first record
i.e. when the collection is empty.
var record = {"name" : "first"};
db.col.insert({"_id":0,"doc":[record]});
Inserting subsequent records
Identify the last container document's _id, and the number of
documents it holds.
If the number of documents it holds is less than n, then update the
container document with the new document, else create a new container
document.
Say that each container document can hold 5 documents at most, and we want to insert a new document.
var record = {"name" : "newlyAdded"};
// using aggregation, get the _id of the last inserted container, and the
// number of record it currently holds.
db.col.aggregate( [ {
$group : {
"_id" : null,
"max" : {
$max : "$_id"
},
"lastDocSize" : {
$last : "$doc"
}
}
}, {
$project : {
"currentMaxId" : "$max",
"capSize" : {
$size : "$lastDocSize"
},
"_id" : 0
}
// once obtained, check if you need to update the last container or
// create a new container and insert the document in it.
} ]).forEach( function(check) {
if (check.capSize < 5) {
print("updating");
// UPDATE
db.col.update( {
"_id" : check.currentMaxId
}, {
$push : {
"doc" : record
}
});
} else {
print("inserting");
//insert
db.col.insert( {
"_id" : check.currentMaxId + 5,
"doc" : [ record ]
});
}
})
Note that the aggregation runs on the server side and is very efficient. Also note that in versions prior to 2.6 the aggregation returns a document rather than a cursor, so there you would need to modify the above code to read from that single document rather than iterating a cursor.
Inserting a new document in between documents
Now, if you would like to insert a new document between documents 1 and 2, we know that the document should fall inside the container with _id=0 and should be placed in the second position in the doc array of that container.
So, we make use of the $each and $position operators for inserting into specific positions.
var record = {"name" : "insertInMiddle"};
db.col.update(
{
"_id" : 0
}, {
$push : {
"doc" : {
$each : [record],
$position : 1
}
}
}
);
Handling Overflow
Now we need to take care of documents overflowing a container. Say we insert a new document in between, in the container with _id=0. If the container already has 5 documents, we need to move the last document to the next container, and repeat this until every container holds documents within its capacity. If required, we finally create a new container to hold the overflowing documents.
This complex operation should be done on the server side. To handle this, we can create a script such as the one below and register it with mongodb.
db.system.js.save( {
"_id" : "handleOverFlow",
"value" : function handleOverFlow(id) {
var currDocArr = db.col.find( {
"_id" : id
})[0].doc;
print(currDocArr);
var count = currDocArr.length;
var nextColId = id + 5;
// check if the collection size has exceeded
if (count <= 5)
return;
else {
// need to take the last doc and push it to the next capped
// container's array
print("updating collection: " + id);
var record = currDocArr.splice(currDocArr.length - 1, 1);
// update the next collection
db.col.update( {
"_id" : nextColId
}, {
$push : {
"doc" : {
$each : record,
$position : 0
}
}
});
// remove from original collection
db.col.update( {
"_id" : id
}, {
"doc" : currDocArr
});
// check overflow for the subsequent containers, recursively.
handleOverFlow(nextColId);
}
}
});
After every insertion in between, we can then invoke this function by passing the container id: handleOverFlow(containerId).
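The cascade itself can be followed on plain arrays. The sketch below is an in-memory illustration only, not the stored function: containers is a hypothetical object keyed by container _id, and unlike the stored function it creates the next container when none exists yet.

```javascript
// In-memory sketch: containers keyed by _id (0, n, 2n, ...), each an array of docs.
function cascadeOverflow(containers, id, n) {
  while (containers[id] && containers[id].length > n) {
    var nextId = id + n;
    if (!containers[nextId]) containers[nextId] = []; // create a container when needed
    var extra = containers[id].splice(n);             // everything beyond capacity
    containers[nextId] = extra.concat(containers[nextId]); // goes to the front of the next
    id = nextId;                                      // then check that container too
  }
}

var containers = {
  0: ["a", "b", "c", "d", "e", "f"], // one document too many after an insert
  5: ["g", "h", "i", "j", "k"]       // already full
};
cascadeOverflow(containers, 0, 5);
```

After the call, "f" has moved to the front of container 5, which in turn pushes "k" into a newly created container 10.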
Fetching all the records in order
Just use the $unwind operator in the aggregate pipeline.
db.col.aggregate([{$unwind:"$doc"},{$project:{"_id":0,"doc":1}}]);
Re-Ordering Documents
You can store each document in a capped container with an "_id" field:
.."doc":[{"_id":0,"name":"xyz",...}..]..
Get hold of the "doc" array of the capped container of which you want
to reorder items.
var docArray = db.col.find({"_id":0})[0].doc;
Update their ids so that after sorting the order of the item will change.
Sort the array based on their _ids.
docArray.sort( function(a, b) {
return a._id - b._id;
});
Update the capped container with the new doc array.
But then again, everything boils down to which approach is feasible and suits your requirement best.
Coming to your questions:
What's a good way to store a set of documents in MongoDB where order is important? I need to easily insert documents at an arbitrary
position and possibly reorder them later.
Documents as Arrays.
Say I want to insert something between an element with a sequence of 5 and an element with a sequence of 6?
use the $each and $position operators in the db.collection.update() function as depicted in my answer.
My limited understanding of Database Administration tells me that a
query like that would be slow and generally a bad idea, but I'm happy
to be corrected.
Yes. It would impact the performance, unless the collection has very little data.
I could use a capped collection, which has a guaranteed order, but then I'd run into issues if I needed to grow the collection. (Yet
again, I might be wrong about that one too.)
Yes. With Capped Collections, you may lose data.
An _id field in MongoDB is a unique, indexed key similar to a primary key in relational databases. If there is an inherent order in your documents, ideally you should be able to associate a unique key to each document, with the key value reflecting the order. So while preparing your document for insertion, explicitly add an _id field as this key (if you do not, mongo creates it automatically with a BSON objectid).
As far as retrieving the results is concerned, MongoDB does not guarantee the order of returned documents unless you explicitly use .sort(). If you do not use .sort(), the results are usually returned in natural order (order of insertion). Again, there is no guarantee of this behavior.
I'd advise you to override _id with your order while inserting, and use a sort while retrieving. Since _id is a necessary and auto-indexed entity, you will not be wasting any space defining a sort key, and storing the index for it.
For arbitrary sorting of any collection, you'll need a field to sort it on. I call mine "sequence".
schema:
{
_id: ObjectID,
sequence: Number,
...
}
db.items.createIndex({sequence:1});
db.items.find().sort({sequence:1})
Here is a link to some general sorting database answers that may be relevant:
https://softwareengineering.stackexchange.com/questions/195308/storing-a-re-orderable-list-in-a-database/369754
I suggest going with Floating point solution - adding a position column:
Use a floating-point number for the position column.
You can then reorder the list changing only the position column in the "moved" row.
If your user wants to position "red" after "blue" but before "yellow", then you just need to calculate
red.position = ((yellow.position - blue.position) / 2) + blue.position
After a few repositions in the same place (cutting the gap in half every time) you might hit a wall, so once you reach a certain threshold it's better to re-sort the list.
When retrieving it you can simply say col.sort() to get it sorted and no need for any client-side code (Like in the case of a Linked list solution)
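A minimal sketch of the floating-point idea, using the formula above. The function names, the threshold value and the renumbering scheme (1, 2, 3, ...) are assumptions for illustration.

```javascript
// Position halfway between two neighbours, as in the formula above.
function between(before, after) {
  return (after - before) / 2 + before;
}

// Hypothetical threshold check: when neighbours get too close, renumber.
function needsResort(before, after, minGap) {
  return (after - before) < minGap;
}

// Renumber items (already in display order) to evenly spaced positions.
function resort(items) {
  items.forEach(function (item, i) { item.position = i + 1; });
  return items;
}

var blue = { name: "blue", position: 1 };
var yellow = { name: "yellow", position: 2 };
var red = { name: "red", position: between(blue.position, yellow.position) };
```

Only the moved row changes on a normal reorder; resort touches every row, which is why it is reserved for the rare case where the gap falls below the threshold.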

Fetch Record from mongo db based on type and ancestry field

In MongoDB, records are stored like this:
{_id:100,type:"section",ancestry:nil,.....}
{_id:300,type:"section",ancestry:100,.....}
{_id:400,type:"problem",ancestry:100,.....}
{_id:500,type:"section",ancestry:100,.....}
{_id:600,type:"problem",ancestry:500,.....}
{_id:700,type:"section",ancestry:500,.....}
{_id:800,type:"problem",ancestry:100,.....}
I want to fetch the records in this order:
first, the record whose ancestry is nil
then all records whose parent is that first record and whose type is 'problem'
then all records whose parent is that first record and whose type is 'section'
Expected output is
{_id:100,type:"section",ancestry:nil,.....}
{_id:400,type:"problem",ancestry:100,.....}
{_id:800,type:"problem",ancestry:100,.....}
{_id:300,type:"section",ancestry:100,.....}
{_id:500,type:"section",ancestry:100,.....}
{_id:600,type:"problem",ancestry:500,.....}
{_id:700,type:"section",ancestry:500,.....}
Try this MongoDB shell command:
db.collection.find().sort({ancestry:1, type: 1})
In languages where ordered dictionaries aren't available, the sort argument may take a list of 2-tuples instead. Something like this (Python):
collection.find({}).sort([('ancestry', pymongo.ASCENDING), ('type', pymongo.ASCENDING)])
@vinipsmaker's answer is good. However, it doesn't work properly if _ids are random numbers or if there are documents that aren't part of the tree structure. In that case, the following code would work correctly:
function getSortedItems() {
var sorted = [];
var ids = [ null ];
while (ids.length > 0) {
var cursor = db.Items.find({ ancestry: ids.shift() }).sort({ type: 1 });
while (cursor.hasNext()) {
var item = cursor.next();
ids.push(item._id);
sorted.push(item);
}
}
return sorted;
}
Note that this code is not fast because db.Items.find() will be executed n times, where n is the number of documents in the tree structure.
If the tree structure is huge or you will do the sort many times, you can optimize this by using the $in operator in the query and sorting the result on the client side.
In addition, creating an index on the ancestry field will make the code quicker in either case.
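The $in variant could look roughly like the sketch below, which issues one query per tree level instead of one per document. To keep it self-contained, fetchChildren is a hypothetical stand-in that filters an in-memory copy of the question's data; against a real database it would be a find with { ancestry: { $in: parentIds } }. The tie-break on _id within a type is an assumption.

```javascript
// Sample data from the question; null stands in for nil.
var items = [
  { _id: 100, type: "section", ancestry: null },
  { _id: 300, type: "section", ancestry: 100 },
  { _id: 400, type: "problem", ancestry: 100 },
  { _id: 500, type: "section", ancestry: 100 },
  { _id: 600, type: "problem", ancestry: 500 },
  { _id: 700, type: "section", ancestry: 500 },
  { _id: 800, type: "problem", ancestry: 100 }
];

// Stand-in for db.Items.find({ ancestry: { $in: parentIds } }).toArray()
function fetchChildren(parentIds) {
  return items.filter(function (it) { return parentIds.indexOf(it.ancestry) !== -1; });
}

function getSortedItemsByLevel() {
  var sorted = [];
  var parentIds = [null];
  while (parentIds.length > 0) {
    var level = fetchChildren(parentIds);      // one query per tree level
    var next = [];
    parentIds.forEach(function (pid) {         // keep the parents' order
      level
        .filter(function (it) { return it.ancestry === pid; })
        .sort(function (a, b) {                // client-side sort: type, then _id
          return a.type < b.type ? -1 : a.type > b.type ? 1 : a._id - b._id;
        })
        .forEach(function (it) { sorted.push(it); next.push(it._id); });
    });
    parentIds = next;
  }
  return sorted;
}

var orderedIds = getSortedItemsByLevel().map(function (it) { return it._id; });
```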

How do I rename a nested key in mongodb

I want to rename a dict key in MongoDB.
Normally it works like this: db.update({'_id':id},{$rename:{'oldfieldname':newfieldname}})
My document structure looks like that
{
'data':'.....',
'field':{'1':{'data':....},'2':{'data'...}},
'more_data':'....',
}
If I want to set
a new field in field 1, I do db.update({'_id':id},{$set:{'field.0.1.name':'peter'}})
for field two it is 'field.1.2.name'
I thought it should be similar with rename, but it isn't ... (like $rename:{'field.0.1': 2})
Here's a flexible method for renaming keys in a database
Given a document structure like this...
{
"_id": ObjectId("4ee5e9079b14f74ef14ddd2f"),
"code": "130.4",
"description": "4'' Socket Plug",
"technicalData": {
"Drawing No": "50",
"length": "200mm",
"diameter": "20mm"
},
}
I want to loop through all documents and rename technicalData["Drawing No"] to technicalData["Drawing Number"]
Run the following javascript in the execute panel in (the excellent) RockMongo
function remap(x){
var dNo = x.technicalData["Drawing No"];
db.products.update({"_id":x._id}, {
$set: {"technicalData.Drawing Number" : dNo},
$unset: {"technicalData.Drawing No":1}
});
}
db.products.find({"technicalData.Drawing No":{$ne:null}}).forEach(remap);
The code will also run in a mongo shell
Your question is unclear but it seems you'd like to rename a field name within an array.
The short answer is you can't. As stated in the docs, $rename doesn't expand arrays to find a matching name. It works on top-level fields and on fields in embedded documents (using dot notation), but not on fields inside array elements.
What you can do to simulate a rename is copy the field and its data to the new name, and then delete the original field. You might also need a way to account for potentially concurrent writes if you have a lot of writes to that object/field.
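The copy-then-delete idea can be sketched on a plain object as follows. The helper renameNestedKey and the new key name "one" are hypothetical; the sketch works on an in-memory document only and does nothing about concurrent writes.

```javascript
// Hypothetical helper: rename a key inside a nested object by copy-then-delete.
// path is the list of keys leading to the object that holds oldKey.
function renameNestedKey(doc, path, oldKey, newKey) {
  var parent = path.reduce(function (obj, key) {
    return obj == null ? undefined : obj[key];
  }, doc);
  if (parent == null || !Object.prototype.hasOwnProperty.call(parent, oldKey)) {
    return false;                    // nothing to rename
  }
  parent[newKey] = parent[oldKey];   // copy the value under the new name
  delete parent[oldKey];             // then remove the original key
  return true;
}

// Document shaped like the one in the question.
var doc = { data: "x", field: { "1": { data: "a" }, "2": { data: "b" } } };
var renamed = renameNestedKey(doc, ["field"], "1", "one");
```

The modified document would then have to be written back to the collection, which is where the concurrency concern comes in.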