I have a database with 15,574,934 records in mongo
I want to rename some columns to:
db.offerPhotos.files.update({}, {$rename: { 'orgFilename': 'metadata.orgFilename', 'offerId': 'metadata.offerId', 'batch': 'metadata.batch', 'group': 'metadata.group', 'size': 'metadata.size', 'mimeType': 'metadata.mimeType'}}, false, true)
I do it by mongo CLI but I'm waiting and waiting and nothing happens
How to do it better?
As per MongDB naming conventions
Field names cannot contain the null character.
Top-level field names cannot start with the dollar sign ($) character.
The use of $ and . in field names is not recommended and is not supported by the official MongoDB drivers
Otherwise, starting in MongoDB 3.6, the server permits storage of
field names that contain dots (i.e. .) and dollar signs (i.e. $).
Reason for the slowness:
The $rename operator logically performs an $unset of both the old name and the new name, and then performs a $set operation with the new name.
It does two operation first on a document unset on old and new name. Then it performs set operation. So totally three operations per document.
Reference
And you are actually doing more worse. Because you are converting a field to nested one.
Input doc:
{
"_id":"",
"key":1
}
Rename:
db.test.update({}, {
"$rename": {
"key": "key1.name"
}
})
Output:
/* 1 */
{
"_id" : ObjectId("5ef718a290c7f76c305aa21c"),
"key1" : {
"name" : 1
}
}
It's converted to nested one. You are doing worse because each rename results in 3 operations on a nested doc which in turn contains many fields.
Approximately you are doing 1.6million * 3 * 6 operations on a doc. Hence it is slow.
Related
I have a collection of documents in which a field name appears to have a dot:
{
"prod_id": "123",
"prod_cost (whole)": 49
"prod_cost (dec.)": 49
}
How can I effectively run an aggregation pipeline using that field?
As of now, it reports null values since it considers ")" as an additional nested field for prod_cost (dec.).
From MongoDB version 5,
MongoDB 5.0 adds improved support for the use of ($) and (.) in field names. There are some restrictions. See Field Name Considerations for more details.
Field Names with Periods (.) and Dollar Signs ($)
In most cases data that has been stored using field names like these is not directly accessible. You need to use helper methods like $getField, $setField, and $literal in queries that access those fields.
{ "$getField": "prod_cost (dec.)" }
Sample MongoPlayground
To access field in object, you can refer to Query a Field in a Sub-document demo.
{
"$getField": {
field: {
$literal: "prod_cost (dec.)"
},
input: "$productInfo"
}
}
Sample Mongo Playground (Nested object)
I have a database like this:
{
"universe":"comics",
"saga":[
{
"name":"x-men",
"characters":[
{
"character":"wolverine",
"picture":"618035022351.png"
},
{
"character":"wolverine",
"picture":"618035022352.png"
}
]
}
]
},
{
"universe":"dc",
"saga":[
{
"name":"spiderman",
"characters":[
{
"character":"venom",
"picture":"618035022353.png"
}
]
}
]
}
And with this code, I update the field where name: wolverine:
db.getCollection('collection').findOneAndUpdate(
{
"universe": "comics"
},
{
$set: {
"saga.$[outer].characters.$[inner].character": "lobezno",
"saga.$[outer].characters.$[inner].picture": "618035022354.png"
}
},
/*{
"saga.characters": 1
},*/
{
"arrayFilters": [
{
"outer.name": "x-men"
},
{
"inner.character": "wolverine"
}
],
"multi":false
}
)
I want to just update the first object where there is a match, and stop it.
For example, if I have an array of 100,000 elements and the object where the match is, is in the tenth position, he will update that record, but he will continue going through the entire array and this seems ineffective to me even though he already did the update.
Note: if I did the update using an _id inside of universe.saga.characters instead of doing the update using the name, it would still loop through the rest of the elements.
How can I do it?
Update using arrayFilters conditions
I don't think it will find and update through loop, and It does not matter if collection have 100,000 sub documents, because here is nice explanation in $[<identifier>] and has mentioned:
The $[<identifier>] to define an identifier to update only those array elements that match the corresponding filter document in the arrayFilters
In the update document, use the $[<identifier>] filtered positional operator to define an identifier, which you then reference in the array filter documents. But make sure you cannot have an array filter document for an identifier if the identifier is not included in the update document.
Update using _id
Your point,
Note: if I did the update using an _id inside of universe.saga.characters instead of doing the update using the name, it would still loop through the rest of the elements.
MongoDB will certainly use the _id index. Here is the nice answer on question MongoDB Update Query Performance, from this you will get an better idea on above point
Update using indexed fields
You can create index according to your query section of update command, Here MongoDB Indexes and Indexing Strategies has explained why index is important,
In your example, lets see with examples:
Example 1: If document have 2 sub documents and when you update and check with explain("executionStats"), assume it will take 1 second to update,
quick use Mongo Playground (this platform will not support update query)
Example 2: If document have 1000 sub documents and when you update and check with explain("executionStats"), might be it will take more then 1 second,
If provide index on fields (universe, saga.characters.character and saga.characters.picture) then definitely it will take less time then usual without index, main benefit of index it will direct point to indexed fields.
quick use Mongo Playground (this platform will not support update query)
Create Index for your fields
db.maxData.createIndex({
"universe": 1,
"saga.characters.character": 1,
"saga.characters.picture": 1
})
For more experiment use above 2 examples data with index and without index and check executionStats you will get more clarity.
I have a mongoDB document that has the following structure:
{
user:user_name,
streams:[
{user:user_a, name:name_a},
{user:user_b, name:name_b},
{user:user_c, name:name_c}
]
}
I want to use $pullAll to remove from the streams array, passing it an array of streams (the size of the array varies from 1 to N):
var streamsA = [{user:"user_a", name:"name_a"},{user:"user_b", name:"name_b"}]
var streamsB = [{name:"name_a", user:"user_a"},{name:"name_b", user:"user_b"}]
I use the following mongoDB command to perform the update operation:
db.streams.update({name:"user_name", {"$pullAll:{streams:streamsA}})
db.streams.update({name:"user_name", {"$pullAll:{streams:streamsB}})
Removing streamsA succeeds, whereas removing streamsB fails. After digging through the mongoDB manuals, I saw that the order of fields in streamsA and streamsB records has to match the order of fields in the database. For streamsB the order does not match, that's why it was not removed.
I can reorder the streams to the database document order prior to performing an update operation, but is there an easier and cleaner way to do this? Is there some flag that can be set to update and/or pullAll to ignore the order?
Thank You,
Gary
The $pullAll operator is really a "special case" that was mostly intended for single "scalar" array elements and not for sub-documents in the way you are using it.
Instead use $pull which will inspect each element and use an $or condition for the document lists:
db.streams.update(
{ "user": "user_name" },
{ "$pull": { "streams": { "$or": streamsB } }}
)
That way it does not matter which order the fields are in or indeed look for an "exact match" as the current $pullAll operation is actually doing.
What's a good way to store a set of documents in MongoDB where order is important? I need to easily insert documents at an arbitrary position and possibly reorder them later.
I could assign each item an increasing number and sort by that, or I could sort by _id, but I don't know how I could then insert another document in between other documents. Say I want to insert something between an element with a sequence of 5 and an element with a sequence of 6?
My first guess would be to increment the sequence of all of the following elements so that there would be space for the new element using a query something like db.items.update({"sequence":{$gte:6}}, {$inc:{"sequence":1}}). My limited understanding of Database Administration tells me that a query like that would be slow and generally a bad idea, but I'm happy to be corrected.
I guess I could set the new element's sequence to 5.5, but I think that would get messy rather quickly. (Again, correct me if I'm wrong.)
I could use a capped collection, which has a guaranteed order, but then I'd run into issues if I needed to grow the collection. (Yet again, I might be wrong about that one too.)
I could have each document contain a reference to the next document, but that would require a query for each item in the list. (You'd get an item, push it onto the results array, and get another item based on the next field of the current item.) Aside from the obvious performance issues, I would also not be able to pass a sorted mongo cursor to my {#each} spacebars block expression and let it live update as the database changed. (I'm using the Meteor full-stack javascript framework.)
I know that everything has it's advantages and disadvantages, and I might just have to use one of the options listed above, but I'd like to know if there is a better way to do things.
Based on your requirement, one of the approaches could be to design your schema, in such a way that each document has the capability to hold more than one document and in itself act as a capped container.
{
"_id":Number,
"doc":Array
}
Each document in the collection will act as a capped container, and the documents will be stored as array in the doc field. The doc field being an array, will maintain the order of insertion.
You can limit the number of documents to n. So the _id field of each container document will be incremental by n, indicating the number of documents a container document can hold.
By doing these you avoid adding extra fields to the document, extra indices, unnecessary sorts.
Inserting the very first record
i.e when the collection is empty.
var record = {"name" : "first"};
db.col.insert({"_id":0,"doc":[record]});
Inserting subsequent records
Identify the last container document's _id, and the number of
documents it holds.
If the number of documents it holds is less than n, then update the
container document with the new document, else create a new container
document.
Say, that each container document can hold 5 documents at most,and we want to insert a new document.
var record = {"name" : "newlyAdded"};
// using aggregation, get the _id of the last inserted container, and the
// number of record it currently holds.
db.col.aggregate( [ {
$group : {
"_id" : null,
"max" : {
$max : "$_id"
},
"lastDocSize" : {
$last : "$doc"
}
}
}, {
$project : {
"currentMaxId" : "$max",
"capSize" : {
$size : "$lastDocSize"
},
"_id" : 0
}
// once obtained, check if you need to update the last container or
// create a new container and insert the document in it.
} ]).forEach( function(check) {
if (check.capSize < 5) {
print("updating");
// UPDATE
db.col.update( {
"_id" : check.currentMaxId
}, {
$push : {
"doc" : record
}
});
} else {
print("inserting");
//insert
db.col.insert( {
"_id" : check.currentMaxId + 5,
"doc" : [ record ]
});
}
})
Note that the aggregation, runs on the server side and is very efficient, also note that the aggregation would return you a document rather than a cursor in versions previous to 2.6. So you would need to modify the above code to just select from a single document rather than iterating a cursor.
Inserting a new document in between documents
Now, if you would like to insert a new document between documents 1 and 2, we know that the document should fall inside the container with _id=0 and should be placed in the second position in the doc array of that container.
so, we make use of the $each and $position operators for inserting into specific positions.
var record = {"name" : "insertInMiddle"};
db.col.update(
{
"_id" : 0
}, {
$push : {
"doc" : {
$each : [record],
$position : 1
}
}
}
);
Handling Over Flow
Now, we need to take care of documents overflowing in each container, say we insert a new document in between, in container with _id=0. If the container already has 5 documents, we need to move the last document to the next container and do so till all the containers hold documents within their capacity, if required at last we need to create a container to hold the overflowing documents.
This complex operation should be done on the server side. To handle this, we can create a script such as the one below and register it with mongodb.
db.system.js.save( {
"_id" : "handleOverFlow",
"value" : function handleOverFlow(id) {
var currDocArr = db.col.find( {
"_id" : id
})[0].doc;
print(currDocArr);
var count = currDocArr.length;
var nextColId = id + 5;
// check if the collection size has exceeded
if (count <= 5)
return;
else {
// need to take the last doc and push it to the next capped
// container's array
print("updating collection: " + id);
var record = currDocArr.splice(currDocArr.length - 1, 1);
// update the next collection
db.col.update( {
"_id" : nextColId
}, {
$push : {
"doc" : {
$each : record,
$position : 0
}
}
});
// remove from original collection
db.col.update( {
"_id" : id
}, {
"doc" : currDocArr
});
// check overflow for the subsequent containers, recursively.
handleOverFlow(nextColId);
}
}
So that after every insertion in between , we can invoke this function by passing the container id, handleOverFlow(containerId).
Fetching all the records in order
Just use the $unwind operator in the aggregate pipeline.
db.col.aggregate([{$unwind:"$doc"},{$project:{"_id":0,"doc":1}}]);
Re-Ordering Documents
You can store each document in a capped container with an "_id" field:
.."doc":[{"_id":0,","name":"xyz",...}..]..
Get hold of the "doc" array of the capped container of which you want
to reorder items.
var docArray = db.col.find({"_id":0})[0];
Update their ids so that after sorting the order of the item will change.
Sort the array based on their _ids.
docArray.sort( function(a, b) {
return a._id - b._id;
});
update the capped container back, with the new doc array.
But then again, everything boils down to which approach is feasible and suits your requirement best.
Coming to your questions:
What's a good way to store a set of documents in MongoDB where order is important?I need to easily insert documents at an arbitrary
position and possibly reorder them later.
Documents as Arrays.
Say I want to insert something between an element with a sequence of 5 and an element with a sequence of 6?
use the $each and $position operators in the db.collection.update() function as depicted in my answer.
My limited understanding of Database Administration tells me that a
query like that would be slow and generally a bad idea, but I'm happy
to be corrected.
Yes. It would impact the performance, unless the collection has very less data.
I could use a capped collection, which has a guaranteed order, but then I'd run into issues if I needed to grow the collection. (Yet
again, I might be wrong about that one too.)
Yes. With Capped Collections, you may lose data.
An _id field in MongoDB is a unique, indexed key similar to a primary key in relational databases. If there is an inherent order in your documents, ideally you should be able to associate a unique key to each document, with the key value reflecting the order. So while preparing your document for insertion, explicitly add an _id field as this key (if you do not, mongo creates it automatically with a BSON objectid).
As far as retrieving the results are concerned, MongoDB does not guarantee the order of return documents unless you explicitly use .sort() . If you do not use .sort(), the results are usually returned in natural order (order of insertion).Again, there is no guarantee on this behavior.
I'd advise you to override _id with your order while inserting, and use a sort while retrieving. Since _id is a necessary and auto-indexed entity, you will not be wasting any space defining a sort key, and storing the index for it.
For abitrary sorting of any collection, you'll need a field to sort it on. I call mine "sequence".
schema:
{
_id: ObjectID,
sequence: Number,
...
}
db.items.ensureIndex({sequence:1});
db.items.find().sort({sequence:1})
Here is a link to some general sorting database answers that may be relevant:
https://softwareengineering.stackexchange.com/questions/195308/storing-a-re-orderable-list-in-a-database/369754
I suggest going with Floating point solution - adding a position column:
Use a floating-point number for the position column.
You can then reorder the list changing only the position column in the "moved" row.
If your user wants to position "red" after "blue" but before "yellow" Then you just need to calculate
red.position = ((yellow.position - blue.position) / 2) + blue.position
After a few re-positions in the same place (Cuttin in half every time) - you might reach a wall - it's better that if you reach a certain threshold - to resort the list.
When retrieving it you can simply say col.sort() to get it sorted and no need for any client-side code (Like in the case of a Linked list solution)
If I have a mongodb collection users like this:
{
"_id": 1,
"name": {
"first" : "John",
"last" :"Backus"
},
}
How do I retrieve name.first from this without providing _id or any other reference. Also, is it possible that pulling just the `name^ can give me the array of embedded keys (first and last in this case)? How can that be done?
db.users.find({"name.first"}) didn't work for me, I got a:
SyntaxError "missing: after property id (shell):1
The first argument to find() is the query criteria whereas the second argument to the find() method is a projection, and it takes the form of a document with a list of fields for inclusion or exclusion from the result set. You can either specify the fields to include (e.g. { field: 1 }) or specify the fields to exclude (e.g. { field: 0 }). The _id field is implicitly included, unless explicitly excluded.
In your case, db.users.find({name.first}) will give an error as it is expected to be a search criteria.
To get the name json :
db.users.find({},{name:1})
If you want to fetch only name.first
db.users.find({},{"name.first":1})
Mongodb Documentation link here
To fetch all the record details:
db.users.find({"name.first":""})
To fetch just the name or specific field:
db.users.find({{},"name.X":""});
where X can be first, last .
dot(.) notation can be used if required to traverse inside the array for key value pair as
db.users.find({"name.first._id":"xyz"});
In 2022
const cursor = db
.collection('inventory')
.find({
status: 'A'
})
.project({ item: 1, status: 1 });
Source: https://www.mongodb.com/docs/manual/tutorial/project-fields-from-query-results/