MongoDB _id field rename

I have a collection named 'summary' in Azure Cosmos DB, and the _id field for my collection is 'orderId'. I have millions of records in the collection. Now I want to rename the _id field 'orderId' to 'purchaseOrderId' (per the business domain design). The collection has an '_id.orderId' index. One straightforward approach is to drop the collection and reload it with the new id field name, but that is costly and time-consuming since it means reloading millions of documents. So is there any way to achieve this by updating the _id field name in place (retrieving each existing record and doing a rename update) with Spring MongoTemplate or MongoDB driver 3.11.1?
Old id field name: 'orderId'
Recommended id name: 'purchaseOrderId'
Existing index: '_id.orderId'
MongoDB version: 3.6
Mongo document structure
{
    "_id" : {
        "orderId" : 10164
    },
    "countryCode" : null,
    "sequenceNumber" : "5693",
    "deptNumber" : "92",
    "type" : "20",
    "addrNumber" : 12,
    "venNumber" : 0,
    "shipPtDescr" : " ",
    "whsNumber" : "6001",
    "purchId" : 1006,
    "statCode" : "C",
    "groceryId" : "N",
    "openToBuyMonth" : 12,
    "updateSource" : "MF",
    "authorizedDate" : null,
    "deposit" : null,
    "cost" : null,
    "boardCode" : null,
    "authorizedBy" : null,
    ...
    ...
    ...
}

Unfortunately, _id is an immutable field in MongoDB, and it does not allow you to change the _id of a document after you have inserted it.
Here is the behavior of the _id field as described in the documentation:
_id Field: Once set, you cannot update the value of the _id field, nor can you replace an existing document with a replacement document that has a different _id field value.
As a workaround, let MongoDB create its own _id field while you add another field (say, custom_id, outside the _id field) that your application will refer to and use.
{
    "_id" : ObjectId("xxxxxxxxxxxxxxxx"),
    "custom_id" : {
        "orderId" : 10164
    },
    "countryCode" : null,
    "sequenceNumber" : "5693",
    ...
    ...
}
You can then rename the nested field as follows:
db.collection.updateMany( {}, { $rename: { "custom_id.orderId": "custom_id.purchaseOrderId" } } )
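If the millions of existing documents must also carry the new name inside _id, the only option is to re-insert them, because _id itself can never be updated in place. Below is a minimal mongo shell sketch of such a migration (only the collection name 'summary' and the field names come from the question; everything else is an assumption, and Cosmos DB's MongoDB API may behave slightly differently, so verify on a copy first):
// Re-insert every old-shape document under the new _id shape, then delete the original.
// This loop is NOT atomic; run it in a maintenance window and check counts afterwards.
db.summary.find({ "_id.orderId": { $exists: true } }).forEach(function (doc) {
    var oldId = doc._id;
    doc._id = { purchaseOrderId: oldId.orderId };
    db.summary.insertOne(doc);
    db.summary.deleteOne({ _id: oldId });
});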

Related

Merge documents where a column has the same value and create fields with found data

How can I merge the documents together where the user is the same?
What I find difficult is automatically adding the values from the "field" column as columns and the data from the "data" field as values for those newly created columns.
For example, merging these two because they have the same user id, producing the fields "Date of birth": "1989-01-12" and "Job": "Teacher".
I know it's a lot to ask, but could someone guide me to how to achieve this?
{
    "_id" : ObjectId("5d6b00b960016c4c441d9a16"),
    "user" : 1000,
    "field" : "Date of birth",
    "data" : "1989-01-12",
    "timestamp" : "2017-08-27 11:00:59"
}
{
    "_id" : ObjectId("5d6b00b960016c4c441d9a17"),
    "user" : 1000,
    "field" : "Job",
    "data" : "Teacher",
    "timestamp" : "2017-08-27 10:59:19"
}
Into
{
    "_id" : ObjectId("5d6b00b960016c4c441d9a16"),
    "user" : 1000,
    "Date of birth" : "1989-01-12",
    "Job" : "Teacher",
    "timestamp" : "2017-08-27 11:00:59"
}
To merge the documents, you can iterate over them to build a new document.
If you want to remove the old references to the given user, you have to delete them before inserting the new document.
You can do it like this using the JavaScript shell of MongoDB:
// Get all the docs for this user
var docs = db.collection.find({ user: 1000 }).toArray()
// Initialise the new doc with the common values
var new_doc = db.collection.findOne({ user: 1000 })
// Remove the fields that will be folded into the merged document
delete new_doc.field
delete new_doc.data
for (var i = 0; i < docs.length; i++) {
    var doc = docs[i]
    // Turn each field/data pair into a real field on the merged doc
    new_doc[doc["field"]] = doc["data"]
    // Keep the most recent timestamp
    if (new Date(doc["timestamp"]) > new Date(new_doc["timestamp"])) {
        new_doc["timestamp"] = doc["timestamp"]
    }
}
// Remove all old references to the user if you need to
db.collection.deleteMany({ user: 1000 })
// Insert the merged document
db.collection.insertOne(new_doc)
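If you are on MongoDB 3.6 or newer, an aggregation pipeline can build the same merged shape server-side instead of looping in the client. Here is a sketch (the field names come from the question; everything else is an assumption):
db.collection.aggregate([
    { $match: { user: 1000 } },
    // newest first, so $first picks the latest timestamp and its _id
    { $sort: { timestamp: -1 } },
    { $group: {
        _id: "$user",
        docId: { $first: "$_id" },
        timestamp: { $first: "$timestamp" },
        // collect the field/data pairs as {k, v} entries
        pairs: { $push: { k: "$field", v: "$data" } }
    } },
    // turn the pairs into real fields and restore the usual document shape
    { $replaceRoot: { newRoot: {
        $mergeObjects: [
            { _id: "$docId", user: "$_id", timestamp: "$timestamp" },
            { $arrayToObject: "$pairs" }
        ]
    } } }
])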

MongoDB - update an existing document in a collection

I have a collection called user_roles and it contains a field called rights, which is an array of strings.
I want to update the document within the user_roles collection that has _id = 5b1509f8b95b4bfe2b638508 by appending a new string element to the rights field.
So basically, after this update the document should hold the additional element "ui.dealers.measures.retrieve", as shown below.
{
    "_id" : ObjectId("5b1509f8b95b4bfe2b638508"),
    "type" : "coach",
    "name" : "Coach",
    "flavours" : {
        "coach" : NumberInt(1)
    },
    "rights" : [
        "ui.dealers.retrieve",
        "ui.dealers.dossier.retrieve",
        "ui.dealers.dossier.update",
        "ui.dealers.documents.retrieve",
        "ui.dealers.documents.create",
        "ui.dealers.documents.delete",
        "ui.dealers.events.retrieve",
        "ui.dealers.events.create",
        "ui.dealers.events.update",
        "ui.dealers.events.export",
        "ui.dealers.events.delete",
        "ui.dealers.kpis.retrieve",
        "ui.dealers.kpis.update",
        "ui.dealers.blueprints.retrieve",
        "ui.dealers.blueprints.create",
        "ui.dealers.gap.retrieve",
        "ui.dealers.gap.update",
        "ui.dealers.measures.create",
        "ui.dealers.surveys.retrieve",
        "ui.dealers.surveys.update",
        "ui.dealers.measures.retrieve"
    ],
    "createdAt" : ISODate("2018-06-04T09:44:24.394+0000"),
    "updatedAt" : ISODate("2018-06-04T10:01:56.428+0000")
}
Please try this:
db.collection.update(
    { _id: ObjectId("5b1509f8b95b4bfe2b638508") },
    { $push: { "rights": "ui.dealers.measures.retrieve" } }
)
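If the rights array should never contain the same string twice, $addToSet is a drop-in alternative to $push; a sketch under that assumption (same document, the user_roles collection name from the question):
// $addToSet only appends the value if it is not already present in the array
db.user_roles.updateOne(
    { _id: ObjectId("5b1509f8b95b4bfe2b638508") },
    { $addToSet: { rights: "ui.dealers.measures.retrieve" } }
)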

MongoDB : Can't insert twice the same document

In my pymongo code, inserting the same document twice raises an error:
document = {"auteur" : "romain",
"text" : "premier post",
"tag" : "test2",
"date" : datetime.datetime.utcnow()}
collection.insert_one(document)
collection.insert_one(document)
raises :
DuplicateKeyError: E11000 duplicate key error collection: test.myCollection index: _id_ dup key: { : ObjectId('5aa282eff1dba231beada9e3') }
Inserting two documents with different content works fine.
It seems that, according to https://docs.mongodb.com/manual/reference/method/db.collection.createIndex/#options, I should do something about the index options:
unique boolean
Optional. Creates a unique index so that the collection will not accept insertion or update of documents where the index key value matches an existing value in the index.
Specify true to create a unique index. The default value is false.
The option is unavailable for hashed indexes.
Adding to Peba's answer, you can use the .copy() method of a Python dictionary to avoid mutating the document itself.
document = {"auteur" : "romain",
"text" : "premier post",
"tag" : "test2",
"date" : datetime.datetime.utcnow()}
collection.insert_one(document.copy())
collection.insert_one(document.copy())
This way, each insert_one call gets a shallow copy of the document, and at the same time it keeps your code more pythonic.
Inserting a document implicitly generates an _id.
So after the first insert, the document in memory will have mutated to
document = {"_id" : ObjectId('random_id_here'),
"auteur" : "romain",
"text" : "premier post",
"tag" : "test2",
"date" : datetime.datetime.utcnow()}
Trying to insert said document again will result in an error due to the duplicated _id.
You can create a new document with the same values and insert it.
document = {"auteur" : "romain",
"text" : "premier post",
"tag" : "test2",
"date" : datetime.datetime.utcnow()}
collection.insert_one(document)
document = {"auteur" : "romain",
"text" : "premier post",
"tag" : "test2",
"date" : datetime.datetime.utcnow()}
collection.insert_one(document)

MongoDB aggregation and paging

I have documents with my own internal id field inside each document, plus the date when the document was added. There can be several documents with the same id (different versions of the same document), but their dates will always differ. In a single query I want to bring back only one document out of all the versions of the same document (same id field), the one relevant to a specified date, and I want to display the results with paging (50 rows per page). Is this possible in MongoDB (query documents by some field, group them by the id field, sort by the date field and take only the first of each group, all with paging)?
Please see the example: these are documents, some of them distinct documents (A, B and C) and some of them versions of the same document;
for instance _id 1, 2 and 3 are all versions of the same document A.
Document A {
    _id : 1,
    "id" : "A",
    "author" : "value",
    "date" : "2015-11-05"
}
Document A {
    _id : 2,
    "id" : "A",
    "author" : "value",
    "date" : "2015-11-06"
}
Document A {
    _id : 3,
    "id" : "A",
    "author" : "value",
    "date" : "2015-11-07"
}
Document B {
    _id : 4,
    "id" : "B",
    "author" : "value",
    "date" : "2015-11-06"
}
Document B {
    _id : 5,
    "id" : "B",
    "author" : "value",
    "date" : "2015-11-07"
}
Document C {
    _id : 6,
    "id" : "C",
    "author" : "value",
    "date" : "2015-11-07"
}
I want to query all documents that have "value" in the "author" field, and from those bring back only one document of each, the one with the latest date up to the specified date, for example 2015-11-08. So I expect the result to be:
_id : 3, _id : 5, _id : 6
And also paging, for example 10 documents per page.
Thanks!
Two documents can't have the same _id; there is a unique index on _id by default.
Because of that, you need a compound _id field which includes the date:
{
    "_id" : {
        docId: yourFormerIdValue,
        date: new ISODate()
    }
    // other fields
}
To get the version valid at a specified date, the query becomes rather easy:
db.yourColl.find({
    // match the parts of the compound _id with dot notation...
    "_id.docId": idToFind,
    // ...getting only the versions valid up to a specific date...
    "_id.date": { "$lte": someISODate }
})
// ...sort the results descending...
.sort({ "_id.date": -1 })
// ...and take only the first, and therefore newest, entry
.limit(1)
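To cover the paging part of the question as well: with the original document shape from the question, one way is an aggregation pipeline along these lines (a sketch, MongoDB 3.4+ for $replaceRoot; the collection name yourColl, the page number and the page size are assumptions):
var page = 0, pageSize = 10;
db.yourColl.aggregate([
    // keep only this author's documents that are valid up to the reference date
    { $match: { author: "value", date: { $lte: "2015-11-08" } } },
    // newest version first within each id
    { $sort: { id: 1, date: -1 } },
    // take the first (newest) document of each id
    { $group: { _id: "$id", doc: { $first: "$$ROOT" } } },
    { $replaceRoot: { newRoot: "$doc" } },
    // a stable order, then page through the results
    { $sort: { id: 1 } },
    { $skip: page * pageSize },
    { $limit: pageSize }
])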

Unique key in mongoose db

I have the following DB:
{
    "_id" : ObjectId("556da79a77f9f7465943ff93"),
    "guid" : "a12345",
    "update_day" : "12:05:10 02.06.15"
}
{
    "_id" : ObjectId("556dc4509a0a6a002f97e972"),
    "guid" : "bbbb",
    "update_day" : "15:03:10 02.06.15",
    "__v" : 0
}
{
    "_id" : ObjectId("556dc470e57836242f5519eb"),
    "guid" : "bbbb",
    "update_day" : "15:03:10 02.06.15",
    "__v" : 0
}
{
    "_id" : ObjectId("556dc47c7e882d3c2fe9e0fd"),
    "guid" : "bbbb",
    "update_day" : "15:03:10 02.06.15",
    "__v" : 0
}
I want to make the guid unique, so that no duplicate is possible (like a primary key in MySQL). The DB would then look like this:
{
    "_id" : ObjectId("556da79a77f9f7465943ff93"),
    "guid" : "a12345",
    "update_day" : "12:05:10 02.06.15"
}
{
    "_id" : ObjectId("556dc4509a0a6a002f97e972"),
    "guid" : "bbbb",
    "update_day" : "15:03:10 02.06.15",
    "__v" : 0
}
and when I insert another "guid" : "bbbb" (with the save command), it should fail.
While declaring the schema in mongoose, do this:
guid : { type : String, unique : true }
Alternatively, you can spell out the index options explicitly:
guid : { type : String, index : { unique : true } }
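Wrapped in a full schema definition, that might look like the following sketch (the model name and the extra field are assumptions; note that unique in mongoose only creates a MongoDB unique index, it is not a validator, so violations surface as E11000 duplicate key errors on save):
// minimal mongoose schema with a unique guid
var mongoose = require('mongoose');

var thingSchema = new mongoose.Schema({
    guid       : { type : String, unique : true },
    update_day : String
});

var Thing = mongoose.model('Thing', thingSchema);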
First, you have to deal with the current state of your MongoDB collection and delete all the duplicated documents.
One thing is sure: you won't be able to create the unique index with duplicates in your collection, and the dropDups option was deprecated in version 2.7.5 and has since been removed, so you can't use it. It was removed because it was almost impossible to predict which document would be deleted in the process.
Two possible solutions:
Create a new collection. Create the unique index on this new collection and run a batch to copy all the documents from the old collection to the new one, making sure you ignore duplicate key errors during the process.
Deal with it manually in your own collection:
make sure you won't insert more duplicated documents in your code,
run a batch on your collection to delete the duplicates (and make sure you keep the good one if they are not completely identical; see the sketch below),
then add the unique index.
I would declare my guid like so in mongoose:
guid : { type : String, unique : true }
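A rough mongo shell sketch of that manual clean-up (the collection name yourColl is a placeholder, and which duplicate survives here is arbitrary):
// keep one arbitrary document per guid and delete the other duplicates
db.yourColl.aggregate([
    { $group: { _id: "$guid", keep: { $first: "$_id" }, count: { $sum: 1 } } },
    { $match: { count: { $gt: 1 } } }
]).forEach(function (dup) {
    db.yourColl.deleteMany({ guid: dup._id, _id: { $ne: dup.keep } });
});
// once the collection is clean, the unique index can be created
db.yourColl.createIndex({ guid: 1 }, { unique: true });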