Trying to understand why (or whether) Mongoose is updating my documents even though no data has changed?
If I save a new document with the query below, console.log(item) prints this:
{ n: 1,
  nModified: 0,
  upserted: [ { index: 0, _id: 5f3d35c386aeb6c6fb35fa79 } ],
  ok: 1 }
Query
Product.updateOne(
    {productName: product.productName},
    {$set: newProduct},
    {upsert: true}
).then((item) => {
    console.log(item);
}).catch((e) => {
    console.log('Insert error', e);
});
If I rerun the same query, I get this back. That indicates the document has been modified, yet the data is the same; no new data has been inserted.
{ n: 1, nModified: 1, ok: 1 }
I've noticed that if I remove the stores array, delete the document, insert it again, and rerun the query, I get { n: 1, nModified: 0, ok: 1 } back in console.log(item).
I run the same queries the same number of times, but when the object contains an array I get { n: 1, nModified: 1, ok: 1 }, and when it doesn't I get { n: 1, nModified: 0, ok: 1 }.
It seems that when an array is present, the document gets modified regardless of whether the data has changed.
Example 1
Gives { n: 1, nModified: 1, ok: 1 }
const newProduct = {
    ean: product.ean,
    productName: product.productName,
    lowestPrice: product.productPrice,
    mainCategory: categories.mainCategory,
    group: categories.group,
    subCategory: categories.subCategory,
    subSubCategory: subSubCat,
    stores: [{
        name: "foobar",
    }],
};
Example 2
Gives { n: 1, nModified: 0, ok: 1 }
const newProduct = {
    ean: product.ean,
    productName: product.productName,
    lowestPrice: product.productPrice,
    mainCategory: categories.mainCategory,
    group: categories.group,
    subCategory: categories.subCategory,
    subSubCategory: subSubCat,
};
Am I misunderstanding the operation, or what's going on?
What I want to do is:
1. Insert if the document doesn't exist, based on productName.
2. If something differs between the document stored in the database and newProduct, update the document.
3. If nothing differs, do nothing.
Product model
const ProductSchema = new Schema({
    ean: String,
    productName: String,
    mainCategory: String,
    subCategory: String,
    group: String,
    subSubCategory: String,
    lowestPrice: Number,
    isPopular: Boolean,
    description: String,
    stores: [
        {
            name: String,
        },
    ],
});
Edit: As it's pretty hard to explain, I created a small repo that shows the issue.
https://github.com/gameatrix/mongo_array
The database still performs the update.
For example, let's conditionally upsert a value:
MongoDB Enterprise ruby-driver-rs:PRIMARY> db.foo.update({a:42},{a:42},{upsert:true})
WriteResult({
    "nMatched" : 0,
    "nUpserted" : 1,
    "nModified" : 0,
    "_id" : ObjectId("5f3d4ed509fcd40c9f092690")
})
MongoDB Enterprise ruby-driver-rs:PRIMARY> db.foo.update({a:42},{a:42},{upsert:true})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
The first write was an insert, the second write was an update. The second write did not change any data but the database performed a write.
You can verify there was a write by using a change stream in another shell instance:
MongoDB Enterprise ruby-driver-rs:PRIMARY> db.foo.watch()
{ "_id" : { "_data" : "825F3D4ED5000000012B022C0100296E5A1004FAC29486D5A3459A8726349007F2E43E46645F696400645F3D4ED509FCD40C9F0926900004" }, "operationType" : "insert", "clusterTime" : Timestamp(1597853397, 1), "fullDocument" : { "_id" : ObjectId("5f3d4ed509fcd40c9f092690"), "a" : 42 }, "ns" : { "db" : "test", "coll" : "foo" }, "documentKey" : { "_id" : ObjectId("5f3d4ed509fcd40c9f092690") } }
{ "_id" : { "_data" : "825F3D4ED6000000012B022C0100296E5A1004FAC29486D5A3459A8726349007F2E43E46645F696400645F3D4ED509FCD40C9F0926900004" }, "operationType" : "replace", "clusterTime" : Timestamp(1597853398, 1), "fullDocument" : { "_id" : ObjectId("5f3d4ed509fcd40c9f092690"), "a" : 42 }, "ns" : { "db" : "test", "coll" : "foo" }, "documentKey" : { "_id" : ObjectId("5f3d4ed509fcd40c9f092690") } }
By definition an upsert either modifies documents that match the condition or inserts new documents. You are always going to have a write when upserting.
"2. If something differs between the document stored in the database and newProduct, update the document."
That step is not how MongoDB (and most databases, as far as I know) works. Whether a write is performed does not depend on whether the data being written is the same as what is already in the database.
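If you truly want step 3 ("if nothing differs, do nothing"), you have to implement the comparison yourself. Below is a minimal sketch, assuming the Product model from the question and using lodash's isEqual for the deep comparison. Note that Mongoose adds an _id to every stores subdocument by default, which is one reason a $set payload can differ from the stored document even when your own fields are identical.

const { isEqual } = require('lodash');

async function upsertIfChanged(newProduct) {
    // Read the stored document as a plain object, without Mongoose wrappers.
    const existing = await Product.findOne(
        { productName: newProduct.productName }
    ).lean();

    if (!existing) {
        // Step 1: no match, insert.
        return Product.create(newProduct);
    }

    // Strip fields the database adds before comparing: the root _id and
    // version key, plus the per-subdocument _ids Mongoose generates in stores.
    const { _id, __v, ...stored } = existing;
    if (Array.isArray(stored.stores)) {
        stored.stores = stored.stores.map(({ _id, ...fields }) => fields);
    }

    if (isEqual(stored, newProduct)) {
        return null; // Step 3: nothing differs, do nothing.
    }

    // Step 2: something differs, update the existing document.
    return Product.updateOne({ _id }, { $set: newProduct });
}

This costs an extra read per product, but it is one way to guarantee that an unchanged document produces no write at all.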
Related
After scouring the documentation and posts online, there is one thing I have never been clear about with Mongo.
When you are attempting to write documents in bulk to a collection like the example below, is it ever possible that you would get some documents that write successfully, but some that don't?
db.products.insertMany([
    { item: "card", qty: 15 },
    { item: "envelope", qty: 20 },
    { item: "stamps", qty: 30 }
]);
In other words, could you ever get into a situation where you would create documents for the card and envelope items, but not for the stamps item?
I am trying to improve the performance of some of my company's processes and there is some debate in my team as to what kind of error scenarios can really arise from bulk inserts or updates, so if anyone has a clear answer, that would be fantastic. I know that generally mongo queries are not transactional unless you explicitly state so, but this is one area where it just wasn't clear.
Have a look at this example:
db.products.insertMany([
    { _id: 1, item: "card", qty: 15 },
    { _id: 2, item: "envelope", qty: 20 },
    { _id: 1, item: "stamps", qty: 30 }
]);
uncaught exception: BulkWriteError({
    "writeErrors" : [
        {
            "index" : 2,
            "code" : 11000,
            "errmsg" : "E11000 duplicate key error collection: so.products index: _id_ dup key: { _id: 1.0 }",
            "op" : {
                "_id" : 1,
                "item" : "stamps",
                "qty" : 30
            }
        }
    ],
    "writeConcernErrors" : [ ],
    "nInserted" : 2,
    "nUpserted" : 0,
    "nMatched" : 0,
    "nModified" : 0,
    "nRemoved" : 0,
    "upserted" : [ ]
})
The behavior should be clear from the output: the first two documents were inserted, and the write that violated the unique _id constraint failed. Note that by default the documents are inserted in the same order as in your command, and processing stops at the first error, unless you specify the option ordered: false.
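For example, here is a sketch with the duplicate moved to the middle of the batch (same hypothetical collection as above). With the default ordered: true the batch stops at the duplicate and { _id: 2 } is never attempted; with ordered: false every document is attempted and only the duplicate fails:

db.products.insertMany([
    { _id: 1, item: "card", qty: 15 },
    { _id: 1, item: "envelope", qty: 20 },  // duplicate _id: this write fails
    { _id: 2, item: "stamps", qty: 30 }     // still attempted and inserted
], { ordered: false });
// The BulkWriteError now reports a single writeError at index 1,
// with nInserted: 2. With ordered: true, nInserted would be 1 and
// { _id: 2 } would never have been attempted.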
Say I have the fields a and b. I want compound uniqueness such that if a document with a: 1, b: 2 exists, I would not be able to insert a: 2, b: 1.
The reason I want this is because I'm making a "friends list" kind of collection, where if a is connected to b, then it's automatically the reverse as well.
Is this possible at the schema level, or do I need to run queries to check?
If you don't need to differentiate between requester and requestee, you could sort the values before saving or querying so that your two fields a and b have a predictable order for any pair of friend IDs (and you can take advantage of the unique index constraint).
For example, using the mongo shell:
Create a helper function to return friend pairs in predictable order:
function friendpair (friend1, friend2) {
    if (friend1 < friend2) {
        return ({a: friend1, b: friend2})
    } else {
        return ({a: friend2, b: friend1})
    }
}
Add a compound unique index:
> db.friends.createIndex({a:1, b:1}, {unique: true});
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
Insert unique pairs (should work)
> db.friends.insert(friendpair(1,2))
WriteResult({ "nInserted" : 1 })
> db.friends.insert(friendpair(1,3))
WriteResult({ "nInserted" : 1 })
Insert non-unique pair (should return duplicate key error):
> db.friends.insert(friendpair(2,1))
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 11000,
        "errmsg" : "E11000 duplicate key error collection: test.friends index: a_1_b_1 dup key: { : 1.0, : 2.0 }"
    }
})
Search should work in either order:
db.friends.find(friendpair(3,1)).pretty()
{ "_id" : ObjectId("5bc80ed11466009f3b56fa52"), "a" : 1, "b" : 3 }
db.friends.find(friendpair(1,3)).pretty()
{ "_id" : ObjectId("5bc80ed11466009f3b56fa52"), "a" : 1, "b" : 3 }
Instead of handling duplicate key errors or insert versus update, you could also use findAndModify with an upsert since this is expected to be a unique pair:
> var pair = friendpair(2,1)
> db.friends.findAndModify({
    query: pair,
    update: {
        $set: {
            a: pair.a,
            b: pair.b
        },
        $setOnInsert: { status: 'pending' },
    },
    upsert: true
})
{
    "_id" : ObjectId("5bc81722ce51da0e4118c92f"),
    "a" : 1,
    "b" : 2,
    "status" : "pending"
}
It doesn't seem like you can enforce uniqueness on an entire array's values, so I'm doing a kind of workaround, using $jsonSchema as follows:
{
    $jsonSchema: {
        bsonType: "object",
        required: ["status", "users"],
        properties: {
            status: {
                enum: ["pending", "accepted"],
                bsonType: "string"
            },
            users: {
                bsonType: "array",
                description: "references two user_id",
                items: {
                    bsonType: "objectId"
                },
                maxItems: 2,
                minItems: 2
            }
        }
    }
}
Then I use $all to find the connected users, e.g.
db.collection.find( { users: { $all: [ ObjectId1, ObjectId2 ] } } )
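For completeness, here is a sketch of how the validator above can be attached to an existing collection (the collection name connections is an assumption; use whatever your collection is called):

db.runCommand({
    collMod: "connections",
    validator: {
        $jsonSchema: {
            bsonType: "object",
            required: ["status", "users"],
            properties: {
                status: { enum: ["pending", "accepted"], bsonType: "string" },
                users: {
                    bsonType: "array",
                    items: { bsonType: "objectId" },
                    minItems: 2,
                    maxItems: 2
                }
            }
        }
    },
    validationLevel: "strict",   // apply to all inserts and updates
    validationAction: "error"    // reject documents that fail validation
});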
I am new to MongoDB so this is probably a basic question (hopefully). I currently have 10 million records with 410 fields loaded in a mongodb collection like so:
{
    "_id" : ObjectId("........"),
    "AddressID" : 123455,
    "IndividualId" : 1,
    "personfirstname" : "FirstName",
    "personmiddleinitial" : "M",
    "personlastname" : "LastName",
    "etc": "....."
}
I need to wrap all of this data into an embedded document like so:
{
    "_id" : ObjectId("........"),
    "data" : {
        "AddressID" : 123455,
        "IndividualId" : 1,
        "personfirstname" : "FirstName",
        "personmiddleinitial" : "M",
        "personlastname" : "LastName",
        "etc": "....."
    }
}
I don't necessarily need to update this data in-place but that would be nice. If I need to export this data somehow specifying the new format and then re-import the new, updated data that is fine. Performing this via the MongoDB shell would be ideal.
As suggested by chridam in the comments, you can execute the following aggregation pipeline:
db.collectionName.aggregate([
    { $project: { _id: "$_id", data: "$$ROOT" } },
    { $out: "newCollectionName" }
]);
This way you have the _id field both at root level and inside the data object, so you can run a mass update to unset the second one:
db.newCollectionName.updateMany(
    {},
    { $unset: { "data._id": "" } }
);
Finally, you can drop the first collection and rename the second to restore the original name on the updated collection:
db.collectionName.drop();
db.newCollectionName.renameCollection("collectionName");
This approach fully works within the database, avoiding fetching any of your 10 million documents.
You can simply do this in the shell with the following:
db.test.find().forEach(function(doc){
    doc = { _id: doc._id, data: doc };
    delete doc.data._id;
    db.test.save(doc);
});
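Note that db.collection.save() only exists in the legacy mongo shell; it was removed from mongosh. An equivalent sketch for newer shells would use replaceOne instead:

db.test.find().forEach(function(doc){
    var wrapped = { _id: doc._id, data: doc };
    delete wrapped.data._id;
    // Replace each original document with its wrapped version.
    db.test.replaceOne({ _id: wrapped._id }, wrapped);
});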
For example, if we insert the following documents:
> db.test.insertMany([
... {
... _id: ObjectId("5a91af8908e17c5997e03b7e"),
... field1: false,
... field2: 0,
... field3: "No"
... },
... {
... _id: ObjectId("5a91afbc08e17c5997e03b7f"),
... field1: true,
... field2: 1,
... field3: "Yes"
... }])
{
    "acknowledged" : true,
    "insertedIds" : [
        ObjectId("5a91af8908e17c5997e03b7e"),
        ObjectId("5a91afbc08e17c5997e03b7f")
    ]
}
Then run:
db.test.find().forEach(function(doc){
    doc = { _id: doc._id, data: doc };
    delete doc.data._id;
    db.test.save(doc);
});
Our documents now look like this:
> db.test.find().pretty()
{
    "_id" : ObjectId("5a91af8908e17c5997e03b7e"),
    "data" : {
        "field1" : false,
        "field2" : 0,
        "field3" : "No"
    }
}
{
    "_id" : ObjectId("5a91afbc08e17c5997e03b7f"),
    "data" : {
        "field1" : true,
        "field2" : 1,
        "field3" : "Yes"
    }
}
I know this question has been asked many times, but I can't figure out how to update a subdocument in Mongo.
Here's my Schema:
// Schemas
var ContactSchema = new mongoose.Schema({
    first: String,
    last: String,
    mobile: String,
    home: String,
    office: String,
    email: String,
    company: String,
    description: String,
    keywords: []
});
var UserSchema = new mongoose.Schema({
    email: {
        type: String,
        unique: true,
        required: true
    },
    password: {
        type: String,
        required: true
    },
    contacts: [ContactSchema]
});
My collection looks like this:
db.users.find({}).pretty()
{
    "_id" : ObjectId("5500b5b8908520754a8c2420"),
    "email" : "test#random.org",
    "password" : "$2a$08$iqSTgtW27TLeBSUkqIV1SeyMyXlnbj/qavRWhIKn3O2qfHOybN9uu",
    "__v" : 8,
    "contacts" : [
        {
            "first" : "Jessica",
            "last" : "Vento",
            "_id" : ObjectId("550199b1fe544adf50bc291d"),
            "keywords" : [ ]
        },
        {
            "first" : "Tintin",
            "last" : "Milou",
            "_id" : ObjectId("550199c6fe544adf50bc291e"),
            "keywords" : [ ]
        }
    ]
}
Say I want to update the subdocument with id 550199c6fe544adf50bc291e by doing:
db.users.update({_id: ObjectId("5500b5b8908520754a8c2420"), "contacts._id": ObjectId("550199c6fe544adf50bc291e")}, myNewDocument)
with myNewDocument like:
{ "_id" : ObjectId("550199b1fe544adf50bc291d"), "first" : "test" }
It returns an error:
db.users.update({_id: ObjectId("5500b5b8908520754a8c2420"), "contacts._id": ObjectId("550199c6fe544adf50bc291e")}, myNewDocument)
WriteResult({
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
        "code" : 16837,
        "errmsg" : "The _id field cannot be changed from {_id: ObjectId('5500b5b8908520754a8c2420')} to {_id: ObjectId('550199b1fe544adf50bc291d')}."
    }
})
I understand that mongo tries to replace the parent document and not the subdocument, but in the end, I don't know how to update my subdocument.
You need to use the positional $ operator to update a subdocument in an array.
Using contacts.$ will point MongoDB at the relevant subdocument to update.
db.users.update(
    {_id: ObjectId("5500b5b8908520754a8c2420"),
     "contacts._id": ObjectId("550199c6fe544adf50bc291e")},
    {"$set": {"contacts.$": myNewDocument}}
)
I am not sure why you are changing the _id of the subdocument. That is not advisable.
If you want to change a particular field of the subdocument, use contacts.$.<field_name> to update just that field.
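For example, a short sketch using the same _ids as above that updates only the first field of the matched contact:

db.users.update(
    { _id: ObjectId("5500b5b8908520754a8c2420"),
      "contacts._id": ObjectId("550199c6fe544adf50bc291e") },
    { $set: { "contacts.$.first": "test" } }
)
// Only `first` changes on the matched subdocument; its _id and
// remaining fields are left untouched.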
I'm attempting to store pre-aggregated performance metrics in a sharded mongodb according to this document.
I'm trying to update the minute sub-documents in a record that may or may not exist with an upsert like so (self.collection is a pymongo collection instance):
self.collection.update(query, data, upsert=True)
query:
{ '_id': u'12345CHA-2RU020130304',
  'metadata': { 'adaptor_id': 'CHA-2RU',
                'array_serial': 12345,
                'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
                'processor_id': 0}
}
data:
{ 'minute': { '16': { '45': 1.6693091}}}
The problem is that in this case the 'minute' subdocument only ever holds the last { hour: { minute: metric } } entry; it does not gain new entries for other hours, the single entry is always overwritten.
I've also tried this with a $set style data entry:
{ '$set': { 'minute': { '16': { '45': 1.6693091}}}}
but it ends up being the same.
What am I doing wrong?
In both of the examples listed you are simply setting a field ('minute') to a particular value; the only reason it is an addition the first time you update is that the field itself does not exist and so must be created.
It's hard to determine exactly what you are shooting for here, but I think what you could do is alter your schema a little so that 'minute' is an array. Then you could use $push to add values regardless of whether they are already present, or $addToSet if you don't want duplicates.
I had to alter your document a little to make it valid in the shell, so my _id (and some other fields) are slightly different from yours, but it should still be close enough to be illustrative:
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
    "_id" : "u12345CHA-2RU020130304",
    "metadata" : {
        "adaptor_id" : "CHA-2RU",
        "array_serial" : 12345,
        "date" : ISODate("2013-03-18T23:28:50.660Z"),
        "processor_id" : 0
    }
}
Now let's add a minute field with an array of documents instead of a single document:
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '16': {'45': 1.6693091}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
    "_id" : "u12345CHA-2RU020130304",
    "metadata" : {
        "adaptor_id" : "CHA-2RU",
        "array_serial" : 12345,
        "date" : ISODate("2013-03-18T23:28:50.660Z"),
        "processor_id" : 0
    },
    "minute" : [
        {
            "16" : {
                "45" : 1.6693091
            }
        }
    ]
}
Then, to illustrate the addition, add a slightly different entry (since I am using $addToSet, this is required for a new field to be added):
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '17': {'48': 1.6693391}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
    "_id" : "u12345CHA-2RU020130304",
    "metadata" : {
        "adaptor_id" : "CHA-2RU",
        "array_serial" : 12345,
        "date" : ISODate("2013-03-18T23:28:50.660Z"),
        "processor_id" : 0
    },
    "minute" : [
        {
            "16" : {
                "45" : 1.6693091
            }
        },
        {
            "17" : {
                "48" : 1.6693391
            }
        }
    ]
}
I ended up setting the fields like this:
query:
{ '_id': u'12345CHA-2RU020130304',
  'metadata': { 'adaptor_id': 'CHA-2RU',
                'array_serial': 12345,
                'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
                'processor_id': 0}
}
I'm setting the metrics like this:
data = {"$set": {}}
for metric in csv:
    date_utc = metric['date'].astimezone(pytz.utc)
    data["$set"]["minute.%d.%d" % (date_utc.hour,
                                   date_utc.minute)] = float(metric['metric'])
which creates data like this:
{"$set": {'minute.16.45': 1.6693091,
'minute.16.46': 1.566343,
'minute.16.47': 1.22322}}
So that when self.collection.update(query, data, upsert=True) is run, it updates those fields.
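To illustrate why this works: dot notation makes $set merge into the existing minute subdocument instead of replacing it, so entries accumulate across updates. A sketch of the stored document after the update above (the collection name metrics is an assumption):

db.metrics.findOne({ _id: '12345CHA-2RU020130304' })
{
    "_id" : "12345CHA-2RU020130304",
    "metadata" : { /* as in the query above */ },
    "minute" : {
        "16" : {
            "45" : 1.6693091,
            "46" : 1.566343,
            "47" : 1.22322
        }
    }
}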