I've answered this question on LinkedIn and I thought it's something useful and interesting to share. The question was:
"Suppose we have documents like {_id: ..., data: ..., timestamp: ...}.
Is there any way to write update criteria which will satisfy following rules:
1 If there is no documents with following _id then insert this document;
2 If there is exists document with following _id then
2.1 If new timestamp greater then stored timestamp then update data;
2.2 Otherwise do nothing"
Solution below should do the trick, you just need to ignore dup key errors. Example is given in Mongo shell:
> var lastUpdateTime = ISODate("2013-09-10")
> var newUpdateTime = ISODate("2013-09-12")
>
> lastUpdateTime
ISODate("2013-09-10T00:00:00Z")
> newUpdateTime
ISODate("2013-09-12T00:00:00Z")
>
> var id = new ObjectId()
> id
ObjectId("52310502f3bf4823f81e7fc9")
>
> // collection is empty, first update will do insert:
> db.testcol.update(
... {"_id" : id, "ts" : { $lt : lastUpdateTime } },
... { $set: { ts: lastUpdateTime, data: 123 } },
... { upsert: true, multi: false }
... );
>
> db.testcol.find()
{ "_id" : ObjectId("52310502f3bf4823f81e7fc9"), "data" : 123, "ts" : ISODate("2013-09-10T00:00:00Z") }
>
> // try one more time to check that nothing happens (due to error):
> db.testcol.update(
... {"_id" : id, "ts" : { $lt : lastUpdateTime } },
... { $set: { ts: lastUpdateTime, data: 123 } },
... { upsert: true, multi: false }
... );
E11000 duplicate key error index: test.testcol.$_id_ dup key: { : ObjectId('52310502f3bf4823f81e7fc9') }
>
> var tooOldToUpdate = ISODate("2013-09-09")
>
> // update does not happen because query condition does not match
> // and mongo tries to insert with the same id (and fails with dup again):
> db.testcol.update(
... {"_id" : id, "ts" : { $lt : tooOldToUpdate } },
... { $set: { ts: tooOldToUpdate, data: 999 } },
... { upsert: true, multi: false }
... );
E11000 duplicate key error index: test.testcol.$_id_ dup key: { : ObjectId('52310502f3bf4823f81e7fc9') }
>
> // now query cond actually matches, so update rather than insert happens which works
> // as expected:
> db.testcol.update(
... {"_id" : id, "ts" : { $lt : newUpdateTime } },
... { $set: { ts: newUpdateTime, data: 999 } },
... { upsert: true, multi: false }
... );
>
> // check that everything worked:
> db.testcol.find()
{ "_id" : ObjectId("52310502f3bf4823f81e7fc9"), "data" : 999, "ts" : ISODate("2013-09-12T00:00:00Z") }
>
The only annoying part are those errors, but they are cheap and safe.
db.collection.update({
_id: ObjectId("<id>"))
},
{timestamp: <newTimestamp>, data: <data>},
{upsert: true})
This operation would update an existing document if it meets the condition that it exists and that the existing timestamp is less than the newTimestamp; otherwise will insert a new document.
Related
Assume one document in student collection as follows:
{
_id:{
time: 1
loc: 'A'
},
name: 'Jery'
}
I want to filter _id.time, like so:
db.student.find({}, {'_id.time':1})
But I got a result:
{
_id:{
time: 1
loc: 'A'
}
}
Why the result is not as follows:
{
_id:{
time: 1
}
}
So, where is my error? And how to write the query operation.
Thank you.
Just tried a few things but it seems to be a limitation:
> db.students.find({}, {'_id':0, '_id.time': 1})
{ }
{ }
> db.students.find({}, {'_id.loc':0, '_id.time': 1})
Error: error: {
"ok" : 0,
"errmsg" : "Projection cannot have a mix of inclusion and exclusion.",
"code" : 2,
"codeName" : "BadValue"
}
You can however easily achieve this with an aggregation query with a $project stage.
> db.students.aggregate( { $project: { '_id.time':1 } } )
{ "_id" : { "time" : 1 } }
{ "_id" : { "time" : 2 } }
Say I have the fields a and b. I want to have a compound uniqueness where if a: 1, b: 2, I would not be able to do a: 2, b: 1.
The reason I want this is because I'm making a "friends list" kind of collection, where if a is connected to b, then it's automatically the reverse as well.
is this possible on a schema level or do I need to do queries to check.
If you don't need to differentiate between requester and requestee, you could sort the values before saving or querying so that your two fields a and b have a predictable order for any pair of friend IDs (and you can take advantage of the unique index constraint).
For example, using the mongo shell:
Create a helper function to return friend pairs in predictable order:
function friendpair (friend1, friend2) {
if ( friend1 < friend2) {
return ({a: friend1, b: friend2})
} else {
return ({a: friend2, b: friend1})
}
}
Add a compound unique index:
> db.friends.createIndex({a:1, b:1}, {unique: true});
{
"createdCollectionAutomatically" : true,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
Insert unique pairs (should work)
> db.friends.insert(friendpair(1,2))
WriteResult({ "nInserted" : 1 })
> db.friends.insert(friendpair(1,3))
WriteResult({ "nInserted" : 1 })
Insert non-unique pair (should return duplicate key error):
> db.friends.insert(friendpair(2,1))
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 11000,
"errmsg" : "E11000 duplicate key error collection: test.friends index: a_1_b_1 dup key: { : 1.0, : 2.0 }"
}
})
Search should work in either order:
db.friends.find(friendpair(3,1)).pretty()
{ "_id" : ObjectId("5bc80ed11466009f3b56fa52"), "a" : 1, "b" : 3 }
db.friends.find(friendpair(1,3)).pretty()
{ "_id" : ObjectId("5bc80ed11466009f3b56fa52"), "a" : 1, "b" : 3 }
Instead of handling duplicate key errors or insert versus update, you could also use findAndModify with an upsert since this is expected to be a unique pair:
> var pair = friendpair(2,1)
> db.friends.findAndModify({
query: pair,
update: {
$set: {
a : pair.a,
b : pair.b
},
$setOnInsert: { status: 'pending' },
},
upsert: true
})
{
"_id" : ObjectId("5bc81722ce51da0e4118c92f"),
"a" : 1,
"b" : 2,
"status" : "pending"
}
Doesn't seem like you can do a unique on the entire array's values so I'm doing a kind of work around. I'm using the $jsonSchema as follows:
{
$jsonSchema:
{
bsonType:
"object",
required:
[
"status",
"users"
],
properties:
{
status:
{
enum:
[
"pending",
"accepted"
],
bsonType:
"string"
},
users:
{
bsonType:
"array",
description:
"references two user_id",
items:
{
bsonType:
"objectId"
},
maxItems:
2,
minItems:
2,
},
}
}
}
then I will use $all to find the connected users, e.g.
db.collection.find( { users: { $all: [ ObjectId1, ObjectId2 ] } } )
I am new to MongoDB so this is probably a basic question (hopefully). I currently have 10 million records with 410 fields loaded in a mongodb collection like so:
{
"_id" : ObjectId("........"),
"AddressID" : 123455,
"IndividualId" : 1,
"personfirstname" : "FirstName",
"personmiddleinitial" : "M",
"personlastname" : "LastName",
"etc": "....."
}
I need to wrap all of this data into an embedded document like so:
{
"_id" : ObjectId("........"),
"data" : {
"AddressID" : 123455,
"IndividualId" : 1,
"personfirstname" : "FirstName",
"personmiddleinitial" : "M",
"personlastname" : "LastName",
"etc": "....."
}
I don't necessarily need to update this data in-place but that would be nice. If I need to export this data somehow specifying the new format and then re-import the new, updated data that is fine. Performing this via the MongoDB shell would be ideal.
As suggested by chridam within comments you can execute the following aggregation pipeline:
db.collectionName.aggregate([
{ $project: { _id: "$_id", data: "$$ROOT" } },
{ $out: "newCollectionName" }
]);
This way you have the _id field both at root level and in the data object. Thus, you can execute a massive update to unset the second one:
db.newCollectionName.updateMany(
{},
{ $unset: { "data._id": "" } }
);
Finally, you can drop the first collection and rename the second to restore the original name on the updated collection:
db.collectionName.drop();
db.newCollectionName.rename("collectionName");
This approach fully works within the database, avoiding fetching any of your 10 million documents.
You can simply do this in the shell with the following
db.test.find().forEach(function(doc){
doc = { _id: doc._id, data: doc };
delete doc.data._id;
db.test.save(doc);
});
For example, if we insert the following documents:
> db.test.insertMany([
... {
... _id: ObjectId("5a91af8908e17c5997e03b7e"),
... field1: false,
... field2: 0,
... field3: "No"
... },
... {
... _id: ObjectId("5a91afbc08e17c5997e03b7f"),
... field1: true,
... field2: 1,
... field3: "Yes"
... }])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("5a91af8908e17c5997e03b7e"),
ObjectId("5a91afbc08e17c5997e03b7f")
]
}
Then run:
db.test.find().forEach(function(doc){
doc = { _id: doc._id, data: doc };
delete doc.data._id;
db.test.save(doc);
});
Our documents now look like this:
> db.test.find().pretty()
{
"_id" : ObjectId("5a91af8908e17c5997e03b7e"),
"data" : {
"field1" : false,
"field2" : 0,
"field3" : "No"
}
}
{
"_id" : ObjectId("5a91afbc08e17c5997e03b7f"),
"data" : {
"field1" : true,
"field2" : 1,
"field3" : "Yes"
}
}
I'm trying to implement a 'server-side counter-versioned' item in mongodb and trying to do following /* using Java API */
Document dbDoc = dbCollection.findOneAndUpdate(
new Document("_id", "meta"),
new Document("$inc", new Document("version", 1))
.append("$setOnInsert", new Document("version", 0)),
new FindOneAndUpdateOptions().upsert(true)
.returnDocument(ReturnDocument.AFTER));
Assumed logic is simple: if there is no record in database - start counting from zero (and with a whole fresh new object), otherwise - increment counter.
Code sample fails with: 'Cannot update 'version' and 'version' at the same time'
My assumption is that in 'upsert' mode mongo should only use "$setOnInsert" when no matching item is found - but it works in some other way.
Is it possible to implement such operation in one atomic mongoDB call?
PS: MongoDB documentation regarding findOneAndUpdate() and upsert() is fuzzy - at least I cannot get why this error arizes from their description.
Also there is similar question here - findAndModify fails with error: "Cannot update 'field1' and 'field1' at the same time - accepted, but again with no clear reasoning.
You can just remove the update operator of $setOnInsert as this will be set to the value specified in the $inc operator if the document does not exist
https://docs.mongodb.com/v3.2/reference/operator/update/inc/#behavior
If the field does not exist, $inc creates the field and sets the field to the specified value.
Example from the mongo shell:
> db.dropDatabase()
{ "ok" : 1 }
> db.test.findOneAndUpdate({_id: "meta"}, { $inc: { version: 1} }, {upsert: true, returnNewDocument: true})
{ "_id" : "meta", "version" : 1 }
> db.test.findOneAndUpdate({_id: "meta"}, { $inc: { version: 1} }, {upsert: true, returnNewDocument: true})
{ "_id" : "meta", "version" : 2 }
> db.test.findOneAndUpdate({_id: "meta"}, { $inc: { version: 1} }, {upsert: true, returnNewDocument: true})
{ "_id" : "meta", "version" : 3 }
> db.test.findOneAndUpdate({_id: "meta"}, { $inc: { version: 1} }, {upsert: true, returnNewDocument: true})
{ "_id" : "meta", "version" : 4 }
> db.test.findOneAndUpdate({_id: "meta"}, { $inc: { version: 1} }, {upsert: true, returnNewDocument: true})
{ "_id" : "meta", "version" : 5 }
If you need to set a given version on the first initial insert, then mongodb does not support any operators to support this atomically, however the follow would be safe and is a common workaround:
> db.dropDatabase()
{ "dropped" : "test", "ok" : 1 }
> function updateMeta(){
... function update(){
... return db.test.findOneAndUpdate({_id: "meta"}, { $inc: { version: 1} }, {returnNewDocument: true});
... }
...
... var result = update();
...
... if(result === null){
... db.test.insert({_id: "meta", version: -10});
... result = update();
... }
...
... return result;
... }
>
> updateMeta()
{ "_id" : "meta", "version" : -9 }
> updateMeta()
{ "_id" : "meta", "version" : -8 }
> updateMeta()
{ "_id" : "meta", "version" : -7 }
> updateMeta()
{ "_id" : "meta", "version" : -6 }
> updateMeta()
{ "_id" : "meta", "version" : -5 }
>
I created auto-incrementing sequence field via counters collection and getNextSequence() function (absolutely like in docs)
According another document JavaScript functions are stored in a special system collection named system.js
But there is no such collection at my database (a least db.system.js.find() shows empty result):
> db.dropDatabase();
{ "dropped" : "mongopa", "ok" : 1 }
> version()
3.2.5
> db.counters.insert({_id: "userid", seq: 0 })
WriteResult({ "nInserted" : 1 })
> db.counters.find()
{ "_id" : "userid", "seq" : 0 }
> function getNextSequence(name) {
... var ret = db.counters.findAndModify(
... {
... query: { _id: name },
... update: { $inc: { seq: 1 } },
... new: true
... }
... );
...
... return ret.seq;
... }
> db.system.js.find()
> show collections
counters
> db.users.insert({"login":"demo","user_id":getNextSequence("userid"),"password":"demo"})
WriteResult({ "nInserted" : 1 })
> db.users.find()
{ "_id" : ObjectId("574ff1c7436a1b4f9c6f47b9"), "login" : "demo", "user_id" : 1, "password" : "demo" }
> db.users.insert({"login":"demo2","user_id":getNextSequence("userid"),"password":"demo2"})
WriteResult({ "nInserted" : 1 })
> db.users.find()
{ "_id" : ObjectId("574ff1c7436a1b4f9c6f47b9"), "login" : "demo", "user_id" : 1, "password" : "demo" }
{ "_id" : ObjectId("574ff1d6436a1b4f9c6f47ba"), "login" : "demo2", "user_id" : 2, "password" : "demo2" }
>
So where does the getNextSequence function really stored?
When you define the function as,
function getNextSequence(name) {
var ret = db.counters.findAndModify(
{
query: { _id: name },
update: { $inc: { seq: 1 } },
new: true
}
);
return ret.seq;
}
It is merely defined for that particular session and is not available to you once the session ends. Hence, its not saved anywhere.
To make the function re-usable across the sessions, you need to explicitly save the function is system.js by using,
db.system.js.save(
{
_id: "getNextSequence",
value: function(name){var ret = db.counters.findAndModify({
query: { _id: name },
update: { $inc: { seq: 1 } },
new: true
});
return ret.seq;}
})
Once you have saved the function, you can cross-check it by,
db.system.js.find()
You need to call this
db.loadServerScripts();
across the sessions. It loads all the scripts saved in system.js collection.
For details, please check here.