Update field only if document updated in MongoDB

I am calling findAndModify() with the $max operator to set the value of a field to the larger of its current value and a supplied value, as shown in the MongoDB documentation:
db.scores.update( { _id: 1 }, { $max: { highScore: 950 } } )
I'd like to also set a lastUpdatedTimestamp, but only if the document is actually updated. I can't just use $set, because that would change the timestamp on every call. Is there a mechanism within MongoDB to set another field only if the document is modified? Something similar to $setOnInsert, but for any update.
If there isn't, what might be a good approach? Right now I'm thinking I could perform a regular find first and compare locally: if the new value is greater than the old one, the update will most likely modify the document, so I include the $set for lastUpdatedTimestamp.

You can first query for documents having highScore less than your input value, and update only those. This way lastUpdatedTimestamp is only set when the record actually changes:
db.scores.findAndModify({
    query: { _id: 1, highScore: { $lt: 950 } }, // the score filter ensures the timestamp only changes on a real update
    update: { $set: { "highScore" : 950, "lastUpdatedTimestamp" : new Date() } },
})

As I see it, you want to update your document only if its highScore can actually be raised: only when the document's score is lower than the new value should it be updated, with both the score field and lastUpdatedTimestamp. The best way is to put your new score into the filter, so the update only matches documents where the old score is lower than the new one.
Do it like this:
db.scores.update(
    { _id: 4, highScore: { $lt: 900 } },
    { $set: { highScore: 900 },
      $currentDate: { lastModified: true } }
)
Or set the modification time yourself:
db.scores.update(
    { _id: 4, highScore: { $lt: 900 } },
    { $set: { highScore: 900, lastUpdatedTimestamp: new Date() } }
)

Related

MongoDb: Insert Document in collection only if collection has no newer document since point in time

I want to implement the following use case with MongoDB:
I read from a collection and memorize that particular point in time.
When writing to that collection the next time, I must not be able to write a new document if another document has been added to the collection in between.
Using a timestamp property on the documents would be OK.
Is this possible?
One trick is to use findAndModify.
Assume that at the time of reading, the most recent timestamp on a document is oldTimestamp:
db.collection.findAndModify({
    query: {timestamp: {$gt: oldTimestamp}},
    new: true,    // return the modified / inserted document
    upsert: true, // update if a match is found, insert otherwise
    update: {
        $setOnInsert: {..your document...}
    }
})
This will not insert your document if another document is inserted between your read and write operation.
However, this won't directly tell you whether the document was inserted or not.
You should compare the returned document with your proposed document to find that out.
If you are using the Node.js driver, the correct pattern is:
collection.findAndModify(criteria[, sort[, update[, options]]], callback)
According to the example, our query should be:
db.collection('test').findAndModify(
    {timestamp: {$gt: oldTimestamp}},    // query; timestamp is a property of your document, often the creation time
    [['timestamp', 'desc']],             // sort order
    {$setOnInsert: {..your document..}}, // update document; applied only when inserting
    {
        new: true,
        upsert: true
    },                                   // options
    function(err, object) {
        if (err) {
            console.warn(err.message);   // handle any error from the operation
        } else {
            console.dir(object);
        }
    }
);
This can be achieved using a timestamp property in every document. You can take a look at the Mongoose pre-save path validation hook. Using this hook, you can write something like this:
YourSchema.path('timestamp').validate(function(value, done) {
this.model(YourSchemaModelName).count({ timestamp: {$gt : value} }, function(err, count) {
if (err) {
return done(err);
}
// if count exists and not zero hence document is found with greater timestamp value
done(!count);
});
}, 'Greater timestamp already exists');
Sounds like you'll need to do some sort of optimistic locking at the collection level. I understand you are writing new documents but never updating existing ones in this collection?
You could add an index on the timestamp field, and your application would need to track the latest version of this value. Then, before attempting a new write you could lookup the latest value from the collection with a query like
db.collection.find({}, {timestamp: 1, _id:0}).sort({timestamp:-1}).limit(1)
which would project just the maximum timestamp value using a covered query which is pretty efficient.
From that point on, it's up to your application logic to handle the 'conflict'.
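For illustration, a minimal sketch of that check-then-insert flow in the shell (lastSeenTimestamp and newDocument are hypothetical names for the memorized value and the proposed document); note that a race between the check and the insert is still possible, which is why the findAndModify/$setOnInsert trick above is safer:
var latest = db.collection.find({}, {timestamp: 1, _id: 0})
                          .sort({timestamp: -1}).limit(1).toArray()[0];
if (!latest || latest.timestamp.getTime() <= lastSeenTimestamp.getTime()) {
    db.collection.insert(newDocument); // no newer document appeared since our read
} else {
    // conflict: another document was added in between; handle it in the application
}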

Documents with tags in MongoDB: getting tag counts

I have a collection1 of documents with tags in MongoDB. The tags are an embedded array of strings:
{
name: 'someObj',
tags: ['tag1', 'tag2', ...]
}
I want to know the count of each tag in the collection. Therefore I have another collection2 with tag counts:
{ tag: 'tag1', score: 2 }
{ tag: 'tag2', score: 10 }
Now I have to keep both in sync. It is rather trivial when inserting to or removing from collection1. However when I update collection1 I do the following:
1.) get the old document
var oldObj = collection1.findOne({ _id: id }); // findOne: we need the document itself, not a cursor
2.) calculate the difference between the old and new tag arrays (the diff here uses jQuery's .not())
var removedTags = $(oldObj.tags).not(obj.tags).get();
var insertedTags = $(obj.tags).not(oldObj.tags).get();
3.) update the old document
collection1.update(
{ _id: id },
{ $set: obj }
);
4.) update the scores of inserted & removed tags
// increment score of each inserted tag
insertedTags.forEach(function(val, idx) {
// $inc will set score = 1 on insert
collection2.update(
{ tag: val },
{ $inc: { score: 1 } },
{ upsert: true }
)
});
// decrement score of each removed tag
removedTags.forEach(function(val, idx) {
// $inc will set score = -1 on insert
collection2.update(
{ tag: val },
{ $inc: { score: -1 } },
{ upsert: true }
)
});
My questions:
A) Is this approach of keeping the scores in a separate collection efficient? Or is there a more efficient one-time query to get the scores directly from collection1?
B) Even if separate bookkeeping is the better choice: can it be done in fewer steps, e.g. by letting MongoDB calculate which tags were added / removed?
The solution, as nickmilion correctly states, would be an aggregation. Though I would do it with a twist: we'll save its results in a collection. What we do is trade real-time results for an extreme speed boost.
How I would do it
More often than not, the need for real-time results is overestimated. Hence, I'd go with precalculated stats for the tags and renew them every 5 minutes or so. That should be good enough, since most such calls are requested asynchronously by the client, so some delay in case the calculation has to be made for a specific request is negligible.
db.tags.aggregate(
{$unwind:"$tags"},
{$group: { _id:"$tags", score:{"$sum":1} } },
{$out:"tagStats"}
)
db.tagStats.update(
{'lastRun':{$exists:true}},
{'lastRun':new Date()},
{upsert:true}
)
db.tagStats.ensureIndex({lastRun:1}, {sparse:true})
Ok, here is the deal. First, we unwind the tags array, group it by the individual tags and increment the score for each occurrence of the respective tag. Next, we upsert lastRun in the tagStats collection, which we can do since MongoDB is schemaless. Next, we create a sparse index, which only holds values for documents in which the indexed field exists. If the index already exists, ensureIndex is an extremely cheap operation, and since we issue it from our code anyway, we don't need to create the index manually. With this procedure, the following query
db.tagStats.find(
{lastRun:{ $lte: new Date( ISODate().getTime() - 300000 ) } },
{_id:0, lastRun:1}
)
becomes a covered query: a query which is answered from the index alone, which tends to reside in RAM, making this query lightning fast (slightly less than 0.5 ms median in my tests). So what does this query do? It returns a record when the last run of the aggregation was more than 5 minutes (5*60*1000 = 300000 ms) ago. Of course, you can adjust this to your needs.
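If you want to verify that the query is really covered, you can inspect its plan (a quick sanity check; explain("executionStats") assumes MongoDB 3.0+):
db.tagStats.find(
    {lastRun:{ $lte: new Date( ISODate().getTime() - 300000 ) } },
    {_id:0, lastRun:1}
).explain("executionStats")
// a covered query reports totalDocsExamined: 0 in the executionStats section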
Now, we can wrap it up:
var hasToRun = db.tagStats.findOne(
    {lastRun:{ $lte: new Date( ISODate().getTime() - 300000 ) } },
    {_id:0, lastRun:1}
); // findOne() returns the document or null; find() would return a cursor, which is always truthy
// also run if the stats have never been calculated yet (no lastRun document at all)
if( hasToRun || db.tagStats.findOne({lastRun:{$exists:true}}) === null ){
    db.tags.aggregate(
        {$unwind:"$tags"},
        {$group: {_id:"$tags", score:{"$sum":1} } },
        {$out:"tagStats"}
    )
    db.tagStats.update(
        {'lastRun':{$exists:true}},
        {'lastRun':new Date()},
        {upsert:true}
    );
    db.tagStats.ensureIndex({lastRun:1},{sparse:true});
}
// For all stats
var tagsStats = db.tagStats.find({score:{$exists:true}});
// score for a specific tag
var scoreForTag = db.tagStats.find({score:{$exists:true},_id:"tag1"});
Alternative approach
If real time results really matter and you need the stats for all the tags, simply use the aggregation without saving it to another collection:
db.tags.aggregate(
{$unwind:"$tags"},
{$group: { _id:"$tags", score:{"$sum":1} } },
)
If you only need the results for one specific tag at a time, a real time approach could be to use a special index, create a covered query and simply count the results:
db.tags.ensureIndex({tags:1})
var numberOfOccurences = db.tags.find({tags:"tag1"},{_id:0,tags:1}).count();
Answering your questions:
B) You don't have to calculate the diff yourself; use $addToSet (and $pull) and let the server do it.
A) You can get the counts via the aggregation framework with a combination of $unwind and $group.
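For illustration, a minimal sketch of B), assuming the documents from the question (the concrete tag values are made up): $addToSet only adds tags that are not already in the array, and $pull removes the given ones, so no client-side diff is needed.
// add tags without creating duplicates inside the array
db.collection1.update({ _id: id }, { $addToSet: { tags: { $each: ["tag1", "tag3"] } } });
// remove the tags that should no longer be present
db.collection1.update({ _id: id }, { $pull: { tags: { $in: ["tag2"] } } });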

How to remove duplicates based on a key in MongoDB?

I have a collection in MongoDB with around 3 million records. A sample record looks like this:
{
    "_id" : ObjectId("50731xxxxxxxxxxxxxxxxxxxx"),
    "source_references" : [
        {
            "_id" : ObjectId("5045xxxxxxxxxxxxxx"),
            "name" : "xxx",
            "key" : 123
        }
    ]
}
I have a lot of duplicate records in the collection having the same source_references.key. (By duplicate I mean source_references.key, not the _id.)
I want to remove the duplicate records based on source_references.key. I'm thinking of writing some PHP code to traverse each record and remove the duplicates where they exist.
Is there a way to remove the duplicates from the Mongo shell directly?
This answer is obsolete: the dropDups option was removed in MongoDB 3.0, so a different approach is required in most cases. For example, you could use aggregation as suggested in: MongoDB duplicate documents even after adding unique key.
If you are certain that the source_references.key identifies duplicate records, you can ensure a unique index with the dropDups:true index creation option in MongoDB 2.6 or older:
db.things.ensureIndex({'source_references.key' : 1}, {unique : true, dropDups : true})
This will keep the first unique document for each source_references.key value, and drop any subsequent documents that would otherwise cause a duplicate key violation.
Important Note: Any documents missing the source_references.key field will be considered as having a null value, so subsequent documents missing the key field will be deleted. You can add the sparse:true index creation option so the index only applies to documents with a source_references.key field.
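For example, the sparse variant would look like this (the same statement as above with the extra option):
db.things.ensureIndex({'source_references.key' : 1}, {unique : true, dropDups : true, sparse : true})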
Obvious caution: Take a backup of your database, and try this in a staging environment first if you are concerned about unintended data loss.
This is the easiest query I used on my MongoDB 3.2:
db.myCollection.find({}, {myCustomKey: 1}).sort({_id: 1}).forEach(function(doc) {
    // for each document, remove every later document (greater _id) that shares the same key
    db.myCollection.remove({_id: {$gt: doc._id}, myCustomKey: doc.myCustomKey});
})
Index your customKey before running this to increase speed
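For example (assuming the field is literally called myCustomKey, as in the snippet above):
db.myCollection.createIndex({ myCustomKey: 1 })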
While @Stennie's is a valid answer, it is not the only way. In fact, the MongoDB manual asks you to be very cautious while doing that. There are two other options:
Let MongoDB do it for you using Map-Reduce.
Do it programmatically, which is less efficient.
Here is a slightly more 'manual' way of doing it:
Essentially, first get a list of all the unique keys you are interested in.
Then perform a search using each of those keys, and delete all but one of the documents each search returns.
db.collection.distinct("key").forEach((num)=>{
var i = 0;
db.collection.find({key: num}).forEach((doc)=>{
if (i) db.collection.remove({key: num}, { justOne: true })
i++
})
});
I had a similar requirement but I wanted to retain the latest entry. The following query worked with my collection which had millions of records and duplicates.
/** Create a array to store all duplicate records ids*/
var duplicates = [];
/** Start Aggregation pipeline*/
db.collection.aggregate([
{
$match: { /** Add any filter here. Add index for filter keys*/
filterKey: {
$exists: false
}
}
},
{
$sort: { /** Sort it in such a way that you want to retain first element*/
createdAt: -1
}
},
{
$group: {
_id: {
key1: "$key1", key2:"$key2" /** These are the keys which define the duplicate. Here document with same value for key1 and key2 will be considered duplicate*/
},
dups: {
$push: {
_id: "$_id"
}
},
count: {
$sum: 1
}
}
},
{
$match: {
count: {
"$gt": 1
}
}
}
],
{
allowDiskUse: true
}).forEach(function(doc){
doc.dups.shift();
doc.dups.forEach(function(dupId){
duplicates.push(dupId._id);
})
})
/** Delete the duplicates*/
var i,j,temparray,chunk = 100000;
for (i=0,j=duplicates.length; i<j; i+=chunk) {
temparray = duplicates.slice(i,i+chunk);
db.collection.bulkWrite([{deleteMany:{"filter":{"_id":{"$in":temparray}}}}])
}
Expanding on Fernando's answer, I found that it was taking too long, so I modified it.
var x = 0;
db.collection.distinct("field").forEach(fieldValue => {
var i = 0;
db.collection.find({ "field": fieldValue }).forEach(doc => {
if (i) {
db.collection.remove({ _id: doc._id });
}
i++;
x += 1;
if (x % 100 === 0) {
print(x); // Every time we process 100 docs.
}
});
});
The improvement is basically using the document id for removal, which should be faster, and adding progress reporting; you can change the iteration value to your desired amount.
Also, indexing the field before the operation helps.
pip install mongo_remove_duplicate_indexes
Alternatively, create a script in any language along these lines:
Iterate over your collection.
Create a new collection, and create a unique index on it for the field you wish to deduplicate by; remember this index has to be on the same field (with the same name) as in your original collection.
For example, you have a collection gaming, and in this collection a field genre which contains duplicates you wish to remove. Just create a new collection:
db.createCollection("cname")
Create the unique index:
db.cname.createIndex({ genre: 1 }, { unique: true })
Now when you insert documents with the same genre, only the first will be accepted; the others will be rejected with a duplicate key error.
Then just insert the values you retrieved into the new collection, and handle the failures with exception handling, e.g. pymongo.errors.DuplicateKeyError.
Check out the source code of the mongo_remove_duplicate_indexes package for a better understanding.
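If you'd rather do the same from the mongo shell, a minimal sketch of the idea (collection and field names taken from the example above):
db.createCollection("cname");
db.cname.createIndex({ genre: 1 }, { unique: true });
db.gaming.find().forEach(function (doc) {
    // in the legacy shell, insert() reports failures via the returned WriteResult
    // instead of throwing; duplicates fail with error code 11000
    var res = db.cname.insert(doc);
    if (res.hasWriteError() && res.getWriteError().code === 11000) {
        // a document with this genre already exists; skip the duplicate
    }
});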
If you have enough memory, you can do something like this in Scala (pseudocode against a synchronous driver, with all documents loaded into memory):
cole.find()
    .groupBy(_.customField)  // group the documents by the duplicate key
    .filter(_._2.size > 1)   // keep only the groups that contain duplicates
    .flatMap(_._2.tail)      // every document except the first one of each group
    .map(_.id)
    .foreach(x => cole.remove("id" $eq x))

Mongo add timestamp field from existing date field

I currently have a collection with documents like the following:
{ foo: 'bar', timeCreated: ISODate("2012-06-28T06:51:48.374Z") }
I would now like to add a timestampCreated key to the documents in this collection, to make querying by time easier.
I was able to add the new field with an update and $set operation, but instead of the value of timeCreated it appears to be setting the current timestamp, using this:
db.reports.update({}, {
$set : {
timestampCreated : new Timestamp(new Date('$.timeCreated'), 0)
}
}, false, true);
However, I have not been able to figure out a way to add this field and set its value to the timestamp of the existing timeCreated field.
Do a find for all the documents, limited to just the _id and timeCreated fields. Then loop over them, generate the timestampCreated value, and do an update on each.
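A minimal sketch of that loop in the shell (field names follow the question; getTime() yields milliseconds since the epoch, so adjust if you really need a BSON Timestamp):
db.reports.find({ timeCreated: { $exists: true } }, { _id: 1, timeCreated: 1 }).forEach(function (doc) {
    // copy the existing date over as a numeric timestamp
    db.reports.update(
        { _id: doc._id },
        { $set: { timestampCreated: doc.timeCreated.getTime() } }
    );
});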
Use updateMany(), which can accept an aggregation pipeline (starting from MongoDB 4.2) and thus take advantage of the $toLong operator, which converts a Date into the number of milliseconds since the epoch.
Also use a $type check in the update filter to limit the operation to documents that have a timeCreated field of Date type:
db.reports.updateMany(
{ 'timeCreated': {
'$exists': true,
'$type': 9 // BSON type 9 = Date
} },
[
{ '$set': {
'timestampCreated': { '$toLong': '$timeCreated' }
} }
]
)

Multiply field by value in Mongodb

I've been looking for a way to write an update statement that takes an existing numeric field and modifies it using an expression. For example, if I have a field called Price, is it possible to do an update that sets Price to 50% of its existing value?
So, given { Price : 19.99 }
I'd like to do db.collection.update({tag : "refurb"}, {$set : {Price : Price * 0.50 }}, false, true);
Can this be done, or do I have to read the value back to the client, modify it, and then update? I guess the question is: can expressions be used in update, and can they reference the document being updated?
You can run server-side code with db.eval().
db.eval(function() {
db.collection.find({tag : "refurb"}).forEach(function(e) {
e.Price = e.Price * 0.5;
db.collection.save(e);
});
});
Note this will block the DB, so it's better to do a find-then-update pair of operations instead. (db.eval() was deprecated in MongoDB 3.0, and the eval command was removed in 4.2.)
See https://docs.mongodb.com/manual/core/server-side-javascript/
In Mongo 2.6.x there is a new $mul operator. It multiplies the value of the field by the given number, with the following syntax:
{
$mul: { field: <number> }
}
So in your case you will need to do the following:
db.collection.update(
{ tag : "refurb"},
{ $mul: { Price : 0.5 } }
);
Starting Mongo 4.2, db.collection.update() can accept an aggregation pipeline, finally allowing the update of a field based on another field:
// { price: 19.99 }
// { price: 2.04 }
db.collection.update(
{},
[{ $set: { price: { $multiply: [ 0.5, "$price" ] } } }],
{ multi: true }
)
// { price: 9.995 }
// { price: 1.02 }
The first part {} is the match query, filtering which documents to update (all documents in this case).
The second part [{ $set: { price: ... } }] is the update aggregation pipeline (note the square brackets signifying the use of an aggregation pipeline). $set is a new aggregation stage, an alias of $addFields. Note how price is modified directly based on its own value ($price).
Don't forget { multi: true }, otherwise only the first matching document will be updated.
Well, this is not possible with a single atomic operation such as $set.
You have several options:
use the db.eval() solution proposed by pingw33n
retrieve the document you want to modify to get the current value, and modify it with a $set
if you have a high operation rate, you might want to be sure the document has not changed while you fetched its value (as in the previous solution), so you might want to use a findAndModify operation (see this page to get inspired on how to do it; a sketch follows below).
It really depends on your context: with very low pressure on the db, I'd go for the solution of pingw33n. With a very high operation rate, I'd use the third solution.
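For illustration, a minimal sketch of the third option for a single document (an optimistic-concurrency retry loop; names follow the question):
var doc = db.collection.findOne({ tag: "refurb" });
while (doc) {
    var updated = db.collection.findAndModify({
        query: { _id: doc._id, Price: doc.Price },     // matches only if Price is still unchanged
        update: { $set: { Price: doc.Price * 0.5 } }
    });
    if (updated) break;                                // success: nobody modified it in between
    doc = db.collection.findOne({ _id: doc._id });     // conflict: re-read the current value and retry
}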