MongoDB: verifying document was not updated

We would like to add some auditing to our current project. For that we came up with the following scenario, but I don't see how to make points 1 and 2 atomic.
Scenario
Every document has to have a timestamp that will serve as a version. When saving a document we will:
1. Verify the document was not changed: first compare the timestamp of the latest stored document (docLatest) with the document we would like to store (docUpdated). The timestamps must be equal. If not, the save request is refused; if they match, go to the next point.
2. Update the document.
3. Create a diff with the previous doc: the latest document must now be our just-saved document. We will create a diff and store it.

I stumbled upon this idea once; my approach utilizes the long-polling technique. I am not going to tell you how to architect your data, but you can convert the date to a numeric value and compare by that.
For points 1 and 2, you can convert the Date format to a number; the schema will look something like this:
var documentSchema = {
    // Date.now (passed as a function) stamps each new document with the
    // current time in ms; Date.parse(new Date()) would be evaluated only
    // once, at schema definition time, giving every document the same value
    updatedAt: { type: Number, default: Date.now }
};
Then, for every document submitted by the client, just check the timestamps:

if (latestDocument.updatedAt - prevDocument.updatedAt > 0) {
    // the latest timestamp is newer than the previous one, so store it in MongoDB
} else {
    // the submitted document is the same age or older, so ignore it
}
For number 3, I found myself asking: if the document has changed, do you even need to diff it? I decided to follow React/Flux's approach: if the document has changed, just replace the whole document.
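As for making points 1 and 2 atomic: MongoDB evaluates an update's filter and applies its modifications atomically per document, so a single conditional update can do the check and the write in one step. A minimal shell sketch of that compare-and-set, assuming a docs collection and a body field (both placeholders, not from the original post):

// optimistic lock: the update matches only if the stored timestamp
// still equals the one we originally read (docLatest.updatedAt)
var result = db.docs.updateOne(
    { _id: docUpdated._id, updatedAt: docLatest.updatedAt },
    { $set: { body: docUpdated.body, updatedAt: Date.now() } }
);
if (result.modifiedCount === 0) {
    // someone else saved in the meantime; refuse this save request
}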

Related

Remove document if timestamp is too old

I know it's possible to remove documents whose saved date has already passed with this command:
db.course.deleteMany({date: {"$lt": ISODate()}})
I'm trying to do the same thing, but checking whether a timestamp has passed.
All saved timestamps are like this one
1492466400000
Is it possible to write a command with a condition that deletes all documents whose timestamp is too old?
EDIT
I use millisecond timestamps.
In MongoDB 3.2 you can also use TTL indexes, which let you tell MongoDB "please remove all documents whose {fieldDate} is older than 3600 seconds". This is pretty useful for logging (e.g. remove all logs older than 3 months).
Maybe it's not your use case, but I think it's good to know.
TTL indexes are special single-field indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time or at a specific clock time. Data expiration is useful for certain types of information like machine generated event data, logs, and session information that only need to persist in a database for a finite amount of time.
db.eventlog.createIndex( { "lastModifiedDate": 1 }, { expireAfterSeconds: 3600 } )
https://docs.mongodb.com/v3.2/core/index-ttl/
I found a way to do this:
db.course.deleteMany({date: {"$lt": ISODate()*1}})
The ISODate()*1 expression converts the Date into a (millisecond) timestamp, and new Date(timestamp) converts a timestamp back into a Date.
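For the "too old" part you can compute the cutoff yourself. A small sketch, assuming the field stores milliseconds since the epoch and an example retention window of 30 days:

// delete everything older than 30 days, comparing millisecond timestamps
var cutoff = Date.now() - 30 * 24 * 60 * 60 * 1000;
db.course.deleteMany({ date: { $lt: cutoff } });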

mongodb upsert with conditional field update

I have a script that populates a mongo db from daily server log files. Log files come from a number of servers so the chronological order of the data is not guaranteed. To make this simple, let's say that the document schema is this:
{
    _id: <username>,
    first_seen: <date>,
    last_seen: <date>,
    most_recent_ip: <string>
}
that is, documents are indexed by the name of the user who accessed the server. For each user, we keep track of the first time the user was seen and the ip from the last visit.
Right now I handle this very inefficiently: first try an insert; if it fails, retrieve the record by _id, then calculate updated values (e.g. first_seen and most_recent_ip), and finally update the record. That is 3 db calls per log entry, which makes the script's running time prohibitively long given the very high volume of data.
I'm wondering if I can replace this with an upsert instead. I can see how to handle first_seen/last_seen: probably something like {$min: {'first_seen': <log_entry_date>}} (I hope this works correctly when inserting a new doc). But how do I set most_recent_ip to the new value only when <log_entry_date> > last_seen?
Is there generally a preferred pattern for my use case?
You can just use $set to set the most_recent_ip, e.g.
db.logs.update(
    { _id: "user1" },
    {
        $set: { most_recent_ip: "2.2.2.2" },
        $min: { first_seen: new Date() },
        $max: { last_seen: new Date() }
    },
    { upsert: true }
)
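Note that a plain $set overwrites most_recent_ip unconditionally, even when the log entry being processed is older than the stored last_seen. If you need the conditional behavior from the question, an aggregation-pipeline update (available since MongoDB 4.2, so newer than this answer) can express it. A sketch with hypothetical values:

var entryDate = new Date("2017-04-17");  // date parsed from the log line
db.logs.updateOne(
    { _id: "user1" },
    [ { $set: {
        first_seen: { $min: ["$first_seen", entryDate] },
        // replace the ip only when this entry is newer than last_seen
        most_recent_ip: {
            $cond: [ { $gt: [entryDate, "$last_seen"] },
                     "2.2.2.2",
                     "$most_recent_ip" ]
        },
        last_seen: { $max: ["$last_seen", entryDate] }
    } } ],
    { upsert: true }
)

All expressions inside one $set stage read the pre-update document, so the $gt comparison sees the old last_seen even though last_seen is updated in the same stage.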

File versioning with GridFS

I'm trying to store versioned content in MongoDB with GridFS, so I add a version field to the metadata of the file I'm storing. This all works well. Now I want to get the latest version without knowing the version. Here: Find the latest version of a document stored in MongoDB - GridFs, someone mentions that findOne always returns the youngest (latest) file matching the query, which is what I want. But when I try this, I always get the first (oldest) file from findOne(). I'm using spring-data-mongodb version 1.5.0.RELEASE.
Here my current code:
public void storeFileToGridFs(ContentReference contentReference, InputStream content) {
    Integer nextVersion = findLatestVersion(contentReference) + 1;
    DBObject metadata = new BasicDBObject();
    metadata.put("version", nextVersion);
    metadata.put("definitionId", contentReference.getContentDefinitionId());
    gridOperations.store(content, contentReference.getContentId().getValue(), metadata);
}
and to find the latest version:
private Integer findLatestVersion(ContentReference contentReference) {
    Query query = new Query(GridFsCriteria.whereFilename().is(contentReference.getContentId().getValue()));
    GridFSDBFile latestVersionRecord = gridOperations.findOne(query);
    if (latestVersionRecord != null) {
        Integer version = (Integer) latestVersionRecord.getMetaData().get("version");
        return version;
    } else {
        return 0;
    }
}
But, as already mentioned, findLatestVersion() always returns 1 (except the first time, when it returns 0).
Once I have this running, is there a way to retrieve only the metadata of the document? In findLatestVersion() it's not necessary to load the file itself.
findOne returns exactly one result, more specifically the first one in the collection matching the query.
I am not too sure whether the latest version is returned when using findOne. Please try find instead.
A more manual approach would be to query for the file name and filter the result set for the highest value of version.
In general, the version field only shows how often a document was changed. It is used for something which is called optimistic locking, which works by checking the current version of a document against the one the changed document has. If the version in the database is higher than the one in the document to be saved, another process has made changes to the document and an exception is raised.
For storing versioned documents, git (via egit for example) might be a solution.
EDIT: After some quick research, here is how it works: file versioning should be done using the automatically set upload date from the metadata. Query for it, sort descending, and use the first result. You do not need to set the version manually any more.
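For illustration, this is what that query looks like in the mongo shell against the default GridFS bucket (the filename is a placeholder):

// fs.files holds one document per stored file; uploadDate is set
// automatically by GridFS on every store
db.fs.files.find({ filename: "my-content-id" })
           .sort({ uploadDate: -1 })
           .limit(1)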
I know it's been a while since this question was asked, and I don't know whether the code was the same back then, but I think this information may help future readers:
Looking at the source code shows that findOne completely ignores the sorting part defined in the query, while find actually makes use of it.
So you need to make a normal query with find and then select the first object found (refer to Markus W Mahlberg's answer for more information).
Try adding sorting to the query, like this:
GridFSDBFile latestVersionRecord = template.findOne(
    new Query(GridFsCriteria.whereFilename().is(filename))
        // custom metadata lives under the "metadata" key in fs.files
        .with(new Sort(Sort.Direction.DESC, "metadata.version")));
Once you have the GridFSDBFile, you can easily retrieve the metadata without loading the whole file:
DBObject metadata = latestVersionRecord.getMetaData();
Hope it helps!

`mongo` Query To Purge Old Entries

Anyone have a handy mongo command to remove all entries from a DB that are older than X date/ X days?
Basically I have a dev and a production DB, and I'm looking to prune the dev DB a bit to limit its size.
Thanks for the help!
You could do something like this from the mongo shell.
var older = Date.parse("2013-03-01"),
    collection = db.so,
    all = collection.find();

all.forEach(function(doc) {
    var ts = doc._id.getTimestamp();
    if (ts < older) { collection.remove(doc); }
});
The above snippet (which you'd paste into the shell) will delete all documents in the specified collection (collection = db.so) created before the first of March, 2013. It relies on the fact that each ObjectId has an embedded timestamp (based on the time of document creation (docs)), which can be retrieved and used.
You could of course change the query to look for a specific timestamp field in a document.
if (doc.timestampField < older) { collection.remove(doc); }
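Removing documents one at a time from the client is slow on large collections. A single range delete on _id achieves the same thing server-side; a sketch using the cutoff date from above (the first 8 hex digits of an ObjectId encode its creation time in seconds):

// build an ObjectId whose embedded timestamp is the cutoff, then
// delete everything created before it in one command
var cutoffSeconds = Math.floor(Date.parse("2013-03-01") / 1000);
var cutoffId = ObjectId(cutoffSeconds.toString(16) + "0000000000000000");
db.so.deleteMany({ _id: { $lt: cutoffId } });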
You can use MongoDB's built-in mechanism for deleting data after a specified amount of time: Expire Data from Collections by Setting TTL.
Please refer to the link below to do so:
http://docs.mongodb.org/manual/tutorial/expire-data/

Does updating a MongoDB record rewrite the whole record or only the updated fields?

I have a MongoDB collection as follows:
comment_id (number)
comment_title (text)
score (number)
time_score (number)
final_score (number)
created_time (timestamp)
Score is an integer that's usually updated using $inc with 1 or -1 whenever someone votes that record up or down,
but time_score is updated using a function of the created timestamp, the current time, and other factors such as how many whole days and whole weeks have passed, etc.
So I apply $inc (with 1 or -1) directly on the db, but for time_score I retrieve the data from the db, calculate the new score, and write it back. What I'm worried about is that if many users increment the score field during my calculation of time_score, then when I write time_score to the db it will overwrite the latest value of score.
To be more clear: does updating specific fields in a record in Mongo rewrite the whole record or only the updated fields? (Assume that all these fields are indexed.)
By default, whole documents are rewritten. To specify the fields that are changed without modifying anything else, use the $set operator.
Edit: The comments on this answer are correct - any of the update modifiers will cause only relevant fields to be rewritten rather than the whole document. By "default", I meant a case where no special modifiers are used (a vanilla document is provided).
The algorithm you are describing is definitely not thread-safe.
When you read the entire document, change one field and then write back the entire document, you are creating a race condition - any field in the document that is modified after your read but before your write will be overwritten by your update.
That's one of many reasons to use $set or $inc operators to atomically set individual fields rather than updating the entire document based on possibly stale values in it.
Another reason is that setting/updating a single field "in-place" is much more efficient than writing the entire document. In addition, you put less load on your network when you pass a small update document ({$set: {field: value}}) rather than an entire new version of the document.
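To make the pattern concrete, a shell sketch of the race-free version (the comments collection and the filter value are hypothetical):

// votes: atomic increment, no read-modify-write needed
db.comments.update({ comment_id: 42 }, { $inc: { score: 1 } });

// periodic recalculation: newTimeScore is computed in application code;
// $set touches only time_score, so concurrent $inc operations on score
// are never overwritten
var newTimeScore = 0.87;
db.comments.update({ comment_id: 42 }, { $set: { time_score: newTimeScore } });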