I have a problem with a simple select-and-update logic:
task = Task.queueing.where(conditions).order(:created_at.asc).first
if task
task.set(:status=>2)
end
It's simple, right?
BUT the problem is: 100+ requests come in at the same time.
So many clients get the same record, which is exactly what I DON'T want.
In MySQL, I can do something like this to avoid handing out duplicates:
rnd_str = (10_000_000 * rand).to_i # unique-ish marker value
Task.where(conditions).limit(1).update_all(status: rnd_str) # this may be wrong code; MySQL supports UPDATE ... LIMIT 1
task = Task.where(status: rnd_str).first
task.set(status: 2)
render :json => task
BUT how do I update one record with a query in MongoMapper?
Thanks!
An update in MongoDB only modifies one document by default. You can update all matching documents by passing the {multi: true} option, but without it, an update touches only the first match.
So what you have to do is combine your "query" into your update statement so they execute atomically (just like you do in SQL), rather than performing two separate operations. In shell syntax, something like:
db.queueing.update({conditions}, {$set:{status:2}})
Now, if you also need the updated task document to work with, you can use findAndModify to update and return the document in one atomic operation, like this:
task = db.queueing.findAndModify({
    query: { your-condition },
    sort: { your-ordering },
    update: { $set: { status: 2 } }
});
We need to cache records for a service with a terrible API.
This service provides an API to query data about our employees, but it does not tell us whether an employee is new or has been updated, nor can we filter our queries on that information.
Our proposed solution to the problems this creates is to periodically (e.g. every 15 minutes) query all our employee data and upsert it into a Mongo database. When we write to MongoDB, we would like to include an additional property indicating whether the record is new or has changed since the last upsert (obviously excluding the timestamp field itself from the comparison).
The idea is that, instead of querying the source directly, which we cannot filter by such timestamps, we would query our cache, which does include the timestamp and can be filtered on it.
(Ideally, we'd like to write this in C# using the MongoDB driver, but more important right now is whether we can do this in an upsert call at all, or whether we'd need to load all the records into memory, do the comparisons, and then add the timestamps before upserting them.)
There might be a way of doing that, but how efficient it is remains to be seen. The update command in MongoDB can take an aggregation pipeline to perform the update operation. We can use the $addFields stage to add a new field denoting the update status, and $function to compute its value. A short example:
db.collection.update(
  { key: 1 },
  [
    {
      $addFields: {
        changed: {
          $function: {
            lang: "js",
            args: [
              "$$ROOT",
              { key: 1, data: "somedata" }
            ],
            body: "function(originalDoc, newDoc) { return JSON.stringify(originalDoc) !== JSON.stringify(newDoc) }"
          }
        }
      }
    }
  ],
  { upsert: true }
)
Some points to consider here are:
If the order of fields differs between the old and new versions of the doc, JSON.stringify will report a difference even when the values are identical; a shallow workaround is sketched after this list.
The function specified in $function runs on the server side, so ideally it should be lightweight. If a large number of users gets upserted, it may become a bottleneck.
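As a possible guard against the field-order caveat, the comparison can normalize key order before stringifying. A shallow sketch (nested documents would need recursive normalization); the docsDiffer name is just illustrative, and its source would be passed as the body string of $function:
function docsDiffer(originalDoc, newDoc) {
  // Rebuild each doc with its keys in sorted order, so stringify output
  // no longer depends on insertion order of the fields.
  var normalize = function (doc) {
    var sorted = {};
    Object.keys(doc).sort().forEach(function (k) { sorted[k] = doc[k]; });
    return JSON.stringify(sorted);
  };
  return normalize(originalDoc) !== normalize(newDoc);
}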
I am looking for a way to FindAndModify no more than 5 documents in MongoDB.
This collection is a queue which will be processed by multiple workers, so I want to put it into a single query.
Since I cannot control the number of updates through the UpdateOptions parameter, is it possible to limit the number of documents matched by the filterDefinition?
Problem 1: findAndModify() can only update a single document at a time, as per the documentation. This is an inherent limit of MongoDB's implementation.
Problem 2: There is no way to update a specific number of arbitrary documents with a simple update() query of any kind. You can update one or all matching documents depending on the boolean value of your multi option, but that's it.
If you want to update up to 5 documents at a time, you're going to have to retrieve these documents first and then update them, or update them individually in a forEach() call. Either way, you'll either be using something like:
db.collection.update(
{_id: {$in: [ doc1._id, doc2._id, ... ]}},
{ ... },
{multi: true}
);
Or you'll be using something like:
db.collection.find({ ... }).limit(5).forEach(function(doc) {
//do something to doc
db.collection.update({_id: doc._id}, doc);
});
Whichever approach you choose to take, it's going to be a workaround. Again, this is an inherent limitation.
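To make the first approach safer with multiple workers racing for the same documents, a common pattern (a sketch; the status and claimedBy field names are illustrative, not from the question) is to claim documents with a unique worker token and then read back only what this worker actually won:
var token = ObjectId(); // unique marker for this worker

// Pick up to 5 candidate ids, then try to claim them in one multi-update.
var ids = db.collection.find({ status: "pending" }).limit(5)
            .toArray().map(function (doc) { return doc._id; });

db.collection.update(
  { _id: { $in: ids }, status: "pending" }, // re-check status to avoid races
  { $set: { status: "processing", claimedBy: token } },
  { multi: true }
);

// Only the documents this worker actually claimed carry its token.
var claimed = db.collection.find({ claimedBy: token }).toArray();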
I have a >6M documents collection in MongoDB, and two of its fields (field1 and field2 in the example below) hold string values (type 2 in MongoDB).
My problem is that I want to parse them into float values (all the values are parseFloat-able). I found this snippet on SO, but it doesn't seem like a great solution for a 6M-document collection:
db.collection.find({field1: {$type: 2}}).forEach(function(data) { // type 2 = string
db.collection.update(
{_id:data._id},
{$set:{
field1: parseFloat(data.field1),
field2: parseFloat(data.field2)}
}
)
})
Is there any way I can convert these two fields without slowing down the server?
db.collection.getIndexes() and db.collection.getIndexKeys() show that both fields are indexed.
If your goal is to prevent server slowdown, I would introduce a sleep on the client side between updates. You can adjust the timeout depending on how much load you want to shed and your patience for the updates to complete. To sleep, call sleep(ms) in the mongo shell, where ms is the number of milliseconds you would like to pause for.
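A sketch of that throttling applied to the snippet above, assuming a 10 ms pause is an acceptable starting point:
db.collection.find({ field1: { $type: 2 } }).forEach(function (data) {
  db.collection.update(
    { _id: data._id },
    { $set: { field1: parseFloat(data.field1), field2: parseFloat(data.field2) } }
  );
  sleep(10); // pause 10 ms between updates to keep server load down
});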
Instead of using forEach and executing one update command per document, have you tried setting the 4th argument of update to true so that you can perform a multi-document update?
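Note that a classic multi-update can only apply a static modification document, so it cannot call parseFloat per document. On much newer servers (MongoDB 4.2+, well after this question), an aggregation-pipeline update can do the conversion server-side in one command; a sketch using $toDouble:
// Requires MongoDB 4.2+: pipeline-style update with $toDouble.
db.collection.updateMany(
  { field1: { $type: "string" } },
  [ { $set: {
      field1: { $toDouble: "$field1" },
      field2: { $toDouble: "$field2" }
  } } ]
);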
I am trying to implement a fairly simple queue using MongoDB. I have a collection which a number of dumb workers need to process. Each worker should search the collection for unprocessed work and then execute it.
The way I decide which work is unprocessed is based on a simple calculation.
Basically, I have a collection of jobs which need to be performed at specific intervals, where the interval is stored in each document as interval; a worker scans the collection for documents which have not been updated for at least the interval time.
An example of a document (_id field omitted) is:
{
updated: 360,
interval: 60,
work: "an object representing the work"
}
What I want is an atomic/blocking query (there are multiple workers) which returns a batch of documents where updated + interval < currentTime (with currentTime being the time on the database server), and which also sets the updated field to currentTime.
In other words:
find: updated + interval < currentTime
return a batch of these, say 30
set: updated = currentTime
Any help is greatly appreciated!
Since MongoDB does not support transactions, you can't safely put a pessimistic lock on a batch of items unless you have a separate document for that; more on that at the end.
Let's start with the query: you can't query for something like 'where x + y < z' in MongoDB. Instead, you'll have to maintain a field holding the next due date, e.g. nextDue:
{
"nextDue": "420",
"work": { ... }
}
Now each worker can fetch a couple of items (NOTE: this is all pseudo-code, not a specific programming language):
var result = db.queue.find({ "nextDue": { $gt: 0, $lte: startTime } }).limit(50);
// hint: you can do a random skip here to decrease the chances of collisions
// between workers.
foreach (rover in result)
{
    // pessimistic locking: '-1' indicates this is in progress.
    // I'd recommend a flag instead, however...
    // (the $gt: 0 clause keeps locked items from matching the due check)
    var currentItem = db.queue.findAndModify({
        query: { "_id": rover._id, "nextDue": { $gt: 0, $lte: startTime } },
        update: { $set: { "nextDue": -1 } }
    });
    if (currentItem == null)
        continue; // hit a lock: another worker is processing this already

    // ... process job ...

    db.queue.findAndModify({
        query: { "_id": rover._id, "nextDue": -1 },
        update: { $set: { "nextDue": yourNextDue } }
    });
}
There are essentially two methods I see for pessimistically locking multiple documents. One is to create a bucket for the documents you're trying to lock, put the job descriptors in the bucket, and process bucket by bucket. Since the bucket is then a single document, you can rely on the atomic modifiers.
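A minimal sketch of the bucket idea, assuming a buckets collection whose documents each hold a jobs array plus a lockedBy field (all names illustrative):
var workerId = ObjectId(); // identifies this worker

// Atomically claim one unclaimed bucket; findAndModify locks it in one step.
var bucket = db.buckets.findAndModify({
  query: { lockedBy: null },
  update: { $set: { lockedBy: workerId, lockedAt: new Date() } }
});

if (bucket != null) {
  bucket.jobs.forEach(function (job) {
    // ... process job ...
  });
  db.buckets.remove({ _id: bucket._id }); // or unlock / mark as done
}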
The other is to use a two-phase commit, which also creates another document for the transaction but does not require you to move your jobs into a different document. However, this is a somewhat elaborate pattern.
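A heavily compressed sketch of the two-phase commit pattern, assuming a transactions collection that tracks state (initial, then pending, then done); a real implementation also needs recovery and rollback logic:
var txnId = ObjectId();
db.transactions.insert({ _id: txnId, state: "initial", jobIds: ids });

// Phase 1: mark the transaction pending, then apply the change to each job
// together with a txn stamp, so an interrupted run can be found and recovered.
db.transactions.update({ _id: txnId, state: "initial" },
                       { $set: { state: "pending" } });
db.queue.update({ _id: { $in: ids } },
                { $set: { txn: txnId } },
                { multi: true });

// Phase 2: mark the transaction done and clear the stamps.
db.transactions.update({ _id: txnId, state: "pending" },
                       { $set: { state: "done" } });
db.queue.update({ txn: txnId },
                { $unset: { txn: 1 } },
                { multi: true });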
The pseudocode I presented above worked very well in two applications, but in both applications the individual jobs took quite some time to execute (half a second to several hours).
Looking for functionality similar to Postgres' DISTINCT ON.
I have a collection of documents {user_id, current_status, date}, where status is just text and date is a Date. I'm still in the early stages of wrapping my head around Mongo and getting a feel for the best way to do things.
Would map-reduce be the best solution here (map emits all, and reduce keeps a record of the latest one), or is there a built-in solution without pulling out MR?
There is a distinct command, however I'm not sure that's what you need. Distinct is more of a "query" command, and with lots of users you're probably going to want to roll up the data ahead of time rather than in real time.
Map-Reduce is probably one way to go here.
Map phase: your key would simply be the user ID. Your value would be something like the following: {current_status: 'blah', date: 1234}.
Reduce phase: given an array of values, you grab the most recent and return only it.
To make this work optimally, you'll probably want to look at a new feature in 1.8.0: "re-reduce". It will allow you to process only new data instead of re-processing the whole status collection.
The other way to do this is to build a "most-recent" collection and tie status inserts to that collection. So when you insert a new status for the user, you update their "most recent" as well; a sketch follows.
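A sketch of that pairing (collection and field names are illustrative):
var now = new Date();
// Record the status event...
db.statuses.insert({ user_id: userId, current_status: status, date: now });
// ...and upsert the user's latest snapshot in a "most-recent" collection.
db.most_recent.update(
  { _id: userId },
  { $set: { current_status: status, date: now } },
  { upsert: true }
);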
Depending on the importance of this feature, you could possibly do both things.
Here's the current solution, which seems to be working well:
map = function() { emit(this.user.id, this.created_at); }
// We call new Date() just in case some values are stored as strings instead of
// dates, because my date gathering/inserting function is kind of stupid atm.
reduce = function(key, values) {
  return new Date(Math.max.apply(Math, values.map(function(x) { return new Date(x); })));
}
// 'out: inline' returns the results directly instead of writing to a collection.
res = db.statuses.mapReduce(map, reduce, { out: { inline: 1 } });
Another way to achieve the same result would be to use the group command, which is a kind of MR shortcut that lets you aggregate on a specific key or set of keys.
In your case it would read like this:
db.coll.group({
  key: { user_id: true },
  reduce: function(obj, prev) {
    // keep the latest entry: overwrite prev whenever obj is newer
    if (new Date(obj.date) > new Date(prev.date)) {
      prev.status = obj.status;
      prev.date = obj.date;
    }
  },
  initial: { status: "", date: new Date(0) }
})
However, unless you have a rather small, fixed number of users, I strongly believe that a better solution would be, as previously suggested, to keep a separate collection containing only the latest status message for each user.