mongodb background task - mongodb

If possible, I'd like to run a find & remove query on non-indexed columns in the "background", without disturbing other tasks or exhausting memory to the detriment of others.
For indexing, there is a background flag. Can the same be appended for find/remove tasks?
Thanks for any tips.

This is not something you can use "background:true" for. Possibly the best way to handle this is to write a script that does this in the background. This script should run your operation in small batches with some delay in between. In pseudo code you would do:
find 10 docs you need to update
update those 10 docs
sleep
goto first step.
You will have to experiment with which value for sleep works. You do need to realize that all documents that you are updating need to be pulled into memory, so it will have at least some impact.
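The loop described above could be sketched in Python roughly as follows. This is only an illustration, not a definitive implementation: it assumes `collection` is a pymongo-style collection exposing `find(filter, projection, limit=...)` and `delete_many(filter)`, and the function name, `batch_size` and `pause_seconds` are made up for the example. It removes documents, as in the question, rather than updating them.

```python
import time


def remove_in_batches(collection, query, batch_size=10, pause_seconds=0.5):
    """Delete documents matching `query` in small batches, pausing between batches.

    `collection` is assumed to behave like a pymongo Collection.
    Returns the total number of documents deleted.
    """
    total = 0
    while True:
        # Fetch only the _id of the next few matching documents.
        ids = [doc["_id"] for doc in collection.find(query, {"_id": 1}, limit=batch_size)]
        if not ids:
            return total  # nothing left to remove
        result = collection.delete_many({"_id": {"$in": ids}})
        total += result.deleted_count
        # Give other operations room to run; tune this value by experiment.
        time.sleep(pause_seconds)
```

A call might look like `remove_in_batches(db.events, {"status": "old"}, batch_size=100, pause_seconds=1)`, where `db.events` and the filter are hypothetical. Because the filter is not indexed, each `find` still scans the collection, so the pause only spreads the load out rather than removing it.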

No, there is no background:true flag for this operation. The remove will yield when page faults occur and allow other operations to execute. If you need to throttle this, then you can either remove in smaller batches or use a find/remove pattern, which will lower the impact on other operations.

Related

Unlocking a collection after aborting a reIndex() command in MongoDB?

I was attempting to reduce the size of my indexes on a mongo collection and ran db.collection.reIndex().
After about 90 minutes, I began to think it had somehow gotten locked up and tried to cancel. Now (about 2 hours after cancelling) the collection appears to be locked to all write commands. All my other collections are allowing writes. Is there any way to unlock it?
How long this operation takes depends on a few things, namely:
The size of the collection.
The number of indexes in that collection.
This is a blocking operation.
Simply put, a small database (less than 500MB) should only take a few minutes to reindex whereas a larger database (5-10GB or more) could take much longer ... with increasing length as the database size increases.
While it is best to let the procedure finish, if you absolutely need to stop it, then restarting the process would be the way to do it. Also, send in a support ticket to: support#mongohq.com (including the name of the database) and the team can help more there.

Update or Delete which is fast?

I am using MongoDB for an application. This application requires a high frequency of reads, writes and updates.
I am just concerned about the update and delete functions. Which of the two is faster? I am indexing the collection on one attribute. Update and Delete both fulfil my purpose, but I am not sure which one is the better fit and has better performance.
I would suggest that rather than deciding whether to use Update or Delete for your solution, you look more closely at the SafeMode attribute.
SafeMode.True indicates that you are expecting a response from the server that will contain, among other things, a confirmation of whether the command succeeded or failed. This option blocks execution until you receive a response from the server.
SafeMode.False does not expect any response; it is basically an optimistic command. You expect it to work, but have no way to confirm it. Because execution does not block waiting for a response, you gain performance: all you need to do is send the request.
Now you need to consider that Deletes will free up space on the server, but you will lose history and traceability of the data. Updates will allow you to keep historic entries, but you will need to make sure your queries exclude the 'marked for deletion' entries.
It is obviously up to you to find whether a Delete or Update is better, but I think the focus should be on whether you use SafeMode true or false to improve performance.
A rather odd question, but here are the things you can base your decision on:
Deleting will keep the collection at an optimum size. Updating (I assume you mean something like setting a deleted flag to true) will result in an ever-growing collection, which will eventually make things slower.
In-place updates (updates that do not result in the document having to be moved due to an increase in size) are always faster than updates or deletes that require documents to be (re)moved.
Writes with Safe = false will significantly improve the throughput of updates and deletes, at the expense of not being able to check whether the update/remove was successful.

Sphinx UPDATE performance

Sphinx 2.0.1 brings with it the ability to call UPDATE and update an individual item in an index.
Does anyone know what type of performance this brings to sphinx when called VERY frequently (as frequently as several hundred times a second)? The reason for this would be to keep a real time index of trending item scores which get updated every time a user performs an action. Obviously when there are lots of users this value can be updated quite frequently.
EDIT:
I should mention that I am not using SphinxSE.
You are talking about sphinx rt indices... Updates are fast, but remember, this type of index does not support enable_star. This means you can't perform searches like appl*.
Such attributes are stored in memory. So updates should be really fast.
But I've never benchmarked it. So try benchmarking it!
... although to be honest I would still be tempted to 'batch process' it. Write the actions to a log "file", and then process that log in batches. Maybe every 10 seconds. All actions on the same record can be run as one update statement.
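The batching idea above could be sketched like this in Python. It is only an illustration under assumptions: the log is taken to be a list of `(item_id, score_delta)` pairs, and `apply_update` is a hypothetical callback that would issue the single update statement for one item. None of these names come from Sphinx itself.

```python
from collections import defaultdict


def coalesce_actions(actions):
    """Collapse many (item_id, score_delta) actions into one net delta per item."""
    net = defaultdict(int)
    for item_id, delta in actions:
        net[item_id] += delta
    return dict(net)


def flush_batch(actions, apply_update):
    """Issue one update per item for the whole batch.

    `apply_update(item_id, delta)` is a hypothetical callback that performs
    the actual update. Returns the number of updates issued.
    """
    net = coalesce_actions(actions)
    for item_id, delta in net.items():
        apply_update(item_id, delta)
    return len(net)
```

A scheduler would call `flush_batch` on the accumulated log every few seconds (the answer suggests 10), so several hundred actions per second on a popular item turn into one update per interval.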

Core Data executing a "sub query"

I would like to execute some kind of subquery with my fetchedresultscontroller.
I've got a set of items which have a flag like "viewed" or "not viewed". Is it possible to switch between these items... Sure I could do a complete refetch but this takes some time.
Is there a better way for doing this?
Many thanks!
One option would be to have two versions of your NSFetchedResultsController, one for viewed and one for not viewed. The trick is to make sure they use different cache files. This will allow the switching to be nearly instantaneous once the initial population of the cache is complete.
You can even set it up so that only one of these is in memory at a time to keep the overhead low. The trick is to make sure the cache names and fetch requests are consistent so that you do not trigger a cache reset.

Is it possible to pause an SQL query?

I've got a really long running SQL query (data import, etc). It's crap - it uses cursors and it's running slowly. It's getting the job done, so I'm not too worried about performance.
Anyways, can I pause it for a while (instead of canceling the query)?
It chews up a bit of CPU so I was hoping to pause it, do some other stuff ... then resume it.
I'm assuming the answer is 'NO' because of how rows and data get locked, etc.
I'm using Sql Server 2008, btw.
The best approximation I know for what you're looking for is
BEGIN
WAITFOR DELAY 'TIME';
EXECUTE XXXX;
END;
GO
Not only can you not pause it, doing so would be bad. SQL queries hold locks (for transactional integrity), and if you paused the query, it would have to hold any locks while it was paused. This could really slow down other queries running on the server.
Rather than pause it, I would write the query so that it can be terminated, and pick up from where it left off when it is restarted. This requires work on your part as a query author, but it's the only feasible approach if you want to interrupt and resume the query. It's a good idea for other reasons as well: long running queries are often interrupted anyway.
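One way to make a job restartable, as suggested above, is to commit a checkpoint together with each batch of work. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for SQL Server; the `source`, `target` and `checkpoint` tables, the function name, and the `max_batches` parameter (used here to simulate an interruption) are all assumptions made for the example. It also assumes ids are positive integers.

```python
import sqlite3


def run_import(conn, batch_size=500, max_batches=None):
    """Copy rows from `source` to `target` in id order, checkpointing each batch.

    Returns the number of rows copied in this run. A later call resumes
    after the last committed id instead of starting over.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS checkpoint (last_id INTEGER NOT NULL)")
    row = conn.execute("SELECT last_id FROM checkpoint").fetchone()
    if row is None:
        conn.execute("INSERT INTO checkpoint (last_id) VALUES (0)")
        last_id = 0
    else:
        last_id = row[0]
    conn.commit()

    copied = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        rows = conn.execute(
            "SELECT id, payload FROM source WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany("INSERT INTO target (id, payload) VALUES (?, ?)", rows)
        last_id = rows[-1][0]
        conn.execute("UPDATE checkpoint SET last_id = ?", (last_id,))
        conn.commit()  # the batch and its checkpoint are committed together
        copied += len(rows)
        batches += 1
    return copied
```

Because the copied rows and the checkpoint are written in the same transaction, killing the job between batches leaves the data consistent, and the next run picks up from the stored id.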
Click the debug button instead of execute. SQL Server 2008 introduced the ability to debug queries on the fly. Put a breakpoint at convenient locations.
When working on similar situations, where I was trying to go through an entire list of data, which could be huge, and could tell which ones I had visited already, I would run the processing in chunks.
update or whatever
where (still not done)
limit 1000
And then I would just keep running the query until there are no rows being modified. This breaks the locks up into reasonable time chunks and can allow you to do things like move tens of millions of rows between tables while a system is in production.
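The chunked pattern above might look like this when driven from a script. This is a sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (where the statement would more likely use `UPDATE TOP (1000)`); the `items` table, its `done` flag column, and the function name are assumptions made for the example.

```python
import sqlite3


def process_in_chunks(conn, chunk_size=1000):
    """Mark pending rows as done, one chunk per transaction.

    Keeps running the same statement until no rows are modified.
    Returns the total number of rows modified.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE items SET done = 1 "
            "WHERE id IN (SELECT id FROM items WHERE done = 0 LIMIT ?)",
            (chunk_size,),
        )
        conn.commit()  # end the transaction so locks are released between chunks
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

Each chunk commits on its own, so locks are only held for the duration of one small update, and the loop can be stopped and restarted at any point since the `done` flag records what is left.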
Jacob
Instead of pausing the script, perhaps you could use Resource Governor. That way you could allow the script to run in the background without severely impacting the performance of other tasks.
MSDN-Resource Governor