How to achieve isolation in mongodb 4.0? - mongodb

I'm new to Mongodb.
I'd like to understand if it's possible or not to achieve full isolation for reads and updates.
For example, I have the following flow:
Count documents from a collection based on some filter.
Based on the result, update another document.
So basically I want to understand how to prevent a concurrent write operation to enter between 1 & 2 .
BTW I'm using the java driver 3.12
Thanks
Shira

Latest versions of mongoDB has transactions features, maybe you can check that.
https://docs.mongodb.com/manual/core/transactions/

Related

MongoDB integration with Solr

I am beginner with mongodb and its integraiton with Solr. From different posts I got an idea about the integration steps. But need info on the below
I have the data in mongodb, for faster retrieval we are integrating it with Solr.
Solr indexes all mongodb entries. Is this indexing one time activity after integration or Do we need to periodically update Solr to index the entries which got inserted after the integration ?
If we need to periodically update solr, it becomes an extra overhead to maintain it in Solr as well along with mongodb. Best approaches on overcoming it.
As far as I know you do not have official(supported/complete) solution to integrate MongoDB and Solr, but let me give you some ideas/direction.
For me the best approach is when it is possible to modify the application and add to the persistence layer the fact that you have all writes operations done in MongoDB and Solr in the "same" time. Like that you can control exactly what you want to send to the Database and what you want to index for a full text operation. But as I said this means that you have to change your application code. (You will have anyway to change it to be able to query Solr when needed). And yes you have to index all the existing documents the first time
You can use a "connector" approach where MongoDB and Solr are kind of connected together, this could be done in various ways.
You can use for example the MongoDB Connector available here : https://github.com/10gen-labs/mongo-connector
LucidWorks, the company behind Solr has also a connector for MongoDB, documented here : http://docs.lucidworks.com/display/help/Create+a+New+MongoDB+Data+Source# (I have not used it so cannot comment, but it is also an approach)
You point #2 is true, you have to manage two clusters and be sure the data are in sync, and sometimes pay the price of inconsistency between the Solr index and the document just updated in MongoDB... So you need to see if the best approach for your application is to use MongoDB alone or MongoDB with Solr (see comment below)
Just a small comment in addition to this answer:
You are talking about "faster retrieval", not sure it should be the reason, if you write correct queries with correct indexes in MongoDB you should be able to do it without Solr. If you requirement is really oriented towards the power of solr meaning: full text index (with all related features it makes sense)
How large is your data? MongoDB has a few good indexing mechanism of its own.
There is a powerful geo-api and for full text search there is http://docs.mongodb.org/manual/core/index-text/. So it would be ideal to identify if your need fits into MongoDB or you need to spill over to SOLR.
About the indexing part. How often if your data updated? If you can afford to have infrequent updates, then a batch job with once a day re-indexing may work for you. Ideally SOLR would work well for some form of master data.

How to overcome the limitations with mongoDB aggregation framework

The aggregation framework on MongoDB has certain limitations as per this link.
I want to remove the restrictions 2, 3.
I really do not care what the resulting set's size is. I have a lot of RAM and resources.
And I do not care if it takes more than 10% system resources.
I expect both 2, 3 to be violated in my application. Mostly 2.
But I really need the aggregation framework. Is there anything that I can do to remove these limitations?
The reason *
The application I have been working has these things
The user has the ability to upload a large dataset
We have a menu to let him sort, aggregate etc
The aggregate has no restrictions currently and the user can choose to do whatever he wants. Since the data is not known to the developer and since it is possible to group by any number of columns, the application can error out.
Choosing something other than mongodb is a no go. We have already sunk too much into development with MongoDB
Is it advisable to change the source code of Mongo?
1) Saving aggregated values directly to some collection(like with MapReduce) will released in future versions, so first solution is just wait for a while :)
2) If you hit 2-nd or 3-rd limitation may you should redesign your data scheme and/or aggregation pipeline. If you working with large time series, you can reduce number of aggregated docs and do aggregation in several steps (like MapReduce do). I can't say more concretely, because I don't know your data/use cases(give me a comment).
3) You can choose different framework. If you familiar with MapReduce concept, you can try Hadoop(it can use MongoDB as data source). I don't have experience with MongoDB-Hadoop integration, but I mast warn you NOT to use Mongo's MapReduce -- it sucks hard on large datasets.
4) You can do aggregation inside your code, but you should use some "lowlevel" language or library. For example, pymongo (http://api.mongodb.org/python/current/) is not suitable for such things, but you can tray something like monary(https://bitbucket.org/djcbeach/monary/wiki/Home) to efficiently extract date and NumPy or Pandas to aggregate it the way want.

Why mongodb only updates the first matching document in the collection?

Consider a collection student contains the following documents.
{name:”Nithin”,age:23}
{name:”Nithin”,age:25}
{name:”Nithin”,age:28}
{name:”Nithin”,age:12}
I want to update all the documents whose name is “Nithin” as age=60.
If we execute the following query it will only update the first document.
db.student.update({name:”Nithin”},{age:60})
For update all the documents I have to use the query
db.student.update({name:”Nithin”},{age:60},false,true)
or
db.student.update({name:”Nithin”},{age:60},multi:true)
What is the reason by default mongodb not updating all the documents by executing db.student.update({name:”Nithin”},{age:60}) ? What is the motivation for creating separate queries for updating all the documents? Is it improving the performance?
Originally, in the early early days of MongoDB (pre 1.1) it was not possible to update multiple documents. This was a feature added around 1.1.3.
You can see it in the release notes, New Feature 268.
I'm guessing this was not enabled by default for backwards compatibility with previous versions.
This may not really be the reason but I find the additional multi parameter as a safeguard to prevent accidental update of multiple records when one intends to update a single document only, something like accidentally performing UPDATE...SET on SQL without specifying additional constraints.
Again this is just an assumption but may not really be the case.
I suppose part of the reason might be to avoid people coming from the SQL world to think about multi-document updates as isolated transactions.
In fact, during a long update MongoDB will periodically yield control to other queries which can potentially modify the same dataset.
So, by explicitly setting multi=true you are somewhat acknowledging this fact (well, not really, but I guess that's the spirit...)

Perishable Documents?

Is there any way that MongoDB handles the deletion of documents automaticaly based on time?
I use Tornado (Python) and the IOLoop can do it, but it checks always for time, and the life of those documents is high (6 mounth),
So can Mongodb do it without making a clickable script or using the IOLoop?
Yes, it's a feature (new in 2.1) called TTL (time-to-live) indexes.
You can read all about them including examples here:
http://docs.mongodb.org/manual/tutorial/expire-data/

Random order in Django

I created my databases with mongodb, then I created a model in django and now I want order_by('?') order randomly, but the order does not change.
I am using django 1.4.1.
Thanks.
The MongoDB server (as at 2.2) does not have support for returning query results in random order.
One possible workaround, using a Random Attribute, is described in the MongoDB Cookbook.
Another less performant option would be to use a combination of count, skip, and limit with to find a random document.
You can vote or watch SERVER-533 in the MongoDB issue tracker, which is a feature request for getting random items from a collection. There is some further discussion on the Jira issue as well.