Morphia: is there a performance difference between fetch and asList? - mongodb

We are using morphia 0.99 and java driver 2.7.3. I would like to know whether there is any difference between fetching records one by one using fetch and retrieving the results with asList (assume there is enough memory to hold all the records retrieved through asList).
We iterate over a large collection, and while using fetch I sometimes encounter a "cursor not found" exception on the server during the fetch operation, so I need to execute another query to continue. What could be the reason for this?
1) fetch a record
2) do some calculation on it
3) save it back to the database
4) fetch another record and repeat the steps until there are no more records
So which one would be faster: fetching records one by one, or retrieving the results in bulk using asList? Or is there no difference between them in the Morphia implementation?
Thanks for the answers

As far as I understand the implementation, fetch() streams results from the DB while asList() loads all query results into memory. So both will get every object that matches the query, but asList() holds them all in memory at once while fetch() leaves that up to you.
For your use case, neither should be faster in terms of CPU, but fetch() should use less memory and won't blow up in case you have a lot of DB records.
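To make the distinction concrete, here is a minimal sketch (assuming a hypothetical Record entity and a Datastore named ds; the API shown is the Morphia 0.99-era Query):

Query<Record> query = ds.find(Record.class);

// asList() materializes every matching document in memory at once
List<Record> everything = query.asList();

// fetch() returns an Iterable backed by a server-side cursor, so
// documents are pulled over in batches as the loop advances
for (Record record : query.fetch()) {
    // process one record at a time
}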

Judging from the source code, asList() uses fetch() and aggregates the results for you, so I can't see much difference between the two.

One very useful difference would be if the following two conditions applied to your scenario:
You were using offset and limit in the query.
You were changing values on the object such that it would no longer be returned in the query.
So say you were running a query on awesome=true, and you were using offset and limit to do multiple queries, returning 100 records at a time to make sure you didn't use up too much memory. If, in your iteration loop, you set awesome=false on an object and saved it, the remaining results would shift and you would miss updating some records.
In a case like this, fetch() would be a better approach.
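A minimal sketch of the pitfall, assuming a hypothetical Thing entity with an awesome field:

int pageSize = 100;
for (int offset = 0; ; offset += pageSize) {
    List<Thing> page = ds.find(Thing.class)
            .field("awesome").equal(true)
            .offset(offset)
            .limit(pageSize)
            .asList();
    if (page.isEmpty()) {
        break;
    }
    for (Thing thing : page) {
        thing.setAwesome(false); // thing no longer matches the query...
        ds.save(thing);          // ...so later pages shift and records get skipped
    }
}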

Related

Updating data in Mongo sorted by a particular field

I posted this question on the Software Engineering portal without conducting any tests. It was also brought to my notice that it should be posted on SO instead. Thanks for the help in advance!
I need Mongo to return the documents sorted by a field value. The easiest way to achieve this would be running the command db.collectionName.find().sort({field: priority}). However, I tried this method on a dummy collection of 1000 documents and it runs in 22ms, while db.collectionName.find() on the same data runs in 3ms, which means that Mongo is taking time to sort the documents before returning them (which is understandable). Both tests were done in the same environment, by adding .explain("executionStats") to the query.
I will be working with a large amount of data and concurrent requests to the DB, so I need the querying to be faster. My question is: is there a way to keep the data permanently sorted by a field in the DB, so that I don't have to sort it over and over for every request? For instance, some sort of update command that re-sorts the entire DB once a week or so?
A non-unique index on that field in this collection will give the results you're after and avoid the inefficient in-memory sort.
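For example, with a 2.x-era Java driver (collection and field names are placeholders from the question):

DBCollection coll = db.getCollection("collectionName");

// non-unique ascending index on the sort field; find().sort() can then
// walk the index in order instead of sorting documents in memory
coll.ensureIndex(new BasicDBObject("field", 1));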

Is it possible to run queries on 200GB of data in mongodb with 16GB RAM?

I am trying to run a simple query to find the number of records with a particular value, using:
db.ColName.find({id_c:1201}).count()
I have 200GB of data. When I run this query, mongodb takes up all the RAM and my system starts lagging. After an hour of futile waiting, I give up without getting any results.
What can be the issue here and how can I solve it?
I believe the right approach in the NoSQL world isn't trying to perform a full query like that, but to accumulate stats over time.
For example, you could have a stats collection of arbitrary objects, each owning a kind or id property that can take a value like "totalUserCount". Whenever you add a user, you also update this count.
This way you'll get instant results: it's just reading a property value from a small collection of stats.
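A sketch of that counter pattern with a 2.x-era Java driver (the stats collection and totalUserCount key are illustrative names):

DBCollection stats = db.getCollection("stats");

// atomically increment the counter whenever a user is added;
// upsert=true creates the counter document on first use
stats.update(
        new BasicDBObject("_id", "totalUserCount"),
        new BasicDBObject("$inc", new BasicDBObject("value", 1)),
        true,   // upsert
        false); // multi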
BTW, the slowness is probably caused by querying objects on a non-indexed property of your collection. Try indexing id_c and you'll probably get results much faster.
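Again as a 2.x-era Java driver sketch:

DBCollection coll = db.getCollection("ColName");

// index id_c once; the count can then be answered from the index
// instead of scanning 200GB of documents
coll.ensureIndex(new BasicDBObject("id_c", 1));
long matches = coll.count(new BasicDBObject("id_c", 1201));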
That amount of data can easily be managed by MySQL, MSSQL or Oracle with the given hardware specification. You don't need a NoSQL database for that; NoSQL databases are made for much larger storage needs, which actually require lots of hardware (RAM, hard disks) to be efficient.
You need to define an index on that id field and use a normal SQL database.

Want to retrieve records from mongodb in batches

I am trying to retrieve records from MongoDB, approximately 50,000 of them, but when I execute the query Java runs out of heap space and the server crashes.
Following is my code:
List<FormData> forms = ds.find(FormData.class).field("formId").equal(formId).asList();
Can anyone help me syntactically to fetch records from mongodb in batches?
Thanks in advance
I am not sure if the Java implementation has this, but in the C# version there is a SetBatchSize method.
Using that I could do
foreach (var item in coll.Find(...).SetBatchSize(1000)) {
    // process item
}
This will fetch all items the find matches, but it will not fetch them all at once; rather, 1000 in each batch. Your code will not see this batching, as it is all handled within the enumeration. Once the loop tries to get the 1001st item, another batch will be fetched from the mongodb server.
This should lessen the heap space problem.
http://docs.mongodb.org/manual/reference/method/cursor.batchSize/
You could still have other problems depending on what you do within the loop, but that will be under your control.
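For what it's worth, the 2.x-era Java driver does expose the same knob via DBCursor.batchSize(), so a sketch of the equivalent (reusing the collection and field names from the question) would be:

DBCursor cursor = db.getCollection("FormData")
        .find(new BasicDBObject("formId", formId))
        .batchSize(1000); // 1000 documents per round trip

for (DBObject doc : cursor) {
    // only the current batch is held in memory
}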
Fetching 50k entries doesn't sound like a good idea with any database.
Depending on your use case, you might want to change your query or work with limit and offset:
ds.find(FormData.class)
.field("formId").equal(formId)
.limit(20).offset(0)
.asList();
Note that a range-based query is more efficient than working with limit and offset, as in the sketch below.
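A sketch of the range-based variant, assuming the entity exposes its ObjectId through a getId() getter:

ObjectId lastSeenId = null;
while (true) {
    Query<FormData> q = ds.find(FormData.class)
            .field("formId").equal(formId)
            .order("_id")
            .limit(1000);
    if (lastSeenId != null) {
        q.field("_id").greaterThan(lastSeenId); // resume after the previous page
    }
    List<FormData> page = q.asList();
    if (page.isEmpty()) {
        break;
    }
    // process the page ...
    lastSeenId = page.get(page.size() - 1).getId();
}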

Unable to fetch more than 10k records

I am developing an app where more than 10k records have been added to a class in Parse. Now I am trying to fetch those records using PFQuery (I am using the "skip" property), but I am unable to fetch records beyond 10k and I get the following error message:
"Skips larger than 10000 are not allowed"
This is a big problem for me since I need all the data.
Has anybody come across such a problem? Please share your views.
Thanks
The problem is indeed due to the cost of Mongo's skip operation. You can formulate a query such that you don't need the skip operator. My preferred method is to order by objectId and then add a condition that objectId > the last yielded objectId. This type of query can be indexed and remain fast, unlike skip pagination, which has an O(N^2) cost in seeks when paging through a whole collection.
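A sketch of that approach with the Parse Android (Java) SDK, which mirrors PFQuery ("MyClass" is a placeholder, and find() throws ParseException):

ParseQuery<ParseObject> query = ParseQuery.getQuery("MyClass");
query.orderByAscending("objectId");
if (lastObjectId != null) {
    query.whereGreaterThan("objectId", lastObjectId); // resume after the last page
}
query.setLimit(1000); // Parse's maximum page size
List<ParseObject> page = query.find();
// remember page.get(page.size() - 1).getObjectId() for the next round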
My assumption would be that it's based on performance issues with MongoDB's skip implementation. The MongoDB documentation puts it like this:
The cursor.skip() method is often expensive because it requires the server to walk from the beginning of the collection or index to get the offset or skip position before beginning to return results. As the offset increases, cursor.skip() will become slower and more CPU intensive. With larger collections, cursor.skip() may become IO bound.

Mongodb count vs findOne

My problem is: there is a collection of users. I'm trying to find out whether the user with _id=xxx has somevalue > 5.
I'm wondering which will be faster: using find(...).count() > 0 or findOne(...) != null? Or maybe there is some other, faster/better way?
The difference between the query times should be almost negligible, because both queries bound the lookup by the unique _id, so the document will be found immediately. The only slight edge goes to the count, because the db returns an int instead of an entire document; the time you save is purely in transferring the data from db to client.
That being said, if your goal is to do an exists query and you don't care about the data, use the count.
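For example, with a 2.x-era Java driver (userId holds the _id in question):

DBObject query = new BasicDBObject("_id", userId)
        .append("somevalue", new BasicDBObject("$gt", 5));

// count() ships back just a number, never the document itself
boolean exists = db.getCollection("users").count(query) > 0;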
Using count(), in contrast to find(), does not pin the document into the cache, which saves you memory. Memory is your most valuable resource when using mongodb or any other DB... besides a fast disk.