MongoDB Java driver projection performance

I have probably run into a problem using MongoDB. I have 860,000 documents per collection and about 500 such collections. Each document has 3 fields: the first and second are arrays of 10 elements each, and the third is an Int64 holding currentTimeMillis. When I query 1,000 documents from one collection, it takes ~2,500 ms. But when I run the same query returning only the first element of the two array fields (using the $slice projection operator; each array contains 10 elements), it takes ~2,000 ms. This looks weird: MongoDB is on a remote host, so I am pulling roughly 10 times less data over the network, yet it takes almost the same amount of time. Any thoughts?
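For reference, the kind of query described above might look roughly like this with the sync Java driver; the connection string and the field names (firstArrayField, secondArrayField, timestampField) are placeholders, not the asker's actual schema:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Projections;
import org.bson.Document;
import java.util.ArrayList;
import java.util.List;

public class SliceProjectionExample {
    public static void main(String[] args) {
        // Placeholder connection string, database, collection, and field names.
        try (MongoClient client = MongoClients.create("mongodb://remote-host:27017")) {
            MongoCollection<Document> coll = client.getDatabase("mydb").getCollection("mycoll");

            // Full documents: both array fields come back with all 10 elements.
            List<Document> full = new ArrayList<>();
            coll.find().limit(1000).into(full);

            // Projected documents: $slice keeps only the first element of each array.
            List<Document> sliced = new ArrayList<>();
            coll.find()
                .projection(Projections.fields(
                        Projections.slice("firstArrayField", 1),
                        Projections.slice("secondArrayField", 1),
                        Projections.include("timestampField")))
                .limit(1000)
                .into(sliced);
        }
    }
}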

The problem has now turned into this :) When I query 1,000 documents using collection.find(whereQuery), it takes ~2,400 ms. But when I query 13 documents using the same code, it takes ~1,500 ms. The data returned is 100 times smaller, yet the time is not even halved. Am I missing something?

Related

Is there a way to know whether .limit() actually removes any documents?

Using the mongocxx driver (C++ project).
I'm working on a MongoDB query to paginate some results. I'm trying to return the first 10 results while also informing the recipient whether there are more results to fetch with another query (using an offset). The results are stored in a std::vector after the find query.
Is there any elegant way to do this, preferably without returning all the result documents and then comparing the vector size to the specified page limit?
Current query (without specifics):
db.collection.find({"<some_field>" : <some value>}).limit(10);
This, however, will not tell me whether the limit actually removed any documents in the case that exactly 10 results were found.
Currently I'm simply returning the full vector of results and looping through it, breaking if the loop goes over 10 iterations (and setting a "more_items" bool to true).
You have two ways to do this:
1. Count the documents matched by the query:
db.collection.count({"<some_field>" : <some value>});
If the count is greater than the number you need (10 here), you can set the "more_items" bool to true.
2. Find with the limit set to one more than the page size (11 here):
db.collection.find({"<some_field>" : <some value>}).limit(11);
That way you get at most 11 documents. If you get 11 documents, there are more documents than the actual limit of 10; if you get fewer than 11, there are not enough documents left to exceed the limit.
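For illustration, here is a minimal sketch of option 2 with the MongoDB Java driver (the collection, field, and value names are placeholders); the mongocxx call chain is analogous:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import java.util.ArrayList;
import java.util.List;

public class PageWithHasMore {
    public static void main(String[] args) {
        int pageSize = 10;
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("mydb").getCollection("mycoll");

            // Ask for one document more than the page size.
            List<Document> docs = new ArrayList<>();
            coll.find(Filters.eq("someField", "someValue"))
                .limit(pageSize + 1)
                .into(docs);

            // If we got the extra document, there is at least one more page.
            boolean moreItems = docs.size() > pageSize;
            if (moreItems) {
                docs.remove(docs.size() - 1); // drop the sentinel document
            }
            System.out.println("page size: " + docs.size() + ", more_items: " + moreItems);
        }
    }
}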

Couchbase N1QL Query getting distinct on the basis of particular fields

I have a document structure which looks something like this:
{
...
"groupedFieldKey": "groupedFieldVal",
"otherFieldKey": "otherFieldVal",
"filterFieldKey": "filterFieldVal"
...
}
I am trying to fetch all documents that are unique with respect to groupedFieldKey. I also want to fetch otherFieldKey from ANY one of these documents. otherFieldKey has minor variations from one document to another, but I am fine with getting ANY of these values.
SELECT DISTINCT groupedFieldKey, otherFieldKey
FROM bucket
WHERE filterFieldKey = "filterFieldVal";
This query ends up fetching all the documents, because the minor variations in otherFieldKey defeat the DISTINCT.
SELECT groupedFieldKey, maxOtherFieldKey
FROM bucket
WHERE filterFieldKey = "filterFieldVal"
GROUP BY groupedFieldKey
LETTING maxOtherFieldKey = MAX(otherFieldKey);
This query works as expected, but it takes a long time because of the GROUP BY step. Since the query is used to show products in the UI, this is not acceptable. I have tried applying indexes, but they have not produced fast results.
Actual details of the records:
Number of records = 100,000
Size per record = Approx 10 KB
Time taken to load the first 10 records: 3s
Is there a better way to do this? A way of getting DISTINCT only on particular fields will be good.
EDIT 1:
You can follow this discussion thread in the Couchbase forum: https://forums.couchbase.com/t/getting-distinct-on-the-basis-of-a-field-with-other-fields/26458
GROUP BY must materialize all the matching documents. You can try a covering index:
CREATE INDEX ix1 ON bucket(filterFieldKey, groupedFieldKey, otherFieldKey);
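For completeness, running that statement from application code with the Couchbase Java SDK (3.x) might look roughly like this; the host, credentials, and bucket name are placeholders:

import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.query.QueryResult;

public class GroupedQueryExample {
    public static void main(String[] args) {
        // Placeholder host and credentials.
        Cluster cluster = Cluster.connect("127.0.0.1", "Administrator", "password");

        String statement =
            "SELECT groupedFieldKey, maxOtherFieldKey " +
            "FROM bucket " +
            "WHERE filterFieldKey = \"filterFieldVal\" " +
            "GROUP BY groupedFieldKey " +
            "LETTING maxOtherFieldKey = MAX(otherFieldKey)";

        // Run the N1QL statement and print each result row.
        QueryResult result = cluster.query(statement);
        for (JsonObject row : result.rowsAsObject()) {
            System.out.println(row);
        }

        cluster.disconnect();
    }
}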

Bad performance upserting items into a million-document collection

It takes 700-800 ms to upsert one item into a collection containing about 2 million documents. I have tried the following functions:
Model.findOneAndUpdate()
bulk.find({...}).upsert().updateOne()
But both of them take almost 1 second to upsert ONE item.
I have another 1 million items to insert/upsert, so it would take me several days. How can I improve this?
Adding an index on the field(s) used in the upsert's query filter will accelerate the process.
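The question uses Mongoose, but the idea is driver-agnostic. Here is a minimal sketch with the MongoDB Java driver, assuming a hypothetical externalId field as the upsert key, combining an index with batched upserts:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Indexes;
import com.mongodb.client.model.ReplaceOneModel;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;
import java.util.ArrayList;
import java.util.List;

public class UpsertWithIndex {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("mydb").getCollection("items");

            // 1) Index the field used in the upsert filter so each lookup
            //    is an index seek instead of a collection scan.
            coll.createIndex(Indexes.ascending("externalId"));

            // 2) Batch the upserts instead of issuing them one at a time.
            List<ReplaceOneModel<Document>> ops = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                Document doc = new Document("externalId", i).append("value", "v" + i);
                ops.add(new ReplaceOneModel<>(
                        Filters.eq("externalId", i),
                        doc,
                        new ReplaceOptions().upsert(true)));
            }
            coll.bulkWrite(ops);
        }
    }
}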

Remove a given number of records in MongoDB

I have too many records in my collection. Can I keep only a desired number of records and remove the others, without any condition?
I have a collection called Products with around 100,000 records, and it is slowing down my local application. I am thinking of shrinking this down to around 1,000 records. How can I do it?
OR
How can I copy a collection with a limited number of records?
If you want to copy a collection with a limited number of records and without any filter condition, a forEach loop can be used. The following copies 1,000 documents from originalCollection to copyCollection.
db.originalCollection.find().limit(1000).forEach(function(doc) {
    db.copyCollection.insert(doc);
});
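The same copy can also be done from a driver; a rough Java-driver equivalent (database and collection names as above, connection string a placeholder) would be:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import java.util.ArrayList;
import java.util.List;

public class CopyFirstThousand {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("mydb");
            MongoCollection<Document> source = db.getCollection("originalCollection");
            MongoCollection<Document> target = db.getCollection("copyCollection");

            // Read the first 1,000 documents and insert them in one batch.
            List<Document> batch = new ArrayList<>();
            source.find().limit(1000).into(batch);
            if (!batch.isEmpty()) {
                target.insertMany(batch);
            }
        }
    }
}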

What is the recommended maximum number of emits in a map function?

I am new to MongoDB and am planning to use map-reduce to compute a large amount of data.
As you know, the map function matches the criteria and then emits the required data for a given field. My map function has multiple emits; as of now I have 50 fields emitted from a single document. That means a single document in the collection explodes into 40 documents in the temp collection. So if I have 1 million documents to process, there will be 1 million * 40 documents in the temp collection by the end of the map phase.
The next step is to sort this collection. (I haven't used the sort param of map-reduce; will it help?)
I thought of splitting the map function into two, but then there is one more problem: while running the map function, if I happen to hit an exception I want to skip the entire document (i.e. not emit any data from that document), and if I split the function I won't be able to do that.
On mongodb.org I found a comment which said: "When I run the MR job with sort, it takes 1.5 days to reach 23% of the first stage of MR. When I run the MR job without sort, it takes about 24-36 hours for the whole job. Also, turning off jsMode sped up my MR twice (before I turned off sorting)."
Will enabling sort help? Or will turning OFF jsMode help? I am using MongoDB 2.0.5.
Any suggestion?
Thanks in advance .G
The next step is to sort this collection. (I haven't used the sort param of map-reduce; will it help?)
I don't know what you mean; map-reduce itself doesn't have a sort param, only the incoming query does, and that sort only orders the data going in. Unless you are relying on some specific behaviour that avoids sorting the final output by sorting the input, you don't normally need to sort.
How are you looking to use this MR? Obviously it won't be in real time, or you would just kill your servers, so I'm guessing it is a background process that runs and formats data the way you want. I would suggest looking into incremental map-reduce so that you do delta updates throughout the day and limit the amount of resources used at any given time (a rough sketch follows).
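A minimal sketch of such an incremental run, using the mapReduce helper of a 4.x-era sync Java driver (deprecated in newer drivers; the asker's 2.0.5-era setup would use the equivalent shell command); the JS functions, field names, and collection names here are placeholders, not the asker's actual schema:

import com.mongodb.client.MapReduceIterable;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.MapReduceAction;
import org.bson.Document;

public class IncrementalMapReduce {
    public static void main(String[] args) {
        long lastRunMillis = 0L; // persist this between runs

        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> events =
                    client.getDatabase("mydb").getCollection("events");

            String map = "function() { emit(this.key, this.value); }";
            String reduce = "function(key, values) { return Array.sum(values); }";

            // Only feed documents added since the last run into the map phase,
            // and re-reduce the results into the existing output collection.
            MapReduceIterable<Document> result = events
                    .mapReduce(map, reduce)
                    .filter(Filters.gt("createdAt", lastRunMillis))
                    .collectionName("eventTotals")
                    .action(MapReduceAction.REDUCE);

            result.toCollection(); // run the job, writing into eventTotals
        }
    }
}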
So if I have 1 million documents to process, there will be 1 million * 40 documents in the temp collection by the end of the map phase.
Are you emitting multiple times? If not, then the temp collection should have only one row per key, with documents of the form:
{
    _id: emitted_id,
    value: [ /* each of the docs that you emit for this key */ ]
}
This is shown: http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/
Or will turning OFF jsMode help? I am using MongoDB 2.0.5.
Turning off jsMode is unlikely to do anything significant; results from it have varied.