How do I get N random documents from Cosmos DB using the MongoDB API?

I'm using Azure Cosmos DB with its MongoDB API (server version 4.0). I need to get N random documents from it. I've tried using the $sample operator, but it gives me documents in the same order on every query:
db.collectionName.aggregate([{$sample: {size: 1}}])
No matter how many times I run this query, I get the same document from the collection.

I can't find any documentation on how $sample executes on Cosmos DB, but I think the problem we are seeing may be related to what is mentioned here: if certain conditions are not met, $sample gets the results, sorts them, and then returns the first size items, so what we get is always the same. I'll try to find more details, and I will update this answer with any further findings.
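In the meantime, a workaround that avoids $sample altogether is to skip to a random offset client-side. A minimal sketch with pymongo, assuming a non-empty collection (the URI and database name are placeholders):

import random
from pymongo import MongoClient

client = MongoClient("<your-cosmos-connection-string>")  # placeholder URI
coll = client["mydb"]["collectionName"]  # placeholder database name

# Pick a random offset client-side; estimated_document_count() reads
# collection metadata instead of scanning documents.
count = coll.estimated_document_count()
random_doc = coll.find().skip(random.randrange(count)).limit(1).next()

For N documents, draw N distinct offsets; note that skip still walks past the skipped documents on the server, so this suits small collections or small N best.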

Related

mongodb - retrieve multiple documents efficiently

I want to download ~100K records from MongoDB, using pymongo.
find() returns a cursor, which, I guess, goes to the db (cloud) for each single document. This takes ~0.003 seconds per document, which is not so good.
Is there some way to perform a "fetch all" or something like that, to reduce the run time?
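For what it's worth, a pymongo cursor already fetches documents in batches rather than making one round trip per document; raising the batch size and draining the cursor in one go looks roughly like the sketch below (the URI and names are placeholders):

from pymongo import MongoClient

client = MongoClient("<your-connection-string>")  # placeholder URI
coll = client["mydb"]["records"]  # placeholder collection

# batch_size controls how many documents come back per round trip;
# list() drains the cursor, so ~100K documents arrive in a handful of
# large batches instead of one network hop each.
docs = list(coll.find({}, batch_size=10000))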

MongoDB - A very simple query is taking a high response time

We have an issue with MongoDB query performance. There is a collection in MongoDB with approx. 30 properties, and the total number of documents is close to 2,000. Below is an example with one collection, "deal"; I am facing a similar issue with almost all the collections in my db. Please note that I have created indexes, even compound indexes, based on the query use cases, and there is no COLLSCAN in any of the query plans. I have tried this with MongoDB 3.6 as well as 4.x; the result is the same.
db.getCollection("deals").find({isActive:true, branch: ObjectId("5af1de276fcd080007ed79fd")})
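For reference, a compound index matching this query's two equality predicates would be created as in the sketch below (connection details are placeholders):

from pymongo import MongoClient

client = MongoClient("<your-connection-string>")  # placeholder URI
deals = client["mydb"]["deals"]  # placeholder database name

# Compound index on the two filtered fields; with ~2,000 documents this
# lookup should normally return in single-digit milliseconds.
deals.create_index([("isActive", 1), ("branch", 1)])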

Using MapReduce as a Stage in Mongo DB Aggregation Pipeline

I want to use MongoDB's mapReduce functionality along with an aggregation query.
Below are the stages that I see could be part of the aggregation pipeline:
1. Filter docs the user has access to, based on the content of the docs and the passed security context (the user's roles) (using $redact).
2. Filter based on one or more criteria (using $match).
3. Tokenize the words in the docs returned by the above filtering and populate a collection (using mapReduce), or return the docs inline.
4. Query the populated collection / the docs returned inline for words matching user criteria with a like query ($regex), and return the words along with their locations.
I am able to achieve steps 1, 2 and 4 in the aggregation pipeline.
I am able to achieve step 3 separately by using the mapReduce functionality in MongoDB.
I want to make the mapReduce operation a stage in the aggregation pipeline as well, so that it receives the filtered docs from the earlier steps and passes the processed result to the next step.
The mapReduce operation is based on a sample map and reduce operation. I intend to use the map, reduce and finalize functions shared in the Stack Overflow question below:
Implement auto-complete feature using MongoDB search
My question: I do not know whether a mapReduce operation can be part of the MongoDB aggregation pipeline, and if so, whether its output can be used inline and passed to the next stage.
I am using Spring Data MongoDB to implement the aggregation solution.
If someone has implemented the same, please help me with this.
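For context, a rough sketch of steps 1, 2 and 4 as pipeline stages (the field names, roles and regex are assumptions; step 3 is the mapReduce tokenization this question is about):

from pymongo import MongoClient

client = MongoClient("<your-connection-string>")  # placeholder URI
docs = client["mydb"]["documents"]  # placeholder collection

user_roles = ["analyst"]  # assumption: roles from the security context
pipeline = [
    # Step 1: $redact prunes (sub)documents whose tags share no role
    # with the user's security context.
    {"$redact": {"$cond": {
        "if": {"$gt": [{"$size": {"$setIntersection": ["$tags", user_roles]}}, 0]},
        "then": "$$DESCEND",
        "else": "$$PRUNE",
    }}},
    # Step 2: $match filters on ordinary criteria.
    {"$match": {"status": "published"}},
    # Step 4: regex ("like") match against a 'words' array that step 3
    # (the mapReduce tokenization) would have to produce.
    {"$match": {"words": {"$regex": "^auto"}}},
]
results = list(docs.aggregate(pipeline))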

MongoDB - is aggregation discouraged over simple queries?

We are running a MongoDB instance for some of our price data, and I would like to find the most recent price update for each product that I have in the database.
Coming from a SQL background, my initial thought was to create a query with a subquery, where the subquery is a GROUP BY query: price updates are grouped by product, and then the most recent update can be found for each product.
I talked to a colleague about this approach, and he claimed that the official training material from MongoDB says one should prefer simple queries over aggregated ones; i.e. he would run a query for each product and find its most recent price update by ordering them by update date, so the number of queries grows linearly with the number of products.
I agree that such a query is simpler to write than an aggregated one, but I would have thought that, performance-wise, going through the collection once would be faster; i.e. the number of queries stays constant regardless of the number of products.
He also claims that MongoDB can optimize simple queries better when running in a cluster.
Anybody know if that is the case?
I have searched the internet and cannot find any such claim that one should prefer simple queries over aggregated ones.
Another colleague of mine thought that, since MongoDB is a relatively new technology, aggregation queries may simply not have been optimized for clustered MongoDB instances yet.
Anybody who can shed some light on these matters?
Thanks in advance
Here is some information on the aggregation pipeline on a sharded MongoDB deployment:
Aggregation Pipeline and Sharded Collections
Assuming you have the right indexes in place on your collections, you shouldn't have any problems using MongoDB aggregation.
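As a sketch of the single-pass aggregation (the collection and field names are assumptions), finding the most recent price update per product looks like:

from pymongo import MongoClient

client = MongoClient("<your-connection-string>")  # placeholder URI
prices = client["mydb"]["price_updates"]  # placeholder collection

# One pass over the collection: sort newest-first within each product,
# then keep the first (most recent) update per product. An index on
# {productId: 1, updatedAt: -1} lets the $sort use the index.
pipeline = [
    {"$sort": {"productId": 1, "updatedAt": -1}},
    {"$group": {"_id": "$productId", "latest": {"$first": "$$ROOT"}}},
]
latest_per_product = list(prices.aggregate(pipeline))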

mongodb - keeping track of aggregated documents

I have a MongoDB collection that stores raw information coming from an app. I wrote a multi-stage aggregation method to generate more meaningful data from the raw documents.
Using the $out operator in my aggregation function, I store the aggregation results in another collection.
I would like to be able to either delete raw documents that were already aggregated, or somehow mark those documents so I know not to aggregate again.
I am worried that I cannot guarantee I won't miss documents that are created in between runs, or that I'll create duplicate aggregated documents.
Is there a way to achieve this?
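One pattern that addresses both concerns is to bound each run with a timestamp and accumulate into the output collection using $merge (MongoDB 4.2+) instead of $out. A sketch, with all field and collection names assumed:

from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("<your-connection-string>")  # placeholder URI
raw = client["mydb"]["raw_events"]  # placeholder raw collection

# Freeze a cutoff so documents inserted while the pipeline runs are
# left for the next run instead of being missed or double-counted.
cutoff = datetime.now(timezone.utc)

raw.aggregate([
    {"$match": {"createdAt": {"$lt": cutoff}, "processed": {"$ne": True}}},
    {"$group": {"_id": "$deviceId", "total": {"$sum": "$value"}}},
    # Unlike $out, $merge folds new totals into existing output
    # documents instead of replacing the whole collection.
    {"$merge": {
        "into": "aggregated",
        "whenMatched": [{"$set": {"total": {"$add": ["$total", "$$new.total"]}}}],
        "whenNotMatched": "insert",
    }},
])

# Mark exactly the documents the pipeline just consumed.
raw.update_many(
    {"createdAt": {"$lt": cutoff}, "processed": {"$ne": True}},
    {"$set": {"processed": True}},
)

Because the pipeline and the update share the same cutoff and processed filter, documents inserted mid-run are simply picked up by the next run.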