MongoDB - Preserve order when using $sample

I want to have pagination and sampling.
I have a client that requests entities in a random order.
I want to implement a new feature: pagination.
The only way I can think of is to use $match to exclude the IDs of entities that have already been served, but that sounds like an expensive operation.
Is there a way to randomise the order of entities in a collection and persist that order, so that I can use skip on it?

How about using a combination of $sample and $out?
db.collection.aggregate([
  { $sample: { size: 300 } }, // pick 300 random documents
  { $out: "temp" }            // write the results to a new "temp" collection
])
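Once the shuffled sample is materialised in temp, pages can be served with skip and limit against it. A minimal sketch, assuming a page size of 20 (the page size and page variable are illustrative):

// Serve one page of the pre-shuffled "temp" collection
var pageSize = 20;
var page = 3; // zero-based page index
db.temp.find()
  .skip(page * pageSize) // jump past the pages already served
  .limit(pageSize)

An unsorted find generally returns documents in the order $out wrote them, but that order is not guaranteed by the server, so adding an explicit sequence field during the shuffle is safer. Also note that $out replaces temp on every run, so the shuffle only stays stable between refreshes.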

Related

When doing an upsert to MongoDB is it possible to set a field with a timestamp only if other data in the record has changed?

We need to cache records from a service with a terrible API.
This service provides an API for querying data about our employees, but it does not tell us whether an employee is new or has been updated, nor can we filter our queries on that information.
Our proposed solution to the problems this creates is to periodically (e.g. every 15 minutes) query all our employee data and upsert it into a MongoDB database. When we write to MongoDB, we would like to include an additional property indicating whether the record is new or has changed since the last upsert (obviously not including the field we are using for the timestamp).
The idea is that, instead of querying the source directly, which we cannot filter by such timestamps, we would query our cache, which would include said timestamp and let us use it as a filter.
(Ideally, we'd like to write this in C# using the MongoDB driver, but more important right now is whether this can be done in a single upsert call, or whether we'd need to load all the records into memory, do the comparisons, and then add the timestamps before upserting them....)
There might be a way of doing that, but how efficient it is remains to be seen. The update command in MongoDB can take an aggregation pipeline to perform the update operation. We can use the $addFields stage to add a new field denoting the update status, and $function to compute its value. A short example is:
db.collection.update(
  { key: 1 },
  [
    {
      "$addFields": {
        // true when the stored document differs from the incoming one
        changed: {
          "$function": {
            lang: "js",
            "args": [
              "$$ROOT",                       // the existing document
              { "key": 1, data: "somedata" }  // the incoming document
            ],
            "body": "function(originalDoc, newDoc) { return JSON.stringify(originalDoc) !== JSON.stringify(newDoc) }"
          }
        }
      }
    }
  ],
  { upsert: true }
)
Some points to consider:
If the order of fields in the old and new versions of the document differs, JSON.stringify will produce different strings and falsely report a change.
The function specified in $function runs on the server side, so ideally it should be lightweight. If a large number of records get upserted, it may become a bottleneck.
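If the goal is specifically a "changed" timestamp rather than a boolean flag, a cheaper variant is to compare the payload fields directly with $ne and bump a timestamp via $$NOW only on change, avoiding the server-side JavaScript entirely. A minimal sketch, assuming MongoDB 4.2+ for pipeline updates (the data and lastModified field names are illustrative):

db.collection.update(
  { key: 1 },
  [
    {
      "$set": {
        // bump the timestamp only when the payload actually changed;
        // on a fresh insert $data is missing, so new docs get stamped too
        lastModified: {
          "$cond": [
            { "$ne": [ "$data", "somedata" ] },
            "$$NOW",
            "$lastModified"
          ]
        },
        data: "somedata" // write the new payload
      }
    }
  ],
  { upsert: true }
)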

Storing MongoDB query in the database

I have a collection subscribers.
I want to get a segment of subscribers by applying sometimes-complex filters in the query db.subscribers.find({ age: { $gt: 20 }, ...etc }), but I don't want to save the result set, since that would be inefficient.
Instead, I would like to save only the filters applied in the query as a set of rules in the segments collection.
Is that a good approach, and what would be an efficient way to do it?
Should I just save the query object itself as a document, or define a more restrictive schema before saving?
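One way to realise the first option is to store the filter itself as a sub-document and pass it straight back to find() later. A minimal sketch (the segment name and fields are illustrative):

// Store the rules, not the results
db.segments.insertOne({
  name: "adults",
  filter: { age: { $gt: 20 } }
})

// Later, evaluate the segment against live subscriber data
var seg = db.segments.findOne({ name: "adults" })
db.subscribers.find(seg.filter)

One caveat: servers older than MongoDB 5.0 reject stored field names that begin with $, so on those versions the filter would need to be persisted as a string (e.g. via JSON.stringify) and parsed back on read.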

Query documents, update them and write them back to MongoDB

In my MongoDB 3.2 based application I want to process documents. To avoid repeatedly processing the same document, I want to set a flag on it and write the updated document back to the database.
The possible approach is:
Query the data: FindIterable<Document> documents = db.collection.find(query).
Perform some business logic on these documents.
Iterate over the documents, update each one, and store it in a new collection.
Push the new collection to the database with db.collection.updateMany().
Theoretically, this approach should work but I'm not sure that it is the optimal scenario.
Is there any way in the MongoDB Java API to perform the following two operations:
query documents (get them from the DB and pass them to a separate method);
update them and then store the updated version in the DB;
in a more elegant way than the approach proposed above?
You can update documents in place using update:
db.collection.update(
  {query},   // filter for the documents to touch
  {update},  // the modifications to apply
  { multi: true }
);
It will iterate over all documents in the collection that match the query and update the fields specified in the update document.
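For instance, with the placeholders filled in (the field names are illustrative):

db.collection.update(
  { status: "pending" },              // query: which documents to touch
  { $set: { status: "processed" } },  // update: what to change
  { multi: true }                     // apply to every match
);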
EDIT:
To apply some business logic to individual documents you can iterate over matching documents as following:
db.collection.find({query}).forEach(
  function (doc) {
    // your business logic
    if (doc.question == "Great Question of Life") {
      doc.answer = 42;
    }
    db.collection.save(doc); // save() rewrites the whole document
  }
)
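Note that the find-then-save round trip above is not atomic: another worker can pick up the same document between the read and the save. A minimal sketch of an atomic alternative using findOneAndUpdate (the processed flag is illustrative):

// Atomically claim one unprocessed document by flagging it first
var doc = db.collection.findOneAndUpdate(
  { processed: { $ne: true } },   // match only unprocessed documents
  { $set: { processed: true } },  // flag it so no other worker takes it
  { returnNewDocument: true }     // hand back the updated version
);
if (doc !== null) {
  // apply the business logic to doc here
}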

How to create an index in MongoDB which calls a JS function via system.js?

I have two collections, viz. whitelist (id, count, expiry) and blacklist (id).
Now I would like to create an index such that when count >= 200, a JS function is called which removes the document from whitelist and adds the id to blacklist.
So can I do this in Mongo using db.collection.createIndex({"count":1}, ???),
or do I need to write a daemon to scan the entire collection? Or is there a better method for the same?
You seem to be asking for what in a SQL relational database we would call a "trigger", which is something completely different from an "index" even in that world.
In the NoSQL world, and especially with MongoDB, that sort of "server logic" is typically relegated to the "client" code rather than the server. Think of it as part of the "scalability" philosophy of these products, where features like "triggers" are left out on the grounds that they "cost" a lot with distributed data.
So in order to do what you want, you do it in "code" instead of defining a database "trigger". The process is simple enough, via .findAndModify() and the wrapping variants available in language APIs:
// Increment while the count is still below 200 and return the modified document
var doc = db.whitelist.findAndModify({
  "query": { "_id": myId, "count": { "$lt": 200 } },
  "update": { "$inc": { "count": 1 } },
  "new": true
});

// Once the threshold is reached, move the id over to the blacklist
if ( doc != null && doc.count >= 200 ) {
  db.blacklist.insert({ "_id": myId });
  db.whitelist.remove({ "_id": myId });
}
Be careful with the actual language API method variant, as its structure typically differs from the "query/update" keys provided in the shell method.
The basic principles remain the same: modify and fetch, then move the entry to the other collection if your condition is met. But it is "two" trips to the server, and there is no way to make the server "trigger" such an action by itself when the condition is met.
db.whitelist.insert(doc);

if (db.whitelist.find(criterion).count() >= 200) {
  var bulkRemove = db.whitelist.initializeUnorderedBulkOp();
  var bulkInsert = db.blacklist.initializeUnorderedBulkOp();
  db.whitelist.find(criterion).forEach(
    function(doc) {
      bulkInsert.insert({ _id: doc._id });
      bulkRemove.find({ _id: doc._id }).removeOne();
    }
  );
  bulkInsert.execute();
  bulkRemove.execute();
}
First, you insert the document as usual. Since criterion will use an index, the if clause should be evaluated quickly and efficiently.
If 200 or more documents match that criterion, we use bulk operations to insert the ids into the blacklist and remove the documents from the whitelist; being unordered bulk operations, the server is free to execute each batch in any order.
The problem with writing only the _id to the blacklist is that you still need to check whether the criterion for being blacklisted is matched, so the _id needs to contain that criterion.
A better solution, IMHO, is to flag entries of a single collection using a field named blacklisted, or to use the aggregation framework to find blacklisted documents and write them to a collection using the $out pipeline stage. Sadly, you didn't give example data or a proper description of your use case, so you get an unspecific answer.
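As a sketch of that aggregation alternative (the collection names and count threshold mirror the question; note that $out replaces the target collection wholesale on each run):

// Find everything meeting the blacklist criterion and write the ids
// out in a single server-side pass
db.whitelist.aggregate([
  { $match: { count: { $gte: 200 } } }, // the blacklist criterion
  { $project: { _id: 1 } },             // keep only the id
  { $out: "blacklist" }                 // replaces "blacklist" entirely
])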

Limiting documents in MongoDB dynamically

I'm looking to limit the number of documents in a collection dynamically. For example, I will store up to 1000 documents for GOLD members, and 500 for SILVER members. When the maximum is reached for a particular user and they create a new document, the oldest document belonging to that user should be deleted.
Capped collections aren't suitable for something like this, so I was wondering whether it is up to me to add my own logic to implement this "queue type" feature, or whether there is some tried and tested approach out there somewhere?
Use a Mongoose pre-save middleware that performs the "capping" on every save, deleting the oldest documents once the user's cap is exceeded:

schema.pre('save', function(next) {
  var cap = 500; // e.g. 1000 for GOLD members, 500 for SILVER
  // Keep the newest cap - 1 docs; the "user" field is an assumed owner link
  model.find({ user: this.user })
    .sort({ $natural: -1 }) // newest first, by insertion order
    .skip(cap - 1)          // skip past the docs we want to keep
    .exec(function(err, docs) {
      if (err) return next(err);
      var ids = docs.map(function(d) { return d._id; });
      model.deleteMany({ _id: { $in: ids } }, next);
    });
});
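Two caveats with this pattern: concurrent saves can both pass the size check before either delete completes, so the cap is only approximate under load, and $natural order is not a reliable recency signal once documents are moved or rewritten, so sorting on a createdAt field is the safer choice.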