Does MongoDB provide triggers, like in an RDBMS? [duplicate] - mongodb

I'm creating a sort of background job queue system with MongoDB as the data store. How can I "listen" for inserts to a MongoDB collection before spawning workers to process the job?
Do I need to poll every few seconds to see if there are any changes since last time, or is there a way my script can wait for inserts to occur?
This is a PHP project that I am working on, but feel free to answer in Ruby or in language-agnostic terms.

What you are thinking of sounds a lot like triggers. MongoDB does not have any support for triggers; however, some people have "rolled their own" using some tricks. The key here is the oplog.
When you run MongoDB in a replica set, all of the MongoDB write operations are logged to an operations log (known as the oplog). The oplog is basically just a running list of the modifications made to the data. Replica sets function by listening to changes on this oplog and then applying the changes locally.
Does this sound familiar?
I cannot detail the whole process here (it spans several pages of documentation), but the tools you need are available.
First, some write-ups on the oplog; a representative entry is sketched after this list:
- Brief description
- Layout of the local collection (which contains the oplog)
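For orientation, a typical oplog entry looks roughly like this (the values are purely illustrative, and the exact set of fields varies by server version):
{
    "ts": Timestamp(1471952088, 1),   // when the operation happened
    "op": "i",                        // operation type: "i" = insert, "u" = update, "d" = delete
    "ns": "my_db.my_collection",      // namespace the operation applies to
    "o": { "_id": ObjectId("..."), "status": "queued" }  // the inserted document (or the change)
}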
You will also want to leverage tailable cursors. These will provide you with a way to listen for changes instead of polling for them. Note that replication uses tailable cursors, so this is a supported feature.

MongoDB has what are called capped collections and tailable cursors, which allow MongoDB to push data to listeners.
A capped collection is essentially a collection that is a fixed size and only allows insertions. Here's what it would look like to create one:
db.createCollection("messages", { capped: true, size: 100000000 })
MongoDB Tailable cursors (original post by Jonathan H. Wage)
Ruby
require 'mongo'

db = Mongo::Connection.new.db('my_db')
coll = db.collection('my_collection')
cursor = Mongo::Cursor.new(coll, :tailable => true)

loop do
  if doc = cursor.next_document
    puts doc
  else
    sleep 1
  end
end
PHP
$mongo = new Mongo();
$db = $mongo->selectDB('my_db');
$coll = $db->selectCollection('my_collection');
$cursor = $coll->find()->tailable(true);

while (true) {
    if ($cursor->hasNext()) {
        $doc = $cursor->getNext();
        print_r($doc);
    } else {
        sleep(1);
    }
}
Python (by Robert Stewart)
from pymongo import Connection
import time

db = Connection().my_db
coll = db.my_collection
cursor = coll.find(tailable=True)

while cursor.alive:
    try:
        doc = cursor.next()
        print doc
    except StopIteration:
        time.sleep(1)
Perl (by Max)
use 5.010;
use strict;
use warnings;
use MongoDB;

my $db = MongoDB::Connection->new;
my $coll = $db->my_db->my_collection;
my $cursor = $coll->find->tailable(1);

for (;;) {
    if (defined(my $doc = $cursor->next)) {
        say $doc;
    }
    else {
        sleep 1;
    }
}
Additional Resources:
- Ruby/Node.js tutorial which walks you through creating an application that listens to inserts in a MongoDB capped collection.
- An article talking about tailable cursors in more detail.
- PHP, Ruby, Python, and Perl examples of using tailable cursors.

Check out this: Change Streams
January 10, 2018 - Release 3.6
*EDIT: I wrote an article about how to do this https://medium.com/riow/mongodb-data-collection-change-85b63d96ff76
https://docs.mongodb.com/v3.6/changeStreams/
It's new in MongoDB 3.6:
https://docs.mongodb.com/manual/release-notes/3.6/ 2018/01/10
$ mongod --version
db version v3.6.2
In order to use change streams, the database must be part of a replica set.
More about replica sets:
https://docs.mongodb.com/manual/replication/
Your database will be "Standalone" by default.
How to convert a standalone to a replica set: https://docs.mongodb.com/manual/tutorial/convert-standalone-to-replica-set/
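For local development, the conversion essentially boils down to two steps; this is a minimal sketch (the set name and dbpath are assumptions), and the linked tutorial covers production concerns:
$ mongod --replSet rs0 --dbpath /data/db
# then, from a mongo shell connected to that instance:
> rs.initiate()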
The following example is a practical application of how you might use this, specifically with Node:
/* file.js */
'use strict'

module.exports = function (
    app,
    io,
    User // Collection Name
) {
    // SET WATCH ON COLLECTION
    const changeStream = User.watch();

    // Socket Connection
    io.on('connection', function (socket) {
        console.log('Connection!');

        // USERS - Change
        changeStream.on('change', function (change) {
            console.log('COLLECTION CHANGED');
            User.find({}, (err, data) => {
                if (err) throw err;
                if (data) {
                    // RESEND ALL USERS
                    socket.emit('users', data);
                }
            });
        });
    });
};
/* END - file.js */
Useful links:
https://docs.mongodb.com/manual/tutorial/convert-standalone-to-replica-set
https://docs.mongodb.com/manual/tutorial/change-streams-example
https://docs.mongodb.com/v3.6/tutorial/change-streams-example
http://plusnconsulting.com/post/MongoDB-Change-Streams

Since MongoDB 3.6 there is a new notifications API called Change Streams which you can use for this. See this blog post for an example. (Note: the example below was written against a pre-release build; in the released 3.6 API the method is watch() and the full document is exposed as fullDocument.) Example from it:
cursor = client.my_db.my_collection.changes([
    {'$match': {
        'operationType': {'$in': ['insert', 'replace']}
    }},
    {'$match': {
        'newDocument.n': {'$gte': 1}
    }}
])

# Loops forever.
for change in cursor:
    print(change['newDocument'])

MongoDB version 3.6 now includes change streams, which are essentially an API on top of the oplog allowing for trigger/notification-like use cases.
Here is a link to a Java example:
http://mongodb.github.io/mongo-java-driver/3.6/driver/tutorials/change-streams/
A NodeJS example might look something like:
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect("mongodb://localhost:22000/MyStore?readConcern=majority")
    .then(function (client) {
        let db = client.db('MyStore');
        let change_streams = db.collection('products').watch();
        change_streams.on('change', function (change) {
            console.log(JSON.stringify(change));
        });
    });

Alternatively, you could use the standard Mongo findOneAndUpdate method and, within the callback, fire an EventEmitter event (in Node) when the callback runs.
Any other parts of the application or architecture listening to this event will be notified of the update, with any relevant data sent along as well. This is a really simple way to achieve notifications from Mongo; a minimal sketch follows.
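A minimal sketch of that idea, assuming an already-connected db handle and hypothetical collection/event names:
const EventEmitter = require('events');
const jobEvents = new EventEmitter();

// Any part of the application can subscribe to the event:
jobEvents.on('jobInserted', function (id) {
    console.log('New job inserted with _id:', id);
});

// Emit from the write callback so listeners hear about the insert:
db.collection('jobs').insertOne({ task: 'resize-image' }, function (err, result) {
    if (err) throw err;
    jobEvents.emit('jobInserted', result.insertedId);
});
Note that this only notifies the process that performed the write; inserts made by other processes go unnoticed, which is where the oplog and change stream approaches come in.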

Many of these answers will only give you new records and not updates, and/or are extremely inefficient.
The only reliable, performant way to do this is to create a tailable cursor on the oplog.rs collection in the local database to get ALL changes to MongoDB and do with them what you will. (MongoDB even does this internally, more or less, to support replication!)
Explanation of what the oplog contains:
https://www.compose.com/articles/the-mongodb-oplog-and-node-js/
Example of a Node.js library that provides an API around what can be done with the oplog (a usage sketch follows):
https://github.com/cayasso/mongo-oplog
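A usage sketch based on the library's README; the connection string and namespace filter here are assumptions to adapt:
const MongoOplog = require('mongo-oplog');
const oplog = MongoOplog('mongodb://127.0.0.1:27017/local', { ns: 'my_db.my_collection' });

oplog.tail();

oplog.on('insert', doc => console.log('insert', doc));
oplog.on('update', doc => console.log('update', doc));
oplog.on('delete', doc => console.log('delete', doc.o._id));
oplog.on('error', err => console.error(err));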

There is an awesome set of services available called MongoDB Stitch. Look into Stitch functions/triggers. Note this is a cloud-based paid service (on AWS). In your case, on an insert, you could call a custom function written in JavaScript; a rough sketch follows.
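As an illustration, a Stitch database trigger invokes a function with the change event; this sketch assumes a trigger configured for insert operations, and the handler body is hypothetical:
exports = function (changeEvent) {
    // changeEvent follows the change stream event shape
    if (changeEvent.operationType === 'insert') {
        const doc = changeEvent.fullDocument; // the newly inserted document
        console.log('New document inserted: ' + JSON.stringify(doc._id));
        // e.g. enqueue a background job, call an external service, etc.
    }
};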

Actually, instead of watching output, why not get notified when something new is inserted by using the middleware (hooks) provided by a Mongoose schema?
You can catch the event of inserting a new document and do something after the insertion is done; a sketch follows.
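A minimal sketch of a Mongoose post('save') hook; the schema and model names here are assumptions:
const mongoose = require('mongoose');

const jobSchema = new mongoose.Schema({ task: String });

// Runs after a document is successfully saved (inserted or updated via save())
jobSchema.post('save', function (doc) {
    console.log('Job saved:', doc._id);
    // spawn a worker, emit an event, etc.
});

const Job = mongoose.model('Job', jobSchema);
Like the EventEmitter approach above, this only fires in the process that performed the write.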

There is a working Java example which can be found here.
import com.mongodb.*; // legacy driver classes (MongoClient, DBCollection, DBCursor, Bytes, ...)

MongoClient mongoClient = new MongoClient();
DBCollection coll = mongoClient.getDB("local").getCollection("oplog.rs");

DBCursor cur = coll.find().sort(BasicDBObjectBuilder.start("$natural", 1).get())
        .addOption(Bytes.QUERYOPTION_TAILABLE | Bytes.QUERYOPTION_AWAITDATA);

System.out.println("== open cursor ==");

Runnable task = () -> {
    System.out.println("\tWaiting for events");
    while (cur.hasNext()) {
        DBObject obj = cur.next();
        System.out.println(obj);
    }
};
new Thread(task).start();
The key is the query options given here.
You can also change the find query if you don't need to load all the data every time:
BasicDBObject query = new BasicDBObject();
query.put("ts", new BasicDBObject("$gt", new BsonTimestamp(1471952088, 1))); // timestamp within some range
query.put("op", "i"); // only insert operations

DBCursor cur = coll.find(query).sort(BasicDBObjectBuilder.start("$natural", 1).get())
        .addOption(Bytes.QUERYOPTION_TAILABLE | Bytes.QUERYOPTION_AWAITDATA);

Since 3.6 you can use the following database trigger types:
- event-driven triggers: useful to update related documents automatically, notify downstream services, propagate data to support mixed workloads, and for data integrity and auditing
- scheduled triggers: useful for scheduled data retrieval, propagation, archival, and analytics workloads
Log into your Atlas account, select the Triggers interface, and add a new trigger.
Expand each section for more settings or details.

Related

How to avoid mongo from returning duplicated documents by iterating a cursor in a constantly updated big collection?

Context
I have a big collection with millions of documents which is constantly updated with a production workload. When performing a query, I have noticed that a document can be returned multiple times; my workload tries to migrate the documents to a SQL system which enforces unique row ids, hence it crashes.
Problem
Because the collection is so big and lots of users are updating it after the query is started, iterating over the cursor's result may give me documents with the same id (the old and the updated version).
What I've tried
const cursor = db.collection.find(query, {snapshot: true});
while (cursor.hasNext()) {
    const doc = cursor.next();
    // do some stuff
}
Based on old documentation for the mongo driver (I'm using Node.js, but this is applicable to any official MongoDB driver), there is an option called snapshot which is said to avoid what is happening to me. Sadly, the driver returns an error indicating that this option does not exist (it was deprecated).
Question
Is there a way to iterate through the documents of a collection in a safe fashion, so that I don't get the same document twice?
I only see a viable option with the aggregation pipeline, but I want to explore other options with standard queries.
Finally, I got the answer from a mongo changelog page:
MongoDB 3.6.1 deprecates the snapshot query option.
For MMAPv1, use hint() on the { _id: 1 } index instead to prevent a cursor from returning a document more than once if an intervening write operation results in a move of the document.
For other storage engines, use hint() with { $natural : 1 } instead.
So, from my code example:
const cursor = db.collection.find(query).hint({$natural: 1});
while (cursor.hasNext()) {
    const doc = cursor.next();
    // do some stuff
}

Watch mongo update events using oplogs

I have to listen to changes in a particular field in MongoDB documents and send data to the client accordingly.
I have tried looking for solutions, but the only thing I could find is to query the oplogs
(operation logs) of MongoDB:
db.collection("oplog.rs", function(err, oplog) {})
Questions:
How are we actually going to make decisions based on oplogs (capped collections), and how frequently are we going to query oplogs to see the changes?
Do we have any alternative solution to this problem, maybe using Mongoose?
MongoDB keeps track of all database changes in a collection called "oplog.rs" (the operation log). The oplog is used when you need to keep track of EACH and EVERY collection in the database. However, if you need to keep track of just one collection, you may create a tailable cursor on that particular collection using the code below. (Note: only capped collections can have tailable cursors.)
If you do want to use the oplog: with multiple MongoDB server instances (a replica set) the oplog is enabled by default, but on a single-instance MongoDB server you need to enable it by starting mongod with a replica set name and then initiating the set with rs.initiate() from a mongo shell:
mongod --replSet rs0
Then, to read records from the oplog collection using Mongoose, you can try the following code.
var mongoose = require('mongoose');
var db = mongoose.connect('mongodb://localhost/local'); // the local db has the oplog.rs collection

mongoose.connection.on('open', function callback() {
    var collection = mongoose.connection.db.collection('oplog.rs'); // or any capped collection
    var stream = collection.find({}, {
        tailable: true,
        awaitdata: true,
        numberOfRetries: Number.MAX_VALUE
    }).stream();

    stream.on('data', function (val) {
        console.log('Doc: %j', val);
    });
    stream.on('error', function (val) {
        console.log('Error: %j', val);
    });
    stream.on('end', function () {
        console.log('End of stream');
    });
});
I hope this helps: the above code is how you implement a tailable cursor with Mongoose.

Better way to move MongoDB Collection to another Collection

In my web scraping project I need to move the previous day's scraped data from mongo_collection to mongo_his_collection.
I am using this query to move the data:
for record in collection.find():
    his_collection.insert(record)

collection.remove()
It works fine, but sometimes it breaks when the MongoDB collection contains more than 10k documents.
Suggest an optimized query which will take fewer resources and do the same task.
You could use a MapReduce job for this.
MapReduce allows you to specify an out-collection to store the results in.
When you have a map function which emits each document with its own _id as key, and a reduce function which returns the first (and in this case only, because _id's are unique) entry of the values array, the MapReduce is essentially a copy operation from the source collection to the out-collection.
Untested code:
db.runCommand({
    mapReduce: "mongo_collection",
    map: function (document) {
        emit(document._id, document);
    },
    reduce: function (key, values) {
        return values[0];
    },
    out: {
        merge: "mongo_his_collection"
    }
})
If both your collections are in the same database, I believe you're looking for renameCollection.
If not, you unfortunately have to do it manually, using a targeted mongodump / mongorestore command:
mongodump -d your_database -c mongo_collection
mongorestore -d your_database -c mongo_his_collection dump/your_database/mongo_collection.bson
Note that I just typed these two commands off the top of my head without actually testing them, so do make sure you check them before running them in production.
[EDIT]: sorry, I just realised that this was something you needed to do on a regular basis. In that case, mongodump / mongorestore probably isn't the best solution.
I don't see anything wrong with your solution - it would help if you edited your question to explain what you mean by "it breaks".
The query breaks because you are not limiting the find(). When you create a cursor on the server, mongod will try to load the entire result set in memory. This will cause problems and/or fail if your collection is too large.
To avoid this, use a skip/limit loop. Here is an example in Java:
long count = 0;
MongoClient client = new MongoClient(); // create the client once, outside the loop

while (true) {
    DBCursor cursor = client.getDB("your_DB_name").getCollection("mongo_collection")
            .find().sort(new BasicDBObject("$natural", 1)).skip((int) count).limit(100);
    if (!cursor.hasNext()) {
        break; // no more documents to copy
    }
    while (cursor.hasNext()) {
        client.getDB("your_DB_name").getCollection("mongo_his_collection").insert(cursor.next());
        count++;
    }
}
This will work, but you would get better performance by batching the writes as well. To do that, build an array of DBObjects from the cursor and write them all at once with one insert; the same idea is sketched below in the mongo shell.
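A sketch of that batching idea in the mongo shell (collection names are from the question; the batch size of 1000 is an assumption):
var batch = [];
db.mongo_collection.find().forEach(function (doc) {
    batch.push(doc);
    if (batch.length === 1000) {
        db.mongo_his_collection.insert(batch); // one round trip per 1000 documents
        batch = [];
    }
});
if (batch.length > 0) {
    db.mongo_his_collection.insert(batch); // flush the remainder
}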
Also, if the collection is being altered while you are copying, there is no guarantee that you will traverse all documents, as some may end up getting moved if they grow in size.
You can try mongodump & mongorestore.
You can use renameCollection to do it directly. Or if on different mongods, use cloneCollection.
References:
MongoDB docs renameCollection: http://docs.mongodb.org/manual/reference/command/renameCollection/#dbcmd.renameCollection
MongoDB docs cloneCollection: http://docs.mongodb.org/manual/reference/command/cloneCollection/
Relevant blog post: http://blog.shlomoid.com/2011/08/how-to-move-mongodb-collection-between.html

mongodb: how can I see the execution time for the aggregate command?

I execute the following MongoDB command in the mongo shell:
db.coll.aggregate(...)
and I see the list of results. But is it possible to see the query execution time? Is there an equivalent of the explain method for aggregation queries?
var before = new Date();
// aggregation query
var after = new Date();
var execution_mills = after - before;
You can add a time function to your .mongorc.js file (in your home directory):
function time(command) {
    const t1 = new Date();
    const result = command();
    const t2 = new Date();
    print("time: " + (t2 - t1) + "ms");
    return result;
}
and then you can use it like so:
time(() => db.coll.aggregate(...))
Caution
This method doesn't give relevant results for db.collection.find(), because find() returns the cursor immediately, before any documents are actually fetched.
I see that in MongoDB there is the possibility to use these two commands:
db.setProfilingLevel(2)
and then, after the query, you can use db.system.profile.find() to see the query execution time and other details.
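Put together, a minimal shell sketch of that approach (db.coll and the pipeline are placeholders):
db.setProfilingLevel(2);                    // profile all operations
db.coll.aggregate([ /* your pipeline */ ]);
// the most recent profile entry has a "millis" field with the execution time:
db.system.profile.find().sort({ ts: -1 }).limit(1).pretty();
db.setProfilingLevel(0);                    // turn profiling back off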
Or you can install the excellent mongo-hacker, which automatically times every query, pretty()fies it, colorizes the output, sorts the keys, and more:
I will write an answer to explain this better.
Basically there is no explain() functionality for the aggregation framework yet: https://jira.mongodb.org/browse/SERVER-4504
However, there is a way to measure client side, though not without its downsides:
- You are not measuring the database
- You are measuring the application
There are too many unknowns about the parts in between to get an accurate reading; i.e. you can't say that it took 0.04ms for the document result to be formulated by the MongoDB server, serialised, sent over the wire, de-serialised by the app, and stored into a hash, in a way that would let you subtract that sum from the total to get an aggregation benchmark.
That being said, you might be able to get a slightly more accurate result by running it in the MongoDB console on the same server as the mongos / mongod. That leaves very little in between; still too much, but perhaps little enough to get a reading you could roughly trust. As such, you could use @Zagorulkin's answer in that position.

Getting an item count with MongoDB C# driver query builder

Using the C# driver for MongoDB I can easily construct a query, to which I can then add SetSkip() and SetLimit() parameters to constrain the result set to a certain size.
However, I'd like to know what the item count of the query would be before applying Skip and Take, without executing the query and loading the entire result set (which could be huge) into memory.
It looks like I can do this with MongoDB directly through the shell by using the count() command, e.g.:
db.item.find( { "FieldToMatch" : "ValueToMatch" } ).count()
which just returns an integer, and that's exactly what I want. But I can't see a way in the documentation of doing this through the C# driver. Is it possible?
(It should be noted that we're already using the query builder extensively, so ideally I'd much rather do this through the query builder than start issuing commands down to the shell through the driver, if that's possible. But if that's the only solution then an example would be helpful, thanks.)
Cheers,
Matt
You can do it like this:
var server = MongoServer.Create("mongodb://localhost:27020");
var database = server.GetDatabase("someDb");
var collection = database.GetCollection<Type>("item");
var cursor = collection.Find(Query.EQ("FieldToMatch", "ValueToMatch"));
var count = cursor.Count();
Some notes:
- You should have only one instance of the server object (singleton).
- The latest driver version actually returns a long count instead of an int.
- The cursor only fetches data once you iterate.
- You can configure a lot of things (skip, take, which fields to return) on the cursor before actually loading the data (starting iteration).
- The Count() method of the cursor loads only the document count.
I'm using driver 2.3.0, and now it is also possible to do it like this:
...
IMongoCollection<entity> Collection = db.GetCollection<entity>(CollectionName);
var yourFilter = Builders<entity>.Filter.Where(o => o.prop == value);
long countResult = Collection.Count(yourFilter);