How to stream MongoDB Query Results with nodejs?

I have been searching for an example of how I can stream the result of a MongoDB query to a Node.js client. All solutions I have found so far seem to read the query result at once and then send it back to the client.
Instead, I would (obviously) like to supply a callback to the query method and have MongoDB call that when the next chunk of the result set is available.
I have been looking at Mongoose - should I perhaps use a different driver?
Jan

node-mongodb-native (the underlying driver that every MongoDB client uses in Node.js) has, in addition to the cursor API that others mentioned, a nice stream API (#458). Unfortunately I did not find it documented anywhere at the time.
Update: there are docs now.
It can be used like this:
var stream = collection.find().stream();
stream.on('error', function (err) {
  console.error(err);
});
stream.on('data', function (doc) {
  console.log(doc);
});
It actually implements the ReadableStream interface, so it has all the goodies (pause/resume, etc.).
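For example, pause()/resume() lets you throttle the cursor against a slow client. A minimal sketch, assuming collection is an already-opened collection from the driver (the route and port here are made up):

var http = require('http');

http.createServer(function (req, res) {
  var stream = collection.find().stream();
  res.writeHead(200, { 'Content-Type': 'application/json' });
  stream.on('data', function (doc) {
    // res.write() returns false when its buffer is full
    if (!res.write(JSON.stringify(doc) + '\n')) {
      stream.pause(); // stop pulling from MongoDB
      res.once('drain', function () {
        stream.resume(); // resume when the client catches up
      });
    }
  });
  stream.on('error', function (err) {
    res.end();
  });
  stream.on('end', function () {
    res.end();
  });
}).listen(8000);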

Streaming in Mongoose became available in version 2.4.0, which appeared three months after you posted this question:
Model.where('created').gte(twoWeeksAgo).stream().pipe(writeStream);
More elaborate examples can be found on their documentation page.
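If the destination is an HTTP response rather than a file write stream, the stream emits plain documents, so you serialize them yourself. A minimal sketch, assuming res is an http.ServerResponse already in scope:

var stream = Model.where('created').gte(twoWeeksAgo).stream();
stream.on('data', function (doc) {
  res.write(JSON.stringify(doc) + '\n'); // one JSON line per document
});
stream.on('error', function (err) {
  res.end();
});
stream.on('close', function () {
  res.end(); // the cursor is exhausted
});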

Mongoose is not really a "driver"; it's actually an ODM wrapper around the MongoDB driver (node-mongodb-native).
To do what you're doing, take a look at the driver's .find and .each methods. Here's some code from the examples:
// Find all records. find() returns a cursor
collection.find(function(err, cursor) {
  sys.puts("Printing docs from Cursor Each");
  cursor.each(function(err, doc) {
    if (doc != null) sys.puts("Doc from Each " + sys.inspect(doc));
  });
});
To stream the results, you basically replace that sys.puts with your "stream" function. I'm not sure how you plan to stream the results; I think you can do response.write() for each document (chunked transfer encoding sends each chunk as it is written, so no explicit flush is needed - see the sketch below), but you may also want to check out socket.io.
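A minimal sketch of that response.write() approach, assuming collection is already open (the port is made up):

var http = require('http');

http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  collection.find(function (err, cursor) {
    if (err) { res.end('query error'); return; }
    cursor.each(function (err, doc) {
      if (err) { res.end(); return; }
      if (doc != null) {
        res.write(JSON.stringify(doc) + '\n'); // stream one document per chunk
      } else {
        res.end(); // a null doc means the cursor is exhausted
      }
    });
  });
}).listen(8000);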

Here is the solution I found (please correct me, anyone, if this is the wrong way to do it):
(Also excuse the bad coding - too late for me now to prettify this)
var sys = require('sys');
var http = require('http');
var Db = require('/usr/local/src/npm/node_modules/mongodb/lib/mongodb').Db,
    Connection = require('/usr/local/src/npm/node_modules/mongodb/lib/mongodb').Connection,
    Collection = require('/usr/local/src/npm/node_modules/mongodb/lib/mongodb').Collection,
    Server = require('/usr/local/src/npm/node_modules/mongodb/lib/mongodb').Server;

var db = new Db('test', new Server('localhost', Connection.DEFAULT_PORT, {}));

var products;
db.open(function (error, client) {
  if (error) throw error;
  products = new Collection(client, 'products');
});

function ProductReader(collection) {
  this.collection = collection;
}
ProductReader.prototype = new process.EventEmitter();

ProductReader.prototype.do = function () {
  var self = this;
  this.collection.find(function (err, cursor) {
    if (err) {
      self.emit('e1');
      return;
    }
    sys.puts("Printing docs from Cursor Each");
    self.emit('start');
    cursor.each(function (err, doc) {
      if (err) {
        self.emit('e2');
        self.emit('end');
        return;
      }
      if (doc != null) {
        sys.puts("doc:" + doc.name);
        self.emit('doc', doc);
      } else {
        self.emit('end');
      }
    });
  });
};
http.createServer(function (req, res) {
  var pr = new ProductReader(products);
  pr.on('e1', function () {
    sys.puts("E1");
    res.writeHead(400, {"Content-Type": "text/plain"});
    res.write("e1 occurred\n");
    res.end();
  });
  pr.on('e2', function () {
    sys.puts("E2");
    res.write("ERROR\n");
  });
  pr.on('start', function () {
    sys.puts("START");
    res.writeHead(200, {"Content-Type": "text/plain"});
    res.write("<products>\n");
  });
  pr.on('doc', function (doc) {
    sys.puts("A DOCUMENT" + doc.name);
    res.write("<product><name>" + doc.name + "</name></product>\n");
  });
  pr.on('end', function () {
    sys.puts("END");
    res.write("</products>");
    res.end();
  });
  pr.do();
}).listen(8000);

I have been studying MongoDB streams myself; while I do not have the entire answer you are looking for, I do have part of it.
You can set up a socket.io stream.
This uses JavaScript with socket.io and socket.io-stream, available on NPM, plus MongoDB for the database, because using a 40-year-old database that has issues is incorrect - time to modernize. Also, the 40-year-old DB is SQL, and SQL doesn't do streams to my knowledge.
So although you only asked about data going from server to client, I also want to cover client to server in my answer, because I can NEVER find it anywhere when I search; I wanted to set up one place with both the send and receive elements via stream so everyone could get the hang of it quickly.
client side sending data to server via streaming
var stream = ss.createStream();
var blobstream = ss.createBlobReadStream(data);
blobstream.pipe(stream);
ss(socket).emit('data.stream.out', stream, {}, function (err, successful_db_insert_id) {
  // if you get back the id, it went into the db and everything worked
});
server receiving stream from the client side and then replying when done
ss(socket).on('data.stream.out', function (stream, o, c) {
  var buffer = [];
  stream.on('data', function (chunk) { buffer.push(chunk); });
  stream.on('end', function () {
    buffer = Buffer.concat(buffer);
    db.insert(buffer, function (err, res) {
      if (err) return c(err);
      c(null, res.insertedIds[0]); // insertedIds is the 2.x driver result shape; adjust for yours
    });
  });
});
// This is the other half: fetching the data and streaming it to the client
client side requesting and receiving stream data from server
var stream = ss.createStream();
var binarystring = '';
stream.on('data', function (chunk) {
  for (var i = 0; i < chunk.length; i++) {
    binarystring += String.fromCharCode(chunk[i]);
  }
});
stream.on('end', function () {
  var data = window.btoa(binarystring);
  c(null, data);
});
ss(socket).emit('data.stream.get', stream, o, c);
server side replying to request for streaming data
ss(socket).on('data.stream.get', function (stream, o, c) {
  stream.on('end', function () {
    c(null, true);
  });
  db.find().stream().pipe(stream);
});
The very last one there is the only one where I am kind of just pulling it out of my butt, because I have not yet tried it, but it should work. I actually do something similar, except I write the file to the hard drive and then use fs.createReadStream to stream it to the client. So not 100% sure, but from what I read it should be; I'll get back to you once I test it.
P.S. If anyone wants to bug me about my colloquial way of talking: I'm Canadian, and I love saying "eh". Come at me with your hugs and hits, bros/sis' :D

Related

MongoDB Converting Circular Structure to JSON error

I'm trying to query a collection of users using Mongoose in a Node API.
The handler looks like this:
exports.getUsers = async function(req, res, next) {
  try {
    let users = db.User.find();
    return res.status(200).json(users);
  } catch(e) {
    return next(e);
  }
};
This returns an error that reads Converting circular structure to JSON. When I console.log() the results of db.User.find(), I get a Query object. I've checked everything else. All of my other routes are working normally.
Well...I figured it out. I'll post the answer that I discovered in case anyone else is trying to figure this out. It turns out, through a little bit more careful reading of the documentation, that the Query object that is returned has to be executed. There are two ways to execute it - with a callback function or by returning a promise (but not both). I found this page on queries in the mongoose docs helpful. My final handler looked like this.
exports.getUsers = async function(req, res, next) {
  try {
    db.User.find()
      .then(users => {
        return res.status(200).json(users);
      });
    // note: without await, a rejected promise here skips the catch block -
    // see the await version below
  } catch(e) {
    return next(e);
  }
};
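For reference, a minimal sketch of the two execution styles side by side (callback or promise, but not both):

// callback style: passing a callback executes the query
db.User.find(function(err, users) { /* use users */ });

// promise style: .then() (or await) executes the query
db.User.find().then(users => { /* use users */ });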
Next time I guess I'll dig around for a few more minutes before asking.
Edit to add:
Found a second solution. Due to the use of the async function, I was also able to use the following inside the try block.
let users = await db.User.find();
return res.status(200).json(users);

How do I store large files using Meteor?

I'm building a Meteor (meteorjs) app that needs to store and display PDF files, sometimes as large as 500 MB. GridFS doesn't seem to be integrated yet, so I'm wondering if it's worth using Meteor in this case or whether I should stick to Rails.
Ideally I would not use S3 - I'd like to keep the files on my server.
UPDATE: it seems it's possible to connect to MongoDB outside of Meteor directly; I don't need the PDFs to be moved automatically - and it likely doesn't make sense anyway.
More specifically I'm now looking at:
MongoDB -> ElasticSearch using https://github.com/richardwilly98/elasticsearch-river-mongodb
Using the instructions at https://github.com/richardwilly98/elasticsearch-river-mongodb/wiki
You can use GridFS inside Meteor without touching any extra package:
var db = MongoInternals.defaultRemoteCollectionDriver().mongo.db; // grab the database object
var GridStore = MongoInternals.NpmModule.GridStore;

WebApp.connectHandlers.use('/someurl', function(req, res) {
  var bigFile = new GridStore(db, 'bigfile.iso', 'r'); // to read
  bigFile.open(function(error, result) {
    if (error) return;
    bigFile.stream(); // stream the file
    bigFile.on('error', function(e) {...}); // handle error etc
    bigFile.on('end', function() { bigFile.close(); }); // close the file when done
    bigFile.pipe(res); // pipe the file to res
  });
});
However, the current GridStore/mongo (v1.3.x) used by Meteor is a bit dated; the newest version is 2.x, from http://mongodb.github.io/node-mongodb-native/2.0/api-docs/
The v1.x doesn't seem to pipe well, so you may need to use the newer version.
The second option:
var db = MongoInternals.defaultRemoteCollectionDriver().mongo.db; // grab the database object
var GridStore = Npm.require('mongodb').GridStore; // add Npm.depends({mongodb: '2.0.13'}) in your package.js

WebApp.connectHandlers.use('/someurl', function(req, res) {
  var bigFile = new GridStore(db, 'bigfile.iso', 'r').stream(true); // the new API doesn't require bigFile.open() and will close automatically on end
  bigFile.on('error', function(e) {...}); // handle error etc
  bigFile.on('end', function() {...});
  bigFile.pipe(res); // pipe the file to res
});
In this example, I use WebApp.connectHandlers, but of course you can use iron:router or something similar. I tried it with a file of 500 MB and it pipes well. You also need to set res.writeHead(200) and other headers such as the content type, as sketched below.
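For instance, the header setup before piping might look like this (a sketch; the content type and filename are assumptions for a PDF):

WebApp.connectHandlers.use('/someurl', function(req, res) {
  var bigFile = new GridStore(db, 'bigfile.pdf', 'r').stream(true);
  // status and headers must be written before the first chunk goes out
  res.writeHead(200, {
    'Content-Type': 'application/pdf',
    'Content-Disposition': 'inline; filename="bigfile.pdf"'
  });
  bigFile.on('error', function(e) { res.end(); });
  bigFile.pipe(res);
});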

Backbone.js with MongoDB passing req.params into exports functions

I am trying to send a request parameter through to an 'exports' method for a MongoDB find in an Express.js/Backbone.js application. I am having a difficult time getting the parameters to pass through to MongoDB, and with '#'.
The breakage is the passing of parameters into the exported mongodb function.
Here is the flow of data:
First the request is successfully routed to the 'upcoming' function:
"upcoming/uni/:uni" : "upcoming",
It flows on to the 'upcoming' function without a problem.
upcoming: function(uni) {
  console.log("uni: " + uni);
  pag.reset();
  console.log("Hit upcoming list target");
  setCollectionType('upcoming');
  var upcomingCourses = buildCollection();
  // ------------------------------------------------------------------------
  // here is the problem: how do I pass the parameter value through the fetch?
  // Although it may also have to do with '#' - please read on.
  // ------------------------------------------------------------------------
  upcomingCourses.fetch({success: function() {
    $("#content").html(new ListView({model: upcomingCourses, page: 1}).el);
  }});
  this.headerView.selectMenuItem('home-menu');
},
The routing for the mongo methods is:
app.get('/upcoming/uni/:uni', mongomod.findUpcoming);
So the following method is exported from the MongoDB js file and is executed reliably. However, the req.params are not passed through.
Interspersed in the code I have described its runtime behaviour:
exports.findUpcoming = function(req, res) {
  console.log("university", req.params.uni); // This is consistently unpopulated
  var uni = req.params.uni;
  console.log("Size: " + req.params.length); // This will always be 0
  for (var i = 0; i < req.params.length; i++) {
    console.log("Parameters: " + req.params[i]);
  }
  db.collection('upcoming', function(err, collection) {
    if (typeof uni === 'undefined') {
      console.log("The value is undefined");
      uni = "Princeton University"; // here we add a string to test if it will work
    }
    collection.find({university: uni}).toArray(function(err, items) {
      if (err) {
        console.log("Error: " + err);
      } else {
        console.log("No Error");
        console.log("Count: " + items.length);
        console.log(items[0]['university']);
        res.send(items);
      }
    });
  });
};
One additional and important note:
The URL in a working runtime environment would be:
http://localhost:3000/#upcoming/uni/Exploratorium
This one fails, but the following URL will pass the params through to these functions; however, it returns the JSON to the screen rather than the rendered version:
http://localhost:3000/upcoming/uni/Exploratorium
The problem could be a misunderstanding of # and templates. Please, if you see the error, enlightenment would be greatly appreciated.
Nothing after the # gets passed to the server. See How to get hash in a server side language? or https://stackoverflow.com/a/318581/711902.
I found a solution to the problem of passing the parameters from the client side to the server side: by changing the url of the collection, the parameters are passed to the server:
upcomingCourses.url = "/upcoming/uni/" + uni; // <-- here's the ticket; uni is the param
upcomingCourses.fetch({success: function() {
  $("#content").html(new ListView({model: upcomingCourses, page: 1}).el);
}});
This can be made more elegant (see the sketch below), but it is a way to pass the parameters on to the server.
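For instance, a sketch of a more elegant variant (assuming uni is available when the collection is constructed) defines url as a function, so the parameter is read at fetch time:

var UpcomingCourses = Backbone.Collection.extend({
  initialize: function(models, options) {
    this.uni = options.uni;
  },
  url: function() {
    // evaluated on every fetch, so the param is always current
    return '/upcoming/uni/' + encodeURIComponent(this.uni);
  }
});

var upcomingCourses = new UpcomingCourses([], { uni: uni });
upcomingCourses.fetch({success: function() {
  $("#content").html(new ListView({model: upcomingCourses, page: 1}).el);
}});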
Thanks

How to return Mongoose results from the find method?

Everything I can find for rendering a page with Mongoose results says to do it like this:
users.find({}, function(err, docs) {
  res.render('profile/profile', {
    users: docs
  });
});
How could I return the results from the query, more like this?
var a_users = users.find({}); //non-working example
So that I could get multiple results to publish on the page?
like:
/* non working example */
var a_users = users.find({});
var a_articles = articles.find({});

res.render('profile/profile', {
  users: a_users,
  articles: a_articles
});
Can this be done?
You're trying to force a synchronous paradigm. It just doesn't work. Node.js is single-threaded, for the most part: when I/O is performed, the execution context is yielded, and completion is signaled with a callback. What this means is that you either have nested callbacks, named functions, or a flow-control library to make things nicer looking.
https://github.com/caolan/async#parallel
async.parallel([
  function(cb) {
    users.find({}, cb);
  },
  function(cb) {
    articles.find({}, cb);
  }
], function(err, results) {
  // results contains both users and articles
});
I'll play the necromancer here, as I still see another, better way to do it.
Using the wonderful promise library Bluebird and its promisifyAll() method:
var Promise = require('bluebird');
var mongoose = require('mongoose');

Promise.promisifyAll(mongoose); // key part - promisification

var users, articles; // load mongoose models "users" and "articles" here

Promise.props({
  users: users.find().execAsync(),
  articles: articles.find().execAsync()
})
.then(function(results) {
  res.render('profile/profile', results);
})
.catch(function(err) {
  res.send(500); // oops - we're even handling errors!
});
Key parts are as follows:
Promise.promisifyAll(mongoose);
This makes all Mongoose methods (and those of its models) available as functions returning promises, with an Async suffix (.exec() becomes .execAsync(), and so on). The .promisifyAll() method is nearly universal in the Node.js world - you can use it on anything that provides asynchronous functions taking a callback as their last argument.
Promise.props({
  users: users.find().execAsync(),
  articles: articles.find().execAsync()
})
Bluebird's .props() method takes an object with promises as its properties and returns a collective promise that resolves when both database queries (here: promises) return their results. The resolved value is our results object in the final function:
results.users - users found in the database by mongoose
results.articles - articles found in the database by mongoose (d'uh)
As you can see, we are not even getting near the indentation callback hell. Both database queries are executed in parallel - no need for one of them to wait for the other. The code is short and readable - practically corresponding in length and complexity (or rather the lack of it) to the wishful "non-working example" posted in the question itself.
Promises are cool. Use them.
The easy way:
var userModel = mongoose.model('users');
var articleModel = mongoose.model('articles');

userModel.find({}, function (err, db_users) {
  if (err) { /* error!!! */ }
  articleModel.find({}, function (err, db_articles) {
    if (err) { /* error!!! */ }
    res.render('profile/profile', {
      users: db_users,
      articles: db_articles
    });
  });
});
Practically every function in Node.js is asynchronous, and so is Mongoose's find. If you want to call queries serially you should use something like the Slide library.
But in your case I think the easiest way is to nest the callbacks (this allows, for example, querying articles for the previously selected users), or to do it completely in parallel with the help of async libraries (see Flow control / Async goodies).
I have a function that I use quite a bit as a return from Node functions.
function freturn(value, callback) {
  if (callback) {
    return callback(value);
  }
  return value;
};
Then I have an optional callback parameter in all of the signatures.
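For example, a hypothetical consumer of freturn with an optional callback might look like this (note the pattern only helps when the value is produced synchronously):

// Hypothetical: computeStats produces its value synchronously.
function computeStats(data, callback) {
  var value = data.length; // stand-in for real work
  return freturn(value, callback);
}

var n = computeStats([1, 2, 3]);        // synchronous style: n === 3
computeStats([1, 2, 3], function (n) {  // callback style
  console.log(n);                        // 3
});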
I was dealing with a very similar thing, but using socket.io and DB access from a client. My find was throwing the contents of my DB back to the client before the database had a chance to fetch the data... So for what it's worth, I will share my findings here:
My function for retrieving the DB:
// Read Boards - complete DB
var readBoards = function() {
  var callback = function() {
    return function(error, data) {
      if (error) {
        console.log("Error: " + error);
      }
      console.log("Boards from Server (fct): " + data);
    };
  };
  return boards.find({}, callback());
};
My socket event listener:
socket.on('getBoards', function() {
var query = dbConnection.readBoards();
var promise = query.exec();
promise.addBack(function (err, boards) {
if(err)
console.log("Error: " + err);
socket.emit('onGetBoards', boards);
});
});
So to solve the problem, we use the promise that Mongoose gives us; once we have received the data from the DB, my socket emits it back to the client...
For what it's worth...
You can achieve the desired result with the following code. Hope this helps.
var async = require('async');

// custom imports
var User = require('../models/user');
var Article = require('../models/article');

var List1Objects = User.find({});
var List2Objects = Article.find({});

var resourcesStack = {
  usersList: List1Objects.exec.bind(List1Objects),
  articlesList: List2Objects.exec.bind(List2Objects)
};

async.parallel(resourcesStack, function (error, resultSet) {
  if (error) {
    res.status(500).send(error);
    return;
  }
  res.render('home', resultSet);
});

How to keep DRY when using node-mongodb-native

db.open(function(err, db) {
  // handle error
  db.collection("book", function(err, collection) {
    // handle error
    collection.doSomething1(..., function(err, result) {
      // handle error
      collection.doSomething2(..., function(err, result) {
        ...
      });
    });
  });
});
But we don't want to write db.open every time we want to do something, while still making sure the db has been opened before we use it. We also don't want to repeat the same error-handling code every time, and we'd like to reuse the collection. Something like this (a sketch of such a wrapper follows the example):
errorHandledDB.doSomething1("book", ..., function(result) {
  errorHandledDB.doSomething2("book", ..., function(result) {
    ...
  });
});
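A minimal sketch of one way such a wrapper could look (all names here are hypothetical; it opens the db once, caches collections, and routes every error into a single handler):

function ErrorHandledDB(db, onError) {
  this._db = db;
  this._onError = onError;      // one place to handle every error
  this._collections = {};
  this._opened = false;
}

ErrorHandledDB.prototype.withCollection = function (name, fn) {
  var self = this;
  function fetchCollection() {
    var cached = self._collections[name];
    if (cached) return fn(cached);
    self._db.collection(name, function (err, collection) {
      if (err) return self._onError(err);
      self._collections[name] = collection; // reuse the collection later
      fn(collection);
    });
  }
  if (this._opened) return fetchCollection();
  this._db.open(function (err) {
    if (err) return self._onError(err);
    self._opened = true;
    fetchCollection();
  });
};

// usage, roughly as hoped for above (note: concurrent first calls
// would trigger open() twice - a real version needs to queue them):
var errorHandledDB = new ErrorHandledDB(db, console.error);
errorHandledDB.withCollection("book", function (collection) {
  collection.find(function (err, cursor) { /* ... */ });
});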
I implemented a server application that uses MongoDB for logging. Data access is implemented with provider classes, as shown in the example.
provider.filelog.js
var Db = require('mongodb/db').Db,
    ObjectID = require('mongodb/bson/bson').ObjectID,
    Server = require('mongodb/connection').Server,
    log = require('lib/common').log;

FilelogProvider = function (host, port, database) {
  this.db = new Db(database, new Server(host, port, {auto_reconnect: true}, {}));
  this.db.open(function() {});
};

FilelogProvider.prototype.getCollection = function(callback) {
  this.db.collection('filelogs', function(error, log_collection) {
    if (error) callback(error);
    else {
      log_collection.ensureIndex([['created', 1]], false, function(err, indexName) {
        if (err) callback(err);
        else callback(null, log_collection);
      });
    }
  });
};
FilelogProvider.prototype.findAll = function(callback) {
  this.getCollection(function(error, log_collection) {
    if (error) callback(error);
    else {
      log_collection.find(function(error, cursor) {
        if (error) callback(error);
        else {
          cursor.toArray(function(error, results) {
            if (error) callback(error);
            else callback(null, results);
          });
        }
      });
    }
  });
};
Since I use Grasshopper as my HTTP middleware, I can easily inject the providers using the DI functionality provided by gh:
server.js
gh.addToContext({
  providers: {
    filelog: new FilelogProvider(conf.mongodb_host, conf.mongodb_port, conf.mongodb_database),
    status: new ServerstatusProvider(conf.mongodb_host, conf.mongodb_port, conf.mongodb_database)
  },
  log: log
});
Accessing the providers in every controller function is now a breeze:
gh.get('/serve', function() {
  this.providers.filelog.findAll(function(err, res) {
    // access data here
  });
});
This implementation is pretty specific to Grasshopper (as it's using DI), but I think you'll get the idea. I also implemented a solution using Express and Mongoose (you can find it here); that solution is a bit cleaner than using the native driver, as it exposes models to use against the database.
Update
Just for the sake of it: if you really want to stick to the DRY principle, stop tinkering on an ORM implementation yourself and use Mongoose. If you need special functionality like map/reduce, you can still use the native driver (on which Mongoose is built).
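For comparison, the same findAll with Mongoose shrinks to roughly this (a sketch; the connection string and schema fields are assumptions):

var mongoose = require('mongoose');
mongoose.connect('mongodb://localhost/logs');

var Filelog = mongoose.model('Filelog', new mongoose.Schema({
  message: String,
  created: { type: Date, index: true } // replaces the manual ensureIndex
}));

Filelog.find(function (err, results) {
  // same result set as FilelogProvider.findAll, minus the cursor plumbing
});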
Answering my own question: because there were no better options, I did it myself and started a project to simplify this - check out node-mongoskin.
I'm talking theoretically here, without regard to Mongo specifically.
I would recommend you try building a wrapper of some kind - a data access layer, or at least models; it all depends on your architecture and needs, and that's on your side.
Just wrap the access to MongoDB with a layer of abstract commands, then write an abstract model object; all other model objects will inherit from it and will automatically get getters and setters for the attributes of the record you pulled from the MongoDB. For updating, you just give it a save method that iterates over and saves all the changes made to it (a sketch follows below).
Since Mongo is not relational, and I don't know if this is well suited to your design, the model may not be useful here.
Hope this helps. Good luck!
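A minimal sketch of that abstract model idea (hypothetical names; attributes live on the fetched record and save() writes back only what changed):

function BaseModel(collection, record) {
  this._collection = collection;
  this._record = record || {};
  this._changed = {};
}

BaseModel.prototype.get = function (attr) {
  return this._record[attr];
};

BaseModel.prototype.set = function (attr, value) {
  this._record[attr] = value;
  this._changed[attr] = value; // remember for save()
};

BaseModel.prototype.save = function (callback) {
  // persist only the attributes changed on this instance
  this._collection.update(
    { _id: this._record._id },
    { $set: this._changed },
    callback
  );
};

// a concrete model inherits everything:
function Book(collection, record) {
  BaseModel.call(this, collection, record);
}
Book.prototype = Object.create(BaseModel.prototype);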