Meteor serving large static collections - mongodb

I started building an app and chose Meteor as the platform, but I've stumbled on a problem: I need to serve large collections of data to the user, say 2,000-5,000 records. I understand that having such a large reactive collection is a problem for Meteor, but the thing is I don't need it to be reactive; I just need to display it statically whenever the user requests it. I've only just started with Meteor and don't know its capabilities, so I wonder whether something like this is possible. For comparison, PHP can query ~3,000 records from MySQL and print them to the user in around 3 seconds.
But with Meteor, even for smaller collections of say 500 records, I have to wait much longer: ~1 minute.
I suspect this slow loading might be caused by Meteor's default MongoDB implementation, and that using an external database would improve performance, though I haven't tried it yet. So the question is: can I achieve fast loading of large data collections in Meteor, and if so, how? What are the best practices for handling large collections in Meteor?
PS. I chose Meteor because I do need its reactivity in some cases, with small collections. But I also need to serve larger static collections. Can I combine both in Meteor?

A couple of pointers, which may help with your static collections:
Use 'reactive: false' in the find queries that don't need to be reactive, as that stops Meteor watching for updates.
http://docs.meteor.com/#/full/find
Figure out which fields you need where, and only return the bare minimum. You can use session variables to filter based on context, which will make your publications much more efficient.
http://docs.meteor.com/#/full/meteor_publish
Surely the user doesn't need to see all 2000-5000 records at once? Are you not able to implement some sort of paging mechanism?
Best pattern for pagination for Meteor
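Putting those pointers together, here is a minimal sketch of a paged, non-reactive setup. The collection name `Records`, the field names, and the page size are all illustrative assumptions, not from the question:

```javascript
// Assumed page size for illustration.
const PAGE_SIZE = 50;

// Pure helper: translate a 1-based page number into find() options.
// 'fields' limits what goes over the wire; skip/limit implement paging.
function pageOptions(page, pageSize) {
  return {
    fields: { title: 1, createdAt: 1 }, // only what the view needs
    sort: { createdAt: -1 },
    skip: (page - 1) * pageSize,
    limit: pageSize,
  };
}

// Server (Meteor): publish one page at a time, not the whole collection.
// Meteor.publish('recordsPage', function (page) {
//   return Records.find({}, pageOptions(page, PAGE_SIZE));
// });

// Client (Meteor): a one-off, non-reactive read of the current page.
// const docs = Records.find({}, { reactive: false }).fetch();
```

The Meteor-specific lines are commented out because they need a running Meteor server; the `pageOptions` helper is the part worth noting, since the same skip/limit arithmetic applies whatever pagination package you end up using.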

Related

MongoDB Document of Size 300kb taking 8-15s

I am using the MongoDB Atlas free tier hosted on GCP. I have documents with arrays containing ~300 KB of data. A simple get-by-ID query takes around 8-15 seconds. There are fewer than 50 records in the collection, so indexing is probably not the issue. Also, I have used my own custom IDs rather than the built-in ObjectIds. Is this query time normal? If so, what are some ways to address it? I need fast real-time analytics on the frontend. I already have Redis in mind, but is there a better way?
Ensure your operations are not throttled. https://docs.atlas.mongodb.com/reference/free-shared-limitations/
Test performance with a different driver (another language), and verify you are using the most recent driver releases.
Test smaller documents to identify whether time is being expended on the server or over the network.
Test with mongo shell.
As for an answer, I'd recommend against the M0 Atlas tier, or at least choose its region wisely: don't pick a US-based cluster if you are thousands of miles away from the States. Don't get me wrong, it's a good product, but it depends on your budget.
Personally, I prefer MongoDB Community Edition deployed on my own VPS/VDS. Of course it doesn't give you a nice web interface like Atlas, and there's no Realm (Stitch) functionality, though you could build that yourself. On the other hand, every performance issue is in your own hands.
In my case I use MongoDB not for real-time data but for visual snapshots on the frontend, and I have no performance problems. When I do, I deal with them myself: indexing, increasing VPS CPU/RAM, optimizing queries, and so on.
Also, one more thing about your problem: «I have documents which have arrays containing 300kb data».
If an array field in your schema stores a lot of data, especially embedded docs, are you sure you are using the right schema pattern?
You might want to look at the MongoDB University articles on architecture patterns.
It would probably be much better to keep those embedded docs in a separate collection and fetch them with an aggregation $lookup when they're needed.
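A hedged sketch of that split: the large array moves into its own collection and is joined back only when needed. The collection name `details` and the `parentId` field are assumed names for illustration, not from the question:

```javascript
// Build an aggregation pipeline that fetches one parent document and
// joins its (now separately stored) child documents via $lookup.
function detailsLookupPipeline(parentId) {
  return [
    { $match: { _id: parentId } }, // the one parent we want
    {
      $lookup: {
        from: 'details',          // child collection (assumed name)
        localField: '_id',
        foreignField: 'parentId', // back-reference stored on each child
        as: 'details',            // children land in this array field
      },
    },
  ];
}

// Usage (mongo shell or any driver):
// db.parents.aggregate(detailsLookupPipeline(someId))
```

The everyday get-by-ID query then returns a small parent document quickly, and you only pay the join cost on the pages that actually need the 300 KB of detail.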

Meteor MongoDB Server Aggregation into new Collection

I'm currently experimenting with a test collection on a LAN-accessible MongoDB server and data in a Meteor (v1.6) application. View layer of choice is React and right now I'm using the createContainer to bind the subscriptions to props.
The data that gets put in the MongoDB storage is updated on a daily basis and consists of a big set of data from several SQL databases, netting up to about 60000 lines of JSON per day. The data has been ever-so-slightly reshaped to be turned into a usable format whilst remaining as RAW as I'd like it to be.
The working solution right now is fetching all this data and doing further manipulations client-side to prepare the data for visualization. The issue should seem obvious: each client is fetching a set of documents that grows every day and repeats a lot of work on earlier entries before being ready to display. I want to do this manipulation on the server, through MongoDB's Aggregation Framework.
My initial idea is to do the aggregations on the server and to create new Collections containing smaller, more specific datasets without compromising the RAWness of the original Collection. That would mean the "reduced" Collections can still be reactive, as I've been able to confirm through testing in a Remote Desktop, subscribing to an aggregated Collection which I can update through Robo3T.
I don't know if this would be ideal. As far as storage goes, there's plenty of room for the extra Collections. But I have no idea how to set up an automated aggregation script on said server. And regarding Meteor, I've tried using meteorhacks:aggregate and jcbernack:reactive-aggregate but couldn't figure out how to deal with either one of them. If anyone is dealing, or has dealt, with something similar, I'd love to hear ideas / suggestions.
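The idea described above, server-side aggregation writing a reduced dataset into a second collection, can be sketched with a `$out` stage. The grouping fields and output collection name here are placeholders, not from the question; scheduling (cron, a Meteor server method on a timer, etc.) is a separate choice:

```javascript
// Build a pipeline that reduces the raw data and materializes the
// result into a second collection. $out replaces that collection's
// contents atomically on each run, so subscribers see a clean swap.
function reduceToCollection(outName) {
  return [
    // Placeholder reduction: total 'value' per 'metric'. Replace with
    // whatever shape your visualizations actually need.
    { $group: { _id: '$metric', total: { $sum: '$value' } } },
    { $out: outName },
  ];
}

// Run once per day, e.g. from a scheduled server-side job:
// rawCollection.aggregate(reduceToCollection('daily_summary'))
```

Because `daily_summary` is then an ordinary collection, a plain Meteor publication on it stays reactive without any aggregation package on the read path.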

Do many collections affect MongoDB performance?

Do many MongoDB collections have a big impact on MongoDB performance, memory, and capacity? I am designing an API with the MVC pattern, and a collection is created for each model. I'm questioning my current approach.
MongoDB with the WiredTiger engine supports an unlimited number of collections, so you are not going to run into any hard technical limits.
When you wonder if something should be in one collection or in multiple collections, these are some of the considerations you need to keep in mind:
More collections = more maintenance work. Sharding is configured at the collection level, so having a large number of collections makes shard configuration a lot more work. You also need to set up indexes for each collection separately, though this is easy to automate, because calling createIndex on an index that already exists does nothing.
The MongoDB API is designed in a way that every database query operates on one collection at a time. That means when you need to search for a document in n different collections, you need to perform n queries. When you need to aggregate data stored in multiple collections, you run into even more problems. So any data which is queried together should be stored together in the same collection.
Creating one collection for each class in your model is usually a good rule of thumb, but it is not a golden-hammer solution. There are situations where you want to embed objects in their parent documents instead of putting them in a separate collection. There are also cases where you want to put all objects sharing a base class into the same collection, to benefit from MongoDB's ability to handle heterogeneous collections. But that goes beyond the scope of this question.
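Since createIndex is idempotent, as noted above, index setup can simply run on every application start. A minimal sketch, with collection and index names that are purely illustrative:

```javascript
// Declarative index spec: collection name -> list of index key docs.
const indexSpecs = {
  users: [{ email: 1 }],
  orders: [{ userId: 1, createdAt: -1 }],
};

// Flatten the spec into one createIndex command per (collection, keys)
// pair; running these repeatedly is safe because existing indexes are
// left untouched.
function indexCommands(specs) {
  return Object.entries(specs).flatMap(([collection, keyDocs]) =>
    keyDocs.map((keys) => ({ collection, keys }))
  );
}

// At startup (mongo shell / driver):
// for (const cmd of indexCommands(indexSpecs))
//   db.getCollection(cmd.collection).createIndex(cmd.keys);
```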
Why don't you test your application with this?
https://docs.mongodb.com/manual/tutorial/evaluate-operation-performance/
By the way, your question is not completely clear; it reads more like a discussion than a question, and you're asking others to evaluate your work instead of researching the right approach yourself.

How to decide whether to store deep documents or thin related documents in a NoSQL database

New To NoSQL
In my 8 years of web development I've always used a relational database. Recently I started using MongoDB for a simple, multi-user web app where users can create their own photo galleries.
My Domain
My Domain is quite simple, there are "users" > "sites" > "photo sets" > "photos".
I've been struggling with how to decide how to store these documents. In the application, sometimes I only need a small collection of "photos", and sometimes only the "sets", but I always need some information about the "user", and possibly the "site".
Thin Versus Deep
Currently I'm storing multiple thin documents, using my own implementation of foreign keys. The problem of course is that I sometimes have to make multiple calls to Mongo to render a single page.
Questions
Of course I'm sure there are ways to get around these inefficiencies (caches, etc.), but how do NoSQLers approach these problems:
Is it normal to relate your documents like this?
Is it better to just store potentially massive deep documents?
Am I getting it wrong, and actually I should be storing multiple documents specifically for different views?
If you're storing multiple documents for different views, how do you manage updates?
Is the answer to use the "embed" features of Mongo? Is that how most solve this issue?
Things to think about when using a NoSQL database, especially MongoDB:
How do you manipulate the data?
Dynamic Queries
Secondary Indexes
Atomic Updates
Map Reduce
What about your Access Patterns (per Collection)?
Read / Write Ratio
Types of updates
Types of queries
Data life-cycle
Basic Knowledge:
Document writes are atomic
Maximum document size is 16 MB (with GridFS you can store larger files too)
Watch out for:
Careless Indexing
Large, deeply nested documents
Here's an older talk about schema design: Schema Design Basics
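To make the embed-versus-reference trade-off concrete for the domain in the question, here are two hedged sketches of the same "photo set" data; all field names are illustrative:

```javascript
// Embedded ("deep"): one document renders the whole page in a single
// read, but it grows with every photo, toward the 16 MB document cap
// and the large-nested-document pitfalls listed above.
const embeddedSet = {
  _id: 'set1',
  userId: 'user1',
  title: 'Holiday',
  photos: [
    { url: '/p/1.jpg', caption: 'Beach' },
    { url: '/p/2.jpg', caption: 'Sunset' },
  ],
};

// Referenced ("thin"): documents stay small and photos are fetched
// separately (find({ setId: 'set1' })), at the cost of an extra query
// per page, i.e. the manual foreign keys the question describes.
const thinSet = { _id: 'set1', userId: 'user1', title: 'Holiday' };
const thinPhotos = [
  { _id: 'p1', setId: 'set1', url: '/p/1.jpg', caption: 'Beach' },
  { _id: 'p2', setId: 'set1', url: '/p/2.jpg', caption: 'Sunset' },
];
```

A common middle ground is to embed the data every view needs (say, a cover photo and a count) and reference the rest, which is exactly the read/write-ratio and query-pattern analysis the checklist above is driving at.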

Easiest way to scale Mongo with limited resources?

I have a web server (40gig hd, 1 gig ram) that runs Mongo and a Rails application.
The Mongo DB is a document store of Twitter tweets and users, which has several million records. I perform map-reduce queries on the data to extract things like most common hashtags, words, mentions etc (very standard stuff). The meta-data of each tweet is already stored, so the map-reduce is really as efficient as a single collect.
However, since it is run on a (fairly) large dataset, it can't be done in real-time anymore - for example I have a report generator that works out a whole bunch of these map-reduces in a row and takes about 2 minutes for 20 thousand tweets.
What is the quickest, cheapest way to scale mongo, especially in map-reduce performance? I can set up an additional server and split the load, but wonder if I should use sharding, replication or both? Sharding may be overkill for this situation.
Would love some input on my MySQL-Mongo connection. MySQL contains Twitter profiles, each storing tweet IDs. Each time a map-reduce is done, it collects all the IDs to be fed as options into the map-reduce, i.e.:
@profile_tweet_ids = current_profile_tweet_ids # array of ids
@daily_trend = TwitterTweet.daily_trend :query => {:twitter_id => {"$in" => @profile_tweet_ids}}
The map-reduce function in TwitterTweet looks like:
def daily_trend(options = {})
  options[:out] = "daily_trend"
  map = %Q(
    function() {
      if (this.created_at != null) {
        emit(this.created_at.toDateString(), 1);
      }
    }
  )
  result = collection.map_reduce(map, standard_reduce, options)
  normalize_results(result)
end
Any advice is appreciated!
If you are doing simple counts, sums, uniques, etc., you may be able to avoid map-reduce completely. You can use the $inc operator to get most of what you need in real time.
I have explained this in detail in my blog post on real-time analytics with MongoDB.
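A minimal sketch of that $inc pattern, assuming a counter collection keyed by day (the collection and field names are illustrative): instead of recomputing the trend with map-reduce, each incoming tweet bumps its day's counter at write time.

```javascript
// Given a tweet, produce the upsert that increments that day's counter.
// Returned as plain data so it can be handed to any driver's updateOne.
function dailyTrendUpdate(tweet) {
  const day = tweet.created_at.toDateString(); // same key the M/R emits
  return {
    filter: { _id: day },
    update: { $inc: { count: 1 } }, // atomic server-side increment
    options: { upsert: true },      // first tweet of the day creates the doc
  };
}

// On each insert (mongo shell / driver):
// const u = dailyTrendUpdate(tweet);
// db.daily_trend.updateOne(u.filter, u.update, u.options);
```

Reading the trend is then a cheap find over a few hundred small counter documents, which is what makes the real-time dashboard feasible on modest hardware.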
Sounds like your use case is more along the lines of online stream / event processing.
You can use Mongo or another database / caching product to store reference data, and an event-processing framework for receiving and processing the events. A few tools can help with that; off the top of my head: Twitter Storm, Apache S4, GigaSpaces XAP (disclaimer: I work for GigaSpaces) and GridGain.
Use one of the cloud services like MongoLab. Depends on your definition of cheap, though.
The answer regarding using operators rather than MapReduce has merits, and may be far more beneficial to your efforts to get real time responses. Map Reduce on mongodb does not lend itself to yielding real time responses.
Further to that, you may also benefit from the new aggregation framework (http://www.mongodb.org/display/DOCS/Aggregation+Framework), once that is available in the next release.
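For reference, the daily_trend map-reduce above could be expressed in the aggregation framework roughly like this. This is a sketch using present-day operators ($dateToString arrived later than the release that answer mentions), and the collection name is assumed:

```javascript
// Aggregation-framework equivalent of the daily_trend map-reduce:
// count tweets per calendar day for the given set of tweet IDs.
function dailyTrendPipeline(tweetIds) {
  return [
    // Same filtering the M/R did via :query and the null check in map().
    { $match: { twitter_id: { $in: tweetIds }, created_at: { $ne: null } } },
    // Group by day string; $sum: 1 plays the role of emit(day, 1) + reduce.
    {
      $group: {
        _id: { $dateToString: { format: '%Y-%m-%d', date: '$created_at' } },
        count: { $sum: 1 },
      },
    },
  ];
}

// Usage: db.twitter_tweets.aggregate(dailyTrendPipeline(ids))
```

Unlike map-reduce, this runs fully inside the server's native aggregation engine and, on current MongoDB, can also be executed against secondaries, which bears on the scaling discussion below.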
To answer the more general question about how to scale out map-reduce: adding a new server may not help if you are simply going to add it as a secondary, because a secondary cannot store your M/R results in a collection, so inline output is your only option. If you do not need to store results in a collection, then this is your easiest way forward. For more information, see an in-depth discussion here: http://groups.google.com/group/mongodb-user/browse_thread/thread/bd8f5734dc64117a
Sharding can help with scaling out, but bear in mind that you will need to run everything through a mongos process, have config servers and that the mongos will need to finalize the result sets returned from each shard, so you add a new potential bottleneck depending on your data and you will need more than just one extra machine to have it working in a reliable manner.