MongoDB Document of Size 300 KB taking 8-15 s

I am using the MongoDB Atlas Free Tier hosted on GCP. I have documents whose arrays contain about 300 KB of data. A simple get-by-ID query takes around 8-15 seconds. There are fewer than 50 records in the collection, so indexing is probably not the issue. Also, I have used my own custom IDs, not the built-in ObjectIds, in my collection. Is this much query time normal? If yes, what are some ways to address it, as I need fast real-time analytics on the frontend? I already have Redis in mind, but is there a better way to address this?

Ensure your operations are not throttled. https://docs.atlas.mongodb.com/reference/free-shared-limitations/
Test performance with a different driver (another language), and verify that you are using the most recent driver release.
Test smaller documents to identify whether time is being expended on the server or over the network.
Test with mongo shell.
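The checks above can be turned into a small timing harness. This is only a minimal sketch: `median_seconds` and `compare` are hypothetical helper names, and the `time.sleep` lambdas are stand-ins for real driver calls (for example a lambda wrapping `collection.find_one({"_id": ...})` in PyMongo) against a small and a large document.

```python
import statistics
import time
from typing import Callable, Dict


def median_seconds(fetch: Callable[[], object], repeats: int = 5) -> float:
    """Call fetch() `repeats` times and return the median wall-clock time."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(fetchers: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Time each named fetcher so a small and a large document can be compared."""
    return {name: median_seconds(fn, repeats) for name, fn in fetchers.items()}


if __name__ == "__main__":
    # Stand-ins for real queries; replace with calls into your own driver.
    timings = compare({
        "small": lambda: time.sleep(0.001),
        "large": lambda: time.sleep(0.005),
    })
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1000:.1f} ms")
```

If the small document is also slow, the time is probably going to the network or to throttling rather than to the document size.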

As for an answer, I highly recommend not relying on the M0 Atlas tier. Or at least choose the region wisely: don't pick a US-based cluster if you are thousands of miles away from the States. Don't get me wrong, it's a good product, but it depends on your budget.
Personally, I prefer to run MongoDB Community Edition on my own VPS/VDS. Of course, it doesn't give you the nice web interface you have seen in Atlas, and there is no support for Realm functionality (Stitch), but you can build that yourself. Also, every performance issue is then up to you.
As for me, I use MongoDB not for real-time data but for visual snapshots on the front end, and I have no problems with performance.
I mean, if I have them, I deal with them myself, via indexing, increasing VPS CPU/RAM, optimizing queries and so on.
Also, one more thing about your problem: «I have documents which have arrays containing 300kb data»
If you have an array field in your schema and it stores lots of data, especially embedded docs, are you sure that you are using the right schema pattern?
You might want to take a look at the articles at MongoDB University about architecture patterns.
It will probably be much better to keep the embedded docs in a separate collection and request them via an aggregation $lookup stage when they are needed.
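As a rough illustration of the separate-collection idea, the pipeline below joins child documents back onto a parent only when they are requested. The collection name `items` and the field `parent_id` are assumptions made up for this sketch; only the `$match` and `$lookup` stage shapes come from MongoDB itself.

```python
from typing import Any, Dict, List


def build_lookup_pipeline(doc_id: Any) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that loads one parent and its children.

    Assumes (hypothetically) that the large array was moved into a separate
    "items" collection whose documents carry a "parent_id" field.
    """
    return [
        # Select the single parent document by its custom ID.
        {"$match": {"_id": doc_id}},
        # Attach the matching child documents as an "items" array.
        {"$lookup": {
            "from": "items",
            "localField": "_id",
            "foreignField": "parent_id",
            "as": "items",
        }},
    ]
```

With PyMongo this would be passed to `collection.aggregate(build_lookup_pipeline("my-id"))`; a plain `find_one` on the parent then stays small when the children are not needed.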

Related

MongoDB integration with Solr

I am a beginner with MongoDB and its integration with Solr. From different posts I got an idea of the integration steps, but I need information on the points below.
I have the data in MongoDB; for faster retrieval we are integrating it with Solr.
Solr indexes all MongoDB entries. Is this indexing a one-time activity after integration, or do we need to periodically update Solr to index the entries that were inserted after the integration?
If we need to periodically update Solr, maintaining the data in Solr as well as in MongoDB becomes an extra overhead. What are the best approaches to overcome this?
As far as I know there is no official (supported/complete) solution for integrating MongoDB and Solr, but let me give you some ideas and direction.
For me, the best approach, when you can modify the application, is to have the persistence layer perform every write in both MongoDB and Solr at the "same" time. That way you control exactly what you send to the database and what you index for full-text operations. But as I said, this means changing your application code (you will have to change it anyway to be able to query Solr when needed). And yes, you have to index all the existing documents the first time.
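A minimal sketch of such a persistence layer is shown below. The class name `DualWriteStore` and its two callables are hypothetical; in a real application they would wrap your MongoDB driver and your Solr client. Note that the two writes are not atomic, so a failure between them leaves the stores out of sync.

```python
from typing import Any, Callable, Dict, Iterable

Document = Dict[str, Any]


class DualWriteStore:
    """Write each document to the database, then index a subset in search."""

    def __init__(
        self,
        save_to_db: Callable[[Document], None],
        index_in_search: Callable[[Document], None],
        indexed_fields: Iterable[str],
    ) -> None:
        self._save_to_db = save_to_db
        self._index_in_search = index_in_search
        self._indexed_fields = list(indexed_fields)

    def save(self, doc: Document) -> None:
        # The database receives the full document.
        self._save_to_db(doc)
        # The search index receives only the fields chosen for full-text search.
        search_doc = {k: doc[k] for k in self._indexed_fields if k in doc}
        search_doc["id"] = doc["_id"]
        self._index_in_search(search_doc)


if __name__ == "__main__":
    db, index = [], []  # in-memory stand-ins for MongoDB and Solr
    store = DualWriteStore(db.append, index.append, ["title", "body"])
    store.save({"_id": 1, "title": "Hello", "body": "World", "internal": True})
    print(db, index)
```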
You can also use a "connector" approach, where MongoDB and Solr are kept connected to each other; this can be done in various ways.
You can use, for example, the MongoDB Connector available here: https://github.com/10gen-labs/mongo-connector
LucidWorks, the company behind Solr, also has a connector for MongoDB, documented here: http://docs.lucidworks.com/display/help/Create+a+New+MongoDB+Data+Source# (I have not used it so I cannot comment, but it is also an approach)
Your point #2 is true: you have to manage two clusters and make sure the data stays in sync, and sometimes pay the price of inconsistency between the Solr index and a document just updated in MongoDB. So you need to see whether the best approach for your application is to use MongoDB alone or MongoDB with Solr (see comment below).
Just a small comment in addition to this answer:
You are talking about "faster retrieval"; I am not sure that should be the reason. If you write correct queries with correct indexes in MongoDB, you should be able to do it without Solr. If your requirement is really oriented towards the power of Solr, meaning full-text indexing with all its related features, then it makes sense.
How large is your data? MongoDB has a few good indexing mechanisms of its own.
There is a powerful geo API, and for full-text search there is http://docs.mongodb.org/manual/core/index-text/. So it would be ideal to identify whether your need fits into MongoDB or whether you need to spill over to Solr.
About the indexing part: how often is your data updated? If you can afford infrequent updates, then a batch job that re-indexes once a day may work for you. Ideally, Solr would work well for some form of master data.
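A periodic batch job of this kind could look roughly like the sketch below. It assumes every document carries an `updated_at` timestamp, which is not stated in the question; `find` and `index` are hypothetical callables standing in for a MongoDB query and a Solr indexing call.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, Tuple

Document = Dict[str, Any]


def changed_since_filter(last_run: datetime) -> Dict[str, Any]:
    """MongoDB-style filter for documents modified after the previous run.

    Assumes a hypothetical "updated_at" field maintained by the application.
    """
    return {"updated_at": {"$gt": last_run}}


def reindex_changed(
    find: Callable[[Dict[str, Any]], Iterable[Document]],
    index: Callable[[Document], None],
    last_run: datetime,
    now: datetime,
) -> Tuple[datetime, int]:
    """Index every document changed since last_run; return the new watermark."""
    count = 0
    for doc in find(changed_since_filter(last_run)):
        index(doc)
        count += 1
    return now, count


if __name__ == "__main__":
    utc = timezone.utc
    docs = [{"_id": 1, "updated_at": datetime(2024, 1, 3, tzinfo=utc)}]
    # Stand-in query that applies the filter to an in-memory list.
    find = lambda q: [d for d in docs if d["updated_at"] > q["updated_at"]["$gt"]]
    print(reindex_changed(find, print, datetime(2024, 1, 2, tzinfo=utc), datetime.now(utc)))
```

The returned watermark would be stored and passed as `last_run` on the next run, so only new or changed entries are sent to the index.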

MongoDB with LOTS OF data?

I'm a beginner with non-SQL structures such as MongoDB, and I can't find anybody talking about a collection with lots of data, like 1,000,000 entries or more.
I saw the list of companies on the official site, but none of them are large-data companies.
I heard about a combination with SQL: the large data is stored in SQL tables, and only the "cache" is in MongoDB. Is that the only solution for MongoDB and large data?
We're using MongoDB to power Where's it Up, and the api behind it. We're currently pushing in >3 million documents per day. MongoDB is the only storage engine in use. We were keeping a bunch around for a while, but we're now using TTL to delete old records.
Things are going super well, just make sure you have all the indexes you need. Querying a million+ records without an index is bad, regardless of your storage engine. Auto-failover has been super helpful.
Something to watch out for is updating records to include more information: it can be pretty expensive if the document grows past its pre-allocated space. We ended up changing how we stored data to avoid updates, and create new documents instead.
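The "create new documents instead of updating" idea might look something like this. It is only a sketch with made-up field names (`record_id`, `created_at`); the answer does not describe the actual schema used.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional

Document = Dict[str, Any]


def new_version(record_id: str, fields: Dict[str, Any], now: datetime) -> Document:
    """Build a fresh document instead of growing an existing one in place."""
    return {"record_id": record_id, "created_at": now, **fields}


def latest(docs: Iterable[Document], record_id: str) -> Optional[Document]:
    """Return the most recent version of a record, or None if there is none."""
    matching = [d for d in docs if d["record_id"] == record_id]
    return max(matching, key=lambda d: d["created_at"], default=None)


if __name__ == "__main__":
    utc = timezone.utc
    store = [  # in-memory stand-in for a collection
        new_version("job-1", {"status": "queued"}, datetime(2024, 1, 1, tzinfo=utc)),
        new_version("job-1", {"status": "done", "hops": 12}, datetime(2024, 1, 2, tzinfo=utc)),
    ]
    print(latest(store, "job-1"))
    # Old versions can then be expired with a TTL index, e.g. in PyMongo:
    # collection.create_index("created_at", expireAfterSeconds=7 * 24 * 3600)
```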
MongoDB in its current incarnation is explicitly designed to make it easy to scale out.
As for the numbers: one of my test databases has 10M records and runs easily on my MacBook Air, which is 4 years old now.
So what you can do when your current cluster cannot handle the data stored (either because the indices are too big for your RAM or because processing the queries takes too long) is add another node to your MongoDB cluster. Your performance gain should be somewhere between slightly below linear (if your cluster was otherwise in perfect condition) and several orders of magnitude (if indices didn't fit into RAM and/or IO was pushed to its limits before, and that situation changed after scaling out).
A word of warning: you should have somebody who knows about MongoDB administration if you want to put your deployment into production. Though MongoDB administration seems easy, it is by no means something to be done by a layman, especially not for production use.

Is MongoDB a good fit for this?

The system I'm building is essentially an issue tracking system, but with various issue templates. Some issue types will have different formats than others.
I was originally planning on using MySQL with a main issues table and an issues_meta table that contains key => value pairs. However, I'm thinking NoSQL (MongoDB) might be the better option.
Can MongoDB provide me with the ability to generate "standard" reports, like # of issues by type, # of issues by type by month, # of issues assigned per person, etc? I ask this because I've read a few sources that said Mongo was bad at reporting.
I'm also planning on storing my audit logs in Mongo, since I want a single "table" for all actions (Modifications to any table). In Mongo I can store each field that was changed easily, since it is schemaless. Is this a bad idea?
Anything else I should know, and will Mongo work for what I want?
I think MongoDB will be a perfect match for that use case.
MongoDB collections are heterogeneous, meaning you can store documents with different fields in the same bag. So different reporting templates won't be a show stopper. You will be able to model a full issue with a single document.
MongoDB would be a good fit for logging too. You may be interested in capped collections.
Should you need relational associations between documents, you can have those too.
If you are using Ruby, I can recommend Mongoid. It will make things easier. It also supports versioning of documents.
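On the reporting question, a report such as "# of issues by type by month" maps fairly directly onto an aggregation pipeline. The sketch below assumes hypothetical `type` and `created_at` fields on each issue document; the pure-Python function shows what the pipeline is meant to compute, so the logic can be checked without a database.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Dict, Iterable, List, Tuple


def issues_by_type_and_month_pipeline() -> List[Dict[str, Any]]:
    """Aggregation pipeline counting issues per (type, month).

    Field names "type" and "created_at" are assumptions for illustration.
    """
    return [
        {"$group": {
            "_id": {
                "type": "$type",
                "month": {"$dateToString": {"format": "%Y-%m", "date": "$created_at"}},
            },
            "count": {"$sum": 1},
        }},
        {"$sort": {"_id.month": 1, "_id.type": 1}},
    ]


def count_by_type_and_month(issues: Iterable[Dict[str, Any]]) -> Dict[Tuple[str, str], int]:
    """Pure-Python equivalent of the grouping above, for in-memory data."""
    counts: Counter = Counter()
    for issue in issues:
        month = issue["created_at"].strftime("%Y-%m")
        counts[(issue["type"], month)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        {"type": "bug", "created_at": datetime(2024, 1, 5)},
        {"type": "bug", "created_at": datetime(2024, 1, 20)},
        {"type": "feature", "created_at": datetime(2024, 2, 1)},
    ]
    print(count_by_type_and_month(sample))
```

With PyMongo the pipeline would be run as `collection.aggregate(issues_by_type_and_month_pipeline())`.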
MongoDB will definitely work (and you can use capped collections to automatically drop old records, if you want), but you should ask yourself: does it fit this task well? For the use case you've described, Redis (simple and fast enough) or Riak (if you care a lot about your log data) is a better option.

Can DynamoDB or SimpleDB replace my MongoDB use-case?

I was wondering if DynamoDB or SimpleDB can replace my MongoDB use-case? Here is how I use MongoDB
15k entries, and I add 200 entries per hour
15 columns each of which is indexed using (ensureIndex)
Half of the columns are integers, the others are text fields (which basically have no more than 10 unique values)
I run about 10k DB reads per hour, and they are super fast with MongoDB right now. It's an online dating site. So the average Mongo query is doing a range search on 2 of the columns (e.g. age and height), and an "IN" search on about 4 columns (e.g. ethnicity is A, B, or C... religion is A, B, or C).
I use limit and skip very frequently (e.g. get me the first 40 entries, the next 40 entries, etc)
I use perl to read/write
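For reference, the query shape described above (range filters, "IN" filters, and skip/limit paging) can be written down as a MongoDB filter document. The question uses Perl; this sketch uses Python only to illustrate the structure, and the field names are taken from the examples in the question while the function names are hypothetical.

```python
from typing import Any, Dict, Sequence, Tuple


def build_match_query(
    age_range: Tuple[int, int],
    height_range: Tuple[int, int],
    ethnicities: Sequence[str],
    religions: Sequence[str],
) -> Dict[str, Any]:
    """Range search on two fields plus "IN" search on two others."""
    return {
        "age": {"$gte": age_range[0], "$lte": age_range[1]},
        "height": {"$gte": height_range[0], "$lte": height_range[1]},
        "ethnicity": {"$in": list(ethnicities)},
        "religion": {"$in": list(religions)},
    }


def page_window(page: int, page_size: int = 40) -> Tuple[int, int]:
    """Return (skip, limit) for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return page * page_size, page_size


if __name__ == "__main__":
    query = build_match_query((25, 35), (160, 180), ["A", "B"], ["A", "C"])
    skip, limit = page_window(1)  # second page of 40 entries
    print(query, skip, limit)
    # With PyMongo: collection.find(query).skip(skip).limit(limit)
```

Any replacement store has to express this combination of filters and paging, which is where key-value stores tend to need a different design.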
I'm assuming you're asking because you want to migrate directly to an AWS-hosted persistence solution? Both DynamoDB and SimpleDB are k/v stores and therefore will not be a "drop-in" replacement for a document store such as MongoDB.
With the exception of limit/skip (which requires a more k/v-compatible approach), all your functional requirements can easily be met by either of the two solutions you mentioned (although DynamoDB is, in my opinion, the better option), but that's not going to be the main issue. The main issue is going to be moving from a document store, especially one with extensive query capabilities, to a k/v store. You will need to rework your schema, rethink your indexes, work within the constraints of a k/v store, make use of the benefits of a k/v store, etc.
Frankly if your current solution works and if MongoDB feels like a good functional fit I'd certainly not migrate unless you have very strong non-technical reasons to do so (such as, say, your boss wants you to ;) )
What would you say is the reason you're considering this move or are you just exploring whether or not it's possible in the first place?
If you are planning to have your complete application on AWS then you might also consider using Amazon RDS (hosted managed MySQL). It's not clear from your description if you actually need MongoDB's document model so considering only the querying capabilities RDS might come close to what you need.
Going with either SimpleDB or DynamoDB will most probably require you to rethink some of the features based around aggregation queries. As regards choosing between SimpleDB and DynamoDB there are many important differences, but I'd say that the most interesting ones from your point of view are:
SimpleDB indexes all attributes
there are lots of tips and tricks that you'll need to learn about SimpleDB (see what the guys from Netflix learned while using SimpleDB)
DynamoDB pricing model is based on actual write/read operations (see my notes about DynamoDB)

Using MongoDB and Redis together?

We started with Redis, storing active data, logged in users, etc. We're using some pubsub too for realtime data passing.
Recently we added Mongo to fit our geo spatial needs, and it seems great for non-active data too.
How should these two work together? Is it dumb to use both? Is it dumb to pass chunks of data from Mongo to Redis when they become active?
Our thought was that we might store everything in Mongo but then pass user data from Mongo to Redis when a user is active and the data is likely to be accessed. I know Mongo does some caching like this on its own; we are new to both of them and just want to know how they should be used together, if at all.
Thanks!!
"Is it dumb to use both? Is it dumb to pass chunks of data from Mongo to Redis when they become active?"
So I feel like there's actually a legitimate way to test and validate this question. Redis is basically an "in-memory" DB, so how much better can you do by giving that RAM to Mongo?
Historically, we've used the Memcache/MySQL combo to basically "add RAM" to MySQL and limit the amount of writing it needed to do. We did this simply because it was complicated to shard MySQL.
However, MongoDB provides a sharding mechanism. So you can "add RAM" to a problem (along with "adding disks") simply by adding more shards.
Thanks to the way memory-mapped files work, MongoDB tends to keep recently used data in memory. So if you're pulling recent data into Redis, that data is probably also in memory on the MongoDB side, so it's not clear that you benefit from having it in two places.
Is it dumb ...
That's hard to say without some testing and analysis. MongoDB doesn't really have the pub/sub mechanism, but it does tend to have fast query times, so it may be appropriate in specific spots.
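One way to approach that testing is to put the cache behind a thin wrapper that counts hits and misses, then compare latency with and without it under a realistic load. The sketch below uses a plain dict as a stand-in for Redis and a hypothetical `load_from_db` callable as a stand-in for the MongoDB query; `ReadThroughCache` is an illustrative name, not part of either product.

```python
from typing import Any, Callable, Dict, Optional


class ReadThroughCache:
    """Serve reads from a cache and fall back to the database on a miss."""

    def __init__(
        self,
        load_from_db: Callable[[str], Optional[Any]],
        cache: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._load = load_from_db
        self._cache: Dict[str, Any] = {} if cache is None else cache
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load(key)
        if value is not None:
            # Only cache records that exist, so later inserts are still seen.
            self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying record changes."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    users = {"u1": {"name": "Ann"}}  # stand-in for the MongoDB collection
    cache = ReadThroughCache(users.get)
    cache.get("u1")
    cache.get("u1")
    print(cache.hits, cache.misses)
```

If the hit rate is high but the measured latency barely changes, the data was probably already in memory on the MongoDB side, which is the scenario described above.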