How does MongoDB offload logic and processing to the client? - mongodb

I was looking at Kristina's book and at the very beginning of the intro chapter I read that MongoDB offloads logic and processing to the client side whenever possible.
Can someone please explain this in more detail?
When it says whenever possible, what is this "when"? And how does it determine if it is possible?
And also about processing and logic... what are some examples? Like an insert or update being done client side and not server side?

The authors are describing one of the MongoDB design principles.
Here are some of the ways MongoDB offloads processing to the client:
Object IDs are usually generated and provided by the client
Since Mongo is schema-free, the client is responsible for ensuring that all the required fields are present and contain valid data (notable exception: uniqueness constraints, which the server enforces through unique indexes)
There are no joins: such aggregation often needs to be done by the client
Aggregate functions are severely limited (though there are some more coming in v2.2)
Neither inserts, updates, nor any of the like are processed by the client in any significant way.
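As a minimal sketch of the "no joins" point, the snippet below combines two result sets in application code. The collection contents and field names (users, orders, user_id) are made up for illustration, and plain Python lists stand in for the results of two separate queries.

```python
from collections import defaultdict

# Hypothetical results of two separate queries (plain lists stand in for cursors).
users = [
    {"_id": 1, "name": "Ann"},
    {"_id": 2, "name": "Bob"},
]
orders = [
    {"_id": 10, "user_id": 1, "total": 25},
    {"_id": 11, "user_id": 1, "total": 40},
    {"_id": 12, "user_id": 2, "total": 15},
]


def join_orders_to_users(users, orders):
    """Attach each user's orders to the user document, on the client side."""
    orders_by_user = defaultdict(list)
    for order in orders:
        orders_by_user[order["user_id"]].append(order)
    # Users without orders simply get an empty list.
    return [{**user, "orders": orders_by_user[user["_id"]]} for user in users]


joined = join_orders_to_users(users, orders)
```

The database only answers two simple queries here; the work of matching orders to users happens in the application.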

Meteor publishing and subscribing to a large Collection

Let's take this scenario: in an e-commerce application, a user searches for "wrist watches".
Is it advisable for me to publish and subscribe to the entire Products collection? That collection may grow a lot in size. Is it possible for me to fetch from a collection without subscribing to it?
Also, in Meteor 1.3, which is the best place to define collections? From what I read, it has to be in /imports/api, but some light on it would be helpful.
Thanks,
When you want to get data to your meteor client, you have three options - choose your own adventure.
option 1: publish the whole collection
pros: easy to implement, fast to use/filter on the client once the data has arrived, publication can be reused on the server for all clients
cons: doesn't scale well / doesn't work past a couple of thousand documents, may be a lot to transmit to the client
use when: you have a small size-bounded collection and the client needs all of it for filtering / searching / selecting
option 2: use a method
You can have a meteor method deliver the filtered documents to the client instead of publishing them. I.e. the user searches for "wrist watches", and the method delivers only those documents. See this section of the guide for more details. You can stuff the documents into a local collection if you like, but it isn't required.
pros: performance, scalability, data isolation (you don't have to worry that some subset of the documents were added by another subscription)
cons: it's more work to set up and manage than a subscription
use when: you have an unbounded collection and you need a subset in the most performant way
option 3: use a reactive subscription
This is very similar to (2) except you'll be re-subscribing in an autorun after changing your search parameters. See this section for more details.
pros: easier to implement than (2)
cons: more computationally expensive and a bit slower than (2), with the possible exception that publications could be reused on the server (unlikely in the case of a search)
use when: you have an unbounded collection and you need a subset with the least amount of effort/code
Without knowing more about your particular use case, I'd go with (2).
As for where to define your collections, see this section and the todos app for examples. The recommendation is to use imports/api as you mentioned. For an explanation of why, see this question. If you need more detail, I'd recommend opening a separate question.
Generally speaking, we don't put all fetched data onto a page at once; that makes the page too long for customers and hurts the user experience. Common advice is pagination plus sorting.
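A rough, language-neutral sketch of "pagination plus sorting" follows, written in Python rather than Meteor's JavaScript. The function name search_page, the product fields, and the page size are assumptions for illustration; in a real app the filtering would happen on the server, for example inside a method or a publication.

```python
def search_page(products, term, page, page_size=2):
    """Return one page of products whose name contains `term`, sorted by price."""
    matches = [p for p in products if term.lower() in p["name"].lower()]
    matches.sort(key=lambda p: p["price"])
    start = (page - 1) * page_size  # pages are numbered from 1
    return matches[start:start + page_size]


# Hypothetical product documents.
products = [
    {"name": "Steel wrist watch", "price": 120},
    {"name": "Leather wallet", "price": 35},
    {"name": "Kids wrist watch", "price": 20},
    {"name": "Gold Wrist Watch", "price": 900},
]

first_page = search_page(products, "wrist watch", page=1)
second_page = search_page(products, "wrist watch", page=2)
```

Only one page of matching documents ever has to travel to the client, no matter how large the collection grows.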
As to Meteor, collections on the server are different from collections on the client. In short, a collection on the client is a subset of the server collection. The data in that subset is determined by Meteor's publication-subscription mechanism: data is published on the server and you subscribe to it on the client. Moreover, you can define filtering, sorting, count, etc. to shape the subset according to how it will be used on the client. The documentation contains a pretty decent guide and details about Meteor collections.
The place to define collections is really flexible in Meteor. It doesn't have to be /imports/api. Usually it is a location that both the server and the client can access, because in general use cases the server needs to see the collection to define methods for manipulating it, and the client needs to see it to render data on web pages. But it need not be: it depends on how you implement and structure your application. In some cases the collections are defined on the server only, and the client fetches the data indirectly; Meteor methods are one way and a RESTful API is another, to name a few. It's case by case, and you do what you feel is best. That is where the fun comes from. Subscription is common and convenient, but it is not the only option.
Meteor defines special rules for folder access on the server and client respectively, and Meteor 1.3 adds new rules for modules. I enjoy reading the Meteor documentation and find it really useful; this one, for example, helps develop solid knowledge of the aforementioned rules.

does mongodb have the properties such as trigger and procedure in a relational database?

As the title suggests, leaving aside the map-reduce framework:
If I want to trigger an event to run a consistency check or security operations before a record is inserted, how can I do that with MongoDB?
MongoDB does not support triggers, but people have built workarounds, mostly by tailing the oplog. This will only help you if you are running a replica set, because the oplog is a capped collection that keeps track of data changes for the purposes of replication.
For a nodejs solution see: https://www.npmjs.org/package/mongo-watch or see an earlier SO thread: How to listen for changes to a MongoDB collection?
If you are concerned with consistency, read about write concern in MongoDB: http://docs.mongodb.org/manual/core/write-concern/ You can be as relaxed or as strict as you want by setting the write concern level for inserts, from fire and hope to getting an acknowledgement from all members of the replica set.
So, if you want to run a consistency check before inserting data, you probably will have to move that logic to the client application and set your write concern level to a level that will ensure consistency.
MongoDB does not have triggers or stored procedures. Some people have used workarounds to emulate the behavior, but since it is not a built-in feature, you'll need to decide whether those solutions are effective for you. Searching for "triggers and mongodb" should find dozens. All depend on the oplog and replicas.
But, given the nature of MongoDB and a typical 3-tier architecture, I would expect that at the point of data insertion, which could be on a web server for example, you would run the necessary consistency and security checks on that web server. You wouldn't allow a client such as a mobile application to write data directly into the database collection without some checks.
Many MongoDB drivers and extended libraries have validation and consistency checks built in already, so there is less to do. Using unique indexes for some fields can also provide a level of consistency that you cannot get from the driver alone. Look at calls like findAndModify, which make atomic updates.
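To illustrate running the checks in the application tier before the insert, here is a minimal sketch. Everything in it is hypothetical: InMemoryCollection is a stand-in for a real collection with a unique index, and the field names and rules are invented for the example.

```python
class ValidationError(Exception):
    """Raised when a document fails the application-level checks."""


class DuplicateKeyError(Exception):
    """Raised by the stand-in collection when a unique field repeats."""


REQUIRED_FIELDS = {"email": str, "age": int}  # hypothetical schema


def validate(doc):
    """Consistency check that runs in the application, before any insert."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(doc.get(field), expected_type):
            raise ValidationError(f"missing or invalid field: {field}")
    if doc["age"] < 0:
        raise ValidationError("age must not be negative")


class InMemoryCollection:
    """Stand-in for a collection with a unique index on one field."""

    def __init__(self, unique_field):
        self.unique_field = unique_field
        self.docs = []
        self._seen = set()

    def insert(self, doc):
        key = doc[self.unique_field]
        if key in self._seen:  # the part a unique index would enforce
            raise DuplicateKeyError(key)
        self._seen.add(key)
        self.docs.append(doc)


def checked_insert(collection, doc):
    """Validate first, then insert; invalid documents never reach the store."""
    validate(doc)
    collection.insert(doc)


users = InMemoryCollection(unique_field="email")
checked_insert(users, {"email": "ann@example.com", "age": 31})
```

The split mirrors the answer above: field presence and validity are checked by the application, while uniqueness is left to the store.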

How to group mongoDb requests for better performance?

Is there a way to group requests to MongoDb?
For example I have one collection.find and three collection.aggregate requests from C# code.
I'm wondering whether there is a way to combine them and send only one request over the network.
MongoDB's wire protocol is designed for a single operation per message, so there is no support for grouping multiple operations into a single request. The only thing that comes close to this is a bulk insert, but that's really one operation that takes multiple documents.
On a non-sharded MongoDB system, you could conceivably perform multiple server-side operations via a single eval command. This would entail sending MongoDB a JavaScript function that runs multiple operations and then stuffs their results into a single result to be returned. However, I'd be hard-pressed to think of a realistic situation where that'd be preferable to sending multiple requests via your driver (JS is less performant, there are concurrency issues, etc.).

Can MongoDB be a consistent event store?

When storing events in an event store, the order in which the events are stored is very important, especially when projecting the events later to restore an entity's current state.
MongoDB seems to be a good choice for persisting the event store, given its speed and flexible schema (and it is often recommended as such), but there is no such thing as a transaction in MongoDB, meaning the correct event order cannot be guaranteed.
Given that fact, should you avoid MongoDB if you are looking for a consistent event store and stick with a conventional RDBMS instead, or is there a way around this problem?
I'm not familiar with the term "event store" as you are using it, but I can address some of the issues in your question. I believe it is probably reasonable to use MongoDB for what you want, with a little bit of care.
In MongoDB, each document has an _id field which is by default in ObjectId format. An ObjectId starts with a 4-byte timestamp in seconds, followed by bytes that identify the machine and process that generated it, and ends with a counter. So you can sort on that field and you'll get your objects in their creation order, provided the ObjectIds are all created on the same machine.
Most MongoDB client drivers create the _id field locally before sending an insert command to the database. So if you have multiple clients connecting to the database, sorting by _id won't do what you want: the timestamp only has one-second resolution and comes from each client's own clock, and ObjectIds created within the same second then sort by the machine identifier rather than by insertion order.
But if you can convince your MongoDB client driver to not include the _id in the insert command, then the server will generate the ObjectId for each document and they will have the properties you want. Doing this will depend on what language you're working in since each language has its own client driver. Read the driver docs carefully or dive into their source code -- they're all open source. Most drivers also include a way to send a raw command to the server. So if you construct an insert command by hand this will certainly allow you to do what you want.
This will break down if your system is so massive that a single database server can't handle all of your write traffic. The MongoDB solution to needing to write thousands of records per second is to set up a sharded database. In this case the ObjectIds will again be created by different machines and won't have the nice sorting property you want. If you're concerned about outgrowing a single server for writes, you should look to another technology that provides distributed sequence numbers.
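To make the ordering behaviour concrete, here is a small sketch that builds 12-byte values laid out like ObjectIds: a 4-byte big-endian timestamp in seconds first, then 5 bytes for the generating machine/process, then a 3-byte counter. The helper make_object_id and the fixed byte values are illustrative only and are not part of any driver.

```python
import struct


def make_object_id(timestamp, machine_bytes, counter):
    """Build a 12-byte value with an ObjectId-like layout (illustrative helper)."""
    assert len(machine_bytes) == 5
    return struct.pack(">I", timestamp) + machine_bytes + counter.to_bytes(3, "big")


def timestamp_of(object_id):
    """Read back the seconds timestamp stored in the first 4 bytes."""
    return struct.unpack(">I", object_id[:4])[0]


# Two hypothetical clients with different machine/process bytes.
client_a = b"\xff" * 5
client_b = b"\x00" * 5

first = make_object_id(1000, client_a, 1)   # inserted first
second = make_object_id(1000, client_b, 1)  # inserted later, same second
third = make_object_id(1001, client_a, 2)   # inserted one second later

sorted_ids = sorted([third, second, first])
```

Sorting puts `third` last because its timestamp is larger, but `second` sorts before `first`: within one second the machine bytes decide the order, not the insertion order.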

MongoDB with redis

Can anyone give example use cases of when you would benefit from using Redis and MongoDB in conjunction with each other?
Redis and MongoDB can be used together with good results. A company well-known for running MongoDB and Redis (along with MySQL and Sphinx) is Craigslist. See this presentation from Jeremy Zawodny.
MongoDB is interesting for persistent, document oriented, data indexed in various ways. Redis is more interesting for volatile data, or latency sensitive semi-persistent data.
Here are a few examples of concrete usage of Redis on top of MongoDB.
MongoDB before 2.2 does not have an expiration mechanism. Capped collections cannot really be used to implement a real TTL. Redis has a TTL-based expiration mechanism, making it convenient to store volatile data. For instance, user sessions are commonly stored in Redis, while user data will be stored and indexed in MongoDB. Note that MongoDB 2.2 has introduced a low-accuracy expiration mechanism at the collection level (to be used for purging data, for instance).
Redis provides a convenient set datatype and its associated operations (union, intersection, difference on multiple sets, etc.). It is quite easy to implement a basic faceted search or tagging engine on top of this feature, which is an interesting addition to MongoDB's more traditional indexing capabilities.
Redis supports efficient blocking pop operations on lists. This can be used to implement an ad-hoc distributed queuing system. It is more flexible than MongoDB tailable cursors IMO, since a backend application can listen to several queues with a timeout, transfer items to another queue atomically, etc ... If the application requires some queuing, it makes sense to store the queue in Redis, and keep the persistent functional data in MongoDB.
Redis also offers a pub/sub mechanism. In a distributed application, an event propagation system may be useful. This is again an excellent use case for Redis, while the persistent data are kept in MongoDB.
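As a rough illustration of the tagging idea from the list above, the sketch below uses Python's built-in sets as a stand-in for Redis sets; with a real Redis server the equivalent commands would be SADD to fill the sets and SINTER / SUNION to combine them. The tag names and post keys are invented.

```python
# One set of post keys per tag (in Redis: one set key per tag, filled with SADD).
tags = {
    "mongodb": {"post:1", "post:2", "post:4"},
    "redis": {"post:2", "post:3", "post:4"},
    "caching": {"post:4", "post:5"},
}


def posts_with_all(*tag_names):
    """Posts carrying every given tag (what SINTER would return)."""
    return set.intersection(*(tags[name] for name in tag_names))


def posts_with_any(*tag_names):
    """Posts carrying at least one of the given tags (what SUNION would return)."""
    return set.union(*(tags[name] for name in tag_names))


both = posts_with_all("mongodb", "redis")
either = posts_with_any("redis", "caching")
```

The post documents themselves would still live in MongoDB; only the keys are kept in the sets.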
Because it is much easier to design a data model with MongoDB than with Redis (Redis is more low-level), it is interesting to benefit from the flexibility of MongoDB for main persistent data, and from the extra features provided by Redis (low latency, item expiration, queues, pub/sub, atomic blocks, etc ...). It is indeed a good combination.
Please note you should never run a Redis and MongoDB server on the same machine. MongoDB memory is designed to be swapped out, Redis is not. If MongoDB triggers some swapping activity, the performance of Redis will be catastrophic. They should be isolated on different nodes.
Obviously there are far more differences than this, but for an extremely high-level overview:
For use-cases:
Redis is often used as a caching layer or shared whiteboard for distributed computation.
MongoDB is often used as a swap-out replacement for traditional SQL databases.
Technically:
Redis is an in-memory db with disk persistence (the whole db needs to fit in RAM).
MongoDB is a disk-backed db which only needs enough RAM for the indexes.
There is some overlap, but it is extremely common to use both. Here's why:
MongoDB can store more data cheaper.
Redis is faster for the entire dataset.
MongoDB's culture is "store it all, figure out access patterns later"
Redis's culture is "carefully consider how you'll access data, then store"
Both have open source tools that depend on them, many of which are used together.
Redis can be used as a replacement for a traditional datastore, but it's most often used alongside another, long-term data store, like Mongo, PostgreSQL, MySQL, etc.
Redis works excellently with MongoDB as a caching server. Here is what happens.
Anytime that mongoose issues a cache query, it will first go over to the cache server.
The cache server will check to see if that exact query has ever been issued before.
If it hasn’t then the cache server will take the query, send it over to mongodb and Mongo will execute the query.
The result of that query then goes back to the cache server, and the cache server stores the result on itself.
It will say: anytime I execute that query, I get this response. So it’s going to maintain a record of the queries that are issued and the responses that come back from those queries.
The cache server will take the response and send it back to mongoose, mongoose will give it to express and it eventually ends up inside the application.
Anytime that the same exact query is issued again, mongoose will send the same query to the cache server, but if the cache server sees that this query was issued before, it will not send the query on to mongodb. Instead it’s going to take the response it got the last time and immediately send it back over to mongoose. There are no indices here, no full table scan, nothing.
We are doing a simple lookup to say: has this query been executed? Yes? Okay, take the response and send it back immediately and don’t send anything to mongo.
We have the mongoose server, the cache server (Redis) and Mongodb.
On the cache server there might be a key-value data store where each key is some query issued before and each value is the result of that query.
So maybe we are looking up a bunch of blogposts by _id.
So maybe the keys in here are the _id of the records we have looked up before.
So let’s imagine that mongoose issues a new query where it tries to find a blogpost with _id of 123. The query flows into the cache server, and the cache server will check to see if it has a result for any query that was looking for an _id of 123.
If it does not exist in the cache server, this query is taken and sent on to the mongodb instance. Mongodb will execute the query, get a response and send it back.
This result is sent back over to the cache server, which takes that result and immediately sends it back to mongoose so we get as fast a response as possible.
Right after that, the cache server will also take the query issued, and add that on to its collection of queries that have been issued and take the result of the query and store it right up against the query.
So we can imagine that in the future we issue the same query again. It hits the cache server, which looks at all the keys it has and says: oh, I already found that blogpost. It doesn’t reach out to mongo; it just takes the result of the query and sends it directly to mongoose.
We are not doing complex query logic, no indices, nothing like that. It’s as fast as possible. It’s a simple key-value lookup.
That’s an overview of how the cache server (Redis) works with MongoDB.
Now there are other concerns. Are we caching data forever? How do we update records?
We don’t want to always be storing data in the cache and be reading from the cache.
The cache server is not used for any write actions. The cache layer is only used for reading data. If we ever write data, writing will always go over to the mongodb instance and we need to ensure that anytime we write data we clear any data stored on the cache server that is related to the record we just updated in Mongo.
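The read path and the invalidation rule described above can be sketched as follows. This is only a toy model under stated assumptions: two Python dicts stand in for MongoDB and for the Redis cache, CachedReader is a made-up name, and the db_reads counter exists purely to show when the database is actually hit.

```python
class CachedReader:
    """Toy cache layer: reads go through the cache, writes go to the database."""

    def __init__(self, database):
        self.database = database  # dict standing in for MongoDB
        self.cache = {}           # dict standing in for Redis
        self.db_reads = 0         # counts how often the database is queried

    def find_by_id(self, _id):
        # Cache hit: a plain key lookup, the database is not touched.
        if _id in self.cache:
            return self.cache[_id]
        # Cache miss: query the database and remember the result.
        self.db_reads += 1
        doc = self.database.get(_id)
        if doc is not None:
            self.cache[_id] = doc
        return doc

    def update(self, _id, doc):
        # Writes always go to the database; the related cache entry is cleared.
        self.database[_id] = doc
        self.cache.pop(_id, None)


reader = CachedReader({123: {"title": "First draft"}})
reader.find_by_id(123)  # miss: goes to the database
reader.find_by_id(123)  # hit: served from the cache
reads_before_update = reader.db_reads

reader.update(123, {"title": "Final version"})
fresh = reader.find_by_id(123)  # miss again, because the entry was cleared
```

Clearing the entry on write is what keeps the cache from serving the stale "First draft" after the update.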