Designing the query layer in CQRS

I have been studying up on microservices, and in particular the CQRS pattern presented well by this video. The command part is easy for me to understand, as long as I embrace eventual consistency and am fine with it. However, what should happen when you do a simple query against the denormalized query layer through its API? I am thinking this way because you will probably lazy-load denormalized data into the query layer as the queries come in, right? Especially if the data you are querying for is really an aggregation of data scattered across microservices, do you have to resort to a massive orchestration: firing a fetch-data event with some sort of context ID, having every microservice in turn publish its data with the same context ID, so that the denormalizer can listen, fill the aggregated data into its layer, and finally respond back to the client?

What you described is actually the API composition pattern - https://microservices.io/patterns/data/api-composition.html
In CQRS, a separate DB is used for queries. The result is retrieved directly from this dedicated DB.
CQRS is generally combined with the event sourcing pattern (https://microservices.io/patterns/data/event-sourcing.html).
That means any state change in the system is represented by an event.
In the query service/logic, you subscribe to all the events you are interested in and update the data in the query DB accordingly in the event handler. Thus the data in the query DB is eventually consistent with the data in the command-side DB.
(https://microservices.io/patterns/data/cqrs.html)
CQRS makes querying easier and more efficient and improves separation of concerns.
However, as you can see, it is more complicated to implement than API composition, and it has an innate issue - replication lag: the data in the query DB may not reflect the latest state. So it is generally recommended to use API composition if possible, and to use CQRS only when you must.
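To make the event-handler side concrete, here is a minimal sketch of a denormalizer that keeps the query DB up to date as events arrive. It assumes a Node.js service using the mongodb driver for the query store and a hypothetical subscribe helper standing in for whatever message broker carries the events; the event names, topics, and document shapes are invented for illustration.

```typescript
import { MongoClient } from "mongodb";

// Hypothetical broker helper; in practice this would be Kafka, RabbitMQ,
// or whichever bus the command side publishes its events to.
declare function subscribe(topic: string, handler: (event: any) => Promise<void>): void;

async function startDenormalizer() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const readModel = client.db("query_side").collection("order_summaries");

  // Each handler projects an event into the denormalized read model.
  subscribe("OrderPlaced", async (event) => {
    await readModel.updateOne(
      { _id: event.orderId },
      { $set: { status: "PLACED", customerId: event.customerId, total: event.total } },
      { upsert: true }
    );
  });

  subscribe("OrderShipped", async (event) => {
    await readModel.updateOne(
      { _id: event.orderId },
      { $set: { status: "SHIPPED", shippedAt: event.shippedAt } }
    );
  });
}

startDenormalizer().catch(console.error);
```

Queries then read order_summaries directly; there is no cross-service orchestration at read time, because the aggregation cost was already paid as the events arrived.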

Related

Is it ok to use Meteor publish composite for dozens of subscriptions?

Currently, our system is not entirely normalized, and we use meteor-publish-composite to obtain the normalized data in MongoDB. Some models have very few dependencies, but others have arrays of objects (i.e. sub-documents) with a few foreign keys that we subscribe to when fetching each model.
An example would be a Post containing a list of Comment sub-documents, where each comment has a userId field.
My question is, while I know it would be faster to use collection hooks and update the collection with data denormalization, how does Meteor handle multiple subscriptions on the same collection?
Would a hundred subscriptions on the same collection affect the application's speed (significantly)? What about a thousand? Etc.
This may not fully answer your question; however, after spending countless hours tuning the performance of a large Meteor app, I thought I would share some of the things I have learned.
In Meteor, when you define a publication, you are setting up a reactive query that continues to push data to subscribed clients when changes to the underlying Mongo data cause the result of the query to change. In other words, it sets up a query that will continually push data to clients as data is inserted, updated, or removed. The mechanism by which it does this is an observer created on the query.
When an observer is initialized (e.g. when a publication is subscribed to), it will query MongoDB for the initial dataset to send down and then use the oplog to detect changes going forward. Fortunately, Meteor is able to re-use an existing observer for a new subscription if the query is for the same collection with the same selectors and options.
This means that you could create hundreds of subscriptions against many different publications, but if they hit the same collection and use the same query selectors, then you effectively only have one observer in play. For more details, I highly recommend reading this article from kadira.io (from which I acquired the information used in this answer).
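For example, here is a small sketch (the collection and publication names are invented) of two publications whose cursors use the same collection, the same selector, and the same options; under the observer re-use described above, Meteor should be able to back both with a single observer:

```typescript
import { Meteor } from "meteor/meteor";
import { Mongo } from "meteor/mongo";

// Hypothetical collection used only for this illustration.
const Posts = new Mongo.Collection("posts");

if (Meteor.isServer) {
  // Both publications return a cursor over the same collection,
  // with the same selector and the same options...
  Meteor.publish("recentPosts", function () {
    return Posts.find({ archived: false }, { sort: { createdAt: -1 } });
  });

  Meteor.publish("dashboardPosts", function () {
    return Posts.find({ archived: false }, { sort: { createdAt: -1 } });
  });
  // ...so Meteor can serve subscriptions to both from one observer.
}
```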
In addition to this, Meteor is also able to deal with multiple publications publishing the same document; when this occurs, the documents will be merged into one. See this for more detail.
Lastly, because of Meteor's MergeBox component, it will minimize the data being sent over the wire across all your subscriptions by keeping track of what data has changed versus what is already on the client.
Therefore, in your specific example, it sounds like you will be running several different subscriptions against effectively the same query (since you are just trying to de-normalize your data) and dataset. Because of all the optimizations described above, I would guess that you won't be plagued by performance issues with this approach.
I have done similar things in one of my apps and have never had an issue.

How to design MongoDB data model to store Event Sourcing events

If I create a single table (or document in document databases) per aggregate type, I can merge databases or shard them whenever I refactor the write side's microservices; as a result the application becomes more scalable, and it also increases the speed of loading events.
Are there any side effects I should be aware of while I'm designing the event store like that?
Edit:
I'm currently using MongoDb.
What if I create a collection per aggregate id?
Or a database per aggregate type, and a collection per aggregate id...?
Would that be problematic for performance, ease of data administration, maintainability, or further scalability?
If I create a single table (or document in document databases), I can merge databases or shard them whenever I refactor the write microservices, and as a result the application becomes more scalable.
Are there any side effects I should be aware of while I'm designing the event store like that?
I haven't seen any authoritative discussion of that design.
There was a discussion in the event sourcing community about having a separate table for each type of aggregate. You can find that discussion here. Executive summary: the more experienced practitioners seemed to be startled that anybody would do that on purpose.
One thing that you should keep in mind is that while events are real (they describe something of interest to the business), aggregates are artificial. You are probably going to be unhappy if redesigning your aggregate boundaries requires that you move your events all over the place.
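For contrast, a commonly used alternative (a sketch under my own assumptions, not something prescribed by the discussion above) is a single events collection shared by all aggregate types, keyed by stream, with a unique index on (aggregateId, version) to reject concurrent appends to the same stream:

```typescript
import { MongoClient } from "mongodb";

// Illustrative shape of one stored event; the field names are assumptions.
interface StoredEvent {
  aggregateId: string;   // which stream the event belongs to
  version: number;       // position of the event within that stream
  type: string;          // e.g. "OrderPlaced"
  payload: unknown;      // the event data itself
  occurredAt: Date;
}

async function setupEventStore() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const events = client.db("eventstore").collection<StoredEvent>("events");

  // One collection for every aggregate type; the unique index gives
  // optimistic concurrency per stream and keeps events in place even
  // if aggregate boundaries are later redesigned.
  await events.createIndex({ aggregateId: 1, version: 1 }, { unique: true });
  return events;
}
```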
The following may be helpful
https://github.com/NEventStore/NEventStore.Persistence.MongoDB
http://www.slideshare.net/dbellettini/cqrs-and-event-sourcing-with-mongodb-and-php
http://blingcode.blogspot.com/2010/12/cqrs-building-transactional-event-store.html

How to live without transactions?

Most of the popular NoSQL databases (MongoDB, RethinkDB) do not support ACID transactions. They are nevertheless very popular today among developers of different systems.
The problem is: how to guarantee data consistency without transactions?
I thought that data consistency is one of the main things in production. Am I wrong?
Maybe there are some techniques to restore data consistency?
I would like to use RethinkDB for my project, but I'm scared about the missing transactions.
I do not know much about RethinkDB, so this answer is primarily based on MongoDB.
While MongoDB cannot provide atomic operations on multiple documents at the same time, it does guarantee atomicity for a single operation that affects one document. That means that when one query changes multiple fields of the same document, you can be sure that all these changes are performed at the same time. Combined with the MongoDB philosophy of keeping a consistent dataset in one document instead of spreading it over many rows of different related tables, this removes many of the situations where you would need transactions in a relational database.
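As a minimal sketch of that single-document atomicity (using the Node.js mongodb driver; the collection and field names are invented), the two field changes below land together or not at all:

```typescript
import { MongoClient } from "mongodb";

interface Order {
  _id: string;
  status: string;
  balanceDue: number;
}

async function markOrderPaid() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const orders = client.db("shop").collection<Order>("orders");

  // A single updateOne on a single document: MongoDB applies both the
  // $set and the $inc atomically, so no reader sees one without the other.
  await orders.updateOne(
    { _id: "order-42" },
    { $set: { status: "PAID" }, $inc: { balanceDue: -100 } }
  );
}
```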
Not every project needs complex transactions. Sure, there are some domains where they are essential (like most situations where you deal with money), but in other cases it isn't actually that big of a deal when some data is inconsistent for a few milliseconds. You need to consider how important data consistency is for your project. When you come to the conclusion that there are many situations where you do need transactions, then by all means stick to SQL.
In a pinch, MongoDB can simulate multi-document transactions by using the two-phase commit model. It's not easy to implement, it's not easy to work with, and it does not result in a pretty data model, but it is a valid workaround when you have a project which would be perfect for MongoDB in all regards except for that one use case which just can't do without transactions.
A lot of popular NoSQL data stores don't support atomic multi-key updates (transactions) out of the box, but most of them provide primitives which allow you to build ACID transactions at the application level.
If a data store supports per-key linearizability and a compare-and-set operation (atomic document updates), then that is enough to implement serializable client-side transactions. For example, this approach is used in Google's Percolator and in the CockroachDB database.
In my blog I created a step-by-step visualization of serializable cross-shard client-side transactions, described the major use cases, and provided links to the variants of the algorithm. I hope it will help you understand how to work with transactions in NoSQL data stores.
Among the data stores which support per-key linearizability and CAS are:
Cassandra with lightweight transactions
Riak with consistent buckets
RethinkDB
ZooKeeper
etcd
HBase
DynamoDB
MongoDB
By the way, if you're fine with the Read Committed isolation level, then it makes sense to take a look at RAMP transactions by Peter Bailis. They can also be implemented with the same set of primitives.
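As an illustration of the compare-and-set primitive mentioned above (just the building block, not the full Percolator-style protocol), here is a sketch of an optimistic, versioned update against MongoDB with the Node.js driver; the collection and field names are assumptions:

```typescript
import { MongoClient } from "mongodb";

interface Account {
  _id: string;
  balance: number;
  version: number;   // incremented on every successful write
}

async function withdrawFifty() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const accounts = client.db("bank").collection<Account>("accounts");

  const current = await accounts.findOne({ _id: "acc-1" });
  if (!current) return;

  // Compare-and-set: the filter only matches if the document still has
  // the version we read, and the version is bumped in the same atomic update.
  const result = await accounts.updateOne(
    { _id: "acc-1", version: current.version },
    { $set: { balance: current.balance - 50 }, $inc: { version: 1 } }
  );

  if (result.modifiedCount === 0) {
    // Another writer won the race; re-read and retry, or abort.
  }
}
```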
In RethinkDB, you have some guarantees of atomicity. According to the documentation (https://rethinkdb.com/docs/architecture/):
Write atomicity is supported on a per-document basis – updates to a single JSON document are guaranteed to be atomic. RethinkDB is different from other NoSQL systems in that atomic document updates aren’t limited to a small subset of possible operations – any combination of operations that can be performed on a single document is guaranteed to update the document atomically.
When you want to run a non-atomic update, you have to explicitly opt in to it, according to https://www.rethinkdb.com/api/javascript/update/:
nonAtomic: if set to true, executes the update and distributes the result to replicas in a non-atomic fashion. This flag is required to perform non-deterministic updates, such as those that require reading data from another table.
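As a small sketch with the JavaScript driver (the table and field names are made up), the second update reads from another table and therefore has to opt in with nonAtomic:

```typescript
import * as r from "rethinkdb";

async function run() {
  const conn = await r.connect({ host: "localhost", port: 28015 });

  // Atomic: the update is deterministic and touches a single document.
  await r.table("players").get("player-1")
    .update({ score: r.row("score").add(10) })
    .run(conn);

  // Non-deterministic: the new value is read from another table,
  // so RethinkDB requires the nonAtomic flag.
  await r.table("players").get("player-1")
    .update(
      { rank: r.table("leaderboard").get("player-1")("rank") },
      { nonAtomic: true }
    )
    .run(conn);
}
```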
There is an issue tracking transaction support for RethinkDB here: https://github.com/rethinkdb/rethinkdb/issues/4598
Anyway, you don't have full transactions, but you do have some basic guarantees that may well be enough for you. Try to design your operations around those basics.

MongoDB realtime query

I've heard about RethinkDB, and since I'm developing a multiplayer online game, I think that if MongoDB pushed changes (let's say new rows) instead of requiring me to pull them, it would be much faster for both the server side and the client side.
Are there any wrappers or techniques for making realtime queries to MongoDB, or not?
You can leverage tailable cursors on capped collections. At the lowest level, that would require writing all changes to the capped collection first and then having them applied by some kind of worker (an event sourcing pattern). That's a severe change of application architecture, so it's probably not what you want.
A more generic approach is to watch the oplog, a special capped collection that is used to synchronize the primary and secondary nodes and that contains all operations performed on documents, so no change in application architecture is required.
Still, this is somewhat more low-level than what RethinkDB exposes, in particular because you need to perform a diff yourself. There are wrappers that can hide some of the complexity, but I haven't used them and I don't know what programming language you're using. Oplog monitoring is used, for example, by Meteor, which is pretty much built on publish/subscribe and hides most of the complexity, so it's generally possible, though it seems more complicated than with RethinkDB.
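To make the oplog approach more concrete, here is a rough sketch with the Node.js mongodb driver that tails local.oplog.rs on a replica-set member and filters for a single namespace; the database and collection names are assumptions, and real code would also need to resume from the last timestamp it saw:

```typescript
import { MongoClient } from "mongodb";

async function tailOplog() {
  const client = await MongoClient.connect("mongodb://localhost:27017");

  // The oplog is itself a capped collection on replica-set members, so a
  // tailable, awaitData cursor keeps yielding entries as they are written.
  const oplog = client.db("local").collection("oplog.rs");
  const cursor = oplog.find(
    { ns: "game.scores" },                    // only ops on this namespace
    { tailable: true, awaitData: true }
  );

  for await (const entry of cursor) {
    // entry.op is "i" (insert), "u" (update) or "d" (delete);
    // entry.o holds the document or the update description.
    console.log(entry.op, entry.o);
  }
}
```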

Port From Entity Framework to MongoDB

I'm planning to port from Entity Framework 4.0 to MongoDB. What are the best practices that can minimize the impact? The project has social networking functionality and therefore maintains a complex relational database, so performance would be a concern if we kept using a relational database.
We have used a domain layer (using POCOs), the repository pattern, and DTO mapping in the project.
Also, what are the advantages and disadvantages of this decision? At the same time, how does it affect my domain layer implementation?
If you want to 'minimize impact' you'll want to create a database in MongoDB that mirrors the one you have in SQL. Since there are no joins in the database, you'll need to do multiple reads to complete your query. In itself that's not too bad, because MongoDB is really fast, but obviously it has other issues (concurrency, etc.).
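For example (a sketch with the Node.js driver; the collection names are invented), reproducing what a SQL join would have done in one round trip takes two reads here:

```typescript
import { MongoClient } from "mongodb";

interface Post { _id: string; title: string; authorId: string; }
interface User { _id: string; name: string; }

async function loadPostWithAuthor(postId: string) {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const db = client.db("social");

  // First read: the post itself.
  const post = await db.collection<Post>("posts").findOne({ _id: postId });
  if (!post) return null;

  // Second read: the related document a join would have pulled in.
  const author = await db.collection<User>("users").findOne({ _id: post.authorId });

  return { ...post, author };
}
```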
If, however, you want to move over fully to the NoSQL way of doing things, you'll likely not be able to 'minimize impact': you'll need to make substantial changes to the way you store content, the way you access it, and the way you update it.
Storage: You'll likely create documents in your database that are denormalized and much closer to 'ViewModels' than 'Models'. You might, for example, store a count of child records in a parent record so that you can display it without having to load or count them.
Access: You might end up using Map-Reduce for some queries to your database, which is a very different mindset from a traditional query.
Updates: In all likelihood your approach to updating will be different in order to take advantage of MongoDB's many fine-grained update features like $inc. Instead of posting back some large view model, applying it to your model, and then updating the database, you might instead provide a much finer-grained Ajax callback that updates a single value. Take a look at CQRS for more ideas on how to think about models for updates vs. queries.
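As a small sketch of the storage and update points above (the collection and field names are assumptions), a parent document carries a denormalized commentCount that is bumped with a fine-grained $inc whenever a comment is added, instead of re-saving a whole view model:

```typescript
import { MongoClient } from "mongodb";

interface Post {
  _id: string;
  title: string;
  commentCount: number;   // denormalized so the count can be shown without loading comments
}

async function addComment(postId: string, text: string) {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const db = client.db("social");

  // Store the comment itself...
  await db.collection("comments").insertOne({ postId, text, createdAt: new Date() });

  // ...and keep the denormalized counter current with a single $inc.
  await db.collection<Post>("posts").updateOne(
    { _id: postId },
    { $inc: { commentCount: 1 } }
  );
}
```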