Nested queries with MediatR/CQRS - cqrs

I'm just wondering if the design I will be trying to implement is valid CQRS.
I'm going to have a query handler that itself will send more queries to other sub-handlers. Its main task is going to aggregate results from multiple services.
Is it ok to send queries from within handlers? I can already think of 3-level-deep hierarchies of these in my application.

No, MediatR is designed for a single level of requests and handlers. A better design may be to create a service/manager of some kind which invokes multiple, isolated queries using MediatR and aggregate the results. The implementation may be similar to what you have in mind, except that it's not a request handler itself but rather an aggregation of multiple request handlers.
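A minimal sketch of that aggregation service, assuming MediatR's standard IRequest/IMediator API (the query, service, and DTO names here are illustrative, not from the question):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using MediatR;

// Each query stays a flat, single-level request with its own handler.
public record NewsQuery() : IRequest<IReadOnlyList<string>>;
public record EventsQuery() : IRequest<IReadOnlyList<string>>;

public record DashboardDto(IReadOnlyList<string> News, IReadOnlyList<string> Events);

// The service, not a handler, owns the composition. It invokes multiple
// isolated queries through MediatR and aggregates the results.
public class DashboardService
{
    private readonly IMediator _mediator;
    public DashboardService(IMediator mediator) => _mediator = mediator;

    public async Task<DashboardDto> GetDashboardAsync()
    {
        var news = await _mediator.Send(new NewsQuery());
        var events = await _mediator.Send(new EventsQuery());
        return new DashboardDto(news, events);
    }
}
```

Because the service sits outside the request/handler pipeline, no handler ever needs to know about another handler, which keeps the hierarchy one level deep.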

This will hurt the system's resilience, increase compute time, and increase coupling.
If any of the sub-handlers fails, the entire handler will fail. If the queries are sent synchronously, the total compute time is the sum of the individual query times.
One way to reuse the sub-handlers is to query them in the background, outside the client's request, if possible. That way, when a client request comes in you already have the data locally, which improves both resilience and compute time. You will be left only with the coupling, but that could be worth it if the reuse outweighs the coupling.
I don't know whether any of this is possible in MediatR; these are just general principles of system architecture.

Related

Query vs Transaction

In this picture, we can see saga is the one that implements transactions and cqrs implements queries. As far as I know, a transaction is a set of queries that follow ACID properties.
So can I consider CQRS as an advanced version of saga which increases the speed of reads?
Note that this diagram is not meant to explain what Sagas and CQRS are. In fact, read that way it is quite confusing. What this diagram is telling you is which patterns you can use to read and write data that spans multiple microservices. It is saying that in order to write data (somehow transactionally) across multiple microservices you can use Sagas, and in order to read data which belongs to multiple microservices you can use CQRS. But that doesn't mean that Sagas and CQRS have anything in common. They are two different patterns that solve completely different problems (writes and reads). To make an analogy, it's like saying that to make pizzas (Write) you can use an oven, and to view the pizza menu (Read) you can use a tablet.
On the specific patterns:
Sagas: you can see them as a process manager or state machine. Note that they do not implement transactions in the RDBMS sense. Basically, they allow you to create a process that will take care of telling each microservice to do a write operation, and if one of the operations fails, it'll take care of telling the other microservices to roll back (or compensate for) the action that they did. So, these "transactions" won't be atomic, because while the process is running some microservices will have already modified the data and others won't. And it is not guaranteed that whatever has succeeded can successfully be rolled back or compensated.
CQRS (Command Query Responsibility Segregation): suggests the separation of Commands (writes) and Queries (reads). The reason for that is, as I was saying before, that reads and writes are two very different operations. Therefore, by separating them, you can implement each with the patterns that better fit its scenario. The reason why CQRS is shown in your diagram as a solution for reading data that comes from multiple microservices is that one way of implementing queries is to listen to Domain Events coming from multiple microservices and store the information in a single database, so that when it's time to query the data, you can find it all in a single place. An alternative to this would be Data Composition, which would mean that when the query arrives, you submit queries to multiple microservices at that moment and compose the response from the individual responses.
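The Data Composition alternative can be sketched as a simple fan-out-and-merge. This is an illustrative sketch only, with stubbed coroutines standing in for real microservice calls (all names here are hypothetical):

```python
import asyncio

# Stand-ins for queries to two separate microservices.
async def get_order(order_id):
    return {"order_id": order_id, "status": "shipped"}

async def get_customer(customer_id):
    return {"customer_id": customer_id, "name": "Ada"}

async def compose_order_view(order_id, customer_id):
    # Fan out to each service concurrently, then compose one response.
    order, customer = await asyncio.gather(
        get_order(order_id), get_customer(customer_id))
    return {**order, "customer": customer}

view = asyncio.run(compose_order_view(42, 7))
```

The composition layer is the only place that knows about both services; each service still answers only its own query.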
So can I consider CQRS as an advanced version of saga which increases the speed of reads?
Personally I would not mix the concepts of CQRS and Sagas. I think this can really confuse you. Consider both patterns as two completely different things and try to understand them both independently.

Do Firebase/Firestore Transactions create internal queues?

I'm wondering if transactions (https://firebase.google.com/docs/firestore/manage-data/transactions) are a viable tool to use in something like a ticketing system, where users may be attempting to read/write to the same collection/document, and whoever made the request first is handled first, the second is handled second, and so on.
If not what would be a good structure for such a need with firestore?
Transactions just guarantee an atomic, consistent update among the documents involved in the transaction. They don't guarantee the order in which those transactions complete, as the transaction handler might get retried in the face of contention.
Since you tagged this question with google-cloud-functions (but didn't mention it in your question), it sounds like you might be considering writing a database trigger to handle incoming writes. Cloud Functions triggers also do not guarantee any ordering when under load.
Ordering of any kind at the scale on which Firestore and other Google Cloud products operate is a really difficult problem to solve (please read that link to get a sense of that). There is not a simple database structure that will impose an order where changes are made. I suggest you think carefully about your need for ordering, and come up with a different solution.
The best indication of order you can get is probably by adding a server timestamp to individual documents, but you will still have to figure out how to process them. The easiest thing might be to have a backend periodically query the collection, ordered by that timestamp, and process things in that order, in batch.
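A rough sketch of that timestamp approach with the Firebase Admin SDK (untested here; collection and field names are illustrative, and the combined filter/order query would need a composite index):

```javascript
const admin = require("firebase-admin");
admin.initializeApp();
const db = admin.firestore();

// On write: attach a server-side timestamp to each request document.
async function submitRequest(data) {
  await db.collection("ticketRequests").add({
    ...data,
    processed: false,
    createdAt: admin.firestore.FieldValue.serverTimestamp(),
  });
}

// Backend batch job: periodically process pending requests in
// timestamp order. Requires a composite index on (processed, createdAt).
async function processPending() {
  const snap = await db.collection("ticketRequests")
    .where("processed", "==", false)
    .orderBy("createdAt")
    .limit(100)
    .get();
  for (const doc of snap.docs) {
    // ...handle the request here, then mark it processed...
    await doc.ref.update({ processed: true });
  }
}
```

Note that server timestamps only approximate arrival order; two writes landing in the same instant can still tie, so this gives an indication of order, not a strict queue.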

Using "Cursors" for paging in PostgreSQL [duplicate]

Closed 10 years ago.
Possible Duplicate:
How to provide an API client with 1,000,000 database results?
Wondering of the use of Cursors is a good way to implement "paging" using PostgreSQL.
The use case is that we have upwards of 100,000 rows that we'd like to make available to our API clients. We thought a good way to make this happen would be to allow the client to request the information in batches (pages). The client could request 100 rows at a time. We would return the 100 rows as well as a cursor, and when the client was ready, they could request the next 100 rows using the cursor that we sent them.
However, I'm a little hazy on how cursors work and exactly how and when cursors should be used:
Do the cursors require that a database connection be left open?
Do the cursors run inside a transaction, locking resources until they are "closed"?
Are there any other "gotchas" that I'm not aware of?
Is there another, better way that this situation should be handled?
Thanks so much!
Cursors are a reasonable choice for paging in smaller intranet applications that work with large data sets, but you need to be prepared to discard them after a timeout. Users like to wander off, go to lunch, go on holiday for two weeks, etc, and leave their applications running. If it's a web-based app there's even the question of what "running" is and how to tell if the user is still around.
They are not suitable for large-scale applications with high client counts and clients that come and go near-randomly like in web-based apps or web APIs. I would not recommend using cursors in your application unless you have a fairly small client count and very high request rates ... in which case sending tiny batches of rows will be very inefficient and you should think about allowing range-requests etc instead.
Cursors have several costs. If the cursor is not WITH HOLD you must keep a transaction open. The open transaction can prevent autovacuum from doing its work properly, causing table bloat and other issues. If the cursor is declared WITH HOLD and the transaction isn't held open you have to pay the cost of materializing and storing a potentially large result set - at least, I think that's how hold cursors work. The alternative is just as bad, keeping the transaction implicitly open until the cursor is destroyed and preventing rows from being cleaned up.
Additionally, if you're using cursors you can't hand connections back to a connection pool. You'll need one connection per client. That means more backend resources are used just maintaining session state, and sets a very real upper limit on the number of clients you can handle with a cursor-based approach.
There's also the complexity and overhead of managing a stateful, cursor-based setup as compared to a stateless connection-pooling approach with limit and offset. You need to have your application expire cursors after a timeout or you face potentially unbounded resource use on the server, and you need to keep track of which connections have which cursors for which result sets for which users....
In general, despite the fact that it can be quite inefficient, LIMIT and OFFSET can be the better solution. It can often be better to search the primary key rather than using OFFSET, though.
By the way, you were looking at the documentation for cursors in PL/pgSQL. You want normal SQL-level cursors for this job.
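For reference, a minimal SQL-level cursor session looks like this (table and cursor names are illustrative):

```sql
BEGIN;
DECLARE items_cur CURSOR FOR
    SELECT id, name FROM items ORDER BY id;
FETCH 100 FROM items_cur;   -- first page
FETCH 100 FROM items_cur;   -- next page
CLOSE items_cur;
COMMIT;
```

Without WITH HOLD, the cursor only lives inside the transaction, which is exactly the open-transaction cost described above; DECLARE ... CURSOR WITH HOLD FOR ... survives the COMMIT at the cost of materializing the result set.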
Do the cursors require that a database connection be left open?
Yes.
Do the cursors run inside a transaction, locking resources until they are "closed"?
Yes unless they are WITH HOLD, in which case they consume other database resources.
Are there any other "gotchas" that I'm not aware of?
Yes, as the above should explain.
For HTTP clients, don't use cursors to implement paging. For scalability, you don't want server resources tied up between requests.
Instead, use LIMIT and OFFSET on your queries; see LIMIT and OFFSET in the Pg docs.
But make sure that the indexing on your tables will support efficient queries of this form.
Design a RESTful API, so that the client can invoke the "next_url" (also passed in the response) to get the next set of rows.
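A minimal sketch of the keyset ("search the primary key") alternative mentioned above, using sqlite3 purely so the example is self-contained; the SQL shape is the same in PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 251)])

PAGE_SIZE = 100

def fetch_page(last_id=0):
    # Keyset pagination: filter on the primary key instead of OFFSET,
    # so the index is used and earlier rows are never re-scanned.
    rows = conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, PAGE_SIZE)).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor

page1, cur1 = fetch_page()       # rows 1..100
page2, cur2 = fetch_page(cur1)   # rows 101..200
page3, cur3 = fetch_page(cur2)   # rows 201..250
```

The "cursor" handed to the client is just the last id seen, so the server holds no per-client state between requests, and it slots naturally into a "next_url" in a RESTful response.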

Entity Framework Code First - Reducing round trips with .Load() and .Local

I'm setting up a new application using Entity Framework Code First and I'm looking at ways to try to reduce the number of round trips to the SQL Server as much as possible.
When I first read about the .Local property here I got excited about the possibility of bringing down entire object graphs early in my processing pipeline and then using .Local later without ever having to worry about incurring the cost of extra round trips.
Now that I'm playing around with it, I'm wondering if there is any way to take down all the data I need for a single request in one round trip. If, for example, I have a web page that has a few lists on it, news and events and discussions, is there a way that I can take down the records of those 3 unrelated source tables into the DbContext in one single round trip? Do you all out there on the interweb think it's perfectly fine when a single page makes 20 round trips to the db server? I suppose with a proper caching mechanism in place this issue could be mitigated.
I did run across a couple of cracks at returning multiple results from EF queries in one round trip but I'm not sure the complexity and maturity of these kinds of solutions is worth the payoff.
In general in terms of composing datasets to be passed to MVC controllers do you think that it's best to simply make a separate query for each set of records you need and then worry about much of the performance later in the caching layer using either the EF Caching Provider or asp.net caching?
It is completely ok to make several DB calls if you need them. If you are afraid of multiple round trips you can either write a stored procedure and return multiple result sets (doesn't work with default EF features) or execute your queries asynchronously (run multiple disjoint queries at the same time). Loading unrelated data with a single LINQ query is not possible.
Just one more note: if you decide to use the asynchronous approach, make sure that you use a separate context instance in each asynchronous execution. Asynchronous execution uses a separate thread, and the context is not thread safe.
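That asynchronous approach might look something like the following sketch (the context and entity names are hypothetical, and Task-based parallelism is used here for illustration):

```csharp
using System.Linq;
using System.Threading.Tasks;

public class PageDataLoader
{
    public async Task<(object[] News, object[] Events)> LoadAsync()
    {
        // Each query gets its own context instance, because a
        // DbContext is not thread safe and must not be shared
        // across concurrently executing queries.
        var newsTask = Task.Run(() =>
        {
            using (var ctx = new SiteContext())
                return ctx.News.Take(10).ToArray();
        });
        var eventsTask = Task.Run(() =>
        {
            using (var ctx = new SiteContext())
                return ctx.Events.Take(10).ToArray();
        });

        await Task.WhenAll(newsTask, eventsTask);
        return (newsTask.Result, eventsTask.Result);
    }
}
```

The round trips still happen, but they overlap in time instead of running back to back, so the page waits roughly as long as the slowest query rather than the sum of all of them.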
I think you are doing a lot of work for little gain if you don't already have a performance problem. Yes, pay attention to what you are doing and don't make unnecessary calls. The actual connection and across the wire overhead for each query is usually really low so don't worry about it.
Remember "Premature optimization is the root of all evil".
My rule of thumb is that executing a call for each collection of objects you want to retrieve is ok. Executing a call for each row you want to retrieve is bad. If your web page requires 20 collections then 20 calls is ok.
That being said, reducing this to one call would not be difficult if you use the Translate method. Code something like this would work:
var reader = GetADataReader(sql);
var firstCollection = context.Translate<Whatever1>(reader);
reader.NextResult();
var secondCollection = context.Translate<Whatever2>(reader);
// etc.
The big down side to doing this is that if you place your sql into a stored proc then your stored procs become very specific to your web pages instead of being more general purpose. This isn't the end of the world as long as you have good access to your database. Otherwise you could just define your sql in code.

One big call vs. multiple smaller TSQL calls

I have a ADO.NET/TSQL performance question. We have two options in our application:
1) One big database call with multiple result sets, then in code step through each result set and populate my objects. This results in one round trip to the database.
2) Multiple small database calls.
There is much more code reuse with Option 2 which is an advantage of that option. But I would like to get some input on what the performance cost is. Are two small round trips twice as slow as one big round trip to the database, or is it just a small, say 10% performance loss? We are using C# 3.5 and Sql Server 2008 with stored procedures and ADO.NET.
I would think it in part would depend on when you need the data. For instance if you return ten datasets in one large process, and see all ten on the screen at once, then go for it. But if you return ten datasets and the user may only click through the pages to see three of them then sending the others was a waste of server and network resources. If you return ten datasets but the user really needs to see sets seven and eight only after making changes to sets 5 and 6, then the user would see the wrong info if you returned it too soon.
If you use separate stored procs for each data set called in one master stored proc, there is no reason at all why you can't reuse the code elsewhere, so code reuse is not really an issue in my mind.
It sounds a wee bit obvious, but only send what you need in one call.
For example, we have a "getStuff" stored proc for presentation. The "updateStuff" proc calls "getStuff" proc and the client wrapper method for "updateStuff" expects type "Thing". So one round trip.
Chatty servers are one thing you prevent up front with minimal effort. Then you can tune the DB or client code as needed... but it's hard to factor out the round trips later, no matter how fast your code runs. In the extreme, what if your web server is in a different country from your DB server...?
Edit: it's interesting to note the SQL guys (HLGEM, astander, me) saying "one trip" and the client guys saying "multiple, code reuse"...
I am struggling with this problem myself. And I don't have an answer yet, but I do have some thoughts.
Having reviewed the answers given by others to this point, there is still a third option.
In my application, around ten or twelve calls are made to the server to get the data I need. Some of the data fields are varchar(max) and varbinary(max) fields (pictures, large documents, videos and sound files). All of my calls are synchronous - i.e., while the data is being requested, the user (and the client-side program) has no choice but to wait. He may only want to read or view the data, which only makes sense when it is ALL there, not just partially there. The process, I believe, is slower this way, and I am in the process of developing an alternative approach based on asynchronous calls to the server from a DLL library which raises events to the client to announce progress. The client is programmed to handle the DLL events and set a variable on the client side indicating which calls have been completed. The client program can then do what it must to prepare the data received in call #1 while the DLL proceeds asynchronously to get the data of call #2. When the client is ready to process the data of call #2, it must check the status and wait to proceed if necessary (I am hoping this will be a short wait, or no wait at all). In this manner, both server and client side software get the job done more efficiently.
If you're that concerned with performance, try a test of both and see which performs better.
Personally, I prefer the second method. It makes life easier for the developers, makes code more re-usable, and modularizes things so changes down the road are easier.
I personally like option two for the reason you stated: code reuse
But consider this: for small requests the latency might be longer than what you do with the request. You have to find that right balance.
As the ADO.Net developer, your job is to make the code as correct, clear, and maintainable as possible. This means that you must separate your concerns.
It's the job of the SQL Server connection technology to make it fast.
If you implement a correct, clear, maintainable application that solves the business problems, and it turns out that the database access is the major bottleneck that prevents the system from operating within acceptable limits, then, and only then, should you start pursuing ways to fix the problem. This may or may not include consolidating database queries.
Don't optimize for performance until a need arises to do so. This means that you should analyze your anticipated use patterns and determine what the typical frequency of use for this process will be, and what user interface latency will result from the present design. If the user will receive feedback from the app in less than a few (2-3) seconds, and this process does not put an inordinate load on server capacity, then don't worry about it. If, on the other hand, the user is waiting an unacceptable amount of time for a response (subjective, but definitely measurable), or the server is being overloaded, then it's time to begin optimization. And then, which optimization techniques will make the most sense, or be the most cost effective, depends on what your analysis of the issue tells you.
So, in the meantime, focus on maintainability. That means, in your case, code reuse
Personally I would go with 1 larger round trip.
This will definitely be influenced by the exact reusability of the calling code, and how it might be refactored.
But as mentioned, this will depend on your exact situation, where maintainability vs performance could be a factor.