CQRS Read Models & REST API

We are implementing a REST API over our CQRS services. We of course don't want to expose any of our domain to users of the REST APIs.
However, a key tenet of CQRS is that the read models generally correspond to a specific view or screen.
With that being the case, it seems logical that the resources in our REST API will map virtually 1:1 with the read/view models from our queries (where the queries return a DTO containing all the data for the view). Technically this is exposing a part of our domain (the read models - although returned as DTOs). In this case, this seems to be what we want. Any potential downsides to being so closely coupled?
In terms of commands, I have been considering an approach like:
https://www.slideshare.net/fatmuemoo/cqrs-api-v2. There is a slide that indicates that commands are not first class citizens. (See slide 26). By extension, am I correct in assuming that the DTOs returned from my queries will always be the first class citizens, which will then expose the commands that can be executed for that screen?
Thanks

Any potential downsides to being so closely coupled?
You need to be a little bit careful in terms of understanding the direction of your dependencies.
Specifically, if you are trying to integrate with clients that you don't control, then you are going to want to agree upon a contract -- message semantics and schema -- that you cannot change unilaterally.
Which means that the representations are relatively fixed, but you have a lot of freedom about how you implement the production of that representation. You make a promise to the client that they can get a representation of report 12345, and it will have some convenient layout of the information. But whether that representation is something you produce on demand, or something that you cache, and how you build it, is entirely up to you.
At this level, you aren't really coupling your clients to your domain model; you are coupling them to your views/reports, which is to say to your data model. And, in the CQRS world, that coupling is to the read model, not the write model.
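To make that concrete, here is a minimal sketch of the arrangement (assuming Flask; the resource path, the DTO fields and the order_summary_query handler are all made up for illustration): the query handler returns a screen-shaped DTO, and the REST resource is simply that DTO serialized, so clients couple to the report rather than to the write model.

from dataclasses import asdict, dataclass
from flask import Flask, jsonify

app = Flask(__name__)

@dataclass
class OrderSummaryView:
    # Read-model DTO shaped for one screen, not for the write model.
    order_id: str
    customer_name: str
    total: str
    status: str

@app.route("/order-summaries/<order_id>", methods=["GET"])
def get_order_summary(order_id: str):
    view = order_summary_query(order_id)  # hypothetical query handler on the read side
    return jsonify(asdict(view))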
In terms of commands, I have been considering an approach like...
I'm going to gently suggest that the author, in 2015, didn't have a particularly good understanding of REST by today's standards.
The basic problem here is that the author doesn't recognize that caching is a REST constraint; and the design of our HTTP protocols needs to consider how general purpose components understand cache invalidation.
Normally, for a command (meaning here "a message intended to change the representation of the resource"), you want the target URI of the HTTP request to match the identifier of the primary resource that changes.
POST /foo/123/command
Isn't particularly useful, from the perspective of cache invalidation, if nobody ever sends a GET /foo/123/command request.
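To illustrate (a sketch only; /foo, load_foo_view and dispatch_command are invented names): if the command is POSTed to the same target URI that clients GET, a general-purpose cache that relays the POST can invalidate its stored copy of that representation under the standard HTTP invalidation rules.

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/foo/<int:foo_id>", methods=["GET"])
def get_foo(foo_id: int):
    # Cacheable representation served from the read side.
    return jsonify(load_foo_view(foo_id))  # load_foo_view is hypothetical

@app.route("/foo/<int:foo_id>", methods=["POST"])
def change_foo(foo_id: int):
    # The command targets the same URI that clients GET, so intermediaries that
    # see a successful POST know their cached copy of /foo/<id> is now stale.
    dispatch_command(foo_id, request.get_json())  # dispatch_command is hypothetical
    return "", 204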

Shall REST API Design pay Respect to Inheritance?

This is more of a question regarding design than implementation. I have been working on a personal application with a Python Flask backend and a Vue.js SPA frontend for quite some time and am now a bit puzzled about how to design the actual REST API.
My backend heavily depends on inheritance, e.g. there's the broad notion of an Account and then there is the ManagedAccount, exhibiting all properties of the former but also additional ones. Then there are Transactions, PredictedTransactions, RecurringTransactions etc.; I guess you get the gist.
I read up on REST API design beforehand, e.g. on the Overflow blog. Before I started working with IDs, I had the following design:
/accounts/ [GET] - returns all Accounts including ManagedAccounts with Account-specific properties
/accounts/managed/ [GET] - returns all ManagedAccounts with ManagedAccount-specific properties
/accounts/ [POST] - creates a new Account
/accounts/managed/ [POST] - creates a new ManagedAccount
Now, this worked quite well until I introduced ID-dependent functions, e.g.
/accounts/<:id> [PATCH] - modifies the received data in Account with given ID, can also be used to alter Account-specific properties in ManagedAccounts
/accounts/managed/<:id> [PATCH] - modifies the received data in ManagedAccount with given ID
/accounts/<:id> [DELETE] - deletes an Account, can also be used for ManagedAccount
For all IDs I am using RFC 4122-compliant version 4 UUIDs, so there will definitely never be a collision within the route. Still, this behavior triggers an assertion error with Flask, which can be circumvented, but it got me to double-check whether this design approach has any pitfalls I have not yet considered.
Two alternatives came to my mind:
Segregating the parent classes under a separate path, e.g. using /accounts/all/ for Account. This requires knowing the inheritance depth beforehand (e.g. it would need breaking changes if I were to introduce a class inheriting from ManagedAccount).
Not incorporating inheritance in the routes, e.g. using /managed_accounts/ or /accounts_managed/. This would be the easiest method implementation-wise, but would - in my opinion - fail to give an accurate representation of the data model.
From the beginning I want to give this API consistent guarantees as defined in the backend data model, e.g. for Transaction I can currently ensure that /transactions/executed/ and /transactions/predicted/ return mutually exclusive sets that, when joined, equal the set returned by /transactions/.
Right now I am able to use this effectively on the front-end side, e.g. poll data from Account in general and get specifics for ManagedAccount when required. Is the design I am currently pursuing a 'good' REST API design, or am I clinging to concepts that are not applicable in this domain? If so, which design approach would make more sense and, especially, why and in what situations would it be preferable to the current one?
REST doesn't care what spellings you use for your URI, so long as your identifiers are consistent with the production rules defined in RFC 3986.
/d501c2c2-5729-469c-8eed-3643e1b06504
That UUID, of itself, is a perfectly satisfactory identifier for REST purposes.
This is more of a question regarding design
Yes, exactly -- the machines don't care what spellings you use for your resource identifiers, which gives you some freedom to address other concerns. The usual choice here is to make them easier for humans (easier for people to implement, easier for people to document, easier for operators to recognize in a log... lots of possibilities).
"No ambiguities in our automated routing" is a perfectly reasonable design constraint.
Path segments are used for the hierarchical parts of an identifier. There's no particular reason, however, that your identifier hierarchy needs to be aligned with your "inheritance" hierarchy.
One long-standing design principle on the web is that Cool URIs don't change; so before introducing inheritance cues in your identifiers, be sure to consider how stable your inheritance design is.
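One possible layout, sketched under the assumptions above (Flask, UUIDv4 identifiers; query_accounts, load_account, update_account and delete_account are made-up helpers): keep a single canonical item route keyed by UUID and expose the subtype as a filter on the collection, so the item URIs stay cool even if the inheritance tree grows later.

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/accounts/", methods=["GET"])
def list_accounts():
    # ?type=managed narrows the collection instead of a dedicated sub-path.
    account_type = request.args.get("type")
    return jsonify(query_accounts(account_type))  # query_accounts is hypothetical

@app.route("/accounts/<uuid:account_id>", methods=["GET", "PATCH", "DELETE"])
def account_item(account_id):
    # One stable URI per account, whatever its concrete subtype happens to be;
    # the representation itself can still say which subtype it is.
    if request.method == "GET":
        return jsonify(load_account(account_id))                         # hypothetical
    if request.method == "PATCH":
        return jsonify(update_account(account_id, request.get_json()))   # hypothetical
    delete_account(account_id)                                           # hypothetical
    return "", 204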

REST API URL design for only a single property from a performance viewpoint

I am developing a REST API and a frontend as a microservice. I know some basic principles of URL design, but there is a performance issue and I'm not sure how to deal with it.
For the convenience of displaying the webpage, I'd like to get certain information about more than 100 resources per page. (Actually, a BFF exists as an orchestration layer.)
Since the target resource includes an aggregation result over a large number of database records, it takes about 3 seconds per request. However, the information I want on the webpage is only a part of it, and it doesn't require complex aggregation to get it, which makes the response time much shorter.
Take this case as an example.
There is an article resource, and articles/:id returns the resource data, which involves a complex aggregation. But in this case, all I need is a count of comments, which can be obtained quickly by issuing a SQL COUNT statement, without a counter cache.
However, when examining REST API design, I've never seen a case where a GET request returns only a specific field.
And in microservices, an API should only return resource state, to keep things loosely coupled, so I think it shouldn't be focused on specific fields.
What kind of URL design or performance optimization can be considered in the face of performance problems?
REST is the architectural style of the world wide web; the style and the web were developed in parallel in the 1990s. Given the usage patterns and technical constraints of the time, the attention of the style is focused more on the caching of large-grained documents than on reducing transport latency.
So you would be more likely to design your representation so that the count is present somewhere in the representation, and addressable so that you can call attention to it. Thus: fragments.
So if you already had a resource with the identifier /articles, and having the comment count was important enough, then you might treat the representation like a DTO (Data Transfer Object), and simply include the comment count in the representation, accessible via some identifier like /articles#comment-count.
That's not necessarily a great fit for your use case.
An alternative is to just introduce a stand-alone comment count resource.
Any information that can be named can be a resource -- Fielding, 2000
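A sketch of that stand-alone resource option (the route, the db handle and the schema are assumptions): the count gets its own identifier, is produced by a cheap COUNT query, and never pays for the expensive aggregation.

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/articles/<int:article_id>/comment-count", methods=["GET"])
def comment_count(article_id: int):
    # Cheap query; no counter cache and no heavy aggregation needed.
    count = db.scalar(  # db is a hypothetical database handle
        "SELECT COUNT(*) FROM comments WHERE article_id = :id", {"id": article_id}
    )
    return jsonify({"articleId": article_id, "commentCount": count})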
If you are actually doing REST, then the spelling of the URI doesn't matter (consider - when was the last time you cared where the Google search form actually submitted your query?). The identifiers are used as identifiers; general-purpose clients don't try to extract semantics from them.
So using /d8a496c4-51c5-4eeb-8cbd-d5e777cbdee7 as your identifier for the comment-count should "just work".
It's not particularly friendly to the human reader, of course, so you might prefer something else. URI design is, in this sense, a lot like choosing a good variable name -- the machines don't care, so we make choices that are easier for the human beings to manage. That normally means choosing a spelling that is "consistent" with the other spellings in your API.
RFC 3986 introduces distinctions between the path, the query, and the fragment that you can expect general-purpose components to understand; one of the potentially important distinctions is that reference resolution describes how a general-purpose component can compute a new identifier from a base URI and a relative reference.
/articles/comment-count + ./1 -> /articles/1
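In Python, for instance, urllib.parse implements that resolution algorithm, so a general-purpose client never needs to understand the spelling of either identifier:

from urllib.parse import urljoin

# RFC 3986 reference resolution: base URI + relative reference -> new identifier.
print(urljoin("/articles/comment-count", "./1"))  # /articles/1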

REST design principles: Referencing related objects vs Nesting objects

My team and I are refactoring a REST-API, and I have come to a question.
For the sake of brevity, let us assume that we have an SQL database with 4 tables: Teachers, Students, Courses and Classrooms.
Right now all the relations between the items are represented in the REST-API through referencing the URL of the related item. For example, for a course we could have the following:
{ "id":"Course1", "teacher": "http://server.com/teacher1", ... }
In addition, if I ask for a list of courses through a GET call to /courses, I get a list of references as shown below:
{
  ... //pagination details
  "items": [
    {"href": "http://server1.com/course1"},
    {"href": "http://server1.com/course2"}...
  ]
}
All this is nice and clean, but if I want a list of all the course titles with the teachers' names, and I have 2000 courses and 500 teachers, I have to do the following:
Make approximately 2500 requests just to read the data.
Implement the join between the teachers and courses myself.
Optimize with caching etc., so that I do it as fast as possible.
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently.
Colleagues say that this approach is the standard way of implementing a REST-API, but then a relatively simple query becomes a big hassle.
My question therefore is:
1. Is it wrong if we nest the teacher information in the courses?
2. Should the listing of items, e.g. GET /courses, return a list of references or a list of items?
Edit: After some research I would say the model I have in mind corresponds mainly to the one shown in jsonapi.org. Is this a good approach?
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently. Colleagues say that this approach is the standard way of implementing a REST-API, but then a relatively simple query becomes a big hassle.
Your colleagues have lost the plot.
Here's your heuristic - how would you support this use case on a web site?
You would probably do it by defining a new web page that produces the report you need. You'd run the query, use the result set to generate a bunch of HTML, and ta-da! The client has the information they need in a standardized representation.
A REST-API is the same thing, with more emphasis on machine readability. Create a new document, with a schema so that your clients can understand the semantics of the document you return to them, tell the clients how to find the target URI for the document, and voila.
Creating new resources to handle new use cases is the normal approach to REST.
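As a sketch of that "new web page" idea (the route name, db handle and schema are invented for illustration): one resource whose representation is the course/teacher report, produced by a single server-side join.

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/course-roster", methods=["GET"])
def course_roster():
    rows = db.execute(  # db is a hypothetical database handle
        "SELECT c.id, c.title, t.name AS teacher_name "
        "FROM courses c JOIN teachers t ON t.id = c.teacher_id"
    )
    return jsonify({"items": [
        {"courseId": r.id, "title": r.title, "teacher": r.teacher_name}
        for r in rows
    ]})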
Yes, I totally think you should design something similar to jsonapi.org. As a rule of thumb, I would say "prefer a solution that requires fewer network calls". That's especially true if the number of network calls will be smaller by an order of magnitude.
Of course it doesn't eliminate the need to limit the request/response size if it becomes unreasonable.
Real-life solutions must have a proper balance. A clean API is nice as long as it works.
So in your case I would do something like:
GET /courses?include=teachers
Or
GET /courses?includeTeacher=true
Or
GET /courses?includeTeacher=brief|full
In the last one, the response can contain only the teacher's id for brief, and full teacher details for full.
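A rough sketch of how such an include flag could be honoured on the server (the parameter name, query_courses and the field names are assumptions):

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/courses", methods=["GET"])
def list_courses():
    include = set(request.args.get("include", "").split(","))
    items = []
    for course in query_courses():  # query_courses is hypothetical
        item = {"id": course.id, "title": course.title}
        if "teachers" in include:
            # Embed the teacher only when the client asked for it.
            item["teacher"] = {"id": course.teacher.id, "name": course.teacher.name}
        else:
            # Otherwise just reference it.
            item["teacher"] = {"href": f"/teachers/{course.teacher.id}"}
        items.append(item)
    return jsonify({"items": items})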
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently. Colleagues say that this approach is the standard way of implementing a REST-API, but then a relatively simple query becomes a big hassle.
Have you actually measured the overhead generated by each request? If not, how do you know that the overhead will be too intense? From an object-oriented programmer's perspective it may sound bad to perform each call on its own; your design, however, lacks one important asset which helped the Web grow to its current size: caching.
Caching can occur on multiple levels. You can do it on the API level, the client might do something, or an intermediary server might do it. Fielding even made it a constraint of REST! So, if you want to comply with the REST architecture philosophy, you should also support caching of responses. Caching helps to reduce the number of requests having to be calculated or even processed by a single server. With the help of stateless communication you might even introduce a multitude of servers that all perform calculations for billions of requests and act as one cohesive system to the client. An intermediary cache may further help to reduce the number of requests that actually reach the server significantly.
A URI as a whole (including any path, matrix or query parameters) is actually a key for a cache. Upon receiving a GET request, an application checks whether its current cache already contains a stored response for that URI and returns the stored response on behalf of the server directly to the client if the stored data is "fresh enough". If the stored data has already exceeded the freshness threshold, it will throw away the stored data and route the request to the next hop in line (which might be the actual server, or a further intermediary).
Spotting resources that are ideal for caching might not be easy at times, though the majority of data doesn't change so quickly that caching should be neglected entirely. Thus, it should be, at least, of general interest to introduce caching, especially the more traffic your API produces.
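To make that concrete, a minimal sketch (assuming Flask; load_teacher and the teacher.version field are invented, and the freshness policy is only an example): declaring freshness and a validator lets intermediaries answer repeat GETs themselves and revalidate cheaply afterwards.

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/teachers/<int:teacher_id>", methods=["GET"])
def get_teacher(teacher_id: int):
    teacher = load_teacher(teacher_id)  # load_teacher is hypothetical
    resp = jsonify({"id": teacher.id, "name": teacher.name})
    resp.headers["Cache-Control"] = "public, max-age=300"  # fresh for five minutes
    resp.set_etag(str(teacher.version))                    # validator for cheap revalidation
    return resp.make_conditional(request)                  # may answer 304 Not Modified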
While certain media-types such as HAL JSON, jsonapi, ... allow you to embed content gathered from related resources into the response, embedding content has some potential drawbacks such as:
Utilization of the cache might be low due to mixing data that changes quickly with data that is more static
Server might calculate data the client won't need
One server calculates the whole response
If related resources are only linked to instead of directly embedded, a client for sure has to fire off a further request to obtain that data, though it is actually more likely to get (partly) served by a cache which, as mentioned a couple of times now throughout the post, reduces the workload on the server. Besides that, a positive side effect could be that you gain more insight into what the clients are actually interested in (if an intermediary cache is run by you, for instance).
Is it wrong if we nest the teacher information in the courses?
It is not wrong, but it might not be ideal, as explained above.
Should the listing of items, e.g. GET /courses, return a list of references or a list of items?
It depends. There is no right or wrong.
As REST is just a generalization of the interaction model used on the Web, basically the same concepts apply to REST as well. Depending on the size of the "item" it might be beneficial to return a short summary of the item's content and add a link to the item. Similar things are done on the Web as well. For a list of students enrolled in a course this might be the name and matriculation number, plus a link through which further details of that student can be requested, accompanied by a link-relation name that gives the actual link some semantic context which a client can use to decide whether invoking such a URI makes sense or not.
Such link-relation names are either standardized by IANA, come from common approaches such as Dublin Core or schema.org, or are custom extensions as defined in RFC 8288 (Web Linking). For the above-mentioned list of students enrolled in a course you could, for instance, make use of the about relation name to hint to a client that further information on the current item can be found by following the link. If you want to enable pagination, the link relations first, next, prev and last can and probably should be used as well, and so forth.
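To make that concrete, a small sketch (structure loosely HAL-like; the names, URIs and page numbers are made up) of one page of a course collection using those link-relation names:

# One page of a paginated course collection, expressed as the dict that would be serialized.
page = {
    "items": [
        {"id": "course1", "title": "Algebra I",
         "_links": {"about": {"href": "/courses/course1"}}},
    ],
    "_links": {
        "self":  {"href": "/courses?page=2"},
        "first": {"href": "/courses?page=1"},
        "prev":  {"href": "/courses?page=1"},
        "next":  {"href": "/courses?page=3"},
        "last":  {"href": "/courses?page=10"},
    },
}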
This is actually what HATEOAS is all about. Linking data together and giving them meaningful relation names to span a kind of semantic net between resources. By simply embedding things into a response such semantic graphs might be harder to build and maintain.
In the end it basically boils down to an implementation choice whether you want to embed or reference resources. I hope I could shed some light on the usefulness of caching and the benefits it could yield, especially in large-scale systems, as well as on the benefit of providing link-relation names for URIs, which enhance the semantic context of relations used within your API.

Difference between CQRS and CQS

I am learning what the CQRS pattern is and came to know there is also a CQS pattern.
When I tried to search, I found lots of diagrams and info on CQRS but didn't find much about CQS.
Key point of the CQRS pattern:
In CQRS there is one model to write (command model) and one model to read (query model), which are completely separate.
How is CQS different from CQRS?
CQS (Command Query Separation) and CQRS (Command Query Responsibility Segregation) are very much related. You can think of CQS as being at the class or component level, while CQRS is more at the bounded context level.
I tend to think of CQS as being at the micro level, and CQRS at the macro level.
CQS prescribes separate methods for querying from or writing to a model: the query doesn't mutate state, while the command mutates state but does not have a return value. It was devised by Bertrand Meyer as part of his pioneering work on the Eiffel programming language.
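At that micro level, CQS is just a rule about method signatures; a tiny illustrative sketch:

class BankAccount:
    """Illustration only: commands return nothing, queries change nothing."""

    def __init__(self) -> None:
        self._balance = 0

    def deposit(self, amount: int) -> None:
        # Command: mutates state, returns no value.
        self._balance += amount

    def balance(self) -> int:
        # Query: returns a value, has no side effects.
        return self._balance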
CQRS prescribes a similar approach, except it's more of a path through your system. A query request takes a separate path from a command. The query returns data without altering the underlying system; the command alters the system but does not return data.
Greg Young put together a pretty thorough write-up of what CQRS is some years back, and he goes into how CQRS is an evolution of CQS. This document introduced me to CQRS some years ago, and I find it still a very useful reference.
This is an old question but I am going to take a shot at answering it. I do not often answer questions on StackOverflow, so please forgive me if I do something outside the bounds of the community in terms of linking to things, writing a long answer, etc.
There are many differences between CQRS and CQS; however, CQRS uses CQS inside of its definition! Let's start with defining the two and then we can discuss the differences.
CQS defines two types of messages depending on their return value: no return value (void) specifies this is a Command; a return value (non-void) specifies this method is a Query.
Commands change information
Queries return information
Commands change state. Queries do not.
Now for CQRS, which uses the same definition as CQS for Commands and Queries. What CQRS says is that we do not want one object with Command and Query methods. Instead we want two objects: one with all the Commands and one with all the Queries.
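Sketched in code (the store interfaces are assumptions, not anything from the answer), the same operations split across two objects rather than living on one:

class AccountCommands:
    # Write side: only commands, no return values.
    def __init__(self, write_store):
        self._write_store = write_store

    def deposit(self, account_id: str, amount: int) -> None:
        self._write_store.record_deposit(account_id, amount)  # hypothetical store API

class AccountQueries:
    # Read side: only queries, no state changes.
    def __init__(self, read_store):
        self._read_store = read_store

    def balance(self, account_id: str) -> int:
        return self._read_store.get_balance(account_id)       # hypothetical store API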
The idea overall is very simple; it's everything after doing this where things become interesting. There are numerous talks of me online discussing some of the associated attributes (sorry, way too much to type here!).
CQS is about Commands and Queries. It doesn't care about the model. You have somehow separated the services for reading data from those for writing data.
CQRS is about separate models for writes and reads. Of course, use of the write model often requires reading something to fulfill business logic, but you can only do reads against the read model. Separate databases are state of the art. But imagine a single DB with separate models for reads and writes, modeled in an ORM. It's very often good enough.
I have found that people often say they practice CQRS when they have CQS.
Read the inventor Greg Young's answer
I think, like "Dependency Injection", the concepts are so simple and taken for granted that the fact that they have fancy names seems to drive people to think they're something more than they are, especially as CQRS is often quoted alongside Event Sourcing.
CQS is the separation of methods that read from those that change state; don't do both in a single method. This is the micro level.
CQRS extends this concept to a higher level for machine-to-machine APIs: the separation of the message models and processing paths.
So CQRS is a principle you apply to the code in an API or facade.
I have found CQRS to essentially be a very strong S in SOLID, pushing the separation deeply into the psyche of developers to produce more maintainable code.
I think web applications are a bad fit for CQRS since the mutation of state via representation transfer means the command and query are two sides of the same request-response. The representation is a command and the response is the query.
For example, you send an order and receive a view of all your orders.
Imagine if the code of a website was factored into a command side and query side. The route action handling code would need to fall into one of those sides, but it does both.
Imagining a stronger segregation, if the code were moved into two different compilable codebases, then the website would accept a POST of a form, but the user would have to browse to another website URL to see the impact of the action. This is obviously crazy. One workaround would be to always redirect, though this wouldn't really be RESTful, since the ideal REST application is one where the next representation contains hypertext to drive the next state transition, and so on.
Given that a website is a REST API between human and machine (or machine and machine), this also includes REST APIs, though other types of HTTP message-passing APIs may be a perfect fit for CQRS.
A service or facade within the bounds of the website may obviously work well with CQRS, though the action handlers would sit outside this boundary.
See CQS on Wikipedia
The biggest difference is CQRS uses separate data stores for commands and queries. A query store can use a different technology like a document database or just be a denormalized schema in the same database that makes querying the data easier.
The data between databases is usually copied asynchronously using something like a service bus. Therefore, data in the query store is eventually consistent (it is going to be there at some point in time). Applications need to account for that. While it is possible to use the same transaction (same database or a 2-phase commit) to write to both stores, it is usually not recommended for scalability reasons.
CQS architecture reads and writes from the same data store/tables.
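A sketch of that asynchronous copy (the bus and store APIs are invented for illustration): the write side persists the change and publishes an event; a projector consumes it later and updates the denormalized query store, which is where the eventual consistency comes from.

def handle_rename_course(command, write_db, bus):
    # Write side: persist to the command store, then announce what happened.
    write_db.update_course_name(command.course_id, command.new_name)  # hypothetical
    bus.publish({"type": "CourseRenamed",
                 "courseId": command.course_id,
                 "name": command.new_name})                           # hypothetical

def on_course_renamed(event, query_db):
    # Read side: runs later on the subscriber; readers may briefly see the old name.
    query_db.upsert_course_summary(event["courseId"], name=event["name"])  # hypothetical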

Is this generic REST service a good idea?

I have a question on whether or not a particular REST-service design is good or not.
The background is an in-house monolithic system (I will call this "the main system") dealing with, e.g., customers. Then there are external components that have additional information on persons, which may or may not correspond 1:1 with a customer in the main system.
At present there is no definite specification of what kind of data is or may be associated with a person/customer in these external components.
The proposed design I have been presented with is a REST service that exposes an API for the external system to call in order to feed the component with this arbitrary data associated with persons.
The idea is that by doing so the main system will have a single place to go to, to get the external data for customers/persons.
A proposed requirement of this REST service is that as new types of data are loaded into it by an external component, this data is automatically made accessible by the service, without it needing to be changed in any way, or redeployed. And "new data" generally means a new type of key-value set. E.g. initially the service might provide data for a customer identified by a customerId. Then an external component decides to post some kind of data associated with an SSN. This should automatically entail that the service can be queried for this data by supplying an SSN in the request.
In order to avoid the need to change/redeploy the service, I'm assuming the solution will have to have a very generic scheme of reference, e.g.
http://url/generic-resource-name/?id=[customerId]&keyType=customerId
There is really nothing in the requirements that limits the data to being associated with a person, only that its key be made up of one value.
An example use case sequence could be:
So to the question:
Is it a good idea to implement such a general-purpose service? And how does it rhyme with the principles of REST? The noun in question that the service will operate on will have to be very generic, really nothing short of “resource” or “data”, which in itself seems like a smell to me.
So to the question: Is it a good idea to implement such a general-purpose service?
I believe not. You are going straight into the inner-platform effect antipattern. You must be very careful, or you might end up like Vision.
Please also read the chapter "The allure of distributed objects" from Fowler's PoEAA book. Just to be careful.