Microservice Adapter: one adapter for many countries, or one adapter per country? An architectural/deployment decision.

Say I have System1 that connects to System2 through an adapter microservice between them:
System1 -> REST calls -> Adapter (converts request/response + some extra logic, like validation) -> System2
System1 is more like a monolith and exists for many countries (but that may change).
The question is: from the perspective of microservice architecture and deployment, should the Adapter be one per country (say Adapter-UK, Adapter-AU, etc.), or should it be a single Adapter that can handle many countries at the same time?
I mean:
To have a single system/adapter service:
Advantage: one codebase in one place; the adaptation logic is about 90% the same across countries. Easy to introduce new changes.
Disadvantage: once we deploy the system and there is a bug, it could affect many countries at the same time. Not safe.
To have a separate system per country:
Disadvantage: once a generic change has been introduced to one system, it has to be "copy-pasted" to all the other countries/services. Repetitive, not-so-smart work from a developer's point of view.
Advantage: safer to change/deploy.
Q: What is the preferable way from the point of view of microservice architecture?

I would suggest the following:
the adapter covers all countries in order to maintain a single codebase and improve code reusability
unit and/or integration tests to cope with the bugs
spawn multiple identical instances of the adapter with a load balancer in front

Since Prabhat Mishra asked in the comments.
After two years... (it took some time to understand what I had asked.)
Back then, it was quite critical for me to have a resilient system, i.e. if I changed code in one adapter I did not want all my countries to go down (it is an SAP enterprise system, millions of clients). I wanted only one country to go down (still millions of clients, but fewer millions :)).
So, for this case: I would create many adapters, one per country, BUT I would use some code-generated common solution to create them, like a common jar, so I would not repeat my infrastructure or communication layers. A scaffolding kind of thing:
country-adapter new "country-1"
then add country-specific code (without changing any generated code: less code to repeat, less to support).
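As a minimal sketch of what that generated/common layer might look like (all type names here - CountryAdapter, System2Client, AdapterRequest, etc. - are purely illustrative, not an existing API):

// Hypothetical sketch of the "common jar" idea: generated/shared infrastructure lives
// in a base class, and each country adapter only adds its own mapping logic.
public abstract class CountryAdapter {

    protected final System2Client system2Client;      // shared communication layer

    protected CountryAdapter(System2Client system2Client) {
        this.system2Client = system2Client;
    }

    // Shared flow, identical for every country: validate, map, call System2.
    public final AdapterResponse handle(AdapterRequest request) {
        validate(request);                              // generated, common validation
        System2Request mapped = mapToSystem2(request);
        return AdapterResponse.from(system2Client.send(mapped));
    }

    // The only part each country implements (the ~10% that differs).
    protected abstract System2Request mapToSystem2(AdapterRequest request);

    private void validate(AdapterRequest request) { /* generated validation */ }
}

// Generated by something like: country-adapter new "uk", then deployed as its own
// service, so a bug here can only take the UK adapter down.
public class UkAdapter extends CountryAdapter {
    public UkAdapter(System2Client client) { super(client); }

    @Override
    protected System2Request mapToSystem2(AdapterRequest request) {
        return new System2Request(/* UK-specific field mapping */);
    }
}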
Otherwise, if you feel safe (e.g. code-reviewing your change, making sure you do not touch other countries' code), then one adapter is OK.
Another solution is to start with one adapter and split into more if it becomes critical (i.e. you do not feel safe about the potential damage/cost if it fails).
In general, it seems it all boils down to the same problem: WHEN to split a "monolith" into pieces. The answer is always: when being as big as it is starts causing problems. But if you know your system well, you know in advance whether that WHEN is now (or not).

Related

When to use multiple KieBases vs multiple KieSessions?

I know that one can utilize multiple KieBases and multiple KieSessions, but I don't understand under what scenarios one would use one approach vs the other (I am having some trouble in general understanding the definitions and relationships between KieContainer, KieBase, KieModule, and KieSession). Can someone clarify this?
You use multiple KieBases when you have multiple sets of rules doing different things.
KieSessions are the actual session for rule execution -- that is, they hold your data and some metadata and are what actually executes the rules.
Let's say I have an application for a school. One part of my application monitors students' attendance. The other part of my application tracks their grades. I have a set of rules which decides if students are truant and we need to talk to their parents. I have a completely unrelated set of rules which determines whether a student is having trouble academically and needs to be put on probation/a performance plan.
These rules have nothing to do with one another. They have completely separate concerns, different rule inputs, and are triggered in different parts of the application. The part of the application that is tracking attendance doesn't need to trigger the rules that monitor student performance.
For this application, I would have two different KieBases: one for attendance, and one for academics. When I need to fire the rules, I fire one or the other -- there is no use case for firing both at the same time.
The KieSession is the runtime for when we fire those rules. We add to it the data we need to trigger the rules, and it also tracks some other metadata that's really not relevant to this discussion. When firing the academics rules, I would be adding to it the student's grades, their classes, and maybe some information about the student (e.g. the grade level, whether they're an "honors" student, etc.). For the attendance rules, we would need the student information, plus historical tardiness/absence records. Those distinct pieces of data get added to the sessions.
When we decide to fire rules, we first get the appropriate KieBase -- academics or attendance. Then we get a session for that rule set, populate the data, and fire it. We technically "execute" the session, not the rules (and definitely not the rule base). The rule base is just the collection of the rules; the session is how we actually execute it.
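A minimal sketch of what that looks like in code (assuming a KieBase named "academicsKB" is declared in kmodule.xml; Student and Grade are hypothetical fact classes, not Drools types):

import org.kie.api.KieBase;
import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class AcademicsRuleRunner {

    private final KieBase academicsKBase;

    public AcademicsRuleRunner() {
        KieServices ks = KieServices.Factory.get();
        KieContainer container = ks.getKieClasspathContainer();
        this.academicsKBase = container.getKieBase("academicsKB"); // the rule collection
    }

    public void evaluate(Student student, Iterable<Grade> grades) {
        KieSession session = academicsKBase.newKieSession();   // the runtime that executes rules
        try {
            session.insert(student);             // student info (grade level, honors flag, ...)
            for (Grade grade : grades) {
                session.insert(grade);           // the data the academics rules reason over
            }
            session.fireAllRules();              // we "execute" the session, not the rule base
        } finally {
            session.dispose();                   // low volume: create, use, dispose
        }
    }
}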
There are two kinds of sessions -- stateful and stateless. As their names imply, they differ with how data is stored and tracked. In most cases, people use stateful sessions because they want their rules to do iterative work on the inputs. You can read more about the specific differences in the documentation.
For low-volume applications, there's generally little need to reuse your KieSessions. Create, use, and dispose of them as needed. There is, however, some inherent overhead in this process, so there comes a point at which reuse does become something you should consider. The documentation discusses the solution provided out of the box for Drools, which is session pooling.
(When trying to wrap your head around this, I like to use an analogy of databases. A session is like a JDBC connection: for small applications you can create them, use them, then close them as you need them. But as you scale you'll quickly find that you need to look into connection pooling to minimize this overhead. In this particular analogy, the rule base would be the database that the rules are executing against -- not the tables!)

CQRS projections, joining data from different aggregates via probe commands

In CQRS, when we need to create custom-tailored projections for our read models, we usually prefer "denormalized" projections (assume we are talking about projecting onto a DB). It is not uncommon for the information needed by the application/UI to come from different aggregates (possibly from different BCs).
Imagine we need a projected table to contain a customer's information together with her full address, and that Customer and Address are different aggregates in our system (possibly in different BCs). Meaning that addresses are generated and maintained independently of customers. Or, in other words, when a new customer is created, there is no guarantee that there will be an AddressCreatedEvent subsequently produced by the system; this event may have already been processed prior to the creation of the customer. All we have at the time of CreateCustomerCommand is a UUID of an existing address.
We have several solutions here.
Enrich CreateCustomerCommand and the subsequent CustomerCreatedEvent to contain the full address of the customer (looking up this information on the fly from the UI or the controller). This way the projection handler will just update the table directly upon receiving CustomerCreatedEvent.
Use the addrUuid provided in CustomerCreatedEvent to perform an ad-hoc query in the projection handler to get the missing part of the address information before updating the table.
These are the commonly discussed solutions to this problem. However, as noted by many others, there are problems with each approach. Enriching events can be difficult to justify, as well described by Enrico Massone in this question, for example. Querying other views/projections (a kind of JOIN) will work but introduces coupling (see the same link).
I would like to describe another method here which, I believe, nicely addresses these concerns. I apologize beforehand for not giving proper credit if this is a known technique. Sincerely, I have not seen it described elsewhere (at least not as explicitly).
"A picture speaks a thousand words", as they say:
The idea is that:
We keep CreateCustomerCommand and CustomerCreatedEvent simple with only the addrUuid attribute (no enriching).
In the API controller we send two commands to the command handlers (aggregates): the first one, as usual, is CreateCustomerCommand, to create the customer and project the customer information together with addrUuid to the table, leaving the other columns (full address, etc.) empty for the time being. (Warning: see the update; we may have a concurrency issue here and need to issue the probe command from a Saga.)
Right after this, and after we have obtained the custUuid of the newly created customer, we issue a special ProbeAddressCommand to the Address aggregate, triggering an AddressProbedEvent which encapsulates the full state of the address together with a special attribute probeInitiatorUuid, which is, of course, our custUuid from the previous command.
The projection handler will then act upon AddressProbedEvent by simply filling in the missing pieces of information in the table, looking up the required row by matching the provided probeInitiatorUuid (i.e. custUuid) and addrUuid.
So we have two phases: create the Customer and probe for the related Address. They are depicted in the diagram with (1) and (2) respectively.
Obviously, we can send as many such "probe" commands (in parallel) as needed by our projection: ProbeBillingCommand, ProbePreferencesCommand, etc., effectively populating or "filling in" the denormalized projection with missing data from each handled "probe" event.
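For illustration, a minimal, framework-agnostic sketch of such a projection handler (CustomerProjectionRepository, CustomerRow and the event accessors are made-up names for the example, not a specific library):

public class CustomerAddressProjectionHandler {

    private final CustomerProjectionRepository repository;

    public CustomerAddressProjectionHandler(CustomerProjectionRepository repository) {
        this.repository = repository;
    }

    // Phase (1): project the customer with only addrUuid; the address columns stay empty.
    public void on(CustomerCreatedEvent event) {
        repository.insert(new CustomerRow(
                event.getCustUuid(), event.getName(), event.getAddrUuid()));
    }

    // Phase (2): the probe event carries the full address state plus probeInitiatorUuid,
    // so the handler can find the half-filled row and complete it.
    public void on(AddressProbedEvent event) {
        CustomerRow row = repository.findByCustUuidAndAddrUuid(
                event.getProbeInitiatorUuid(), event.getAddrUuid());
        row.fillAddress(event.getStreet(), event.getCity(), event.getCountry());
        repository.update(row);
    }
}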
The advantage of this method is that we keep the commands/events in the first phase simple (only UUIDs to other aggregates) while avoiding synchronous coupling (joining) of the projections. The whole approach has a nice EDA feel to it.
My question is then: is this a known technique? It seems I have not seen it anywhere... And what can go wrong with this approach?
I would be more than happy to update this question with any references to other sources which describe this method.
UPDATE 1:
There is one significant flaw with this approach that I can see already: the ProbeAddressCommand cannot be issued before the projection handler has had a chance to process CustomerCreatedEvent. But this is impossible to know from the API gateway (or controller).
The solution would probably involve a Saga, say CustomerAddressJoinProjectionSaga, which will start upon receiving CustomerCreatedEvent and only then issue ProbeAddressCommand. The Saga will end upon registering AddressProbedEvent, or, if many other aggregates are involved in probing, when all such events have been received.
So here is the updated diagram.
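A rough sketch of what that Saga could look like (the CommandGateway abstraction and all class names are hypothetical, not tied to a particular framework):

public class CustomerAddressJoinProjectionSaga {

    private final CommandGateway commandGateway;   // assumed command-dispatch abstraction
    private boolean finished = false;

    public CustomerAddressJoinProjectionSaga(CommandGateway commandGateway) {
        this.commandGateway = commandGateway;
    }

    // The saga starts only after the customer (and thus the projected row) exists,
    // which avoids the concurrency issue described above.
    public void on(CustomerCreatedEvent event) {
        commandGateway.send(new ProbeAddressCommand(
                event.getAddrUuid(),
                event.getCustUuid()      // becomes probeInitiatorUuid on the probed event
        ));
    }

    // The saga ends once the probe result has been observed.
    public void on(AddressProbedEvent event) {
        finished = true;
    }
}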
UPDATE 2:
As noted by Levi Ramsey (see answer below) my example is rather convoluted with respect to the choice of aggregates. Indeed, Customer and Address are often conceptualized as belonging together (same Aggregate Root). So it is a better illustration of the problem to think of something like Student and Course instead, assuming for the sake of simplicity that there is a straightforward relation between the two: a student is taking a course. This way it is more obvious that Student and Course are independent aggregates (students and courses can be created and maintained at different times and different places in the system).
But the question still remains: how can we obtain a projection containing the full information about a student (full name, etc.) and the courses she is registered for (title, credits, the instructor's full name, prerequisites, etc.) all in the same table, if the UI requires it ?
A couple of thoughts:
I question why address needs to be a separate aggregate, much less in a different bounded context, given the requirement that customers have an address. If in some other bounded context customer addresses are meaningful (e.g. you want to know "which addresses have more customers", etc.), then that context can subscribe to the events from the customer service.
As an alternative, if there's a particularly strong reason to model addresses separately from customers, why not have the read side prospectively listen for events from the address aggregate and store the latest address for a given address UUID in case there's a customer who ends up with that address. The reliability per unit effort of that approach is likely to be somewhat greater, I would expect.

REST design principles: Referencing related objects vs Nesting objects

My team and I are refactoring a REST API and I have come to a question.
For brevity, let us assume that we have an SQL database with 4 tables: Teachers, Students, Courses and Classrooms.
Right now all the relations between the items are represented in the REST API by referencing the URL of the related item. For example, for a course we could have the following:
{ "id":"Course1", "teacher": "http://server.com/teacher1", ... }
In addition, if I ask for a list of courses through a GET call to /courses, I get a list of references as shown below:
{
  ... // pagination details
  "items": [
    {"href": "http://server1.com/course1"},
    {"href": "http://server1.com/course2"}...
  ]
}
All this is nice and clean, but if I want a list of all the course titles with the teachers' names, and I have 2000 courses and 500 teachers, I have to do the following:
Approximately 2500 queries just to read the data.
Implement the join between the teachers and courses
Optimize with caching etc, so that I will do it as fast as possible.
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently.
Colleagues say that this approach is the standard way of implementing a REST API, but then a relatively simple query becomes a big hassle.
My question therefore is:
1. Is it wrong if we nest the teacher information in the courses?
2. Should the listing of items, e.g. GET /courses, return a list of references or a list of items?
Edit: After some research I would say the model I have in mind corresponds mainly to the one shown in jsonapi.org. Is this a good approach?
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently. Colleagues say that this approach is the standard way of implementing a REST API, but then a relatively simple query becomes a big hassle.
Your colleagues have lost the plot.
Here's your heuristic - how would you support this use case on a web site?
You would probably do it by defining a new web page that produces the report you need. You'd run the query, use the result set to generate a bunch of HTML, and ta-da! The client has the information they need in a standardized representation.
A REST API is the same thing, with more emphasis on machine readability. Create a new document, with a schema so that your clients can understand the semantics of the document you return to them, tell the clients how to find the target URI for the document, and voila.
Creating new resources to handle new use cases is the normal approach to REST.
Yes, I totally think you should design something similar to jsonapi.org. As a rule of thumb, I would say "prefer a solution that requires fewer network calls". This is especially true if the number of network calls is smaller by an order of magnitude.
Of course it doesn't eliminate the need to limit the request/response size if it becomes unreasonable.
Real life solutions must have a proper balance. Clean API is nice as long as it works.
So in your case I would do something like:
GET /courses?include=teachers
Or
GET /courses?includeTeacher=true
Or
GET /courses?includeTeacher=brief|full
In the last one, the response has only the teacher's id for brief and the full teacher details for full.
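A minimal sketch of how such an include parameter might be handled on the server side (Spring MVC used here just as an example; CourseDto and CourseService are made-up application classes, not an existing API):

import java.util.List;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/courses")
public class CourseController {

    private final CourseService courseService;

    public CourseController(CourseService courseService) {
        this.courseService = courseService;
    }

    @GetMapping
    public List<CourseDto> listCourses(
            @RequestParam(name = "include", required = false) String include) {
        boolean withTeachers = "teachers".equals(include);
        // One server-side join instead of ~2500 follow-up calls from the client.
        return courseService.findAll(withTeachers);
    }
}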
My problem is that this method creates a lot of network traffic with thousands of REST-API calls and that I have to re-implement the natural join that the database would do way more efficiently. Colleagues say that this approach is the standard way of implementing a REST API, but then a relatively simple query becomes a big hassle.
Have you actually measured the overhead generated by each request? If not, how do you know that the overhead will be too high? From an object-oriented programmer's perspective it may sound bad to perform each call on its own; your design, however, lacks one important asset which helped the Web grow to its current size: caching.
Caching can occur on multiple levels. You can do it on the API level, the client might do something, or an intermediary server might do it. Fielding even made it a constraint of REST! So, if you want to comply with the REST architectural philosophy, you should also support caching of responses. Caching helps reduce the number of requests that have to be calculated or even processed by a single server. With the help of stateless communication you might even introduce a multitude of servers that all perform calculations for billions of requests and act as one cohesive system to the client. An intermediary cache may further help to significantly reduce the number of requests that actually reach the server.
A URI as a whole (including any path, matrix or query parameters) is actually a key for a cache. Upon receiving a GET request, an application checks whether its current cache already contains a stored response for that URI and, if the stored data is "fresh enough", returns the stored response on behalf of the server directly to the client. If the stored data has already exceeded the freshness threshold, it throws the stored data away and routes the request to the next hop in line (which might be the actual server, or a further intermediary).
Spotting resources that are ideal for caching might not always be easy, though most data doesn't change so quickly that caching should be neglected entirely. Thus it should be of at least general interest to introduce caching, especially the more traffic your API produces.
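As a small illustration of how a server can make its responses cacheable (a Spring-flavored sketch; the controller and service names are assumptions and the freshness values are arbitrary):

import java.util.concurrent.TimeUnit;
import org.springframework.http.CacheControl;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
public class CachedCourseController {

    private final CourseService courseService;   // made-up application service

    public CachedCourseController(CourseService courseService) {
        this.courseService = courseService;
    }

    @GetMapping("/courses/{id}")
    public ResponseEntity<CourseDto> getCourse(@PathVariable String id) {
        CourseDto course = courseService.find(id);
        return ResponseEntity.ok()
                // shared caches may reuse this response for 10 minutes
                .cacheControl(CacheControl.maxAge(10, TimeUnit.MINUTES).cachePublic())
                // allows cheap revalidation via If-None-Match once the response goes stale
                .eTag(Integer.toHexString(course.hashCode()))
                .body(course);
    }
}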
While certain media-types such as HAL JSON, jsonapi, ... allow you to embed content gathered from related resources into the response, embedding content has some potential drawbacks such as:
Utilization of the cache might be low due to mixing data that changes quickly with data that is more static
Server might calculate data the client won't need
One server calculates the whole response
If related resources are only linked to instead of directly embedded, a client for sure has to fire off a further request to obtain that data, though that request is actually more likely to be (partly) served by a cache which, as mentioned a couple of times throughout this post, reduces the workload on the server. Besides that, a positive side effect could be that you gain more insight into what the clients are actually interested in (if, for example, an intermediary cache is run by you).
Is it wrong if we nest the teacher information in the courses?
It is not wrong, but it might not be ideal, as explained above.
Should the listing of items e.g. GET /courses return a list of references or a list of items?
It depends. There is no right or wrong.
As REST is just a generalization of the interaction model used on the Web, basically the same concepts apply to REST as well. Depending on the size of the "item", it might be beneficial to return a short summary of the item's content and add a link to the item. Similar things are done on the Web as well. For a list of students enrolled in a course this might be the name and matriculation number, plus a link via which further details of that student can be requested, accompanied by a link-relation name that gives the actual link some semantic context, which a client can use to decide whether invoking that URI makes sense or not.
Such link-relation names are either standardized by IANA, defined by common approaches such as Dublin Core or schema.org, or custom extensions as defined in RFC 8288 (Web Linking). For the above-mentioned list of students enrolled in a course you could, for example, make use of the about relation name to hint to a client that further information on the current item can be found by following the link. If you want to enable pagination, the relations first, next, prev and last can and probably should be used as well, and so forth.
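To make the link-relation idea a bit more concrete, a small sketch using Spring HATEOAS (assuming a recent version of it is available; StudentController, StudentSummary and Student are invented names):

import static org.springframework.hateoas.server.mvc.WebMvcLinkBuilder.*;

import org.springframework.hateoas.EntityModel;
import org.springframework.hateoas.IanaLinkRelations;

public class StudentSummaryAssembler {

    // A short summary of the student plus an "about" link pointing to the full details.
    public EntityModel<StudentSummary> toSummary(Student student) {
        StudentSummary summary =
                new StudentSummary(student.getName(), student.getMatriculationNumber());
        return EntityModel.of(summary,
                linkTo(methodOn(StudentController.class).getStudent(student.getId()))
                        .withRel(IanaLinkRelations.ABOUT));
    }
}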
This is actually what HATEOAS is all about. Linking data together and giving them meaningful relation names to span a kind of semantic net between resources. By simply embedding things into a response such semantic graphs might be harder to build and maintain.
In the end it basically boils down to an implementation choice whether you want to embed or reference resources. I hope I could shed some light on the usefulness of caching and the benefits it can yield, especially in large-scale systems, as well as on the benefit of providing link-relation names for URIs, which enhance the semantic context of the relations used within your API.

Spring Batch: dynamic composite reader/processor/writer

I've seen this (2010) and this (SO, 2012), but still have not got the answer I need...
Is there an option in Spring Batch to have a dynamic composite reader/processor/writer?
The idea is to have the ability to replace a processor at runtime, and in the case of multiple processors (AKA a composite processor), to have the option to add/remove/replace/change the order of processors. As mentioned, the same goes for readers/writers.
I thought of something like reading the processor list from the DB (using a cache?), where the items (bean names) can be changed. Does this make sense?
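A rough sketch of that idea (the DelegateRegistry that returns the currently configured delegates, e.g. looked up from a DB table of bean names with caching, is a made-up abstraction; only ItemProcessor itself is Spring Batch API):

import org.springframework.batch.item.ItemProcessor;

public class DynamicCompositeItemProcessor<T> implements ItemProcessor<T, T> {

    private final DelegateRegistry<T> registry;   // hypothetical, DB/cache-backed

    public DynamicCompositeItemProcessor(DelegateRegistry<T> registry) {
        this.registry = registry;
    }

    @Override
    public T process(T item) throws Exception {
        T current = item;
        // Re-evaluated per item, so order and membership can change at runtime.
        for (ItemProcessor<T, T> delegate : registry.currentProcessors()) {
            current = delegate.process(current);
            if (current == null) {
                return null;   // Spring Batch convention: null filters the item out
            }
        }
        return current;
    }
}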
EDIT - why do I need this?
There are cases where I use processors as "filters", and it may happen that the business (the client) changes the requirements (yes, it is very annoying) and asks to switch among filters (change the priority).
Another use case is having multiple readers to get the data from different data warehouses, and again, the client changes the warehouse from time to time (integration phase), and I do not want my app to be restarted each and every time. There are many other use cases, of course, plus this one.
Thanks
I've started working on this project:
https://github.com/OhadR/spring-batch-dynamic-composite
that implements the requirements in the question above. If someone wants to contribute, feel free!

Migrating to Bounded Context

I currently have a Web API project which has all the system processing in the same solution. I'm breaking this out into separate solutions so that they can be run independently (e.g. as an Azure WebJob), as I don't want to have to redeploy the Web project if something in the backend has changed.
My issue with this is that even though I have separated the logic, the pieces are tied together by a single context, so that if I make a change in one I will have to redeploy all of them, as the migrations won't match up.
So that's why I've been looking at Bounded Context and DDD. I'm looking at how to break this up but having trouble understanding how relationships work.
A lot of the site is administrative (i.e. creating entities, no actual processing), so I was going to split contexts around this, e.g.:
A user adds and maintains currency conversion rates (this is two entities in total).
A user adds and maintains details on how to process payments (note that this is not processing payments; it only holds information about PayPal account details etc.).
So I was splitting the contexts up by this; does this sound reasonable to start with (there are a lot more like this, such as tax bands, charge structures etc.)?
If this is the way to go, how do I handle relationships between those two contexts? As an example:
A payment method requires a link to an 'active' currency conversion. I understand I can just have this as an Id, but I need to check its state, so I need access to the model.
A currency conversion can only be set to 'Inactive' if there are no payment methods currently using it. Again this needs access to the other model.
So logically the models need access to each other; how would this be included in the context? Can I add navigation properties to a model in a different context? Or should I add it as a separate DbSet and possibly map it using a view?
Thanks
So I was splitting the contexts up by this; does this sound reasonable to start with (there are a lot more like this, such as tax bands, charge structures etc.)?
"So that they can be run and deployed independently" may not be a sufficient heuristic to tell when you should split Bounded Contexts. This addresses one aspect of the solution space, but if you haven't looked well enough at the problem space, you'll suffer from a misalignment between BC's and subdomains that can cause a lot of friction. You might end up always deploying a cluster of seemingly unrelated "independently deployable units" together because you didn't realize they talk about the same thing.
Identifying subdomains is the product of distilling your business - separating the big functional areas and defining which parts are your core domain and which are ancillary activities. Each subdomain has its own specific semantics (Ubiquitous Language). In your case, as has been pointed out in the comments, Currency Conversion and Payment Methods might well be part of the same subdomain (Payment?). It does not automatically mean that they should also be in the same BC, but it might be a good idea to keep subdomains aligned 1-to-1 with BC's, as additional BC's come at a cost.
Back to deployability, even if it can be one beneficial effect of Bounded Contexts, they are not always so easily translatable in terms of independent units of deployment. Context mapping patterns (Shared Kernel, Customer Supplier, etc.) and BC communication in general can lead to a model, and therefore a part of a codebase, being shared by multiple BC's. Code and API synchronization issues arise that can question a simplistic "deployable free electron" view.
Just because you're using the Bounded Context approach doesn't mean you have to use DDD's tactical patterns (Aggregate Root, invariants, etc.) inside each BC. Using them should be an educated decision to trade solution space complexity off for problem space manageability. If "Currency Conversion can only be set to inactive..." is the only rule pertaining to payment method and currency management in your business, it might not be worth the bother to give that Bounded Context a full-fledged rich domain model. CRUD could be better suited there.