How to retrieve data from another bounded context in ddd?

Initially, the app runs on the desktop; in the future, however, it will also run on the web.
There are several bounded contexts in the app, and some of them need to retrieve data from others. I don't know which approach to use in this case.
I thought of using the mediator pattern: bounded context "A" requests data "X", and the mediator calls another bounded context, say "B", to get the correct data "X". Finally, the mediator returns data "X" to BC "A".
This scenario will change when the app runs on the web; there, I've thought of having one microservice request data from another microservice, again using the mediator pattern.
Are both approaches reasonable, or is there a better solution?
Could anyone help me, please?
Thanks a lot!

If you're retrieving data from other bounded contexts through either DB or API calls, your architecture may degrade into the "death star" pattern, because those calls introduce unwanted coupling and knowledge into the client context.
A better approach might be to look at event-driven mechanisms such as webhooks or message queues as a way of emitting the data you want to share to subscribing context(s). This reduces coupling by replicating data across contexts, which makes each bounded context more independent.
This gives you the feeling of "Who cares if bounded context B is not available at the moment? Bounded contexts A and C have the data they need inside them, and I can resume syncing later, since my data-update events are recorded on my queue."
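To make the replication idea concrete, here is a minimal in-process sketch. The `InMemoryQueue` class, the `CustomerRenamed` event and its fields are all hypothetical stand-ins; a real system would use an actual broker, but the shape of the flow is the same: B records an event, and A updates its own copy whenever the event is delivered.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List


class InMemoryQueue:
    """Toy stand-in for a message queue: events are recorded, then delivered on demand."""

    def __init__(self) -> None:
        self._pending: Deque[dict] = deque()
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: dict) -> None:
        # Recording is decoupled from delivery, so the publisher never waits on subscribers.
        self._pending.append(event)

    def deliver_pending(self) -> None:
        while self._pending:
            event = self._pending.popleft()
            for handler in self._subscribers[event["type"]]:
                handler(event)


class ContextA:
    """Downstream context that keeps its own replica of the data it needs from B."""

    def __init__(self, queue: InMemoryQueue) -> None:
        self.customer_names: Dict[int, str] = {}
        queue.subscribe("CustomerRenamed", self._on_customer_renamed)

    def _on_customer_renamed(self, event: dict) -> None:
        self.customer_names[event["customer_id"]] = event["name"]


queue = InMemoryQueue()
context_a = ContextA(queue)

# Context B emits a data-update event; A does not call B at any point.
queue.publish({"type": "CustomerRenamed", "customer_id": 7, "name": "Alice"})
names_before_delivery = dict(context_a.customer_names)  # still empty: A lags behind B
queue.deliver_pending()
print(names_before_delivery, context_a.customer_names)
```

The gap between `publish` and `deliver_pending` is the eventual-consistency window: A keeps working with the data it already has and catches up once delivery resumes.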

The answer to this question breaks down into two distinct areas:
the logical challenge of communicating between different contexts, where the same data could be used in very different ways. How does one context interpret the meaning of the data?
and the technical challenge of synchronizing data between independent systems. How do we guarantee the correctness of each system's behavior when they both have independent copies of the "same" data?
Logically, a context map is used to define the relationship between any bounded contexts that need to communicate (share data) in any way. The domain models that control the data are only applicable within a single bounded context, so some method for interpreting data from another context is needed. That's where the patterns from Evans's book come into play: customer/supplier, conformist, published language, open host, anti-corruption layer, or (the cop-out pattern) separate ways.
Using a mediator between services can be thought of as an implementation of the anti-corruption layer pattern: the services don't need to speak the same language, because there's an independent buffer between them doing the translation. In a microservice architecture, this could be some kind of integration service between two very different contexts.
From a technical perspective, direct API calls between services in different bounded contexts introduce dependencies between those services, so an event-driven approach like the one Allan mentioned is preferred, assuming your application is okay with the implications of that (eventual consistency of the data). Picking a messaging platform that gives you the guarantees necessary to keep the data in sync is important. Most asynchronous messaging protocols guarantee "at least once" delivery, but ordering of messages and de-duplication of repeats is up to the application.
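As a rough illustration of what "up to the application" can mean, the sketch below de-duplicates by message id and discards out-of-order messages by comparing a per-entity version. The `Message` fields (`message_id`, `version`) are assumptions; your platform may expose different metadata, and dropping stale messages is only one possible ordering policy.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class Message:
    message_id: str   # unique per message, used for de-duplication
    entity_id: str
    version: int      # increases with every change to the entity
    payload: str


@dataclass
class IdempotentConsumer:
    seen_ids: Set[str] = field(default_factory=set)
    versions: Dict[str, int] = field(default_factory=dict)
    state: Dict[str, str] = field(default_factory=dict)

    def handle(self, msg: Message) -> bool:
        """Apply msg at most once; return False if it was a duplicate or stale."""
        if msg.message_id in self.seen_ids:
            return False  # redelivery of a message that was already processed
        self.seen_ids.add(msg.message_id)
        if msg.version <= self.versions.get(msg.entity_id, 0):
            return False  # out-of-order message, older than what we already hold
        self.versions[msg.entity_id] = msg.version
        self.state[msg.entity_id] = msg.payload
        return True


consumer = IdempotentConsumer()
results = [
    consumer.handle(Message("m2", "customer-1", 2, "Alice")),
    consumer.handle(Message("m1", "customer-1", 1, "Bob")),    # arrives late
    consumer.handle(Message("m2", "customer-1", 2, "Alice")),  # delivered twice
]
print(results, consumer.state)
```

In a real consumer the set of seen ids and the version map would need to be persisted together with the state, otherwise a restart loses the protection.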
Sometimes it's simpler to use a synchronous API call, especially if you find yourself doing a lot of request/response type messaging (which can happen if you have services sending command-type messages to each other).
A composite UI is another pattern that allows you to do data integration in the presentation layer, by having each component pull data from the relevant service, then share/combine the data in the UI itself. This can be easier to manage than a tangled web of cross-service API calls in the backend, especially if you use something like an IT/Ops service, NGINX, or MuleSoft's Experience API approach to implement a "backend-for-frontend".

What you need is a DDD pattern for integration. BC "B" is upstream, "A" is downstream. You could go for an open host service with a published language (OHS/PL) upstream, and an anti-corruption layer (ACL) downstream. In practice this is a REST API upstream and an adapter downstream. Every time A needs data from B, the adapter calls the REST API and adapts the returned info to A's domain model. This would be synchronous. If you want an asynchronous integration, B would publish events with the info to a message queue, and A would listen for those events to get the info.
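A minimal sketch of the synchronous variant might look like the following. The `Supplier` model, the JSON field names, and `fake_b_api` are made up for illustration; in a real system `fetch` would perform an HTTP GET against B's REST API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Supplier:
    """How context A models the concept in its own domain language."""
    supplier_id: str
    display_name: str


class SupplierAdapter:
    """Anti-corruption layer in A: calls B's API and translates the response."""

    def __init__(self, fetch: Callable[[str], Dict[str, str]]) -> None:
        # fetch stands in for an HTTP GET against B's REST API (the open host service).
        self._fetch = fetch

    def find_supplier(self, supplier_id: str) -> Supplier:
        dto = self._fetch(supplier_id)  # data in B's published language
        return Supplier(
            supplier_id=dto["id"],
            display_name=f'{dto["firstName"]} {dto["lastName"]}',
        )


def fake_b_api(supplier_id: str) -> Dict[str, str]:
    """Hypothetical response shape of B's API, hard-coded for the sketch."""
    return {"id": supplier_id, "firstName": "Ada", "lastName": "Lovelace"}


adapter = SupplierAdapter(fake_b_api)
supplier = adapter.find_supplier("s-42")
print(supplier)
```

Only the adapter knows B's field names, so a change in B's published language touches one class in A rather than A's domain model.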

I want to add a comment about analytics in DDD. There are several approaches for sending data to analytics.
1) If you have a big enterprise application and need to collect a lot of statistics from all bounded contexts, it is better to move analytics into a separate service and use a message queue to send data there.
2) If you have a simple application, separate analytics from your app into its own context and use events or commands to communicate with it.

Related

Designing event-based architecture for the customer service

Although I am a developer with solid experience, I am only just entering the world of microservices and event-driven architecture. Things like loose coupling, independent scalability and proper implementation of asynchronous business processes are things that I feel should become simpler compared with the traditional monolith approach. So I am giving it a try and making a simple PoC for myself.
I am considering making a simple application where a user can register, log in and change their customer details. However, I want to react to certain events asynchronously:
customer logs in - we send them an email if the IP address used is new to the system.
customer changes their name - we send them an email notifying them of the change.
The idea is to make a separate application that reacts to the "CustomerLoggedIn" and "CustomerChangeName" events.
Here I can think of three approaches to implementing this simple functionality, each of them with some drawbacks. So, when a customer submits their name change:
1) Store the changed name in the DB and send an event to Kafka when the DB transaction is completed. One of the big problems that arises here is that if a customer has two tabs open and almost simultaneously submits a change from the initial name "Bob" to "Alice" in one tab and from "Bob" to "Jim" in the other, then at the database level one of the updates overwrites the other, which is OK; however, we cannot guarantee that the events are emitted in the same order. We can use some checks to ensure that the DB update is only done when "the last version" has been seen, thus preventing the second update entirely, so only one event will be emitted. But in the general case, this pattern will not allow us to preserve the same order of events in the DB as in Kafka, unless we do the DB change and the Kafka send in one distributed transaction, which is an anti-pattern AFAIK.
2) Change the name in the DB, and use Debezium or a similar DB CDC tool to capture the event and stream it. Here we get a single event source, so the ordering problem is solved; what bothers me, however, is that I lose the ability to enrich the events with business information. Another related drawback is that CDC will stream all the updates to the "customer" table regardless of the business meaning of the event. So, in this case, I will probably need to build a Kafka Streams application to convert the DB CDC events to business events and decouple the DB structure from the event structure. The potential benefit of this approach is that I will be able to capture "direct" DB changes in the same manner as those originating in the application.
3) Emit the event from the application, without storing it in the DB. One of the subscribers might do the DB persistence, another will do the email sending, etc. The biggest problem I see here is: what do I return to the client? I cannot say "OK, your name is changed"; it's more like "OK, your request has been recorded and will be processed". If the customer quickly hits refresh, they expect to see their new name, as we don't want to explain to customers what eventual consistency is, do we? Also, the order in which "email sender" and "db updater" process the same event is not guaranteed, so I can send an email before the change is persisted.
I am looking for advice regarding any of these three approaches (and maybe some others I am missing), and maybe the use cases where one is preferable to the others.

It sounds to me like you want event sourcing. In event sourcing, all you need to store is the event: the current state of a customer is derived from replaying the events (either from the beginning of time, or since a snapshot: the snapshot is just an optional optimization). Some other process (there are a few ways to go about this) can then project the events to Kafka for consumption by interested parties. Since every event has a sequence number, you can use the sequence number to prevent concurrent modification (alternatively, the more actor modely event-sourcing implementations can use techniques like cluster sharding in Akka to achieve the same ends).
Doing this, you can have a "write-side" which processes the updates in a strongly consistent manner and can respond to queries which only involve a single customer having seen every update to that point (the consistency boundary basically makes customer in this case an aggregate in domain-driven-design terms). "Read-sides" consuming events are eventually consistent: the latencies are typically fairly short: in this case your services sending emails are read-sides (as would be a hypothetical panel showing names of all customers), but the customer's view of their own data could be served by the write-side.
(The separation into read-sides and write-side (the pluralization is significant) is Command Query Responsibility Segregation, which sometimes gets interpreted as "reads can only be served by a read-side". This is not totally accurate: for one thing a write-side's model needs to be read in order for the write-side to perform its task of validating commands and synchronizing updates, so nearly any CQRS-using project violates that interpretation. CQRS should instead be interpreted as "serve reads from the model that makes the most sense and avoid overcomplicating a model (including that model in the write-side) to support a new read".)
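To illustrate the sequence-number idea with the "Bob" to "Alice"/"Jim" race from the question, here is a toy in-memory sketch. `EventStore`, `NameChanged` and `change_name` are hypothetical names; a real event store would enforce the sequence check atomically in its storage layer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NameChanged:
    customer_id: str
    sequence: int
    new_name: str


class ConcurrentModification(Exception):
    pass


class EventStore:
    def __init__(self) -> None:
        self._streams: Dict[str, List[NameChanged]] = {}

    def load(self, customer_id: str) -> List[NameChanged]:
        return list(self._streams.get(customer_id, []))

    def append(self, event: NameChanged) -> None:
        stream = self._streams.setdefault(event.customer_id, [])
        # The new sequence number must directly follow the last stored one.
        if event.sequence != len(stream) + 1:
            raise ConcurrentModification(event.customer_id)
        stream.append(event)


def current_name(events: List[NameChanged], initial: str) -> str:
    """Derive the current state by replaying the events in order."""
    name = initial
    for event in events:
        name = event.new_name
    return name


def change_name(store: EventStore, customer_id: str, seen_sequence: int, new_name: str) -> None:
    # seen_sequence is the last sequence number the caller had observed.
    store.append(NameChanged(customer_id, seen_sequence + 1, new_name))


store = EventStore()
# Both tabs loaded the customer when no events existed yet (sequence 0).
change_name(store, "c1", 0, "Alice")
try:
    change_name(store, "c1", 0, "Jim")
    conflict = False
except ConcurrentModification:
    conflict = True
print(current_name(store.load("c1"), "Bob"), conflict)
```

The second write is rejected rather than silently reordered, so the stored stream and whatever is later projected to Kafka agree on a single order.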

I think I qualify to answer this, having used Debezium extensively to simplify architecture.
I would prefer Option 2:
Every transaction always results in an event emitted in the correct order.
Options 1 and 3 have a corner case: what if the transaction succeeds, but the application fails to emit the event?
To your point:
Another related drawback is that CDC will stream all the updates in
the "customer" table regardless of the business meaning of the event.
So, in this case, i will probably need to build a Kafka Streams
application to convert the DB CDC events to business events and
decouple the DB structure from event structure.
I really don't think that is a roadblock. The benefit you get is that other use cases may crop up where another consumer of this topic wants to read other columns of the table.
Options 1 and 3 only tie this to your core application logic, and that does you no favors from a simplification point of view. With Option 2, with zero code changes to the core application APIs, a developer can work on the events independently, with no need to understand that core logic.
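For what it's worth, the translation from row-level changes to business events can be quite small. The sketch below works on a simplified change record with `op`, `before` and `after` fields, modeled on the envelope Debezium emits; the column names (`id`, `name`, `email`) are assumptions, and it presumes the connector is configured so that `before` is populated on updates.

```python
from typing import Optional


def to_business_event(change: dict) -> Optional[dict]:
    """Translate a row-level change on the customer table into a business event, or None."""
    if change.get("op") != "u":
        return None  # only updates are of interest for this event
    before, after = change["before"], change["after"]
    if before["name"] == after["name"]:
        return None  # some other column changed; no business meaning here
    return {
        "type": "CustomerChangeName",
        "customer_id": after["id"],
        "old_name": before["name"],
        "new_name": after["name"],
    }


rename = {
    "op": "u",
    "before": {"id": 1, "name": "Bob", "email": "bob@example.com"},
    "after": {"id": 1, "name": "Alice", "email": "bob@example.com"},
}
email_only = {
    "op": "u",
    "before": {"id": 1, "name": "Alice", "email": "bob@example.com"},
    "after": {"id": 1, "name": "Alice", "email": "alice@example.com"},
}
print(to_business_event(rename))
print(to_business_event(email_only))
```

A function like this is the only place that knows the table layout, which is how the event structure stays decoupled from the DB structure.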

hexagonal architecture and transactions concept

I'm trying to get used to hexagonal architecture and can't work out how to implement common practical problems that I have already solved with other approaches. I think my core problem is understanding the level of responsibility extracted into adapters and ports.
The articles on the web are fine for primitive examples like:
we have RepositoryInterface which can be implemented in
mysql/txt/s3/nosql storage
or
we have NotificationSendingInterface and have email/sms/web push implementations
but those are very refined examples that simply separate an interface from its implementation details.
In practice, however, when coding a service in the domain model we usually know the guarantees of the interface and its implementation in more depth.
For illustration, I decided to ask about the storage + transaction pair.
How should the transaction concept for storage be implemented in hexagonal architecture?
Assume we have a simple CRUD service interface at the domain level:
StorageRepoInterface
save(...)
update(...)
delete(...)
get(...)
and we want some kind of transaction guarantee while working with those methods, e.g. delete + save in one transaction.
How should it be designed and implemented according to the hexagonal concept?
Should it be implemented with some external coordination interface such as TransactionalOperation? If yes, then in general TransactionalOperation must know how to implement the transaction guarantee for every implementation of StorageRepoInterface (maybe through an additional transaction-oriented operation interface).
If not, then it seems there should be explicit transaction guarantees from StorageRepoInterface at the domain level (inside the hexagon), with additional methods?
Either way, it does not look as "isolated and interface-based" as stated.
Can someone point me to how to change my mindset for such situations, or where to read about it?
Thanks in advance.

In hexagonal architecture, driver ports are the API of the application, the use case boundary. Use cases are transactional, so you have to control transactionality at the driver port methods. You enclose every method in a transaction.
If you use Spring you could use declarative transactions (the @Transactional annotation).
Another way is to explicitly open a DB transaction before the execution of the method, and to close it (commit / rollback) after the method.
A useful pattern for applying transactionality is the command bus, wrapped with a decorator which encloses each command in a transaction.
Transactions are infrastructure, so you should have a driven port and an adapter implementing the port.
The implementation must use the same DB context (entity manager) used by the persistence adapters (repositories).
Vaughn Vernon talks about this topic in the "Managing transactions" section (pages 432-437) of his book "Implementing DDD".
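As a language-neutral illustration of the command bus decorator idea (this is not Vernon's code, just a sketch), the following Python snippet wraps a plain bus so that every dispatched command runs between `begin` and `commit`, with a `rollback` on failure. All class and method names are hypothetical, and `RecordingTransaction` is a fake adapter that only records calls; a real adapter would delegate to the DB session shared with the repositories.

```python
from typing import Callable, Dict, List, Protocol


class TransactionPort(Protocol):
    """Driven port for transaction control; implemented by an infrastructure adapter."""
    def begin(self) -> None: ...
    def commit(self) -> None: ...
    def rollback(self) -> None: ...


class CommandBus:
    def __init__(self) -> None:
        self._handlers: Dict[type, Callable[[object], None]] = {}

    def register(self, command_type: type, handler: Callable[[object], None]) -> None:
        self._handlers[command_type] = handler

    def dispatch(self, command: object) -> None:
        self._handlers[type(command)](command)


class TransactionalCommandBus:
    """Decorator: same dispatch interface, with each command run inside a transaction."""

    def __init__(self, inner: CommandBus, tx: TransactionPort) -> None:
        self._inner = inner
        self._tx = tx

    def dispatch(self, command: object) -> None:
        self._tx.begin()
        try:
            self._inner.dispatch(command)
        except Exception:
            self._tx.rollback()
            raise
        self._tx.commit()


class RecordingTransaction:
    """Fake adapter that records calls instead of talking to a database."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def begin(self) -> None:
        self.calls.append("begin")

    def commit(self) -> None:
        self.calls.append("commit")

    def rollback(self) -> None:
        self.calls.append("rollback")


class RenameCustomer:
    pass


class FailingCommand:
    pass


def handle_failing(command: object) -> None:
    raise ValueError("simulated failure inside the use case")


tx = RecordingTransaction()
plain_bus = CommandBus()
plain_bus.register(RenameCustomer, lambda command: None)
plain_bus.register(FailingCommand, handle_failing)
bus = TransactionalCommandBus(plain_bus, tx)

bus.dispatch(RenameCustomer())
try:
    bus.dispatch(FailingCommand())
except ValueError:
    pass
print(tx.calls)
```

The handlers themselves contain no transaction code, which keeps the use cases free of infrastructure concerns.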

Instead of using the command bus pattern, you could simply inject a TransactionPort into your command handler (the port being defined at the domain level).
The TransactionPort would have two methods (start and commit).
The TransactionAdapter would be your custom implementation (defined at the infrastructure level).
Then you could do something like:
this.transactionalPort.start();
// Do your stuff
this.transactionalPort.commit();

too many rest api calls in Microservices

Say there are two services,
service A and service B.
Service A needs data from service B to process a request. To avoid tight coupling, we make a REST API call to service B instead of directly querying service B's database.
Doesn't making an HTTP call to service B for every request increase the response time?
I have seen the alternative solution of caching the data at service A. I have the following questions.
What if the data is rapidly changing?
What if the data is critically important, such as user account balance details, and there has to be strong consistency?
What about data duplication and data consistency?
By introducing the REST call, aren't we introducing a point of failure? What if service B is down?
Also, as requests to service A for that particular API increase, the load on service B increases too.
Please help me with this.

These are many questions at once; let me try to give a few comments in random order:
If service A needs data from service B, then B is already a single point of failure, so the reliability question is just moved from B's database to B's API endpoint. It's very unlikely that this makes a big difference.
A similar argument goes for latency: a good API layer including caching might even decrease average latency.
Once more, the same goes for load: the data dependency of A on B already includes the load on B's database. And again, a good API layer with caching might even help with the load.
So while the decoupling (from tight to loose) brings a lot of advantages, load and reliability are not necessarily on the list of disadvantages.
A few words about caching:
Read caching can help a lot with load. Typically, a request from A to B should indicate the version of the requested entity that is available in A's cache (possibly none, of course). Endpoint B can then just check whether the entity has changed and, if not, stop all processing and return an "unchanged" message. B can keep the information about which entities have changed in the immediate past in a much smaller data store than the entities themselves, most likely in RAM or even in process, which speeds things up quite noticeably.
Such a mechanism is much easier to introduce in an API endpoint for B than in the database itself, so querying the API can scale much better than querying the DB.
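Here is a small sketch of that versioned read, with both services modeled as in-process classes. The names (`ServiceB`, `ServiceAClient`, the `"unchanged"` status string) are hypothetical; over HTTP the same idea is usually expressed with conditional requests.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    version: int
    data: str


class ServiceB:
    def __init__(self) -> None:
        self._entities: Dict[str, Entity] = {}
        self._versions: Dict[str, int] = {}  # small index, cheap to keep in memory
        self.full_loads = 0                  # counts the expensive reads

    def put(self, key: str, data: str) -> None:
        version = self._versions.get(key, 0) + 1
        self._versions[key] = version
        self._entities[key] = Entity(version, data)

    def get(self, key: str, cached_version: Optional[int]) -> Tuple[str, Optional[Entity]]:
        if cached_version is not None and self._versions.get(key) == cached_version:
            return "unchanged", None  # answered from the version index alone
        self.full_loads += 1
        return "ok", self._entities[key]


class ServiceAClient:
    def __init__(self, service_b: ServiceB) -> None:
        self._b = service_b
        self._cache: Dict[str, Entity] = {}

    def fetch(self, key: str) -> str:
        cached = self._cache.get(key)
        status, entity = self._b.get(key, cached.version if cached else None)
        if status == "ok" and entity is not None:
            self._cache[key] = entity
            return entity.data
        # "unchanged" is only returned when a cached version was sent.
        return cached.data


b = ServiceB()
b.put("balance:1", "100")
a = ServiceAClient(b)
first = a.fetch("balance:1")
second = a.fetch("balance:1")   # served from A's cache after a cheap version check
b.put("balance:1", "80")
third = a.fetch("balance:1")    # version changed, so the entity is loaded again
print(first, second, third, b.full_loads)
```

Note that A still asks B on every read, so the data is never stale; what is saved is the expensive part of answering.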

I guess the first question you should ask yourself is whether A and B are really two different services - what's the reason for partitioning them in the first place? After all, they seem to be coupled both temporally and by data.
One of the reasons to separate a service into two executables might be that they can change independently or serve different access paths, in which case you may want to consider them different aspects of the same service. This may seem like a distinction without a difference, but it is important when looking at the whole picture, at which parts of the system can know about the internal structures of others, and at defending the system against deteriorating into a big ball of mud where every "service" can access any other "service's" data and they all depend on each other.
If these two components are indeed different services, you may also consider moving to a model where service B actively publishes data changes. This way service A can cache the relevant parts of B's data. B is still the source of truth, and A is decoupled from B's availability (depending on the expiration of the data).

Defining gRPC RPCs

I'm looking for some suggestions here. The use case is a networking device (like a router) with networking operations performed over gRPC.
Let's say there are "n" model objects, like router, interfaces, and routing configuration objects like OSPF etc. Every networking operation will ultimately be a CRUD operation on one or many of the model objects.
Now, when defining this as a gRPC service, there seem to be two options:
1) Define generic gRPC RPCs, like "SET" and "GET". The parameter will be a list of objects and operations, like SET((router, update), (interface, update)..
2) Define very specific RPCs, like "setInterfaceProperty_x", "createOSPFInstance".. And there could be very many such RPCs.
With #2, we are building the application intelligence into the RPCs themselves. Every new feature might need new RPCs from this service.
With #1, the RPCs are the means, but the intelligence resides with the application which uses the RPCs in a context. The RPC list will be very short and won't change over time.
What is the preferred approach? Generic RPCs (kept to very few) or tens (or more) of operation-driven RPCs? I see some open-source projects like P4Runtime take approach #1.
Thanks for your time. I can provide more information if required.

You should use option #2. This puts your interface contract in the proto, rather than in your application. By picking option #2 you leave yourself many open doors that would be cumbersome or unsupportable otherwise:
If the API definition of an object doesn't match the internal representation, you need to define a mapping between the two. Suppose you update your internal code to not need InterfaceProperty any more, and it is instead moved to a new field called BetterInterfaceProperties. Option #1 would force you to keep the old field exposed, while option #2 would allow you to reinterpret the call and do the right thing.
Fine-grained access controls are easier with specific methods. All users may be able to set publicProperty, but only admins can set dangerousProperty. When all the fields are grouped into a single call (as in #1), your caller has to reinterpret error messages, while with option #2 it's clearer why authorization failed.
Smaller return values. A method like getSpecificProperty will do much less work than getFullObject. As your data model gets more complex, you will have to include more and more data in return messages. Even if the caller only cares about one thing, they have to wait for all of it. Consider a database application: the database might have to do several unnecessary queries to fill in fields the client will never read.
There are reasons to use #1, but they aren't that valuable until you identify which properties go together and are logically a single RPC (such as a Get).
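To illustrate the access-control point only, here is a tiny sketch of a per-method authorization table in plain Python, without any gRPC library. The RPC names and roles are invented; in a real server a check like this would typically run in an interceptor before the handler.

```python
from typing import Dict, Set

# Hypothetical mapping from RPC method name to the roles allowed to call it.
REQUIRED_ROLES: Dict[str, Set[str]] = {
    "SetPublicProperty": {"user", "admin"},
    "SetDangerousProperty": {"admin"},
}


class PermissionDenied(Exception):
    pass


def authorize(method: str, role: str) -> None:
    """Reject the call before any handler code runs, naming the RPC that was refused."""
    if role not in REQUIRED_ROLES.get(method, set()):
        raise PermissionDenied(f"role {role!r} may not call {method}")


authorize("SetPublicProperty", "user")  # allowed, returns silently
try:
    authorize("SetDangerousProperty", "user")
    denied_message = ""
except PermissionDenied as error:
    denied_message = str(error)
print(denied_message)
```

With a single generic SET call, the same check would have to inspect every field in the request and report which ones were refused, which is where the error messages become harder to interpret.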

CQRS + Event Sourcing: (is it correct that) Commands are generally communicated point-to-point, while Domain Events are communicated through pub/sub?

Didn't know how to shorten that title.
I'm basically trying to wrap my head around the concept of CQRS (http://en.wikipedia.org/wiki/Command-query_separation) and related concepts.
Although CQRS doesn't necessarily incorporate Messaging and Event Sourcing it seems to be a good combination (as can be seen with a lot of examples / blogposts combining these concepts )
Given a use case for a state change for something (say, updating a Question on SO), would you consider the following flow to be correct (as in best practice)?
The system issues an aggregate UpdateQuestionCommand, which might be separated into a couple of smaller commands: UpdateQuestion, which is targeted at the Question Aggregate Root, and UpdateUserAction (to count points, etc.), targeted at the User Aggregate Root. These are sent asynchronously using point-to-point messaging.
The aggregate roots do their thing and, if all goes well, fire the events QuestionUpdated and UserActionUpdated respectively, which contain state that is handed off to an Event Store to be persisted (mentioned just to be complete; not really the point here).
These events are also put on a pub/sub queue for broadcasting. Any subscribers (among which are likely one or more Projectors that create the Read Views) are free to subscribe to these events.
The general question: is it indeed best practice that Commands are communicated point-to-point (i.e. the receiver is known), whereas events are broadcast (i.e. the receiver(s) are unknown)?
Assuming the above, what would be the advantage/disadvantage of allowing Commands to be broadcast through pub/sub instead of point-to-point?
For example: broadcasting Commands while using Sagas (http://blog.jonathanoliver.com/2010/09/cqrs-sagas-with-event-sourcing-part-i-of-ii/) could be a problem, since the mediation role a Saga needs to play when one of the aggregate roots fails is hindered, because the Saga doesn't know which aggregate roots participate to begin with.
On the other hand, I see advantages (flexibility) when broadcasting commands would be allowed.
Any help in clearing my head is highly appreciated.

Yes. For a Command or Query there is exactly one receiver (though you can still load balance across instances of it), but for Events there can be zero or more receivers (subscribers).
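A toy in-process bus makes the difference visible. `MessageBus` and its method names are hypothetical; the event and command names are the ones from the question.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[dict], None]


class MessageBus:
    def __init__(self) -> None:
        self._command_handlers: Dict[str, Handler] = {}
        self._event_subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def register_command_handler(self, name: str, handler: Handler) -> None:
        # Point-to-point: a second receiver for the same command is rejected.
        if name in self._command_handlers:
            raise ValueError(f"command {name} already has a handler")
        self._command_handlers[name] = handler

    def subscribe(self, name: str, handler: Handler) -> None:
        self._event_subscribers[name].append(handler)

    def send(self, name: str, payload: dict) -> None:
        # A command without a receiver is an error (KeyError): someone must act on it.
        self._command_handlers[name](payload)

    def publish(self, name: str, payload: dict) -> int:
        # An event with no subscribers is fine; return how many were notified.
        handlers = self._event_subscribers.get(name, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = MessageBus()
log: List[str] = []


def handle_update_question(command: dict) -> None:
    # The single receiver does its work, then announces what happened.
    bus.publish("QuestionUpdated", command)


bus.register_command_handler("UpdateQuestion", handle_update_question)
bus.subscribe("QuestionUpdated", lambda event: log.append("projector"))
bus.subscribe("QuestionUpdated", lambda event: log.append("notifier"))

bus.send("UpdateQuestion", {"question_id": 1})
unheard = bus.publish("UserActionUpdated", {})  # nobody subscribed, and that is fine
print(log, unheard)
```

The sender of a command depends on the outcome, so a missing receiver is a failure; the publisher of an event only states a fact and does not care who listens.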
