Service Broker with SqlNotificationRequest - triggers

I am in the process of evaluating Service Broker with SQL Notification for my project. My requirement is: a user places an order from System A, which updates the Order table. As soon as the order is placed, I need to notify System B. I have done a quick POC with a trigger, Service Broker, and SQL Notification in ADO.NET, and it works as I expected.
What I would like to ask the group:
A) What are the best practices I need to follow for this?
B) What are the disadvantages of the above approach, if any?
C) Are there any disadvantages to using triggers? If so, what are they for the above approach?
The Order table will receive roughly 1,000 to 1,500 orders from System A every day. I would also like to know how the above approach performs at that volume.

If what you're trying to do is simply push data from System A to System B, then as long as there are no clients connected to System B, you won't need SQL Notification.
Instead of using triggers, you may consider Change Tracking. Take a look at the "Real Time Data Integration..." article on the Service Broker Team Blog.
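If you do go the Change Tracking route, the System B side can be as simple as periodically asking SQL Server for everything that changed since the last version it processed. Below is a minimal polling sketch, shown in Java/JDBC purely for illustration (the question uses ADO.NET); the table name dbo.Orders, the OrderId column, and the connection string are placeholders, not taken from the question.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Minimal sketch: poll SQL Server Change Tracking for new/changed orders.
// Table and column names (dbo.Orders, OrderId) and the connection string are
// placeholders -- adapt them to your schema.
public class OrderChangePoller {
    public static void main(String[] args) throws Exception {
        long lastSyncVersion = 0; // persist this between runs in practice

        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=SystemA;integratedSecurity=true")) {

            String sql =
                "SELECT ct.OrderId, ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_VERSION " +
                "FROM CHANGETABLE(CHANGES dbo.Orders, ?) AS ct";

            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setLong(1, lastSyncVersion);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Notify System B about the changed order here.
                        System.out.printf("Order %d changed (%s), version %d%n",
                                rs.getLong("OrderId"),
                                rs.getString("SYS_CHANGE_OPERATION"),
                                rs.getLong("SYS_CHANGE_VERSION"));
                        lastSyncVersion = Math.max(lastSyncVersion,
                                rs.getLong("SYS_CHANGE_VERSION"));
                    }
                }
            }
        }
    }
}

At 1,000 to 1,500 orders per day, a polling interval of a few seconds keeps the load negligible while avoiding trigger overhead on the write path.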

Related

Streaming data Complex Event Processing over files and a rather long period

My challenge:
We receive files every day with about 200,000 records. We keep the files for approximately one year to support re-processing, etc.
For the sake of the discussion, assume it is some sort of long-lasting fulfilment process, with a provisioning-ID that correlates records.
We need to identify flexible patterns in these files and trigger events.
Typical patterns are:
If record A is followed by record B, which is followed by record C, and all records occurred within 60 days, then trigger an event.
If record D or record E was found, but record F did NOT follow within 30 days, then trigger an event.
If both record D and record E were found (irrespective of the order), followed by ... within 24 hours, then trigger an event.
Some patterns require lookups in a DB/NoSQL store or joins for additional information, either to select the record or to put into the event.
"Selecting a record" can be a simple "field-A equals", but can also be "field-A in []", "field-A match", or "func identify(field-A, field-B)".
"Days" might also be "hours" or "in previous month", hence more flexible than just "days". Usually we have some date/timestamp in the record. The maximum window is currently "within 6 months" (cancel within setup phase).
The created events (preferably JSON) need to contain data from all records that were part of the selection process.
We need an approach that allows us to flexibly change (add, modify, delete) the patterns, optionally re-processing the input files.
Any thoughts on how to tackle the problem elegantly? Maybe some Python or Java framework, or do any of the public cloud solutions (AWS, GCP, Azure) address this problem space especially well?
Thanks a lot for your help.
After some discussions and reading, we will first try Apache Flink with the FlinkCEP library. From the docs and blog entries it seems able to do the job. It also seems to be AWS's choice, running on their EMR cluster. We didn't find any managed service on GCP or Azure providing this functionality; of course, we can always deploy and manage it ourselves. Unfortunately, we didn't find a Python framework.
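For reference, here is a rough sketch of what the first pattern ("A followed by B followed by C within 60 days") might look like with FlinkCEP. The FulfilmentRecord class, its fields, and the keying by provisioning-ID are illustrative assumptions, not part of the original question.

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.util.List;
import java.util.Map;

// Rough sketch of the "A followed by B followed by C within 60 days" pattern.
// FulfilmentRecord and its fields are made up for illustration.
public class FulfilmentPatterns {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<FulfilmentRecord> records = env.fromElements(
                new FulfilmentRecord("A", "p-1"),
                new FulfilmentRecord("B", "p-1"),
                new FulfilmentRecord("C", "p-1"));

        Pattern<FulfilmentRecord, ?> abcWithin60Days =
                Pattern.<FulfilmentRecord>begin("a")
                        .where(new SimpleCondition<FulfilmentRecord>() {
                            @Override
                            public boolean filter(FulfilmentRecord r) { return "A".equals(r.type); }
                        })
                        .followedBy("b")
                        .where(new SimpleCondition<FulfilmentRecord>() {
                            @Override
                            public boolean filter(FulfilmentRecord r) { return "B".equals(r.type); }
                        })
                        .followedBy("c")
                        .where(new SimpleCondition<FulfilmentRecord>() {
                            @Override
                            public boolean filter(FulfilmentRecord r) { return "C".equals(r.type); }
                        })
                        .within(Time.days(60));

        // Key by the provisioning-ID so the pattern is evaluated per fulfilment process.
        // Processing time is used here to keep the sketch simple; real runs over
        // historical files would assign event-time timestamps from the records.
        PatternStream<FulfilmentRecord> matches =
                CEP.pattern(records.keyBy(r -> r.provisioningId), abcWithin60Days)
                        .inProcessingTime();

        matches.select(new PatternSelectFunction<FulfilmentRecord, String>() {
            @Override
            public String select(Map<String, List<FulfilmentRecord>> match) {
                // Build the outgoing event (e.g. JSON) from all matched records here.
                return "event for " + match.get("a").get(0).provisioningId;
            }
        }).print();

        env.execute("fulfilment-patterns");
    }

    public static class FulfilmentRecord {
        public String type;
        public String provisioningId;
        public FulfilmentRecord() {}
        public FulfilmentRecord(String type, String provisioningId) {
            this.type = type;
            this.provisioningId = provisioningId;
        }
    }
}

The lookup-enrichment and "D or E but not F" style patterns map onto FlinkCEP's notFollowedBy and richer conditions, but the basic shape stays the same.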

DDD: How to solve this using Domain-Driven Design?

I'm new to DDD and cutting my teeth on the following exercise. The use case is real, but my attempt to solve it with DDD is purely for learning.
We have multiple Git repos, each containing a file that we call the product spec. The system needs to respond to an HTTP POST by cloning all the repos and then updating the product spec in those that match some information in the POST body. The system also needs to log the POST request as the cause of the product spec update.
I'd like to use Aggregates and event sourcing for solving this problem because they seem like a good fit. Event sourcing comes with automatic persistence of the commands, so if I convert the POST body to a command, I get auditing for free.
The problem is, the POST may match multiple product specs. I'm not sure how to deal with that. Should I create a domain service, let it find all the matching product specs, and then issue an update command for each? Or should I have the aggregate root do so? If I use an aggregate root to update multiple entities, it itself needs to be an entity, so what would it be in my problem domain?
The first comment on your question (the one from @VoiceOfUnreason) is right: this 'is mostly side effect coordination'.
But I will try to answer your question: How to solve this using DDD / Event Sourcing:
The first aggregate root could just be named 'MultipleRepoOperations'. This aggregate root has only one stream of events.
The command that fires off the whole process could be 'CloneAndUpdateProdSpecRepos', which carries a list of all the repos to be cloned and updated.
When the aggregate root processes the command, it will simply emit a bunch of events of type 'UserRequestedToCloneAndUpdateProdSpec'.
The second bounded context manages all the repos; it is subscribed to all the events from 'MultipleRepoOperations' and will receive each event emitted by it. This bounded context's aggregate root can be called 'GitRepoManagement', and it has a stream per repo, e.g. GitRepoManagement-Repo1, GitRepoManagement-Repo215, GitRepoManagement-20158, etc.
'GitRepoManagement' receives each event of type 'UserRequestedToCloneAndUpdateProdSpec', replays the corresponding repo stream to rehydrate the current state, and then tries to clone the repo and update its product spec. It emits a failed event when this fails, or a succeeded event if appropriate.
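A very rough Java sketch of that shape follows; all class and method names are illustrative, and the persistence and event-dispatch plumbing is omitted.

import java.util.ArrayList;
import java.util.List;

class CloneAndUpdateProdSpecRepos {
    final List<String> repoUrls;
    final String postBody; // kept so the POST request is logged as the cause
    CloneAndUpdateProdSpecRepos(List<String> repoUrls, String postBody) {
        this.repoUrls = repoUrls;
        this.postBody = postBody;
    }
}

class UserRequestedToCloneAndUpdateProdSpec {
    final String repoUrl;
    final String postBody;
    UserRequestedToCloneAndUpdateProdSpec(String repoUrl, String postBody) {
        this.repoUrl = repoUrl;
        this.postBody = postBody;
    }
}

// First aggregate root: a single stream that records what the user asked for.
class MultipleRepoOperations {
    List<UserRequestedToCloneAndUpdateProdSpec> handle(CloneAndUpdateProdSpecRepos cmd) {
        List<UserRequestedToCloneAndUpdateProdSpec> events = new ArrayList<>();
        for (String repoUrl : cmd.repoUrls) {
            events.add(new UserRequestedToCloneAndUpdateProdSpec(repoUrl, cmd.postBody));
        }
        return events; // appended to the MultipleRepoOperations stream
    }
}

// Second bounded context: one stream per repo (e.g. GitRepoManagement-Repo1).
class GitRepoManagement {
    void on(UserRequestedToCloneAndUpdateProdSpec event) {
        // 1. replay the GitRepoManagement-<repo> stream to rehydrate current state
        // 2. clone the repo and update the product spec
        // 3. append a succeeded or failed event to that repo's stream
    }
}

public class ProdSpecUpdateDemo {
    public static void main(String[] args) {
        MultipleRepoOperations operations = new MultipleRepoOperations();
        List<UserRequestedToCloneAndUpdateProdSpec> events = operations.handle(
                new CloneAndUpdateProdSpecRepos(List.of("repo-1", "repo-2"), "{...}"));
        GitRepoManagement repoManagement = new GitRepoManagement();
        events.forEach(repoManagement::on);
    }
}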
For learning purposes, try to choose a problem domain that has more complex rules and logic, where many actions are needed: for example, a small game (a card game, a multiplayer quiz game, or whatever), or simulate some real-world process like school management or some other business process.

CQRS and Passing Data

Suppose I have an aggregate containing some data, and when it reaches a certain state, I'd like to take all that state and pass it to some outside service. For argument and simplicity's sake, let's just say it is an aggregate that has a list, and when all items in that list are checked off, I'd like to send the entire state to some outside service. Now, when I'm handling the command for checking off the last item in the list, I'll know that I'm at the end, but it doesn't seem correct to send it to the outside system from within the processing of the command. So given this scenario, what is the recommended approach if the outside system requires all of the state of the aggregate? Should the outside system build its own copy of the data based on the aggregate events, or is there some better approach?
Should the outside system build its own copy of the data based on the aggregate events?
Probably not -- it's almost never a good idea to share the responsibility of rehydrating an aggregate from its history. The service that owns the object should be responsible for rehydration.
The first key idea to understand is when in the flow the call to the outside service should happen.
First, the domain model processes the command arguments, computing the update to the event history, including the ChecklistCompleted event.
The application takes that history and saves it to the book of record.
The transaction completes successfully.
At this point, the application knows that the operation was successful, but the caller doesn't. So the usual answer is to be thinking of an asynchronous operation that will do the rest of the work.
Possibility one: the application takes the history that it just saved, and uses that history to schedule a task that rehydrates a read-only copy of the aggregate state and then sends that state to the external service.
Possibility two: you ditch the copy of the history that you have now, and fire off an asynchronous task that has enough information to load its own copy of the history from the book of record.
There are at least three ways that you might do this. First, you could have the command schedule the task as before.
Second, you could have an event handler listening for ChecklistCompleted events in the book of record, and have that handler schedule the task.
Third, you could read the ChecklistCompleted event from the book of record, and publish a representation of that event to a shared bus, and let the handler in the external service call you back for a copy of the state.
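To make the second option concrete, here is a minimal Java sketch of a handler that reacts to a committed ChecklistCompleted event by scheduling the asynchronous work; the ChecklistHistoryReader, ChecklistReadModel, and ExternalService types are assumptions made up for illustration, not names from the question.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ChecklistCompleted {
    final String checklistId;
    ChecklistCompleted(String checklistId) { this.checklistId = checklistId; }
}

interface ChecklistHistoryReader {
    ChecklistReadModel load(String checklistId); // rehydrates a read-only copy from the book of record
}

class ChecklistReadModel { /* the full state the external service needs */ }

interface ExternalService {
    void send(ChecklistReadModel state);
}

class ChecklistCompletedHandler {
    private final ChecklistHistoryReader reader;
    private final ExternalService externalService;
    final ExecutorService worker = Executors.newSingleThreadExecutor();

    ChecklistCompletedHandler(ChecklistHistoryReader reader, ExternalService externalService) {
        this.reader = reader;
        this.externalService = externalService;
    }

    // Invoked only after the ChecklistCompleted event has been committed to the book of record.
    void on(ChecklistCompleted event) {
        worker.submit(() -> {
            ChecklistReadModel state = reader.load(event.checklistId);
            externalService.send(state);
        });
    }

    public static void main(String[] args) {
        ChecklistCompletedHandler handler = new ChecklistCompletedHandler(
                id -> new ChecklistReadModel(),
                state -> System.out.println("sent state to external service"));
        handler.on(new ChecklistCompleted("checklist-42"));
        handler.worker.shutdown(); // let the demo JVM exit after the task runs
    }
}

The important property is that the command handling and the call to the external service are decoupled: the handler only runs once the event is durable, so a failure in the external call never rolls back the domain change.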
I was under the impression that one bounded context should not reach out to get state from another bounded context but rather keep local copies of the data it needed.
From my experience, the key idea is that the services shouldn't block each other -- or more specifically, a call to service B should not block when service A is unavailable. Responding to events is fundamentally non blocking; does it really matter that we respond to an asynchronously delivered event by making an asynchronous blocking call?
What this buys you, however, is independent evolution of the two services - A broadcasts an event, B reacts to the event by calling A and asking for a representation of the aggregate that B understands, A -- being backwards compatible -- delivers the requested representation.
Compare this with requiring a new release of B every time the rehydration logic in A changes.
Udi Dahan raised a challenging idea - the notion that each piece of data belongs to a single technical authority. "Raw business data" should not be replicated between services.
A service is the technical authority for a specific business capability.
Any piece of data or rule must be owned by only one service.
So in Udi's approach, you'd start to investigate why B has any responsibility for data owned by A, and from there determine how to align that responsibility and the data into a single service. (Part of the trick: the physical view of a service can span process boundaries; in other words, a process may be composed from components that belong to more than one service).
Jeppe Cramon's series on microservices is nicely sourced, and touches on many of the points above.
You should never externalise your state. Reporting on that state is a function of the read side, as it produces reports and you'll need that data to call the service. The structure of your state is plastic, and you shouldn't have an external service that relies upon that structure; otherwise you'll have to update both in lockstep, which is a bad thing.
There is a blog post that puts forward a strong argument that the process manager is the correct place to put this type of feature (calling an external service), because that's the appropriate place for orchestrating events.

Can I use time as a globally unique event version?

I find time to be the best value for an event version.
I can merge perfectly independent events from different event sources on different servers whenever needed, without worrying about read-side event-order synchronization. I know which event (from server 1) happened before the other (from server 2) without needing a global sequential event ID generator, which would make all read sides depend on it.
As long as the time is a globally ever-increasing event version, different teams in a company can act as distributed event sources or event readers, and everyone can always rely on the contract.
The world's simplest notification from the write side to subscribed read sides, followed by a query pulling the recent changes from the underlying write side, can simplify everything.
Are there any side effects I'm not aware of ?
Time is indeed increasing and gives you a deterministic number; however, event versioning does not only serve the purpose of preventing conflicts. We always say that when we commit a new event to the event store, we send the new event version along with it, and it must match the expected version on the event store side, which must be the previous version plus exactly one. Whether there were a thousand or three million ticks between two events, I do not really care; that does not give me the information I need. But knowing whether I have missed an event along the way is critical. So I would not use anything other than an incremental counter, with events versioned per aggregate/stream.
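A minimal sketch of the expected-version check described above, assuming an in-memory store and gapless per-stream versions (all names are illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrencyException extends RuntimeException {
    ConcurrencyException(String msg) { super(msg); }
}

class InMemoryEventStore {
    private final ConcurrentHashMap<String, List<Object>> streams = new ConcurrentHashMap<>();

    // The append is only accepted when the caller's expected version equals the
    // stream's current version; the new event then becomes version expectedVersion + 1.
    synchronized void append(String streamId, long expectedVersion, Object event) {
        List<Object> stream = streams.computeIfAbsent(streamId, id -> new ArrayList<>());
        long currentVersion = stream.size(); // version = number of events committed so far
        if (currentVersion != expectedVersion) {
            throw new ConcurrencyException(
                "expected version " + expectedVersion + " but stream is at " + currentVersion);
        }
        stream.add(event);
    }

    public static void main(String[] args) {
        InMemoryEventStore store = new InMemoryEventStore();
        store.append("order-1", 0, "OrderPlaced");
        store.append("order-1", 1, "OrderShipped");
        store.append("order-1", 1, "OrderCancelled"); // throws: a writer missed an event
    }
}

A gap in the counter tells you immediately that an event was missed or written concurrently; a gap in timestamps tells you nothing.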

Preventing update loops for multiple databases using CDC

We have a number of legacy systems that we're unable to make changes to - however, we want to start taking data changes from these systems and applying them automatically to other systems.
We're thinking of some form of service bus (no specific tech picked yet) sitting in the middle, and a set of bus adapters (one per legacy application) to translate between database specific concepts and general update messages.
One area I've been looking at is using Change Data Capture (CDC) to monitor update activity in the legacy databases, and use that information to construct appropriate messages. However, I have a concern - how best could I, as a consumer of CDC information, distinguish changes applied by the application from changes applied by the bus adapter on receipt of messages? Otherwise, the first update that gets distributed by the bus will be re-distributed by every receiver when they apply that change to their own system.
If I were implementing "poor man's" CDC - i.e. triggers - those triggers would execute within the context/transaction/connection of the original DML statements, so I could either design them to ignore one particular user (the user applying incoming updates from the bus), or set and detect a session property to similarly ignore certain updates.
Any ideas?
If I understand your question correctly, you're trying to define a message routing structure that works with a design you've already selected (an enterprise service bus) and a message implementation that you can use to flow data off your legacy systems, one that only forward-ports changes to your newer systems.
The difficulty is that you're trying to apply changes in such a way that they don't themselves generate a CDC message from the clients receiving the data bundle from your legacy systems. In fact, all you're concerned about is having your newer systems consume the data and not propagate messages back to your bus, creating unnecessary crosstalk that might grow exponentially and overload your infrastructure.
The secret is how MSSQL's CDC features reconcile changes as they propagate through the network. Specifically, note this caveat:
All the changes are logged in terms of LSN or Log Sequence Number. SQL distinctly identifies each operation of DML via a Log Sequence Number. Any committed modifications on any tables are recorded in the transaction log of the database with a specific LSN provided by SQL Server. The __$operation column values are: 1 = delete, 2 = insert, 3 = update (values before update), 4 = update (values after update).
cdc.fn_cdc_get_net_changes_dbo_Employee gives us all the records net changed falling between the LSNs we provide to the function. We have three records returned by the net_change function; there was a delete, an insert, and two updates, but on the same record. In the case of the updated record, it simply shows the net changed value after both updates are complete.
For getting all the changes, execute cdc.fn_cdc_get_all_changes_dbo_Employee; there are options to pass either 'ALL' or 'ALL UPDATE OLD'. The 'ALL' option provides all the changes, but for updates, it provides the after-update values. Hence we find two records for updates. We have one record showing the first update when Jason was updated to Nichole, and one record when Nichole was updated to EMMA.
While this documentation is somewhat terse and difficult to understand, it appears that changes are logged and reconciled in LSN order. Competing changes should be discarded by this system, allowing your consistency model to work effectively.
Note also:
CDC is by default disabled and must be enabled at the database level followed by enabling on the table.
Option B then becomes obvious: institute CDC on your legacy systems, then use your service bus to translate these changes into updates that aren't bound to CDC (using, for example, raw transactional update statements). This should allow for the one-way flow of data that you seek from the design of your system.
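As a sketch of what the legacy-side bus adapter could look like, the following Java/JDBC snippet reads net changes for the dbo_Employee capture instance (the one used in the quoted article) and hands each row to the bus. The connection string, column names, and publish() call are placeholders, and a real adapter would persist the last LSN it processed rather than re-reading from the minimum each time.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch of the legacy-side adapter: pull CDC net changes and publish them to the bus.
public class CdcBusAdapter {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://legacy-host;databaseName=LegacyDb;integratedSecurity=true")) {

            String sql =
                "DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Employee'); " +
                "DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn(); " +
                "SELECT __$operation, EmployeeID, EmployeeName " +
                "FROM cdc.fn_cdc_get_net_changes_dbo_Employee(@from_lsn, @to_lsn, 'all');";

            try (PreparedStatement ps = con.prepareStatement(sql);
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    int operation = rs.getInt("__$operation"); // 1 = delete, 2 = insert, 4 = update (net)
                    // Translate the row into a bus message; receivers apply it with plain
                    // UPDATE/INSERT/DELETE statements, so the change does not re-enter their CDC feed.
                    publish(operation, rs.getInt("EmployeeID"), rs.getString("EmployeeName"));
                }
            }
        }
    }

    static void publish(int operation, int employeeId, String employeeName) {
        System.out.printf("op=%d id=%d name=%s%n", operation, employeeId, employeeName);
    }
}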
For additional methods of reconciling changes, consider the concepts raised by this Wikipedia article on "eventual consistency". Best of luck with your internal database messaging system.