How can I ensure consistency when using an event-carried state transfer approach in Kafka?


Let's suppose a simplified scenario like this:
There are two Kafka topics, users and orders, and three microservices: user-service, order-service and shipping-service.
When an order is placed through the order service, an OrderCreated event is added to the orders topic and consumed by the shipping service. This service needs the user information to send the order. According to my requirements I can't make a REST call to user-service; I have to use a stateful approach instead. That is to say, the shipping service is a Kafka Streams application that listens to the users topic and keeps a KTable, backed by a local store, with the full user table information. Thus, when processing the order it already has the user information available locally.
However, one concern with this approach is the consistency of the local user information in the shipping service, e.g.:
A user updates their shipping address in the user-service, which updates its local SQL database and publishes an event to the users topic with this change.
The user places an order, so order-service publishes it to the orders topic.
For whatever reason, the shipping service could process the OrderCreated event from the orders topic before reading the UserUpdated event from the users topic, so it would use an address that is no longer valid.
How can I guarantee that the shipping service always has up-to-date user information in this event-carried state transfer scenario?

If you need ordering guarantees, you need to write both the user information update and the order into the same topic (and in particular into the same partition), because Kafka only guarantees order within a single partition.
You could call this topic "user_action" and use a unique user-id as the key, so that all events for one user land in the same partition (both a user information update and a user order are user actions). In your case, all three services would consume the "user_action" topic. While the user service only considers user updates and the order service only considers orders, the shipping service considers both.
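To illustrate why a shared, keyed topic removes the race, here is a minimal in-memory sketch in Python. It is not Kafka client code: UserActionTopic, ShippingService, partition_for and the event field names are all hypothetical, and crc32 only stands in for the key hashing a real producer would do. It only shows that events with the same key end up in one ordered log, so the address update is always seen before the order that follows it.

```python
from collections import defaultdict
from zlib import crc32

NUM_PARTITIONS = 3  # assumption: any fixed partition count behaves the same way


def partition_for(key: str) -> int:
    """Map a key to a partition; the same key always gets the same partition."""
    return crc32(key.encode("utf-8")) % NUM_PARTITIONS


class UserActionTopic:
    """Toy model of a keyed topic: one append-only list per partition."""

    def __init__(self) -> None:
        self.partitions = defaultdict(list)

    def publish(self, user_id: str, event: dict) -> None:
        self.partitions[partition_for(user_id)].append({"user_id": user_id, **event})


class ShippingService:
    """Consumes both event types and keeps the latest address per user."""

    def __init__(self) -> None:
        self.addresses = {}
        self.shipments = []

    def consume(self, partition_log: list) -> None:
        # Events are handled strictly in the order they were appended.
        for event in partition_log:
            if event["type"] == "UserUpdated":
                self.addresses[event["user_id"]] = event["address"]
            elif event["type"] == "OrderCreated":
                address = self.addresses.get(event["user_id"])
                self.shipments.append((event["order_id"], address))


topic = UserActionTopic()
topic.publish("user-42", {"type": "UserUpdated", "address": "1 Old Street"})
topic.publish("user-42", {"type": "UserUpdated", "address": "2 New Avenue"})
topic.publish("user-42", {"type": "OrderCreated", "order_id": "order-1"})

shipping = ShippingService()
for log in topic.partitions.values():
    shipping.consume(log)
```

Because all three events share the key "user-42", they sit in one partition in publish order, and the order is shipped to the most recent address.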
This blog post might help, too: https://www.confluent.io/blog/put-several-event-types-kafka-topic/

Related

What data goes into a message between distributed applications?

I am trying to implement queues into our microservice architecture, to be specific AWS SNS/SQS.
For example, I have this scenario.
After an order is created, the Orders MS raises an OrderCreated event, and this event publishes a message to the AWS OrderCreated SNS topic. The SQS queue InvoiceCreate is subscribed to the OrderCreated SNS topic and will get this message.
Everything makes sense so far. If the Invoicing MS is listening to the InvoiceCreate queue and retrieves all new messages, the Invoicing MS should create an invoice, but my question is: with what data?
a) contact the Order MS (to get the order data relevant for creating the invoice). If it is unable to do so, the message will be left in the queue until the Invoicing MS is able to collect the relevant data
b) the published message should contain all the relevant data needed to create an invoice.
If choosing A, the Invoicing MS will not be decoupled and will depend on the Order MS, but on the other hand it can collect additional data other than the data packed with the original message.
If choosing B, since the OrderCreated event and the OrderCreated SNS topic don't really know who will use the message data (i.e. OrderCreated could also be used to perform different actions), I am confused about how to decide precisely what data should be put in this message.

Our architecture is set up more like your option B. To use your example, the Order service would publish its OrderCreated event and attach most (or even all) of the Order information in the Payload section of the message. We format the message and payload as JSON for compatibility, but you can use whatever format suits you.
In some cases, we don't publish all info, just specific fields for the Added/Edited entity - it depends on the service and the sensitivity of the information. So long as you only ever add fields to a message (don't remove any), you are honoring the contract and aren't really tightly coupled to it.
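As a rough sketch of what "only ever add fields" buys you, the consumer below reads just the fields it needs and ignores the rest, so a newer publisher can add fields without breaking it. The message shape and the field names (orderId, total, currency, giftWrap) are made up for illustration and are not taken from any real schema.

```python
import json
from dataclasses import dataclass


@dataclass
class InvoiceInput:
    order_id: str
    total: float
    currency: str = "USD"  # assumed default for publishers that do not send it yet


def parse_order_created(raw: str) -> InvoiceInput:
    """Extract only what the invoice consumer needs from an OrderCreated message."""
    payload = json.loads(raw)["payload"]
    # Unknown fields are simply never read, so added fields cannot break this code.
    return InvoiceInput(
        order_id=payload["orderId"],
        total=float(payload["total"]),
        currency=payload.get("currency", "USD"),
    )


# Older publisher: minimal payload.
v1 = json.dumps({"event": "OrderCreated", "payload": {"orderId": "o-1", "total": 19.5}})
# Newer publisher: same fields plus two added ones.
v2 = json.dumps({
    "event": "OrderCreated",
    "payload": {"orderId": "o-2", "total": 40, "currency": "EUR", "giftWrap": True},
})

invoice_v1 = parse_order_created(v1)
invoice_v2 = parse_order_created(v2)
```

Removing or renaming orderId or total would break this consumer, which is why removals count as a contract change while additions do not.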
Again, to your example, the InvoiceService could get its information from one or more of several options:
Pull it directly from the OrderCreated message if you include everything needed
Pull what it can from the OrderCreated message, publish an InvoiceStarted event that triggers the OrderService (and/or others) to send it an OrderInvoiceComplete message with the rest of the details it needs
Keep a local copy of whatever key data it needs - populated by subscribing to other events - so that it can combine OrderCreated data with some local data to flesh out an invoice
It's best to avoid the InvoiceService responding to a message by making a call directly back to the OrderService - this is a pretty tight coupling that can be avoided by simply messaging back if you have to.
So, there are lots of options. I personally prefer the technique of putting all data that might be useful into the messages when things are created/updated and letting consuming services decide what to use/ignore. For our scenario, that works well but we have only a few well-contained clients accessing our services so there may be more secure ways to do it that aren't relevant for us.

RabbitMQ Structure For Private Messaging

I am currently looking to build out a messaging service where users can send and receive messages privately between each other. I may have a need for multi-user chat, but for the most part, I only want single recipients to be able to read messages sent to them.
Looking at RabbitMQ, does it make sense to use one exchange, create a queue for each user when they log in, and destroy each queue on logout? Are there major performance issues with creating a queue for each user, or are there better alternatives?
I am building a REST API and plan on having users send messages to others through an endpoint (/send) and subscribe to their own message streams via websockets or something similar. I will probably store messages in MongoDB as well, so users can access all of their previous messages. Any suggestions on structure are appreciated.

I think your approach is correct. You don't even need your own exchange if you use the default exchange (AMQP default). During login, create a new queue and use the user name as the queue name (just make sure user names are unique). If you publish a message to the default exchange with the user name (i.e. the queue name) as the routing key, RabbitMQ will route that message to that queue only. If you delete the queue on logout, the user will miss any messages sent while they are offline. If that is OK, create the queue after login and declare it as auto-delete, so that it is removed once its last consumer disconnects (an exclusive queue, by contrast, is tied to the connection that declared it and is removed when that connection closes). But if you want to keep offline messages, you need to create a permanent queue during user signup.
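The routing rule described above can be modelled in a few lines of plain Python. This is only an in-memory toy, not a RabbitMQ client: ToyBroker and its methods are hypothetical names, and the only behaviour it imitates is "routing key equals queue name, and a message for a missing queue is lost".

```python
from collections import deque


class ToyBroker:
    """In-memory stand-in for the default exchange: routing key == queue name."""

    def __init__(self) -> None:
        self.queues = {}

    def declare_queue(self, name: str) -> None:
        # Declaring an existing queue again leaves its messages untouched.
        self.queues.setdefault(name, deque())

    def delete_queue(self, name: str) -> None:
        self.queues.pop(name, None)

    def publish(self, routing_key: str, body: str) -> bool:
        queue = self.queues.get(routing_key)
        if queue is None:
            return False  # no queue with that name, so the message is dropped
        queue.append(body)
        return True

    def drain(self, name: str) -> list:
        """Return and remove all pending messages of one queue."""
        queue = self.queues[name]
        messages = list(queue)
        queue.clear()
        return messages


broker = ToyBroker()
broker.declare_queue("alice")               # created at signup, survives logout
delivered = broker.publish("alice", "hi alice")
dropped = broker.publish("bob", "hi bob")   # bob has no queue (deleted on logout)
inbox = broker.drain("alice")
```

The message for bob is lost, which is the trade-off of deleting queues on logout; a queue created once at signup behaves like alice's.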

What are the strategies for Pusher channel structures in social status update applications?

When building a social application it's common to follow other users or topics as an indication of interest in updates by the user or topic. For example, following other users on Twitter, Friending other people on Facebook or liking a product or brand on Facebook.
Pusher has the concept of channels that you subscribe to. A channel is a human-readable string that provides a logical identifier for information (e.g. "some-channel-name"), which naturally suggests that in a social application any updates on a user or topic should be sent on a channel specific to that item (e.g. "userX-status-updates" or "myBrand-status-updates").
However, this raises concerns about how efficient it is to subscribe to multiple channels if a user is following a large number of other users or topics.
Therefore, what are the appropriate strategies for structuring channels in a social status update style application that uses Pusher?

The first thing to clarify is that you need a mapping of who you are following, so for the purposes of this answer I'm going to assume that it's stored in a DB on the server. It's also assumed that status updates are triggered as follows:
Client (userX posts status update) -> Your Server (sanitize & validate)
Your Server -> Pusher
Pusher -> Clients (users interested in updates from UserX)
There are two possible solutions to the channel information architecture problem:
Channel Per User Status: A user subscribes to a userX-status-updates channel for all the users that they follow and users trigger update events on their own status update channel.
Users I'm Following Channel: When a user posts a status update you look up who is following that user and publish the update on a users-you-follow-updates channel.
Strategy 1 is the better solution, as it keeps interactions between your own infrastructure and Pusher to a minimum.
Here's the detail on these two strategies:
1. Channel Per User Status
The assumption here is that subscribing to channels is costly, but that's not entirely correct. Channels are simply a way of routing events. However, if you are using authenticated channels (private & presence) you need to authenticate each subscription via your own server. If you use the Pusher WebSocket libraries "out of the box", each subscription will result in a request to your server. So, if a user is following 1,000 users, that's 1,000 requests to your server.
But, for the pusher-js library there is a multi-auth plugin that can batch the authentication requests into a single call.
There is also a BatchAuthorizer for the Pusher WebSocket Java library, but it's only a sample solution to this scenario.
2. Users I'm Following Channel
Note: although this is an option it's probably only appropriate for smaller numbers of users
In this scenario a user sends their status update to the server, the server performs a lookup of which users are interested in the update and triggers an update event on a channel for each interested user.
For example, given users UserA, UserB and UserC, each of those users will subscribe to their own update channel: UserA-followers-updates, UserB-followers-updates, and UserC-followers-updates respectively. If each of these users follows UserZ, then when UserZ makes a status update that update is published on each of those channels.
This may also sound inefficient; however, it is possible to trigger the same event on 10 channels at a time. So in the above example it would only require one call to the Pusher HTTP API to send the status update to all interested users. See the Pusher documentation on multi-channel event publishing for more information.
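For more followers than fit into one call, the server would split the channel list into batches. Here is a minimal sketch of that bookkeeping; it makes no Pusher API calls, the helper names are hypothetical, and the batch size of 10 is simply the figure quoted above (check the current Pusher documentation for the actual limit).

```python
from typing import Iterator, List

MAX_CHANNELS_PER_TRIGGER = 10  # figure quoted in this answer, treated as an assumption


def follower_channels(follower_ids: List[str]) -> List[str]:
    """Build the per-follower channel names used in strategy 2."""
    return [f"{follower_id}-followers-updates" for follower_id in follower_ids]


def batched(channels: List[str], size: int = MAX_CHANNELS_PER_TRIGGER) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` channels, one per API call."""
    for start in range(0, len(channels), size):
        yield channels[start:start + size]


# 25 followers of UserZ, as would come back from the "who follows UserZ" lookup.
followers = [f"User{n}" for n in range(25)]
batches = list(batched(follower_channels(followers)))
# Each element of `batches` would be passed to one multi-channel trigger call.
```

With 25 followers this yields three batches, so three calls instead of 25, which is also why this strategy gets expensive as follower counts grow.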

Presence information of Publishers in PubSub

Setup:
I have setup a pubsub service wherein the publishers publish geolocation data at regular intervals.
The subscribers receive the location data of the publishers.
The subscribers are not presence subscribed, in the sense, the subscribers are not in the publishers rosters.
Problem:
The subscribers need to know the presence status of publishers.
Is there a way for the subscribers to know the presence status of publishers?

No, since there is no direct relationship between subscribers and publishers, which is typical of any pubsub design. To accomplish this the subscribers would need to know who the publishers are, which is not a good generic pubsub design.
It sounds like what you actually want is PEP (Personal Eventing Protocol), which is a subset of pubsub. In this case, the subscribers are subscribing to nodes belonging to the actual user they are interested in. If they are subscribed to the user's presence, they automatically have access to the user's nodes.
NOTE: I have recently found out that the newer version of the spec does in fact support an attribute that identifies the publisher. This makes it feasible to get their presence, but you would still have to subscribe or query for it.

CQRS and email notification

Reading up on CQRS, there is a lot of talk about email notification, and I'm wondering where to get the data from. Imagine a scenario where one user invites other users to an event. To inform a user that he has been invited to an event, he is sent an email.
The concrete steps might go like this:
A CreateEvent command with an associated collection of users to invite, is received by the server.
A new Meeting aggregate is created and a method InviteUser is called for each user that is to be invited.
Each time a user is invited to an event, a domain event UserWasInvitedToEvent is raised.
An email notification sender picks up the domain event and sends out the notification email.
Now my question is this: Where do I go for information to include in the email?
Say I want to include a description of the event as well as the user's name. Since this is CQRS, I can't get it through my domain model; all the properties of the domain objects are private! Should I then query the read side? Or maybe move email notification to a different service entirely?

In CQRS, you're separating the command from the query side. You will always want to go to the query side in order to get data for a given event handler. The write database is going to be a separate database that contains the data necessary for building up your domain objects and will not be optimized for reads, but for writes.
The domain should register and send an EventCreated event to the event handlers / processors. This could be raised from the constructor of the Meeting aggregate.
The event processing component would pick up the EventCreated event, and update the query database with the data contained in the event (ie, the Id of the event and its name).
The domain could register and send a UserWasInvitedToEvent event to the event processors.
The event processors would pick up the UserWasInvitedToEvent and update the query store with any reporting data necessary.
Another event processing component would also pick up the UserWasInvitedToEvent event. This process could have access to the query database and pull back all of the data necessary for sending the email.
The query database is nothing more than a reporting database, so you could even have a specific table that stores all of the data required for the email in one place.
In order to orchestrate several different events into a single handler (assuming the events might be processed in a different order at different times), you could use the concept of a Saga in your messaging bus. NServiceBus is an example of a messaging bus that supports Sagas. See this StackOverflow question as well: NServiceBus Delayed Message Processing.
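The flow described in this answer could look roughly like the following Python sketch. Everything here is illustrative: QueryStore, project and build_invitation_email are hypothetical names, plain dicts stand in for the query database, and a UserRegistered event is assumed as the source of the user's name and email address. It also assumes the projections have already run before the invitation is handled, which is the ordering problem a Saga would take care of.

```python
from dataclasses import dataclass, field


@dataclass
class QueryStore:
    """Toy read model: dicts standing in for denormalized query tables."""
    users: dict = field(default_factory=dict)
    events: dict = field(default_factory=dict)


def project(store: QueryStore, event: dict) -> None:
    """Projection handler: copy the data carried by a domain event into the read side."""
    if event["type"] == "UserRegistered":
        store.users[event["user_id"]] = {"name": event["name"], "email": event["email"]}
    elif event["type"] == "EventCreated":
        store.events[event["event_id"]] = {
            "title": event["title"],
            "description": event["description"],
        }


def build_invitation_email(store: QueryStore, event: dict) -> dict:
    """Email handler: react to UserWasInvitedToEvent and read the details from the query side."""
    user = store.users[event["user_id"]]
    meeting = store.events[event["event_id"]]
    return {
        "to": user["email"],
        "subject": f"Invitation: {meeting['title']}",
        "body": f"Hi {user['name']}, you are invited to {meeting['title']}. {meeting['description']}",
    }


store = QueryStore()
project(store, {"type": "UserRegistered", "user_id": "u-1", "name": "Ann", "email": "ann@example.com"})
project(store, {"type": "EventCreated", "event_id": "e-1", "title": "Kickoff", "description": "Project start."})
email = build_invitation_email(
    store, {"type": "UserWasInvitedToEvent", "user_id": "u-1", "event_id": "e-1"}
)
```

The domain event itself only carries identifiers; the name and description come from the read side, so the domain objects can keep their properties private.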
