I'm trying to understand the benefit of using Kafka Streams in my business model. Customers publish an order and instantly get offers from sellers who are online and interested in that order.
In this case streams seem to fit: join the available (online) sellers to the order stream, then filter and sort the offers by price. As a result, the customer should get the best offers by price for their request.
I've found only one benefit: fewer server calls (all calculations happen in the stream).
My question is, why do streams matter in this case, when I could implement these business steps using the standard approach with one monolithic application?
I know this question is opinion-based, but after reading some books about stream processing it is still hard to change my mind about this approach.
only one benefit: fewer server calls
Kafka Streams can still make "server calls", especially when using Interactive Queries with an RPC layer. Fetching data from a remote table, such as one in ksqlDB, is also a "server call".
This is not the only benefit. Have you tried writing a join between topics using the plain consumer API? Or a filter/map in fewer than two lines of code (outside the config setup)?
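For comparison, here is a rough sketch of your orders/sellers use case in the Streams DSL. The topic names, string values, and the join/filter logic are made up for illustration; the point is how little code the join and filter take compared with hand-rolling them on the plain consumer API.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

import java.util.Properties;

public class OffersTopology {
  public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();

    // Hypothetical topics, both keyed by product id and carrying plain string values.
    KStream<String, String> orders = builder.stream("orders");
    KTable<String, String> onlineSellers = builder.table("online-sellers");

    orders
        .join(onlineSellers, (order, seller) -> seller + " -> " + order) // enrich each order with a matching online seller
        .filter((productId, offer) -> !offer.isEmpty())                  // drop offers you don't want to show
        .to("offers");                                                   // customers read the best offers from here

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "offers-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    new KafkaStreams(builder.build(), props).start();
  }
}
```

Doing the same join with the plain consumer API would mean managing the table state, repartitioning, and offsets yourself.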
I could implement these business steps using the standard approach with one monolithic application?
A Streams topology can still be embedded within a monolith, so I don't understand your point here. I assume you mean a fully synchronous application with a traditional database + API layer?
The books you say you've read should cover most benefits of stream processing, but you might want to check out "Kafka Streams in Action" to see the advantages of Kafka Streams specifically.
It is often said that Kafka works well with domain-driven design.
Why is it, then, that Kafka blog posts mostly talk about CQRS or similar, suggesting separate input and output topics?
It seems like a topic could be about a thing. Why should services 'talk' about that same thing spread out by an implementation detail of who/what is talking?
Isn't this a lot of overhead just to protect services from peers that have issues and end up spamming the topic?
I'm hoping for responses that offer pros and cons (why people might think a given thing), not opinions about the 'right' answer. If this is a better fit for a different Stack Exchange site, I'd appreciate being pointed in the right direction.
To summarize
CQRS is a pattern you use within a service (bounded context) that allows you to implement the command and query sides in separate ways. That is what CQRS is all about.
Event sourcing is an optional pattern that you often see together with CQRS, but it is not a requirement.
I see Kafka as the backbone between services, the record of truth between services.
In this case, you use Kafka to pass notifications of what happens in the different services (event-driven architecture). But I would not try to use Kafka for request/response communication, even though it is possible.
When you aim for a request/response pattern, you typically want a synchronous response, for example when the user sends a command to the system. But in general, to reduce coupling you should not send commands between services; it is better to publish events and let other services react to those events.
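To make that last point concrete, a service would simply publish a fact about what happened to a topic and let other services subscribe. A minimal sketch with the plain Java producer (the topic name and payload are made up):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class OrderEventsPublisher {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      // Publish what happened ("OrderPlaced"), not a command telling another service what to do.
      producer.send(new ProducerRecord<>("order-events", "order-42",
          "{\"type\":\"OrderPlaced\",\"orderId\":\"42\"}"));
    }
  }
}
```

Whoever needs to react (billing, shipping, notifications) consumes "order-events" without the publisher knowing about them.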
I am working with a third-party vendor whom I asked to provide the events generated by a website.
The vendor proposed streaming the events using Kafka ... why not.
On my side (the client), I run a 100% MSSQL/Windows production environment, and internal business users want KPIs and dashboards on website activity.
Now the question: what would the architecture look like to support a PoC, so that I can manage the inputs on one hand and create data marts to deliver on business needs on the other?
It's not clear what you mean by "events from the website". Your Kafka producers are typically server-side components: as API requests come in, you'd put the Kafka-producing code between those requests and your database calls. I would be surprised if any third party could just do that immediately.
Maybe you're looking for something like https://divolte.io/
You can also use CDC products to stream events out of your database.
The architecture could be like this: the app streams events to Kafka; you write a service that reads the data from Kafka, does the transformation, and writes it to the database; you then build the dashboard on top of the DB.
Alternatively, you can populate indexes in Elasticsearch and build a Kibana dashboard on top of that.
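A bare-bones version of that "read from Kafka, transform, write to the database" service could look like the sketch below, using the plain Java consumer and JDBC against SQL Server. The topic name, table, and connection string are placeholders, and there is no batching or error handling; Kafka Connect's JDBC sink connector can do the same job without custom code.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class EventsToSqlServer {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "website-events-loader");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
         Connection db = DriverManager.getConnection(
             "jdbc:sqlserver://localhost;databaseName=analytics;integratedSecurity=true")) {

      consumer.subscribe(Collections.singletonList("website-events"));
      PreparedStatement insert =
          db.prepareStatement("INSERT INTO raw_events (event_key, payload) VALUES (?, ?)");

      while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> record : records) {
          // Transform/flatten the event here before loading it into the datamart staging table.
          insert.setString(1, record.key());
          insert.setString(2, record.value());
          insert.executeUpdate();
        }
      }
    }
  }
}
```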
My suggestion would be to use the Lambda architecture to cater to both real-time and batch processing needs:
Architecture:
Lambda architecture is designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods.
This architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data.
I am hoping to clarify a few ideas about Kafka Streams from an architectural standpoint.
I understand the stream-processing and data-enrichment use cases, and that the data can be reused by other applications if it is pushed back into Kafka, but what is the correct implementation of a Streams application?
My initial thought would be to create an application that pulls in a table, joins it to a stream, and then fires off an event for each entry rather than pushing it back into Kafka. If multiple services use this data, then each would materialize its own table, right?
I haven't implemented a test application yet, which may answer some of these questions, but I think this is a good place to start planning. Basically, where should the event be triggered: in the streaming app or in a separate consumer app?
My initial thought would be to create an application that pulls in a table, joins it to a stream, and then fires off an event for each entry rather than pushing it back into Kafka.
In an event-driven architecture, where would the application send the events to (and how), if you think that Kafka topics shouldn't be the destination for sharing the events with other apps? Do you have other preferences?
If multiple services use this data, then each would materialize its own table, right?
Yes, that is one option.
Another option is to use the interactive queries feature in KStreams (aka queryable state), which allows your first application to expose its tables and state stores to other applications directly (e.g., via a REST API). Other apps would then not need to materialize their own tables. However, an architectural downside is that you have now a direct coupling between your first app and any other downstream applications through request-response communication. While this pattern of direct inter-service communication is popular for a microservices architecture, a compelling alternative is to not use direct communication but instead let microservices/apps communicate indirectly with each other via Kafka (i.e., to use the previous option).
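For illustration, here is a minimal sketch of the interactive queries approach. The topic, store name, and key are hypothetical; in a real application the lookup would sit behind your own REST endpoint, and you would wait for the instance to reach the RUNNING state before querying the store.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

import java.util.Properties;

public class QueryableCounts {
  public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();

    // Count events per key and name the resulting state store so it can be queried.
    builder.stream("events")
           .groupByKey()
           .count(Materialized.as("event-counts"));

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "queryable-counts");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    streams.start();

    // Once the instance is RUNNING, a REST handler in the same app can read the store directly.
    ReadOnlyKeyValueStore<String, Long> counts = streams.store(
        StoreQueryParameters.fromNameAndType("event-counts", QueryableStoreTypes.keyValueStore()));
    System.out.println("count for some-key = " + counts.get("some-key"));
  }
}
```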
Basically, where should the event be triggered, in the streaming app or in a separate consumer app?
This is a matter of preference, see above. To inform your thinking you may want to read the 4-part mini series about event-driven architectures with Kafka: https://www.confluent.io/blog/journey-to-event-driven-part-1-why-event-first-thinking-changes-everything (disclaimer: this blog series was written by a colleague of mine).
We currently have a library which we use to interact with Kafka, but we are planning to develop this library into a separate application. Other applications would then send Kafka messages using a REST endpoint. We are planning to use Vert.x in this application to make it non-blocking and fast. Is it a good strategy? My concerns:
1) HTTP will make it slower compared to Kafka's native TCP protocol
2) streaming may not be possible
3) it becomes a single point of failure
But as a separate application, release management, control, and support will be a lot easier than they are now.
Is this a good strategy, and has someone done something like this before? Any suggestions?
Whether to go with HTTP or Kafka's native TCP protocol depends on the number of applications that will be talking to your service and their message rates. Say an IoT device is sending lots of messages continuously; then using HTTP will be expensive and will increase latency, since HTTP connection establishment is an expensive operation.
Now consider the case where you have a transactional system that sends transaction events as they are committed to your database. The message rate will presumably be lower, so it makes sense to use HTTP there.
In short, the rate of messages your service will receive should decide which way you go.
As for your current approach of maintaining a library: it is a good way to keep consistency across the organisation, as long as the library is maintained and its users update whenever you make changes. It also has the advantage of not requiring separate infrastructure/servers, since your code runs inside your users' applications.
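For what it's worth, a bare-bones sketch of such an HTTP-to-Kafka bridge with Vert.x 4 and its Kafka client might look like the following. The endpoint path, port, and producer config are placeholders, and there is no auth, batching, or back-pressure handling:

```java
import io.vertx.core.Vertx;
import io.vertx.ext.web.Router;
import io.vertx.ext.web.handler.BodyHandler;
import io.vertx.kafka.client.producer.KafkaProducer;
import io.vertx.kafka.client.producer.KafkaProducerRecord;

import java.util.HashMap;
import java.util.Map;

public class KafkaRestBridge {
  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();

    Map<String, String> config = new HashMap<>();
    config.put("bootstrap.servers", "localhost:9092");
    config.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    config.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    config.put("acks", "1");

    KafkaProducer<String, String> producer = KafkaProducer.create(vertx, config);

    Router router = Router.router(vertx);
    router.route().handler(BodyHandler.create());

    // POST /topics/<topic> with the message in the request body.
    router.post("/topics/:topic").handler(ctx -> {
      KafkaProducerRecord<String, String> record =
          KafkaProducerRecord.create(ctx.pathParam("topic"), ctx.getBodyAsString());
      producer.send(record)
          .onSuccess(meta -> ctx.response().end("offset=" + meta.getOffset()))
          .onFailure(err -> ctx.response().setStatusCode(500).end(err.getMessage()));
    });

    vertx.createHttpServer().requestHandler(router).listen(8080);
  }
}
```

Note that every producing application now depends on this one service being up, which is the single-point-of-failure concern from the question; running several stateless instances behind a load balancer is the usual mitigation.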
What would be best practice if you have an event-driven architecture and a service subscribing to events has to wait for multiple events (of the same kind) before proceeding to create the next event in the chain?
An example would be a book-order handling service that has to wait for each book in the order to have been handled by the warehouse before creating the event that the order has been picked, so that the shipping service (or something similar) can pick up the order and start preparing it for shipping.
Another useful pattern, besides the Aggregator that Tom mentioned above, is the Saga pattern (a mini workflow).
I've used it before with a messaging library called NServiceBus to coordinate multiple messages that are correlated with each other.
The pattern is very useful and fits nicely for long-running processes, even if your correlated messages are different message types, like OrderStarted, OrderLineProcessed, and OrderCompleted.
You can use the Aggregator pattern, also called Parallel Convoy.
Essentially you need some way of identifying which messages need to be aggregated, and of knowing when the aggregated set as a whole has been received, so that processing can start.
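As one possible sketch of that in Kafka Streams (the topics, the keying by order id, and the separate "order-sizes" table are all assumptions): count the handled books per order, compare against the expected number of books, and emit the "picked" event once they match.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class OrderAggregator {
  public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();

    // Hypothetical topics: "book-handled" (key = order id, value = book id)
    // and "order-sizes" (key = order id, value = number of books in the order).
    KTable<String, Long> handledCount = builder
        .stream("book-handled", Consumed.with(Serdes.String(), Serdes.String()))
        .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
        .count();

    KTable<String, Long> orderSize = builder
        .table("order-sizes", Consumed.with(Serdes.String(), Serdes.Long()));

    // Once every book in the order has been handled, publish the "order picked" event.
    handledCount.join(orderSize, (handled, size) -> handled >= size)
        .toStream()
        .filter((orderId, complete) -> Boolean.TRUE.equals(complete))
        .mapValues(complete -> "ORDER_PICKED")
        .to("order-picked", Produced.with(Serdes.String(), Serdes.String()));

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-aggregator");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    new KafkaStreams(builder.build(), props).start();
  }
}
```

The shipping service then only has to subscribe to "order-picked".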
Without going out and buying the book*, the Apache Camel integration platform website has some nice resources on implementing the Aggregator pattern. While this is obviously specific to Camel, you can see what kind of things are involved.
* Disclaimer: I am not affiliated in any way with Addison-Wesley or any of the authors of the book...