Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Among the options for a write journal to implement Event Sourcing, Kafka looks like a very reasonable choice from the "outside":
It has a great ecosystem
It is well documented
It naturally supports streaming and listeners
However, looking into Akka Persistence, it appears that a Kafka journal is supported only through a community-contributed package, which has not been modified for the last 2 years. Is Kafka not a good option? Are there better options? And if it is the best option, how are people using it with akka-persistence?
The problems with using Kafka as the event journal for akka-persistence (lack of atomic writes) are mentioned in this comment, which also lists it as a reason the plugin hasn't been maintained:
https://github.com/krasserm/akka-persistence-kafka/issues/28#issuecomment-138933868
In this thread, however, there is evidence that people are working on forks that work with the latest Kafka and Akka versions:
https://github.com/krasserm/akka-persistence-kafka/issues/20
You should have a look right here
It's a pull request of the fork maintained right here
This version uses Kafka 1.0 and the new producer API with transactions. We try to follow the Akka Persistence specification as closely as we can. We are sticking with Kafka because, for us, it is the best solution for event sourcing.
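To make the atomic-write requirement concrete, here is a minimal Python sketch of an all-or-nothing batch append. It is a toy in-memory journal, not the Kafka producer API and not the plugin's code; the class, the method names, and the "`None` means invalid event" rule are all hypothetical.

```python
class InMemoryJournal:
    """Toy append-only event journal; a stand-in for a real log, not a Kafka client."""

    def __init__(self):
        self._events = []  # committed (persistence_id, sequence_nr, event) tuples

    def highest_sequence_nr(self, persistence_id):
        """Return the last committed sequence number for this id, or 0 if none."""
        return max(
            (seq for pid, seq, _ in self._events if pid == persistence_id),
            default=0,
        )

    def append_atomically(self, persistence_id, events):
        """Append every event in the batch, or none of them."""
        next_seq = self.highest_sequence_nr(persistence_id) + 1
        staged = []
        for offset, event in enumerate(events):
            if event is None:  # hypothetical validation failure
                raise ValueError("invalid event; nothing was written")
            staged.append((persistence_id, next_seq + offset, event))
        # Commit point: only reached when the whole batch was staged successfully.
        self._events.extend(staged)

    def replay(self, persistence_id):
        """Return the committed events for this id, in write order."""
        return [event for pid, _, event in self._events if pid == persistence_id]


journal = InMemoryJournal()
journal.append_atomically("cart-1", ["ItemAdded", "ItemAdded"])
try:
    journal.append_atomically("cart-1", ["ItemRemoved", None])
except ValueError:
    pass  # the failed batch left no partial state behind

print(journal.replay("cart-1"))
```

A journal without this guarantee could leave `ItemRemoved` written while the rest of the batch is lost, which is the kind of partial state that producer transactions are meant to rule out.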
Closed 4 months ago.
All of the tutorials suggest that, when it comes to communication between microservices, we should prefer async communication through something like Kafka over direct synchronous HTTP calls.
Can somebody explain why, and how async communication would happen using Kafka?
The question is very broad. I would argue that the main aim of using async communication is to separate domain boundaries, but there are other benefits: it contains failures within one part of the system, and it lets the system absorb traffic spikes without going down.
Imagine a purchase on an online shop:
The payment needs to be processed.
The order needs to be checked for fraud.
An invoice needs to be created.
A fulfillment order needs to be created in the warehouse.
Purchase information has to be sent to analytics.
Tailored product suggestions need to be updated.
And probably a few dozen more things have to happen after an order is placed.
Some of these, the critical path, might need to happen synchronously (e.g. taking a payment), but all of the others can happen asynchronously. Kafka is just a message broker (check the docs or the free book Kafka: The Definitive Guide to learn how it works).
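The split between the synchronous critical path and the asynchronous follow-up work can be sketched in a few lines of Python. This is an in-process toy using only the standard library, not a Kafka client; the `Broker` class, the service names, and the payment rule are assumptions made for illustration.

```python
import queue
import threading


class Broker:
    """Toy in-process broker: every subscriber gets its own queue and worker thread."""

    def __init__(self):
        self._inboxes = []

    def subscribe(self, handler):
        inbox = queue.Queue()

        def worker():
            while True:
                message = inbox.get()
                handler(message)
                inbox.task_done()

        threading.Thread(target=worker, daemon=True).start()
        self._inboxes.append(inbox)

    def publish(self, message):
        # Returns immediately; subscribers process the message on their own threads.
        for inbox in self._inboxes:
            inbox.put(message)

    def drain(self):
        # Block until every subscriber has handled everything published so far.
        for inbox in self._inboxes:
            inbox.join()


handled = []
lock = threading.Lock()


def make_handler(service_name):
    def handle(order):
        with lock:
            handled.append((service_name, order["id"]))
    return handle


def take_payment(order):
    # Critical path: the caller waits for this result (hypothetical rule).
    return order["amount"] > 0


def place_order(broker, order):
    if not take_payment(order):
        return "rejected"
    broker.publish(order)  # invoicing, fulfillment, analytics happen later
    return "accepted"


broker = Broker()
for service in ("invoicing", "fulfillment", "analytics"):
    broker.subscribe(make_handler(service))

status = place_order(broker, {"id": 42, "amount": 19.99})
broker.drain()
print(status, sorted(handled))
```

The order is accepted as soon as the payment succeeds; a slow or failing downstream service delays only its own queue rather than the purchase itself.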
It's also possible to build a platform that is fully asynchronous (e.g. using request-reply pattern).
An incredibly good explanation of messaging, with examples, can be found in the book Enterprise Integration Patterns. It was published almost 20 years ago, but everything in it is still current.
Closed 1 year ago.
I would like to transfer data from one DB system to other DB systems. Which messaging system (Kafka, ActiveMQ, RabbitMQ, and the like) would be the better choice for high throughput and performance?
I guess the answer to this type of question is "it depends".
You can probably find plenty of information on the internet comparing these message brokers.
As far as I can share from our experience and knowledge, Kafka and its ecosystem tools, such as Kafka Connect, provide the behavior you are asking for: source connectors and sink connectors with Kafka in the middle.
Kafka Connect is a framework that lets you add plugins called connectors:
Sink connectors read from Kafka and send that data to the target system.
Source connectors read from the source store and write to Kafka.
Using Kafka Connect is "no code": you call a REST API to set the configuration of the connectors.
Kafka is a distributed system that supports very high throughput with low latency. It supports near-real-time streaming of data.
Kafka is widely adopted by some of the biggest companies around the world.
There are many tools and vendors that support your use case, and they vary in price and support. The right choice depends on which sources you need to read from, which targets you want to write to, and whether the copy should be CDC/near real time or "batch".
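As a rough illustration of the "no code" point, the Python sketch below builds the kind of request you would send to a Kafka Connect worker to register a source connector. The worker address, database URL, table, and connector name are made up, and the property names follow Confluent's JDBC source connector as I understand it, so check them against the connector version you actually install. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed address of a Kafka Connect worker; 8083 is the usual default REST port.
CONNECT_URL = "http://localhost:8083/connectors"

# Hypothetical source connector: copy new rows of the "orders" table into Kafka.
source_connector = {
    "name": "orders-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://source-db:5432/shop",
        "mode": "incrementing",            # pick up rows with a growing id
        "incrementing.column.name": "id",
        "table.whitelist": "orders",
        "topic.prefix": "shop-",           # rows land in the topic "shop-orders"
        "tasks.max": "1",
    },
}


def build_request(connector):
    """Build (but do not send) the POST request that registers a connector."""
    body = json.dumps(connector).encode("utf-8")
    return urllib.request.Request(
        CONNECT_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_request(source_connector)
# urllib.request.urlopen(request) would submit it to a running Connect worker.
print(request.get_method(), request.full_url)
```

A sink connector for the target database would be registered the same way, with its own `connector.class` and a list of topics to read from.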
Closed 1 year ago.
I want to learn about the differences between the two approaches. I developed a project that aggregates some data using the Apache Kafka Streams API. After that, I came across some solutions written with KSQL.
I have no experience with KSQL, so I would like to learn when to choose each approach for aggregating data. Could I use KSQL instead of Kafka Streams?
There's a blog post somewhere that talks about the "Kafka abstraction funnel"
KSQL doesn't provide as much flexibility as Kafka Streams, which in turn abstracts away many details of the core consumer/producer API.
If you have people who are more familiar with SQL and less comfortable with other client libraries, you'd use KSQL. If you run into a feature not supported by KSQL (think custom, complex data types), or you need to embed streaming logic into a larger application without remotely querying the ksqlDB REST API, use Kafka Streams.
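To show what kind of aggregation is at stake in either tool, here is a small Python sketch of a running count per key. It is plain Python rather than Kafka Streams or ksqlDB code; the stream, topic, and column names in the comment are hypothetical.

```python
from collections import defaultdict

# Roughly what a ksqlDB statement such as
#   CREATE TABLE clicks_per_user AS
#     SELECT user_id, COUNT(*) AS clicks
#     FROM clicks GROUP BY user_id EMIT CHANGES;
# expresses declaratively, and what groupByKey().count() expresses in the
# Kafka Streams DSL: keep one counter per key and emit each update.


def count_by_key(records):
    """Yield (key, running_count) for every incoming (key, value) record."""
    counts = defaultdict(int)  # the "state store" of this toy aggregation
    for key, _value in records:
        counts[key] += 1
        yield key, counts[key]


clicks = [("alice", "/home"), ("bob", "/cart"), ("alice", "/checkout")]
updates = list(count_by_key(clicks))
print(updates)
```

With KSQL you would stop at the SQL statement; with Kafka Streams you write the equivalent logic in application code, which is where the extra flexibility (custom types, embedding in a larger service) comes from.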
Closed 4 years ago.
I'm very naive about data engineering but it seems to me that a popular pipeline for data used to be Kafka to Storm to something.... but as I understand it Kafka now seems to have data processing capabilities that may often render Storm unnecessary. So my question is simply, in what scenarios might this be true that Kafka can do it all, and in what scenarios might Storm still be useful?
EDIT:
Question was flagged for "opinion based".
This question tries to understand what capabilities Apache Storm offers that Kafka Streams does not (now that Kafka Streams exists). The accepted answer touches on that. No opinions are requested by this question, nor are they necessary to address it. Question title edited to seem more objective.
You still need to deploy the Kafka code somewhere, e.g. on YARN if using Storm.
Plus, Kafka Streams can only read from and write to topics within the same Kafka cluster; Storm has other spouts and bolts. Kafka Connect is one alternative for that, though.
Kafka has no external dependency on a cluster scheduler, and while you can deploy Kafka clients written in almost any popular programming language, they still require external instrumentation, whether that means running in a Docker container or being deployed on bare metal.
If anything, I'd say Heron or Flink are the truly comparable replacements for Storm.
Closed 1 year ago.
Apache Pulsar (by Yahoo) seems to be the next generation of Apache Kafka.
Apache RocketMQ (by Alibaba) seems to be the next generation of Apache ActiveMQ.
Both are open source distributed messaging and streaming data platforms.
But how do they compare? When should I prefer one over another in terms of features and performance?
Is Pulsar (like Kafka) strictly better at streaming, and RocketMQ (like ActiveMQ) strictly better at messaging?
It looks like you answered your own question.
To be fair, the main differences between Pulsar and RocketMQ are:
Pulsar is oriented toward topics and multi-topic subscriptions.
RocketMQ is more oriented toward batch processing and keeps an index of the messages.
With RocketMQ you still need an adaptor for backwards compatibility; in Pulsar, on the other hand, it comes built in.
RabbitMQ uses a push model, while RocketMQ uses a pull model, since it has zero tolerance for message loss.
Pulsar offers message priority; RocketMQ, being a queue, doesn't support that.
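The push/pull distinction mentioned in the list above can be sketched generically in Python. These two classes are hypothetical toys and do not reflect the RabbitMQ, RocketMQ, or Pulsar client APIs; they only show who controls the pace of delivery.

```python
class PushBroker:
    """The broker calls the consumer as soon as a message arrives."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def publish(self, message):
        for handler in self._handlers:
            handler(message)  # delivery pace is set by the broker


class PullBroker:
    """The broker only stores messages; the consumer asks for them by offset."""

    def __init__(self):
        self._log = []

    def publish(self, message):
        self._log.append(message)

    def poll(self, offset, max_messages):
        # The consumer decides when to fetch and how many messages to take.
        return self._log[offset:offset + max_messages]


pushed = []
push_broker = PushBroker()
push_broker.subscribe(pushed.append)

pull_broker = PullBroker()
for message in ("a", "b", "c"):
    push_broker.publish(message)
    pull_broker.publish(message)

offset = 0
first_batch = pull_broker.poll(offset, max_messages=2)
offset += len(first_batch)  # the consumer tracks its own position
second_batch = pull_broker.poll(offset, max_messages=2)

print(pushed, first_batch, second_batch)
```

In the pull variant a consumer that falls behind simply polls later from its stored offset, whereas in the push variant the broker has to handle a consumer that cannot keep up.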