Apache Kafka Streams API vs KSQL [closed] - apache-kafka

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
I want to learn about the differences between the two approaches. I developed a project that aggregates some data using the Apache Kafka Streams API. After that, I came across some solutions that are written with KSQL.
I have no experience with KSQL, so I would like to learn when to choose which approach for aggregating data. Could I use KSQL instead of Kafka Streams?

There's a blog post somewhere that talks about the "Kafka abstraction funnel".
KSQL doesn't provide as much flexibility as Kafka Streams, which in turn abstracts many details of the core consumer/producer API.
If you have people who are more familiar with SQL and less comfortable with other client libraries, you'd use KSQL. If you run into a feature not supported by KSQL (think custom, complex data types), or need to embed streaming logic into a larger application without remotely querying the ksqlDB REST API, use Kafka Streams.
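To make the funnel a bit more concrete, here is a minimal plain-Python sketch of what a grouped count does conceptually. It does not use Kafka, Kafka Streams, or ksqlDB at all; the `pageviews` records, the `user` field, and the `count_by_key` helper are hypothetical names made up for illustration. In KSQL the same idea is roughly one declarative statement (shown in the comment), whereas with Kafka Streams you would write and deploy the equivalent logic yourself as application code.

```python
from collections import defaultdict

# Roughly the declarative form (hypothetical stream and column names):
#   SELECT user, COUNT(*) FROM pageviews GROUP BY user EMIT CHANGES;

def count_by_key(records, key_field):
    """Yield (key, running_count) after every record, like the
    stream of updates produced by a streaming aggregate."""
    counts = defaultdict(int)  # stand-in for the aggregate's state store
    for record in records:
        key = record[key_field]
        counts[key] += 1
        yield key, counts[key]

pageviews = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]
updates = list(count_by_key(pageviews, "user"))
print(updates)  # [('alice', 1), ('bob', 1), ('alice', 2)]
```

The further down the funnel you go (KSQL, then Kafka Streams, then the plain consumer/producer API), the more of this loop and its state handling you own yourself.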

Related

Which Messaging System to be used? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
I would like to transfer data from one database system to other database systems. Which messaging system (Kafka, ActiveMQ, RabbitMQ, and the like) would be best for achieving this with high throughput and performance?
I guess the answer to this type of question is "it depends".
You can probably find plenty of information on the internet comparing these message brokers.
From our experience and knowledge, Kafka and its ecosystem tools, such as Kafka Connect, provide the behavior you are asking for: source connectors and sink connectors, with Kafka in the middle.
Kafka Connect is a framework that allows adding plugins called connectors:
Sink connectors read from Kafka and send that data to the target system.
Source connectors read from the source store and write to Kafka.
Using Kafka Connect is "no code": you call a REST API to set the configuration of the connectors.
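As a rough illustration of that "no code" setup, the sketch below builds (but does not send) the two REST requests that would register a source and a sink connector. Everything specific in it is an assumption for the example: the worker address, the connector names, the database URLs, and the choice of the Confluent JDBC connector plugin, which has to be installed separately. Check the documentation of whichever connector you actually use for its configuration keys.

```python
import json
import urllib.request

# Assumed address of a Kafka Connect worker (8083 is the default REST port).
CONNECT_URL = "http://localhost:8083/connectors"

def connector_request(name, config):
    """Build (but do not send) the POST request that registers a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        CONNECT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Source connector: reads rows from the source database and writes them to Kafka.
source = connector_request("orders-source", {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://source-db:5432/shop",  # hypothetical
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "table.whitelist": "orders",
    "topic.prefix": "shop-",
})

# Sink connector: reads the topic from Kafka and writes it to the target database.
sink = connector_request("orders-sink", {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://target-db:5432/warehouse",  # hypothetical
    "topics": "shop-orders",
    "auto.create": "true",
})

# urllib.request.urlopen(source) would submit it to a running Connect worker.
print(source.get_method(), source.full_url)
```

The point is that moving data from one database to another becomes two configuration documents rather than custom consumer and producer code.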
Kafka is a distributed system that supports very high throughput with low latency. It supports near-real-time streaming of data.
Kafka is widely adopted by some of the biggest companies around the world.
There are many tools and vendors that support your use case. They vary in price and support, and the right choice depends on which sources you need to take data from, which targets you wish to write to, and whether it should be CDC/near-real-time or a "batch" copy.

What data source can be used as the source for streaming services like Event Hubs and Apache Kafka? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
From what I read, both Event Hubs and Apache Kafka can be used for event streaming. My questions are:
1) What sort of data source can serve as a so-called "events" source for Event Hubs or Apache Kafka?
2) In which use cases should we use Event Hubs rather than Apache Kafka, and vice versa?
Thank you.
Azure Event Hubs is a fully managed event streaming service, whereas you need to manage an Apache Kafka server yourself. Both products have commonalities and differences; I don't really want to list them all here. Here is a good quick read comparing the two and when to choose one over the other: https://learn.microsoft.com/en-us/archive/blogs/opensourcemsft/choosing-between-azure-event-hub-and-kafka-what-you-need-to-know
The following are some of the scenarios where you can use Event Hubs:
Anomaly detection (fraud/outliers)
Application logging
Analytics pipelines, such as clickstreams
Live dashboarding
Transaction processing
User telemetry processing
Device telemetry streaming
I recommend starting with this doc: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-about

What capabilities does Apache Storm offer that are not now covered by Kafka Streaming? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
This post was edited and submitted for review 3 months ago and failed to reopen the post:
Original close reason(s) were not resolved
I'm very naive about data engineering but it seems to me that a popular pipeline for data used to be Kafka to Storm to something.... but as I understand it Kafka now seems to have data processing capabilities that may often render Storm unnecessary. So my question is simply, in what scenarios might this be true that Kafka can do it all, and in what scenarios might Storm still be useful?
EDIT:
Question was flagged for "opinion based".
This question tries to understand what capabilities Apache Storm offers that Apache Kafka Streaming does not (now that Kafka Streaming exists). The accepted answer touches on that. No opinions are requested by this question nor are they necessary to address the question. Question title edited to seem more objective.
You still need to deploy the Kafka code somewhere, e.g. YARN if using Storm.
Plus, Kafka Streams can only read from and write to topics in the same Kafka cluster, whereas Storm has spouts and bolts for other systems. Kafka Connect is one alternative for that.
Kafka has no external dependency on a cluster scheduler, and while you may write Kafka clients in almost any popular programming language, they still require something external to run them, whether that's a Docker container or a bare-metal deployment.
If anything, I'd say Heron or Flink are the truly comparable replacements for Storm.

Apache Pulsar vs. Apache RocketMQ [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
Apache Pulsar (by Yahoo) seems to be the next generation of Apache Kafka.
Apache RocketMQ (by Alibaba) seems to be the next generation of Apache ActiveMQ.
Both are open source distributed messaging and streaming data platforms.
But how do they compare? When should I prefer one over another in terms of features and performance?
Is Pulsar (like Kafka) strictly better at streaming, and RocketMQ (like ActiveMQ) strictly better at messaging?
Looks like you answered your own question.
To be fair, the main differences between Pulsar and RocketMQ are:
Pulsar is oriented to topics and multi-topic use.
RocketMQ is more oriented to batching and keeps an index of the messages.
With RocketMQ you still need an adaptor for backwards compatibility; in Pulsar, on the other hand, it comes built in.
RabbitMQ uses a push model and RocketMQ uses a pull model, since it has zero-loss tolerance.
Pulsar offers message priority; RocketMQ, since it's a queue, doesn't support that.

Kafka as an Akka-persistence journal [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Among the different options of a write-journal to implement Event Sourcing, Kafka seems a very reasonable choice from "outside":
It has a great ecosystem
It is well documented
It naturally supports streaming and listeners
However, looking into Akka persistence, it appears that a Kafka journal is supported only through a community-contributed package, which has not been modified for the last 2 years. Is Kafka not a good option? Are there better options? And if it is the best option, how are people using it with akka-persistence?
The problems with using Kafka as the event journal for akka-persistence (lack of atomic writes) are mentioned in this comment, which also lists it as a reason the plugin hasn't been maintained:
https://github.com/krasserm/akka-persistence-kafka/issues/28#issuecomment-138933868
In this thread, however, there is evidence that people are working on forks that work with the latest Kafka and Akka versions:
https://github.com/krasserm/akka-persistence-kafka/issues/20
You should have a look right here
It's a pull request of the fork maintained right here
This version uses Kafka 1.0 and the new producer API with transactions. We try to follow the Akka Persistence specification as closely as we can. We are sticking with Kafka because, for us, it is the best solution for event sourcing.
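To illustrate why atomic writes matter for an event journal, here is a toy in-memory sketch in plain Python. It is not the plugin's code and uses neither Kafka nor Akka; the `Journal` class, its methods, and the event names are hypothetical. A `None` entry stands in for a write that fails partway through a batch.

```python
class Journal:
    """Toy in-memory journal; a None entry stands in for a write that fails."""

    def __init__(self):
        self.events = []

    def append_each(self, batch):
        # Non-atomic: earlier events stay visible if a later write fails.
        for event in batch:
            if event is None:
                raise RuntimeError("write failed")
            self.events.append(event)

    def append_atomic(self, batch):
        # Atomic: stage everything, publish only if the whole batch succeeded.
        staged = []
        for event in batch:
            if event is None:
                raise RuntimeError("write failed")
            staged.append(event)
        self.events.extend(staged)


def try_write(write, batch):
    try:
        write(batch)
    except RuntimeError:
        pass  # a real caller would see the failure and retry or give up


batch = ["OrderCreated", None, "OrderPaid"]  # the second write fails

plain = Journal()
try_write(plain.append_each, batch)

atomic = Journal()
try_write(atomic.append_atomic, batch)

print(plain.events)   # ['OrderCreated'] (a partial batch is left behind)
print(atomic.events)  # [] (nothing was written)
```

Without atomicity, a replay would rebuild state from half a batch. A transactional producer, as mentioned above, is one way to get the all-or-nothing behavior on the Kafka side.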