Which messaging system should be used? [closed] - apache-kafka

I would like to transfer data from one DB system to other DB systems. Which messaging system (Kafka, ActiveMQ, RabbitMQ, and the like) would be better for achieving this with high throughput and performance?

I guess the answer to this type of question is "it depends".
You can probably find a lot of information on the internet comparing these message brokers. As far as I can share from our experience and knowledge, Kafka and its ecosystem tools, like Kafka Connect, provide the behavior you are asking for: source connectors and sink connectors, with Kafka in the middle.
Kafka Connect is a framework that allows adding plugins called connectors:
Sink connectors read from Kafka and send that data to the target system.
Source connectors read from the source store and write to Kafka.
Using Kafka Connect is "no code": you call a REST API to set the configuration of the connectors.
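As a rough sketch (not a definitive setup): registering a source connector might look like the following, assuming a Connect worker reachable at connect:8083 and a hypothetical JDBC source connector; the connector class, connection URL, and column names all depend on your environment and installed plugins.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // Hypothetical JDBC source connector config; adjust the class,
        // connection URL, and column names to your own environment.
        String config = """
            {
              "name": "my-db-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:postgresql://source-db:5432/mydb",
                "mode": "incrementing",
                "incrementing.column.name": "id",
                "topic.prefix": "db-"
              }
            }
            """;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://connect:8083/connectors")) // Connect REST endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        // Kafka Connect answers with the created connector's metadata.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

A sink connector (for the target database) is registered the same way, just with a sink connector class and its own configuration.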
Kafka is a distributed system that supports very high throughput with low latency, and it supports near-real-time streaming of data.
Kafka is widely adopted by the biggest companies around the world.
There are many tools and vendors that support your use case; they vary in price and support. It depends on which sources you need to take data from, which targets you wish to write to, and whether it should be a CDC/near-real-time or "batch" copy.

Related

Apache Kafka Streams API vs KSQL [closed]

I want to learn about the differences between the two methods. I developed a project that aggregates some data using the Apache Kafka Streams API, and after that I came across some solutions written with KSQL.
I've never had any experience with KSQL, so I would like to learn when and which approach I should select for aggregating things. Could I use KSQL instead of Kafka Streams?
There's a blog post somewhere that talks about the "Kafka abstraction funnel".
KSQL doesn't provide as much flexibility as Kafka Streams, which in turn abstracts many details of the core consumer/producer API.
If you have people who are more familiar with SQL and less comfortable with other client libraries, you'd use KSQL. If you run into a feature not supported by KSQL (think custom, complex data types), or you need to embed streaming logic into a larger application without remotely querying the ksqlDB REST API, use Kafka Streams.
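To make the trade-off concrete, here is a minimal sketch of a per-key count aggregation in Kafka Streams, with a roughly equivalent KSQL statement in a comment. The topic names (pageviews, pageview-counts), application id, and broker address are assumptions for the example:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class PageviewCounts {
    public static void main(String[] args) {
        // Roughly equivalent KSQL (assumed stream/table names):
        //   CREATE TABLE pageview_counts AS
        //     SELECT userid, COUNT(*) FROM pageviews GROUP BY userid;
        StreamsBuilder builder = new StreamsBuilder();
        KTable<String, Long> counts = builder
                .stream("pageviews", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()     // group records by their key (e.g. user id)
                .count();         // materialize a running count per key

        counts.toStream()
              .to("pageview-counts", Produced.with(Serdes.String(), Serdes.Long()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pageview-counts-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```

In KSQL that whole program collapses into one statement submitted over the REST API or CLI, but you give up compile-time types and the ability to call arbitrary Java from within the topology.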

Pitfalls of Kafka tiered storage [closed]

I have a question regarding the tiered storage feature in Kafka. I like this feature since, in my case, it means I can use Kafka as infinite storage (with a GCS backend, for example). However, let's suppose that for whatever reason the Kafka cluster gets deleted and the Kafka data is lost.
Is the data in the GCS/S3 store still useful?
I mean, can I plug the old logs into a new Kafka cluster, or are they totally useless now (terabytes of logs)?
By the way, I know I can analyze the segments in the GCS/S3 store and extract the data, but that's a bit hacky, which is why I'm trying to see if I can find a clean solution.
As of right now, if the cluster, or specifically the topic that has tiered storage enabled, gets deleted, the data in GCS/S3 will not be "reloaded" when you connect it to another cluster.
If you want to keep the data that's in GCS/S3, you will need to stream it to a new topic that does not have tiered storage enabled, or use Kafka Connect to independently write the data to a usable format, before deleting it. A sketch of the first approach follows.
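Here is a minimal sketch of that first approach using the plain consumer/producer API; the topic names (tiered-topic, archive-topic) and the broker address are hypothetical:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CopyTieredTopic {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "tiered-copy");
        // Start from the oldest offsets so the tiered (remote) segments are read too.
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("tiered-topic")); // topic with tiered storage enabled
            while (true) {
                for (ConsumerRecord<byte[], byte[]> record :
                        consumer.poll(Duration.ofSeconds(1))) {
                    // Re-publish each record to a plain, non-tiered topic.
                    producer.send(new ProducerRecord<>("archive-topic",
                            record.key(), record.value()));
                }
            }
        }
    }
}
```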
We do plan on improving this use case in the future.

Apache Pulsar vs. Apache RocketMQ [closed]

Apache Pulsar (by Yahoo) seems to be the next generation of Apache Kafka.
Apache RocketMQ (by Alibaba) seems to be the next generation of Apache ActiveMQ.
Both are open source distributed messaging and streaming data platforms.
But how do they compare? When should I prefer one over the other in terms of features and performance?
Is Pulsar (like Kafka) strictly better at streaming, and RocketMQ (like ActiveMQ) strictly better at messaging?
Looks like you answered your own question.
To be fair, the main points of comparison between Pulsar and RocketMQ are:
Pulsar is oriented toward topics and multi-topic subscriptions.
RocketMQ is more focused on batching and keeps an index of the messages.
With RocketMQ you still need an adaptor for backwards compatibility; Pulsar, on the other hand, comes with one built in.
RabbitMQ uses a push model, while RocketMQ uses a pull model, since it aims for zero-loss tolerance.
Pulsar offers message priority; RocketMQ, since it's a queue, doesn't support that.

Kafka as an Akka-persistence journal [closed]

Among the different options of a write-journal to implement Event Sourcing, Kafka seems a very reasonable choice from "outside":
It has a great ecosystem
It is well documented
It naturally supports streaming and listeners
However, looking into Akka Persistence, it appears that the Kafka journal is supported only through a community-contributed package, which has not been modified for the last 2 years. Is Kafka not a good option? Are there better options? And if it is the best option, how are people using it with akka-persistence?
The problems with using Kafka as the event journal for akka-persistence (lack of atomic writes) are mentioned in this comment, which also lists them as a reason the plugin hasn't been maintained:
https://github.com/krasserm/akka-persistence-kafka/issues/28#issuecomment-138933868
In this thread, however, there is evidence that people are working on forks that work with latest kafka and akka versions:
https://github.com/krasserm/akka-persistence-kafka/issues/20
You should have a look right here.
It's a pull request for the fork maintained right here.
This version uses Kafka 1.0 and the new producer API with transactions. We try our best to respect the Akka Persistence specification. We keep on with Kafka because, for us, it is the best solution for event sourcing.

IoT Streaming Architecture [closed]

I just started learning about IoT and data streaming. Apologies if this question seems too obvious or generic.
I am working on a school project which involves streaming data from hundreds (maybe thousands) of IoT sensors, storing said data in a database, then retrieving it for display on a web-based UI.
Things to note are:
fault tolerance and the ability to accept incomplete data entries
the database has to have the ability to load and query data by stream
I've looked around on Google for some ideas on how to build an architecture that can support these requirements. Here's what I have in mind:
1. Sensor data is collected by FluentD and converted into a stream.
2. Apache Spark manages a cluster of MongoDB servers:
a. The MongoDB servers are connected to the same storage.
b. Spark will handle fault tolerance and load balancing between the MongoDB servers.
3. BigQuery will be used for handling queries from the UI/web application.
That's my current idea of an IoT streaming architecture.
The question now is whether this architecture is feasible, or whether it would work at all. I'm open to any ideas and suggestions.
Thanks in advance!
Note that you could stream your device data directly into BigQuery and avoid an intermediate buffering step.
See:
https://cloud.google.com/bigquery/streaming-data-into-bigquery
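To illustrate what that looks like, here is a minimal sketch of a streaming insert using the Google Cloud BigQuery Java client. The dataset, table, and row schema (iot_dataset, sensor_readings, sensor_id/ts/value) are assumptions for the example, and credentials are taken from the default environment:

```java
import java.util.Map;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;

public class SensorToBigQuery {
    public static void main(String[] args) {
        // Uses application-default credentials from the environment.
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        // Hypothetical dataset and table with columns sensor_id, ts, value.
        TableId table = TableId.of("iot_dataset", "sensor_readings");

        InsertAllResponse response = bigquery.insertAll(
                InsertAllRequest.newBuilder(table)
                        .addRow(Map.of(
                                "sensor_id", "sensor-42",
                                "ts", "2017-01-01T00:00:00Z",
                                "value", 23.5))
                        .build());

        if (response.hasErrors()) {
            // Per-row errors, e.g. schema mismatches, are reported here.
            System.err.println("Insert errors: " + response.getInsertErrors());
        }
    }
}
```

In practice, each device (or a small gateway in front of the devices) would batch rows and call insertAll periodically rather than once per reading.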