Which team handles the Kafka producer code when an event is generated? - apache-kafka

I have a basic knowledge of Kafka topics/producers/consumers and brokers.
I would like to understand how this works in the real world.
For example, consider the use case below.
A user interacts with a web application.
When the user clicks on something, an event is generated.
So there will be one Kafka producer running which writes a message to a topic whenever an event is generated.
Then a consumer (for example, a Spark application) reads from the topic and processes the data.
Whose responsibility is it to take care of the producer code? A front-end Java/web developer's? Because web developers are familiar with events and Tomcat server logs.
Can anyone explain, in terms of developers, who is responsible for taking care of each part?

In a "standard" scenario, following people/roles are involved:
Infrastructure Dev: Setup Kafka Instance (f.e. openshift/strimzi)
manage topics, users
Frontend Dev: Creating the frontend (f.e. react)
Backend Dev: Implementing backendsystem (f.e. asp .net core)
handle DB Connections, logging, monitoring, IAM, business logic, handling Events, Produce kafka Events, ...)
App Dev anyone writing or managing the "other apps" (f.e.spark application). Consumes (commit) the kafka Events
Since there are plenty implementations of the producer/consumer kafka API it's kind of language agnostic, (see some libs). But you are right the dev implementing the features regarding kafka should at least be familiar with pub-sub.
Be aware we are talking about roles, so there are not necessarily four people involved, it could also just be one person doing the whole job. Also this is just a generic real world scenario and can be completely different in your specific usecase/environment.
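To make the division of responsibility concrete: in this kind of setup the producer code typically lives with the backend role, in the service that receives the click over HTTP, not in the browser itself. A minimal Java sketch, with a hypothetical topic name and payload (an illustration, not code from the question):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ClickEventProducer {

    private final KafkaProducer<String, String> producer;

    public ClickEventProducer(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    // Called by the backend request handler when a user click arrives over HTTP.
    public void onUserClick(String userId, String clickPayload) {
        // Key by user id so events from the same user land in the same partition.
        producer.send(new ProducerRecord<>("user-click-events", userId, clickPayload));
    }
}
```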

Related

Which messaging system for a web dashboard?

I would like to build a web dashboard system and I am facing a problem. I need to get information that is in the cache of one of the instances of my program; for this I had thought of doing pub/sub with Kafka, however I don't know how to publish and then get a response from one of my subscribers. Do you know a pattern that allows this, and a service that would let me do it?
EDIT: I would like to design an infrastructure that follows this pattern:
The attached diagram shows a simple request->response flow; Kafka is designed for different types of architecture, so IMHO you should not focus on Kafka in this case.
However, if you still want to use Kafka for some other reason, I can suggest two options:
Stick with the request->response flow and use ReplyingKafkaTemplate or AggregatingReplyingKafkaTemplate to handle it; the second is an extension of the first and adds functionality for handling more than one response. You can send a request to a Kafka topic from the Dashboard application, have one of the Bot instances poll the message and send a reply to a reply topic, and then process the reply in the Dashboard application.
Use Kafka to implement the Event-Carried State Transfer pattern: move the state (the mutual guilds data) from the Bot instances directly to the Dashboard application via a Kafka topic. You can use several tools to implement this:
Bot applications send events to a Kafka topic via a simple KafkaProducer or KafkaTemplate, then one of the Kafka Connect sink connectors saves the data in the Dashboard's database.
Bot applications send events to a Kafka topic via a simple KafkaProducer or KafkaTemplate. Run a Kafka Streams thread in the Dashboard application and build the state using Kafka Streams functionality (grouping, aggregating, etc.), then read the state directly from the Kafka Streams internal RocksDB store; a minimal sketch of this option follows below.
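A rough sketch of that last sub-option, assuming String keys and values and hypothetical topic/store names ("bot-guild-events", "guild-state-store"); this is an illustration, not code from the question or answer:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

import java.util.Properties;

public class DashboardStateApp {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dashboard-state");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Bot instances publish state-change events keyed by guild id;
        // a KTable keeps the latest value per key in a local RocksDB-backed store.
        builder.table("bot-guild-events",
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("guild-state-store"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // Wait until the store is queryable (crude, for illustration only).
        while (streams.state() != KafkaStreams.State.RUNNING) {
            Thread.sleep(100);
        }

        // The Dashboard reads the materialized state directly ("interactive queries"),
        // without an extra database in between.
        ReadOnlyKeyValueStore<String, String> store = streams.store(
                StoreQueryParameters.fromNameAndType("guild-state-store",
                        QueryableStoreTypes.keyValueStore()));
        System.out.println("State for guild-42: " + store.get("guild-42"));
    }
}
```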

Publish to Apache Kafka topic from Angular front end

I need to create a solution that receives events from a web/desktop application that runs on kiosks. There are hundreds of kiosks spread across the country, and each one generates automatic events from time to time as well as events when something happens.
Although this is a locked-down desktop application, it is built in Angular v8; I mean, it runs in a webview.
I was researching scalable yet reliable solutions and found that Apache Kafka seems to be a great fit. I know there are clients for NodeJS but couldn't find any option for Angular. Angular runs in the browser, so it must communicate with the backend over HTTP/S.
In the end, I realized the best way to send events from Angular is to create an API that receives messages at an HTTP/S endpoint and publishes them to a Kafka topic. Or is there any adapter for Kafka that exposes topics as REST?
I suppose this approach is much faster than storing the messages in a database. Is this statement correct?
Thanks in advance.
this approach is way faster than store message in database. Is this statement correct?
It can be slower. Kafka is asynchronous, so don't expect to get a response in the same time period you could perform a database read/write. (Again, this would require some API, and it also largely depends on the database used.)
is there any adapter for Kafka that exposes topics as REST?
Yes, the Confluent REST Proxy is an Apache2 licensed product.
There is also the divolte/divolte-collector project for collecting click data and other browser-driven events.
Otherwise, as you've discovered, create your own API in any language you are comfortable with, and have it use a Kafka producer client.
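For that last option (your own API in front of a Kafka producer), a minimal sketch using Spring Boot with spring-kafka; the endpoint path, topic name, and payload shape are assumptions for illustration:

```java
import org.springframework.http.ResponseEntity;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class KioskEventController {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public KioskEventController(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // The Angular app POSTs each event here over HTTPS; the backend publishes it to Kafka.
    @PostMapping("/events")
    public ResponseEntity<Void> publish(@RequestParam String kioskId, @RequestBody String payload) {
        // Key by kiosk id so events from the same kiosk stay ordered within a partition.
        kafkaTemplate.send("kiosk-events", kioskId, payload);
        return ResponseEntity.accepted().build();
    }
}
```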

architecture pattern for microservices

I have a microservices architecture whose logs have to be sent to a remote Kafka topic.
Next to it, the consumer of this topic will send the logs to an ELK stack (run by another team).
I want to have a dedicated microservice (fwk-proxy-elasticsearch) whose responsibility is to collect the logs from the other ones and send them to the remote Kafka topic.
What's the best protocol to dispatch all the logs aggregated from my microservices to the fwk-proxy-elasticsearch microservice?
I want this pattern to avoid duplicating the security configuration of the remote Kafka topic; I want to centralize it in a single place.
May I use the Vert.x event bus for that? Or is Kafka better? Or some other tool?
May I use Vert.x to send messages from JVM to JVM?
Moreover, in a microservice architecture, is it a good pattern to centralize a use case in a dedicated microservice (a remote HTTP connection, for example)?
From my point of view, it allows the business microservices to focus on business issues and not worry about the protocol over which the result has to be sent.
Thanks!
I believe you can use both the Vert.x event bus and Kafka to propagate the logs; there are pros and cons to each approach.
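If you do go the Vert.x route, a rough sketch of what the fwk-proxy-elasticsearch service could look like; the event bus address and topic name are made up, and note that the Vert.x event bus only crosses JVM boundaries when Vert.x runs in clustered mode:

```java
import io.vertx.core.Vertx;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class LogForwarder {
    public static void main(String[] args) {
        // The Kafka security settings (SASL/SSL, credentials, truststore, ...) are
        // configured once, here only, so the business microservices never see them.
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "remote-kafka:9093");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // Business services publish their log lines to this event bus address;
        // the proxy forwards each one to the remote Kafka topic.
        Vertx vertx = Vertx.vertx();
        vertx.eventBus().<String>consumer("logs.forward", message ->
                producer.send(new ProducerRecord<>("remote-log-topic", message.body())));
    }
}
```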
While I understand the reasoning behind this decision, I would still consider a dedicated solution built for this purpose, like Fluentd, which is able to aggregate the logs and push them to multiple destinations (including Kafka, via the dedicated plugin). I'm sure there are other similar solutions.
There are a couple of important benefits I see in using a dedicated solution instead of building it yourself:
The level of configurability, which is definitely useful in the future (with a home-grown solution, you need to write code each time you want to build something new)
The number of destinations where you can export the logs
Support for a hybrid architecture - with a few config updates, you will be able to grab logs from non-JVM microservices

What alternatives for event sourcing except Apache Kafka?

I really like the idea of event sourcing. The main advantage for me is this:
If you build microservices, then with event sourcing it becomes very easy for them to communicate. Your components are decoupled; all they need to know is where the event store is.
What is the simplest event store you know of? I just want to store the events that occur in my application and let other components watch it for new events as they come in.
I'm using Scala.
I have experience with Apache Kafka; there are many libraries for reading Kafka topics (e.g. akka-stream-kafka).
But Apache Kafka is a clustered system. It's hard to deploy and set up, and that is the hardest part for me. I want to build my application and work on the service logic, not set up a Kafka cluster. I have heard about Vert.x and its event bus, but I haven't tried it yet.
Event sourcing is not about the tool, but about the design. You can do event sourcing even with MySQL.
However, on the tooling side, you may check:
Lagom - I think it is superseding Akka, from the same team, but it seems to be easier.
EventStore - a simple event store from Greg Young
Event Sourcing is not about communicating between services but rather about storing data as an immutable log (typically within a service). For this use case Kafka is not a particularly good option. Read this post about some of the reasons why.
However, Kafka can be paired with an Event Sourcing solution to provide distribution of events to other services.
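To illustrate the point above that event sourcing is a design rather than a tool, here is a minimal sketch of an append-only event log over plain JDBC (table and column names are invented; the question mentions Scala, but the idea is language-agnostic):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SimpleEventStore {

    private final Connection conn;

    public SimpleEventStore(String jdbcUrl) throws Exception {
        this.conn = DriverManager.getConnection(jdbcUrl);
        try (PreparedStatement ddl = conn.prepareStatement(
                "CREATE TABLE IF NOT EXISTS events (" +
                "  seq BIGINT AUTO_INCREMENT PRIMARY KEY," +
                "  stream_id VARCHAR(64) NOT NULL," +
                "  event_type VARCHAR(64) NOT NULL," +
                "  payload TEXT NOT NULL)")) {
            ddl.execute();
        }
    }

    // Append-only: events are inserted, never updated or deleted.
    public void append(String streamId, String eventType, String payload) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO events (stream_id, event_type, payload) VALUES (?, ?, ?)")) {
            ps.setString(1, streamId);
            ps.setString(2, eventType);
            ps.setString(3, payload);
            ps.executeUpdate();
        }
    }

    // Other components poll for events after the last sequence number they have seen.
    public ResultSet readSince(long lastSeenSeq) throws Exception {
        PreparedStatement ps = conn.prepareStatement(
                "SELECT seq, stream_id, event_type, payload FROM events WHERE seq > ? ORDER BY seq");
        ps.setLong(1, lastSeenSeq);
        return ps.executeQuery();
    }
}
```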

Kafka user - project design advice

I am new to Kafka and data streaming and need some advice for the following requirement.
Our system is expecting close to 1 million incoming messages per day. Each message carries a project identifier, and a message should be pushed only to users of that project. For our case, let's say we have projects A, B and C; users who open project A's dashboard only see/receive messages of project A.
This is my idea so far for implementing a solution to the requirement:
The messages should be pushed to a Kafka topic as they arrive; let's call this the Root Topic. Once messages are in the Root Topic, a Kafka consumer/listener reads them and, based on the project identifier in each message, pushes it to a project-specific topic, so any message ends up in Topic A, B or C. I am thinking of using WebSockets to update the messages as they arrive on the project users' dashboards. There will be N consumers/listeners for the N project topics, and these consumers push the project-specific messages to the project-specific WebSocket endpoints.
Please advise if I can make any improvements to the above design.
I chose Kafka as the messaging system here as it is highly scalable and fault tolerant.
There is no complex transformation or data enrichment before the data gets sent to the client. Does it make sense to use Apache Flink or Hazelcast Jet for the streaming, or is Kafka Streams good enough for this simple requirement?
Also, when should I consider using Hazelcast Jet or Apache Flink in my project?
Should I use Flink, say, when I have to update a few properties in the message based on a web service call or database lookup before sending it to the users?
Should I use Hazelcast Jet only when I need the entire dataset in memory to arrive at a property value? Or will using Jet bring some benefits even for my simple use case specified above? Please advise.
Kafka Streams is a great tool for converting one Kafka topic into another Kafka topic.
What you need is a tool that moves data from a Kafka topic to another system via WebSockets.
A stream processor gives you convenient tooling to build this data pipeline (among other things, connectors to Kafka and WebSockets, and a scalable, fault-tolerant execution environment), so you might want to use a stream processor even if you don't transform the data.
The benefit of Hazelcast Jet is its embedded, scalable caching layer. You might want to cache your database/web service calls so that the enrichment is performed locally, reducing remote service calls.
See how to use Jet to read from Kafka and how to write data to a TCP socket (not a WebSocket).
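For the routing step in the question (Root Topic to project-specific topics), a minimal Kafka Streams sketch, assuming the project identifier is the record key and using invented topic names; the per-project topics have to exist already, since Streams does not create sink topics for you:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class ProjectRouter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "project-router");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("root-topic")
               // Route each message to a per-project topic derived from its key,
               // e.g. key "A" -> topic "project-A".
               .to((projectId, message, recordContext) -> "project-" + projectId);

        new KafkaStreams(builder.build(), props).start();
    }
}
```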
I would like to give you another option. I'm not a Spark/Jet expert at all, but I've been studying them for a few weeks.
I would use Pentaho Data Integration (Kettle) to consume from Kafka, and I would write a Kettle step (or a User Defined Java Class step) to write the messages to a Hazelcast IMap.
Then I would use this approach http://www.c2b2.co.uk/middleware-blog/hazelcast-websockets.php to provide the WebSockets for the end users.