Publish to Apache Kafka topic from Angular front end - apache-kafka

I need to create a solution that receives events from web/desktop application that runs on kiosks. There are hundreds of kiosks spread across the country and each one generate time to time automatic events and events when something happens.
Despite this application is a locked desktop application it is built in Angular v8. I mean, it runs in a webview.
I was researching for scalable but reliable solutions and found Apache Kafka seems to be a great solution. I know there are clients for NodeJS but couldn't find any option for Angular. Angular runs on browser, for this reason, it must communicate to backend through HTTP/S.
In the end, I realized the best way to send events from Angular is to create a API that just gets message from a HTTP/S endpoint and publishes to Kafka topic. Or, is there any adapter for Kafka that exposes topics as REST?
I suppose this approach is way faster than store message in database. Is this statement correct?
Thanks in advance.

this approach is way faster than store message in database. Is this statement correct?
It can be slower. Kafka is asynchronous, so don't expect to get a response in the same time-period you could perform a database read/write. (Again, would require some API, and also, largely depends on the database used)
is there any adapter for Kafka that exposes topics as REST?
Yes, the Confluent REST Proxy is an Apache2 licensed product.
There is also a project divolte/divolte-collector for collecting click-data and other browser-driven events.
Otherwise, as you've discovered, create your own API in any language you are comfortable with, and have it use a Kafka producer client.

Related

Should one service take care of both processing Kafka messages and API calls simultaneously?

We want to subscribe to a Kafka topic with a microservice. So that the service does not have to accept API calls from the surrounding systems and Kafka messages at the same time, we would like to interpose an 'importer service'. However, in the end we would have the same problem again because the importer service now has to process both the Kafka messages and the API calls from the aforementioned microservice. As a solution to this problem, we considered giving both services access to the same database. The importer service could then receive the Kafka message, process it and write it to the database. The original microservice would then not go to the importer service, but would get the data directly from the DB. However, the approach seems a bit dirty, since you shouldn't share databases between services. Do you have any ideas how to solve this more elegantly? And if there isn't a better approach, should one service really take care of processing Kafka messages and API calls simultaneously?
One service should not process both kafka and api messages.
You can make a service that wraps the database and both services will communicate with it.

Kafka producer code will be handled by which team when an event is generated

I have a basic knowledge of kafka topic/producer/consumer and broker.
I would like to understand how this works in realworld.
For example Consider below usecase .
User interacts with Web application
when user clicks on something an event is generated
So there will be one kafka producer running which writes messages to atopic when an event is generated
Then Consumer(for Ex: spark application reads from topic and process the data)
Whose responsibility it is to take care of the producer code? A frond end java/Web developer's? Because web developers are familiar with events and tomcat server logs.
Can anyone explain interms of developer and responsibility of taking care of each section.
In a "standard" scenario, following people/roles are involved:
Infrastructure Dev: Setup Kafka Instance (f.e. openshift/strimzi)
manage topics, users
Frontend Dev: Creating the frontend (f.e. react)
Backend Dev: Implementing backendsystem (f.e. asp .net core)
handle DB Connections, logging, monitoring, IAM, business logic, handling Events, Produce kafka Events, ...)
App Dev anyone writing or managing the "other apps" (f.e.spark application). Consumes (commit) the kafka Events
Since there are plenty implementations of the producer/consumer kafka API it's kind of language agnostic, (see some libs). But you are right the dev implementing the features regarding kafka should at least be familiar with pub-sub.
Be aware we are talking about roles, so there are not necessarily four people involved, it could also just be one person doing the whole job. Also this is just a generic real world scenario and can be completely different in your specific usecase/environment.

Solution architecture Kafka

I am working with a third party vendor who I asked to provide me the events generated by a website
The vendor proposed to stream the events using Kafka ... why not...
On my side (the client) I am running a 100% MSSQL/Windows production environment and internal business want to have kpi and dashboard on website activities
Now the question - what would be the architecture to support a PoC so I can manage the inputs on one hand and create datamarts to deliver business needs?
Not clear what you mean by "events from website". Your Kafka producers are typically server side components, as you make API requests, you'd put Kafka producing events between those requests and your databases calls. I would be surprised if any third-party would just be able to do that immediately
Maybe you're looking for something like https://divolte.io/
You can also use CDC products to stream events out of your database
The architecture could be like this. The app streams event to Kafka. You can write a service to read the data from Kafka, do transformation and write to Database. You can then build Dashboard on top of DB.
Alternatively, you can populate indexes in Elastic Search and build Kibana dashboard as well.
My suggestion would be to use Lambda architecture to cater both Real-time and Batch processing needs:
Architecture:
Lambda architecture is designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods.
This architecture attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data.
Another Solution:

Kafka user - project design advise

I am new to Kafka and data streaming and need some advice for the following requirement,
Our system is expecting close to 1 million incoming messages per day. The message carries a project identifier. The message should be pushed to users of only that project. For our case, lets say we have projects A, B and C. Users who opens project A's dashboard only sees / receives messages of project A.
This is my idea so far on implementing solution for the requirement,
The messages should be pushed to a Kafka Topic as they arrive, lets call this topic as Root Topic. The messages once pushed to the Root Topic, can be read by a Kafka Consumer/Listener and based on the project identifier in the message can push that message to a project specific Topic. So any message can end up at Topic A or B or C. Thinking of using websockets to update the message as they arrive on the project users' dashboards. There will be N Consumers/Listeners for the N project Topics. These consumers will push the project specific message to the project specifc websocket endpoints.
Please advise if I can make any improvements to the above design.
Chose Kafka as the messaging system here as it is highly scalable and fault tolerant.
There is no complex transformation or data enrichment before it gets sent to the client. Will it makes sense to use Apache Flink or Hazelcast Jet for the streaming or Kafka streaming is good enough for this simple requirement.
Also, when should I consider using Hazelcast Jet or Apache Flink in my project.
Should i use Flink say when I have to update few properties in the message based on a web service call or database lookup before sending it to the users?
Should I use Hazelcast Jet only when I need the entire dataset in memory to arrive at a property value? or will using Jet bring some benefits even for my simple use case specified above. Please advise.
Kafka Streams are a great tool to convert one Kafka topic to another Kafka topic.
What you need is a tool to move data from a Kafka topic to another system via web sockets.
Stream processor gives you a convenient tooling to build this data pipeline (among others connectors to Kafka and web sockets and scalable, fault-tolerant execution environment). So you might want use stream processor even if you don't transform the data.
The benefit of Hazelcast Jet is it's embedded scalable caching layer. You might want to cache your database/web service calls so that the enrichment is performed locally, reducing remote service calls.
See how to use Jet to read from Kafka and how to write data to a TCP socket (not websocket).
I would like to give you another option. I'm not Spark/Jet expert at all, but I've studying them for a few weeks.
I would use Pentaho Data Integration(kettle) to consume from the Kafka and I would write a kettle step (or User Defined Java Class step) to write the messages to a Hazelcast IMAP.
Then, would use this approach http://www.c2b2.co.uk/middleware-blog/hazelcast-websockets.php to provided the Websockets for the end-users.

Is a message queue like RabbitMQ the ideal solution for this application?

I have been working on a project that is basically an e-commerce. It's a multi tenant application in which every client has its own domain and the website adjusts itself based on the clients' configuration.
If the client already has a software that manages his inventory like an ERP, I would need a medium on which, when the e-commerce generates an order, external applications like the ERP can be notified that this has happened to take actions in response. It would be like raising events over different applications.
I thought about storing these events in a database and having the client make requests in a short interval to fetch the data, but something about polling and using a REST Api for this seems hackish.
Then I thought about using Websockets, but if the client is offline for some reason when the event is generated, the delivery cannot be assured.
Then I encountered Message Queues, RabbitMQ to be specific. With a message queue, modeling the problem in a simplistic manner, the e-commerce would produce events on one end and push them to a queue that a clients worker would be processing as events arrive.
I don't know what is the best approach, to be honest, and would love some of you experienced developers give me a hand with this.
I do agree with Steve, using a message queue in your situation is ideal. Message queueing allows web servers to respond to requests quickly, instead of being forced to perform resource-heavy procedures on the spot. You can put your events to the queue and let the consumer/worker handle the request when the consumer has time to handle the request.
I recommend CloudAMQP for RabbitMQ, it's easy to try out and you can get started quickly. CloudAMQP is a hosted RabbitMQ service in the cloud. I also recommend this RabbitMQ guide: https://www.cloudamqp.com/blog/2015-05-18-part1-rabbitmq-for-beginners-what-is-rabbitmq.html
Your idea of using a message queue is a good one, better than database or websockets for the reasons you describe. With the message queue (RabbitMQ, or another server/broker based system such as Apache Qpid) approach you should consider putting a broker in a "DMZ" sort of network location so that your internal ecommerce system can push events out to it, and your external clients can reach into without risking direct access to your core business systems. You could also run a separate broker per client.