How to implement KAFKA in website activity tracking

How to implement KAFKA in website activity tracking - apache-kafka

I am planning to track all the user activity on my website and kafka seems to be a reliable solution. I am unable to resolve how to send generated events to kafka i.e. how to make my website as a kafka producer.

Which language is your website written in?
In Java/Scala the solution is to import the kafka producer dependencies, create a producer, create messages and send them.
I hope this example will help:
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example

Related

Which messaging system for a web dashboard?

I would like to make a Web Dashboard system and I am facing a problem. I need to get an information that is in the cache of one of the instances of my program, for this I had thought of doing Pub/Sub with Kafka however I don't know how to do to Publish and get a response from one of my Subscriber. Do you know a pattern that allows this and a service that allows me to do this?
EDIT: I would like to design an infrastructure that follows this pattern:

Attached diagram is showing simple request->response flow, Kafka is designed for different types of architecture, so IMHO you should not focus on Kafka in this case.
However, if you still want to use Kafka for some other reasons I can suggest to you two options:
Stick with request->response flow and use ReplyingKafkaTemplate or AggregatingKafkaTemplate to handle it, second one is an extension of first one, this adds functionality to handle more responses then one. You can send a request to Kafka topic from the Dashboard application, then poll the message by one of the Bot instances, next, send reply to reply topic, and then process reply in Dashboard application.
Use Kafka to implement Event-Carried State Transfer pattern, move state (mutual guilds data) from Bot Instances directly to Dashboard application via Kafka topic. You can use several tools to implement this:
Bot applications send events to Kafka topic via simple KafkaProducer or KafkaTemplate, then use one of the Kafka Connect sink connectors to save data in Dashboards database.
Bot applications send events to Kafka topic via simple KafkaProducer or KafkaTemplate. Run Kafka Streams thread in Dashboard application and build a state using Kafka Streams functionalities - grouping, aggregating etc. Then read the state directly from Kafka Streams internal RocksDB database.

Publish to Apache Kafka topic from Angular front end

I need to create a solution that receives events from web/desktop application that runs on kiosks. There are hundreds of kiosks spread across the country and each one generate time to time automatic events and events when something happens.
Despite this application is a locked desktop application it is built in Angular v8. I mean, it runs in a webview.
I was researching for scalable but reliable solutions and found Apache Kafka seems to be a great solution. I know there are clients for NodeJS but couldn't find any option for Angular. Angular runs on browser, for this reason, it must communicate to backend through HTTP/S.
In the end, I realized the best way to send events from Angular is to create a API that just gets message from a HTTP/S endpoint and publishes to Kafka topic. Or, is there any adapter for Kafka that exposes topics as REST?
I suppose this approach is way faster than store message in database. Is this statement correct?
Thanks in advance.

this approach is way faster than store message in database. Is this statement correct?
It can be slower. Kafka is asynchronous, so don't expect to get a response in the same time-period you could perform a database read/write. (Again, would require some API, and also, largely depends on the database used)
is there any adapter for Kafka that exposes topics as REST?
Yes, the Confluent REST Proxy is an Apache2 licensed product.
There is also a project divolte/divolte-collector for collecting click-data and other browser-driven events.
Otherwise, as you've discovered, create your own API in any language you are comfortable with, and have it use a Kafka producer client.

Process messages pushed through Kafka

I haven't used Kafka before and wanted to know if messages are published through Kafka what are the possible ways to capture that info?
Is Kafka only way to receive that info via "Consumers" or can Rest APIs be also used here?
Haven't used Kafka before and while reading up I did find that Kafka needs ZooKeeper running too.
I don't need to publish info just process data received from Kafka publisher.
Any pointers will help.

Kafka is a distributed streaming platform that allows you to process streams of records in near real-time.
Producers publish records/messages to Topics in the cluster.
Consumers subscribe to Topics and process those messages as they are available.
The Kafka docs are an excellent place to get up to speed on the core concepts: https://kafka.apache.org/intro

Is Kafka only way to receive that info via "Consumers" or can Rest APIs be also used here?
Kafka has its own TCP based protocol, not a native HTTP client (assuming that's what you actually mean by REST)
Consumers are the only way to get and subsequently process data, however plenty of external tooling exists to make it so you don't have to write really any code if you don't want to in order to work on that data

Kafka user - project design advise

I am new to Kafka and data streaming and need some advice for the following requirement,
Our system is expecting close to 1 million incoming messages per day. The message carries a project identifier. The message should be pushed to users of only that project. For our case, lets say we have projects A, B and C. Users who opens project A's dashboard only sees / receives messages of project A.
This is my idea so far on implementing solution for the requirement,
The messages should be pushed to a Kafka Topic as they arrive, lets call this topic as Root Topic. The messages once pushed to the Root Topic, can be read by a Kafka Consumer/Listener and based on the project identifier in the message can push that message to a project specific Topic. So any message can end up at Topic A or B or C. Thinking of using websockets to update the message as they arrive on the project users' dashboards. There will be N Consumers/Listeners for the N project Topics. These consumers will push the project specific message to the project specifc websocket endpoints.
Please advise if I can make any improvements to the above design.
Chose Kafka as the messaging system here as it is highly scalable and fault tolerant.
There is no complex transformation or data enrichment before it gets sent to the client. Will it makes sense to use Apache Flink or Hazelcast Jet for the streaming or Kafka streaming is good enough for this simple requirement.
Also, when should I consider using Hazelcast Jet or Apache Flink in my project.
Should i use Flink say when I have to update few properties in the message based on a web service call or database lookup before sending it to the users?
Should I use Hazelcast Jet only when I need the entire dataset in memory to arrive at a property value? or will using Jet bring some benefits even for my simple use case specified above. Please advise.

Kafka Streams are a great tool to convert one Kafka topic to another Kafka topic.
What you need is a tool to move data from a Kafka topic to another system via web sockets.
Stream processor gives you a convenient tooling to build this data pipeline (among others connectors to Kafka and web sockets and scalable, fault-tolerant execution environment). So you might want use stream processor even if you don't transform the data.
The benefit of Hazelcast Jet is it's embedded scalable caching layer. You might want to cache your database/web service calls so that the enrichment is performed locally, reducing remote service calls.
See how to use Jet to read from Kafka and how to write data to a TCP socket (not websocket).

I would like to give you another option. I'm not Spark/Jet expert at all, but I've studying them for a few weeks.
I would use Pentaho Data Integration(kettle) to consume from the Kafka and I would write a kettle step (or User Defined Java Class step) to write the messages to a Hazelcast IMAP.
Then, would use this approach http://www.c2b2.co.uk/middleware-blog/hazelcast-websockets.php to provided the Websockets for the end-users.

Http Kafka producer

Our application receives events through a HAProxy server on HTTPs, which should be forwarded and stored to Kafka cluster.
What should be the best option for this ?
This layer should receive events from HAProxy & produce them to Kafka cluster, in a reliable and efficient way (and should scale horizontally).
Please suggest.

I'd suggest to write a simple application in Java that just receives events and sends it to Kafka. The Java client for Kafka is the official client thus is the most reliable. The other option is to use an arbitrary language together with the official Kafka REST Proxy.
Every instance of the app should send the messages to all partitions based on some partition key. Then you can run multiple instances of the app and they don't even need to know about each other.

Just write a simple application which consumes the messages from the Proxy
and send the response which you have obtained to the producer by setting the Kafka Configurationsproducer.data(). If the configurations are done successfully. you can able to consume the messages from the Proxy server which you use and see the response output in /tmp/kafka-logs/topicname/00000000000000.log.
this link will help you to tritw enter link description here
Good Day
Keep Coding

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse