Solutions for a Kafka project to analyze HTTP requests on a web server

Context:
A web server receives millions of HTTP requests every day. Of
course, there is a project (named "handler") that is responsible for handling
these requests and responding to them with some information.
On the server side, I would like to use Kafka to extract some information from these requests and analyze it in real time (or at fixed time intervals).
Question:
How can I use these requests as the producer for Kafka?
How do I build a Kafka consumer? (All this data needs to be analyzed and then returned, but Kafka is "just" a messaging system.)
Some ideas:
A1.1 Maybe I can have the "handler" project call the Kafka jar, so that it triggers the producer code to send a message to Kafka.
A1.2 Maybe I can create another project that listens to all the HTTP requests on the server, but the server also receives other HTTP requests.
I have thought of several solutions, but I am not sure about them. Do you know of any mature approaches, or do you have ideas on how to implement this?

You can use the ELK stack, with Kafka as the log broker.
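Idea A1.1 from the question could look roughly like the sketch below: the handler turns each request into a small keyed event and hands it to a producer. This is only an illustration under assumptions. `publish` is a hypothetical stand-in for a real Kafka producer's send call, and the topic name `http-requests` and the event fields are made up for the example.

```python
import json
import time
from typing import Callable, Tuple

# Hypothetical shape of "something that sends to Kafka":
# (topic, key_bytes, value_bytes) -> None. A real producer client would go here.
Publish = Callable[[str, bytes, bytes], None]


def request_to_event(method: str, path: str, status: int,
                     client_ip: str, received_at: float) -> Tuple[bytes, bytes]:
    """Turn the facts about one HTTP request into a (key, value) pair of bytes."""
    event = {
        "method": method,
        "path": path,
        "status": status,
        "client_ip": client_ip,
        "ts": received_at,
    }
    key = client_ip.encode("utf-8")  # assumption: group events by client
    value = json.dumps(event, sort_keys=True).encode("utf-8")
    return key, value


def handle_request(method: str, path: str, client_ip: str, publish: Publish,
                   topic: str = "http-requests", now=time.time) -> int:
    """Serve the request as usual, then emit one event describing it."""
    status = 200  # placeholder: the real handler's own logic decides this
    key, value = request_to_event(method, path, status, client_ip, now())
    publish(topic, key, value)
    return status
```

The analysis side would then be a separate consumer application reading that topic, so the handler never waits on the analysis.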

Related

Publish to Apache Kafka topic from Angular front end

I need to create a solution that receives events from a web/desktop application that runs on kiosks. There are hundreds of kiosks spread across the country, and each one generates automatic events from time to time, as well as events when something happens.
Although this is a locked-down desktop application, it is built in Angular v8; that is, it runs in a webview.
I was researching scalable but reliable solutions and found that Apache Kafka seems to be a great fit. I know there are clients for NodeJS, but I couldn't find any option for Angular. Angular runs in the browser, so it must communicate with the backend through HTTP/S.
In the end, I realized the best way to send events from Angular is to create an API that just gets a message from an HTTP/S endpoint and publishes it to a Kafka topic. Or is there any adapter for Kafka that exposes topics as REST?
I suppose this approach is way faster than storing messages in a database. Is this statement correct?
Thanks in advance.
this approach is way faster than storing messages in a database. Is this statement correct?
It can be slower. Kafka is asynchronous, so don't expect a response in the same time it takes to perform a database read or write. (Again, that would require some API, and it largely depends on the database used.)
is there any adapter for Kafka that exposes topics as REST?
Yes, the Confluent REST Proxy is an Apache2 licensed product.
There is also a project divolte/divolte-collector for collecting click-data and other browser-driven events.
Otherwise, as you've discovered, create your own API in any language you are comfortable with, and have it use a Kafka producer client.
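For the REST Proxy route, a produce call is just an HTTP POST, so it can come from anything that speaks HTTP. Here is a minimal sketch of the request shape as I understand the v2 API (`POST /topics/{topic}` with a `records` array). The proxy address `http://localhost:8082`, the topic `kiosk-events` and the event fields are assumptions for illustration, and the request is only built, not sent.

```python
import json
import urllib.request


def build_produce_request(proxy_url: str, topic: str, events: list) -> urllib.request.Request:
    """Build (but do not send) a REST Proxy v2 produce request for JSON events."""
    # Each event becomes one record; keys and partitions are left to the proxy.
    body = json.dumps({"records": [{"value": event} for event in events]})
    return urllib.request.Request(
        url=f"{proxy_url.rstrip('/')}/topics/{topic}",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


# Usage (hypothetical address and topic); sending requires a running proxy:
# req = build_produce_request("http://localhost:8082", "kiosk-events",
#                             [{"kiosk": "k-17", "type": "heartbeat"}])
# urllib.request.urlopen(req, timeout=5)
```

The same URL, headers and body would apply when the call is made from the Angular app's HTTP client instead.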

What happens to subscribers when the Kafka service is down? Do they need to re-subscribe to a specific topic when it restarts?

Currently I have to send events externally to a client, which needs to subscribe to these events. I have an endpoint that the client calls (subscribe) that follows the Server-Sent Events specification. This opens an HTTP connection that the server keeps alive by sending "heartbeat" events.
The problem is that when this service needs to be redeployed, or it goes down, it is the client's responsibility to re-subscribe by calling this endpoint in order to receive events in real time again.
I was wondering: if I switch to a technology like RabbitMQ or Kafka, can I solve this problem? In other words, I would like to remove the client's responsibility to re-subscribe if something goes wrong on the server side.
If you can attach articles/resources to your answer, that would be great.
With RabbitMQ, the reconnection feature depends on the client library. For example, the Java and .NET clients do provide this (check here).
With Kafka, I see there are configurations to support this. It's also worth reading the excellent recommendations from Kafka for surviving broker outages here.
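To give a feel for what such reconnection settings control: the Kafka clients expose `reconnect.backoff.ms` and `reconnect.backoff.max.ms`, which bound how long the client waits between reconnection attempts. The sketch below is a generic capped exponential backoff with jitter, not the client's actual code. The defaults of 50 and 1000 mirror those two settings, and the jitter fraction is an assumption.

```python
import random


def reconnect_delays(base_ms: float = 50, max_ms: float = 1000, attempts: int = 8,
                     jitter: float = 0.2, rng=random.random) -> list:
    """Return the wait (in ms) before each consecutive reconnection attempt."""
    delays = []
    for attempt in range(attempts):
        # Double the wait after every failure, but never exceed the cap.
        delay = min(base_ms * (2 ** attempt), max_ms)
        # Spread clients out so they do not all reconnect at the same instant.
        spread = delay * jitter
        delays.append(delay - spread + 2 * spread * rng())
    return delays
```

Either way the reconnecting is still done on the client side; the difference is that the client library does it for you instead of your own code.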

Using Kafka to decouple web-tier from business-logic code

I am having a few conceptual mind-blocks. I am looking at using Kafka as below:
---http-requests---> [Web Tier]
[Web Tier] ----composes message and publishes----> [Kafka Topic]
[Kafka Topic] <----consumes message------ [Engine Logic]
[Engine Logic] -----computes, api-calls, publishes result message----> [Kafka Topic]
[Kafka] ---???? somehow get result message to Web Tier---> [Web Tier]
[Web Tier] -----renders HTML and sends http response--->
Using a non-blocking webserver (such as Jetty) the http request will be 'held open' but won't block. Is it possible to use Kafka in a request/response fashion? That is to say, can the Web Tier publish a message to a Topic and then will Kafka know that it must provide a response? Or is it going to be the Web Tier's responsibility to poll a Topic and see if there is a response for it?
I guess what I am asking is what is the best way to use Kafka as an 'interface' upon which the Web Tier and Engine Logic depend so that there is no direct coupling between the Web Tier and the Engine? Thanks.
I would say that Kafka doesn't fit very naturally into your use case.
It would be the Web Tier's responsibility to poll a topic and see if there is a response for it.
Several problems that I foresee:
Kafka is designed to deliver messages only when requested. In your example, once the data got to the second Kafka topic, it would sit there until the Web Tier polled for it.
There is no very good way to request a specific message from Kafka (see SO Question). In your Web Tier, if you polled the Kafka server, you might get 10 messages or you might get 0. If you get 10 messages, you would have to check whether any of them is the one you're looking for. If not, you'd have to poll again, potentially many times, depending on how long your particular Engine Logic took to complete.
In the comments on the SO Question referenced above, the OP said he went with MongoDB so he could query for specific messages.
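The polling problem described above can be made concrete with a small simulation. This is a sketch only: `FakeTopic` is an in-memory stand-in for a reply topic (unlike a real topic, it drops messages once polled), and the `correlation_id` field is an assumed convention by which the Web Tier recognizes its own reply.

```python
import uuid
from collections import deque


class FakeTopic:
    """In-memory stand-in for a Kafka topic, for illustration only."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message: dict) -> None:
        self._messages.append(message)

    def poll(self, max_messages: int = 10) -> list:
        """Return up to max_messages in arrival order; may return none."""
        batch = []
        while self._messages and len(batch) < max_messages:
            batch.append(self._messages.popleft())
        return batch


def wait_for_reply(reply_topic: FakeTopic, correlation_id: str, max_polls: int = 5):
    """Poll until the reply with our correlation id shows up, or give up."""
    for polls_used in range(1, max_polls + 1):
        for message in reply_topic.poll():
            if message["correlation_id"] == correlation_id:
                return message, polls_used
            # Not ours: a real Web Tier would still have to route this
            # message to whichever request is waiting for it.
    return None, max_polls


# Usage: our reply sits behind 12 replies meant for other requests.
topic = FakeTopic()
my_id = str(uuid.uuid4())
for i in range(12):
    topic.publish({"correlation_id": f"other-{i}", "body": "not ours"})
topic.publish({"correlation_id": my_id, "body": "result"})
reply, polls = wait_for_reply(topic, my_id)
```

Every reply for every other in-flight request passes through the same loop, which is why a store that can be queried by id was the more natural fit for the OP.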

Http Kafka producer

Our application receives events through an HAProxy server over HTTPS; these should be forwarded to and stored in a Kafka cluster.
What would be the best option for this?
This layer should receive events from HAProxy and produce them to the Kafka cluster in a reliable and efficient way (and should scale horizontally).
Please suggest.
I'd suggest writing a simple application in Java that just receives events and sends them to Kafka. The Java client for Kafka is the official client and is therefore the most reliable. The other option is to use an arbitrary language together with the official Kafka REST Proxy.
Every instance of the app should distribute messages across all partitions based on some partition key. Then you can run multiple instances of the app, and they don't even need to know about each other.
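The reason the instances don't need to know about each other is that the partition is a pure function of the key. A rough sketch of the idea follows; note that `zlib.crc32` is used here only as a stand-in hash (the Java client's default partitioner for keyed records uses murmur2), and the key `device-42` and the partition count are made up.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a partition key to a partition index, the same way on every instance."""
    # Stand-in hash for illustration; not Kafka's actual partitioner.
    return zlib.crc32(key) % num_partitions


# Two independent app instances computing the partition for the same key
# always agree, with no coordination between them.
instance_a = partition_for(b"device-42", 12)
instance_b = partition_for(b"device-42", 12)
```

As long as the number of partitions stays the same, events with the same key land in the same partition and so keep their relative order.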
Just write a simple application that consumes the messages from the proxy and sends the response you obtain to the producer by setting the Kafka configurations producer.data(). If the configuration is done correctly, you will be able to consume the messages from the proxy server you use and see the response output in /tmp/kafka-logs/topicname/00000000000000.log.
This link will help you: enter link description here
Good Day
Keep Coding

Sending multiple messages to JMS queue in Mule

I am new to Mule. I am using RabbitMQ, and I have configured AMQP in Mule Studio.
I am able to run a flow that reads one message from an HTTP endpoint payload and puts it into a queue.
Now I need to send multiple messages, say 1000, to that queue at once. One option is to hit the URL in the browser that many times, but that is very time-consuming. I want to send 1000 messages in one go. How can I do that in Mule, or how should I proceed?
It sounds like you're trying to load test your Mule app. I would use something like Apache JMeter. JMeter lets you enter the URL of your endpoint and set how many times to call it, and it has many other more advanced features.
A good blog post on using JMeter and Mule is available here: http://blogs.mulesoft.org/measuring-the-performance-of-your-mule-esb-application/
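If JMeter feels like too much for a quick check, the same "call the URL N times" idea can be scripted. This is a minimal sketch, not a replacement for a proper load-testing tool. The endpoint URL in the usage comment is hypothetical, and `send` can be swapped out so the function can be tried without a running server.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get(url: str) -> int:
    """Call the endpoint once and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.status


def send_many(url: str, count: int, send=http_get, workers: int = 20) -> list:
    """Call the endpoint `count` times using a small pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, [url] * count))


# Usage (hypothetical Mule HTTP endpoint; requires the flow to be running):
# statuses = send_many("http://localhost:8081/enqueue", 1000)
# print(statuses.count(200), "of", len(statuses), "requests succeeded")
```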