HTTP Kafka producer

Our application receives events through an HAProxy server over HTTPS; these should be forwarded to and stored in a Kafka cluster.
What is the best option for this?
This layer should receive events from HAProxy and produce them to the Kafka cluster reliably and efficiently, and it should scale horizontally.
Please suggest.

I'd suggest writing a simple Java application that just receives events and sends them to Kafka. The Java client is the official Kafka client and is thus the most reliable. The other option is to use an arbitrary language together with Confluent's Kafka REST Proxy.
Each instance of the app should distribute its messages across all partitions based on some partition key. You can then run multiple instances of the app, and they don't even need to know about each other.
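To illustrate why instances don't need to coordinate, here is a minimal sketch of key-based partitioning. It does not use a Kafka client: zlib.crc32 is only a stand-in for the hash a real client applies to the key bytes (the Java client's default partitioner uses murmur2), and choose_partition and route are hypothetical helper names.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index; the same key always maps to the same partition."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Assumption: crc32 stands in for the hash used by a real Kafka client.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Group (key, value) events by the partition their key hashes to."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in events:
        partitions[choose_partition(key, num_partitions)].append(value)
    return partitions


if __name__ == "__main__":
    events = [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]
    print(route(events, num_partitions=4))
```

Because the mapping depends only on the key and the partition count, any number of app instances computes the same partition for the same key, and events for one key keep their relative order within that partition.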

Just write a simple application that consumes the messages from the proxy and passes what it receives to the Kafka producer, after setting the Kafka configurations (producer.data()). If the configuration is done correctly, you will be able to consume the messages from the proxy server you use and see the output in /tmp/kafka-logs/topicname/00000000000000.log.

Related

Which messaging system for a web dashboard?

I would like to build a web dashboard system and I am facing a problem. I need to get information that is held in the cache of one of the instances of my program. For this I had thought of doing Pub/Sub with Kafka; however, I don't know how to publish a message and get a response from one of my subscribers. Do you know a pattern and a service that would allow me to do this?
EDIT: I would like to design an infrastructure that follows this pattern:
The attached diagram shows a simple request->response flow. Kafka is designed for a different type of architecture, so IMHO you should not focus on Kafka in this case.
However, if you still want to use Kafka for other reasons, I can suggest two options:
1. Stick with the request->response flow and use Spring Kafka's ReplyingKafkaTemplate or AggregatingReplyingKafkaTemplate to handle it. The second is an extension of the first that adds the ability to handle more than one response. The Dashboard application sends a request to a Kafka topic, one of the Bot instances polls the message and sends a reply to a reply topic, and the Dashboard application then processes the reply.
2. Use Kafka to implement the Event-Carried State Transfer pattern: move the state (mutual guilds data) from the Bot instances directly to the Dashboard application via a Kafka topic. You can use several tools to implement this:
   a. Bot applications send events to a Kafka topic via a plain KafkaProducer or KafkaTemplate, and one of the Kafka Connect sink connectors saves the data in the Dashboard's database.
   b. Bot applications send events to a Kafka topic via a plain KafkaProducer or KafkaTemplate. Run a Kafka Streams thread in the Dashboard application and build the state using Kafka Streams functionality (grouping, aggregating, etc.). Then read the state directly from the Kafka Streams internal RocksDB database.
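The request->reply option above relies on a correlation ID that ties each reply back to its request. The following is a minimal sketch of that idea only, assuming in-memory queues as stand-ins for the request and reply topics; it is not the Spring Kafka API, and dashboard_send, bot_poll_once and dashboard_collect are hypothetical names.

```python
import queue
import uuid

# Assumption: in-memory stand-ins for the two Kafka topics, not real Kafka.
request_topic = queue.Queue()
reply_topic = queue.Queue()


def dashboard_send(key):
    """Publish a request and return the correlation ID used to match the reply."""
    correlation_id = uuid.uuid4().hex
    request_topic.put({"correlation_id": correlation_id, "key": key})
    return correlation_id


def bot_poll_once(cache):
    """Handle one pending request from the bot's local cache; return True if one was handled."""
    try:
        request = request_topic.get_nowait()
    except queue.Empty:
        return False
    reply_topic.put({
        "correlation_id": request["correlation_id"],
        "value": cache.get(request["key"]),
    })
    return True


def dashboard_collect(pending):
    """Drain the reply topic into pending (correlation_id -> value); unknown IDs are ignored."""
    while True:
        try:
            reply = reply_topic.get_nowait()
        except queue.Empty:
            return pending
        if reply["correlation_id"] in pending:
            pending[reply["correlation_id"]] = reply["value"]


if __name__ == "__main__":
    cid = dashboard_send("guild-42")
    bot_poll_once({"guild-42": ["alice", "bob"]})
    print(dashboard_collect({cid: None}))
```

With real topics, the extra concern is which Bot instance receives the request: only an instance that holds the data in its cache can answer, so either the request has to reach all instances or it has to be keyed so that it lands on the right one.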

Kafka for API gateway to store messages

I need to build a secure REST API for different services, where client services can post messages to and receive messages from other clients (like a mailbox, but the messages will be in JSON form and should be persistent). I am expecting around 5000 client services, with around 50 messages per service per day.
My questions are:
1. Can I use Kafka for this? (I think I will need some wrapper over Kafka to manage other tasks.)
2. If yes, will the outbox and inbox be separate topics for each service? (2 topics per service, so 5000*2 topics. My plan is to create them dynamically as new clients join.)
3. What are the alternative technologies for writing this kind of gateway?
Any help will be appreciated.
You can't use Kafka itself to implement a REST API, because a REST API implies request/response while Kafka is just a message queue (Kafka doesn't provide a mechanism to respond to messages). You can use Kafka to produce messages to be consumed by other services. The idea of message queues is to decouple the producer from the consumer and vice versa: when a consumer receives a message, it acts on it, and that's it. But when you say inbox/outbox, you imply that there is a response to each message, which means the producers' and consumers' pace should be similar. That couples them, which is against the nature of message queues.
It seems that in your case it makes more sense to use HTTP request/response or even WebSockets. If you want to persist the request/response data, you can save it in a database or object storage (like S3), log it, or send each message to Kafka so that Kafka stores all of your messages. Writes to Kafka will actually be very fast because Kafka is, roughly speaking, an append-only log. You can then search message values using ksqlDB.
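As a rough illustration of the "append-only log" idea and of how it decouples producer and consumer pace, here is a minimal in-memory sketch. AppendOnlyLog and Consumer are hypothetical classes written for this example, not Kafka APIs.

```python
class AppendOnlyLog:
    """A list that is only ever appended to; a record's index is its offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=100):
        """Return up to max_records records starting at offset, without removing them."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own read position, so each consumer advances independently."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=100):
        records = self.log.read(self.offset, max_records)
        self.offset += len(records)
        return records


if __name__ == "__main__":
    log = AppendOnlyLog()
    log.append({"to": "service-b", "body": {"text": "hello"}})
    log.append({"to": "service-c", "body": {"text": "world"}})
    fast, slow = Consumer(log), Consumer(log)
    print(fast.poll())               # reads everything written so far
    print(slow.poll(max_records=1))  # reads at its own pace
```

Reading does not delete anything, which is why the log can double as persistent storage, and why a slow consumer does not hold back the producer or other consumers.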

Process messages pushed through Kafka

I haven't used Kafka before and wanted to know: if messages are published through Kafka, what are the possible ways to capture that info?
Are Kafka "Consumers" the only way to receive that info, or can REST APIs also be used here?
While reading up, I found that Kafka needs ZooKeeper running too.
I don't need to publish info, just process data received from a Kafka publisher.
Any pointers will help.
Kafka is a distributed streaming platform that allows you to process streams of records in near real-time.
Producers publish records/messages to Topics in the cluster.
Consumers subscribe to Topics and process those messages as they are available.
The Kafka docs are an excellent place to get up to speed on the core concepts: https://kafka.apache.org/intro
Regarding "Are Kafka consumers the only way to receive that info, or can REST APIs also be used here?":
Kafka uses its own TCP-based protocol and has no native HTTP interface (assuming that's what you actually mean by REST).
Consumers are the only way to get and subsequently process data. However, plenty of external tooling exists, so you don't really have to write any code to work on that data if you don't want to.

Best way to write to Kafka from web site?

I mean, I know how to get data into Kafka, either with some file agent or programmatically using any of the clients, but speaking from an architectural point of view...
It can't just be collecting HTTP logs.
I'm assuming that when someone clicks a link or does something of interest, we can use some kind of AJAX/JavaScript call to a microservice to capture the extra info that we want? But that's not always "reliable" per se, but do we care?
Or while the given "action" posts back to the server we simultaneously write to Kafka and perform the other action?
It’s not clear from your question whether you are trying to collect all the clickstream logs from a set of web servers, or trying to selectively publish some data to Kafka from your web app, so I will answer both.
The easiest way to collect every web click is to configure your web servers to use Syslog (see http://archive.oreilly.com/pub/a/sysadmin/2006/10/12/httpd-syslog.html) and configure your Syslog server to send data to Kafka (see https://www.balabit.com/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/configuring-destinations-kafka.html). Alternatively, there are some more advanced features available in this Kafka Connector for Syslog-NG (see https://github.com/jcustenborder/kafka-connect-syslog). You can also write httpd logs to a file and use a Kafka File Connector to publish to Kafka (see https://docs.confluent.io/current/connect/connect-filestream/filestream_connector.html).
If you just want to enable your apps to send certain log data to Kafka directly, you can use the Kafka REST Proxy and publish using a simple HTTP POST from either your client JavaScript or your server-side logic (see https://docs.confluent.io/current/kafka-rest/docs/index.html).
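As a sketch of what such an HTTP POST could look like from server-side logic, the snippet below builds a produce request in the shape used by the REST Proxy v2 API (JSON-embedded records posted to /topics/<topic>). The base URL, the topic name "clicks" and the helper name build_produce_request are assumptions for illustration; check them against the REST Proxy version you actually deploy.

```python
import json
import urllib.request


def build_produce_request(base_url, topic, events):
    """Build (but do not send) a POST that publishes JSON events to one topic."""
    body = json.dumps({"records": [{"value": event} for event in events]})
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumption: a REST Proxy reachable at localhost:8082 and a topic named "clicks".
    request = build_produce_request(
        "http://localhost:8082", "clicks", [{"page": "/home", "action": "click"}]
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```

Posting from client-side JavaScript follows the same shape, but it exposes the proxy to browsers, so putting your own endpoint in front of it is usually the safer design.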

Solutions of Kafka project to analyze HTTP requests on web server

Context:
A web server receives millions of HTTP requests every day. Of course, there must be a project (named handler) that is responsible for handling these requests and responding to them with some information.
Seen from the server side, I would like to use Kafka to extract some information from these requests and analyze it in real time (or at each time interval).
Question:
How can I use these requests as the producer for Kafka?
How do I build a consumer for Kafka? (All this data needs to be analyzed and then returned, but Kafka is "just" a messaging system.)
Some ideas:
A1.1 Maybe I can let the "handler" project call the Kafka jar, so that it triggers the producer code to send messages to Kafka.
A1.2 Maybe I can create another project that listens to all the HTTP requests on the server, but there are other HTTP requests on the server as well.
I have tried to think of many solutions, but I am not sure about them. If you already know of some mature approaches or have ideas on how to implement this, I would like to hear them.
You can use ELK, with Kafka as the log broker.
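For the "analyze at each time interval" part of the question, whatever consumes the request records from Kafka could bucket them into fixed time windows. This is a minimal sketch of that counting step only, assuming each record has already been reduced to a (timestamp in seconds, path) pair; count_per_window is a hypothetical helper, not part of Kafka or ELK.

```python
from collections import Counter


def count_per_window(records, window_seconds):
    """Count requests per path in fixed, non-overlapping time windows.

    records: iterable of (timestamp_seconds, path) pairs.
    Returns {window_start_seconds: Counter({path: count})}.
    """
    windows = {}
    for timestamp, path in records:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[path] += 1
    return windows


if __name__ == "__main__":
    records = [(0, "/a"), (30, "/a"), (61, "/b"), (119, "/a")]
    for start, counts in sorted(count_per_window(records, 60).items()):
        print(start, dict(counts))
```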