Alternatives to Confluent REST Proxy - apache-kafka

We have some applications that want to communicate with Kafka using REST API calls, both to consume and to produce messages. If we do not want to use the Confluent REST Proxy, what are the options?

One possible alternative is the Strimzi Kafka Bridge (https://github.com/strimzi/strimzi-kafka-bridge).
It's part of the broader Strimzi project for running Kafka on Kubernetes, but it also works standalone (e.g., when your Kafka cluster is on bare metal).
Of course it's open source and Apache 2.0 licensed.
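For a sense of what using the bridge looks like, here is a minimal sketch of producing a JSON record through its HTTP API with Python's requests library; the localhost:8080 address and the topic name are assumptions for a local setup.

```python
# Hedged sketch: produce one JSON record through the Strimzi Kafka Bridge.
# Assumes the bridge listens on localhost:8080 and "my-topic" exists.
import requests

resp = requests.post(
    "http://localhost:8080/topics/my-topic",
    headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    json={"records": [{"key": "user-1", "value": {"action": "login"}}]},
)
resp.raise_for_status()
print(resp.json())  # partition/offset metadata for the produced record
```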

the reason [not to use it] is monetary
You can use the Confluent REST Proxy with no software/licensing costs.
We are thinking of not buying any additional hardware for this new request and using the existing configuration to meet the requirement. I am mostly interested to know if a consumer/producer can be created to meet this requirement.
You don't need extra hardware.
Pick an existing server with at least 2 GB of available memory, run kafka-rest-start, and see how well it works.
if we can create REST API calls which will be used by other applications to consume data from Kafka and push data to Kafka
That's the main purpose of REST Proxy, yes.
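To make that concrete, below is a rough sketch of the full produce/consume cycle against the REST Proxy's v2 API using Python's requests library. The host/port, topic, group, and instance names are placeholders, not values from the question.

```python
# Hedged sketch of the Confluent REST Proxy v2 produce/consume round trip.
import requests

BASE = "http://localhost:8082"  # REST Proxy's default port

# Produce: POST JSON records to a topic.
requests.post(
    f"{BASE}/topics/orders",
    headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    json={"records": [{"value": {"id": 1, "amount": 9.99}}]},
).raise_for_status()

# Consume: create a consumer instance in a group...
consumer = requests.post(
    f"{BASE}/consumers/orders-app",
    headers={"Content-Type": "application/vnd.kafka.v2+json"},
    json={"name": "worker-1", "format": "json", "auto.offset.reset": "earliest"},
).json()

# ...subscribe it to the topic...
requests.post(
    f"{consumer['base_uri']}/subscription",
    headers={"Content-Type": "application/vnd.kafka.v2+json"},
    json={"topics": ["orders"]},
).raise_for_status()

# ...and poll for records (the first poll may return nothing while the
# consumer joins the group, so real code would poll in a loop).
records = requests.get(
    f"{consumer['base_uri']}/records",
    headers={"Accept": "application/vnd.kafka.json.v2+json"},
).json()
print(records)

# Clean up the consumer instance when finished.
requests.delete(
    consumer["base_uri"],
    headers={"Content-Type": "application/vnd.kafka.v2+json"},
)
```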

Related

Kafka - increase partition count of existing topic through Confluent REST API

I need to add partitions to an existing Kafka topic.
I'm aware that it is possible to use the /bin/kafka-topics.sh script to achieve this, but I would prefer to do it through the Confluent REST API.
As far as I can see there is no documented endpoint in the API reference, but I wonder if someone else here was able to make this work.
Edit: As it does seem to be impossible to use the REST API here, I wonder what the best practice is for adding partitions to an existing topic in a containerized setup, e.g. if there is a custom partitioning scheme that maps customer IDs to specific partitions. In this case the app container would need to adjust the partition count of the Kafka container.
The Confluent REST Proxy has no such endpoint for topic update administration.
You would need to use the shell script or the corresponding AdminClient class that the shell script uses.
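For illustration, here is a minimal sketch of that AdminClient operation using the Python confluent-kafka package (the answer refers to the Java class, which works the same way); the bootstrap address and topic name are placeholders.

```python
# Hedged sketch: increase a topic's partition count with the AdminClient.
from confluent_kafka.admin import AdminClient, NewPartitions

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# Grow "my-topic" to 6 partitions in total (counts can only increase).
futures = admin.create_partitions([NewPartitions("my-topic", 6)])
for topic, future in futures.items():
    future.result()  # raises if the request failed
    print(f"Partition count updated for {topic}")
```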
This is the solution I ended up with:
Create a small HTTP service that is deployed within the Kafka Docker image
The HTTP service accepts requests to increase the partition count and forwards them to the Kafka admin scripts (bin/kafka/kafka-topics.sh); see the sketch below
Something similar could have been achieved by using the AdminClient NewPartitions API in the Java Kafka library. That approach has the advantage that the Kafka Docker image does not have to be changed, because the AdminClient can connect over the network from another container.
For a production setup the AdminClient is preferable; I went with the integrated script approach because of my chosen language (Rust).
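As a rough illustration of that integrated-script idea (in Python rather than the answer's Rust), the service could look something like this; the script path, port, and route are hypothetical.

```python
# Hedged sketch: a tiny HTTP wrapper around kafka-topics.sh.
import subprocess
from flask import Flask, request, jsonify

app = Flask(__name__)
KAFKA_TOPICS = "/opt/kafka/bin/kafka-topics.sh"  # assumed install location

@app.route("/topics/<topic>/partitions", methods=["POST"])
def increase_partitions(topic):
    count = int(request.get_json(force=True)["partitions"])
    result = subprocess.run(
        [KAFKA_TOPICS, "--bootstrap-server", "localhost:9092",
         "--alter", "--topic", topic, "--partitions", str(count)],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return jsonify({"error": result.stderr}), 500
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run(port=8000)
```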

How does ZooKeeper authorization work?

I am new to ZooKeeper and am wondering: I understand ZooKeeper can be used as configuration storage, but what if one ZooKeeper client should not have access to certain configurations? How do I restrict that access?
Scenario: I want to use it as a configuration service, from which my application retrieves its configurations, database endpoint lists, etc. Can I do that with ZooKeeper? If I can, how do I restrict access so one application doesn't read configurations belonging to another?
ZooKeeper is a distributed coordination service for managing a large set of hosts. Coordinating and managing a service in a distributed environment is a complicated process. ZooKeeper solves this issue with its simple architecture and API. ZooKeeper allows developers to focus on core application logic without worrying about the distributed nature of the application.
The ZooKeeper framework was originally built at Yahoo! for accessing their applications in an easy and robust manner. Later, Apache ZooKeeper became a standard for organized services used by Hadoop, HBase, and other distributed frameworks. For example, Apache HBase uses ZooKeeper to track the status of distributed data.
It's not a key-value store.

Ingest Streaming Data to Kafka via HTTP

I am very new to Kafka and streaming data in general. What I am trying to do is ingest data, which is sent via HTTP, into Kafka. My research has brought me to the Confluent REST Proxy, but I can't get it to work.
What I currently have is Kafka running as a single node with a single broker, plus Kafka Manager, in Docker containers.
Unfortunately I can't run the full Confluent Platform with Docker since I don't have enough memory available on my machine.
In essence my question is: how do I set up a development environment where data is ingested into Kafka over HTTP?
Any help is highly appreciated!
You don't need the "full Confluent Platform" (KSQL, Control Center, included)
Zookeeper, Kafka, the REST proxy, and optionally the Schema Registry, should all only take up-to 4 GB of RAM total. If you don't even have that, then you'll need to go buy more RAM.
Note that Zookeeper and Kafka do not need to be running on the same machines as the Schema Registry or REST proxy, so if you have multiple machines, then you can save some resources that way as well.
To run one Kafka broker, ZooKeeper, and the Schema Registry, 1 GB is usually enough (in dev).
If for some reason you do not want to use the Confluent REST Proxy, you can write your own. It's quite straightforward: on each request, parse the incoming JSON, validate the data, construct your message (in Avro?), and produce it to Kafka, as in the sketch below.
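A minimal sketch of such a DIY bridge, assuming Flask and confluent-kafka and plain JSON rather than Avro; the route and field names are illustrative, not part of any standard.

```python
# Hedged sketch: a tiny HTTP-to-Kafka ingestion endpoint.
import json

from flask import Flask, request, jsonify
from confluent_kafka import Producer

app = Flask(__name__)
producer = Producer({"bootstrap.servers": "localhost:9092"})

@app.route("/ingest/<topic>", methods=["POST"])
def ingest(topic):
    payload = request.get_json(silent=True)
    if not payload or "value" not in payload:  # basic validation
        return jsonify({"error": "expected JSON with a 'value' field"}), 400
    producer.produce(topic, value=json.dumps(payload["value"]).encode("utf-8"))
    producer.flush()  # fine for dev; let batches accumulate in production
    return jsonify({"status": "queued"})

if __name__ == "__main__":
    app.run(port=8080)
```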
In this article, you'll find some configuration for limiting Kafka and ZooKeeper heap memory: https://medium.com/@saabeilin/kafka-hands-on-part-i-development-environment-fc1b70955152
Here you can read how to produce/consume messages with Python:
https://medium.com/@saabeilin/kafka-hands-on-part-ii-producing-and-consuming-messages-in-python-44d5416f582e
Hope these help!

Monitoring UI for Apache Kafka - Kafka Manager vs Kafka Monitor [closed]

I am new to Kafka. We want to monitor and manage Kafka topics. We tried different open source monitoring tools like
kafka-monitor
kafka-manager
Both tools are good, but we have been unable to decide which one should be included in our deployment stack. Which one is better, why, and in which scenarios?
'Kafka Manager' from Yahoo is the older of the two, and 'Kafka Monitor' from LinkedIn is the newer one.
Lenses
Lenses (formerly Landoop) enhances Kafka with a user interface, a streaming SQL engine, and cluster monitoring. It enables faster monitoring of Kafka data pipelines.
They provide a free all-in-one docker (Lenses Box) which can serve a single broker for up to 25M messages. Note that this is recommended for development environments.
Cloudera SMM
Streams Messaging Manager is the solution for monitoring and managing clusters running Cloudera or Hortonworks Kafka. It also comes with replication capability.
Confluent
Another option is Confluent Enterprise, a Kafka distribution for production environments. It includes Control Center, a management system for Apache Kafka that enables cluster monitoring and management from a user interface.
Yahoo CMAK (Cluster Manager for Apache Kafka, previously known as Kafka Manager)
Kafka Manager, or CMAK, is a tool for monitoring Kafka, though it offers less functionality compared to the aforementioned tools.
KafDrop
KafDrop is a UI for monitoring Apache Kafka clusters. The tool displays information such as brokers, topics, partitions, and even lets you view messages. It is a lightweight application that runs on Spring Boot and requires very little configuration.
LinkedIn Burrow
Burrow is a monitoring companion for Apache Kafka that provides consumer lag checking as a service without the need for specifying thresholds. It monitors committed offsets for all consumers and calculates the status of those consumers on demand. An HTTP endpoint is provided to request status on demand, as well as provide other Kafka cluster information. There are also configurable notifiers that can send status out via email or HTTP calls to another service.
Kafka Tool
Kafka Tool is a GUI application for managing and using Apache Kafka clusters. It provides an intuitive UI that allows one to quickly view objects within a Kafka cluster as well as the messages stored in the topics of the cluster. It contains features geared towards both developers and administrators.
If you cannot afford licenses, then go for Yahoo Kafka Manager, LinkedIn Burrow or KafDrop. Confluent's and Landoop's products are the best out there, but unfortunately, they require licensing.
For more details, you can refer to my blog post Overview of UI Monitoring tools for Apache Kafka Clusters.
If you want to pay for licensing and Kafka cluster support, then you can use Confluent Control Center
Alternatively, the free route would be to use JMX exporters from Datadog and/or Prometheus/InfluxDB (with Grafana dashboards) to see overall system health (CPU, network, memory, etc.). That gives you much more information than you get by only monitoring Kafka processes with Kafka tools.
At my company, we used the Yahoo product, we investigated the LinkedIn product, and several others mentioned. My company ultimately chose to use Prometheus+Grafana. Everyone loves it and I'd highly recommend it.
There are two big advantages to Prometheus+Grafana. The first is that it does full-featured Kafka metrics ingestion, visualization, and alerting, but it's not limited to Kafka. While our initial need was just to monitor Kafka, we also wanted metrics on HTTP servers and traffic, server utilization (CPU/RAM/disk), and custom application-level metrics. Prometheus handles all of the above.
Secondly, Prometheus and Grafana are very high quality, well designed, and easy to use. A lot of other products in this space are old and complicated to work with. Prometheus and Grafana are both excellent to work with; they are very customizable, polished, and easy to use. Grafana has a flashy, functional JavaScript interface that lets you build exactly the customized dashboards you want. Prometheus has a very polished metric-collection engine, storage engine, query language, and alerting system. Something like Yahoo Kafka Manager has much more limited functionality in all of these categories.
If you want to try Prometheus, you need to do two things:
1) Install and configure the JMX-to-Prometheus exporter on your Kafka brokers:
https://github.com/prometheus/jmx_exporter
2) Set up a Prometheus server to collect the metrics, and set up a Grafana dashboard to display the graphs you want.
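As a hedged sketch of those two steps: the broker side attaches the exporter as a Java agent, and Prometheus then scrapes the exposed port. The jar path, config file, broker hostnames, and port 7071 below are assumptions, not defaults.

```yaml
# Step 1 (on each broker): attach the JMX exporter as a Java agent before
# starting Kafka, e.g. (paths and port are assumptions):
#   export KAFKA_OPTS="-javaagent:/opt/jmx_prometheus_javaagent.jar=7071:/opt/kafka-metrics.yml"

# Step 2: point Prometheus at the brokers (snippet for prometheus.yml):
scrape_configs:
  - job_name: "kafka"
    static_configs:
      - targets: ["broker1:7071", "broker2:7071"]
```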
I'd also say that this is just for monitoring+dashboards+alerting. For management functions, you still need other tools.
The kafka-monitor is (despite the name) a load generation and reporting tool. Yahoo's kafka-manager is an overall monitoring tool.

Is a web frontend producing directly to a Kafka broker a viable idea?

I have just started learning Kafka and am trying to build a social media web application. I am fairly clear on how to use Kafka for my backend (communicating from the backend to databases and other services).
However, I am not sure how the frontend should communicate with the backend. I was considering an architecture like: Frontend -> Kafka -> Backend.
The frontend acts as producer and the backend as consumer. In this case, the frontend would supposedly have all the resources required to publish to the Kafka broker (even if I implement security on Kafka). Now, is this scenario possible:
Let's say I impersonate the frontend and send absurd/invalid messages to my Kafka broker. I can handle and filter these messages when they reach my backend. But I know that Kafka stores these messages temporarily. Wouldn't my Kafka server face DDoS problems if such "fake" messages were published to it in high volume, since it is going to store them anyway, as they don't get filtered out until they are actually consumed by the backend?
If so, how can I prevent this?
Or is this not a good option? I could also use REST for frontend/backend communication, and then Kafka would be used from the backend to communicate with the database(s) and other services.
Or I can have a middleware (again, REST) that detects and filters out such messages.
The easiest way is to have the front end produce to the Kafka REST Proxy.
See details here: https://docs.confluent.io/1.0/kafka-rest/docs/intro.html
That way there is no Kafka client code required in your front end, and you can use HTTP(S) with standard off-the-shelf load balancers and API management tools.
Could you not consider the other direction: using Kafka as a transport system for updating assets available to the frontend? This has been proposed for hybrid React/NodeJS/Express solutions.