Apache Kafka: can it be used to push data to IoT devices?

I have some IoT devices which need to be updated from time to time, based on configuration done in web or mobile clients. So I need to give the devices the capability to update themselves when that configuration changes.
I have the following architecture, where clients communicate over HTTPS with an API Gateway. This Gateway is responsible for fetching data from several microservices that interact with Kafka and some databases.
In this context, is it a good idea to create a Kafka consumer in the IoT devices that will consume messages from a Kafka configuration topic?
For each new message received on this topic, the IoT device will be responsible for applying the change to its configuration.
Any advice?

Usually, IoT devices have tight CPU/RAM and/or battery restrictions. The most widely used solution for messaging over IoT is MQTT, and Mosquitto (https://mosquitto.org/) is currently the most widespread MQTT broker, so I would try to use Mosquitto on the IoT devices and link it with Kafka through the Confluent MQTT Proxy. You can find more information at https://www.confluent.io/confluent-mqtt-proxy/
It is also not difficult to create your own "MQTT proxy" in Python (or the language you prefer).
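As a rough idea of what such a home-made proxy does, here is a minimal sketch of the forwarding step only. It assumes each Kafka record is keyed by a device id, and the topic scheme `devices/<id>/config` is made up for illustration. In a real bridge, `records` would come from a Kafka consumer library and `publish` would be the publish method of an MQTT client library; both are injected here so the logic can be read (and run) without either dependency.

```python
from typing import Callable, Iterable, Optional, Tuple

Record = Tuple[Optional[bytes], bytes]  # (key, value) as a Kafka consumer would yield them


def mqtt_topic_for(device_id: str) -> str:
    # Hypothetical naming scheme: one MQTT topic per device.
    return f"devices/{device_id}/config"


def forward_config_updates(
    records: Iterable[Record],
    publish: Callable[[str, bytes, int], object],
) -> int:
    """Forward keyed Kafka records to per-device MQTT topics.

    Returns the number of records forwarded. Records without a key are
    skipped because there is no device to address them to.
    """
    forwarded = 0
    for key, value in records:
        if key is None:
            continue
        topic = mqtt_topic_for(key.decode("utf-8"))
        publish(topic, value, 1)  # QoS 1: at-least-once delivery to the broker
        forwarded += 1
    return forwarded
```

The devices then only need a lightweight MQTT client subscribed to their own topic, and the Kafka consumer runs on a server instead of on the device.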

Kafka does not push. Consumers poll.
You can embed Kafka consumers in IoT devices, yes (assuming you are able to deploy such apps onto them). However, MQTT is more commonly used in those environments, and you could route Kafka events to an MQTT broker through various methods.
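To make the "consumers poll" point concrete, here is a toy in-memory model of the pull style. It is not the Kafka client API; the class and method names are invented for illustration. The log never calls the consumer. The consumer tracks its own offset and asks for the next batch whenever it is ready.

```python
class TopicLog:
    """Toy append-only log standing in for one topic partition."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1  # offset of the appended record

    def fetch(self, offset, max_records):
        return self._records[offset:offset + max_records]


class PollingConsumer:
    """Toy consumer: remembers its position and pulls on demand."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self._log.fetch(self.offset, max_records)
        self.offset += len(batch)  # advance only past what was actually read
        return batch


log = TopicLog()
for update in ("cfg-v1", "cfg-v2", "cfg-v3"):
    log.append(update)

consumer = PollingConsumer(log)
first = consumer.poll(max_records=2)
second = consumer.poll()
third = consumer.poll()  # nothing new yet, so the batch is empty
```

A device that is asleep or offline simply polls later and picks up from its last offset, which is why the broker does not need to know whether the device is reachable.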

Related

Alternative of Confluent REST Proxy

We have some applications which want to communicate with Kafka using REST API calls to both consume and produce messages. If we do not want to use Confluent REST Proxy, what are the options?
One possible alternative is the Strimzi Kafka Bridge (https://github.com/strimzi/strimzi-kafka-bridge).
It's part of the broader Strimzi project about running Kafka on Kubernetes, but it also works when run standalone (for example, when your Kafka cluster is on bare metal).
Of course it's open source and Apache 2.0 licensed.
"The reason [not to use it] is monetary."
You can use the Confluent REST Proxy with no software/licensing costs.
"We are thinking of not buying any additional hardware for this new request and using the existing configuration to meet the requirement. I am mostly interested to know if a consumer/producer can be created to meet this requirement."
You don't need extra hardware.
Pick an existing server with at least 2 GB of available memory, run kafka-rest-start, and see how well it works.
"If we can create REST API calls which will be used by other applications to consume data from Kafka and push data to Kafka."
That's the main purpose of REST Proxy, yes.
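For a feel of what a produce call looks like from an application, here is a minimal sketch that only builds the HTTP request. It assumes the v2 JSON embedded format of the Confluent REST Proxy (POST to `/topics/<topic>` with a `records` array); the base URL, the topic name and the helper name are placeholders, so check the documentation of the proxy version you actually run.

```python
import json
import urllib.request


def build_produce_request(base_url, topic, values):
    """Build (but do not send) a REST Proxy produce request for JSON values."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/topics/{topic}",
        data=body,
        method="POST",
        headers={
            # Assumed v2 content types; older proxy versions use different ones.
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


# Placeholder host and topic, for illustration only.
request = build_produce_request("http://localhost:8082/", "orders", [{"id": 1}])
# Sending it would be: urllib.request.urlopen(request)
```

Because it is plain HTTP, any language with an HTTP client can produce this way without a Kafka client library.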

High Availability for MQTT Based C++ Services

I have written a few C++ services, each of which contains an MQTT client. Based on the message received on an MQTT topic, the C++ service takes some action, such as sending an MQTT message to another topic or saving the message to the database.
I have set up a few MQTT brokers in Docker and attached them to an HA load balancer. All these MQTT brokers are also clustered.
So, thanks to the clustering of the MQTT brokers, client 1 connected to broker-1 (through the load balancer) can send a message to client x connected to broker-x.
So, how can I put my C++ services behind an HA load balancer or something similar?
Update:
In the case of HTTP/REST APIs, a request is routed to only one web application instance at any point in time. But in the case of MQTT, the message is published, and if I run multiple instances of the same C++ service ABC, all of them will process that message. How should I make sure that only one instance processes the message? I want to establish high availability for the C++ service.
This is not possible under MQTT 3.x. The reason is that, prior to MQTT 5, every message is sent to every subscriber to that topic, making it very difficult to load balance correctly. Subscribers would need to receive everything and then decide for themselves to discard some messages, leaving them for other subscribers. It's one of the limitations of MQTT 3.x.
There are those who have worked around this by connecting their MQTT broker into an Apache Kafka cluster, routing all messages from MQTT to Kafka and then attaching their subscribers (like your c++ services) to Kafka instead of MQTT. Kafka supports the type of load balancing you are asking for.
This may be about to change with MQTT 5.0. There are still a lot of clients and brokers which don't support it. However, if both your client and broker support MQTT version 5, then there is a new [1] concept of "Shared Subscriptions":
Shared Subscriptions – If the message rate on a subscription is high, shared subscriptions can be used to load balance the messages across a number of receiving clients
You haven't stated your client library, but your first steps should be:
1) Investigate whether both your broker and subscriber support MQTT 5.
2) Check the API for your client to discover how to use shared subscriptions.
[1] New to MQTT; Kafka already has it.
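In MQTT 5, a shared subscription is requested by subscribing to a topic filter of the form `$share/<ShareName>/<filter>`. The sketch below builds such a filter and then simulates, in plain Python, how messages might be spread over the members of one group. The round-robin strategy and the function names are assumptions for illustration; the actual distribution strategy is up to the broker.

```python
from itertools import cycle


def shared_filter(share_name, topic_filter):
    """Build an MQTT 5 shared-subscription topic filter."""
    # The share name must be non-empty and must not contain '/', '+' or '#'.
    if not share_name or any(ch in share_name for ch in "/+#"):
        raise ValueError(f"invalid share name: {share_name!r}")
    return f"$share/{share_name}/{topic_filter}"


def distribute(messages, subscribers):
    """Toy round-robin delivery: each message goes to exactly one subscriber."""
    delivered = {name: [] for name in subscribers}
    for message, name in zip(messages, cycle(subscribers)):
        delivered[name].append(message)
    return delivered


# Hypothetical group and instance names for the ABC service.
abc_filter = shared_filter("abc-service", "sensors/+/data")
delivered = distribute(["m1", "m2", "m3"], ["abc-1", "abc-2"])
```

Every instance of the ABC service would subscribe with the same filter, and the broker then hands each message to only one of them instead of all of them.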

Monitoring UI for Apache kafka - kafka manager vs kafka monitor [closed]

I am new to Kafka. We want to monitor and manage Kafka topics. We tried different open source monitoring tools, such as:
kafka-monitor
kafka-manager
Both tools are good. But we are unable to make a decision which should be included in our deployment stack. Which one is better and why, and in which scenario?
'kafka manager' from Yahoo looks like the older one, and 'kafka monitor' from LinkedIn is the newer one.
Kafka Monitor-
Lenses
Lenses (ex Landoop) enhances Kafka with User Interface, streaming SQL engine and cluster monitoring. It enables faster monitoring of Kafka data pipelines.
They provide a free all-in-one docker (Lenses Box) which can serve a single broker for up to 25M messages. Note that this is recommended for development environments.
Cloudera SMM
Streams Messaging Manager is the solution for monitoring and managing clusters running Cloudera or Hortonworks Kafka. It also comes with replication capability.
Confluent
Another option is Confluent Enterprise which is a Kafka distribution for production environments. It also includes Control Centre, which is a management system for Apache Kafka that enables cluster monitoring and management from a User Interface.
Yahoo CMAK (Cluster Manager for Apache Kafka, previously known as Kafka Manager)
Kafka Manager or CMAK is a tool for monitoring Kafka offering less functionality compared to the aforementioned tools.
KafDrop
KafDrop is a UI for monitoring Apache Kafka clusters. The tool displays information such as brokers, topics, partitions, and even lets you view messages. It is a lightweight application that runs on Spring Boot and requires very little configuration.
LinkedIn Burrow
Burrow is a monitoring companion for Apache Kafka that provides consumer lag checking as a service without the need for specifying thresholds. It monitors committed offsets for all consumers and calculates the status of those consumers on demand. An HTTP endpoint is provided to request status on demand, as well as provide other Kafka cluster information. There are also configurable notifiers that can send status out via email or HTTP calls to another service.
Kafka Tool
Kafka Tool is a GUI application for managing and using Apache Kafka clusters. It provides an intuitive UI that allows one to quickly view objects within a Kafka cluster as well as the messages stored in the topics of the cluster. It contains features geared towards both developers and administrators.
If you cannot afford licenses, then go for Yahoo Kafka Manager, LinkedIn Burrow or KafDrop. Confluent's and Landoop's products are the best out there, but unfortunately, they require licensing.
For more details, you can refer to my blog post Overview of UI Monitoring tools for Apache Kafka Clusters.
If you want to pay for licensing and Kafka cluster support, then you can use Confluent Control Center.
Alternatively, the free route would be to use JMX exporters from Datadog and/or Prometheus/InfluxDB (with Grafana dashboards) to see overall system health (CPU, network, memory, etc.). This gives much more information than you get by monitoring only the Kafka processes with Kafka tools.
At my company, we used the Yahoo product, we investigated the LinkedIn product, and several others mentioned. My company ultimately chose to use Prometheus+Grafana. Everyone loves it and I'd highly recommend it.
There are two big advantages to Prometheus+Grafana. The first is that it does full-featured Kafka metrics ingestion, visualization, and alerting, but it is not limited to Kafka. While our initial need was just to monitor Kafka, we also wanted metrics on HTTP servers and traffic, server utilization (CPU/RAM/disk), and custom application-level metrics. Prometheus handles all of the above. The second is that Prometheus and Grafana are both very high quality, well designed, customizable, polished, and easy to use, whereas a lot of other products in this space are old and complicated to work with. Grafana has a flashy and functional JavaScript interface that lets you build exactly the customized dashboards you want. Prometheus has a very polished metric collection engine, storage engine, query language, and alerting system. Something like Yahoo Kafka Manager has much more limited functionality in all of these categories.
If you want to try Prometheus, you need to do two things:
1) install+configure the JMX->Prometheus exporter on your Kafka brokers:
https://github.com/prometheus/jmx_exporter
2) Set up a Prometheus server to collect the metrics, and set up a Grafana dashboard to display the graphs that you want.
I'd also say that this is just for monitoring+dashboards+alerting. For management functions, you still need other tools.
The kafka-monitor is (despite the name) a load generation and reporting tool. Yahoo's kafka-manager is an overall monitoring tool.

Is a web frontend producing directly to a Kafka broker a viable idea?

I have just started learning Kafka, so I am trying to build a social media web application. I am fairly clear on how to use Kafka for my backend (communicating from the backend to databases and other services).
However, I am not sure how the frontend should communicate with the backend. I was considering the architecture Frontend -> Kafka -> Backend.
The frontend acts as producer and the backend as consumer. In this case, the frontend would supposedly have all the resources required to publish to the Kafka broker (even if I implement security on Kafka). Now, is this scenario possible:
Let's say I impersonate the frontend and send absurd/invalid messages to my Kafka broker. I can handle and filter these messages when they reach my backend, but I know that Kafka stores these messages temporarily. Wouldn't my Kafka server face DDoS problems if such "fake" messages are published to it in high volume, since it is going to store them anyway, as they don't get filtered out until they are actually consumed by the backend?
If so, how can I prevent this?
Or is this not a good option? I can also try using REST for frontend/backend communication and then Kafka will be used from backend to communicate with database(s) and other stuff.
Or I can have a middleware (again, REST) that detects and filters out such messages.
The easiest way is to have the front end produce to the Kafka REST Proxy.
See details here: https://docs.confluent.io/1.0/kafka-rest/docs/intro.html
That way no Kafka client code is required in your front end, and you can use HTTP(S) with standard off-the-shelf load balancers and API management tools.
Could you not consider the other direction, using Kafka as a transport system for updating assets available to the frontend? This has been proposed for hybrid React/NodeJS/Express solutions.

What are the benefits of Apache Kafka's native binary TCP protocol over its RESTful API?

As per Apache Kafka's documentation, Kafka uses a binary TCP protocol for its native API communication, but a URL-based RESTful API is also provided for languages that don't support Kafka's native API. I was wondering whether the native binary TCP protocol (supported in the native API) has any benefit over RESTful URL-based communication with a broker node. I was also wondering whether the RESTful API still maintains the only-once property.
Edit:
The restful API guide is here: https://www.confluent.io/blog/a-comprehensive-open-source-rest-proxy-for-kafka which explains how to produce and consume Kafka's message over restful API
There is no REST API included in Apache Kafka for producing or consuming messages; that is done with the native Kafka protocol, for example through the client implemented in Java.
There is a REST API in Apache Kafka for configuring Kafka Connect.
There are a number of third-party REST Proxy implementations (such as the Confluent Kafka REST Proxy) which allow pub/sub over a REST interface, but these are separate open source projects outside of Apache Kafka.
If you mean to ask what the advantages are of using the native Kafka Java Producer/Consumer API rather than these third-party REST/HTTP proxy implementations, then these are some reasons:
One benefit is greater parallelism. A Kafka client will typically open up TCP connections to multiple brokers in the cluster and send or fetch data in parallel across multiple partitions of the same topic.
Another benefit is better network utilization as HTTP headers can add a lot of size to otherwise small messages while Kafka’s wire protocol is a compact binary protocol.
Kafka clients handle load balancing, failover, and cluster expansion or contraction automatically while REST clients typically require a third party load balancer to achieve the same functionality.
Kafka clients can send their own authentication credentials for access control and bandwidth throttling (quotas), while all REST clients appear to the Kafka cluster as a single Kafka client and therefore share common ACL privileges.
Kafka client libraries buffer and batch messages together into smaller numbers of Kafka produce or fetch requests while HTTP can only batch data if the programmer thought to publish them as a single batch.
The native Kafka protocol supports more than just what the producer/consumer api exposes. There is also an Admin API for creating topics, and modifying topic configurations. These functions are not (yet) exposed through the most popular REST Proxy implementations.
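The network utilization and batching points above can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic, not a measurement: the 100-byte payload and the 400 bytes of per-request overhead are made-up figures, and the function name is invented for illustration.

```python
import math


def bytes_on_wire(n_messages, payload_bytes, overhead_bytes, batch_size):
    """Total bytes sent when every request carries a fixed overhead."""
    n_requests = math.ceil(n_messages / batch_size)
    return n_messages * payload_bytes + n_requests * overhead_bytes


# Illustrative numbers only: 1000 messages of 100 bytes each,
# with an assumed 400 bytes of headers/framing per request.
unbatched = bytes_on_wire(1000, 100, 400, batch_size=1)
batched = bytes_on_wire(1000, 100, 400, batch_size=100)
print(unbatched, batched)
```

With small messages the per-request overhead dominates unless messages are batched, which the native client libraries do automatically and an HTTP client only does if the programmer batches explicitly.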