Protecting a Kafka cluster against remote access through shell scripts - apache-kafka

I have a Kafka cluster and would like to restrict access for non-Java clients. That is, clients may connect to my Kafka cluster over SASL_SSL using the Java or Python clients, but I don't want them to access the cluster using the shell scripts available under /bin. I know that even the shell scripts call the Scala library behind the scenes. Is there any Kafka config to restrict this?
Thanks

Related

How to expose the REST API of a dedicated Apache Kafka MirrorMaker 2.0 cluster

I am trying to reach a dedicated MirrorMaker 2.0 cluster to see the status of connectors, tasks, etc. The README in the Apache Kafka Git repository says that when dedicated.mode.enable.internal.rest=true is set, MirrorMaker nodes start with an internal listener port to communicate with each other.
My question is: is there a way to advertise this port externally so I can send curl requests to the dedicated MirrorMaker nodes, as we normally do with Kafka Connect (for example, curl http://localhost:8083/connectors to see the running connectors)?
I have already tried multiple solutions I found online, and they simply do not work. It seems to me this is impossible when you start MirrorMaker 2.0 with ./bin/connect-mirror-maker. I know it is possible if I manually add every required connector to an existing Kafka Connect cluster, but that's not what I am looking for.
I am also curious whether there is a way to add the dedicated MirrorMaker cluster's connectors to an already running Kafka Connect cluster.
This is important because we would like to use curl responses to check task status for MirrorMaker.
Thanks.
You should be able to run connect-distributed as normal, have its REST API available, then configure and monitor MM2 without using its dedicated scripts. This is also how you'd add the MM2 connectors to an existing Connect cluster.
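As a rough sketch of what "configure MM2 through Connect" could look like, the snippet below builds the JSON body you would POST to /connectors on a connect-distributed worker. The property names follow the MirrorMaker 2 connector settings; the connector name, cluster aliases, and bootstrap addresses are made-up placeholders, and a real setup would typically also register the checkpoint and heartbeat connectors.

```python
import json


def mirror_source_config(name, source_alias, target_alias,
                         source_bootstrap, target_bootstrap, topics=".*"):
    """Build the JSON body for POST /connectors on a Connect worker."""
    return {
        "name": name,
        "config": {
            "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
            "source.cluster.alias": source_alias,
            "target.cluster.alias": target_alias,
            "source.cluster.bootstrap.servers": source_bootstrap,
            "target.cluster.bootstrap.servers": target_bootstrap,
            "topics": topics,
            # MM2 copies records as raw bytes, so no (de)serialization is needed.
            "key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
            "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
        },
    }


# Hypothetical names and hosts, for illustration only.
payload = mirror_source_config(
    name="mm2-source-a-to-b",
    source_alias="clusterA",
    target_alias="clusterB",
    source_bootstrap="kafka-a.example:9092",
    target_bootstrap="kafka-b.example:9092",
)
print(json.dumps(payload, indent=2))
```

The printed JSON could be saved to a file and sent with curl -X POST -H "Content-Type: application/json" --data @file.json http://localhost:8083/connectors.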
Ideally, you should monitor via JMX rather than curl; that is where you get a count of the running tasks. Alternatively, add Jolokia or the Prometheus JMX Exporter, which run their own HTTP server, then curl that and grep for the tasks metric.
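If you do end up going through the Connect REST API, the status check the question asks about might look like the sketch below. It only parses a response body shaped like the output of GET /connectors/<name>/status; the sample JSON, connector name, and worker addresses are invented for illustration, and fetching the body (with curl or urllib) is left out.

```python
import json
from collections import Counter


def summarize_status(status):
    """Count task states in a Connect connector status response."""
    tasks = status.get("tasks", [])
    counts = Counter(task["state"] for task in tasks)
    connector_state = status["connector"]["state"]
    return {
        "connector": connector_state,
        "tasks": dict(counts),
        # Healthy only if the connector and every task report RUNNING.
        "healthy": connector_state == "RUNNING"
        and all(task["state"] == "RUNNING" for task in tasks),
    }


# Invented sample response, for illustration only.
sample = json.loads("""
{
  "name": "mm2-source-a-to-b",
  "connector": {"state": "RUNNING", "worker_id": "10.0.0.1:8083"},
  "tasks": [
    {"id": 0, "state": "RUNNING", "worker_id": "10.0.0.1:8083"},
    {"id": 1, "state": "FAILED", "worker_id": "10.0.0.2:8083"}
  ],
  "type": "source"
}
""")

summary = summarize_status(sample)
print(summary)
```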

Apache-Ignite: Kafka-connect data replication hosts issue

Is it compulsory to run the Kafka sink connector and the Ignite node on the same host?
If not, what changes do I have to make in the Ignite configuration (XML) file to make the node accessible from another host?
Thanks in advance.
Assuming the connector is a thin client, it doesn't need to be on the same host. However, Connect shouldn't run on the brokers either, and if you're in a high-latency environment (consuming from the cloud or a remote data center), it's recommended that you "produce locally".
You'd change the Ignite server bind address (localAddress in the communication SPI) to make it remotely accessible from any client, including other nodes in a cluster, assuming it isn't already.

Kafka Connect instead of Flume Ingestion

I have been looking into the concepts and applications of Kafka Connect, and I even worked on one project based on it during an internship. In my current job, I am considering replacing the architecture of our real-time data ingestion platform, which is currently based on Flume -> Kafka, with Kafka Connect and Kafka.
The main reasons I am considering the switch are:
With Flume, we need to install the agent on each remote machine, which generates a lot of extra work for DevOps, especially where I work, where machine access is managed so rigidly that maintaining utilities on machines belonging to other departments is difficult.
Another reason is that the machines' OS environments vary. If we install Flume on a variety of machines, some machines with different OSes and JDKs (I have come across some with the IBM JDK) just cannot run Flume properly, which in the worst case results in zero data ingestion.
It looks like Kafka Connect can be deployed centrally alongside our Kafka cluster, so the DevOps cost can go down. In addition, we can avoid installing Flume on machines belonging to others and avoid the risk of incompatible environments, ensuring stable ingestion of data from every remote machine.
Besides, the most common ingestion scenario is simply ingesting log text files that are written in real time on remote machines (on Linux and Unix file systems) into Kafka topics, and that is it. So I won't need advanced connectors that are not supported in the Apache version of Kafka.
But I am not sure whether I understand the usage and scenarios of Kafka Connect correctly. I am also wondering whether Kafka Connect should be deployed on the same machines as the data sources, or whether it is OK for them to reside on different machines. If they can be different, why does Flume require the agent to run on the same machine as the data source? I hope someone more experienced can shed some light on this.
Is Kafka Connect appropriate for ingesting data to Kafka? Yes.
Does Kafka Connect run local to the data source? Only if it has to (e.g. reading a local file with the Kafka Connect spooldir plugin, the FilePulse plugin, etc.).
Should you rip out something that works and replace it with Kafka Connect? Not unless it's fixing a problem that you have.
If you're not using either yet, should you use Kafka Connect instead of Flume? Quite possibly.
Learn more about Kafka Connect here: https://dev.to/rmoff/crunchconf-2019-from-zero-to-hero-with-kafka-connect-81o
For file ingest alone there are other tools too, like Filebeat.
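To make the "only if it has to" point concrete, here is a minimal sketch that renders a connector properties file for a file source. It uses FileStreamSource, the example file connector bundled with Apache Kafka, as a stand-in; the spooldir and FilePulse plugins named above have their own, different settings. The connector name, log path, and topic are hypothetical. The file path is resolved on the machine where the Connect worker runs, which is why a file-reading connector has to sit next to the file.

```python
def file_source_properties(name, path, topic):
    """Render a properties file for the bundled FileStreamSource example connector."""
    props = {
        "name": name,
        "connector.class": "FileStreamSource",
        "tasks.max": "1",
        # Path on the Connect worker's own file system, not on a remote host.
        "file": path,
        "topic": topic,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Hypothetical name, path, and topic, for illustration only.
text = file_source_properties("app-log-source", "/var/log/app/app.log", "app-logs")
print(text)
```

The output could be saved as a .properties file and passed to connect-standalone together with a worker properties file.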

How does ZooKeeper authorization work?

I am new to ZooKeeper. I understand ZooKeeper can be used as configuration storage; given that, what if one ZooKeeper client should not have access to certain configurations? How do I restrict that access?
Scenario: I want to use it as a configuration service from which my application retrieves its configuration, database endpoint lists, etc. Can I do that with ZooKeeper? If so, how do I restrict access so that one application cannot access another application's configuration?
ZooKeeper is a distributed coordination service for managing a large set of hosts. Coordinating and managing a service in a distributed environment is a complicated process. ZooKeeper solves this issue with its simple architecture and API, and allows developers to focus on core application logic without worrying about the distributed nature of the application.
The ZooKeeper framework was originally built at Yahoo! for accessing their applications in an easy and robust manner. Later, Apache ZooKeeper became a standard coordination service used by Hadoop, HBase, and other distributed frameworks. For example, Apache HBase uses ZooKeeper to track the status of distributed data.
It's not a key-value store.
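On the access-restriction part of the question: ZooKeeper attaches ACLs to individual znodes, and its built-in digest scheme identifies a client as user:base64(sha1(user:password)). The sketch below computes that id and prints a zkCli setAcl command for it. The user name, password, and znode path are hypothetical, and the layout (one znode subtree per application) is an assumption, not something the thread specifies.

```python
import base64
import hashlib


def digest_acl_id(username, password):
    """Return the id used by ZooKeeper's 'digest' ACL scheme for these credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    digest = base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")
    return f"{username}:{digest}"


# Hypothetical credentials and path, for illustration only.
acl_id = digest_acl_id("app-a", "s3cret")

# cdrwa = create, delete, read, write, admin permissions for this identity.
print(f"setAcl /config/app-a digest:{acl_id}:cdrwa")
```

A client would then authenticate with addauth digest app-a:s3cret before reading that znode. ACLs are not inherited by child znodes, so each child needs its own ACL.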

Can you run KSQL from a remote host?

I have confluent-ksql-server running on one of the nodes of my cluster.
Can we allow a specific host/machine outside the Kafka cluster to connect to KSQL?
PS- this is to provide ksql access to developers
Thanks !
Yes, you can. KSQL supports a client-server architecture: the KSQL server runs on one machine, and the client can run independently on another machine.
When you start ksql-server on your cluster nodes, you need to configure the listeners setting in ksql-server.properties. The listener should use 0.0.0.0 as its host so that it is accessible from other machines.
From your local machine, you can connect via ksql-cli as follows:
./bin/ksql-cli remote http://<Kafka node listener IP>:8080
You can read more about the KSQL client-server setup here: https://docs.confluent.io/current/ksql/docs/index.html
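Besides the CLI, the KSQL server also accepts statements over HTTP on the same listener, which can be handy for scripted access from a remote machine. The sketch below only builds such a request against the /ksql endpoint and does not send it; the host name is a placeholder, and the port simply mirrors the 8080 used in the CLI example above.

```python
import json
import urllib.request


def build_ksql_request(server_url, statement):
    """Build (but do not send) a POST request for the KSQL server's /ksql endpoint."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/ksql",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


# Placeholder host; the port matches the CLI example in the answer.
req = build_ksql_request("http://ksql-host.example:8080", "LIST STREAMS;")
print(req.get_method(), req.full_url)

# To send it against a reachable server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```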