Apache-Ignite: Kafka-connect data replication hosts issue

Is it compulsory to run the Kafka sink connector and the Ignite node on the same host?
If not, what changes do I have to make in the Ignite configuration (XML) file to make it accessible from another node?
Thanks in advance.

Assuming the connector is a thin client, it doesn't need to be. However, Connect shouldn't run on the brokers either, and if you're in a high-latency environment (consuming from the cloud or a remote data center), it's recommended that you "produce locally".
You'd change the Ignite server bind address (localAddress in the communication SPI) to make it remotely accessible from any client, including other nodes in a cluster, assuming it isn't already.
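Once the bind address has been changed, one quick way to confirm that the node is reachable from the Connect host is a plain TCP connection test. The sketch below is only an illustration and not part of Ignite: the function names are made up for this example, and which ports to probe is an assumption (10800 is the usual thin-client default, 47500 and 47100 the usual discovery and communication defaults), so substitute whatever your configuration actually uses.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name resolution.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Run from the machine that hosts the connector, a call such as check_ports("ignite-host.example", [10800, 47500, 47100]) (hypothetical host name, assumed ports) shows which ports are open to that machine. A successful TCP connection only proves the port is reachable, not that the Ignite configuration is otherwise correct.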

Related

Kafka Post Deployment - Handling ever-growing clients

We have set up a Kafka cluster for high availability and distributed data load. The current consumers and producers specify all the broker IP addresses to connect to the cluster. In the future, we will need to continuously monitor the cluster and add new brokers based on collected metrics and overall system performance. If a broker crashes, we have to add a new broker with a different IP as soon as possible.
In these scenarios, we have to change all client configurations, which is a time-consuming and stressful operation.
I think we could set up a config server (e.g. Spring Cloud Config Server) to specify all the broker IP addresses centrally, so that we change them in one place without touching every client, but I don't know if it is the best approach. Obviously, the clients must be programmed to get the broker list from the config server.
Is there a better approach?

It's worth pointing out that the "bootstrap" process doesn't require giving every single broker address to the clients. Only the first available address in the list is used for the initial connection; after that, the clients actually use the advertised.listeners from the broker configs in the cluster.
The answer to your question is yes, use service discovery. That could be Spring Cloud Config, but the more general option would be HashiCorp Consul or another service that uses DNS (Kubernetes uses CoreDNS by default, for example, or AWS Route53).
Then you edit /etc/resolv.conf on each machine the client runs on (assuming Linux) to include the DNS servers, and you can simply refer to kafka.your.domain:9092 rather than using IP addresses.
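To see what a DNS name buys you, the following sketch resolves one host name into the addresses behind it. It is illustrative only: kafka.your.domain is the placeholder from the answer above, the IP addresses are invented, and the resolver parameter is a hypothetical hook added so the function can be exercised without a real DNS server. A Kafka client performs this lookup itself, so in practice you would just put the name in bootstrap.servers.

```python
import socket


def resolve_bootstrap(hostname, port, resolver=socket.getaddrinfo):
    """Resolve hostname to its IPv4 addresses and return sorted 'ip:port' strings."""
    infos = resolver(hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    addresses = sorted({info[4][0] for info in infos})
    return [f"{ip}:{port}" for ip in addresses]


def fake_resolver(hostname, port, **kwargs):
    """Stand-in for DNS: pretends the name maps to two broker addresses."""
    return [
        (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("10.0.0.12", port)),
        (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("10.0.0.11", port)),
    ]


print(resolve_bootstrap("kafka.your.domain", 9092, resolver=fake_resolver))
```

When a broker is added or replaced, only the DNS record changes; the name the clients use stays the same.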

You could use a load balancer (with a friendly DNS name like kafka.domain.com) that points to all of your brokers. We do this in our environment. Your clients then connect to kafka.domain.com:9092.
As soon as you add new brokers, you only change the load balancer endpoints and not the client configuration.
Additionally please note that you only need to connect to one bootstrap broker and don't have to list all of them in the client configuration.

How to expand confluent cloud kafka cluster?

I have set up a confluent cloud multizone cluster and it got created with just one bootstrap server. There was no setting for choosing number of servers while creating the cluster. Even after creation, I can’t edit the number of bootstrap servers.
I want to know how to increase the number of servers in confluent cloud kafka cluster.

Under the hood, the Confluent Cloud cluster is already running multiple brokers. Depending on your cluster configuration (specifically, whether you're running Standard or Dedicated, and what region and cloud you're in), the cluster will have between six and several dozen brokers.
The way a Kafka client bootstrap server config works is that the client reaches out to the bootstrap server and requests a list of all brokers, and then uses those broker endpoints to actually produce/consume from Kafka (reference: https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-properties-bootstrap-servers.html)
In Confluent Cloud, the provided bootstrap server is actually a load balancer in front of all of the brokers; when the client connects to the bootstrap server it'll receive the actual endpoints for all of the actual brokers, and then use that for subsequent connections.
So TL;DR, in your client, you only need to specify the one bootstrap server; under the hood, the Kafka client will connect to the (many) brokers running in Confluent Cloud, and it should all just work.
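The bootstrap behaviour described above can be sketched with a toy model. This is not the Kafka wire protocol or any real client API: ToyCluster, ToyClient and all host names are hypothetical, and the model only captures the idea that one bootstrap address is enough to learn the full broker list.

```python
class ToyCluster:
    """Toy stand-in for a Kafka cluster; not the real protocol."""

    def __init__(self, bootstrap, brokers):
        self.bootstrap = bootstrap      # the single address handed to clients
        self.brokers = list(brokers)    # endpoints the brokers advertise

    def metadata(self, address):
        """Return the advertised broker list when asked via the bootstrap address."""
        if address != self.bootstrap:
            raise ConnectionError(f"nothing listening at {address}")
        return list(self.brokers)


class ToyClient:
    def __init__(self, bootstrap_servers):
        self.bootstrap_servers = bootstrap_servers
        self.known_brokers = []

    def refresh(self, cluster):
        """Try bootstrap addresses in order; the first that answers supplies the broker list."""
        for address in self.bootstrap_servers:
            try:
                self.known_brokers = cluster.metadata(address)
                return self.known_brokers
            except ConnectionError:
                continue
        raise ConnectionError("no bootstrap server reachable")


cluster = ToyCluster("bootstrap.example:9092", ["b0.example:9092", "b1.example:9092"])
client = ToyClient(["bootstrap.example:9092"])
print(client.refresh(cluster))

# The cluster grows; the client configuration is untouched.
cluster.brokers.append("b2.example:9092")
print(client.refresh(cluster))
```

The second refresh returns three brokers even though the client was only ever configured with the one bootstrap address.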
Source: I work at Confluent.

ArtemisMQ Connector

I'm new to ArtemisMQ and don't understand the purpose of connectors at all.
Why is a connector essential when we already specify the broker's acceptor in broker.xml? We know which port (the acceptor port) to send a request to if we want to connect to this server. Even if this server is part of a cluster, what is the role of a connector? Another part of the documentation, "Clusters", has some information, but it talks about cluster connections:
The cluster is formed by each node declaring cluster connections to other nodes in the core configuration file broker.xml. When a node forms a cluster connection to another node, internally it creates a core bridge (as described in Core Bridges) connection between it and the other node, this is done transparently behind the scenes - you don't have to declare an explicit bridge for each node. These cluster connections allow messages to flow between the nodes of the cluster to balance load.
From documentation "Understanding Connectors":
connectors are used by a client to define how it connects to a server.
What does "define how" mean?
I've already read another question about connectors, but it didn't help me.
Additional questions:
Is a connector always the same as an acceptor? (I've downloaded some official examples, and all the ones I've seen have the same acceptor and connector.)
What information does a connector encapsulate if it only consists of host and port (the same as the acceptor's, if we ignore that the acceptor host can be 0.0.0.0 or localhost)?
Why does a stand-alone broker have a connector, for example when created by default with ./artemis create?
What should we write in a connector?
Can you give a simple example where the acceptor and connector are different?

Two important points to note:
A connector is not essential depending on your use-case. You'll find that the default broker.xml doesn't have any connector elements defined. For example, if you just run ./artemis create the generated broker.xml will not have any connector elements.
The documentation you cited is quite old (from the very first release of Artemis). You may benefit from reading the latest documentation which has been updated for clarity in many places.
As noted in both the documentation and the other Stack Overflow answer you cited, certain components in the broker need to connect to other brokers (e.g. core bridges, cluster-connections, etc.). A connector encapsulates the information necessary for these other components to make the connections they need. It's really as simple as that.
Now regarding your individual questions...
Even if this server is part of cluster, what is a role of connector?
In the case of a cluster using a broadcast-group and a discovery-group each node in the cluster needs to broadcast to all the other nodes in the cluster how the other nodes can connect to itself. It does this by broadcasting a connector which is referenced in the cluster-connection configuration. When the other nodes in the cluster receive this broadcast they take the connector information and use it to connect back to the node which broadcast it originally. In this way nodes can dynamically discover and connect to each other. It's also worth noting that in this case the connector configuration will essentially mirror one of the broker's acceptor configurations (since the connector will be used by other nodes to connect to the broadcasting node's acceptor). This is discussed further in the cluster documentation.
...connectors are used by a client to define how it connects to a server...
This bit of documentation you quoted is accurate but may be a bit confusing. Keep in mind that a client can run anywhere, even within the broker itself. In the case of core bridges and cluster connections, there is a client running in the broker which uses the connector to determine how to connect to another broker. For what it's worth, the updated documentation doesn't have this specific wording.
What does it mean "define how"?
A connector is the URL that the client needs to connect to the broker. The URL can simply include the host and port or it can contain lots of configuration details for the connection (e.g. SSL config).
Is connector always the same as acceptor..?
No, not always. In the case of a cluster they will be the same (or very close) for the reasons I already outlined, but in the case of a bridge they won't be the same.
What information does connector encapsulates..?
See above.
Why does stand-alone Broker have connector, for example by default creation ./artemis create?
It doesn't. See above.
What should we write in connector?
The URL needed to connect.
Can you give a simple example when acceptor and connector are different?
As mentioned previously, bridging is an example where different acceptors and connectors are used. ActiveMQ Artemis ships with a "core-bridge" example in the examples/features/standard directory which demonstrates different acceptors and connectors. The example involves 2 different brokers with one broker having a core bridge configured to send messages to the other broker. Here's the broker.xml with the bridge defined. You can see the acceptor listening on the localhost:61616 and the connector for localhost:61617. This connector points to the other broker which is listening on localhost:61617.
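A stripped-down illustration of that layout, parsed with Python so the difference is easy to see. The XML fragment below is an assumption written for this sketch, not the example's actual broker.xml: namespaces and most required elements are omitted, and the names netty-acceptor, remote-connector, my-bridge and bridgeQueue are hypothetical. Only the two URLs come from the description above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Simplified, illustrative fragment; a real broker.xml has namespaces and more settings.
BROKER_XML = """
<configuration>
  <core>
    <connectors>
      <connector name="remote-connector">tcp://localhost:61617</connector>
    </connectors>
    <acceptors>
      <acceptor name="netty-acceptor">tcp://localhost:61616</acceptor>
    </acceptors>
    <bridges>
      <bridge name="my-bridge">
        <queue-name>bridgeQueue</queue-name>
        <static-connectors>
          <connector-ref>remote-connector</connector-ref>
        </static-connectors>
      </bridge>
    </bridges>
  </core>
</configuration>
"""


def endpoints(root, tag):
    """Map each <tag name="..."> element to the (host, port) found in its URL."""
    result = {}
    for element in root.iter(tag):
        url = urlparse(element.text.strip())
        result[element.get("name")] = (url.hostname, url.port)
    return result


root = ET.fromstring(BROKER_XML)
acceptors = endpoints(root, "acceptor")      # where this broker listens
connectors = endpoints(root, "connector")    # where this broker connects to
bridge_targets = [ref.text for ref in root.iter("connector-ref")]

print("listens on:", acceptors)
print("connects to:", connectors)
print("bridge uses connector(s):", bridge_targets)
```

The acceptor describes the local listening side (port 61616), while the connector referenced by the bridge describes the other broker (port 61617), which is why the two differ here.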

Can you run KSQL from a remote host?

I have confluent-ksql-server running on one of the nodes of my cluster.
Can KSQL be accessed from a specific host/machine outside the Kafka cluster?
PS- this is to provide ksql access to developers
Thanks !

Yes, you can. KSQL supports a client-server architecture: the KSQL server runs on one machine, and the client can run independently on another machine.
When you start ksql-server on your cluster nodes, you need to configure the listeners in ksql-server.properties. The listeners should be exposed as 0.0.0.0: in order to make the server accessible from other machines.
From your local machine, you can connect via ksql-cli in the following way:
./bin/ksql-cli remote http://<kafka Node Listener IP>:8080
You can read more about KSQL Client Server setup here : https://docs.confluent.io/current/ksql/docs/index.html

Kafka connect cluster setup or launching connect workers

I am going through Kafka Connect and trying to understand the concepts.
Let us say I have a Kafka cluster (nodes k1, k2 and k3) set up and running, and now I want to run Kafka Connect workers on different nodes, say c1 and c2, in distributed mode.
A few questions:
1) To launch Kafka Connect in distributed mode I need to use the command ../bin/connect-distributed.sh, which is available on the Kafka cluster nodes. So do I need to launch Kafka Connect from one of the Kafka cluster nodes? Or does any node from which I launch Kafka Connect need to have the Kafka binaries so that I can use ../bin/connect-distributed.sh?
2) Do I need to copy my connector plugins to a Kafka cluster node (or to all cluster nodes?) from which I do step 1?
3) How does Kafka copy these connector plugins to the worker node before starting the JVM process there? The plugin contains my task code, and it needs to be copied to the worker in order to start the process on the worker.
4) Do I need to install anything on the Connect cluster nodes c1 and c2, such as Java or anything Kafka Connect related?
5) Some places say to use Confluent Platform, but I would like to start with Apache Kafka Connect alone first.
Can someone please shed some light on this? Even a pointer to some resources would help.
Thank you.

1) In order to have a highly available kafka-connect service you need to run at least two instances of connect-distributed.sh on two distinct machines that have the same group.id. You can find more details regarding the configuration of each worker here. For improved performance, Connect should be run independently of the broker and Zookeeper machines.
2) Yes, you need to place all your connectors under plugin.path (normally under /usr/share/java/) on every machine on which you are planning to run kafka-connect.
3) kafka-connect will load the connectors on startup. You don't need to handle this. Note that if your kafka-connect instance is running and a new connector is added, you need to restart the service.
4) You need to have Java installed on all your machines. For Confluent Platform particularly:
Java 1.7 and 1.8 are supported in this version of Confluent Platform
(Java 1.9 is currently not supported). You should run with the
Garbage-First (G1) garbage collector. For more information, see the
Supported Versions and Interoperability.
5) It depends. Confluent was founded by the original creators of Apache Kafka and it comes as a more complete distribution adding schema management, connectors and clients. It also comes with KSQL which is quite useful if you need to act on certain events. Confluent simply adds on top of the Apache Kafka distribution, it's not a modified version.

The answer given by Giorgos is correct. I ran a few connectors and now understand it better.
I am just trying to put it differently.
In Kafka Connect there are two things involved: the worker and the connector. Below are the details of running Kafka Connect in distributed mode.
A Kafka Connect worker is a Java process in which the connector and its tasks run, so the first step is to launch a worker. To launch a worker, Java must be installed on the machine, and we need the Kafka Connect sh/bat files plus the Kafka libraries that the worker uses; the simplest way to get these is to copy or install Kafka on the worker machine. We also need to copy all the connector and task jars and their dependencies into the "plugin.path" defined in the worker properties file. The worker machine is then ready, and we start the worker by invoking ./bin/connect-distributed.sh ./config/connect-distributed.properties, where connect-distributed.properties holds the worker configuration. The same steps have to be repeated on each machine where we want to run Kafka Connect.
Once the worker Java process is running on all machines, the group.id property in the worker config determines membership: workers that have the same value for this property form a group/cluster of workers.
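As a rough sketch of what such a worker configuration might contain, the helper below renders a minimal connect-distributed.properties body. The function itself is hypothetical; the property keys are standard distributed-worker settings, but the values (broker port 9092, the topic names, the JSON converters, the group name) are assumptions chosen for illustration, with k1, k2 and k3 taken from the question.

```python
def worker_properties(bootstrap_servers, group_id, plugin_path):
    """Render a minimal connect-distributed.properties body as text."""
    settings = {
        "bootstrap.servers": ",".join(bootstrap_servers),
        # Workers sharing this value join the same Connect cluster.
        "group.id": group_id,
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        # Internal topics where the workers store shared state (names are assumed).
        "config.storage.topic": "connect-configs",
        "offset.storage.topic": "connect-offsets",
        "status.storage.topic": "connect-status",
        # Directory that holds the connector jars on this machine.
        "plugin.path": plugin_path,
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# The same content would be used on both c1 and c2.
print(worker_properties(["k1:9092", "k2:9092", "k3:9092"], "connect-cluster", "/usr/share/java"))
```

Because c1 and c2 get the same group.id and point at the same brokers, they would form one worker group; a worker started with a different group.id would form a separate cluster.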
Each worker process exposes a REST endpoint (default http://localhost:8083/connectors). To launch a connector on the running workers, we HTTP POST a connector config as JSON; based on that config, the workers start the connector and the configured number of tasks across the above group/cluster of workers.
Example of a Connect POST:
curl -X POST -H "Content-Type: application/json" --data '{"name": "local-file-sink", "config": {"connector.class":"FileStreamSinkConnector", "tasks.max":"3", "file":"test.sink.txt", "topics":"connect-test" }}' http://localhost:8083/connectors
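For scripting, the same request can be issued from Python's standard library. This is a minimal sketch mirroring the curl call above; the helper names build_create_request and create_connector are made up for this example, and error handling (for instance a 409 response when the connector already exists) is left out.

```python
import json
import urllib.request


def build_create_request(base_url, name, config):
    """Build (without sending) the POST /connectors request for a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_connector(base_url, name, config, timeout=10):
    """Send the request to a running worker and return the decoded JSON reply."""
    request = build_create_request(base_url, name, config)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same payload as the curl example; nothing is sent until create_connector is called.
request = build_create_request(
    "http://localhost:8083",
    "local-file-sink",
    {
        "connector.class": "FileStreamSinkConnector",
        "tasks.max": "3",
        "file": "test.sink.txt",
        "topics": "connect-test",
    },
)
print(request.get_method(), request.full_url)
```

Calling create_connector with the same arguments against any worker in the group should have the same effect as the curl command, since the request is sent to the worker's REST endpoint either way.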
