Should Zookeeper be used to report process status? - apache-zookeeper

In my case, Process P1 spawns P2, P3, P4 and many other processes. These child processes could be on other machines. They can be spawned using orchestration systems such as Kubernetes. After spawning the process, P1 wants to know the status of P2 and the other processes. Should ZooKeeper be used so that P2 can send a heartbeat and other status messages to P1? Is that one of the use cases for Zookeeper?

For the single-node case, I would say no. The processes you spawned are all on a single machine, so there is no need to use ZK (which is typically used to maintain state or metadata for a cluster).
You could use IPC (e.g., signals or sockets) to check the child processes' status from the parent process.
Update:
If the processes are spread across machines, you could use ZK (with ephemeral and sequential nodes) to maintain group membership, which is a typical use of ZK; refer to the links below for more details.
By the way, you do not need to send heartbeats yourself: when a client connects to ZK, a session is established, and the ZK client library automatically sends a heartbeat after the session has been idle for one third of the session timeout.
ZooKeeper: Wait-free coordination for Internet-scale systems, Section 2.4 Group Membership
and Apache Curator Group Membership
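To make the group-membership idea concrete, here is a minimal sketch using Curator's GroupMember recipe. The connect string, membership path, member id, and payload below are placeholders for illustration, not values from your setup.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.nodes.GroupMember;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class MembershipExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string and paths.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Each child process (P2, P3, ...) registers itself as an ephemeral member.
        // If the process dies or loses its ZK session, its node disappears automatically.
        GroupMember member = new GroupMember(client, "/my-app/members", "P2",
                "status=starting".getBytes());
        member.start();

        // The parent (P1) can read the current membership and each member's payload.
        member.getCurrentMembers()
              .forEach((id, payload) -> System.out.println(id + " -> " + new String(payload)));

        // A member can update its status payload at any time.
        member.setThisData("status=running".getBytes());

        // On clean shutdown, remove this member's node.
        member.close();
        client.close();
    }
}
```

P1 could additionally set a watch on the membership path so it is notified as soon as a child's ephemeral node disappears, rather than polling.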

Related

multiple connectors in kafka to different topics are going to same node

I have created two Kafka connectors in Kafka Connect which use the same Connector class but listen to different topics.
When I launch the process on my node, both connectors end up creating tasks in this process. However, I would like one node to handle only one connector/topic. How can I limit a topic/connector to a single node? I don't see any configuration in connect-distributed.properties where a process could specify which connector to use.
Thanks
Kafka Connect in distributed mode can run as a cluster of one or more workers. Each worker can run multiple tasks. Depending on how many connectors and workers you are running, you will have tasks running on the same worker. This is deliberate - the idea is that Kafka Connect will manage your tasks and workload for you, across the available workers.
If you want to isolate your processing you can run Kafka Connect as separate Connect clusters, either on the same machine (make sure to use different REST ports), or separate machines.
For more info, see architecture and config for steps to configure separate clusters. Note that a cluster can actually be a single worker, but then you don't have any redundancy in the event of failure.
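As a rough way to verify where work ends up, you can ask the Connect REST API which worker each task was assigned to. The worker address and connector name below are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectorStatusCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder worker address and connector name.
        String url = "http://localhost:8083/connectors/my-connector/status";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // The JSON response lists each task along with the "worker_id"
        // (host:port of the worker) it is currently running on.
        System.out.println(response.body());
    }
}
```

If tasks for both connectors show the same worker_id, they are sharing the worker, which is the expected behaviour within a single Connect cluster.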

Kafka: Killing a consumers connection

With EMS it is possible to see all connections to a particular EMS server, and kill any unwanted connections.
As far as I can tell, I have an unwanted process somewhere that is subscribing to my Kafka topic with the same consumer name as my process.
Therefore, my process is not receiving any messages and I don't know where this "rogue" process is located.
Is there any command I can run to kill such connections?
I am running Kafka 0.9
If you use the Confluent Control Center you can see each consumer group and all the clients in each consumer group. That might help you identify the "rogue" consumer.
Otherwise you might have to just pick a new group id so it won't matter what the other client is subscribing to (because it will be in another consumer group).
It sounds like you should also configure some security and ACLs so that rogue apps can't authenticate and subscribe to topics they are not allowed to access.
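If you can use a broker and client newer than 0.9, a programmatic alternative to Control Center is the AdminClient, which can list the members of a consumer group. The bootstrap server and group id below are placeholders; this API is not available in Kafka 0.9 itself.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;

public class FindGroupMembers {
    public static void main(String[] args) throws Exception {
        // Placeholder bootstrap server.
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Placeholder consumer group id.
            ConsumerGroupDescription group = admin
                    .describeConsumerGroups(Collections.singletonList("my-group"))
                    .all().get()
                    .get("my-group");

            // Each member reports its client id and host, which should help
            // locate the "rogue" consumer sharing your group id.
            group.members().forEach(m ->
                    System.out.println(m.clientId() + " @ " + m.host()));
        }
    }
}
```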

how does storm leverage zookeeper for resilience?

From the description of Storm, it is based on ZooKeeper, and whenever a worker node dies, it can be recovered and get its state from ZooKeeper.
Does anyone know how that is done? Specifically:
How does the failed worker node get recovered?
How does ZooKeeper keep its state? AFAIK, each znode can only store a small amount of data.
Are you talking about workers or supervisors? Each Storm worker node runs a Storm "supervisor" daemon which manages worker processes.
You need to set up supervision (something like daemontools or supervisord, which is unrelated to Storm supervisors) to monitor and restart the Nimbus and supervisor daemons in case they take an exception. Both Nimbus and the supervisors are fail-fast and stateless: their state is kept in ZooKeeper or on disk so that it is not lost, and ZooKeeper is used for coordination between Nimbus and the supervisors.
State data isn't large, and ZooKeeper should be run supervised too.
Check this for more fault tolerance details.
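To get a feel for how little state is involved, a quick sketch like the one below lists the znodes Storm keeps under its root path. The connect string is a placeholder, and the /storm root is only the default (see storm.zookeeper.root); the exact layout can vary by Storm version.

```java
import org.apache.zookeeper.ZooKeeper;

public class StormStatePeek {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper connect string; storm.zookeeper.root defaults to /storm.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });

        // Storm keeps small pieces of coordination state (assignments,
        // worker heartbeats, errors, ...) in znodes under its root path.
        for (String child : zk.getChildren("/storm", false)) {
            System.out.println("/storm/" + child);
        }
        zk.close();
    }
}
```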

JBOSS messaging replicated queue

I am using JBOSS messaging in the following way:
1) Two JBOSS instances using 'all' config (i.e. clustered config)
2) One replicated queue created on each JBOSS instance with the same JNDI name (clustered = true)
3) One producer attached locally to the queue on each instance (i.e. the producers on both nodes keep adding messages to this replicated queue)
4) One JBOSS instance is marked as the "consumer node" and a queue message consumer is started only on this node (i.e. messages will be consumed on only one node). There is logic which decides which JBOSS instance is marked as the "consumer node"
5) The PostOffice used is clustered
6) The server peer is configured to not enforce message sequencing
7) Produced messages are non-persistent (deliveryMode = NON_PERSISTENT)
But I am facing a problem with this. Messages produced on the "non-consumer node" do not get replicated to the queue on the "consumer node" and hence are not available for consumption.
I enabled logging and checked that the postoffice finds two queues but only delivers to the local queue, as it discovers that the remote queue is recoverable.
Any idea how to get this working?
FYI: I believe a message can be delivered to only one queue (local or remote). So I want only one distributed queue, but I am currently getting two different distributed queues (although their JNDI name is the same). Is this a problem? If yes, how do I solve it? WebLogic provides the option of creating a queue on the admin server, so a shared queue is possible there. What is the similar mechanism in JBOSS messaging? Or should I approach this as two queues which are kept synchronized? If so, how do I achieve synchronization between them?
Thanks for taking out some time to help me!
Regards

MSMQ Cluster losing messages on failover

I've got an MSMQ cluster set up with nodes (active/passive) that share a drive.
Here are the tests I'm performing. I send recoverable messages to the queue. I then take the MSMQ cluster group offline and then bring it online again.
Result: The messages are still there.
I then simulate failover by moving the group to node 2. Moves over successfully, but the messages aren't there.
I'm sending the messages as recoverable and the MSMQ cluster group has a drive that both nodes can access.
Anyone?
More Info:
The Quorum drive stays only on node 1.
I have two service/app groups. One MSMQ and one that is a generic service group.
Even more info:
When node 1 is active, I pump it full of messages. I fail over to node 2: zero messages in the queue on node 2. Then I fail back over to node 1, and the messages are there on node 1.
You haven't clustered MSMQ or aren't using clustered MSMQ properly.
What you are looking at are the local MSMQ services.
http://blogs.msdn.com/b/johnbreakwell/archive/2008/02/18/clustering-msmq-applications-rule-1.aspx
Cheers
John
==================================
OK, maybe the drive letter being used isn't consistently implemented.
What is the storage location being used by clustered MSMQ?
If you open this storage location up in Explorer from Node 1 AND Node 2 at the same time, are the folder contents exactly the same? If you create a text file via Node 1's Explorer window, does it appear after a refresh in Node 2's Explorer window?