Can we use ZooKeeper for configuration management? - apache-zookeeper

I have been working with distributed tools for quite some time now. I understand ZooKeeper and the importance of maintaining metadata and so on.
I would like to know whether ZooKeeper could be used for configuration management. I mean: I set some key-value pairs, like I do in .properties files, and they could be referred to from Kafka, NiFi, or elsewhere within the distributed environment.
In simple terms, could I use ZooKeeper as a key-value map? Help me understand.

Maybe what you need is a ZooKeeper watch.
A watch is a mechanism for sending a notification to a ZooKeeper client when the value of a ZNode changes.
To simplify the implementation, you can use one of the Curator recipes to get notified when a ZooKeeper node's value is changed or the node is deleted (see example here).
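To make the answer to the original question concrete: each key can live in its own znode (znodes are meant for small values, about 1 MB at most by default), and clients can watch it. The sketch below is a toy in-memory model, not the ZooKeeper or Curator API; the /config/<app>/<key> layout and all class and function names are hypothetical. It shows the detail that usually trips people up: a plain ZooKeeper watch fires only once, so the client has to re-register it after every event, which is what Curator's cache recipes (for example CuratorCache) take care of for you.

```python
from typing import Callable, Dict, List, Optional

Watcher = Callable[[str, str], None]  # (path, event) -> None


class ToyZNodeStore:
    """In-memory stand-in for a ZooKeeper ensemble (NOT the real client API)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watches: Dict[str, List[Watcher]] = {}

    def get(self, path: str, watch: Optional[Watcher] = None) -> Optional[bytes]:
        # Reading can leave a watch behind, similar in spirit to getData(path, watcher).
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data.get(path)

    def set(self, path: str, value: bytes) -> None:
        # Unlike real setData, this toy creates the node if it is missing.
        self._data[path] = value
        self._fire(path, "changed")

    def delete(self, path: str) -> None:
        self._data.pop(path, None)
        self._fire(path, "deleted")

    def _fire(self, path: str, event: str) -> None:
        # One-shot semantics: the watch list is removed before anyone is notified.
        for watcher in self._watches.pop(path, []):
            watcher(path, event)


def load_properties(store: ToyZNodeStore, app: str, text: str) -> None:
    """Store each key=value line of .properties-style text under /config/<app>/<key>."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        store.set(f"/config/{app}/{key.strip()}", value.strip().encode())


class WatchedValue:
    """Local copy of one config value that re-arms its watch after every event."""

    def __init__(self, store: ToyZNodeStore, path: str) -> None:
        self.store = store
        self.path = path
        self.events: List[str] = []
        self.value = store.get(path, watch=self._on_event)

    def _on_event(self, path: str, event: str) -> None:
        self.events.append(event)
        # Re-register, otherwise later changes would go unnoticed.
        self.value = self.store.get(path, watch=self._on_event)


if __name__ == "__main__":
    store = ToyZNodeStore()
    load_properties(store, "nifi", "batch.size=500\n# comment\nretries=3")
    batch = WatchedValue(store, "/config/nifi/batch.size")
    store.set("/config/nifi/batch.size", b"1000")
    print(batch.value, batch.events)  # b'1000' ['changed']
```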

Related

Starting RabbitMQ in K8s with Pre-defined Topics

This might be a dumb question, but I've never used RabbitMQ in K8s before and am trying to figure out the best way to accomplish what I'm about to ask. I also haven't been able to find any search results that address this particular question.
I have a series of deployments/pods, of which a few need to communicate using RabbitMQ via a topic (or topics). Is there any way to spin up a RabbitMQ cluster such that, when it starts up, it auto-creates the specified topics, exchanges, etc.?
I've yet to see any way to actually do this, which leads me to believe that the "best" way to accomplish this is to use an init container (or some other startup script elsewhere) that programmatically uses the RabbitMQ API to create and configure the topics. I could use pre-determined topic names to avoid having to update things at runtime.
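In RabbitMQ terms, the "topics" are topic exchanges plus the queues bound to them. A minimal sketch of the init-container idea, assuming the management plugin is enabled (it serves an HTTP API, by default on port 15672): build the declarations from a fixed topology and PUT/POST them to /api/exchanges, /api/queues and /api/bindings. The service name rabbitmq, the credentials and the orders/billing topology are hypothetical, and a real init container would also need to retry until the API answers. Declaring an exchange or queue that already exists with identical properties is a no-op, so re-running this on every start should be harmless.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

# Hypothetical pre-determined topology; replace with your own names.
TOPOLOGY = {
    "exchanges": [{"name": "orders", "type": "topic"}],
    "queues": [{"name": "billing"}],
    "bindings": [{"exchange": "orders", "queue": "billing", "routing_key": "order.*"}],
}

Request = Tuple[str, str, Dict]  # (HTTP method, URL, JSON body)


def build_requests(base_url: str, vhost: str, topology: Dict) -> List[Request]:
    """Translate the topology into management-API calls without sending anything."""
    v = urllib.parse.quote(vhost, safe="")  # the default vhost "/" must be sent as %2F
    reqs: List[Request] = []
    for ex in topology.get("exchanges", []):
        name = urllib.parse.quote(ex["name"], safe="")
        reqs.append(("PUT", f"{base_url}/api/exchanges/{v}/{name}",
                     {"type": ex.get("type", "topic"), "durable": True,
                      "auto_delete": False, "internal": False, "arguments": {}}))
    for q in topology.get("queues", []):
        name = urllib.parse.quote(q["name"], safe="")
        reqs.append(("PUT", f"{base_url}/api/queues/{v}/{name}",
                     {"durable": True, "auto_delete": False, "arguments": {}}))
    for b in topology.get("bindings", []):
        ex = urllib.parse.quote(b["exchange"], safe="")
        q = urllib.parse.quote(b["queue"], safe="")
        reqs.append(("POST", f"{base_url}/api/bindings/{v}/e/{ex}/q/{q}",
                     {"routing_key": b["routing_key"], "arguments": {}}))
    return reqs


def send(reqs: List[Request], user: str, password: str) -> None:
    """Execute the calls with HTTP basic auth; raises urllib.error.HTTPError on 4xx/5xx."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    for method, url, body in reqs:
        req = urllib.request.Request(
            url, data=json.dumps(body).encode(), method=method,
            headers={"Content-Type": "application/json",
                     "Authorization": f"Basic {token}"})
        with urllib.request.urlopen(req) as resp:
            print(method, url, resp.status)


if __name__ == "__main__":
    # In the init container: send(build_requests("http://rabbitmq:15672", "/", TOPOLOGY), user, pw)
    for r in build_requests("http://rabbitmq:15672", "/", TOPOLOGY):
        print(r[0], r[1])
```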

Monitor if Kafka is up?

I simply need to monitor whether my Kafka cluster is up. Occasionally, the machines running Kafka were shut down. I want to send an email alert if the cluster is not available.
I can create a producer and consumer to send and receive dummy messages periodically. Is there a simpler way to do it?
You can use https://github.com/obsidiandynamics/kafdrop.
It won't send you emails, but it is much easier than sending dummy messages.
Actually, knowing whether a cluster is up is not easy at all. There is discussion in the community about the best practice for deciding whether a Kafka cluster is up and active, but there is currently no good way to get this information. Kafka is a distributed system: in a big cluster, one or more brokers can be down while the cluster still provides a highly available service without affecting data integrity. You might also have problems with one topic while other topics work fine.
One suggestion I read, which might give you the most certain answer, is to produce "dummy" messages to your application topics and "skip" these messages on consumption; that tells you your actual application path works. I don't like this approach very much, as it requires sending junk to your main topics.
Other approaches are, as you say, to produce to and consume from a test/health-check topic, but that might not give a full guarantee that your application would work. This is a lot like running "select from dummy" as a health check in other databases; whether that is good enough depends on your needs.
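For the "skip the dummy messages" idea, a minimal sketch could tag each probe with a record header (Kafka has supported record headers since 0.11) and have the consumer drop tagged records while marking the probe as returned. A plain Python list stands in for the topic here, and the header name x-healthcheck and the helper functions are hypothetical; with a real client you would attach the header when producing and check it in your poll loop.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List

HEALTH_HEADER = "x-healthcheck"  # hypothetical header name


@dataclass
class Record:
    value: bytes
    headers: Dict[str, str] = field(default_factory=dict)


def send_probe(topic: List[Record], pending: Dict[str, float], now: float) -> str:
    """Append a tagged probe to the application topic and remember when it was sent."""
    probe_id = str(uuid.uuid4())
    topic.append(Record(b"", {HEALTH_HEADER: probe_id}))
    pending[probe_id] = now
    return probe_id


def consume(records: List[Record], pending: Dict[str, float]) -> List[bytes]:
    """Return business payloads only; probes are marked as returned and skipped."""
    payloads = []
    for record in records:
        probe_id = record.headers.get(HEALTH_HEADER)
        if probe_id is not None:
            pending.pop(probe_id, None)  # the probe made the full round trip
            continue
        payloads.append(record.value)
    return payloads


def overdue(pending: Dict[str, float], now: float, timeout_s: float) -> List[str]:
    """Probes that have not come back within timeout_s: a sign something is wrong."""
    return [pid for pid, sent in pending.items() if now - sent > timeout_s]
```

A non-empty overdue(...) result means the path from producer through broker to consumer is broken somewhere, which is the stronger signal described above, at the cost of the junk messages it mentions.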
Another suggestion is to use AdminClient to read the cluster's metrics; if metrics are returned, that usually means the cluster is healthy, but again this is not a very strong guarantee.
I asked in a comment which language you are using. If you are using something like Spring, which has HealthIndicator to check component status, that could help, but your case would be a little different.
First of all, you should know that Kafka is designed to be highly available, so while building the cluster you should follow the main best practices and ensure that your data is replicated across machines. If that assumption holds, it may satisfy you more than implementing all of this.
But if you want to check the health of the cluster, you can use an admin process: with AdminClient and some utilities, you can check the list of topics, groups, etc. that you have. This is not a 100% guarantee, although it is a good workaround.
You can do that, as you mentioned, with a periodic scheduler and send an email based on the findings. But again, this is not the ideal solution; an HA cluster infrastructure should save you a lot of time if you build it correctly from the beginning.
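The periodic-check-plus-email idea can be sketched so that it reflects the point above that "up" is not binary. The probe is a hypothetical callable that returns the ids of the brokers that answered (for example, gathered from an AdminClient describeCluster call) or raises on failure; the check distinguishes up, degraded and down and alerts only when the state changes, so you are not emailed on every run. The min_live threshold and the alert callable are assumptions; the alert could send mail with Python's smtplib.

```python
import time
from typing import Callable, Optional, Set


def classify(live: Optional[Set[int]], expected: Set[int], min_live: int) -> str:
    """'down' if the probe failed or too few brokers answered, 'degraded' if some are missing."""
    if live is None or len(live & expected) < min_live:
        return "down"
    if expected - live:
        return "degraded"
    return "up"


class ClusterMonitor:
    def __init__(self, probe: Callable[[], Set[int]], expected: Set[int],
                 min_live: int, alert: Callable[[str, str], None]) -> None:
        self.probe = probe          # hypothetical: returns ids of reachable brokers
        self.expected = expected
        self.min_live = min_live
        self.alert = alert          # hypothetical: e.g. sends an email
        self.state = "up"           # assume healthy until the first check says otherwise

    def check_once(self) -> str:
        try:
            live: Optional[Set[int]] = self.probe()
        except Exception:
            live = None  # an unreachable cluster counts as down
        new_state = classify(live, self.expected, self.min_live)
        if new_state != self.state:
            self.alert(self.state, new_state)  # notify on transitions only
        self.state = new_state
        return new_state

    def run_forever(self, interval_s: float = 60.0) -> None:
        while True:
            self.check_once()
            time.sleep(interval_s)
```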

Redis's message broker: when to use it

I have been reading about message brokers lately and recently found that Redis also has its own message broker, just like RabbitMQ, Kafka, beanstalk, etc. Redis also has a built-in pub/sub mechanism.
I am also a hardcore socket.io user, so here is what I am confused about:
Does Redis's message broker work in a similar manner to others like RabbitMQ, Kafka, and beanstalk?
When should I use a message broker vs pub/sub vs socket.io? Please share an example if possible.
Thanks in advance
I have done some R&D using the Kafka messaging system and Redis.
Kafka is a distributed, partitioned, and replicated commit log service that provides messaging functionality with a unique design.
Please refer to this article.
Redis is a bit different from Kafka in terms of storage and functionality. At its core, Redis is an in-memory data store that can be used as a high-performance database and cache, which makes it well suited to real-time data processing.
The data structures supported by Redis include strings, hashes, lists, sets, and sorted sets.
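The practical difference the question asks about is delivery semantics. Here is a toy in-memory sketch (not client code for Redis or Kafka; the class names are hypothetical) of three models: Redis PUBLISH/SUBSCRIBE delivers only to subscribers connected at that moment and stores nothing; a Redis list used as a queue (LPUSH plus BRPOP) keeps each message until exactly one consumer pops it; a Kafka partition is a retained log in which each consumer group tracks its own offset.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional


class PubSub:
    """Fire-and-forget fan-out, like PUBLISH/SUBSCRIBE: nothing is stored."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, msg: str) -> int:
        for callback in self.subscribers:
            callback(msg)
        return len(self.subscribers)  # Redis PUBLISH also returns the receiver count


class ListQueue:
    """Work queue, like LPUSH + BRPOP on a list: each message goes to one consumer."""

    def __init__(self) -> None:
        self.items: Deque[str] = deque()

    def push(self, msg: str) -> None:
        self.items.appendleft(msg)  # LPUSH adds at the head

    def pop(self) -> Optional[str]:
        return self.items.pop() if self.items else None  # RPOP takes from the tail: FIFO


class Log:
    """Append-only log, like a Kafka partition: retained, read per consumer group."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self.offsets: Dict[str, int] = {}

    def append(self, msg: str) -> None:
        self.entries.append(msg)

    def poll(self, group: str) -> List[str]:
        start = self.offsets.get(group, 0)  # a new group starts at the beginning here
        batch = self.entries[start:]
        self.offsets[group] = len(self.entries)
        return batch
```

In this picture, socket.io sits closest to the pub/sub model: it pushes events to currently connected clients.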

Leader Election : Consul vs ZooKeeper

We are choosing the best option for implementing leader election to achieve high availability. Our goal is to have only a single active instance at any given time. We are using Spring Boot to develop an application that is deployed by default on Tomcat. It would be great to hear your opinion on the following:
Does ZooKeeper provide better CP than Consul?
What is your view on maintenance/complexity?
ZooKeeper is based on ZAB and Consul is based on Raft. At a high level, both are very similar atomic broadcast algorithms. So, as far as the "Consistency" of CAP is concerned (which is actually linearizability, a very strong form of consistency), both provide similar guarantees. Both have linearizable writes to a quorum (majority). The other nodes (not in the quorum) may lag behind in updates by default, resulting in stale reads. This is done because full linearizability makes things slow, and many applications are fine with slightly stale reads. However, if that is not acceptable in a particular use case, you can call sync before reading in ZooKeeper, and use the consistent mode in Consul, to achieve full linearizability.
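The "quorum (majority)" in both systems is simple arithmetic, sketched below: a write needs floor(n/2) + 1 members, any two majorities share at least one member (which is why a committed write is not lost when the leader changes), and an even-sized ensemble tolerates no more failures than the odd size just below it. This applies to ZooKeeper ensembles and to Consul server nodes alike; the function names are only for illustration.

```python
def quorum_size(n: int) -> int:
    """Smallest majority of an n-member ensemble."""
    if n < 1:
        raise ValueError("ensemble needs at least one member")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Members that may fail while a majority can still be formed."""
    return n - quorum_size(n)


def majorities_overlap(n: int) -> bool:
    """Two quorums of size q among n members must share a member iff 2q > n."""
    return 2 * quorum_size(n) > n


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, quorum_size(n), tolerated_failures(n), majorities_overlap(n))
```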
For service discovery, however, Consul seems to provide higher level constructs that are not out-of-the-box in ZooKeeper.
In terms of leader election use case, both can be used.
But given that ZooKeeper is used by many top-level Apache projects and is also older than Raft (and therefore Consul), I expect it to have better community support and documentation. Also, the Apache documentation describing various recipes is great.
Finally, if you go with ZooKeeper, you may also want to use Apache Curator which provides higher level APIs on top of ZooKeeper.
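For the leader-election use case, the ZooKeeper recipe (which Curator packages as LeaderLatch and LeaderSelector) works roughly like this: every instance creates an ephemeral sequential znode under an election path, the lowest sequence number is the leader, and every other instance watches only its immediate predecessor so that one failure does not wake everyone up. The toy model below mimics that in memory; it is not the ZooKeeper API, and the class and method names are hypothetical.

```python
from typing import Dict, Optional


class ToyElection:
    """In-memory model of the recipe: ephemeral sequential nodes, lowest number leads."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.nodes: Dict[str, str] = {}  # node name -> owning session

    def join(self, session: str) -> str:
        # ZooKeeper appends a zero-padded 10-digit counter to sequential nodes,
        # so sorting the names as strings also sorts them numerically.
        name = f"n_{self.next_seq:010d}"
        self.next_seq += 1
        self.nodes[name] = session
        return name

    def session_expired(self, session: str) -> None:
        # Ephemeral nodes disappear together with the session that created them.
        for name in [n for n, s in self.nodes.items() if s == session]:
            del self.nodes[name]

    def leader(self) -> Optional[str]:
        return self.nodes[min(self.nodes)] if self.nodes else None

    def watch_target(self, name: str) -> Optional[str]:
        """Node a candidate should watch: its immediate predecessor (avoids the herd effect)."""
        ordered = sorted(self.nodes)
        i = ordered.index(name)
        return ordered[i - 1] if i > 0 else None
```

When the watched predecessor disappears, a candidate re-checks whether it is now the lowest node; that is how leadership passes on without a central coordinator.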

Kafka instead of Zookeeper for cluster management

I am writing a clustered application sitting on top of Kafka; it uses Kafka exclusively for interprocess communication and coordination. I could use ZooKeeper to manage my cluster, but it would not be very difficult to use Kafka topics to manage the cluster. And the more I think about it, other than for historical reasons, it seems like Kafka could drop ZooKeeper and just use a topic-based solution.
For example, there could be a special topic or topics in Kafka where you publish all of the same data currently kept in ZooKeeper: brokers, topics, partitions, leaders, etc. This seems just as easy to track via Kafka topics as via ZooKeeper.
I know that in Kafka 0.9.0 there's some movement away from ZooKeeper, more towards this model, but remember my question is less about Kafka development and more about figuring out which direction to go in my application.
I'm not asking for an opinion. What I want to know is whether there are any specific functions provided by ZooKeeper that are going to be difficult with a Kafka/topic-based approach to coordination. I can't think of any.
Even heartbeat monitoring -- which was the reason I started looking at Zookeeper in the first place -- you could have a client connection topic, and clients could publish to it when they join the cluster, publish heartbeats at a given interval, and publish as they leave it.
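The topic-based heartbeat idea from the question could be sketched like this: replay join/heartbeat/leave events from the (hypothetical) client-connection topic and treat a client as alive if its last heartbeat is recent enough. Events are plain tuples here instead of consumer records, and the function name is hypothetical. The sketch makes two costs visible: a crashed client is only noticed after timeout_s, and every reader computes membership with its own clock, whereas a ZooKeeper ephemeral node is removed by the server itself when the session expires. Using the client id as the record key keeps each client's events in one partition, and therefore in order.

```python
from typing import Dict, Iterable, Set, Tuple

# (client_id, kind, timestamp_seconds); kind is "join", "heartbeat" or "leave"
Event = Tuple[str, str, float]


def live_members(events: Iterable[Event], now: float, timeout_s: float) -> Set[str]:
    """Replay the connection topic and return clients whose last sign of life is recent."""
    last_seen: Dict[str, float] = {}
    for client, kind, ts in events:
        if kind == "leave":
            last_seen.pop(client, None)  # graceful exit: forget the client at once
        elif kind in ("join", "heartbeat"):
            # Keep the newest timestamp in case heartbeats arrive slightly out of order.
            last_seen[client] = max(ts, last_seen.get(client, ts))
    return {c for c, ts in last_seen.items() if now - ts <= timeout_s}
```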
Let us start from a high-level view: you have two distributed systems that store data. ZooKeeper organizes its data in nodes in a directory-like structure; Kafka stores messages within topics.
From a bird's-eye view, Kafka is built for high throughput and scalability, while one of ZooKeeper's main design goals is consistency. ZooKeeper is meant to be a distributed coordination service for distributed applications, while Kafka can be thought of as a distributed commit log.
So the answer to your question is, surprisingly: 'It depends'. For coordinating a distributed system I would use ZooKeeper: that's what it was built for. You could also do this with Kafka, but there are a couple of things that would need to be done manually and that come out of the box with ZooKeeper.
Some examples:
Consistency: the ZK client can choose whether it needs strong or eventual consistency.
Ephemeral nodes: together with ZK watches, a great way to react to failing services.
Sequential consistency: it's not guaranteed that you receive Kafka messages in the order you wrote them to the broker (ordering is only guaranteed within a partition).
ACLs: I've never used them, but ZooKeeper can put an ACL on each individual node, which Kafka does not offer out of the box (Kafka ACLs apply to whole resources such as topics, not to individual messages).
Sequence nodes.
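To illustrate the ordering point: Kafka chooses a partition per record, by default from a hash of the record key (murmur2 in Kafka's default partitioner), so records with the same key keep their relative order while there is no ordering across partitions. The sketch below uses zlib.crc32 as a stand-in hash and in-memory lists as partitions; the function names are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's default partitioner, which uses murmur2 rather than CRC32.
    return zlib.crc32(key.encode()) % num_partitions


def produce(messages: List[Message], num_partitions: int) -> Dict[int, List[Message]]:
    """Append each message to the partition chosen from its key."""
    parts: Dict[int, List[Message]] = {p: [] for p in range(num_partitions)}
    for key, value in messages:
        parts[partition_for(key, num_partitions)].append((key, value))
    return parts


def consume_all(parts: Dict[int, List[Message]]) -> List[Message]:
    # One consumer draining partitions one after another: per-key order survives,
    # but the interleaving across keys generally differs from the send order.
    return [m for p in sorted(parts) for m in parts[p]]
```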
A pretty nice overview of what you can do with ZooKeeper is the ZooKeeper recipes page: https://zookeeper.apache.org/doc/trunk/recipes.html
[EDIT]: Heartbeating an application using Kafka is of course possible, but in my eyes ephemeral nodes in ZooKeeper are the easier option.
This is currently being worked on as part of KIP-500, which replaces ZooKeeper with a self-managed metadata quorum inside Kafka.