What is the best way to manage a Druid cluster?

Is there any tool or application available to manage a Druid cluster, similar to YARN? Any suggestions for how to manage a Druid cluster? Is YARN not available for Druid?
Imply does not seem to offer YARN-style cluster management.

It depends on what you mean by managing the cluster, but with the HDP distribution we use Ambari to manage the Druid cluster. By managing I mean starting/stopping nodes, managing configs, and handling the related rolling updates.
Here is a 30-minute video about how to use HDP to run Druid.

Related

Way to make sure Strimzi Kafka cluster replicas are in different data centers?

I am working with Apache Kafka and looking to start using Strimzi Kafka, but I am having trouble finding out whether there is a way to make sure replicas are in separate data centers. I know Kafka has stretch clusters, where a single cluster can span multiple data centers, but Strimzi doesn't support that from what I can tell.
Is there any way to do this with Strimzi Kafka?
It always depends on what you mean by different DCs. It could mean different Availability Zones in AWS, which will be close to each other and have good latency. Or it could mean DCs on different continents, which is something Apache Kafka cannot handle. Or maybe something in between.
In general, Strimzi lets you deploy a Kafka cluster within a single Kubernetes cluster. But as long as the latency is good enough, it does not really care whether it all runs in one DC or in multiple DCs.
Strimzi gives you two main tools to control this:
You can configure pod scheduling to distribute your Kafka broker pods across the zones of your cluster.
It lets you configure rack awareness, which sets the broker.rack option in Apache Kafka to make sure your replicas are distributed across the racks / zones (see the sketch after this list).
You can also use Cruise Control to automatically reassign any replicas that are not distributed as they should be.
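To make the rack awareness part concrete, here is a minimal sketch of a Strimzi Kafka resource with it enabled; the cluster name, listener, storage sizes, and replication settings are illustrative, and topology.kubernetes.io/zone is the standard Kubernetes zone label:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster            # illustrative name
spec:
  kafka:
    replicas: 3
    rack:
      # Strimzi derives broker.rack from this node label, so partition
      # replicas get spread across the zones the brokers run in
      topologyKey: topology.kubernetes.io/zone
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    config:
      default.replication.factor: 3
      min.insync.replicas: 2
    storage:
      type: persistent-claim
      size: 100Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
```

As far as I can tell, once rack awareness is set, Strimzi also uses the same topology key to add preferred scheduling rules that spread the broker pods across zones; you can still layer your own affinity rules on top if you need stricter placement.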

Best practices and methods for Kafka parameters and monitoring

I am going to implement a Snowflake Kafka connector with continuous ingestion of data into a target Snowflake database.
What are the best practices for:
Kafka for its clusters
Kafka and its related parameters
Monitoring resources
Kafka for its clusters
Run at least 3 brokers
Kafka and its related parameters
That's too broad, and it has nothing to do with running a Connect cluster or implementing a connector. The defaults are mostly fine; you can find the production recommendations in the Kafka documentation.
Monitoring resources
Use JMX. https://docs.confluent.io/platform/current/kafka/monitoring.html
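For example, JMX metrics are commonly scraped with the Prometheus JMX exporter; a minimal sketch of its YAML config, with illustrative rules for broker throughput and under-replicated partitions, could look like this:

```yaml
lowercaseOutputName: true
rules:
  # Broker throughput: messages/bytes in and bytes out, one-minute rates
  - pattern: "kafka.server<type=BrokerTopicMetrics, name=(MessagesInPerSec|BytesInPerSec|BytesOutPerSec)><>OneMinuteRate"
    name: kafka_server_$1
    type: GAUGE
  # Should stay at 0 in a healthy cluster; alert if it does not
  - pattern: "kafka.server<type=ReplicaManager, name=UnderReplicatedPartitions><>Value"
    name: kafka_server_under_replicated_partitions
    type: GAUGE
```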
going to implement a Snowflake Kafka connector
Snowflake already has a connector... I'd start by forking that rather than writing your own.

How do different ways of deploying Kafka with Debezium on Kubernetes clusters compare in terms of managing scalability?

I am going to start a big open source project on GitHub. I'd like to create a scalable, fault-tolerant cloud order matching engine with as high a TPS as possible. I chose an event-driven microservice architecture backed by Kafka topics. The services are to be written in Go, and the engine is supposed to be deployed on Kubernetes clusters.
I know that there are Bitnami and Confluent ZooKeeper/Kafka Helm charts. The Bitnami Kafka chart is working quite well for me on Minikube.
I'd also like to use Debezium Connect (a sketch of one way to deploy it follows below).
So I would be very grateful for any experience of deploying and using Kafka with Debezium on Kubernetes clusters, especially in terms of scalability. Thanks.
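For reference, one common way to run Debezium on Kubernetes is through Strimzi's KafkaConnect resource, which lets you scale the Connect workers independently of the brokers. A minimal sketch, where the names, target registry, and connector version are all illustrative:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: debezium-connect
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  replicas: 3                      # Connect workers scale independently of brokers
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  config:
    group.id: debezium-connect
    config.storage.topic: connect-configs
    offset.storage.topic: connect-offsets
    status.storage.topic: connect-status
  build:
    # Strimzi builds a Connect image with the Debezium plugin baked in
    output:
      type: docker
      image: registry.example.com/debezium-connect:latest   # hypothetical registry
    plugins:
      - name: debezium-postgres
        artifacts:
          - type: tgz
            url: https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/2.5.0.Final/debezium-connector-postgres-2.5.0.Final-plugin.tar.gz
```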

High availability configuration for Kafka Connect MongoDB source connector

I've been looking for specific information about high-availability deployments of Kafka Connect connectors but have found nothing.
In my case I have a MongoDB source connector deployed using the Confluent Helm chart. This chart supports setting the number of replicas.
Is setting replicaCount to a value > 1 enough, or are there other factors to consider (tasks.max, ...)?
If you want highly available workers, then yes, it's pod replicas.
If you want tasks distributed across workers, that's tasks.max; if one worker dies, its tasks get rebalanced onto the remaining workers (see the sketch below).
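To make the two knobs concrete: replicaCount lives in the Helm values, while tasks.max belongs to the connector config you submit to the Connect REST API. A minimal sketch (shown here as YAML; all names and values are illustrative):

```yaml
# Helm values for the Connect chart: worker-level high availability
replicaCount: 3        # three distributed worker pods; the Connect group
                       # rebalances connectors and tasks if a pod dies
---
# Connector config (normally POSTed to the Connect REST API as JSON)
name: mongo-source
config:
  connector.class: com.mongodb.kafka.connect.MongoSourceConnector
  connection.uri: mongodb://mongo:27017     # hypothetical URI
  database: mydb
  collection: mycoll
  tasks.max: "4"       # upper bound on parallel tasks; note the MongoDB
                       # source connector runs a single task regardless, so
                       # extra workers buy you failover, not parallelism
```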

Best Practices for Kafka Cluster Deployment Configuration?

I'm asking for general best practices here:
If I want a five-node cluster, do all five nodes run the Confluent Platform umbrella packages that include ZooKeeper, Kafka, and schema-registry?
Is it ever recommended to run the ZooKeeper cluster on separate servers from the Kafka cluster?
If I want to run the Kafka Connect distributed worker, do I run that on all cluster nodes? Do I ever want to run it on separate servers? Is Docker recommended for this, or is Docker unnecessary?
With Kafka Streams apps, should they be run on all cluster nodes? Should they be dockerized? Should they ever run on separate nodes?
Is something like Mesos recommended?
It is a best practice to run Kafka brokers on dedicated servers (or virtual servers). The same is true of ZooKeeper.
All the other components of the Confluent Platform can run colocated on common servers or on separate machines.
You would typically run only one Schema Registry (or two if you want fault tolerance). They can run on any machine that can connect back to the Kafka brokers.
Kafka Connect distributed workers only need to run on the machines where you want to host Kafka connectors. They just need to be able to connect back to the Kafka brokers.
Kafka Streams apps can run anywhere you want, so long as they can connect back to the Kafka brokers.
All components can run inside Docker containers or without Docker (see the sketch below).
You can use whatever microservices or data center resource management tools you want (or none at all) - it is your choice.
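As an illustration of that layout, here is a minimal docker-compose sketch (image tags, hostnames, and topic names are illustrative, and everything is collapsed onto one host for brevity; in production the broker and ZooKeeper would each get dedicated machines):

```yaml
version: "3"
services:
  zookeeper:                        # dedicated server in production
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: "2181"
  kafka:                            # dedicated server in production
    image: confluentinc/cp-kafka:7.4.0
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: "1"   # single-broker demo only
  schema-registry:                  # colocatable; only needs broker access
    image: confluentinc/cp-schema-registry:7.4.0
    depends_on: [kafka]
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:9092
  connect:                          # runs wherever you host connectors
    image: confluentinc/cp-kafka-connect:7.4.0
    depends_on: [kafka]
    environment:
      CONNECT_BOOTSTRAP_SERVERS: kafka:9092
      CONNECT_GROUP_ID: connect-cluster
      CONNECT_CONFIG_STORAGE_TOPIC: _connect-configs
      CONNECT_OFFSET_STORAGE_TOPIC: _connect-offsets
      CONNECT_STATUS_STORAGE_TOPIC: _connect-status
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: "1"   # match the single broker
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: "1"
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_REST_ADVERTISED_HOST_NAME: connect
```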