High CPU utilization in ksqlDB - apache-kafka

We are running a ksqlDB server on a Kubernetes cluster.
Node config:
AWS EKS Fargate
Number of nodes: 1
CPU: 2 vCPU (Request), 4 vCPU (Limit)
RAM: 4 GB (Request), 8 GB (Limit)
Java heap: 3 GB (Default)
Data size:
We have ~11 source topics with 1 partition each; some of them have 10k records, and a few have more than 100k records. There are ~7 sink topics, but to create those 7 sink topics we have ~60 ksql tables, ~38 ksql streams, and ~64 persistent queries because of the joins and aggregations. So the computation is heavy.
ksqlDB version: 0.23.1, and we are using the official Confluent KSQL Docker image
The problem:
When running our KSQL script we see the CPU spike to 350-360% and memory to 20-30%. When that happens, Kubernetes restarts the server instance, which causes the ksql-migrations run to fail.
Error:
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection
refused:
<deployment-name>.<namespace>.svc.cluster.local/172.20.73.150:8088
Error: io.vertx.core.VertxException: Connection was closed
We have 30 migration files, and each file creates multiple tables and streams.
It always fails on v27.
What we have tried so far:
Running it alone; in that case it passes with no error.
Increasing the initial CPU request to 4 vCPU, but there was no change in CPU utilization.
Running 2 nodes with 2 partitions in Kafka, but that had the same issue, and in addition a few data columns ended up with no data.
So something is not right in our configuration or resource allocation.
What is the standard way to deploy ksqlDB on Kubernetes? Or maybe it is not meant for Kubernetes?
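This does not address the CPU spike itself, but since ksql-migrations dies with "Connection refused" whenever Kubernetes restarts the pod, one option is to gate each migration run on the server actually answering its /healthcheck endpoint. Below is a minimal sketch of such a gate; the service address ksqldb-server:8088 is an assumption (in-cluster it would be <deployment-name>.<namespace>.svc.cluster.local:8088).

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class KsqlHealthGate {
    public static void main(String[] args) throws Exception {
        // Assumed address; in-cluster this would be the service DNS name on port 8088.
        URI health = URI.create("http://ksqldb-server:8088/healthcheck");
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(health).GET().build();

        // Poll until the server answers, so a migration run is not started while
        // Kubernetes is still restarting the pod.
        for (int attempt = 1; attempt <= 30; attempt++) {
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() == 200) {
                    System.out.println("ksqlDB is up: " + response.body());
                    return;
                }
            } catch (Exception e) {
                System.out.println("attempt " + attempt + ": " + e.getMessage());
            }
            Thread.sleep(10_000);
        }
        throw new IllegalStateException("ksqlDB server never became healthy");
    }
}

That said, Kubernetes restarting the pod under load usually points at a failing liveness probe or an OOM kill rather than CPU throttling alone, so checking the pod's last termination reason is also worthwhile.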

Related

Kafka with JBOD disks: what is the maximum number of disks that we can set on a Kafka machine?

We are planning to build 17 Kafka machines.
Since we need huge storage, we are thinking of using JBOD disks on each Kafka machine.
So the plan is like this:
Number of Kafka machines: 17
Kafka version: 2.7
Number of disks in the JBOD: 44 disks (the size of each disk is 2.4 TB)
To give some more perspective from the Kafka configuration side: in the server.properties file we need to set log.dirs with all 44 disks.
Based on that, we are wondering whether such a large number of disks, like 44, is maybe higher than some threshold.
We actually searched a lot to find a useful post that talks about this, but without success.
So let's summarize:
What is the limit on the number of disks (JBOD disks) that we can connect to a Kafka machine?
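For what it's worth, log.dirs is just a comma-separated list of directories, and as far as we can tell Kafka does not enforce a hard cap on how many entries it can hold; the practical limits are recovery threads, open file handles, and how long a broker with 44 disks takes to restart or re-replicate. A small sketch, assuming a hypothetical bootstrap address and broker id 1, that uses the AdminClient shipped with Kafka 2.7 to confirm all 44 directories were picked up and how partitions are spread across them:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.LogDirDescription;
import org.apache.kafka.clients.admin.ReplicaInfo;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class LogDirReport {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed bootstrap address and broker id; adjust for your cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            Map<String, LogDirDescription> dirs = admin
                    .describeLogDirs(Collections.singleton(1)) // broker id 1
                    .allDescriptions().get()
                    .get(1);
            // One entry per directory listed in log.dirs on that broker.
            dirs.forEach((path, desc) -> {
                long bytes = desc.replicaInfos().values().stream()
                        .mapToLong(ReplicaInfo::size).sum();
                System.out.printf("%s: %d partitions, %d bytes%n",
                        path, desc.replicaInfos().size(), bytes);
            });
        }
    }
}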

kafka broker with inconsistent data - NotLeaderForPartitionError

We have a 13-node Kafka cluster, each broker has multiple disks, and all topics have a replication factor of 3.
Broker 6 had a hardware issue and required a complete OS reload (Linux) and 2 disk replacements. I then installed Kafka again on this node with the same broker ID 6, but started to get an exception from all producers:
[Error 6] NotLeaderForPartitionError: ProduceResponsePayload(topic=u'amyTopic', partition=7, error=6, offset=-1)
I am assuming that since I am using the same broker ID, it (ZooKeeper? or the controller broker?) is expecting data on the disks that were replaced, or some other metadata that might have been wiped out during the OS reload.
What are my options for adding this node back to the cluster without much disturbance to the cluster and without data loss? Should I use a new broker ID for this node and then reassign the partitions of every topic, as we do after adding a new node? We have a lot of data (a few hundred TB) in the cluster, and I am trying to avoid the huge data movement caused by partition reassignment, as it may choke the entire cluster. Please suggest.
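While broker 6 catches up from the other replicas (with replication factor 3 its data still exists on the remaining brokers), it can help to watch leadership and ISR membership per partition instead of relying on producer errors. A small diagnostic sketch, assuming a hypothetical bootstrap address and using the topic name from the error above:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartitionInfo;

import java.util.Collections;
import java.util.Properties;

public class LeaderIsrCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed bootstrap address; point it at any healthy broker.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription topic = admin
                    .describeTopics(Collections.singleton("amyTopic"))
                    .all().get()
                    .get("amyTopic");
            for (TopicPartitionInfo p : topic.partitions()) {
                Node leader = p.leader();
                // Partitions whose leader is "none", or whose ISR does not yet
                // include broker 6, are the ones producers fail against.
                System.out.printf("partition %d leader=%s isr=%s replicas=%s%n",
                        p.partition(),
                        leader == null ? "none" : String.valueOf(leader.id()),
                        p.isr(), p.replicas());
            }
        }
    }
}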

Does Zookeeper require SSD disks for Apache Kafka Clusters?

We want to install a Kafka cluster and 3 ZooKeeper servers.
Kafka should use the ZooKeeper servers in order to store its metadata on them.
ZK data and log files should be on disks that have the least contention from other I/O activities. Ideally, the ZK data and ZK transaction log files should be on different disks, so that they don't contend for I/O resources.
Note that it isn't enough to just have separate partitions; they have to be different disks to ensure performance.
So must the ZooKeeper servers use SSD disks?
If yes, what are the minimum requirements for the ZooKeeper disks in terms of I/O, etc.?
Confluent recommends the following configuration when running Zookeeper in Production environments:
Disks
Disk performance is vital to maintaining a healthy ZooKeeper cluster.
Solid state drives (SSD) are highly recommended as ZooKeeper must have
low latency disk writes in order to perform optimally. Each request to
ZooKeeper must be committed to disk on each server in the quorum
before the result is available for read. A dedicated SSD of at least
64 GB in size on each ZooKeeper server is recommended for a production
deployment. You can use autopurge.purgeInterval and
autopurge.snapRetainCount to automatically cleanup ZooKeeper data and
lower maintenance overhead.
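The key phrase above is "low latency disk writes": ZooKeeper fsyncs its transaction log before acknowledging each write, so what matters is sustained low fsync latency on the transaction-log disk rather than SSD as such. Below is a rough probe, not a real benchmark, that times small synchronous appends on a candidate disk; the path is hypothetical and should point at the disk you intend to dedicate to the ZK transaction log.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FsyncLatencyProbe {
    public static void main(String[] args) throws IOException {
        // Hypothetical path: point this at the disk planned for the ZK transaction log.
        Path file = Path.of("/var/lib/zookeeper/log/fsync-probe.bin");
        ByteBuffer record = ByteBuffer.allocate(512); // roughly the size of a small ZK transaction

        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long worstMicros = 0, totalMicros = 0;
            int iterations = 1000;
            for (int i = 0; i < iterations; i++) {
                record.rewind();
                long start = System.nanoTime();
                ch.write(record);
                ch.force(false); // flush to the device, as ZooKeeper does for every write
                long micros = (System.nanoTime() - start) / 1_000;
                worstMicros = Math.max(worstMicros, micros);
                totalMicros += micros;
            }
            System.out.printf("avg %d us, worst %d us over %d synced writes%n",
                    totalMicros / iterations, worstMicros, iterations);
        }
    }
}

Roughly speaking, an SSD keeps these synced writes well under a millisecond, while a busy shared spinning disk can spike to tens of milliseconds, which is where ZooKeeper latency and session timeouts start to hurt.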

FlinkKafkaConsumer010 cannot consume data with full parallelism

I have a Kafka (0.10.2.0) cluster with 10 partitions (and 10 individual Kafka server ports) on one machine, which holds 1 topic named "test".
And I have a Flink cluster with 294 task slots on 7 machines; a Flink app with parallelism 250 runs on this Flink cluster, using FlinkKafkaConsumer010 to consume data from the Kafka server with one group id, "TestGroup".
But I found that only 2 Flink IPs have established 171 TCP connections with the Kafka cluster, and worse, only 10 connections are transferring data; only these 10 connections had data transferred from beginning to end.
I have checked this: Reading from multiple broker kafka with flink, but it did not work in my case.
I'd appreciate any information, thank you.
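A Kafka partition is consumed by at most one consumer instance in a consumer group, so with 10 partitions only 10 of the 250 source subtasks can ever receive records; the rest stay idle, which matches the 10 active connections observed. A minimal sketch, assuming a hypothetical bootstrap address (import paths vary slightly between Flink versions): cap the source parallelism at the partition count and rebalance() afterwards so the heavy processing still uses the full parallelism, or alternatively increase the topic's partition count.

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class TestGroupJob {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka-host:9092"); // assumed address
        props.setProperty("group.id", "TestGroup");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        FlinkKafkaConsumer010<String> source =
                new FlinkKafkaConsumer010<>("test", new SimpleStringSchema(), props);

        env.addSource(source)
           .setParallelism(10)   // one source subtask per partition; more would sit idle
           .rebalance()          // round-robin records to all downstream subtasks
           .map(new MapFunction<String, String>() {
               @Override
               public String map(String value) {
                   return value;  // placeholder for the real processing
               }
           })
           .setParallelism(250); // the heavy work still uses the full parallelism

        env.execute("TestGroup consumer");
    }
}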

What is the minimum server composition for HBase?

What is the minimum server composition for HBase?
Fully distributed, with sharding, but without using Hadoop.
It's for a production environment.
I'm looking for an explanation like this:
Server 1: Zookeeper
Server 2: Region server
... and more
Thank you.
The minimum is one; see pseudo-distributed mode. The moving parts involved are:
Assuming that you are running on HDFS (which you should be doing):
1 HDFS NameNode
1 or more HDFS Secondary NameNode(s)
1 or more HDFS DataNode(s)
For MapReduce (if you want it):
1 MapReduce JobTracker
1 or more MapReduce TaskTracker(s) (Usually same machines as datanodes)
For HBase itself
1 or more HBase Master(s) (Hot backups are a good idea)
1 or more HBase RegionServer(s) (Usually same machines as datanodes)
1 or more Thrift Servers (if you need to access HBase from outside the network it is on)
For ZooKeeper
3 - 5 ZooKeeper node(s)
The number of machines that you need is really dependent on how much reliability you need in the face of hardware failure and for what kind of nodes. The only node of the above that does not (yet) support hot failover or other recovery in the face of hardware failure is the HDFS NameNode, though that is being fixed in the more recent Hadoop releases.
You typically want to set the HDFS replication factor of your RegionServers to 3, so that you can take advantage of rack awareness.
So after that long diatribe, I'd suggest at a minimum (for a production deployment):
1x HDFS NameNode
1x JobTracker / Secondary NameNode
3x ZK Nodes
3x DataNode / RegionServer nodes (And if you want to run MapReduce, TaskTracker)
1x Thrift Server (Only if accessing HBase from outside of the network it is running on)
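As a small illustration of how the pieces above fit together from a client's point of view: an HBase client only needs the addresses of the ZooKeeper quorum; the active Master and the RegionServers are discovered through ZooKeeper. A minimal sketch, with hypothetical ZooKeeper hostnames and table name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MinimalClusterClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical hostnames for the 3 ZooKeeper nodes from the layout above.
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("my_table"))) {
            // The active Master and the RegionServers are discovered through ZooKeeper;
            // the client configuration never names them directly.
            Result row = table.get(new Get(Bytes.toBytes("row-1")));
            System.out.println("row exists: " + !row.isEmpty());
        }
    }
}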