I am trying to add a template to my monitoring system to monitor MirrorMaker 2.0.
From the documentation I know these metrics are exposed via JMX:
# MBean: kafka.connect.mirror:type=MirrorSourceConnector,target=([-.w]+),topic=([-.w]+),partition=([0-9]+)
record-count # number of records replicated source -> target
record-age-ms # age of records when they are replicated
record-age-ms-min
record-age-ms-max
record-age-ms-avg
replication-latency-ms # time it takes records to propagate source->target
replication-latency-ms-min
replication-latency-ms-max
replication-latency-ms-avg
byte-rate # average number of bytes/sec in replicated records
If I wanted to monitor the lag of the replication between clusters, is it supposed to be inferred from record-age-ms? (i.e. if that age in ms continues to grow, then the delay continues to grow?)
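For reference, here is a minimal sketch of how I would poll these attributes over JMX; the host, port, and the concrete target/topic/partition values are placeholders, not from a real setup:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class MirrorLagProbe {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port: point this at the MM2 worker's JMX endpoint.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://mm2-worker:9999/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                // Placeholder target/topic/partition, matching the MBean pattern above.
                ObjectName name = new ObjectName(
                    "kafka.connect.mirror:type=MirrorSourceConnector,"
                    + "target=backup,topic=orders,partition=0");
                System.out.println("record-age-ms = "
                    + mbsc.getAttribute(name, "record-age-ms"));
                System.out.println("replication-latency-ms = "
                    + mbsc.getAttribute(name, "replication-latency-ms"));
            }
        }
    }

The idea would be to sample these periodically and alert if the trend keeps growing.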
Thanks
We are running a ksqlDB server on a Kubernetes cluster.
Node config:
AWS EKS Fargate
Number of nodes: 1
CPU: 2 vCPU (request), 4 vCPU (limit)
RAM: 4 GB (request), 8 GB (limit)
Java heap: 3 GB (default)
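(As a quick sanity check on this config, here is a generic JDK-only sketch, nothing ksqlDB-specific, to print what the JVM actually sees inside the container; a 3 GB heap inside a 4 GB request leaves little headroom for off-heap memory.)

    public class JvmContainerCheck {
        public static void main(String[] args) {
            // If these numbers don't match the pod's requests/limits, the JVM
            // is sizing its heap and GC threads against the wrong resources.
            System.out.println("availableProcessors = "
                + Runtime.getRuntime().availableProcessors());
            System.out.println("maxMemory (MB) = "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024));
        }
    }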
Data size:
We have ~11 source topics with 1 partition each; some of them have 10k records, a few have more than 100k. There are ~7 sink topics, but creating those 7 sink topics involves ~60 ksqlDB tables, ~38 ksqlDB streams, and ~64 persistent queries because of joins and aggregations. So the computation is heavy.
ksqlDB version: 0.23.1, using the official Confluent ksqlDB Docker image.
The problem:
When running our ksqlDB script we see CPU spike to 350-360% and memory to 20-30%. When that happens, Kubernetes restarts the server instance, which causes the ksql migration to fail.
Error:
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: <deployment-name>.<namespace>.svc.cluster.local/172.20.73.150:8088
Error: io.vertx.core.VertxException: Connection was closed
We have 30 migration files, and each file contains multiple table and stream creations.
It always fails on v27.
What we have tried so far:
Running v27 alone; in that case it passes with no error.
Increasing the initial CPU to 4 vCPU, but there was no change in CPU utilization.
Running 2 nodes with 2 partitions in Kafka, but that had the same issue; in addition, a few data columns ended up with no data.
So something is not right in our configuration or resource allocation.
What is the standard way to deploy ksqlDB on Kubernetes? Maybe it's not meant for Kubernetes.
I use distributed JMeter for load tests in Kubernetes. To maximize the number of concurrent threads I use several JMeter server instances.
My test plan has 2000 users, so with 5 JMeter servers I have 10000 concurrent users. Each user sends 1 request per second to Kafka. This runs without any problems.
But if I increase the number of server instances to 10, JMeter gets a lot of errors when sending requests and is not able to sustain the required request rate.
Is there a way to use more than 5 server instances in JMeter (my cluster has 24 vCPUs and 192 GB RAM)?
The theoretical maximum number of slaves is very high: you can have as many as 2 147 483 647 (Integer.MAX_VALUE) slaves, which is a little bit more than 5.
So my expectation is that the problem is somewhere else, e.g. there is a maximum number of connections per IP defined, or the broker is running out of resources.
We cannot give meaningful advice without more details; in the meantime you can check:
jmeter.log files for the master and the slaves
response data
Kafka logs
resource consumption on the Kafka broker(s) side, which can be done using e.g. the JMeter PerfMon Plugin, or read straight off the broker's JMX endpoint as sketched below
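For that last point, a small JMX sketch as an alternative to PerfMon; the host and port are placeholders, while kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec is a standard broker MBean:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class BrokerThroughputCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port: point this at a broker's JMX endpoint.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://kafka-broker:9999/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                ObjectName messagesIn = new ObjectName(
                    "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec");
                // Compare this rate with what the JMeter slaves think they send.
                System.out.println("MessagesInPerSec (1-min rate) = "
                    + mbsc.getAttribute(messagesIn, "OneMinuteRate"));
            }
        }
    }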
I am running Kafka Connect for Elasticsearch in distributed mode.
Currently I have 2 EC2 instances (instance type t2.2xlarge):
Number of vCPUs: 8
Memory: 32 GB
I am running Kafka Connect on the above instance type with max tasks set to 2.
I am planning heavy puts from producers, which will write records into Elasticsearch through Kafka Connect.
Heavy puts means 10000 records per second.
Keeping this in mind, how should I set up Kafka Connect?
For example:
How many tasks are required so that records flow into ES as fast as possible?
Are 2 EC2 instances enough for this load, or do I need more?
How many tasks should I create per EC2 instance?
Is having one bigger EC2 instance better, or multiple smaller EC2 instances?
How can I confirm that all the records are dequeued from the Kafka topic into ES by Kafka Connect?
How should I benchmark my Kafka Connect performance?
I am not using any schema registry as of now.
Please suggest.
I have 2 EC2 instances
So you can only run 2 workers. Add more (in different AZs) for better fault tolerance. You need to add CPU and memory monitoring to know if you should add more instances.
running Kafka Connect on the above instance type with max tasks set to 2
You can have up to as many tasks as input topic-partitions.
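For example, if the input topic had 8 partitions you could set tasks.max to 8 when creating the connector. A sketch using the plain JDK HTTP client against the Connect REST API; the connector name, topic, and Elasticsearch URL are made-up placeholders:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CreateConnector {
        public static void main(String[] args) throws Exception {
            // Placeholder name/topic/URL; tasks.max is capped in practice by
            // the number of partitions on the input topic.
            String body = """
                {
                  "name": "es-sink",
                  "config": {
                    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
                    "tasks.max": "8",
                    "topics": "my-topic",
                    "connection.url": "http://elasticsearch:9200",
                    "key.ignore": "true"
                  }
                }""";
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }

In distributed mode, Connect then spreads those tasks across however many workers have joined the group.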
10000 records per seconds
Kafka can certainly handle that. You need to benchmark your ES indexers separately.
How can I confirm that all the records are dequeued from the Kafka topic into ES by Kafka Connect
You would monitor consumer group lag, the same as for any other consumer.
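The quickest check is kafka-consumer-groups --bootstrap-server ... --describe --group connect-<connector-name>, which prints a LAG column per partition (sink connectors consume under the group connect-<connector name>). Programmatically, a rough AdminClient sketch; the group name below is a hypothetical example:

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class ConnectLagCheck {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            String group = "connect-es-sink"; // hypothetical connector name
            try (AdminClient admin = AdminClient.create(props)) {
                // Offsets the sink's consumer group has committed so far.
                Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets(group)
                         .partitionsToOffsetAndMetadata().get();
                props.put("key.deserializer", ByteArrayDeserializer.class.getName());
                props.put("value.deserializer", ByteArrayDeserializer.class.getName());
                try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                    // Latest offsets per partition; the difference is the lag.
                    Map<TopicPartition, Long> end = consumer.endOffsets(committed.keySet());
                    committed.forEach((tp, om) -> System.out.printf(
                        "%s lag=%d%n", tp, end.get(tp) - om.offset()));
                }
            }
        }
    }

If the lag stays at 0 while producers are idle, everything has been drained into ES.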
Is having one bigger EC2 instance better, or multiple smaller EC2 instances
"Better" is relative. If you want performance over cost, then pick larger instances and allocate more heap space.
I'm expecting our influx into Kafka to grow to around 2 TB/day over time. I'm planning to set up a Kafka cluster with 2 brokers (each running on a separate system). What is the recommended hardware configuration for handling 2 TB/day?
As a base you could look here: https://docs.confluent.io/4.1.1/installation/system-requirements.html#hardware
You need to know the number of messages you get per second/hour, because this will determine the size of your cluster. For disks, SSDs are not strictly necessary because the system will buffer data in RAM first; still, you may need fairly fast disks to ensure that flushing the queue does not slow your system down.
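For a rough sense of scale: 2 TB/day is about 2 000 000 MB / 86 400 s ≈ 23 MB/s on average. That average is modest, but traffic is rarely flat, so size the cluster for your peak rate (often several times the average) rather than the daily mean.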
I would also recommend using 3 Kafka brokers, and 3 or 4 ZooKeeper servers too.
What is the minimum server composition for HBase?
Fully distributed, using sharding, but not using Hadoop.
It's for a production environment.
I'm hoping for an answer laid out like this:
Server 1: ZooKeeper
Server 2: RegionServer
... and more
Thank you.
The minimum is one; see pseudo-distributed mode. The moving parts involved are:
Assuming that you are running on HDFS (which you should be doing):
1 HDFS NameNode
1 or more HDFS Secondary NameNode(s)
1 or more HDFS DataNode(s)
For MapReduce (if you want it):
1 MapReduce JobTracker
1 or more MapReduce TaskTracker(s) (Usually same machines as datanodes)
For HBase itself
1 or more HBase Master(s) (Hot backups are a good idea)
1 or more HBase RegionServer(s) (Usually same machines as datanodes)
1 or more Thrift Servers (if you need to access HBase from outside the network it is on)
For ZooKeeper
3 - 5 ZooKeeper node(s)
The number of machines that you need is really dependent on how much reliability you need in the face of hardware failure and for what kind of nodes. The only node of the above that does not (yet) support hot failover or other recovery in the face of hardware failure is the HDFS NameNode, though that is being fixed in the more recent Hadoop releases.
You typically want an HDFS replication factor of 3 so that you can take advantage of rack awareness.
So after that long diatribe, I'd suggest at a minimum (for a production deployment):
1x HDFS NameNode
1x JobTracker / Secondary NameNode
3x ZK Nodes
3x DataNode / RegionServer nodes (And if you want to run MapReduce, TaskTracker)
1x Thrift Server (Only if accessing HBase from outside of the network it is running on)
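As a footnote on why the ZooKeeper quorum matters: HBase clients locate the Master and RegionServers purely through ZooKeeper. A minimal sketch with the HBase client API; the hostnames are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class HBaseClientSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Clients only need the ZK quorum; everything else is discovered.
            conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");
            try (Connection connection = ConnectionFactory.createConnection(conf)) {
                System.out.println("connected = " + !connection.isClosed());
            }
        }
    }

This is also why the Thrift server above is the usual answer for clients outside the cluster network: the native client needs direct access to ZooKeeper and the RegionServers.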