I am working on a project involving Kafka Connect. We have a Kafka Connect cluster running on Kubernetes with some Snowflake connectors already spun up and working. The part we are having issues with now is getting the JMX metrics from the Kafka Connect cluster to report to Datadog. From my understanding of the docs (https://docs.confluent.io/home/connect/monitoring.html#using-jmx-to-monitor-kconnect), the workers already emit metrics by default, and we just need a way to get them reported to Datadog.
In our Kubernetes ConfigMap we have these values set:
CONNECT_KAFKA_JMX_PORT: "9095"
KAFKA_JMX_PORT: "9095"
JMX_PORT: "9095"
I have included this launch script where we are setting the KAFKA_JMX_OPTS env var:
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=<redacted> -Dcom.sun.management.jmxremote.rmi.port=${JMX_PORT}"
I've been looking online and all over Stack Overflow, and I haven't actually seen an example of someone getting JMX metrics reported to Datadog and standing up a dashboard there, so I was wondering if anyone had experience with this.
Firstly, your Datadog agents need to have the Java/JMX integration enabled.
Secondly, use the Datadog JMX integration with Autodiscovery, where kafka-connect must match the container name:
annotations:
  ad.datadoghq.com/kafka-connect.check_names: '["jmx"]'
  ad.datadoghq.com/kafka-connect.init_configs: '[{}]'
  ad.datadoghq.com/kafka-connect.instances: |
    [
      {
        "host": "%%host%%",
        "port": 9095,
        "conf": [
          {
            "include": {
              "domain": "kafka.connect",
              "type": "connector-task-metrics",
              "bean_regex": [
                "kafka.connect:type=connector-task-metrics,connector=.*,task=.*"
              ],
              "attribute": {
                "batch-size-max": {
                  "alias": "jmx.kafka.connect.connector.batch_size_max"
                },
                "status": {
                  "metric_type": "gauge",
                  "alias": "jmx.kafka.connect.connector.status",
                  "values": {
                    "running": 0,
                    "paused": 1,
                    "failed": 2,
                    "destroyed": 3,
                    "unassigned": -1
                  }
                },
                "batch-size-avg": {
                  "alias": "jmx.kafka.connect.connector.batch_size_avg"
                },
                "offset-commit-avg-time-ms": {
                  "alias": "jmx.kafka.connect.connector.offset_commit_avg_time"
                },
                "offset-commit-max-time-ms": {
                  "alias": "jmx.kafka.connect.connector.offset_commit_max_time"
                },
                "offset-commit-failure-percentage": {
                  "alias": "jmx.kafka.connect.connector.offset_commit_failure_percentage"
                }
              }
            }
          },
          {
            "include": {
              "domain": "kafka.connect",
              "type": "source-task-metrics",
              "bean_regex": [
                "kafka.connect:type=source-task-metrics,connector=.*,task=.*"
              ],
              "attribute": {
                "source-record-poll-rate": {
                  "alias": "jmx.kafka.connect.task.source_record_poll_rate"
                },
                "source-record-write-rate": {
                  "alias": "jmx.kafka.connect.task.source_record_write_rate"
                },
                "poll-batch-avg-time-ms": {
                  "alias": "jmx.kafka.connect.task.poll_batch_avg_time"
                },
                "source-record-active-count-avg": {
                  "alias": "jmx.kafka.connect.task.source_record_active_count_avg"
                },
                "source-record-write-total": {
                  "alias": "jmx.kafka.connect.task.source_record_write_total"
                },
                "source-record-poll-total": {
                  "alias": "jmx.kafka.connect.task.source_record_poll_total"
                }
              }
            }
          }
        ]
      }
    ]
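For reference, the `values` map in the `status` attribute above is what lets a string-valued JMX attribute be submitted as a numeric gauge. A small sketch of that translation (the function name is my own; the status-to-number mapping is taken verbatim from the annotation):

```python
# Mirror of the "values" table in the Datadog JMX annotation: each
# connector/task status string maps to a numeric gauge value.
STATUS_TO_GAUGE = {
    "running": 0,
    "paused": 1,
    "failed": 2,
    "destroyed": 3,
    "unassigned": -1,
}

def status_gauge(status: str) -> int:
    """Translate a status string (case-insensitive) to its gauge value."""
    return STATUS_TO_GAUGE[status.lower()]
```

This is handy for sanity-checking dashboards: a monitor on `jmx.kafka.connect.connector.status >= 2` would catch failed and destroyed tasks.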
I have an application that runs some code and at the end sends an email with a report of the data. When I deploy pods on GKE, certain pods get terminated and a new pod is created due to autoscaling. The problem is that the termination happens after my code has finished, so the email is sent twice for the same data.
Here is the JSON file of the deploy API:
{
  "apiVersion": "batch/v1",
  "kind": "Job",
  "metadata": {
    "name": "$name",
    "namespace": "$namespace"
  },
  "spec": {
    "template": {
      "metadata": {
        "name": "********"
      },
      "spec": {
        "priorityClassName": "high-priority",
        "containers": [
          {
            "name": "******",
            "image": "$dockerScancatalogueImageRepo",
            "imagePullPolicy": "IfNotPresent",
            "env": $env,
            "resources": {
              "requests": {
                "memory": "2000Mi",
                "cpu": "2000m"
              },
              "limits": {
                "memory": "2650Mi",
                "cpu": "2650m"
              }
            }
          }
        ],
        "imagePullSecrets": [
          {
            "name": "docker-secret"
          }
        ],
        "restartPolicy": "Never"
      }
    }
  }
}
And here is a screenshot of the pod events:
Any idea how to fix that?
Thank you in advance.
Perhaps you are affected by this note from the Job docs: "Note that even if you specify .spec.parallelism = 1 and .spec.completions = 1 and .spec.template.spec.restartPolicy = "Never", the same program may sometimes be started twice." What happens if you increase terminationGracePeriodSeconds in your YAML file? – danyL
My problem was that I had other jobs deploying pods on my nodes with higher priority, so Kubernetes was trying to terminate my running pods even though the job was already done and the email already sent. I fixed the problem by adjusting the resource requests and limits in all my JSON files. I don't know if it's the perfect solution, but for now it solved my problem.
Thank you all for your help.
I am unable to run my IBM eVote blockchain application in Hyperledger Fabric. I am using IBM eVote in VS Code (v1.39) on Ubuntu 16. When I start my local fabric (1 Org Local Fabric), I am facing the above error.
Following is my local_fabric_connection.json file:
{
  "name": "local_fabric",
  "version": "1.0.0",
  "client": {
    "organization": "Org1",
    "connection": {
      "timeout": {
        "peer": {
          "endorser": "300"
        },
        "orderer": "300"
      }
    }
  },
  "organizations": {
    "Org1": {
      "mspid": "Org1MSP",
      "peers": [
        "peer0.org1.example.com"
      ],
      "certificateAuthorities": [
        "ca.org1.example.com"
      ]
    }
  },
  "peers": {
    "peer0.org1.example.com": {
      "url": "grpc://localhost:17051"
    }
  },
  "certificateAuthorities": {
    "ca.org1.example.com": {
      "url": "http://localhost:17054",
      "caName": "ca.org1.example.com"
    }
  }
}
And following is the snapshot:
Based on your second image, it doesn't look like your 1 Org Local Fabric started properly in the first place (you have no gateways, and for some reason your wallets aren't grouped together!).
If you tear down your 1 Org Local Fabric and then start it again, hopefully it'll work.
Basically, what I wanted to achieve was to get the MessagesInPerSec metric for every topic in Kafka and to add a custom tag with the topic name in InfluxDB, so I can query by topic rather than by the ObjDomain definition. Below is my JmxTrans configuration. (Note: I use a wildcard for the topic in order to fetch the MessagesInPerSec JMX attribute for all topics.)
{
  "servers": [
    {
      "port": "9581",
      "host": "192.168.43.78",
      "alias": "kafka-metric",
      "queries": [
        {
          "outputWriters": [
            {
              "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
              "url": "http://192.168.43.78:8086/",
              "database": "kafka",
              "username": "admin",
              "password": "root"
            }
          ],
          "obj": "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*",
          "attr": [
            "Count",
            "MeanRate",
            "OneMinuteRate",
            "FiveMinuteRate",
            "FifteenMinuteRate"
          ],
          "resultAlias": "newTopic"
        }
      ],
      "numQueryThreads": 2
    }
  ]
}
which yields a result in InfluxDB as follows:
[name=newTopic, time=1589425526087, tags={attributeName=FifteenMinuteRate,
className=com.yammer.metrics.reporting.JmxReporter$Meter, objDomain=kafka.server,
typeName=type=BrokerTopicMetrics,name=MessagesInPerSec,topic=backblaze_smart},
precision=MILLISECONDS, fields={FifteenMinuteRate=1362.9446063537794, _jmx_port=9581
}]
It creates a tag with the whole objDomain specified in the config, but I wanted topic as a separate tag, something like the following:
[name=newTopic, time=1589425526087, tags={attributeName=FifteenMinuteRate,
className=com.yammer.metrics.reporting.JmxReporter$Meter, objDomain=kafka.server,
topic=backblaze_smart,
typeName=type=BrokerTopicMetrics,name=MessagesInPerSec,topic=backblaze_smart},
precision=MILLISECONDS, fields={FifteenMinuteRate=1362.9446063537794, _jmx_port=9581
}]
I was not able to find any adequate documentation on how to use the wildcard topic value as a separate tag with JmxTrans when writing to InfluxDB.
You just need to add the following additional properties to the InfluxDB output writer. Just make sure you are using the latest jmxtrans release. The docs are here: https://github.com/jmxtrans/jmxtrans/wiki/InfluxDBWriter
"typeNames": ["topic"],
"typeNamesAsTags": "true"
I have listed your config with the above modifications:
{
  "servers": [
    {
      "port": "9581",
      "host": "192.168.43.78",
      "alias": "kafka-metric",
      "queries": [
        {
          "outputWriters": [
            {
              "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
              "url": "http://192.168.43.78:8086/",
              "database": "kafka",
              "username": "admin",
              "password": "root",
              "typeNames": ["topic"],
              "typeNamesAsTags": "true"
            }
          ],
          "obj": "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*",
          "attr": [
            "Count",
            "MeanRate",
            "OneMinuteRate",
            "FiveMinuteRate",
            "FifteenMinuteRate"
          ],
          "resultAlias": "newTopic"
        }
      ],
      "numQueryThreads": 2
    }
  ]
}
I'd like to reference an EC2 Container Registry image in the Elastic Beanstalk section of my Cloud Formation template. The sample file references an S3 bucket for the source bundle:
"applicationVersion": {
  "Type": "AWS::ElasticBeanstalk::ApplicationVersion",
  "Properties": {
    "ApplicationName": { "Ref": "application" },
    "SourceBundle": {
      "S3Bucket": { "Fn::Join": [ "-", [ "elasticbeanstalk-samples", { "Ref": "AWS::Region" } ] ] },
      "S3Key": "php-sample.zip"
    }
  }
}
Is there any way to reference an EC2 Container Registry image instead? Something like what is available in the EC2 Container Service TaskDefinition?
Upload a Dockerrun file to S3 in order to do this. Here's an example Dockerrun:
{
  "AWSEBDockerrunVersion": "1",
  "Authentication": {
    "Bucket": "my-bucket",
    "Key": "mydockercfg"
  },
  "Image": {
    "Name": "quay.io/johndoe/private-image",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "8080:80"
    }
  ],
  "Volumes": [
    {
      "HostDirectory": "/var/app/mydb",
      "ContainerDirectory": "/etc/mysql"
    }
  ],
  "Logging": "/var/log/nginx"
}
Use this file as the s3 key. More info is available here.
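For ECR specifically, the same Dockerrun shape should apply; the image name is the ECR repository URI. The account ID, region, and repository below are placeholders, and the S3 `Authentication` block can typically be dropped when the Beanstalk instance profile has permission to pull from ECR:

```json
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "80"
    }
  ]
}
```

You would then reference the S3 location of this Dockerrun file as the `SourceBundle` in the CloudFormation template, just as with the PHP sample zip.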
I'm using Google Container Engine and I'm noticing entries like the following in my logs
{
  "insertId": "1qfzyonf2z1q0m",
  "internalId": {
    "projectNumber": "1009253435077"
  },
  "labels": {
    "compute.googleapis.com/resource_id": "710923338689591312",
    "compute.googleapis.com/resource_name": "fluentd-cloud-logging-gke-gas2016-4fe456307445d52d-worker-pool-",
    "compute.googleapis.com/resource_type": "instance",
    "container.googleapis.com/cluster_name": "gas2016-4fe456307445d52d",
    "container.googleapis.com/container_name": "kubedns",
    "container.googleapis.com/instance_id": "710923338689591312",
    "container.googleapis.com/namespace_name": "kube-system",
    "container.googleapis.com/pod_name": "kube-dns-v17-e4rr2",
    "container.googleapis.com/stream": "stderr"
  },
  "logName": "projects/cml-236417448818/logs/kubedns",
  "resource": {
    "labels": {
      "cluster_name": "gas2016-4fe456307445d52d",
      "container_name": "kubedns",
      "instance_id": "710923338689591312",
      "namespace_id": "kube-system",
      "pod_id": "kube-dns-v17-e4rr2",
      "zone": "us-central1-f"
    },
    "type": "container"
  },
  "severity": "ERROR",
  "textPayload": "I0718 17:05:20.552572    1 dns.go:660] DNS Record:&{worker-7.default.svc.cluster.local. 6000 10 10 false 30 0  }, hash:f97f8525\n",
  "timestamp": "2016-07-18T17:05:20.000Z"
}
Is this an actual error or is the severity incorrect? Where can I find the definition for the struct that is being printed?
The severity is incorrect. This is tracing/debugging output that shouldn't have been left in the binary; it has already been removed from the codebase since 1.3 was cut, so it will be gone in a future release.
See also: Google container engine cluster showing large number of dns errors in logs