When I'm starting a connector in distributed mode (connect-runtime v1.0.0), there are several configuration values that are mandatory. I'm speaking of values like:
offset.storage.topic
offset.storage.partitions
key.converter
config.storage.topic
config.storage.replication.factor
rest.port
status.storage.topic
key.converter.schemas.enable
value.converter.schemas.enable
internal.value.converter
internal.key.converter
internal.key.converter.schemas.enable
internal.value.converter.schemas.enable
status.storage.partitions
value.converter
offset.flush.interval.ms
offset.storage.replication.factor
...
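For reference, once those values are filled in, a minimal connect-distributed.properties looks roughly like the sketch below (topic names, host and sizing values are placeholders, not recommendations):

bootstrap.servers=localhost:9092
group.id=connect-cluster
rest.port=8083
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
config.storage.topic=connect-configs
config.storage.replication.factor=1
offset.storage.topic=connect-offsets
offset.storage.partitions=25
offset.storage.replication.factor=1
status.storage.topic=connect-status
status.storage.partitions=5
status.storage.replication.factor=1
offset.flush.interval.ms=10000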
Once the connector is started with meaningful values for those properties, it works as expected. But at startup, the log gets flooded with entries like
WARN o.a.k.c.admin.AdminClientConfig.logUnused - The configuration 'offset.storage.topic' was supplied but isn't a known config.
for all of the above-mentioned mandatory configuration values.
There are three config classes which are logging these warnings:
org.apache.kafka.clients.consumer.ConsumerConfig
org.apache.kafka.clients.admin.AdminClientConfig
org.apache.kafka.clients.producer.ProducerConfig
So far I haven't found a reason for this behavior. What is missing or misconfigured here that causes these warnings? Do I have to worry about them?
There's a ticket on this issue, still open as of Nov'19:
https://issues.apache.org/jira/browse/KAFKA-7509
When running Connect, the logs contain quite a few warnings about "The configuration '{}' was supplied but isn't a known config." This occurs when Connect creates producers, consumers, and admin clients, because the AbstractConfig is logging unused configuration properties upon construction. It's complicated by the fact that the Producer, Consumer, and AdminClient all create their own AbstractConfig instances within the constructor, so we can't even call its ignore(String key) method.
And a similar issue exists for Kafka Streams:
https://issues.apache.org/jira/browse/KAFKA-6793
Judging by this thread, it doesn't seem to matter.
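If the noise bothers you, one common workaround (assuming the stock Log4j configuration that ships with Connect, i.e. config/connect-log4j.properties) is to raise the log level for the three config classes that emit these warnings. Note that this also hides any other warnings those classes might log, so it is a trade-off rather than a fix:

# connect-log4j.properties – silence the "supplied but isn't a known config" warnings
log4j.logger.org.apache.kafka.clients.consumer.ConsumerConfig=ERROR
log4j.logger.org.apache.kafka.clients.admin.AdminClientConfig=ERROR
log4j.logger.org.apache.kafka.clients.producer.ProducerConfig=ERROR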
How do we configure value.subject.name.strategy, based on https://docs.confluent.io/platform/current/schema-registry/connect.html#json-schema?
I put various configuration names in worker.properties, but it seems that nothing is recognized by the Kafka sink connector. As you can see in the logs, it always defaults to TopicNameStrategy.
[2022-11-21 16:40:23,663] WARN The configuration 'value.converter.subject.name.strategy' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
[2022-11-21 16:40:23,690] WARN The configuration 'converter.subject.name.strategy' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2022-11-21 16:40:23,690] WARN The configuration 'value.subject.name.strategy' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2022-11-21 16:40:23,690] WARN The configuration 'value.converter.subject.name.strategy' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2022-11-21 16:40:23,719] WARN The configuration 'converter.subject.name.strategy' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
I put all of these variations in worker.properties and feed it to connector_distributed to start.
grep -i "name.strategy" /plugins/worker.properties
value.subject.name.strategy=io.confluent.kafka.serializers.subject.RecordNameStrategy
value.converter.subject.name.strategy=io.confluent.kafka.serializers.subject.RecordNameStrategy
consumer.value.subject.name.strategy=io.confluent.kafka.serializers.subject.RecordNameStrategy
consumer.value.converter.subject.name.strategy=io.confluent.kafka.serializers.subject.RecordNameStrategy
Those logs can be ignored. The consumer itself doesn't use those properties; only the config inside the serializer does, and that is printed separately (which is where you may be seeing the default applied).
There's an open JIRA to silence the logs from passing converter properties all over the consumer.
To configure the serializer, you use converters. To configure converters you need to use
value.converter.[property]=[value]
So, just as with schema.registry.url, you would set
value.converter.value.subject.name.strategy=OtherStrategy
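Putting it together, the relevant part of a worker (or connector) config for a JSON Schema setup might look roughly like the sketch below; the converter class, registry URL and strategy class are assumptions to adapt to your own setup:

value.converter=io.confluent.connect.json.JsonSchemaConverter
value.converter.schema.registry.url=http://schema-registry:8081
value.converter.value.subject.name.strategy=io.confluent.kafka.serializers.subject.RecordNameStrategy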
I have two Kafka clusters. The first, Kafka-A, uses the SASL SCRAM-SHA-256 mechanism to authenticate; the other, Kafka-B, has no authentication configured.
To be able to connect to Kafka-A from ClickHouse, I configured config.xml as shown below.
My config.xml configuration:
<kafka>
<security_protocol>sasl_plaintext</security_protocol>
<sasl_mechanism>SCRAM-SHA-256</sasl_mechanism>
<sasl_username>xxx</sasl_username>
<sasl_password>xxx</sasl_password>
<debug>all</debug>
<auto_offset_reset>latest</auto_offset_reset>
<compression_type>snappy</compression_type>
</kafka>
At this point I found that I can't connect to Kafka-B using a Kafka engine table. When I try, an error occurs with the following message:
StorageKafka (xxx): [rdk:FAIL]
[thrd:sasl_plaintext://xxx/bootstrap]:
sasl_plaintext://xxx/bootstrap: SASL SCRAM-SHA-256
mechanism handshake failed: Broker: Request not valid in current SASL
state: broker's supported mechanisms: (after 3ms in state
AUTH_HANDSHAKE, 4 identical error(s) suppressed)
It seems that when connecting to Kafka-B, ClickHouse also uses SASL authentication, which leads to the error, since the Kafka-B brokers are not configured for authentication.
How can I configure ClickHouse correctly to connect to the two different Kafka clusters?
ClickHouse allows you to define Kafka settings per topic.
Include the topic name in the name of an XML section:
<kafka_mytopic>
<security_protocol>....
....
</kafka_mytopic>
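So one way to handle the two clusters (a sketch only; topic_a is a placeholder for the topic your Kafka-A engine table reads) is to keep the global <kafka> section free of SASL settings, so that tables reading from Kafka-B use plain connections, and move the SASL settings into a topic-specific section:

<kafka>
    <!-- shared settings, no SASL here, used when reading from Kafka-B -->
    <auto_offset_reset>latest</auto_offset_reset>
</kafka>
<kafka_topic_a>
    <!-- applied only to tables consuming topic_a from Kafka-A -->
    <security_protocol>sasl_plaintext</security_protocol>
    <sasl_mechanism>SCRAM-SHA-256</sasl_mechanism>
    <sasl_username>xxx</sasl_username>
    <sasl_password>xxx</sasl_password>
</kafka_topic_a>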
We are using the ELK stack (7.10.2) in Kubernetes (1.21.5). Some time ago our service provider Gardener changed the OS version on the nodes (318.9.0 -> 576.1.0), and our troubles with the logging stack started.
It seems that Kafka (v2.8.1, 2 pods) does not stream data to Logstash (7.10.2, 2 pods) continuously, but sends it in chunks every few moments. In Kibana we don't see log records being added continually; instead we see a bunch of new records every few moments. Under high load (e.g. when debugging some component in the k8s cluster), this delay rises to minutes.
We discovered that the metric "delayed fetch in purgatory" is jumping in a very similar saw-tooth pattern (see screenshot). When I downgraded the OS version on the nodes from the current one (576.2.0, orange) to the previous one (318.9.0, blue), the problem disappeared. As you would expect, we can't stay on the old OS version much longer.
I asked the Gardener staff for assistance, but without a root cause they are not able to help us. We did not change any settings or component versions, only the OS version on the nodes.
From Logstash's debug log I can see that Logstash is continuously connecting to and disconnecting from Kafka:
[2022-01-17T08:53:33,232][INFO ][org.apache.kafka.clients.consumer.internals.AbstractCoordinator] [Consumer clientId=elk-logstash-indexer-6c84d6bf8c-58gnz-containers-10, groupId=containers] Attempt to heartbeat failed since group is rebalancing
[2022-01-17T08:53:30,501][INFO ][org.apache.kafka.clients.consumer.internals.AbstractCoordinator] [Consumer clientId=elk-logstash-indexer-6c84d6bf8c-ct29t-containers-49, groupId=containers] Discovered group coordinator elk-kafka-0.kafka.logging.svc.cluster.local:9092 (id: 2147483647 rack: null)
[2022-01-17T08:53:30,001][INFO ][org.apache.kafka.common.utils.AppInfoParser] Kafka startTimeMs: 1642409610000
These lines keep repeating in a loop.
I can see a similar situation on the Kafka side:
[2022-01-20 11:55:04,241] DEBUG [broker-0-to-controller-send-thread]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
[2022-01-20 11:55:04,241] DEBUG [broker-0-to-controller-send-thread]: No controller defined in metadata cache, retrying after backoff (kafka.server.BrokerToControllerRequestThread)
[2022-01-20 11:55:04,342] DEBUG [broker-0-to-controller-send-thread]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
[2022-01-20 11:55:04,342] DEBUG [broker-0-to-controller-send-thread]: No controller defined in metadata cache, retrying after backoff (kafka.server.BrokerToControllerRequestThread)
[2022-01-20 11:55:04,365] DEBUG Accepted connection from /10.250.1.127:53678 on /100.96.30.21:9092 and assigned it to processor 1, sendBufferSize [actual|requested]: [102400|102400] recvBufferSize [actual|requested]: [102400|102400] (kafka.network.Acceptor)
[2022-01-20 11:55:04,365] DEBUG Processor 1 listening to new connection from /10.250.1.127:53678 (kafka.network.Processor)
[2022-01-20 11:55:04,368] DEBUG [SocketServer listenerType=ZK_BROKER, nodeId=0] Connection with /10.250.1.127 disconnected (org.apache.kafka.common.network.Selector)
I attempted:
doubling the resources for Kafka and Logstash (no change)
changing the container engine from Docker to containerd (the problem was worse with containerd, ~400 -> ~1000)
changing the Logstash parameters for the Kafka plugin (no change)
comparing kernel settings (5.4.0 -> 5.10.0, I did not spot any interesting changes)
temporarily disabling Karydia for Kafka, Logstash and ZooKeeper (no change)
temporarily upgrading the Logstash version (7.10.2 -> 7.12.0) without success; all tested versions show the same bad behavior, and moving to a higher version currently isn't possible without changing the versions of other ELK components
Unfortunately, I am not a Kafka expert, and I am not sure whether the connecting/disconnecting is the root cause, the result of some non-optimal settings on our side, or whether the communication is being disturbed by something unknown to us.
I would like to ask the community for help with this problem. Suggestions on how to continue the investigation are very welcome too.
I have set up a Kafka cluster locally, with three brokers using these properties:
broker.id=0
listeners=PLAINTEXT://:9092
broker.id=1
listeners=PLAINTEXT://:9091
broker.id=2
listeners=PLAINTEXT://:9090
Things were working fine, but now I am getting this error:
WARN Error while fetching metadata with correlation id 1 : {TRAIL_TOPIC=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
I am also trying to write messages via a Java-based client, and I am getting the error: unable to fetch metadata in 6000 ms.
I faced the same problem: the topic did not exist, and the broker configuration auto.create.topics.enable was set to false on my cluster, so the topic was never created automatically. I was using bin/connect-standalone, so I hadn't specified the topics I would use.
I changed this config to true and it solved my problem.
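For illustration, either of the following addresses the missing topic. The broker address and the TRAIL_TOPIC name are taken from the question above and may need adjusting, and older Kafka versions use --zookeeper instead of --bootstrap-server:

# server.properties (broker side): let Kafka create topics on first use
auto.create.topics.enable=true

# or create the topic explicitly before producing to it
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
  --replication-factor 3 --partitions 1 --topic TRAIL_TOPIC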
I want to load Apache server logs to HDFS using Kafka.
Creating the topic:
./kafka-topics.sh --create --zookeeper 10.25.3.207:2181 --replication-factor 1 --partitions 1 --topic lognew
Tailing the Apache access log:
tail -f /var/log/httpd/access_log |./kafka-console-producer.sh --broker-list 10.25.3.207:6667 --topic lognew
In another terminal (in the Kafka bin directory), start a consumer:
./kafka-console-consumer.sh --zookeeper 10.25.3.207:2181 --topic lognew --from-beginning
The camus.properties file is configured as follows:
# Needed Camus properties, more cleanup to come
# final top-level data output directory, sub-directory will be dynamically created for each topic pulled
etl.destination.path=/user/root/topics
# HDFS location where you want to keep execution files, i.e. offsets, error logs, and count files
etl.execution.base.path=/user/root/exec
# where completed Camus job output directories are kept, usually a sub-dir in the base.path
etl.execution.history.path=/user/root/camus/exec/history
# Kafka-0.8 handles all zookeeper calls
#zookeeper.hosts=
#zookeeper.broker.topics=/brokers/topics
#zookeeper.broker.nodes=/brokers/ids
# Concrete implementation of the Encoder class to use (used by Kafka Audit, and thus optional for now)
#camus.message.encoder.class=com.linkedin.camus.etl.kafka.coders.DummyKafkaMessageEncoder
# Concrete implementation of the Decoder class to use
#camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.LatestSchemaKafkaAvroMessageDecoder
# Used by avro-based Decoders to use as their Schema Registry
#kafka.message.coder.schema.registry.class=com.linkedin.camus.example.schemaregistry.DummySchemaRegistry
# Used by the committer to arrange .avro files into a partitioned scheme. This will be the default partitioner for all
# topic that do not have a partitioner specified
#etl.partitioner.class=com.linkedin.camus.etl.kafka.coders.DefaultPartitioner
# Partitioners can also be set on a per-topic basis
#etl.partitioner.class.<topic-name>=com.your.custom.CustomPartitioner
# all files in this dir will be added to the distributed cache and placed on the classpath for hadoop tasks
# hdfs.default.classpath.dir=
# max hadoop tasks to use, each task can pull multiple topic partitions
mapred.map.tasks=30
# max historical time that will be pulled from each partition based on event timestamp
kafka.max.pull.hrs=1
# events with a timestamp older than this will be discarded.
kafka.max.historical.days=3
# Max minutes for each mapper to pull messages (-1 means no limit)
kafka.max.pull.minutes.per.task=-1
# if whitelist has values, only whitelisted topic are pulled. nothing on the blacklist is pulled
#kafka.blacklist.topics=
kafka.whitelist.topics=lognew
log4j.configuration=true
# Name of the client as seen by kafka
kafka.client.name=camus
# Fetch Request Parameters
#kafka.fetch.buffer.size=
#kafka.fetch.request.correlationid=
#kafka.fetch.request.max.wait=
#kafka.fetch.request.min.bytes=
# Connection parameters.
kafka.brokers=10.25.3.207:6667
#kafka.timeout.value=
#Stops the mapper from getting inundated with Decoder exceptions for the same topic
#Default value is set to 10
max.decoder.exceptions.to.print=5
#Controls the submitting of counts to Kafka
#Default value set to true
post.tracking.counts.to.kafka=true
monitoring.event.class=class.that.generates.record.to.submit.counts.to.kafka
# everything below this point can be ignored for the time being, will provide more documentation down the road
##########################
etl.run.tracking.post=false
#kafka.monitor.tier=
#etl.counts.path=
kafka.monitor.time.granularity=10
etl.hourly=hourly
etl.daily=daily
etl.ignore.schema.errors=false
# configure output compression for deflate or snappy. Defaults to deflate
etl.output.codec=deflate
etl.deflate.level=6
#etl.output.codec=snappy
etl.default.timezone=America/Los_Angeles
etl.output.file.time.partition.mins=60
etl.keep.count.files=false
etl.execution.history.max.of.quota=.8
mapred.output.compress=true
mapred.map.max.attempts=1
kafka.client.buffer.size=20971520
kafka.client.so.timeout=60000
#zookeeper.session.timeout=
#zookeeper.connection.timeout=
I get errors when I execute the command below:
hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar com.linkedin.camus.etl.kafka.CamusJob -P camus.properties
Below is the error:
[CamusJob] - Fetching metadata from broker 10.25.3.207:6667 with client id camus for 0 topic(s) []
[CamusJob] - failed to create decoder
com.linkedin.camus.coders.MessageDecoderException: com.linkedin.camus.coders.MessageDecoderException: java.lang.NullPointerException
at com.linkedin.camus.etl.kafka.coders.MessageDecoderFactory.createMessageDecoder(MessageDecoderFactory.java:28)
at com.linkedin.camus.etl.kafka.mapred.EtlInputFormat.createMessageDecoder(EtlInputFormat.java:390)
at com.linkedin.camus.etl.kafka.mapred.EtlInputFormat.getSplits(EtlInputFormat.java:264)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:280)
at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:608)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.linkedin.camus.etl.kafka.CamusJob.main(CamusJob.java:572)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: com.linkedin.camus.coders.MessageDecoderException: java.lang.NullPointerException
at com.linkedin.camus.etl.kafka.coders.KafkaAvroMessageDecoder.init(KafkaAvroMessageDecoder.java:40)
at com.linkedin.camus.etl.kafka.coders.MessageDecoderFactory.createMessageDecoder(MessageDecoderFactory.java:24)
... 22 more
Caused by: java.lang.NullPointerException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:195)
at com.linkedin.camus.etl.kafka.coders.KafkaAvroMessageDecoder.init(KafkaAvroMessageDecoder.java:31)
... 23 more
[CamusJob] - Discarding topic (Decoder generation failed) : avrotopic
[CamusJob] - failed to create decoder
Please suggest what can be done to resolve this problem.
Thanks in advance
Deepthy
I've never used Camus, but I believe this is a Kafka-related error that has to do with how you're encoding/decoding the messages. I believe the important lines in your stack trace are
Caused by: com.linkedin.camus.coders.MessageDecoderException: java.lang.NullPointerException
at com.linkedin.camus.etl.kafka.coders.KafkaAvroMessageDecoder.init(KafkaAvroMessageDecoder.java:40)
at com.linkedin.camus.etl.kafka.coders.MessageDecoderFactory.createMessageDecoder(MessageDecoderFactory.java:24)
How are you telling Kafka to use your Avro encoding? You've commented out the following line in your config,
#kafka.message.coder.schema.registry.class=com.linkedin.camus.example.schemaregistry.DummySchemaRegistry
So are you setting that somewhere else in code? If you're not, I would suggest uncommenting that config value and setting it to whatever schema registry class matches the Avro data you're trying to decode/encode.
It might take you some debugging to use the right classpath and such, but I believe this is an easily solvable problem.
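For illustration, the relevant camus.properties lines would look something like the sketch below; the registry class must be an implementation that actually knows your Avro schemas, and DummySchemaRegistry is only the example class shipped with Camus:

# decode messages as Avro ...
camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.KafkaAvroMessageDecoder
# ... and tell the decoder which schema registry implementation to use
kafka.message.coder.schema.registry.class=com.linkedin.camus.example.schemaregistry.DummySchemaRegistry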
EDIT
In responding to your comments, I have a couple comments of my own.
I have never used Camus. So debugging the errors you get from Camus is not something that I'll be able to do very well or at all. So you'll have to spend some time (maybe several hours) researching and trying different things to get it to work.
I doubt DummySchemaRegistry is the correct configuration value that you need. Anything starting with Dummy is probably not a valid configuration option.
A simple Google search about Camus and schema registries revealed some interesting links: SchemaRegistry Classes and KafkaAvroMessageEncoder. Those are more likely to be the correct config values that you need. Just my guess, because again, I've never used Camus.
This could also be of some use to you; I don't know if you've seen it. But if you haven't, googling the specific error you get is probably something you should do before coming to Stack Overflow.