Kafka org.apache.kafka.connect.converters.ByteArrayConverter doesn't work as a value for key.converter and value.converter

I'm trying to build a pipeline that moves binary data from a Kafka topic to a Kinesis stream without transforming it, so I'm planning to use ByteArrayConverter in the worker properties. But I'm getting the following error, even though I can see the ByteArrayConverter class in the 0.11.0 source. I cannot find the same class under 3.2.x.
Any help would be much appreciated.
key.converter=io.confluent.connect.replicator.util.ByteArrayConverter
value.converter=io.confluent.connect.replicator.util.ByteArrayConverter
Exception in thread "main" org.apache.kafka.common.config.ConfigException: Invalid value io.confluent.connect.replicator.util.ByteArrayConverter for configuration key.converter: Class io.confluent.connect.replicator.util.ByteArrayConverter could not be found.
at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:672)
at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:418)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:55)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:62)
at org.apache.kafka.connect.runtime.WorkerConfig.<init>(WorkerConfig.java:156)
at org.apache.kafka.connect.runtime.distributed.DistributedConfig.<init>(DistributedConfig.java:198)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:65)

org.apache.kafka.connect.converters.ByteArrayConverter was only added in Apache Kafka 0.11 (which corresponds to Confluent Platform 3.3). If you are running a Confluent distribution earlier than 3.3, you will need the Confluent Enterprise distribution (not Confluent Open Source) and its io.confluent.connect.replicator.util.ByteArrayConverter converter.
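On Apache Kafka 0.11+ (Confluent Platform 3.3+), the bundled converter can be referenced directly in the worker properties:
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter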

Related

Why is Cassandra crashing whenever I try to run DataStax Kafka Connector?

Goal: to use Kafka to send messages to a Cassandra sink using Kafka Connect.
I've deployed Kafka and Cassandra and I am able to work with each of them individually - I have no problem sending data to Kafka, using producers to pass messages, and using consumers to consume them. I have no problem using cqlsh to create tables and insert data into them. However, whenever I try to deploy the DataStax Apache Kafka Connector, Cassandra seems to crash.
I am trying to learn how to use Kafka Connect using just one Kafka producer, broker, and one Cassandra keyspace using the standalone mode. I've configured both connect-standalone.properties and the cassandra-sink-standalone.properties following the instructions shown on DataStax: https://docs.datastax.com/en/kafka/doc/kafka/kafkaStringJson.html
connect-standalone.properties
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path= *install_location*/kafka-connect-cassandra-sink-1.4.0.jar
cassandra-sink-standalone.properties
name=stocks-sink
connector.class=com.datastax.kafkaconnector.DseSinkConnector
tasks.max=1
topics=stocks_topic
topic.stocks_topic.stocks_keyspace.stocks_table.mapping = symbol=value.symbol, ts=value.ts, exchange=value.exchange, industry=value.industry, name=key, value=value.value
Then, the Kafka Connector is started using bin/connect-standalone.sh connect-standalone.properties cassandra-sink-standalone.properties.
About 95% of the time when I attempt to launch the Kafka Connector, Cassandra crashes. Running bin/nodetool status shows the message:
nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'
In the system.log and debug.log logs, there is no indication that Cassandra has even crashed. The last line just remains as:
INFO [main] 2023-01-31 00:00:00,143 StorageService.java:2806 - Node localhost/127.0.0.1:7000 state jump to NORMAL
And in the Kafka Connect logs, the error message states:
[2023-01-31 15:24:47,803] INFO [plc-sink|task-0] DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.6.0 (com.datastax.oss.driver.internal.core.DefaultMavenCoordinates:37)
[2023-01-31 15:24:47,947] INFO [plc-sink|task-0] Could not register Graph extensions; this is normal if Tinkerpop was explicitly excluded from classpath (com.datastax.oss.driver.internal.core.context.InternalDriverContext:540)
[2023-01-31 15:24:47,948] INFO [plc-sink|task-0] Could not register Reactive extensions; this is normal if Reactive Streams was explicitly excluded from classpath (com.datastax.oss.driver.internal.core.context.InternalDriverContext:559)
[2023-01-31 15:24:47,997] INFO [plc-sink|task-0] Using native clock for microsecond precision (com.datastax.oss.driver.internal.core.time.Clock:40)
[2023-01-31 15:24:47,999] INFO [plc-sink|task-0] [s0] No contact points provided, defaulting to /127.0.0.1:9042 (com.datastax.oss.driver.internal.core.metadata.MetadataManager:134)
[2023-01-31 15:24:48,190] WARN [plc-sink|task-0] [s0] Error connecting to Node(endPoint=/127.0.0.1:9042, hostId=null, hashCode=3247c5e4), trying next node (ConnectionInitException: [s0|control|connecting...] Protocol initialization request, step 1 (OPTIONS): failed to send request (java.nio.channels.ClosedChannelException)) (com.datastax.oss.driver.internal.core.control.ControlConnection:34)
[2023-01-31 15:24:48,200] ERROR [plc-sink|task-0] WorkerSinkTask{id=plc-sink-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:196)
com.datastax.oss.driver.api.core.AllNodesFailedException: Could not reach any contact point, make sure you've provided valid addresses (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=/127.0.0.1:9042, hostId=null, hashCode=3247c5e4): [com.datastax.oss.driver.api.core.connection.ConnectionInitException: [s0|control|connecting...] Protocol initialization request, step 1 (OPTIONS): failed to send request (java.nio.channels.ClosedChannelException)]
at com.datastax.oss.driver.api.core.AllNodesFailedException.copy(AllNodesFailedException.java:141)
at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
at com.datastax.oss.driver.api.core.session.SessionBuilder.build(SessionBuilder.java:612)
at com.datastax.oss.kafka.sink.state.LifeCycleManager.buildCqlSession(LifeCycleManager.java:518)
at com.datastax.oss.kafka.sink.state.LifeCycleManager.lambda$startTask$0(LifeCycleManager.java:113)
at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
at com.datastax.oss.kafka.sink.state.LifeCycleManager.startTask(LifeCycleManager.java:109)
at com.datastax.oss.kafka.sink.CassandraSinkTask.start(CassandraSinkTask.java:83)
at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:312)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:187)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:244)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Suppressed: com.datastax.oss.driver.api.core.connection.ConnectionInitException: [s0|control|connecting...] Protocol initialization request, step 1 (OPTIONS): failed to send request (java.nio.channels.ClosedChannelException)
at com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler$InitRequest.fail(ProtocolInitHandler.java:342)
at com.datastax.oss.driver.internal.core.channel.ChannelHandlerRequest.writeListener(ChannelHandlerRequest.java:87)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:183)
at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
at com.datastax.oss.driver.internal.core.channel.ChannelHandlerRequest.send(ChannelHandlerRequest.java:76)
at com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler$InitRequest.send(ProtocolInitHandler.java:183)
at com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler.onRealConnect(ProtocolInitHandler.java:118)
at com.datastax.oss.driver.internal.core.channel.ConnectInitHandler.lambda$connect$0(ConnectInitHandler.java:57)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
Suppressed: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:9042
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:921)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:897)
at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:748)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:740)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:726)
at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:748)
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:763)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:788)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:756)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:806)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294)
at com.datastax.oss.driver.internal.core.channel.ChannelHandlerRequest.send(ChannelHandlerRequest.java:75)
... 20 more
In the roughly 5% of attempts where Cassandra doesn't crash, the following messages show up in Kafka Connect's logs:
[2023-01-31 15:41:32,839] INFO [plc-sink|task-0] DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.6.0 (com.datastax.oss.driver.internal.core.DefaultMavenCoordinates:37)
[2023-01-31 15:41:32,981] INFO [plc-sink|task-0] Could not register Graph extensions; this is normal if Tinkerpop was explicitly excluded from classpath (com.datastax.oss.driver.internal.core.context.InternalDriverContext:540)
[2023-01-31 15:41:32,982] INFO [plc-sink|task-0] Could not register Reactive extensions; this is normal if Reactive Streams was explicitly excluded from classpath (com.datastax.oss.driver.internal.core.context.InternalDriverContext:559)
[2023-01-31 15:41:33,037] INFO [plc-sink|task-0] Using native clock for microsecond precision (com.datastax.oss.driver.internal.core.time.Clock:40)
[2023-01-31 15:41:33,040] INFO [plc-sink|task-0] [s0] No contact points provided, defaulting to /127.0.0.1:9042 (com.datastax.oss.driver.internal.core.metadata.MetadataManager:134)
[2023-01-31 15:41:33,254] INFO [plc-sink|task-0] [s0] Failed to connect with protocol DSE_V2, retrying with DSE_V1 (com.datastax.oss.driver.internal.core.channel.ChannelFactory:224)
[2023-01-31 15:41:33,263] INFO [plc-sink|task-0] [s0] Failed to connect with protocol DSE_V1, retrying with V4 (com.datastax.oss.driver.internal.core.channel.ChannelFactory:224)
[2023-01-31 15:41:34,091] INFO [plc-sink|task-0] WorkerSinkTask{id=plc-sink-0} Sink task finished initialization and start (org.apache.kafka.connect.runtime.WorkerSinkTask:313)
[2023-01-31 15:41:34,092] INFO [plc-sink|task-0] WorkerSinkTask{id=plc-sink-0} Executing sink task (org.apache.kafka.connect.runtime.WorkerSinkTask:198)
...
Versions:
Apache Cassandra 4.0.7
Apache Kafka 3.3.1
DataStax Apache Kafka Connector 1.4.0
I am currently using WSL2 Ubuntu 20.04.5 on Windows 11, with the following specs:
CPU: 4 Cores
Memory: 8GB RAM
Disk (SSD): 250 GB
Seeing that it actually works 5% of the time, I suspect that it's an OOM problem as outlined in https://community.datastax.com/questions/6947/index.html (and I sometimes just happen to have enough memory?). I've tried the solution in this article but it didn't help. How can I configure Cassandra / Kafka Connect to avoid this problem? Is this just a matter of needing a computer with more memory?
I think you're on the right track when you suggested that memory is an issue.
I have a "small" Windows Surface Pro I use to replicate issues like yours. I'm also running Ubuntu 20.04 with WSL2 on this laptop.
By default, Windows allocates half of system RAM to WSL2 so on my sub-8GB RAM installation, my Ubuntu installation takes up 3.7GB of memory. A vanilla installation of Cassandra (out-of-the-box zero configuration), starts with 1.3GB of memory allocated to it so there's only 2.4GB left for everything else.
What I suspect is happening is that when you start Kafka on the same node, Ubuntu runs out of memory and triggers the Linux oom-killer. Although the end result is similar, the trigger is slightly different from what I described in the post you linked, so my recommendation there to explicitly set disk_access_mode doesn't help in this situation.
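If you want to confirm that, the oom-killer leaves a trace in the kernel log, so after Cassandra disappears a check along these lines (the exact wording varies by kernel version) should show which process was killed:
sudo dmesg -T | grep -iE 'out of memory|killed process'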
As a workaround, configure Cassandra to only allocate 1GB of memory by setting the MAX_HEAP_SIZE in conf/cassandra-env.sh:
MAX_HEAP_SIZE="1G"
Kafka is configured with 1GB by default but if it isn't, set the following in bin/kafka-server-start.sh:
export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
With both of those set, there should be over 1GB left for Ubuntu, which will hopefully allow you to run your tests. Cheers!

Kafka consumer using AWS_MSK_IAM ClassCastException error

I have MSK running on AWS and I'd like to consume information using AWS_MSK_IAM authentication.
My MSK is properly configured and I can consume the information using Kafka CLI with the following command:
../bin/kafka-console-consumer.sh --bootstrap-server b-1.kafka.*********.***********.amazonaws.com:9098 --consumer.config client_auth.properties --topic TopicTest --from-beginning
My client_auth.properties has the following information:
# Sets up TLS for encryption and SASL for authN.
security.protocol = SASL_SSL
# Identifies the SASL mechanism to use.
sasl.mechanism = AWS_MSK_IAM
# Binds SASL client implementation.
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required;
# Encapsulates constructing a SigV4 signature based on extracted credentials.
# The SASL client bound by "sasl.jaas.config" invokes this class.
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler
When I try to consume from my Databricks cluster using Spark, I receive the following error:
Caused by: kafkashaded.org.apache.kafka.common.KafkaException: java.lang.ClassCastException: software.amazon.msk.auth.iam.IAMClientCallbackHandler cannot be cast to kafkashaded.org.apache.kafka.common.security.auth.AuthenticateCallbackHandler
My cluster config and the libraries installed on the cluster are shown in screenshots (not included here). And here is the code I'm running on Databricks:
raw = (
    spark
    .readStream
    .format('kafka')
    .option('kafka.bootstrap.servers', 'b-.kafka.*********.***********.amazonaws.com:9098')
    .option('subscribe', 'TopicTest')
    .option('startingOffsets', 'earliest')
    .option('kafka.sasl.mechanism', 'AWS_MSK_IAM')
    .option('kafka.security.protocol', 'SASL_SSL')
    .option('kafka.sasl.jaas.config', 'software.amazon.msk.auth.iam.IAMLoginModule required;')
    .option('kafka.sasl.client.callback.handler.class', 'software.amazon.msk.auth.iam.IAMClientCallbackHandler')
    .load()
)
Though I haven't tested this, based on the comment from Andrew about it being theoretically possible to relocate the dependency, I dug a bit into the source of aws-msk-iam-auth. They have a compileOnly('org.apache.kafka:kafka-clients:2.4.1') in their build.gradle, so the uber jar doesn't contain this library and it is picked up from whatever Databricks provides (which is shaded).
They also relocate all their dependent jars with a prefix, so changing the compileOnly to implementation and rebuilding the uber jar with gradle clean shadowJar should include and relocate the kafka-clients classes without any conflicts when the jar is uploaded to Databricks.
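Roughly, the change in aws-msk-iam-auth's build.gradle would look like the sketch below (the kafka-clients version shown is the one quoted above and may differ in the revision you clone):
dependencies {
    // was: compileOnly('org.apache.kafka:kafka-clients:2.4.1')
    implementation('org.apache.kafka:kafka-clients:2.4.1')
}
Then rebuild with ./gradlew clean shadowJar and upload the resulting shaded jar to the Databricks cluster.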
I faced the same issue, so I forked aws-msk-iam-auth to make it compatible with Databricks. Just add the jar from the following release to your cluster: https://github.com/Iziwork/aws-msk-iam-auth-for-databricks/releases/tag/v1.1.2-databricks

Unable to connect IIDR CDC to kafka

When trying to connect the IIDR replication engine for Kafka to a Kafka cluster via ZooKeeper, I get the following error:
kafka.common.KafkaException: Failed to parse the broker info from zookeeper: {"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT","PLAINTEXT_HOST":"PLAINTEXT"},"endpoints":["PLAINTEXT://broker:29092","PLAINTEXT_HOST://localhost:9092"],"jmx_port":9101,"host":"broker","timestamp":"1598174513950","port":29092,"version":4}
at kafka.cluster.Broker$.createBroker(Broker.scala:101)
at kafka.utils.ZkUtils.getBrokerInfo(ZkUtils.scala:787)
at kafka.utils.ZkUtils$$anonfun$getAllBrokersInCluster$2.apply(ZkUtils.scala:162)
at kafka.utils.ZkUtils$$anonfun$getAllBrokersInCluster$2.apply(ZkUtils.scala:162)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.utils.ZkUtils.getAllBrokersInCluster(ZkUtils.scala:162)
at com.datamirror.ts.target.publication.KafkaTargetPublisherProxy.getBootstrapBrokers(KafkaTargetPublisherProxy.java:829)
at com.datamirror.ts.target.publication.KafkaTargetPublisherProxy.loadKafkaServicesInfo(KafkaTargetPublisherProxy.java:417)
at com.datamirror.ts.target.publication.KafkaTargetPublisherProxy.handleStartReplicateMessage(KafkaTargetPublisherProxy.java:139)
at com.datamirror.ts.enginemsg.MessageDispatcher.dispatchSwitch(MessageDispatcher.java:547)
at com.datamirror.ts.enginemsg.MessageDispatcher.dispatch(MessageDispatcher.java:142)
at com.datamirror.ts.engine.ReplicationSession.dispatchDynamicMessage(ReplicationSession.java:2816)
at com.datamirror.ts.engine.ReplicationSession.dispatchDataMessage(ReplicationSession.java:2910)
at com.datamirror.ts.target.publication.TargetDataChannelJob.moderateForTheTarget(TargetDataChannelJob.java:178)
at com.datamirror.ts.target.publication.TargetDataChannelJob.execute(TargetDataChannelJob.java:74)
at com.datamirror.ts.engine.component.PipelineThread.runThread(PipelineThread.java:217)
at com.datamirror.ts.util.TsThread.run(TsThread.java:130)
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.kafka.common.protocol.SecurityProtocol.PLAINTEXT_HOST
at java.lang.Enum.valueOf(Enum.java:249)
at org.apache.kafka.common.protocol.SecurityProtocol.valueOf(SecurityProtocol.java:28)
at org.apache.kafka.common.protocol.SecurityProtocol.forName(SecurityProtocol.java:89)
at kafka.cluster.EndPoint$.createEndPoint(EndPoint.scala:49)
at kafka.cluster.Broker$$anonfun$1.apply(Broker.scala:90)
at kafka.cluster.Broker$$anonfun$1.apply(Broker.scala:89)
at scala.collection.immutable.List.map(List.scala:277)
at kafka.cluster.Broker$.createBroker(Broker.scala:89)
It seems like there is a compatibility issue between IIDR and my Kafka cluster. I'd appreciate any advice on how to overcome this.
System setup
- IIDR v11.4 Kafka engine
- Confluent Kafka quick-start Docker setup (https://github.com/confluentinc/cp-all-in-one)
The recommendation these days, for both Kafka and the IIDR product, is to use bootstrap.servers and list the brokers in the kafkaproducer.properties and kafkaconsumer.properties files. Contact L2 support regarding ZooKeeper-based configuration if you need to pursue it, but we strongly recommend using the bootstrap.servers parameter, as per the Apache Kafka recommendations.
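For example, in both kafkaproducer.properties and kafkaconsumer.properties (the broker host names and ports below are placeholders for your own cluster):
bootstrap.servers=broker1:9092,broker2:9092,broker3:9092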

Implementing custom AvroConverter for confluent kafka-connect-s3

I am using Confluent's Kafka Connect S3 sink to copy data from Apache Kafka to AWS S3.
The problem is that my Kafka data is in Avro format but was NOT written with Confluent Schema Registry's Avro serializer, and I cannot change the Kafka producer. So I need to deserialize the existing Avro data from Kafka and then persist it in Parquet format in AWS S3. I tried using Confluent's AvroConverter as the value converter, like this:
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost/api/v1/avro
And I am getting this error:
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic dcp-all to Avro:
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:110)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:86)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$2(WorkerSinkTask.java:488)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
As far as I understand, io.confluent.connect.avro.AvroConverter will only work if the data was written to Kafka using Confluent Schema Registry's Avro serializer, hence this error. So my question is: do I need to implement a generic AvroConverter in this case? And if so, how do I extend the existing source code (https://github.com/confluentinc/kafka-connect-storage-cloud)?
Any help here will be appreciated.
You don't need to extend that repo. You just need to implement a Converter (part of Apache Kafka), shade it into a JAR, and then place it on your Connect worker's CLASSPATH, like BlueApron did for Protobuf.
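A minimal sketch of what such a Converter could look like, assuming the producer wrote plain binary-encoded Avro (no Schema Registry framing) and that you can supply the writer schema to the converter; the class name, package, and the schema.literal property are made up for illustration, and it reuses Confluent's AvroData helper for the Avro-to-Connect mapping:
package com.example.connect; // hypothetical package

import java.io.IOException;
import java.util.Map;

import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.errors.DataException;
import org.apache.kafka.connect.storage.Converter;

import io.confluent.connect.avro.AvroData;

public class RawAvroConverter implements Converter {

    private org.apache.avro.Schema avroSchema;
    private AvroData avroData;

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // "schema.literal" is an illustrative property: the writer schema as a JSON string,
        // passed as value.converter.schema.literal in the worker/connector config
        avroSchema = new org.apache.avro.Schema.Parser().parse((String) configs.get("schema.literal"));
        avroData = new AvroData(100); // Confluent's Avro <-> Connect translation helper
    }

    @Override
    public SchemaAndValue toConnectData(String topic, byte[] value) {
        if (value == null) {
            return SchemaAndValue.NULL;
        }
        try {
            // plain Avro binary decoding, no magic byte / schema id prefix
            DatumReader<GenericRecord> reader = new GenericDatumReader<>(avroSchema);
            GenericRecord record = reader.read(null, DecoderFactory.get().binaryDecoder(value, null));
            return avroData.toConnectData(avroSchema, record);
        } catch (IOException e) {
            throw new DataException("Failed to deserialize raw Avro record from topic " + topic, e);
        }
    }

    @Override
    public byte[] fromConnectData(String topic, Schema schema, Object value) {
        // only the read (sink) direction is needed for the S3 connector in this sketch
        throw new DataException("RawAvroConverter only supports reading");
    }
}
Once shaded into a JAR (together with its Avro and AvroData dependencies) and placed on the worker's CLASSPATH, you would point the connector at it with value.converter=com.example.connect.RawAvroConverter and value.converter.schema.literal=<your writer schema> - again, illustrative names, not an existing artifact.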
Or see if this works - https://github.com/farmdawgnation/registryless-avro-converter
"NOT using Confluent Schema Registry"
Then what registry are you using? Each one that I know of has configuration options to interoperate with the Confluent one.

Kafka mirrormaker do not start

I'm in the process of upgrading our cluster; however, I'm having issues getting the MirrorMakers to run.
These machines run both Kafka brokers and Kafka MirrorMakers, and each has its own init script.
The brokers are currently on version 0.10.1.1 and the MirrorMakers are on version 0.8.2-beta.
Both have their own config files and locations:
for example, the brokers are installed in /server/kafka/
and the MirrorMakers are installed under /opt/kafka_mirrormaker.
Here are the config lines for the brokers, following what the upgrade process explained:
inter.broker.protocol.version=0.10.1
log.message.format.version=0.8.2
and for mirrormakers:
inter.broker.protocol.version=0.8.2
log.message.format.version=0.8.2
So I was testing an upgrade to 0.10.2.1 and tried the upgrade on one host.
The broker runs fine after I applied the 0.10.2.1 upgrade; however, the MirrorMaker dies right away when I try to start it.
I see this exception in the logs:
Exception in thread "main" java.lang.NullPointerException
at kafka.tools.MirrorMaker$.main(MirrorMaker.scala:309)
at kafka.tools.MirrorMaker.main(MirrorMaker.scala)
Exception in thread "MirrorMakerShutdownHook" java.lang.NullPointerException
at kafka.tools.MirrorMaker$.cleanShutdown(MirrorMaker.scala:399)
at kafka.tools.MirrorMaker$$anon$2.run(MirrorMaker.scala:222)
and this one
[2017-05-18 17:02:27,936] ERROR Exception when starting mirror maker. (kafka.tools.MirrorMaker$)
org.apache.kafka.common.config.ConfigException: Missing required configuration "bootstrap.servers" which has no default value.
at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:436)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:56)
at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:63)
at org.apache.kafka.clients.producer.ProducerConfig.<init>(ProducerConfig.java:340)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:191)
at kafka.tools.MirrorMaker$MirrorMakerProducer.<init>(MirrorMaker.scala:694)
at kafka.tools.MirrorMaker$.main(MirrorMaker.scala:236)
at kafka.tools.MirrorMaker.main(MirrorMaker.scala)
This bootstrap error is kind of weird, since this is already configured: server.properties has localhost:9292 set as the bootstrap server.
To upgrade, I did the broker and the MirrorMaker at the same time. I'm not sure if I should upgrade all the brokers first and then the MirrorMakers.
Any suggestions? Should I follow that procedure - upgrade all the brokers first, then all the MirrorMakers, and once they are upgraded bump the protocol versions in server.properties? The documentation doesn't seem to make that explicit: http://kafka.apache.org/documentation.html#upgrade
This has been solved.
The reason they were not starting was that options in the configuration files had changed or were not properly configured.
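For anyone hitting the same Missing required configuration "bootstrap.servers" error: MirrorMaker builds its producer from the file passed via --producer.config, not from the broker's server.properties, so bootstrap.servers has to be present there. A minimal sketch of the two property files and the invocation (hosts, ports and the topic whitelist are placeholders):
# mirrormaker-producer.properties (points at the target cluster)
bootstrap.servers=localhost:9292
# mirrormaker-consumer.properties (points at the source cluster)
bootstrap.servers=source-broker:9092
group.id=mirrormaker
bin/kafka-mirror-maker.sh --consumer.config mirrormaker-consumer.properties --producer.config mirrormaker-producer.properties --whitelist '.*'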