Kafka Connect error: com.aerospike.connect.inbound.aerospike.exception.ConvertToAerospikeException: user key missing from record - apache-kafka

I am trying to ingest data from Kafka into Aerospike. What am I missing in the Kafka message being sent?
I am sending the data below into Kafka to be pushed into Aerospike:
ubuntu@ubuntu-VirtualBox:/opt/kafka_2.13-2.8.1$ bin/kafka-console-producer.sh --topic phone --bootstrap-server localhost:9092
>{"schema":{"type":"struct","optional":false,"version":1,"fields":[{"field":"name","type":"string","optional":true}]},"payload":{"name":"Anuj"}}
Kafka Connect gives the below error:
com.aerospike.connect.inbound.aerospike.exception.ConvertToAerospikeException: user key missing from record
[2021-12-13 21:33:34,747] ERROR failed to put record SinkRecord{kafkaOffset=13, timestampType=CreateTime} ConnectRecord{topic='phone', kafkaPartition=0, key=null, keySchema=null, value=Struct{name=Anuj}, valueSchema=Schema{STRUCT}, timestamp=1639411413702, headers=ConnectHeaders(headers=)} (com.aerospike.connect.kafka.inbound.AerospikeSinkTask:288)
com.aerospike.connect.inbound.aerospike.exception.ConvertToAerospikeException: user key missing from record
at com.aerospike.connect.inbound.converter.AerospikeRecordConverter.extractUserKey(AerospikeRecordConverter.kt:131)
at com.aerospike.connect.inbound.converter.AerospikeRecordConverter.extractKey(AerospikeRecordConverter.kt:68)
at com.aerospike.connect.inbound.converter.AerospikeRecordConverter.extractRecord(AerospikeRecordConverter.kt:41)
at com.aerospike.connect.kafka.inbound.KafkaInboundDefaultMessageTransformer.transform(KafkaInboundDefaultMessageTransformer.kt:69)
at com.aerospike.connect.kafka.inbound.KafkaInboundDefaultMessageTransformer.transform(KafkaInboundDefaultMessageTransformer.kt:25)
at com.aerospike.connect.kafka.inbound.AerospikeSinkTask.applyTransform(AerospikeSinkTask.kt:341)
at com.aerospike.connect.kafka.inbound.AerospikeSinkTask.toAerospikeOperation(AerospikeSinkTask.kt:315)
at com.aerospike.connect.kafka.inbound.AerospikeSinkTask.putRecord(AerospikeSinkTask.kt:239)
at com.aerospike.connect.kafka.inbound.AerospikeSinkTask.access$putRecord(AerospikeSinkTask.kt:47)
at com.aerospike.connect.kafka.inbound.AerospikeSinkTask$put$2$2.invokeSuspend(AerospikeSinkTask.kt:220)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2021-12-13 21:33:35,458] INFO 1 errors for topic phone (com.aerospike.connect.kafka.inbound.AerospikeSinkTask:552)
aerospike-kafka-inbound.yml file:
# /home/ubuntu/ib-zipaerospikesink/aerospike-kafka-inbound-2.2.0/lib/etc/aerospike-kafka-inbound/aerospike-kafka-inbound.yml
#
# Change the configuration for your use case.
#
# Refer to https://www.aerospike.com/docs/connect/streaming-to-asdb/from-kafka-to-asdb-overview.html
# for details.

# Map of Kafka topic name to its configuration.
topics:
  phone: # Kafka topic name.
    invalid-record: ignore # Do not kill the task on an invalid record.
    mapping: # Config to convert a Kafka record to an Aerospike record.
      namespace: # Aerospike record namespace config.
        mode: static
        value: test
      set: # Aerospike record set config.
        mode: static
        value: t1
      key-field: # Aerospike record key config.
        source: key # Use the Kafka record key as the Aerospike record key.
      bins: # Aerospike record bins config.
        type: multi-bins
        # all-value-fields: true # Convert all values in the Kafka record to Aerospike record bins.
        map:
          name:
            source: value-field
            field-name: firstName

# The Aerospike cluster connection properties.
aerospike:
  seeds:
    - 127.0.0.1:
        port: 3000

It looks like you are not specifying a key when you send your Kafka message. By default the console producer sends a null key, and your config says to use the Kafka record key as the Aerospike record key. In order to send a Kafka key you need to set parse.key to true and specify what your separator will be (in the Kafka producer).
See step 8 here:
https://kafka-tutorials.confluent.io/kafka-console-consumer-producer-basics/kafka.html
kafka-console-producer \
--topic orders \
--bootstrap-server broker:9092 \
--property parse.key=true \
--property key.separator=":"
The two properties tell the console producer to expect a key in your messages and which separator divides the key from the value.
In this example there are two records, one with the key foo and the other with the key fun.
foo:bar
fun:programming
This will result in those two records being written to Aerospike with the primary keys matching the Kafka keys foo and fun.
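Applied to the phone topic from the question, a minimal sketch could look like the following (the key 123 and the | separator are just illustrative choices; a : separator would be split at the first colon inside the JSON value):
bin/kafka-console-producer.sh --topic phone --bootstrap-server localhost:9092 \
  --property parse.key=true \
  --property "key.separator=|"
>123|{"schema":{"type":"struct","optional":false,"version":1,"fields":[{"field":"name","type":"string","optional":true}]},"payload":{"name":"Anuj"}}
With key-field source: key in the connector config, the record should then be written to namespace test, set t1 with user key 123.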

Related

How to send key, value messages with flume to a kafka producer

With the console producer you add the properties --property "parse.key=true" --property "key.separator=:" to produce key-value data into Kafka, but how do you do this with Flume? I tried to add
a1.sinks.k1.producer.parse.key=true
a1.sinks.k1.producer.key.separator=:
in the .conf file, but it was of no avail; Kafka treated the key like a string.
Those are console-producer CLI arguments, not Kafka ProducerConfig properties (which is what Flume passes through to its producer)
The key will always be a string, but you pass it via the headers of the Flume record
https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-ng-kafka-sink/src/main/java/org/apache/flume/sink/kafka/KafkaSink.java#L193
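If every event should get the same key, one way to populate that header from the Flume config is a static interceptor on the source. A rough sketch, assuming the sink reads the Kafka message key from a Flume event header named key (which is what the linked KafkaSink code looks up), and that a1/r1 match your agent and source names:
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = key
a1.sources.r1.interceptors.i1.value = myKey
For a per-event key you would need a custom interceptor (or the upstream producer) to set that same header dynamically.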

Configure Apache Kafka sink jdbc connector

I want to send the data sent to the topic to a PostgreSQL database. So I followed this guide and configured the properties file like this:
name=transaction-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=transactions
connection.url=jdbc:postgresql://localhost:5432/db
connection.user=db-user
connection.password=
auto.create=true
insert.mode=insert
table.name.format=transaction
pk.mode=none
I start the connector with
./bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties etc/kafka-connect-jdbc/sink-quickstart-postgresql.properties
The sink-connector is created but does not start due to this error:
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
The schema is in Avro format and registered, and I can send (produce) messages to the topic and read (consume) from it. But I can't seem to send the data to the database.
This is my ./etc/schema-registry/connect-avro-standalone.properties
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
This is a producer feeding the topic using the Java API:
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
properties.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
try (KafkaProducer<String, Transaction> producer = new KafkaProducer<>(properties)) {
    Transaction transaction = new Transaction();
    transaction.setFoo("foo");
    transaction.setBar("bar");
    UUID uuid = UUID.randomUUID();
    final ProducerRecord<String, Transaction> record = new ProducerRecord<>(TOPIC, uuid.toString(), transaction);
    producer.send(record);
}
I'm verifying data is properly serialized and deserialized using
./bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 \
--property schema.registry.url=http://localhost:8081 \
--topic transactions \
--from-beginning --max-messages 1
The database is up and running.
This is not correct:
"The unknown magic byte can be due to a id-field not part of the schema"
What that error means is that the message on the topic was not serialised using the Schema Registry Avro serialiser.
How are you putting data on the topic?
Maybe all the messages have the problem, maybe only some, but by default this will halt the Kafka Connect task.
You can set
"errors.tolerance":"all",
to get it to ignore messages that it can't deserialise. But if none of them are correctly Avro serialised this won't help; you need to serialise them correctly, or choose a different converter (e.g. if they're actually JSON, use the JsonConverter).
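In a standalone connector properties file that could look like the sketch below (these error-handling options are available in Kafka Connect 2.0 and later; the dead letter queue settings are optional, and the topic name dlq_transactions is just a placeholder):
errors.tolerance=all
errors.deadletterqueue.topic.name=dlq_transactions
errors.deadletterqueue.context.headers.enable=true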
These references should help you more:
https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained
https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues
http://rmoff.dev/ksldn19-kafka-connect
Edit:
If you are serialising the key with StringSerializer then you need to use this in your Connect config:
key.converter=org.apache.kafka.connect.storage.StringConverter
You can set it at the worker (global property, applies to all connectors that you run on it), or just for this connector (i.e. put it in the connector properties itself, it will override the worker settings)
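For the transaction-sink connector from the question, the relevant converter settings would look roughly like this (shown here as connector-level overrides; the Schema Registry URL is the one already used above):
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081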

Missing required argument "[zookeeper]"

I'm trying to start a consumer using Apache Kafka. It used to work well, but I had to format my PC and reinstall everything, and now when trying to run this:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
this is what I'm getting:
Missing required argument "[zookeeper]"
Option Description
------ -----------
--blacklist <blacklist> Blacklist of topics to exclude from
consumption.
--bootstrap-server <server to connect
to>
--consumer.config <config file> Consumer config properties file.
--csv-reporter-enabled If set, the CSV metrics reporter will
be enabled
--delete-consumer-offsets If specified, the consumer path in
zookeeper is deleted when starting up
--formatter <class> The name of a class to use for
formatting kafka messages for
display. (default: kafka.tools.
DefaultMessageFormatter)
--from-beginning If the consumer does not already have
an established offset to consume
from, start with the earliest
message present in the log rather
than the latest message.
--key-deserializer <deserializer for
key>
--max-messages <Integer: num_messages> The maximum number of messages to
consume before exiting. If not set,
consumption is continual.
--metrics-dir <metrics directory> If csv-reporter-enable is set, and
this parameter isset, the csv
metrics will be outputed here
--new-consumer Use the new consumer implementation.
--property <prop>
--skip-message-on-error If there is an error when processing a
message, skip it instead of halt.
--timeout-ms <Integer: timeout_ms> If specified, exit if no message is
available for consumption for the
specified interval.
--topic <topic> The topic id to consume on.
--value-deserializer <deserializer for
values>
--whitelist <whitelist> Whitelist of topics to include for
consumption.
--zookeeper <urls> REQUIRED: The connection string for
the zookeeper connection in the form
host:port. Multiple URLS can be
given to allow fail-over.
My guess is that there's some kind of problem with the ZooKeeper connection port, because it's telling me to specify the port ZooKeeper has to use to connect to Kafka. I'm not sure of this though, and I don't know how to figure out which port to specify if this is the problem. Any suggestions?
Thanks in advance for the help
It looks like you are using an old version of the Kafka tools that requires setting --new-consumer if you want to connect directly to the brokers.
I'd recommend picking a recent version of Kafka so you only need to specify --bootstrap-server as in your example: http://kafka.apache.org/downloads

How can you set the max.message.bytes of a state store changelog topic?

I have a Kafka Streams application with messages up to 10MiB. I want to persist these messages in a state store, but Kafka Streams fails to produce to the internal changelog topic:
2017-11-17 08:36:19,792 ERROR RecordCollectorImpl - task [4_5] Error sending record to topic appid-statestorename-state-store-changelog. No more offsets will be recorded for this task and the exception will eventually be thrown
org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
2017-11-17 08:36:20,583 ERROR StreamThread - stream-thread [StreamThread-1] Failed while executing StreamTask 4_5 due to flush state:
By adding some logging, it looks like the default max.message.bytes setting of an internal topic is 1MiB.
The default max.message.bytes for the cluster is set to 50MiB.
Is it possible to tweak the configuration of internal topics of Kafka Streams applications?
A work-around is to start the streams application, let it create the topics, and afterwards alter the topic config. But this feels like a dirty hack.
./kafka-topics.sh --zookeeper ... \
--alter --topic appid-statestorename-state-store-changelog \
--config max.message.bytes=10485760
Kafka 1.0 allows you to specify custom properties for internal topics via StreamsConfig.
You prefix those configs with "topic." and can use any of the configs defined in TopicConfig.
See the original KIP for more details:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-173%3A+Add+prefix+to+StreamsConfig+to+enable+setting+default+internal+topic+configs
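A minimal sketch in Java, assuming Kafka Streams 1.0+ (Properties is java.util.Properties, TopicConfig is org.apache.kafka.common.config.TopicConfig; 10485760 is the 10 MiB from the workaround above):
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "appid");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// StreamsConfig.topicPrefix() prepends "topic.", so this becomes "topic.max.message.bytes"
// and is applied to the internal topics the application creates.
props.put(StreamsConfig.topicPrefix(TopicConfig.MAX_MESSAGE_BYTES_CONFIG), "10485760");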

filebeat-kafka:WARN producer/broker/0 maximum request accumulated, waiting for space

When Filebeat outputs data to Kafka, there are many warning messages in the Filebeat log.
..
*WARN producer/broker/0 maximum request accumulated, waiting for space
*WARN producer/broker/0 maximum request accumulated, waiting for space
..
Nothing special in my Filebeat config:
..
output.kafka:
  hosts: ["localhost:9092"]
  topic: "log-oneday"
..
I have also updated these socket settings in Kafka:
...
socket.send.buffer.bytes=10240000
socket.receive.buffer.bytes=10240000
socket.request.max.bytes=1048576000
queued.max.requests=1000
...
But it did not work.
Is there something I am missing, or do I have to increase those numbers further?
Besides, there is no error or exception in the Kafka server log.
Does any expert have an idea about this?
Thanks
Apparently you have only one partition in your topic. Try increasing the number of partitions for the topic. See the links below for more information.
More Partitions Lead to Higher Throughput
https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
https://kafka.apache.org/documentation/#basic_ops_modify_topic
Try the following command (replacing the values with those for your particular use case):
bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --partitions 40
You need to configure 3 things:
Brokers
Filebeat Kafka output
Consumer
Here is an example (change paths according to your environment).
Broker configuration:
# open kafka server configuration file
vim /opt/kafka/config/server.properties
# add this line
# The largest record batch size allowed by Kafka.
message.max.bytes=100000000
# restart kafka service
systemctl restart kafka.service
Filebeat Kafka output:
output.kafka:
  ...
  max_message_bytes: 100000000
Consumer configuration:
# larger than the broker's maximum message size (message.max.bytes)
max.partition.fetch.bytes=200000000
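If the consumer is a Java client, that setting goes into the consumer's configuration, roughly like this sketch (the group id, bootstrap servers and String deserializers are placeholders for whatever your consumer actually uses):
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "log-oneday-consumer");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
// must be at least as large as the biggest message the broker will return per partition
props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 200000000);
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);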