Confluent Cloud -> BigQuery - How to diagnose "Bad records" cause - apache-kafka

I can push data from MSSQL Server to topics on Confluent Cloud, but not from the topics to BigQuery; it throws the error "Bad records in the last hour - 65".
I was able to connect the topics to BigQuery, but not to ingest the data.
The MSSQL and BigQuery table formats are the same:
first(string) last(string)
raj ram
Do I need to add any other columns to ingest the data, such as timestamp, offset, etc.?

If there are messages that can't be sent to the target, they'll get written to a Dead Letter Queue with details of the problem.
From the Connectors screen you can see the ID of your connector.
Use that ID to locate a topic with the same name and a dlq- prefix.
You can then browse the topic and use the header information to determine the cause of the problem.
If you prefer, you can use kafkacat to view the headers:
$ docker run --rm edenhill/kafkacat:1.5.0 \
-X security.protocol=SASL_SSL -X sasl.mechanisms=PLAIN \
-X ssl.ca.location=./etc/ssl/cert.pem -X api.version.request=true \
-b ${CCLOUD_BROKER_HOST} \
-X sasl.username="${CCLOUD_API_KEY}" \
-X sasl.password="${CCLOUD_API_SECRET}" \
-t dlq-lcc-emj3x \
-C -c1 -o beginning \
-f 'Topic %t[%p], offset: %o, Headers: %h'
Topic dlq-lcc-emj3x[0], offset: 12006, Headers: __connect.errors.topic=mysql-01-asgard.demo.transactions,__connect.errors.partition=5,__connect.errors.offset=90,__connect.errors.connector.name=lcc-emj3x,__connect.errors.task.id=0,__connect.errors.stage=VALUE_CONVERTER,__connect.errors.class.name=org.apache.kafka.connect.json.JsonConverter,__connect.errors.exception.class.name=org.apache.kafka.connect.errors.DataException,__connect.errors.exception.message=Converting byte[] to Kafka Connect data failed due to serialization error: ,__connect.errors.exception.stacktrace=org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:344)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:487)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:487)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens
at [Source: (byte[])"
From there on in, it's just a case of understanding the error. A lot of the time it's down to serialisation issues, which you can learn more about here.
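In this example the failing stage is VALUE_CONVERTER and JsonConverter is choking on a control character, which usually means the bytes on the source topic aren't actually JSON (Avro written with Schema Registry, for instance, starts with a 0x00 magic byte). As a rough check, and purely as a sketch, you could hex-dump the offending source record using the topic/partition/offset from the __connect.errors.* headers (add the same SASL/SSL flags as above when the broker is Confluent Cloud; the broker address is a placeholder):
# Sketch: dump the raw bytes of the record named in the DLQ headers
# (topic mysql-01-asgard.demo.transactions, partition 5, offset 90).
# A leading 00 byte followed by a 4-byte schema ID suggests Avro, not JSON.
kafkacat -C -b ${CCLOUD_BROKER_HOST} \
-t mysql-01-asgard.demo.transactions -p 5 -o 90 -c1 | xxd | head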

Related

Not able to get the data insert into snowflake database, through snowflake kafka connector, however connector is getting started

Kafka,Zookeeper user@ip-XX-XX-XX-XXX config]$ curl -X POST -H "Content-Type: application/json" --data @snowflake-connector.json http://XX.XX.XX.XXX:8083/connectors
{"name":"file-stream-distributed","config":{"connector.class":"com.snowflake.kafka.connector.SnowflakeSinkConnector","tasks.max":"1","topics":"uat.product.topic","buffer.count.records":"10000","buffer.flush.time":"60","buffer.size.bytes":"5000000","snowflake.url.name":"XXX00000.XX-XXXX-0.snowflakecomputing.com:443","snowflake.user.name":"kafka_connector_user_1","snowflake.private.key":"XXXXX,"snowflake.database.name":"KAFKA_DB","snowflake.schema.name":"KAFKA_SCHEMA","key.converter":"org.apache.kafka.connect.storage.StringConverter","key.converter.schemas.enable":"true","value.converter":"com.snowflake.kafka.connector.records.SnowflakeJsonConverter","value.converter:schemas.enable":"true","value.converter.schema.registry.url":"http://localhost:8081","name":"file-stream-distributed"},"tasks":[],"type":"sink"}[Panamax-UAT-Kafka,Zookeeper deploy#ip-XX-XX-XX-XXX config]$
Kafka,Zookeeper user@ip-XX-XX-XX-XXX config]$
Checking the status of the connector, but it returns "not found":
Kafka,Zookeeper user@ip-XX-XX-XX-XXX config]$ curl XX.XX.XX.XXX:8083/connectors/file-stream-demo-distributed/tasks
{"error_code":404,"message":"Connector file-stream-demo-distributed not found"}
Kafka,Zookeeper user@ip-XX-XX-XX-XXX config]$
And the data is not getting inserted into the database.
Logs (/opt/Kafka/kafka/logs/connect.logs):
[2022-05-29 14:51:22,521] INFO [Worker clientId=connect-1, groupId=connect-cluster] Joined group at generation 39 with protocol version 2 and got assignment: Assignment{error=0, leader='connect-1-fee23ff6-e61d-4e4c-982d-da2af6272a08', leaderUrl='http://XX.XX.XX.XXX:8083/', offset=4353, connectorIds=[], taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1681)
You need to specify a user that has the appropriate CREATE TABLE privileges for use with the Kafka connector. I suggest you review the documentation.
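For example, a minimal sketch of the kind of grants involved (the role name is hypothetical, the database and schema names are taken from your config, and the exact privilege list is per Snowflake's Kafka connector documentation, so double-check it there):
# Hypothetical sketch: grant the role used by kafka_connector_user_1 the
# privileges the Snowflake sink needs on KAFKA_DB.KAFKA_SCHEMA.
snowsql -q "
GRANT USAGE ON DATABASE KAFKA_DB TO ROLE kafka_connector_role;
GRANT USAGE ON SCHEMA KAFKA_DB.KAFKA_SCHEMA TO ROLE kafka_connector_role;
GRANT CREATE TABLE ON SCHEMA KAFKA_DB.KAFKA_SCHEMA TO ROLE kafka_connector_role;
GRANT CREATE STAGE ON SCHEMA KAFKA_DB.KAFKA_SCHEMA TO ROLE kafka_connector_role;
GRANT CREATE PIPE ON SCHEMA KAFKA_DB.KAFKA_SCHEMA TO ROLE kafka_connector_role;"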

Use kafka-avro-console-consumer with already registered schema - error 500

Using the kafka-avro-console-producer CLI, when trying the following command:
kafka-avro-console-producer \
--broker-list <broker-list> \
--topic <topic> \
--property schema.registry.url=http://localhost:8081 \
--property value.schema.id=419
I get this error:
org.apache.kafka.common.errors.SerializationException: Error registering Avro schema {...}
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Internal Server Error; error code: 500
I can't understand why it is trying to register the schema, as the schema already exists and I'm trying to use it through its ID within the registry.
Note: my schema registry is in READ_ONLY mode, but as I said, that should not be an issue, right?
Basically I needed to tell the producer not to try to auto-register the schema, using this property:
--property auto.register.schemas=false
Found out here: Use kafka-avro-console-producer without autoregister the schema
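Putting that together with the command from the question, the full invocation would look like this (broker list, topic, and schema ID as before):
kafka-avro-console-producer \
--broker-list <broker-list> \
--topic <topic> \
--property schema.registry.url=http://localhost:8081 \
--property auto.register.schemas=false \
--property value.schema.id=419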

Confluent Kafka Rest Proxy - Avro Deserialization

I am trying to use the Confluent Kafka REST Proxy to retrieve data in Avro format from one of my topics, but unfortunately I get a deserialization error. I am querying the Kafka REST Proxy using the following command:
curl -X GET -H "Accept: application/vnd.kafka.avro.v2+json" \
http://localhost:8082/consumers/my-group/instances/my-consumer/records?timeout=30000
And I get this response:
{
"error_code": 50002,
"message": "Kafka error: Error deserializing key/value for partition input-0 at offset 0. If needed, please seek past the record to continue consumption."
}
and the logs on the Kafka REST Proxy server show:
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition input-0 at offset 0. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
The data has been produced using KafkaAvroSerializer and the schema is present in the Schema Registry. Also note that the data is readable using the Avro console consumer on the CLI.
Does anybody know how to resolve this issue?
It's most likely that as well as valid Avro messages on the topic, you also have invalid ones. That's what this error means, and is exactly the error that I got when I tried to consume a non-Avro message locally with the REST Proxy:
ERROR Unexpected exception in consumer read task id=io.confluent.kafkarest.v2.KafkaConsumerReadTask#2e20d4f3 (io.confluent.kafkarest.v2.KafkaConsumerReadTask)
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition avrotest-0 at offset 2. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
I would use a tool such as kafkacat to inspect the actual messages at the offset given in the error, e.g.:
kafkacat -C -b localhost:9092 -t test_topic_avro -o 0 -c 1
The -o 0 will consume the message at offset 0, and -c 1 means consume just one message.
You can also seek past the problematic offset, e.g. for topic avrotest move the offset to 1:
echo '{ "offsets": [ { "topic": "avrotest", "partition": 0, "offset": 1 } ] }' | \
http POST localhost:8082/consumers/rmoff_consumer_group/instances/rmoff_consumer_instance/positions \
Content-Type:application/vnd.kafka.v2+json
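If you don't have httpie installed, the equivalent with curl (same payload and endpoint) would be roughly:
curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
--data '{ "offsets": [ { "topic": "avrotest", "partition": 0, "offset": 1 } ] }' \
http://localhost:8082/consumers/rmoff_consumer_group/instances/rmoff_consumer_instance/positions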
Having String keys and Avro values wasn't supported in the REST Proxy until recently:
https://github.com/confluentinc/kafka-rest/issues/210
So recently that the code has been merged, but the issue is still open and the docs haven't been fully updated:
https://github.com/confluentinc/kafka-rest/pull/797

Shows invalid characters while consuming using kafka console consumer

While consuming from the Kafka topic using the Kafka console consumer or kt (a Go CLI tool for Kafka), I am getting invalid characters.
...
\u0000\ufffd?\u0006app\u0000\u0000\u0000\u0000\u0000\u0000\u003e#\u0001
\u0000\u000cSec-39\u001aSome Actual Value Text\ufffd\ufffd\ufffd\ufffd\ufffd
\ufffd\u0015#\ufffd\ufffd\ufffd\ufffd\ufffd\ufff
...
Even though Kafka Connect can actually sink the proper data to an SQL database.
Given that you say
Kafka Connect can actually sink the proper data to an SQL database.
my assumption would be that you're using Avro serialization for the data on the topic. Kafka Connect configured correctly will take the Avro data and deserialise it.
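For what it's worth, the relevant converter settings on a self-managed Connect worker or connector would look something like this sketch (the Schema Registry URL is a placeholder):
# Sketch of connector/worker config assuming Avro values registered with Schema Registry
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081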
However, console tools such as kafka-console-consumer, kt, kafkacat et al do not support Avro, and so you get a bunch of weird characters if you use them to read data from a topic that is Avro-encoded.
To read Avro data to the command line you can use kafka-avro-console-consumer:
kafka-avro-console-consumer \
--bootstrap-server kafka:29092 \
--topic test_topic_avro \
--property schema.registry.url=http://schema-registry:8081
Edit: Adding a suggestion from @CodeGeas too:
Alternatively, reading data using REST Proxy can be done with the following:
# Create a consumer for Avro data
curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
-H "Accept: application/vnd.kafka.v2+json" \
--data '{"name": "my_consumer_instance", "format": "avro", "auto.offset.reset": "earliest"}' \
http://kafka-rest-instance:8082/consumers/my_json_consumer
# Subscribe the consumer to a topic
curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
--data '{"topics":["YOUR-TOPIC-NAME"]}' \
http://kafka-rest-instance:8082/consumers/my_json_consumer/instances/my_consumer_instance/subscription
# Then consume some data from a topic using the base URL in the first response.
curl -X GET -H "Accept: application/vnd.kafka.avro.v2+json" \
http://kafka-rest-instance:8082/consumers/my_json_consumer/instances/my_consumer_instance/records
Later, to delete the consumer:
curl -X DELETE -H "Accept: application/vnd.kafka.avro.v2+json" \
http://kafka-rest-instance:8082/consumers/my_json_consumer/instances/my_consumer_instance
By default, the console consumer tool deserializes both the message key and value using ByteArrayDeserializer, but then tries to print the data to the command line using the default formatter.
This tool, however, allows you to customize the deserializers and the formatter used. See the following extract from the help output:
--formatter <String: class>          The name of a class to use for
                                       formatting kafka messages for
                                       display. (default: kafka.tools.
                                       DefaultMessageFormatter)
--property <String: prop>            The properties to initialize the
                                       message formatter. Default
                                       properties include:
                                         print.timestamp=true|false
                                         print.key=true|false
                                         print.value=true|false
                                         key.separator=<key.separator>
                                         line.separator=<line.separator>
                                         key.deserializer=<key.deserializer>
                                         value.deserializer=<value.
                                           deserializer>
                                       Users can also pass in customized
                                       properties for their formatter; more
                                       specifically, users can pass in
                                       properties keyed with 'key.
                                       deserializer.' and 'value.
                                       deserializer.' prefixes to configure
                                       their deserializers.
--key-deserializer <String:
    deserializer for key>
--value-deserializer <String:
    deserializer for values>
Using these settings, you should be able to change the output to be what you want.
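For example, a minimal sketch assuming string keys and long-encoded values (broker and topic names are placeholders):
kafka-console-consumer \
--bootstrap-server localhost:9092 \
--topic test_topic \
--from-beginning \
--property print.key=true \
--property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
--property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer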

Exception Occurred Subject not found error code - Confluent

I can see an error in my logs that Subject with name A.Abc-key is not present.
I listed all the subjects and verified that A.Abc-key is not present, but A.Abc-value is present.
On printing the key for the same topic (with print.key=true) I get the following error:
./kafka-avro-console-consumer --bootstrap-server http://localhost:9092 --from-beginning --property print.key=true --topic A.Abc
null Processed a total of 1 messages
[2018-09-05 16:26:45,470] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$:76)
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 80
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
I am not sure how to debug and fix this.
Your error is HTTP-related, so make sure your registry is running on localhost, since you have not specified it.
You say you "verified that the A.Abc-key is not present".
Then your key is not Avro, but the Avro console consumer will try to deserialize your keys as Avro if you add the print.key property.
You can try adding a key deserializer, and if your registry is not on localhost, you must specify it:
--property schema.registry.url="http://..." \
--property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
--property print.key=true
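Putting that together with the command from the question, a sketch (the registry URL is a placeholder, and whether kafka-avro-console-consumer honours a separate key deserializer can depend on your Confluent Platform version):
./kafka-avro-console-consumer --bootstrap-server localhost:9092 --from-beginning \
--topic A.Abc \
--property print.key=true \
--property schema.registry.url="http://your-registry:8081" \
--property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer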