Debezium before field is null for update operation - PostgreSQL

My Debezium instance is run via Debezium Server, and my config file is below:
debezium.sink.type=kafka
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.offset.storage.file.filename=org.apache.kafka.connect.storage.FileOffsetBackingStore
debezium.source.offset.flush.interval.ms=0
debezium.source.database.hostname=localhost
debezium.source.database.port=5432
debezium.source.plugin.name=pgoutput
debezium.source.database.user=casdoor
debezium.source.database.password=casdoor
debezium.source.database.dbname=casdoor
debezium.source.database.server.name=casdoor
debezium.source.database.schema.name=public
debezium.source.database.table.name=user
debezium.source.key.converter=org.apache.kafka.connect.json.JsonConverter
debezium.source.value.converter=org.apache.kafka.connect.json.JsonConverter
debezium.sink.kafka.producer.key.serializer=org.apache.kafka.common.serialization.StringSerializer
debezium.sink.kafka.producer.value.serializer=org.apache.kafka.common.serialization.StringSerializer
quarkus.log.console.json=false
But the update events in Kafka have no before part.
Kafka data
As shown below, the before is always null even though the operation is an update:
{
  "schema": Object{...},
  "payload": {
    "before": null,
    "after": Object{...},
    "source": Object{...},
    "op": "u",
    "ts_ms": 1652168353005,
    "transaction": null
  }
}
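A common cause of a null before field with the pgoutput plugin is the table's replica identity: unless it is FULL, PostgreSQL does not write the old row to the WAL for updates. Below is a minimal sketch, assuming that is the cause here, of checking and changing the replica identity of the user table with Python and psycopg2 (connection details taken from the config above):

# sketch: inspect and change the replica identity of public."user"
import psycopg2

conn = psycopg2.connect(host="localhost", port=5432, dbname="casdoor",
                        user="casdoor", password="casdoor")
conn.autocommit = True
cur = conn.cursor()

# 'd' = default (only the primary key is logged), 'f' = full (the whole old row is logged)
cur.execute("SELECT relreplident FROM pg_class WHERE oid = 'public.\"user\"'::regclass")
print(cur.fetchone())

# log the full old row so update events can carry a populated "before"
cur.execute('ALTER TABLE public."user" REPLICA IDENTITY FULL')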

Related

Unable to send message to kafka Producer using kafka-node

I am using the default server.properties/zookeeper.properties files provided by the Kafka framework.
I am trying to create a simple Node.js app which would send messages to a producer and consume them.
Below is the Node.js code.
config.js
module.exports = {
  kafka_topic: 'catalog',
  kafka_server: 'localhost:9092',
};
nodejs-producer.js
const kafka = require('kafka-node');
const config = require('./config');

try {
  // set the desired timeout in options
  const options = {
    timeout: 5000,
  };
  const Producer = kafka.Producer;
  const client = new kafka.KafkaClient({kafkaHost: config.kafka_server, requestTimeout: 5000});
  const producer = new Producer(client);
  const kafka_topic = config.kafka_topic;
  let payloads = [
    {
      topic: kafka_topic,
      messages: 'This is test message'
    }
  ];

  producer.on('ready', async function() {
    let push_status = producer.send(payloads, (err, data) => {
      if (err) {
        console.log(err.toString());
        console.log('[kafka-producer -> ' + kafka_topic + ']: broker update failed');
      } else {
        console.log(data.toString());
        console.log('[kafka-producer -> ' + kafka_topic + ']: broker update success');
      }
    });
  });

  producer.on('error', function(err) {
    console.log(err);
    console.log('[kafka-producer -> ' + kafka_topic + ']: connection errored');
    throw err;
  });
}
catch (e) {
  console.log(e);
}
kafka version = 2.8.0
kafka-node version = 5.0.0
I am getting the error - Error: LeaderNotAvailable
How can I fix this? I tried playing with different values in the server.properties file, like advertised.listeners, but didn't find a solution.
I have already answered this problem here
In short: this problem happens when trying to produce messages to a topic that doesn't exist.
You may configure your Kafka installation to automatically create the topic in such a case. What will then happen is, in order: you will still receive the error message, and the framework will create the topic. In my case I then had to re-produce the same message a second time, but this was on an old version of Kafka.
EDIT:
Here is a link to a post which explains how to set up your Kafka configuration to automatically create Kafka topics.
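If you would rather create the topic explicitly instead of relying on auto-creation, here is a minimal sketch with kafka-python (an assumption on my part, since the question uses kafka-node; the topic name 'catalog' and broker address come from the config above):

# sketch: pre-create the 'catalog' topic so the first produce does not hit LeaderNotAvailable
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([NewTopic(name="catalog", num_partitions=1, replication_factor=1)])
admin.close()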
I have also faced the same issue while sending a message. I solved it by adding a partition in the payload, and the same partition is used in the consumer as well.
Code I have used:
Since I got this error in the development environment, I solved the problem by deleting the ZooKeeper snapshot and the Kafka consumer offsets.
NOTE: Don't do this in production.
rm -rf /tmp/zookeeper
rm -rf /tmp/kafka-logs

How can I acquire the JSON data from Kafka using Spark Streaming

I use Kafka to monitor changes to a local file and Spark Streaming to analyse them, but I can't extract the data from Kafka because the format of the data is JSON.
When I run the command bin/kafka-console-consumer.sh --bootstrap-server master:9092,slave1:9092,slave2:9092 --topic kafka-streaming --from-beginning,
the format of the data is:
{
  "schema": {
    "type": "string",
    "optional": false
  },
  "payload": "{\"like_count\": 594, \"view_count\": 49613, \"user_name\": \" w\", \"play_url\": \"http://upic/2019/04/08/12/BMjAxOTA0MDgxMjQ4MTlfMjA3ODc2NTY2XzEyMDQzOTQ0MTc4XzJfMw==_b_Bfa330c5ca9009708aaff0167516a412d.mp4?tag=1-1555248600-h-0-gjcfcmzmef-954d5652f100c12e\", \"description\": \"ţ ų ඣ 9 9 9 9\", \"cover\": \"http://uhead/AB/2016/03/09/18/BMjAxNjAzMDkxODI1MzNfMjA3ODc2NTY2XzJfaGQ5OQ==.jpg\", \"video_id\": 5235997527237673952, \"comment_count\": 39, \"download_url\": \"http://2019/04/08/12/BMjAxOTA0MDgxMjQ4MTlfMjA3ODc2NTY2XzEyMDQzOTQ0MTc4XzJfMw==_b_Bfa330c5ca9009708aaff0167516a412d.mp4?tag=1-1555248600-h-1-zdpjkouqke-5862405191e4c1e4\", \"user_id\": 207876566, \"video_create_time\": \"2019-04-08 12:48:21\", \"user_sex\": \"F\"}"
}
The version of Spark is 2.3.0 and the Kafka version is 1.1.0. The version of spark-streaming-kafka is 0-10_2.11-2.3.0.
The JSON data in the payload field is what I want to deal with and analyse. How can I change my code to acquire the JSON data?
Use org.apache.kafka.common.serialization.StringDeserializer and org.apache.kafka.common.serialization.StringSerializer for consuming from and sending data to the Kafka topic, respectively.
This way you will get a String on consumption, which can very easily be converted to a JSON object using JSONParser.
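For illustration, the same two-step parse in plain Python (a sketch; the envelope string below is abbreviated from the console output above, and the same idea applies in Scala with any JSON parser):

# sketch: unwrap the Connect envelope, then parse the escaped JSON inside "payload"
import json

record_value = '{"schema": {"type": "string", "optional": false}, "payload": "{\\"like_count\\": 594, \\"user_sex\\": \\"F\\"}"}'

envelope = json.loads(record_value)      # outer envelope: schema + payload
event = json.loads(envelope["payload"])  # payload is itself a JSON string
print(event["like_count"])               # -> 594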

MQTT Kafka Source connector: funny byte characters

I am following https://github.com/kaiwaehner/kafka-connect-iot-mqtt-connector-example for connecting Mosquitto and Kafka with the MQTT source connector. I am getting the data sent by the Mosquitto publisher into both the Mosquitto subscriber and the Kafka consumer, but the key and value fields in my ConsumerRecord object from kafka-consumer have some bytes prepended to them.
Below are the code snippets and the outputs I'm getting.
mqttPublisher.py
while v3 < 3:
    data3 = {
        "time": str(datetime.datetime.now().time()),
        "val": v3
    }
    client.publish("sensor/dist", json.dumps(data3), qos=2)
    v3 += 1
    time.sleep(2)
mqttSubscriber.py
def on_message_print(client, userdata, message):
    print(message.topic, message.payload)

subscribe.callback(on_message_print, "sensor/#", hostname="localhost")
kafkaConsumer.py
consumer = KafkaConsumer('mqtt.',
                         bootstrap_servers=['localhost:9092'])

for message in consumer:
    print(message)
Output: mqttSubscriber.py
sensor/dist b'{"time": "12:44:30.817462", "val": 0}'
sensor/dist b'{"time": "12:44:32.820040", "val": 1}'
sensor/dist b'{"time": "12:44:34.822657", "val": 2}'
Output: kafkaConsumer.py
ConsumerRecord(topic='mqtt.', partition=0, offset=225, timestamp=1545117270870, timestamp_type=0, key=b'\x00\x00\x00\x00\x01\x16sensor/dist', value=b'\x00\x00\x00\x00\x02J{"time": "12:44:30.817462", "val": 0}', headers=[('mqtt.message.id', b'0'), ('mqtt.qos', b'0'), ('mqtt.retained', b'false'), ('mqtt.duplicate', b'false')], checksum=None, serialized_key_size=17, serialized_value_size=43, serialized_header_size=62)
ConsumerRecord(topic='mqtt.', partition=0, offset=226, timestamp=1545117272821, timestamp_type=0, key=b'\x00\x00\x00\x00\x01\x16sensor/dist', value=b'\x00\x00\x00\x00\x02J{"time": "12:44:32.820040", "val": 1}', headers=[('mqtt.message.id', b'0'), ('mqtt.qos', b'0'), ('mqtt.retained', b'false'), ('mqtt.duplicate', b'false')], checksum=None, serialized_key_size=17, serialized_value_size=43, serialized_header_size=62)
ConsumerRecord(topic='mqtt.', partition=0, offset=227, timestamp=1545117274824, timestamp_type=0, key=b'\x00\x00\x00\x00\x01\x16sensor/dist', value=b'\x00\x00\x00\x00\x02J{"time": "12:44:34.822657", "val": 2}', headers=[('mqtt.message.id', b'0'), ('mqtt.qos', b'0'), ('mqtt.retained', b'false'), ('mqtt.duplicate', b'false')], checksum=None, serialized_key_size=17, serialized_value_size=43, serialized_header_size=62)
What is causing the above prepending of extra bytes in the Kafka Consumer?
Thanks in advance.
As part of the demo, you're starting a Schema Registry
Start Kafka Connect and dependencies (Kafka, Zookeeper, Schema Registry):
confluent start connect
If you look at the first 5 bytes, you'll see they start with 0, then four more bytes representing an integer.
See the Schema Registry Wire Format and try doing a curl localhost:8081/subjects to see if it lists your topic name for mqtt-key and mqtt-value.
If you didn't want Avro, you would need to configure and edit your Kafka Connect property file to use different Converters, and not use confluent start other than for getting Kafka and Zookeeper running.
Or if you want Python to deserialize the Avro, you can refer to the confluent-kafka-python repo on GitHub.
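To make the prefix concrete, here is a minimal sketch in Python that splits one of the consumed values above according to that wire format (plain byte slicing only, no Schema Registry client involved):

# sketch: split a Confluent-framed message into magic byte, schema id and the Avro body
import struct

raw = b'\x00\x00\x00\x00\x02J{"time": "12:44:30.817462", "val": 0}'

magic_byte = raw[0]                           # always 0 in the Confluent wire format
schema_id = struct.unpack(">I", raw[1:5])[0]  # 4-byte big-endian Schema Registry id (here: 2)
avro_body = raw[5:]                           # Avro-encoded body (starts with a length byte, hence the 'J')

print(magic_byte, schema_id, avro_body)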

Can’t get Logstash to read existing Kafka topic from start

I'm trying to consume a Kafka topic using Logstash, for indexing by Elasticsearch. The Kafka events are JSON documents.
We recently upgraded our Elastic Stack to 5.1.2.
I believe that I was able to consume the topic OK in 5.0, using the same settings, but that was a while ago so perhaps I'm doing something wrong now, but can't see it. This is my config (slightly sanitized):
input {
  kafka {
    bootstrap_servers => "host1:9092,host2:9092,host3:9092"
    client_id => "logstash-elastic-5-c5"
    group_id => "logstash-elastic-5-g5"
    topics => "trp_v1"
    auto_offset_reset => "earliest"
  }
}

filter {
  json {
    source => "message"
  }
  mutate {
    rename => { "@timestamp" => "indexedDatetime" }
    remove_field => [
      "@timestamp",
      "@version",
      "message"
    ]
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["host10:9200", "host11:9200", "host12:9200", "host13:9200"]
    action => "index"
    index => "trp-i"
    document_type => "event"
  }
}
When I run this, no messages are consumed, no sign of activity appears in the log after "[org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] Setting newly assigned partitions", and in Kafka Manager the consumer appears to immediately appear with "total lag = 0" for the topic.
This version of the Kafka plugin stores consumer offsets in Kafka itself, so each time I try to run Logstash against the same topic I increment the group_id; in theory, it should then start from the earliest offset for the topic.
Any advice?
EDIT: It appears that despite setting auto_offset_reset to "earliest", it isn't working - it's as if it's being set to "latest". I left Logstash running, then had more entries loaded into the Kafka queue, and they were processed by Logstash.
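One way to rule Logstash out is to read the same topic from the beginning with a throwaway consumer group. A minimal sketch with kafka-python, assuming the broker hosts and topic from the config above (the group name is made up for the test):

# sketch: confirm the topic can actually be read from the start with a brand-new group
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "trp_v1",
    bootstrap_servers=["host1:9092", "host2:9092", "host3:9092"],
    group_id="sanity-check-group-1",   # a group with no committed offsets yet
    auto_offset_reset="earliest",      # only applies when the group has no committed offsets
    consumer_timeout_ms=10000,         # stop iterating after 10 s of silence
)

for message in consumer:
    print(message.offset, message.value[:120])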

Reading from multiple topics in Apache Kafka

I'm trying to read from multiple Kafka topics (say 'newtest-1' and 'newtest-2') using the 'white_list' configuration in the Logstash input plugin. My Logstash conf looks like:
input { kafka { white_list => "newtest-1|newtest-2" } } output { stdout {codec => rubydebug } }
With this configuration I can successfully read from the two topics. But I want to use a regex for the input topics, as I expect the topics to be of the form 'newtest-*'. According to the suggestion in this link, the following configuration should work:
input { kafka { white_list => "newtest-*" } } output { stdout {codec => rubydebug } }
But with this I'm not able to read from Kafka. Any help is appreciated.
The white_list should be newtest-.*
This is relevant to older versions of the plugin. Now you can use topics.
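The reason is that white_list is treated as a regular expression rather than a shell-style glob, so in newtest-* the * quantifies the dash. A small Python illustration of the difference, assuming a full match against the topic name (the topic list is just example data):

# sketch: why "newtest-*" (glob-style) and "newtest-.*" (regex) select different topics
import re

topics = ["newtest-1", "newtest-2", "newtest"]

for pattern in ["newtest-*", "newtest-.*"]:
    matched = [t for t in topics if re.fullmatch(pattern, t)]
    print(pattern, "->", matched)

# "newtest-*"  -> ['newtest']               (the * repeats the dash, it is not a wildcard)
# "newtest-.*" -> ['newtest-1', 'newtest-2']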