How to configure the Debezium fields sent on update events (MongoDB connector)

I want to use the Debezium MongoDB connector to:
-> get events from MongoDB
-> push them into my Kafka cluster
-> read them from Kafka
My issue is that when Debezium gets update events from MongoDB, it only sends the updated fields:
The value of an update change event on this collection will actually
have the exact same schema, and its payload will be structured the
same but will hold different values. Specifically, an update event
will not have an after value and will instead have a patch string
containing the JSON representation of the idempotent update operation.
I was wondering whether I can configure this somehow, because there are some fields I would like to receive with the update events.

You should be able to solve the problem using Kafka Streams, in a similar (simplified) way to this example: https://debezium.io/blog/2018/03/08/creating-ddd-aggregates-with-debezium-and-kafka-streams/
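For illustration, here is a minimal Kafka Streams sketch of that idea. It assumes the update events have already been unwrapped so that the message key is the document id and the value is the patch JSON; the topic names are placeholders:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;

ObjectMapper mapper = new ObjectMapper();
StreamsBuilder builder = new StreamsBuilder();

// Keep the latest known state of every document in a state store and merge
// each incoming patch into it, so downstream consumers see full documents.
builder.stream("mongo.update.events", Consumed.with(Serdes.String(), Serdes.String()))
    .groupByKey()
    .aggregate(
        () -> "{}",                                   // start from an empty document
        (docId, patchJson, currentJson) -> {
            try {
                ObjectNode current = (ObjectNode) mapper.readTree(currentJson);
                JsonNode patch = mapper.readTree(patchJson);
                // naive merge: copy every top-level patch field onto the current state
                patch.fields().forEachRemaining(f -> current.set(f.getKey(), f.getValue()));
                return mapper.writeValueAsString(current);
            } catch (Exception e) {
                return currentJson;                   // keep the previous state if the patch can't be parsed
            }
        },
        Materialized.with(Serdes.String(), Serdes.String()))
    .toStream()
    .to("mongo.full.documents");
```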

Related

Debezium MongoDB connector delete event doesn't give the before event in the payload

Adding a question to this thread:
I am using the Debezium connector and everything is going smoothly except for a few things. The delete operation doesn't give any after, patch, or before event; it just returns the id in the key field in Confluent Kafka. Is there any property I can add to my connector configuration? Also, for the update event I am expecting before and after events, whereas the connector gives me only the patch event with the updated values from MongoDB, and here too I have to check the key field of the Confluent Kafka topic to get the id of the changed record. Any help? The MongoDB version I am using is 4.4.

Mongo source connector dynamic target topic

I'm using the MongoDB source connector in order to produce messages from MongoDB to Kafka.
In my use case, I need to dynamically decide on the target topic, based on some of the MongoDB document values.
Is there any way to do so? I tried to use this SMT but it didn't work (the message wasn't produced to any topic).
Any ideas? Thanks!
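One technique for this kind of routing is a custom Single Message Transform that overrides the destination topic based on a field of the record value. Below is a minimal sketch, assuming a schemaless (Map) value; the class name and the targetTopic field are assumptions for illustration:

```java
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

public class TopicFromField<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public R apply(R record) {
        Object value = record.value();
        if (value instanceof Map) {
            Object target = ((Map<?, ?>) value).get("targetTopic");   // assumed field name
            if (target != null) {
                // keep everything else, only change the topic the record is routed to
                return record.newRecord(target.toString(), record.kafkaPartition(),
                        record.keySchema(), record.key(),
                        record.valueSchema(), record.value(), record.timestamp());
            }
        }
        return record;   // fall back to the original topic
    }

    @Override
    public ConfigDef config() { return new ConfigDef(); }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}
```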

JDBC sink connector insert/upsert based on max timestamp?

I'm very new to Kafka Connect.
I am inserting records from multiple sources into one table.
In some cases, it may be possible for some records to arrive before others.
Since I cannot control which source will pull which record first, I want to add a check on the timestamp key of the record.
I have a key called "LastModified_timestamp" in my schema where I store the timestamp of the latest state of my record.
I want to add a check to my JDBC sink connector so that it upserts a record based on comparing the value of LastModified_timestamp.
I want to ignore records that have an older timestamp and only upsert/insert the latest one. I couldn't find any configuration to achieve this.
Is there any way I can achieve this?
Would writing a custom query help in this case?
The JDBC Sink connector does not support this kind of feature. You have two options to consider:
Single Message Transforms (SMTs) - these apply logic to records as they pass through Kafka Connect. SMTs are great for things like dropping columns, changing datatypes, etc., but they are not appropriate for more complex processing and logic, including logic that needs to span multiple records, as yours does here.
Process the data in the source Kafka topic first, to apply the necessary logic. You can do this with Kafka Streams, KSQL, and several other stream processing frameworks (e.g. Spark, Flink, etc.). You'd need some kind of stateful logic that could work out whether a record is older than one already processed (a rough sketch follows below).
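As an illustration of that second option, here is a minimal Kafka Streams sketch. It assumes a hypothetical MyRecord POJO with a getLastModified() accessor and a matching myRecordSerde; topic names are placeholders. It keeps only the newest record per key:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();

// For each key, keep whichever record carries the greater LastModified_timestamp.
KTable<String, MyRecord> latestPerKey = builder
    .stream("source-topic", Consumed.with(Serdes.String(), myRecordSerde))   // myRecordSerde: assumed custom serde
    .groupByKey()
    .reduce((current, incoming) ->
        incoming.getLastModified() >= current.getLastModified() ? incoming : current);

// Feed the de-duplicated stream to the topic the JDBC sink connector reads from.
latestPerKey.toStream().to("deduped-topic");
```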
Can you describe more about your upstream source for the data? It might be that there's a better way to orchestrate the data coming through to enforce the ordering.
A final idea would be to land all records in your target DB and then use logic in the database query that consumes it to select the most recent record (based on LastModified_timestamp) for a given key.
Disclaimer: I work for Confluent, the company behind the open-source KSQL project.

Querying MySQL tables using Apache Kafka

I am trying to use Kafka Streams to achieve a use case.
I have two tables in MySQL - User and Account - and I am getting events from MySQL into Kafka using a Kafka MySQL connector.
I need to get all user IDs within an account from within Kafka itself.
So I was planning to use a KStream on the MySQL output topic, process it to form an output, and publish it to a topic with the account-id as the key and the comma-separated userIds as the value.
Then I can use an interactive query to get all userIds for an account id, with the get() method of the ReadOnlyKeyValueStore class.
Is this the right way to do this? Is there a better way?
Can KSQL be used here?
You can use Kafka Connect to stream data in from MySQL, e.g. using Debezium. From there you can use Kafka Streams, or KSQL, to transform the data, including re-keying, which I think is what you're looking to do here, as well as joining it to other streams.
If you ingest the data from MySQL into a topic with log compaction enabled, then you are guaranteed to always have the latest value for every key in the topic.
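As a rough sketch of the re-keying and aggregation with Kafka Streams (the topic name, the User POJO, its serde, and the streams config are assumptions here):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

StreamsBuilder builder = new StreamsBuilder();

// Re-key the user events by account id and collect the user ids per account
// into a comma-separated string, materialized as a queryable state store.
builder.stream("mysql.users", Consumed.with(Serdes.String(), userSerde))   // userSerde: assumed custom serde
    .selectKey((key, user) -> user.getAccountId())
    .groupByKey(Grouped.with(Serdes.String(), userSerde))
    .aggregate(
        () -> "",
        (accountId, user, userIds) ->
            userIds.isEmpty() ? user.getUserId() : userIds + "," + user.getUserId(),
        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("user-ids-by-account")
            .withKeySerde(Serdes.String())
            .withValueSerde(Serdes.String()));

// Once the KafkaStreams instance is running, interactive queries work like this:
KafkaStreams streams = new KafkaStreams(builder.build(), streamsConfig);   // streamsConfig: assumed Properties
streams.start();
ReadOnlyKeyValueStore<String, String> store = streams.store(
    StoreQueryParameters.fromNameAndType("user-ids-by-account", QueryableStoreTypes.keyValueStore()));
String userIds = store.get("some-account-id");   // comma-separated user ids for that account
```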
I would take a look at Striim if you want built-in CDC and interactive continuous SQL queries on the streaming data in one UI. More info here:
http://www.striim.com/blog/2017/08/making-apache-kafka-processing-preparation-kafka/

Using Kafka Streams to create a table based on Elasticsearch events

Is it possible to use Kafka Streams to create a pipeline that reads JSON from a Kafka topic, applies some logic to it, and sends the results to another Kafka topic or somewhere else?
For example, I populate my topic using logs from Elasticsearch. That is pretty easy using a simple Logstash pipeline.
Once I have my logs in the Kafka topic, I want to extract some pieces of information from each log and put them in a sort of "table" with N columns (is Kafka capable of this?) and then put the table somewhere else (another topic or a DB).
I didn't find any example that satisfies my criteria.
Thanks!
Yes, it's possible.
There is no concept of columns in Kafka or Kafka Streams. However, you typically just define a plain old Java object of your choice, with the fields that you want (fields being the equivalent of columns in this case). You produce the output in that format to an output topic (using an appropriately chosen serializer). Finally, if you want to store the result in a relational database, you map the fields into columns, typically using a Kafka Connect JDBC sink:
http://docs.confluent.io/current/connect/connect-jdbc/docs/sink_connector.html
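For example, a minimal Kafka Streams sketch of that approach might look like this; the topic names, the LogSummary POJO, its serde, and the JSON field names are all assumptions for illustration:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

ObjectMapper mapper = new ObjectMapper();
StreamsBuilder builder = new StreamsBuilder();

// Parse each JSON log line, keep only the fields of interest in a POJO
// (the "columns"), and write the result to an output topic that a
// JDBC sink connector can pick up.
builder.stream("es-logs", Consumed.with(Serdes.String(), Serdes.String()))
    .mapValues(json -> {
        try {
            JsonNode log = mapper.readTree(json);
            return new LogSummary(                       // LogSummary: assumed POJO
                log.path("host").asText(),
                log.path("level").asText(),
                log.path("message").asText());
        } catch (Exception e) {
            return null;                                 // unparseable messages are dropped below
        }
    })
    .filter((key, summary) -> summary != null)
    .to("log-summaries", Produced.with(Serdes.String(), logSummarySerde));  // logSummarySerde: assumed custom serde
```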