DB2 Partition and Queue Replication

We are on DB2 v11.1. This question concerns DB2 Queue Replication and dropping partitions.
The scenario: there are 2 tables. Tab1 is partitioned and Tab2 is not. Queue Replication is set up between Tab1 and Tab2 to replicate deletes. The question is: if we drop a partition on Tab1, will the deletion of its rows be replicated to Tab2?
For example, there are 10 rows in partition1 of Tab1, and the same 10 rows are present in Tab2 due to replication.
When a drop partition is triggered on Tab1, will the 10 rows be deleted from Tab2 too?
If not, can we implement a custom solution to achieve that effect?

Q-rep won't turn a drop partition into deletes.
https://www.ibm.com/support/knowledgecenter/en/SSTRGZ_10.2.1/com.ibm.swg.im.iis.db.repl.sqlrepl.doc/topics/iiyrscapparttblv97fp2.html?cp=SSEPGG_11.1.0
The DETACH operation is not replicated. The data that is deleted from the source table by the DETACH operation is not deleted from the target table. If you need to change the target data partition into a separate table, you need to do so manually.
The operation is also explicitly excluded from schema-level subscriptions.
https://www.ibm.com/support/knowledgecenter/SSTRGZ_10.2.0/com.ibm.swg.im.iis.repl.qrepl.doc/topics/iiyrqsubcrtschemasub.html
I guess your options are to delete all the rows in the source partition first, so that you only ever drop empty partitions, or to manually delete the rows on the target when you do the DROP/DETACH PARTITION.
You could make the source table MDC (multidimensional clustering) to speed up the source deletes, but then again, I'm not sure Q-rep would cope with an MDC block delete on the source against a non-MDC target. Is your target row-organized, and is that why the tables are partitioned differently?
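A rough sketch of the first option ("delete first, then detach the empty partition"), driven from Java over JDBC. The connection details, table names (TAB1, TAB1_PART1_OLD), partition name (PART1) and range column (PART_DATE) are assumptions for illustration, not anything from the question:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DropPartitionWithReplicatedDeletes {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details; adjust to your environment.
            String url = "jdbc:db2://dbhost:50000/SAMPLEDB";
            try (Connection con = DriverManager.getConnection(url, "db2inst1", "password");
                 Statement st = con.createStatement()) {
                con.setAutoCommit(false);

                // 1) Delete the rows that live in the partition you want to drop.
                //    These DELETEs are ordinary DML, so Q Replication captures them
                //    and applies them to the non-partitioned target table (Tab2).
                //    For a large partition this can be a heavy operation.
                st.executeUpdate(
                    "DELETE FROM TAB1 WHERE PART_DATE >= '2023-01-01' AND PART_DATE < '2023-02-01'");
                con.commit();

                // 2) Detach the (now empty) partition. DETACH itself is not replicated,
                //    but there is nothing left in it to lose on the target.
                st.executeUpdate("ALTER TABLE TAB1 DETACH PARTITION PART1 INTO TAB1_PART1_OLD");
                con.commit();

                // 3) Optionally drop the detached table once DB2 has finished the
                //    (asynchronous) detach processing.
                st.executeUpdate("DROP TABLE TAB1_PART1_OLD");
                con.commit();
            }
        }
    }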

Related

JDBC Source Connector - table.whitelist - 5 Tables in List - What is The Order of Execution

I have 5 tables listed in the JDBC Source connector's table.whitelist. Are they all synced from PostgreSQL to Kafka in parallel, or one table after the other? If it is one table after the other, what is the order (the order the tables are listed in table.whitelist? alphabetical?), and is the order guaranteed?
Is there a way to guarantee the order?

In Kafka, how to handle deleted rows from source table that are already reflected in Kafka topic?

I am using a JDBC source connector in timestamp+incrementing mode to fetch a table from Postgres, using Kafka Connect. Updates to the data are reflected in the Kafka topic, but deleting records has no effect. So, my questions are:
Is there some way to handle deleted records?
How do I handle records that are deleted from the source but still present in the Kafka topic?
Option 1) Adjust your source database to be append/update only as well, using a boolean or timestamp column that marks deletions and is filtered out when Kafka Connect queries the table.
If your database is running out of space, you can then physically delete old records, which should already have been processed by Kafka.
Option 2) Use a CDC tool to capture delete events immediately rather than missing them in a periodic table scan. Debezium is a popular option for Postgres.
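To make option 1 concrete, here is a minimal, hedged sketch of a soft delete; the table, column names and connection details are made up for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class SoftDeleteExample {
        public static void main(String[] args) throws Exception {
            // Assumed Postgres connection and table/column names.
            String url = "jdbc:postgresql://dbhost:5432/appdb";
            try (Connection con = DriverManager.getConnection(url, "app", "secret");
                 PreparedStatement ps = con.prepareStatement(
                     // Instead of DELETE, mark the row and bump the timestamp so a
                     // timestamp+incrementing JDBC source connector picks the change up.
                     "UPDATE customers SET deleted = TRUE, updated_at = now() WHERE id = ?")) {
                ps.setLong(1, 42L);
                ps.executeUpdate();
            }
            // Downstream, either filter deleted = TRUE rows out of the connector query
            // or let consumers interpret the flag as a deletion.
        }
    }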
A Kafka topic can be seen as an "append-only" log. It keeps all messages for as long as you like, but Kafka is not built to delete individual messages out of a topic.
In the scenario you are describing, it is common for the downstream application (consuming the topic) to handle the information about a deleted record.
As an alternative, you could set the cleanup.policy of your topic to compact, which means it will eventually keep only the latest value for each key. If you define the key of a message as the primary key of the Postgres table, your topic will eventually drop the record when you produce a message with the same key and a null value into the topic. However, I am not sure whether your connector is flexible enough to do this.
Depending on what you do with the data in the Kafka topic, this could still not solve your problem, as the downstream application will read both records: the original one and the null message marking the deletion.
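As a hedged illustration of the tombstone idea: if the message key is the table's primary key, a compacted topic will eventually drop the older messages for that key once a null-valued message (a "tombstone") is produced. The topic name, key and broker address below are assumptions:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TombstoneExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Key = primary key of the deleted Postgres row, value = null (a tombstone).
                // On a topic with cleanup.policy=compact, log compaction will eventually
                // remove earlier messages with this key; consumers still see the tombstone
                // itself and must interpret it as "record deleted".
                producer.send(new ProducerRecord<>("my_table_topic", "42", null));
            }
        }
    }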

Duplicate data in ksql table | How to update the rows of ksql table on same ROWKEY update?

I've created a table in ksql from a Kafka topic. I pushed a set of data to the topic and the table was populated; I posted a query and got the response. Then I pushed the same data into the topic a second time and the table got loaded again. Now, when I query, the response is 2 rows instead of 1, with 2 different ROWTIME timestamps.
I believe a ksql table should overwrite the value when the same key comes in and retain only the latest value, but that's not happening. Is my understanding correct?
What should be done so that the table keeps the latest value and discards the previous one when the same key is inserted/updated?
Thanks
As far as I know it is not possible to apply a log compaction policy in order to keep exactly one message per key. Even if you set cleanup.policy=compact (topic-level) or log.cleanup.policy=compact (global level), there is no guarantee that only the latest message will be kept and older ones will be compacted.
According to the official Kafka documentation:
Log compaction gives us a more granular retention mechanism so that we are guaranteed to retain at least the last update for each primary key
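For reference, a small sketch (assuming a local broker and a hypothetical topic name my_ksql_topic) of switching the underlying topic to cleanup.policy=compact via the AdminClient; as noted above, this still does not guarantee a single message per key:

    import java.util.Collection;
    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class EnableCompaction {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address

            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my_ksql_topic");
                AlterConfigOp setCompact = new AlterConfigOp(
                    new ConfigEntry("cleanup.policy", "compact"), AlterConfigOp.OpType.SET);
                Map<ConfigResource, Collection<AlterConfigOp>> changes =
                    Collections.singletonMap(topic, Collections.singletonList(setCompact));
                admin.incrementalAlterConfigs(changes).all().get();
                // Compaction only runs on closed log segments and respects
                // min.cleanable.dirty.ratio, so duplicates for a key can linger for a while.
            }
        }
    }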

Filter Repeated Messages In Kafka

PREFACE:
In our organization we're trying to use Kafka to solve a problem that involves capturing changes in an Oracle database and sending them through Kafka. It is in fact CDC, and we are using Kafka Connect for that.
We catch the changes in Oracle using Oracle Flashback queries, which gives us the timestamp of the change and the operation involved (Insert, Delete, Update).
Once a change is made in a table we observe, the Kafka connector publishes it to a topic, which we then read using Kafka Streams.
The problem is that sometimes identical rows appear in the Flashback query: an update that didn't actually change anything still triggers a Flashback entry, and if the table has 100 columns but we watch only 20, we end up seeing repeated rows because none of those 20 fields changed.
We use Flashback to get the changed rows (including excluded ones). In the connector we use timestamp+incrementing mode (the timestamp is taken from the versions_starttime field of the Flashback query).
Important: we can't touch the DB more than this; I mean, we can't create triggers instead of using the existing Flashback scheme.
THE QUESTION
We're trying to filter records in Kafka: if some (key, value) pair is identical in content to one we've already seen, we want to discard it. Note that this is not exactly-once semantics; the record may be repeated with large timestamp differences.
If I use a KTable to check the last value of a record, how efficient will this be after a long period?
I mean, the consumers' internal state storage is handled by RocksDB and a backing Kafka topic, and if I use a non-windowed KTable this internal space could end up being very large.
What is considered a good approach in this scenario, so as not to overload the Kafka consumers' internal state storage while still being able to know whether the current record was already processed some time ago?
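No answer is recorded here, but since the question already sketches the state-store idea, here is a minimal, hedged sketch of one common variant: deduplicating with a time-windowed store so that old entries age out of RocksDB instead of accumulating forever. The topic names, serdes and the 7-day retention are assumptions for illustration:

    import java.time.Duration;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.StoreBuilder;
    import org.apache.kafka.streams.state.Stores;
    import org.apache.kafka.streams.state.WindowStore;
    import org.apache.kafka.streams.state.WindowStoreIterator;

    public class DedupStream {

        static final String STORE = "seen-values";
        static final Duration RETENTION = Duration.ofDays(7);   // assumed dedup horizon

        // Drops a record if an identical value for the same key was seen within the window.
        static class Dedup implements Transformer<String, String, KeyValue<String, String>> {
            private WindowStore<String, String> store;

            @Override
            @SuppressWarnings("unchecked")
            public void init(ProcessorContext context) {
                store = (WindowStore<String, String>) context.getStateStore(STORE);
            }

            @Override
            public KeyValue<String, String> transform(String key, String value) {
                long now = System.currentTimeMillis();
                long from = now - RETENTION.toMillis();
                try (WindowStoreIterator<String> it = store.fetch(key, from, now)) {
                    while (it.hasNext()) {
                        if (it.next().value.equals(value)) {
                            return null;                 // duplicate content -> drop it
                        }
                    }
                }
                store.put(key, value, now);              // remember it for the retention period
                return KeyValue.pair(key, value);
            }

            @Override
            public void close() {}
        }

        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // Windowed store: RocksDB only keeps entries for the retention period,
            // so state does not grow without bound as a non-windowed KTable would.
            StoreBuilder<WindowStore<String, String>> storeBuilder = Stores.windowStoreBuilder(
                Stores.persistentWindowStore(STORE, RETENTION, RETENTION, false),
                Serdes.String(), Serdes.String());
            builder.addStateStore(storeBuilder);

            builder.<String, String>stream("oracle-changes")      // assumed input topic
                .transform(Dedup::new, STORE)
                .to("oracle-changes-dedup");                      // assumed output topic

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dedup-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            new KafkaStreams(builder.build(), props).start();
        }
    }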

Can we have partitions within partition in a Hive table?

Can we create partitions within partitions in a Hive table?
I mean, can we partition a partitioned table, or is bucketing the only option in Hive tables?
Hive supports multiple levels of partitioning, but keep in mind that having more than a single level of partitioning in Hive is almost never a good idea. HDFS is optimized for manipulating large files, ~100 MB and larger. Each partition of a Hive table is an HDFS directory, and there are normally multiple files in each of these directories. You should really be closing in on a petabyte of data before multiple levels of partitioning in a Hive table make sense.
What problem are you trying to solve? I'm sure we can find a sensible solution for it.
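For what it's worth, a small sketch of what two levels of partitioning look like, run over the HiveServer2 JDBC driver; the endpoint, table name and partition values are made up for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class MultiLevelPartitionDemo {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // Assumed HiveServer2 endpoint; adjust to your cluster.
            String url = "jdbc:hive2://hiveserver:10000/default";
            try (Connection con = DriverManager.getConnection(url, "hive", "");
                 Statement st = con.createStatement()) {
                // Two partition columns = two levels of directories in HDFS:
                //   .../sales/dt=2019-05-01/region=EU/...
                st.execute(
                    "CREATE TABLE IF NOT EXISTS sales (" +
                    "  order_id BIGINT, amount DOUBLE) " +
                    "PARTITIONED BY (dt STRING, region STRING) " +
                    "STORED AS ORC");

                // Each distinct (dt, region) pair becomes its own nested directory,
                // which is why many small partitions hurt HDFS performance.
                st.execute(
                    "ALTER TABLE sales ADD IF NOT EXISTS PARTITION (dt='2019-05-01', region='EU')");
            }
        }
    }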