Is it possible to create a stream in ksqlDB to emit an ID that detects changes from 3 different tables that are joined together?

Is it possible to create a stream that detects changes on 3 different tables? For example, I have Table A, which contains ids for Table B and Table C. If I construct my join query correctly, could I emit an event that contains Table A's id whenever there is a change in Table B or C?
Table A
id
b_id
c_id
field_abc
field_xyz
Table B
id
foo
Table C
id
bar
I want a stream that will emit Table A ids if there are any changes in any of those 3 tables. Is this possible?
For example, if field_abc, foo, or bar were to change, I want Table A's id to be emitted to a stream.

I recently ran into an issue similar to the one you're describing. Currently this isn't possible with streams or tables alone because of limitations in ksqlDB. We did find a way to achieve the same result, though.
Our solution was to give the source connector a custom query that joins the 3 tables and combines their last_modified timestamps into a single value, so that a change to any of the 3 tables shows up as a change on the joined row.
CREATE SOURCE CONNECTOR xyz_change WITH (
'connector.class' = '${v_connector_class}',
'connection.url' = '${v_connection_url}',
'connection.user' = '${v_connection_user}',
'connection.password' = '${v_connection_pass}',
'topic.prefix' = 'jdbc_abc_change',
'mode' = 'timestamp+incrementing',
'numeric.mapping' = 'best_fit',
'incrementing.column.name' = 'id',
'timestamp.column.name' = 'last_modified',
'key' = 'id',
'key.converter' = '${v_converter_long}',
'query' = 'select id, last_modified from(select a.id as id, GREATEST(a.last_modified, COALESCE(b.last_modified,from_unixtime(0)), COALESCE(c.last_modified,from_unixtime(0))) as last_modified from aaa a LEFT JOIN bbb b on a.fk_id = b.id LEFT JOIN ccc c on a.fk_id = c.id ) sub'
);
With this in place, you can create any streams or tables you need on top of the connector's output.
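To make the idea behind the query easier to see, here is a minimal sketch that runs the same kind of join against an in-memory SQLite database from Python. It is not the connector itself: the table layout (b_id and c_id foreign keys, integer last_modified values) and the changed_ids helper are assumptions made for illustration, and SQLite's scalar MAX(...) stands in for GREATEST(...).

```python
import sqlite3

# Hypothetical schema: A points at B and C, and every table has last_modified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER, c_id INTEGER, last_modified INTEGER);
CREATE TABLE b (id INTEGER PRIMARY KEY, last_modified INTEGER);
CREATE TABLE c (id INTEGER PRIMARY KEY, last_modified INTEGER);
INSERT INTO a VALUES (1, 10, 20, 100), (2, 11, NULL, 100), (3, NULL, NULL, 100);
INSERT INTO b VALUES (10, 100), (11, 250);
INSERT INTO c VALUES (20, 300);
""")

# One row per A id, stamped with the newest last_modified across A, B and C.
# COALESCE keeps a missing B or C row from turning the whole result into NULL.
CHANGE_QUERY = """
SELECT a.id AS id,
       MAX(a.last_modified,
           COALESCE(b.last_modified, 0),
           COALESCE(c.last_modified, 0)) AS last_modified
FROM a
LEFT JOIN b ON a.b_id = b.id
LEFT JOIN c ON a.c_id = c.id
"""

def changed_ids(conn, since):
    """Return (id, last_modified) for A rows touched after `since`."""
    sql = (
        f"SELECT id, last_modified FROM ({CHANGE_QUERY}) sub "
        "WHERE last_modified > ? ORDER BY last_modified, id"
    )
    return conn.execute(sql, (since,)).fetchall()

result = changed_ids(conn, 100)
print(result)
```

Here A row 1 is picked up because its C row changed, and A row 2 because its B row changed, even though neither A row was modified itself. A timestamp-based connector mode polls in roughly the same way, using the combined last_modified column.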

Related

Return closest timestamp from Table B based on timestamp from Table A with matching Product IDs

Goal: Create a query to pull the closest cycle count event (Table C) for a product ID based on the inventory adjustments results sourced from another table (Table A).
All records from Table A will be used, but a record is not guaranteed to have a match in Table C.
The ID column is present in both tables but is not unique in either, so the pair of ID and timestamp is needed to identify a row in each table.
Current simplified SQL
SELECT
A.WHENOCCURRED,
A.LPID,
A.ITEM,
A.ADJQTY,
C.WHENOCCURRED,
C.LPID,
C.LOCATION,
C.ITEM,
C.QUANTITY,
C.ENTQUANTITY
FROM
A
LEFT JOIN
C
ON A.LPID = C.LPID
WHERE
A.facility = 'FACID'
AND A.WHENOCCURRED > '23-DEC-22'
AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC
;
This currently pulls the first hit on C.WHENOCCURRED among the LPID matches. I would like to see if there is a simpler JOIN solution before going in a direction that creates 2 temp tables based on WHENOCCURRED.
I have a functioning INDEX(MATCH(MIN()) solution in Excel but that requires exporting a couple system reports first and is extremely slow with X,XXX row tables.
If you are using Oracle 12 or later, you can use a LATERAL join and FETCH FIRST ROW ONLY:
SELECT A.WHENOCCURRED,
A.LPID,
A.ITEM,
A.ADJQTY,
C.WHENOCCURRED,
C.LPID,
C.LOCATION,
C.ITEM,
C.QUANTITY,
C.ENTQUANTITY
FROM A
LEFT OUTER JOIN LATERAL (
SELECT *
FROM C
WHERE A.LPID = C.LPID
AND A.whenoccurred <= c.whenoccurred
ORDER BY c.whenoccurred
FETCH FIRST ROW ONLY
) C
ON (1 = 1) -- The join condition is inside the lateral join
WHERE A.facility = 'FACID'
AND A.WHENOCCURRED > DATE '2022-12-23'
AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC;
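For anyone who wants to try the "first matching row per outer row" pattern without an Oracle instance, here is a small sketch using Python's built-in sqlite3. SQLite has no LATERAL join, so the sketch swaps in a correlated subquery on rowid, which picks the same row: the earliest C record at or after each A record for the same LPID. The sample rows and the reduced column list are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (lpid TEXT, whenoccurred TEXT, adjqty INTEGER);
CREATE TABLE c (lpid TEXT, whenoccurred TEXT, quantity INTEGER);
INSERT INTO a VALUES ('LP1', '2023-01-05', -2),
                     ('LP1', '2023-02-01', 1),
                     ('LP2', '2023-01-10', 4);
INSERT INTO c VALUES ('LP1', '2023-01-03', 10),
                     ('LP1', '2023-01-06', 8),
                     ('LP1', '2023-01-20', 8);
""")

# For each A row, join only the single C row chosen by the correlated subquery.
# When the subquery finds nothing, the LEFT JOIN keeps the A row with NULLs.
rows = conn.execute("""
SELECT a.lpid, a.whenoccurred, c.whenoccurred, c.quantity
FROM a
LEFT JOIN c
  ON c.rowid = (
       SELECT c2.rowid
       FROM c AS c2
       WHERE c2.lpid = a.lpid
         AND c2.whenoccurred >= a.whenoccurred
       ORDER BY c2.whenoccurred
       LIMIT 1)
ORDER BY a.whenoccurred DESC
""").fetchall()

for row in rows:
    print(row)
```

Like the Oracle query, this only looks forward in time from each A record. If "closest" should also consider earlier C records, the ordering would have to use the absolute time difference instead.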

Stream Joins based on distinct values from a previous stream

I have a main table A which stores all the events which occur with some details.
Then for each "event" in Table A, there is a separate table by that name (called the event table)
I am using the main table A as a stream and ideally need to perform a join to the event table so that I get a single table for Table A with its respective details.
In the case below there are two distinct event types tables each with its own table and schema.
Example:
Table A:
id
time
detail_1
detail_2
event
Table event 1:
id
detail_6
detail_8
detail_9
Table event 2:
id
detail 11
detail 12
How do I union it so I have a single table in the end with the corresponding details from Table A, event 1 and event 2?
Here is what I was trying to do:
df = (
    spark.readStream.format("delta")
    .option("ignoreChanges", "true")
    .load(f"{table_name}")
)
event_types = df.select("event").distinct().collect()
for row in event_types:
    event = row[0].replace(" ", "_").replace(":","").lower()
    if event in ["task", "module", "_created", "test_created", "test_to_be_deleted"]:
        df_event = spark.readStream.format("delta").option("ignoreChanges", "true").load(f"{event}")
        joined_df = df.join(df_event, Seq("message_id"),"inner")
df.writeStream.format("delta").outputMode("append").option(
    "checkpointLocation",
    f"{table}",
).trigger(once=True).foreachBatch(apple_a_bunch_of_changes).start()
Is there a better way to do this?

Issue using INNER JOIN on multiple tables in Postgres

I am trying to create a new table by using INNER JOIN to combine multiple tables. All the tables have a primary key column called reach_id. I have a primary table called q3_studies, and I want all of the columns from this table. I then have multiple other tables that each have reach_id plus one other column. I want to JOIN each of these tables ON the reach_id that matches q3_studies, but only include the other column (so I don't have redundant reach_id columns). My first attempt seems to work if I run it as a plain SELECT * ... with LIMIT 1000; at the end, but it adds redundant reach_id columns.
SELECT * FROM second_schema.q3_studies s
INNER JOIN second_schema.bs_trigger_q3 b ON s.reach_id = b.reach_id
INNER JOIN second_schema.mod_unmod_q3 m ON s.reach_id = m.reach_id LIMIT 1000;
How can I amend this to add only the additional columns (e.g. bs_trigger_q3 has an additional column called bs_trigger, and mod_unmod_q3 has an additional column called mod_unmod)?
Secondly, if I try to create a new table, I get an error: column "reach_id" specified more than once. What am I doing wrong here?
CREATE TABLE first_schema.report_q3 AS
SELECT * FROM second_schema.q3_studies s
INNER JOIN second_schema.bs_trigger_q3 b ON s.reach_id = b.reach_id
INNER JOIN second_schema.mod_unmod_q3 m ON s.reach_id = m.reach_id;
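Instead of SELECT * you need to list the columns you want explicitly. SELECT * returns reach_id once for every joined table, and CREATE TABLE ... AS cannot create a table with duplicate column names. Listing columns is good practice in any case, because the schema may change in the future. It also allows you to rename columns, e.g. s.column_A AS "foo_column".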
CREATE TABLE first_schema.report_q3 AS
SELECT
s.reach_id,
s.column_A, s.column_B,
b.column_C, b.column_D,
m.column_E, m.column_F
FROM second_schema.q3_studies s
INNER JOIN second_schema.bs_trigger_q3 b ON s.reach_id = b.reach_id
INNER JOIN second_schema.mod_unmod_q3 m ON s.reach_id = m.reach_id
;
If your editor does not help you with column names consider a different editor.
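As a quick way to check the shape of the result, here is a small sketch of the explicit column list using Python's built-in sqlite3 rather than Postgres. The schema prefixes are dropped because SQLite has none, and the study_name column and the sample rows are hypothetical stand-ins for the real q3_studies columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE q3_studies (reach_id INTEGER PRIMARY KEY, study_name TEXT);
CREATE TABLE bs_trigger_q3 (reach_id INTEGER PRIMARY KEY, bs_trigger TEXT);
CREATE TABLE mod_unmod_q3 (reach_id INTEGER PRIMARY KEY, mod_unmod TEXT);
INSERT INTO q3_studies VALUES (1, 'alpha'), (2, 'beta');
INSERT INTO bs_trigger_q3 VALUES (1, 'yes'), (2, 'no');
INSERT INTO mod_unmod_q3 VALUES (1, 'modified');

-- reach_id is selected once, from the primary table only.
CREATE TABLE report_q3 AS
SELECT s.reach_id, s.study_name, b.bs_trigger, m.mod_unmod
FROM q3_studies AS s
INNER JOIN bs_trigger_q3 AS b ON s.reach_id = b.reach_id
INNER JOIN mod_unmod_q3 AS m ON s.reach_id = m.reach_id;
""")

cursor = conn.execute("SELECT * FROM report_q3")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

The new table ends up with a single reach_id column. Note that the INNER JOINs drop reach_id 2 here because it has no row in mod_unmod_q3; a LEFT JOIN would keep it with a NULL in that column.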

Query to retrieve only Identity Columns in Teradata

In Oracle, DBA_SEQUENCES lists all the sequences in the database.
Can you please tell me how I can find the equivalent information in Teradata?
Identity information is stored in dbc.idcol, there's no Data Dictionary view on top of that, but it's easy to write:
SELECT
d.DatabaseName
,t.tvmName AS TABLENAME
,c.FieldName
,id.AvailValue
,id.StartValue
,id.MINVALUE
,id.MAXVALUE
,id.INCREMENT
,id.cyc
FROM dbc.IdCol AS id
JOIN dbc.Dbase AS d
ON id.DatabaseId = d.DatabaseId
JOIN dbc.tvm AS t
ON id.TableId = t.tvmID
JOIN dbc.TVFields AS c
ON c.TableId = id.TableID
WHERE c.IdColType IS NOT NULL
;

Select all values from 2 tables

I have 3 code-first tables: A, B, and a join table that links A and B in a one-to-many relationship. I would like to get all the results from tables A and B, without duplicates, and return them as a SelectList:
var a = from s in db.A
        join ss in db.B on s.ps_id equals ss.ss_id
        orderby s.ps_label
        select new SelectListItem
        {
            Text = s.ps_id.ToString(),
            Value = s.ps_label
        };
return a;
This only returns results from the A table, but not from B as well. What is wrong, and what is your advice for best practice and performance for this?