Create a table from a topic using a WHERE clause in ksql - apache-kafka

I'm using the latest version of ksqlDB server, 0.29.2 I believe. I'm trying to create a table that reads from a specific topic which receives lots of events, but I'm only interested in specific ones. The JSON event has a property named "eventType", so I want to continually filter the events and create a specific table that stores the client data (phone number, email, etc.) and keeps the client info up to date.
I created a stream called orders_inputs just for testing purposes, and then I tried to create this table, but I got the error below.
create table orders(orderid varchar PRIMARY KEY, itemid varchar) WITH (KAFKA_TOPIC='ORDERS', PARTITIONS=1, REPLICAS=1) as select orderid, itemid from orders_inputs where type='t1';
line 1:120: mismatched input 'as' expecting ';'
Statement: create table orders(orderid varchar PRIMARY KEY, itemid varchar) WITH (KAFKA_TOPIC='ORDERS', PARTITIONS=1, REPLICAS=1) as select orderid, itemid from orders_inputs where type='t1';
Caused by: line 1:120: mismatched input 'as' expecting ';'
Caused by: org.antlr.v4.runtime.InputMismatchException

If you want to create a table that contains the results of a SELECT query from a stream, you can use CREATE TABLE AS SELECT:
https://docs.confluent.io/5.2.1/ksql/docs/developer-guide/create-a-table.html#create-a-ksql-table-with-streaming-query-results
e.g.
CREATE TABLE orders AS
SELECT orderid, itemid FROM orders_inputs
WHERE type='t1';
You can specify the primary key when creating the stream orders_inputs: https://docs.confluent.io/5.4.4/ksql/docs/developer-guide/syntax-reference.html#message-keys
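For the ksqlDB version in the question (0.29.x) the key column is declared inline in the column list rather than via the KEY property described in the linked 5.4 docs. A minimal sketch, assuming the source topic already exists and that the value columns are the ones used in the question (the topic name is a placeholder):
CREATE STREAM orders_inputs (
  orderid VARCHAR KEY,   -- read from the Kafka message key
  itemid  VARCHAR,
  type    VARCHAR
) WITH (
  KAFKA_TOPIC = 'orders_input',  -- placeholder topic name
  VALUE_FORMAT = 'JSON'
);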
Otherwise, you can specify the primary key when creating a table from a topic:
https://docs.confluent.io/5.2.1/ksql/docs/developer-guide/create-a-table.html#create-a-table-with-selected-columns
e.g.
CREATE TABLE orders
(orderid VARCHAR PRIMARY KEY,
itemid VARCHAR)
WITH (KAFKA_TOPIC = 'orders',
VALUE_FORMAT='JSON');
However, you would then have to filter on type='t1' yourself when querying the table.
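For example, a push query along these lines would do the filtering at read time (a sketch only: it assumes a type VARCHAR column is also declared on the orders table above, which the events in the question appear to carry):
SELECT orderid, itemid
FROM orders
WHERE type = 't1'
EMIT CHANGES;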

Related

Insert into foreign table partition doesn't work

I have a table in MariaDB.
I need to create a foreign table partition for that table:
CREATE TABLE IF NOT EXISTS users (
id serial NOT NULL,
name varchar(30)
) partition by range(id);
CREATE FOREIGN TABLE IF NOT EXISTS users_p
PARTITION OF users FOR VALUES FROM (0) TO (10000)
SERVER test22
OPTIONS (table_name 'users');
If I try to read some data, everything is OK, but when I try to insert something:
insert into users (id, name) values (111, 'somename');
I get an error (the text depends on the fdw):
COPY and foreign partition routing not supported in mysql_fdw
I tried two variants of fdw: EnterpriseDB/mysql_fdw and pgspider/jdbc_fdw.
Is there an fdw that supports INSERT when the foreign table is a partition? Or is there any other way I can achieve this?
pgspider/mysql_fdw - solves my problem
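For reference, switching to that fork looks roughly like the sketch below. This is an assumption-heavy sketch: it presumes the pgspider fork keeps the extension name and the host/port and username/password options of EnterpriseDB/mysql_fdw, and the connection details are placeholders.
-- assumes pgspider/mysql_fdw is built and installed on the PostgreSQL host
CREATE EXTENSION IF NOT EXISTS mysql_fdw;

CREATE SERVER test22
  FOREIGN DATA WRAPPER mysql_fdw
  OPTIONS (host '127.0.0.1', port '3306');   -- placeholder MariaDB address

CREATE USER MAPPING FOR CURRENT_USER
  SERVER test22
  OPTIONS (username 'mariadb_user', password 'mariadb_password');

-- with tuple routing supported, the insert from the question goes through
insert into users (id, name) values (111, 'somename');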

PostgreSQL two-table constraint not working properly

I run the following code:
-- Table describing messages
CREATE TABLE messages
(
id serial PRIMARY KEY NOT NULL,
text TEXT -- Message can have or not have text
);
-- Table describing media attached to messages
CREATE TABLE messages_attachments
(
message_id integer NOT NULL REFERENCES messages,
-- Messages can have any number of attachments, including 0
attachment_id TEXT NOT NULL
);
-- Messages must have either text or at least one attachment
CREATE FUNCTION message_has_text_or_attachments(integer) RETURNS bool STABLE
AS
$$
SELECT
EXISTS(SELECT 1 FROM messages_attachments WHERE message_id = $1)
OR
(SELECT text IS NOT NULL FROM messages WHERE id = $1);
$$ LANGUAGE SQL;
ALTER TABLE messages ADD CONSTRAINT nonempty_message CHECK ( message_has_text_or_attachments(id) );
-- Insert a message with no text and no attachments. Should fail, but it does not
INSERT INTO messages(text) VALUES (NULL);
SELECT *, message_has_text_or_attachments(id) FROM messages;
I expected it to fail on the INSERT line because the row being inserted violates the check constraint (we are inserting a message whose text is NULL and there are no attachments for that message), but it runs successfully and the next query returns (1, NULL, false) (here is an example with a slightly modified function definition, using apostrophes instead of dollar quoting because of the database version).
One more interesting thing is that if I change the order of the commands and INSERT the row before adding the CONSTRAINT, then PostgreSQL fails to ALTER the table, because "check constraint "nonempty_message" is violated by some row".
Why does PostgreSQL allow inserting the row that violates the constraint? Am I mistaken somewhere in the function definition? Is there some limitation on how constraints can be applied and which tables they can depend on? Is it a PostgreSQL bug?
From the docs:
PostgreSQL does not support CHECK constraints that reference table data other than the new or updated row being checked. While a CHECK constraint that violates this rule may appear to work in simple tests, it cannot guarantee that the database will not reach a state in which the constraint condition is false (due to subsequent changes of the other row(s) involved).
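A common workaround (not part of the quoted docs, just a sketch reusing the message_has_text_or_attachments() function from the question, PostgreSQL 11+ syntax) is a deferred constraint trigger instead of a CHECK constraint, so the check runs at commit time, after the attachments have been inserted in the same transaction:
CREATE FUNCTION enforce_nonempty_message() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF NOT message_has_text_or_attachments(NEW.id) THEN
        RAISE EXCEPTION 'message % has neither text nor attachments', NEW.id;
    END IF;
    RETURN NULL;  -- return value of AFTER row triggers is ignored
END;
$$;

CREATE CONSTRAINT TRIGGER nonempty_message_trg
    AFTER INSERT OR UPDATE ON messages
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW EXECUTE FUNCTION enforce_nonempty_message();
A similar trigger on messages_attachments would be needed to catch deleting the last attachment of a text-less message.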

Is there any way in SQL to resolve this scenario?

I have a table in the database like this:
CREATE TABLE assignments
(
id uuid,
owner_id uuid NOT NULL
);
Now I want to check the records: whether the IDs I am getting from the request already exist or not. If they exist, I will update owner_id, and if an ID that already exists in the table is not present in the request, I have to delete that record.
(In short, it's an update mechanism in which I am getting multiple IDs to update in the table. If a record is already in the database and also in the request, I will update it in the database, and if a record is in the table but not in the request, I will delete it from the database.)
This can be done with a single INSERT statement with the ON CONFLICT clause. Your first task will be creating a PK (or UNIQUE) constraint on the table. Presumably that would be on id.
alter table assignments
add primary key (id);
Then insert your data; the ON CONFLICT clause will update the owner_id for any existing id.
insert into assignments (id, owner_id)
values (<id>, <owner>)
on conflict (id)
do update
set owner_id = excluded.owner_id;
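To also cover the "delete what is missing from the request" part of the question, here is a hedged end-to-end sketch; the request_ids staging table is hypothetical and stands in for however the request payload reaches the database:
-- hypothetical staging table holding the ids/owners that arrived in the request
CREATE TEMP TABLE request_ids (id uuid PRIMARY KEY, owner_id uuid NOT NULL);

-- upsert: insert new ids, update owner_id for ids that already exist
INSERT INTO assignments (id, owner_id)
SELECT id, owner_id FROM request_ids
ON CONFLICT (id)
DO UPDATE SET owner_id = excluded.owner_id;

-- remove rows that exist in the table but were not part of the request
DELETE FROM assignments a
WHERE NOT EXISTS (SELECT 1 FROM request_ids r WHERE r.id = a.id);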

Import a CSV with foreign keys

Let's say I have 2 tables: Students and Groups.
The Group table has 2 columns: id, GroupName
The Student table has 3 columns: id, StudentName and GroupID
The GroupID is a foreign key to a Group field.
I need to import the Students table from a CSV, but in my CSV, instead of the Group id, the name of the group appears. How can I import it with pgAdmin without modifying the CSV?
Based on Laurenz's answer, use the following scripts:
Create a temp table to load the CSV file into:
CREATE TEMP TABLE std_temp (id int, student_name char(25), group_name char(25));
Then, import the CSV file:
COPY std_temp FROM '/home/username/Documents/std.csv' CSV HEADER;
Now, create std and grp tables for students and groups:
CREATE TABLE grp (id int, name char(25));
CREATE TABLE std (id int, name char(20), grp_id int);
Now it's the grp table's turn to be populated, based on the distinct values of group_name. Note how row_number() is used to provide a value for id:
INSERT INTO grp (id, name) select row_number() OVER (), * from (select distinct group_name from std_temp) as foo;
And the final step: select the data based on the join, then insert it into the std table:
insert into std (id, name, grp_id) select std_temp.id, std_temp.student_name,grp.id from std_temp inner join grp on std_temp.group_name = grp.name;
At the end, retrieve the data from the final std table:
select * from std;
Your easiest option is to import the file into a temporary table that is defined like the CSV file. Then you can join that table with the "groups" table and use INSERT INTO ... SELECT ... to populate the "students" table.
There is of course also the option to define a view on a join of the two tables and define an INSTEAD OF INSERT trigger on the view that inserts values into the underlying tables as appropriate. Then you could load the data directly to the view.
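A sketch of that second option, assuming the students/groups column names from the question, that group names are unique, and PostgreSQL 11+ trigger syntax (the CSV path is a placeholder):
-- view shaped like the CSV: student id, student name, group name
CREATE VIEW students_csv AS
SELECT s.id, s.studentname, g.groupname
FROM students s
JOIN groups g ON g.id = s.groupid;

CREATE FUNCTION students_csv_insert() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO students (id, studentname, groupid)
    SELECT NEW.id, NEW.studentname, g.id
    FROM groups g
    WHERE g.groupname = NEW.groupname;
    RETURN NEW;
END;
$$;

CREATE TRIGGER students_csv_ins
    INSTEAD OF INSERT ON students_csv
    FOR EACH ROW EXECUTE FUNCTION students_csv_insert();

-- the CSV can then be loaded straight into the view
COPY students_csv FROM '/path/to/students.csv' CSV HEADER;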
The suggestion by @LaurenzAlbe is the obvious approach (IMHO never load a spreadsheet directly into your tables, they are untrustworthy beasts). But I believe your implementation after loading the staging table is flawed.
First, using row_number() virtually ensures you get duplicated ids for the same group name. The ids will always increment from 1 to the number of group names, no matter how many groups were previously loaded, and you cannot guarantee the identical sequence on a subsequent spreadsheet. What happens when you have a group that did not previously exist? Further, there is no validation that the group name does not already exist. Result: duplicate group names and/or multiple ids for the same name.
Second, using the id from the spreadsheet as the id of the student (std) table is full of error possibilities. How do you ensure that number is unique across spreadsheets? Even if it is unique in a single spreadsheet, how do you ensure another spreadsheet does not use the same numbers as a previous one, or, assuming multiple users create the spreadsheets, that one user's numbers do not overlap another user's, even if all users are very conscious of the numbers they use? Result: duplicate id numbers.
A much better approach would be to put a unique key on the group table's name column, then insert any group names from the staging table into the group table, trapping any duplicate name errors (using ON CONFLICT). Then load the student table directly from the staging table, while selecting the group id from the group table by the (now unique) group name.
create table csv_load_temp( junk_num integer, student_name text, group_name text);
create table groups( grp_id integer generated always as identity
, name text
, grp_key text generated always as ( lower(name) ) stored
, constraint grp_pk
primary key (grp_id)
, constraint grp_bk
unique (grp_key)
);
create table students (std_id integer generated always as identity
, name text
, grp_id integer
, constraint std_pk
primary key (std_id)
, constraint std2grp_fk
foreign key (grp_id)
references groups(grp_id)
);
-- Function to load Groups and Students
create or replace function establish_students()
returns void
language sql
as $$
insert into groups (name)
select distinct group_name
from csv_load_temp
on conflict (grp_key) do nothing;
insert into students (name, grp_id)
select student_name, grp_id
from csv_load_temp t
join groups grp
on (grp.name = t.group_name);
$$;
The groups table requires Postgres v12. For prior versions, remove the grp_key column and put the unique constraint directly on the name column. What to do about capitalization is up to your business logic.
See the fiddle for a full example. Obviously, the two inserts in the establish_students function can be run standalone and independently; in that case the function itself is not necessary.
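A hypothetical end-to-end run of the above (the CSV path is a placeholder):
-- load the raw CSV into the staging table
COPY csv_load_temp (junk_num, student_name, group_name)
FROM '/path/to/students.csv' CSV HEADER;

-- populate groups and students from the staged rows
SELECT establish_students();

-- verify the result
SELECT s.std_id, s.name AS student, g.name AS group_name
FROM students s
JOIN groups g USING (grp_id);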

Data not syncing from MySQL to Elasticsearch after processing through Kafka

We are trying to send data from MySQL to Elasticsearch (ETL) through Kafka.
In MySQL we have multiple tables which we need to aggregate into a specific format before we can send them to Elasticsearch.
For that, we used Debezium to connect to MySQL and Elasticsearch, and transformed the data through ksqlDB.
We created streams for both tables, then repartitioned them and created a table for one entity, but after joining we didn't get the data from both tables.
We are trying to join two MySQL tables through ksqlDB and send the result to Elasticsearch using Debezium.
Table 1: items
Table 2: item_images
CREATE STREAM items_from_debezium (id integer, tenant_id integer, name string, sku string, barcode string, qty integer, type integer, archived integer)
WITH (KAFKA_TOPIC='mySqlTesting.orderhive.items',VALUE_FORMAT='json');
CREATE STREAM images_from_debezium (id integer,item_id integer,image string, thumbnail string)
WITH (KAFKA_TOPIC='mySqlTesting.orderhive.item_images',VALUE_FORMAT='json');
CREATE STREAM items_flat
WITH (KAFKA_TOPIC='ITEMS_REPART',VALUE_FORMAT='json',PARTITIONS=1) as SELECT * FROM items_from_debezium PARTITION BY id;
CREATE STREAM images_flat
WITH (KAFKA_TOPIC='IMAGES_REPART',VALUE_FORMAT='json',PARTITIONS=1) as SELECT * FROM images_from_debezium PARTITION BY item_id;
CREATE TABLE item_images (id integer,item_id integer,image string, thumbnail string)
WITH (KAFKA_TOPIC='IMAGES_REPART',VALUE_FORMAT='json',KEY='item_id');
SELECT item_images.id,item_images.image,item_images.thumbnail,items_flat.id,items_flat.name,items_flat.sku,items_flat.barcode,items_flat.type,items_flat.archived,items_flat.qty
FROM items_flat left join item_images on items_flat.id=item_images.item_id
limit 10;
We are expecting data from both tables, but we are getting null for the item_images columns.