N-Way table joins in ksqlDB - apache-kafka

I have three ksqlDB tables, whose relationships are illustrated in the picture below. I need to join them.
This query will result in an error:
CREATE TABLE `reviewer-email-user`
AS SELECT *
FROM USER
INNER JOIN REVIEWER ON USER.USERID = REVIEWER.USERID
INNER JOIN EMAILADDRESS ON USER.USERID = EMAILADDRESS.USERID
EMIT CHANGES;
And the error is:
Could not determine output schema for query due to error: Invalid join condition: table-table joins require to join on the primary key of the right input table. Got USER.USERID = REVIEWER.USERID
So, how do I join these three ksqlDB tables? Thank you.

Table-table joins expect the join condition to have the primary key on the right side; therefore, the following will not work:
ksql> CREATE TABLE reviewer_user
> AS SELECT *
> FROM REVIEWER
> INNER JOIN USER ON USER.user_id = REVIEWER.user_id
>EMIT CHANGES;
Could not determine output schema for query due to error: Cannot add table 'REVIEWER_USER': A table with the same name already exists
Statement: CREATE TABLE REVIEWER_USER WITH (KAFKA_TOPIC='REVIEWER_USER', PARTITIONS=2, REPLICAS=1) AS SELECT *
FROM REVIEWER REVIEWER
INNER JOIN USER USER ON ((USER.USER_ID = REVIEWER.USER_ID))
EMIT CHANGES;
However, the following query does work (note that I have flipped the left and right sides of the join condition):
ksql> CREATE TABLE reviewer_user
> AS SELECT *
> FROM REVIEWER
> INNER JOIN USER ON REVIEWER.user_id = USER.user_id
>EMIT CHANGES;
Message
---------------------------------------------
Created query with ID CTAS_REVIEWER_USER_11
---------------------------------------------
Another limitation of table-table joins is that they don't support n-way joins, so you would have to create two new tables (reviewer_user and email_user) by performing joins as suggested above, and then join those two tables to get your final result.

Related

Row level security - Update Rows

Hi, I am working with Postgres. I have one role, "my_role", and I want it to update records in one table only where my corporate_id is related to another table.
I want to create a policy on the person table, and I have a corporate_id from my corporate table to drive it. The query to get this information would be something like this:
SELECT * FROM person p
INNER JOIN person_brand a ON p.person_id=a.person_id
INNER JOIN brand b ON a.brand_id=b.brand_id
INNER JOIN corporate c on b.corporate_id=c.corporate_id
WHERE c.corporate_id=corporate_id
My policy would be something like this:
ALTER TABLE core.person ENABLE ROW LEVEL SECURITY;
CREATE POLICY person_corporation_all
ON person
AS PERMISSIVE
FOR UPDATE
TO "my_role"
USING (EXISTS(SELECT 1 FROM person p
INNER JOIN person_brand a ON p.person_id=a.person_id
INNER JOIN brand b ON a.brand_id=b.brand_id
INNER JOIN corporate c on b.corporate_id=c.corporate_id
WHERE c.corporate_id=corporate_id));
But it gives me this error:
ERROR: column reference "corporate_id" is ambiguous
SQL state: 42702
What do I need to pass as a variable into my query?
Regards
You will have a nested policy because the person table appears again inside the check. You need to remove it and refer to the columns using the table name person, for example:
CREATE POLICY person_corporation_all
ON person
AS PERMISSIVE
FOR UPDATE
TO "my_role"
USING (EXISTS(SELECT 1 FROM person_brand a
INNER JOIN brand b ON a.brand_id=b.brand_id
INNER JOIN corporate c on b.corporate_id=c.corporate_id
WHERE a.person_id=person.person_id and c.corporate_id=person.corporate_id));

"Spectrum nested query error" Redshift error

When I run this query in Redshift:
select sd.device_id
from devices.s_devices sd
left join devices.c_devices cd
on sd.device_id = cd.device_id
I get an error like this:
ERROR: Spectrum nested query error
DETAIL:
-----------------------------------------------
error: Spectrum nested query error
code: 8001
context: A subquery that refers to a nested table cannot refer to any other table.
query: 0
location: nested_query_rewriter.cpp:726
process: padbmaster [pid=6361]
-----------------------------------------------
I'm not too sure what this error means. I'm only joining to one table, so I'm not sure which "other table" it's referring to, and I can't find much info about this error on the web.
I've noticed if I change it from left join to join, the error goes away, but I do need to do a left join.
Any ideas what I'm doing wrong?
Redshift reference mentions:
If a FROM clause in a subquery refers to a nested table, it can't refer to any other table.
In your example, you're trying to join two nested columns in one statement.
I would try to first unnest them separately and only then join:
with
s_dev as (select sd.device_id from devices.s_devices sd),
c_dev as (select cd.device_id from devices.c_devices cd)
select
c_dev.device_id
from c_dev
left join s_dev
on s_dev.device_id = c_dev.device_id
The solution that worked for me was to create a temporary table with the nested table's data and then join the temp table with the other tables I needed.
For example, if the nested table is spectrum.customers, the solution will be:
DROP TABLE IF EXISTS temp_spectrum_customers;
CREATE TEMPORARY TABLE
temp_spectrum_customers AS
SELECT c.id, o.shipdate, c.customer_id
FROM spectrum.customers c,
c.orders o;
SELECT tc.id, tc.shipdate, tc.customer_id, d.delivery_carrier
FROM temp_spectrum_customers tc
LEFT OUTER JOIN orders_delivery d on tc.id = d.order_id;

HQL update column with COUNT

I'm working with Hibernate thus HQL, linked to a PostgreSQL database.
I have a table users and a table teams that are linked by a ManyToMany relationship through the table teams_users.
I'd like to update or select the table team so that the property usersCount holds the number of users belonging to a team.
I do not want to add a @Formula to my entity class, because I don't want it to be executed all the time; that's too wasteful on big JOIN FETCH queries where I do not need the count.
In other words, I'd like to find the HQL equivalent of the following PSQL query
UPDATE teams t
SET users_count = (SELECT COUNT(tu.*)
FROM teams t1
LEFT JOIN teams_users tu
ON t1.id = tu.team_id
WHERE t1.id = t.id
GROUP BY t1.id);
OR
An equivalent of the following
SELECT t.*, count(tu.*) AS users_count
FROM teams t
LEFT JOIN teams_users tu
ON t.id = tu.team_id
GROUP BY t.id;
Unsuccessful tries (to get an idea)
UPDATE Team t SET
t.usersCount = COUNT(t.users)
UPDATE Team t SET
t.usersCount = (SELECT COUNT(t1.users) FROM Team t1 WHERE t1.id = t.id)
SELECT t, count(t.users) AS t.usersCount
FROM Team t
I've found the solution for the UPDATE query.
It simply is
UPDATE Team t
SET t.usersCount = (SELECT COUNT(u) from t.users u)
It makes an extra join on the table users, whereas the table teams_users would be enough, but well... it works.
If anyone has the solution for the SELECT one, I'm still curious!

SQL Natural Join

Okay. So the question that I got asked by the teacher was this:
(5 marks) Construct a SQL query on the dvdrental database that uses a natural join of two or more tables and an additional where condition. (E.g. find the titles of films rented by a particular customer.) Note the hints on the course news page if your query returns nothing.
Here is the layout of the database I'm working with:
http://www.postgresqltutorial.com/wp-content/uploads/2013/05/PostgreSQL-Sample-Database.png
The hint to us was this:
PostgreSQL hint:
If a natural join doesn't produce any results in the dvdrental DB, it is because many tables have the last_update timestamp field, and thus the natural join tries to join on that field as well as the intended field.
e.g.
select *
from film natural join inventory;
does not work because of this - it produces an empty table (no results).
Instead, use
select *
from film, inventory
where film.film_id = inventory.film_id;
This is what I did:
select *
from film, customer
where film.film_id = customer.customer_id;
The problem is I cannot get a particular customer.
I tried doing customer_id = 2; but it returns an error.
Really need help!
Well, it seems that you are trying to join two tables that have no direct relation to each other. That is your issue:
where film.film_id = customer.customer_id
To find which films are rented by which customer, you would have to join the customer table with rental, then with inventory, and finally with film.
The task description states
Construct a SQL query on the dvdrental database that uses a natural join of two or more tables and an additional where condition.
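As a rough sketch of that join path (not part of the original answer), the query below keeps NATURAL JOIN but first projects each table down to its key columns, so the shared last_update column cannot take part in the join. It assumes the standard dvdrental column names (customer_id, inventory_id, film_id, title); the customer id 2 is only an example value.

```sql
-- Each subquery exposes only the columns needed for the join path
-- customer -> rental -> inventory -> film, which hides last_update.
SELECT title
FROM (SELECT customer_id FROM customer) AS c
NATURAL JOIN (SELECT customer_id, inventory_id FROM rental) AS r
NATURAL JOIN (SELECT inventory_id, film_id FROM inventory) AS i
NATURAL JOIN (SELECT film_id, title FROM film) AS f
-- The additional WHERE condition picks one particular customer.
WHERE customer_id = 2;
```

Because the subqueries expose only the intended keys, each NATURAL JOIN matches on exactly one column.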

Can't disable order joins in Postgresql

PostgreSQL provides the parameter join_collapse_limit, and setting it to 1 is supposed to disable join reordering. But when I set the parameter and restart the server, the query plan does not change and the join order is still optimized. The FROM clause of my query looks like this:
FROM
student as group_A,
student as group_B,
student as group_C
WHERE ...
To disable join reordering with join_collapse_limit = 1, the query must use explicit JOIN syntax. For example, the FROM clause above should be written as:
FROM
student as group_A CROSS JOIN
student as group_B CROSS JOIN
student as group_C
WHERE ...
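One way to check the effect, sketched here under the assumption that the parameter is set for the current session only (with SET rather than by editing postgresql.conf), is to look at the EXPLAIN output for the explicit-join form. The WHERE clause is left out because the original one is not shown.

```sql
-- Session-level setting; this parameter does not need a server restart.
SET join_collapse_limit = 1;

-- With explicit JOIN syntax and join_collapse_limit = 1, the planner
-- joins the tables in the order they are written.
EXPLAIN
SELECT *
FROM student AS group_a
CROSS JOIN student AS group_b
CROSS JOIN student AS group_c;

-- Restore the default for the rest of the session.
RESET join_collapse_limit;
```

Comparing this plan with the one for the comma-separated FROM list should show whether the written join order is being kept.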