I am trying to convert this update statement to Spark SQL:
update table1
set col1 = t2.col1,
col2=1
from
table1 t1
inner join
table2 t2
on t1.id = t2.id
Where right(t2.col3,2)<>'00'
and t1.col3>0
I used the following code, but the output is missing data.
I converted t1 and t2 from Spark DataFrames to temporary views:
spark.sql("""select t2.col1,
1 as col2,
t1.col3
from
table1 t1
inner join
table2 t2
on t1.id = t2.id
Where right(t2.col3,2)<>'00'
and t1.col3>0 """).createOrReplaceTempView("outputTable")
I am thinking of selecting the remaining data into a temp view and then taking the union of the two outputs, but I am not sure that is the proper way to deal with this.
Using Spark 3.2.1.
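One way to avoid the extra view and the union is to keep every row of table1 with a LEFT JOIN and let a CASE expression decide whether the "updated" or the original value is used. This is only a sketch under assumptions: the column types and sample rows are invented, id is assumed to be unique in table2 (otherwise the join duplicates rows), and Python's built-in sqlite3 module stands in for Spark so the example runs without a cluster. In Spark the same SELECT would go inside spark.sql(...), with right(t2.col3, 2) in place of substr(t2.col3, -2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas and rows, only for illustration.
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, col1 TEXT, col2 INTEGER, col3 INTEGER);
    CREATE TABLE table2 (id INTEGER, col1 TEXT, col3 TEXT);
    INSERT INTO table1 VALUES
        (1, 'old1', 0, 5), (2, 'old2', 0, 5), (3, 'old3', 0, 0), (4, 'old4', 0, 7);
    INSERT INTO table2 VALUES
        (1, 'new1', '1201'), (2, 'new2', '1200'), (3, 'new3', '1201');
""")

# The UPDATE's WHERE conditions move into the ON clause, so rows of table1
# that do not qualify are kept (with t2.* NULL) instead of being dropped.
rows = conn.execute("""
    SELECT t1.id,
           CASE WHEN t2.id IS NOT NULL THEN t2.col1 ELSE t1.col1 END AS col1,
           CASE WHEN t2.id IS NOT NULL THEN 1       ELSE t1.col2 END AS col2,
           t1.col3
    FROM table1 t1
    LEFT JOIN table2 t2
           ON t1.id = t2.id
          AND substr(t2.col3, -2) <> '00'
          AND t1.col3 > 0
    ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```

Only the row with id 1 satisfies all three conditions here, so it is the only one that picks up the new col1 and col2 = 1; the other rows pass through unchanged.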
I am learning SQL (Postgres) and I'd like to find values that do not exist.
I have a table, table1, with ids, and I want to find those ids that are not in table4.
I have to join three tables, because table1 holds id and table4 holds contact_id (not the same number).
Tables 2 and 3 need to be joined because they connect the ids.
So how do I do that with "not exists"?
Select t1.id, table4.contact_id
From table1 t1
Join table2 using(id)
Join table3 using(id)
Join table4 using(contact_id)
Where not exists (
Select 1
From table4
Where table4.contact_id=t1.id
);
It returns no rows, but it should.
There is no error message.
I assume I have an error in my thinking.
Your query probably returns no values because you join table4 on contact_id and then you exclude in the WHERE clause the rows which come from this join.
To find values that don't exist, you can usually use LEFT JOIN or RIGHT JOIN or FULL OUTER JOIN and then filter the rows with NULL values in the WHERE clause.
Try this:
SELECT t1.id
FROM table1 t1
LEFT JOIN table2 t2 using(id)
LEFT JOIN table3 t3 using(id)
LEFT JOIN table4 t4 using(contact_id)
WHERE t4.contact_id IS NULL
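To see the LEFT JOIN ... IS NULL pattern next to a NOT EXISTS version, here is a small sketch. The schemas are assumptions (the question does not show them): table3 is taken to be the table that maps id to contact_id, and the sample rows are made up. It uses Python's built-in sqlite3 module rather than Postgres so it can be run as-is; both queries use only syntax that Postgres also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas: table3 links id to contact_id.
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER, contact_id INTEGER);
    CREATE TABLE table4 (contact_id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (2), (3);
    INSERT INTO table3 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO table4 VALUES (10), (30);
""")

# Variant 1: outer join, then keep the rows that found no partner in table4.
left_join_ids = conn.execute("""
    SELECT t1.id
    FROM table1 t1
    LEFT JOIN table2 t2 USING (id)
    LEFT JOIN table3 t3 USING (id)
    LEFT JOIN table4 t4 USING (contact_id)
    WHERE t4.contact_id IS NULL
    ORDER BY t1.id
""").fetchall()

# Variant 2: NOT EXISTS. table4 is NOT joined in the outer query, and the
# subquery compares contact_id with contact_id (not with t1.id).
not_exists_ids = conn.execute("""
    SELECT t1.id
    FROM table1 t1
    JOIN table2 t2 USING (id)
    JOIN table3 t3 USING (id)
    WHERE NOT EXISTS (
        SELECT 1 FROM table4 t4 WHERE t4.contact_id = t3.contact_id
    )
    ORDER BY t1.id
""").fetchall()

print(left_join_ids, not_exists_ids)
```

With this data both variants return only id 2, whose contact_id 20 has no row in table4.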
I'm currently migrating from SQL Server to PostgreSQL and got confused by the update query in Postgres.
I have query like this in SQL Server:
UPDATE t1
SET col1 = 'xx'
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
WHERE t2.id is null
How do you do this in Postgres?
Thanks in advance.
The left join is used to simulate a "NOT EXISTS" condition, so you can rewrite it to:
update table1 t1
set col1 = 'xx'
where not exists (select *
from table2 t2
where t1.id = t2.id);
As a side note: Postgres works differently than SQL Server and in general you should not repeat the target table of an UPDATE statement in the FROM clause in Postgres.
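A quick way to check that the NOT EXISTS form touches only the rows without a partner is to run it on a few rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for Postgres, with invented sample data; the target table is referenced by its full name instead of an alias, which is an adjustment made for this example and not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up rows: ids 1 and 3 have a partner in table2, id 2 does not.
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, col1 TEXT);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (1), (3);
""")

# The target table appears only once, in the UPDATE itself; the correlated
# subquery plays the role of the LEFT JOIN ... IS NULL filter.
conn.execute("""
    UPDATE table1
    SET col1 = 'xx'
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 t2 WHERE t2.id = table1.id
    )
""")

updated = conn.execute("SELECT id, col1 FROM table1 ORDER BY id").fetchall()
print(updated)
```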
I want to copy rows from one table t2 to another t1, while excluding rows with values that already exist in t1. The usual approach of 'NOT IN' works fine, but only as long as there are no multiple occurrences of the same value in the source table t2.
Now, assuming I have two tables with the schema:
CREATE TABLE t1 ( id INTEGER );
CREATE TABLE t2 ( id INTEGER );
then insert data into them like:
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (1);
INSERT INTO t2 VALUES (2);
Now, I try to insert all data from t2 into t1 but exclude pre-existing in t1:
INSERT INTO t1 (id) SELECT t2.id FROM t2
WHERE t2.id NOT IN ( SELECT t1.id FROM t1 WHERE t1.id = t2.id );
it works flawlessly; the row in t2 with the value '1' did not get inserted a second time into t1:
SELECT * FROM t1;
id
----
1
2
(2 rows)
But when there are multiple occurrences of the same value in t2, it doesn't check whether they exist in t1 for each individual insert, but, as it seems, once for the whole statement. Let's continue with my example:
DELETE FROM t1;
INSERT INTO t2 VALUES (2);
SELECT * FROM t2;
id
----
1
2
2
(3 rows)
INSERT INTO t1 (id) SELECT t2.id FROM t2
WHERE t2.id NOT IN ( SELECT t1.id FROM t1 WHERE t1.id = t2.id );
SELECT * FROM t1;
id
----
1
2
2
(3 rows)
The same result is achieved with WHERE NOT EXISTS as well.
Does anyone have an idea how to check for existing values in t1 at the individual row level to prevent multiple occurrences?
I could use ON CONFLICT DO ... as well, but I would rather not, since the idea is to split the data coming from t2 into a "clean" t1 and a "dirty" t1_faulty that collects all rows which do not fit some given criteria (one of which is the uniqueness of id, which is what this question is about).
I think you could simply filter the records you want from the source table (t2).
You might use DISTINCT ON:
INSERT INTO t1 (id) SELECT distinct on (t2.id) t2.id FROM t2
WHERE t2.id NOT IN ( SELECT t1.id FROM t1 WHERE t1.id = t2.id );
or group by
INSERT INTO t1 (id) SELECT t2.id FROM t2
WHERE t2.id NOT IN ( SELECT t1.id FROM t1 WHERE t1.id = t2.id ) group by t2.id;
or, if you want only the records that are already unique in t2, add HAVING count(...) = 1:
INSERT INTO t1 (id) SELECT t2.id FROM t2
WHERE t2.id NOT IN ( SELECT t1.id FROM t1 WHERE t1.id = t2.id )
group by t2.id
having count(t2.id) = 1
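The reason the duplicates slip through is that the subquery sees t1 as it was when the INSERT statement started, so both copies of 2 pass the check. Below is a rough sketch of the GROUP BY / HAVING variant that also routes the duplicated ids into t1_faulty, as described in the question. It runs on Python's built-in sqlite3 module instead of Postgres, the sample rows are invented, and the second INSERT (one row per duplicated id) is an assumption about what the "dirty" table should contain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented data: 3 already exists in t1, 2 occurs twice in t2.
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t1_faulty (id INTEGER);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (3);
    INSERT INTO t2 VALUES (1), (2), (2), (3);
""")

# Clean rows: not yet in t1 AND occurring exactly once in t2.
conn.execute("""
    INSERT INTO t1 (id)
    SELECT t2.id
    FROM t2
    WHERE NOT EXISTS (SELECT 1 FROM t1 WHERE t1.id = t2.id)
    GROUP BY t2.id
    HAVING count(*) = 1
""")

# Dirty rows (assumed rule): one entry per id that is duplicated in t2.
conn.execute("""
    INSERT INTO t1_faulty (id)
    SELECT t2.id
    FROM t2
    GROUP BY t2.id
    HAVING count(*) > 1
""")

clean = [r[0] for r in conn.execute("SELECT id FROM t1 ORDER BY id")]
faulty = [r[0] for r in conn.execute("SELECT id FROM t1_faulty ORDER BY id")]
print(clean, faulty)
```

Here t1 ends up with 1 and 3, and t1_faulty with 2.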
Is there such a thing as a conditional join:
SELECT *
FROM TABLE1 A
IF (a=='TABLE2') THEN INNER JOIN TABLE2 B ON A.item_id=B.id
ELSE IF (a=='TABLE3') THEN INNER JOIN TABLE3 C ON A.item_id=C.id
where a is a field in TABLE1.
I would like to use this in stored procedures without using dynamic SQL (without writing the query as a string and calling EXEC(@query)).
EDIT: I can't write:
IF (a=='TABLE2') THEN queryA
ELSE IF (a=='TABLE3') THEN queryB
because a is a field of TABLE1.
EDIT: Modified answer based on comment below:
You could try to get clever with some left joins. This will return more columns, so you'd probably want to be more discriminating than just SELECT *.
SELECT *
FROM TABLE1 A
LEFT JOIN TABLE2 B
ON A.item_id = B.id
AND A.a = 'TABLE2'
LEFT JOIN TABLE3 C
ON A.item_id = C.id
AND A.a = 'TABLE3'
WHERE (B.id IS NOT NULL AND A.a = 'TABLE2')
OR (C.id IS NOT NULL AND A.a = 'TABLE3')
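To make the effect of the two conditional LEFT JOINs concrete, here is a small sketch. The column name is hypothetical (the question does not list the columns of TABLE2 or TABLE3), the rows are invented, and Python's built-in sqlite3 module is used in place of SQL Server so the example is runnable. COALESCE picks the value from whichever branch matched, which is one way to be "more discriminating than SELECT *".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns: TABLE2/TABLE3 each carry a "name" to return.
conn.executescript("""
    CREATE TABLE TABLE1 (id INTEGER, a TEXT, item_id INTEGER);
    CREATE TABLE TABLE2 (id INTEGER, name TEXT);
    CREATE TABLE TABLE3 (id INTEGER, name TEXT);
    INSERT INTO TABLE1 VALUES (1, 'TABLE2', 10), (2, 'TABLE3', 10), (3, 'TABLE2', 99);
    INSERT INTO TABLE2 VALUES (10, 'from2');
    INSERT INTO TABLE3 VALUES (10, 'from3');
""")

# Each LEFT JOIN only matches when A.a names that table; the WHERE clause
# then restores INNER JOIN semantics by dropping rows with no match at all.
result = conn.execute("""
    SELECT A.id, COALESCE(B.name, C.name) AS name
    FROM TABLE1 A
    LEFT JOIN TABLE2 B
           ON A.item_id = B.id AND A.a = 'TABLE2'
    LEFT JOIN TABLE3 C
           ON A.item_id = C.id AND A.a = 'TABLE3'
    WHERE (B.id IS NOT NULL AND A.a = 'TABLE2')
       OR (C.id IS NOT NULL AND A.a = 'TABLE3')
    ORDER BY A.id
""").fetchall()

print(result)
```

Row 3 points at TABLE2 but has no matching item there, so it is filtered out just as an inner join would do.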
Updated the query as required:
SELECT A.* -- add the TABLE2 columns you need; both branches must return the same columns
FROM TABLE1 A
INNER JOIN TABLE2 B
ON A.a = 'TABLE2' -- this eliminates the TABLE1 rows whose A.a is not 'TABLE2'
AND A.item_id = B.id
UNION ALL
SELECT A.* -- add the matching TABLE3 columns here
FROM TABLE1 A
INNER JOIN TABLE3 C
ON A.a = 'TABLE3' -- this eliminates the TABLE1 rows whose A.a is not 'TABLE3'
AND A.item_id = C.id
I wrote this, but the syntax is wrong. Please help me fix it: I want 'T' to be an alias for the result of the two inner joins.
select T.id
from table1
inner join table2 on table1.x = table2.y
inner join table3 on table3.z = table1.w as T;
You cannot use an alias to name the "entire" join. You can, however, put aliases on the individual tables of the join:
select t1.id
from table1 t1
inner join table2 t2 on t1.x = t2.y
inner join table3 t3 on t3.z = t1.w
In the projection, you will have to use the alias of the table, which defines the id column you are going to select.
You can't directly name the result of a join. One option is to use a subquery:
select T.id
from (
select *
from table1
inner join table2 on table1.x = table2.y
inner join table3 on table3.z = table1.w
) T
Another option is subquery factoring:
with T as (
select *
from table1
inner join table2 on table1.x = table2.y
inner join table3 on table3.z = table1.w
)
select T.id
from T
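Here is a runnable sketch of the WITH variant. The columns and rows are invented, and Python's built-in sqlite3 module is used only because it needs no server. One deliberate difference from the queries above: the CTE lists its columns explicitly instead of using select *, since T.id becomes ambiguous in some databases when more than one of the joined tables has an id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns chosen to match the join conditions in the question.
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, x INTEGER, w INTEGER);
    CREATE TABLE table2 (y INTEGER);
    CREATE TABLE table3 (z INTEGER);
    INSERT INTO table1 VALUES (1, 5, 7), (2, 6, 8);
    INSERT INTO table2 VALUES (5), (6);
    INSERT INTO table3 VALUES (7);
""")

# T names the result of both inner joins; the outer query sees it as a table.
ids = conn.execute("""
    WITH T AS (
        SELECT table1.id AS id, table2.y, table3.z
        FROM table1
        INNER JOIN table2 ON table1.x = table2.y
        INNER JOIN table3 ON table3.z = table1.w
    )
    SELECT T.id
    FROM T
    ORDER BY T.id
""").fetchall()

print(ids)
```

Only the first row of table1 finds a partner in both table2 and table3, so only its id comes back.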