Calculating correlation coefficient using PostgreSQL?

I have worked out how to calculate the correlation coefficient between two fields if both are in the same table:
SELECT corr(column1, column2) FROM table WHERE <my filters>;
...but I can't work out how to do it when the columns are from different tables (I need to apply the same filters to both tables).
Any hints, please?

If the tables are related to one another such that you can join them, it's fairly simple. Just join them and do the correlation:
SELECT corr(t1.col1, t2.col2)
FROM table1 t1
JOIN table2 t2
ON t1.join_field = t2.join_field
WHERE
<filters for t1>
AND
<filters for t2>
If they're not, then how are you supposed to find out which combination of fields from each table you want to run corr on?
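To make the pairing idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for PostgreSQL. SQLite has no corr aggregate, so the Pearson coefficient is computed in Python from the joined pairs; in PostgreSQL you would simply wrap the two columns in corr(...). The table contents, the join_field values and the "active" filter column are all made up for illustration.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (join_field INTEGER, col1 REAL, active INTEGER);
    CREATE TABLE table2 (join_field INTEGER, col2 REAL, active INTEGER);
    INSERT INTO table1 VALUES (1, 1.0, 1), (2, 2.0, 1), (3, 3.0, 1), (4, 9.0, 0);
    INSERT INTO table2 VALUES (1, 2.0, 1), (2, 4.0, 1), (3, 6.0, 1), (4, 0.0, 1);
""")

# The join pairs each t1.col1 with the t2.col2 that shares its join_field;
# the same (hypothetical) filter is applied to both tables.
pairs = conn.execute("""
    SELECT t1.col1, t2.col2
    FROM table1 t1
    JOIN table2 t2 ON t1.join_field = t2.join_field
    WHERE t1.active = 1 AND t2.active = 1
""").fetchall()


def pearson(rows):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var_x = sum((x - mean_x) ** 2 for x, _ in rows)
    var_y = sum((y - mean_y) ** 2 for _, y in rows)
    return cov / math.sqrt(var_x * var_y)


r = pearson(pairs)
print(len(pairs), r)
```

The row with join_field 4 is dropped because it fails the filter on table1, so only three pairs feed the correlation. Without a join condition there would be no way to decide which col1 belongs with which col2.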

Try this:
SELECT corr(t1.column1, t2.column2)
FROM table1 t1
join table2 t2 on t1.SomeColumn = t2.SomeColumn
WHERE t1.<my filters>
AND t2.<my filters>;

Related

How to write a Postgres SELECT FOR UPDATE when using the EXCEPT set operator?

In Postgres (11, if it matters), I need to do a SELECT FOR UPDATE to obtain a collection of rows that I'll subsequently be doing some alterations on, and which I don't want anyone outside my transaction messing with while I do those alterations.
However, the set of rows I want to lock is actually defined by a set-difference, i.e.,
SELECT <columns> FROM table1 t1 JOIN table2 t2 ON ... WHERE ...
EXCEPT
SELECT <columns> FROM table1 t1 JOIN table3 t3 ON ... WHERE ...
I want the result-set of this set-difference to determine the set of rows that get locked; that is, those rows that are selected by the second SELECT should ideally not get locked.
But I'm not quite sure where to put the FOR UPDATE clause to achieve this. It seems like putting the FOR UPDATE immediately after either of the SELECT lines above would not give me what I want. And in fact I suspect that I can't legally put it after the first of those SELECT lines (i.e., just before the EXCEPT).
One idea that occurred to me was to parenthesize the second SELECT (the one that's the subject of the EXCEPT), so that the FOR UPDATE won't be interpreted as part of that second SELECT:
SELECT <columns> FROM table1 t1 JOIN table2 t2 ON ... WHERE ...
EXCEPT
(SELECT <columns> FROM table1 t1 JOIN table3 t3 ON ... WHERE ...)
FOR UPDATE
But I'm not sure that that gives me what I want either, even if it turns out to be syntactically acceptable.
It's possible that if I had an idea of the shape of the parse tree for a (Postgres) select statement, I could easily figure this out myself; but as it is, I'm a bit lost right now.
You cannot use FOR UPDATE together with UNION, INTERSECT or EXCEPT, because this could cause ambiguities in the general case.
I can think of two approaches:
Use EXISTS and NOT EXISTS:
SELECT ... FROM table1
WHERE EXISTS (SELECT 1 FROM table2 ...
WHERE table2.x = table1.x AND ...)
AND NOT EXISTS (SELECT 1 FROM table3 ...
WHERE table3.y = table1.y AND ...)
FOR UPDATE OF table1;
Use a subquery:
SELECT ... FROM table1
WHERE id IN (SELECT t1.id
FROM table1 t1 JOIN table2 t2 ON ...
WHERE ...
EXCEPT
SELECT t1.id
FROM table1 t1 JOIN table3 t3 ON ...
WHERE ...)
FOR UPDATE OF table1;
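As a sanity check that the EXISTS / NOT EXISTS form selects the same rows as the set difference, here is a small sketch in Python with the standard-library sqlite3 module. SQLite is only a stand-in: it has no FOR UPDATE, so the locking clause is left out and only the row selection is compared. The columns x and y, the join conditions and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER);
    CREATE TABLE table2 (x INTEGER);
    CREATE TABLE table3 (y INTEGER);
    INSERT INTO table1 VALUES (1, 10, 100), (2, 20, 200), (3, 30, 300);
    INSERT INTO table2 VALUES (10), (20);
    INSERT INTO table3 VALUES (200);
""")

# Set-difference form: rows matching table2, minus rows matching table3.
except_ids = sorted(row[0] for row in conn.execute("""
    SELECT t1.id FROM table1 t1 JOIN table2 t2 ON t2.x = t1.x
    EXCEPT
    SELECT t1.id FROM table1 t1 JOIN table3 t3 ON t3.y = t1.y
"""))

# EXISTS / NOT EXISTS form; in PostgreSQL this is the statement that
# could end with "FOR UPDATE OF table1".
exists_ids = sorted(row[0] for row in conn.execute("""
    SELECT id FROM table1
    WHERE EXISTS (SELECT 1 FROM table2 WHERE table2.x = table1.x)
      AND NOT EXISTS (SELECT 1 FROM table3 WHERE table3.y = table1.y)
"""))

print(except_ids, exists_ids)
```

Both queries return only id 1 here: id 2 matches table2 but is excluded by table3, and id 3 matches neither.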

Defining order of columns in Postgresql full join without naming all columns

I'm joining different tables with countries information, where one of them (cty) is the main table with the countries' names. All the tables have a column c, linking to the primary key in cty (also called c).
To join them all, I first used
select * from cty
full join table1 using (c)
full join table2 using (c)
This gives me all the countries in cty, but I want only the countries present in the other tables. To solve this, I tried
select * from table1
full join table2 using (c)
join cty using (c)
This solves the problem with the number of rows, but now the main columns come last in the result.
Is there a way to keep the columns from cty at the beginning (left side) of the result without specifying all the column names of all tables (I have many tables), and keep only the rows present in the secondary tables?
select * from cty
right join
(select * from table1
full join table2 using (c)
) fj using (c)
;
Check it: http://rextester.com/HCA83570

Select distinct join in HQL Request

I have a somewhat complicated query to write in HQL, and I can't get the result I want.
Here is what I want to do:
I have two entities, t1 and t2, with a OneToMany relation between them.
I want to select some information from both tables in the same query, but here is the issue: I don't want any duplicates of t1.
So basically I want 4 properties, 3 from t1 and 1 from t2, but as there are several t2 records for the same t1 object, I just want the first one from t2 so that no t1 record is duplicated.
Here is what I did:
SELECT DISTINCT(t1.a , t1.b, t1.c, t2.z) FROM t1 LEFT JOIN t2
Obviously, that worked when I did not need any t2 property, but now I get duplicated (a, b, c) records, one for each different t2.z.
I can't find any way to do it in HQL (I can't use a SELECT with LIMIT 1, which could work in SQL).
Does anybody have an idea how to solve this?
Thanks

How to select distinct-columns along with one nondistinct-column in DB2?

I need to perform a distinct select on a few columns, of which one column is non-distinct. Can I specify which columns make up the distinct group in my SQL statement?
Currently I am doing this:
Select distinct a,b,c,d from TABLE_1 inner join TABLE_2 on TABLE_1.a = TABLE_2.a where TABLE_2.d IS NOT NULL;
The problem is that I get 2 rows for the above SQL because column d holds different values. How can I form a distinct group of columns (a, b and c), ignoring column d, but still have column d in my select clause?
FYI: I am using DB2
Thanks
Sandeep
SELECT table_1.a, b, c, MAX(d)
FROM table_1
INNER JOIN table_2 ON table_1.a = table_2.a
WHERE table_2.d IS NOT NULL
GROUP BY table_1.a, b, c
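A quick way to see what the GROUP BY version does differently from DISTINCT is to run both on toy data. The sketch below uses Python's standard-library sqlite3 module rather than DB2, and the table layout (b and c living in table_1, d in table_2) and the sample rows are assumptions. Column a is qualified because both hypothetical tables have it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (a INTEGER, b TEXT, c TEXT);
    CREATE TABLE table_2 (a INTEGER, d INTEGER);
    INSERT INTO table_1 VALUES (1, 'b1', 'c1'), (2, 'b2', 'c2');
    INSERT INTO table_2 VALUES (1, 5), (1, 7), (2, 3), (2, NULL);
""")

# DISTINCT over all four columns: one row per distinct d value.
distinct_rows = conn.execute("""
    SELECT DISTINCT table_1.a, b, c, d
    FROM table_1 INNER JOIN table_2 ON table_1.a = table_2.a
    WHERE table_2.d IS NOT NULL
""").fetchall()

# GROUP BY (a, b, c): one row per group, with d collapsed by MAX.
grouped_rows = conn.execute("""
    SELECT table_1.a, b, c, MAX(d)
    FROM table_1 INNER JOIN table_2 ON table_1.a = table_2.a
    WHERE table_2.d IS NOT NULL
    GROUP BY table_1.a, b, c
    ORDER BY table_1.a
""").fetchall()

print(len(distinct_rows), grouped_rows)
```

MAX is an arbitrary choice of which d to keep; MIN or any other aggregate works the same way, as long as every selected column is either in the GROUP BY or aggregated.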
Well, your question, even with refinements, is still pretty general. So, you get a general answer.
Without knowing more about your table structure or your desired results, it may be impossible to give a meaningful answer, but here goes:
SELECT t1.a, b, c, d
FROM table_1 as t1
JOIN table_2 as t2
ON t2.a = t1.a
AND t2.[some_timestamp_column] = (SELECT MAX(t3.[some_timestamp_column])
FROM table_2 as t3
WHERE t3.a = t2.a)
This assumes that table_1 is populated with single rows to retrieve, and that the one-to-many relationship between table_1 and table_2 is created because of different values of d, populated at unique [some_timestamp_column] times. If this is the case, it will get the most-recent table_2 record that matches to table_1.

GROUP BY in UPDATE FROM clause

I really need to do something like this:
UPDATE table t1
SET column1=t2.column1
FROM table t2
INNER JOIN table t3
USING (column2)
GROUP BY t1.column2;
But Postgres says I have a syntax error at the GROUP BY clause. What is a different way to do this?
The UPDATE statement does not support GROUP BY, see the documentation. If you're trying to update t1 with the corresponding row from t2, you'd want to use the WHERE clause something like this:
UPDATE table t1 SET column1=t2.column1
FROM table t2
JOIN table t3 USING (column2)
WHERE t1.column2=t2.column2;
If you need to group the rows from t2/t3 before assigning to t1, you'd need to use a subquery something like this:
UPDATE table t1 SET column1=sq.column1
FROM (
SELECT t2.column1, column2
FROM table t2
JOIN table t3 USING (column2)
GROUP BY column2
) AS sq
WHERE t1.column2=sq.column2;
Although, as formulated, that won't work because t2.column1 isn't included in the GROUP BY clause (it would have to be wrapped in an aggregate function rather than being a simple column reference).
Otherwise, what exactly are you trying to do here?
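For what it's worth, here is a runnable sketch of the subquery approach with the column wrapped in an aggregate. It uses Python's standard-library sqlite3 module as a stand-in for PostgreSQL and assumes the bundled SQLite is 3.33 or newer, which is when UPDATE ... FROM was added. The table names target, source and other, the sample data, and the choice of MAX as the aggregate are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (column1 INTEGER, column2 INTEGER);
    CREATE TABLE source (column1 INTEGER, column2 INTEGER);
    CREATE TABLE other  (column2 INTEGER);
    INSERT INTO target VALUES (0, 1), (0, 2), (0, 3);
    INSERT INTO source VALUES (10, 1), (15, 1), (20, 2);
    INSERT INTO other  VALUES (1), (2);

    -- Group in a subquery first, then join the grouped result to the
    -- table being updated. MAX() makes column1 legal under GROUP BY.
    UPDATE target
    SET column1 = sq.column1
    FROM (
        SELECT MAX(t2.column1) AS column1, column2
        FROM source t2
        JOIN other t3 USING (column2)
        GROUP BY column2
    ) AS sq
    WHERE target.column2 = sq.column2;
""")

rows = conn.execute(
    "SELECT column1, column2 FROM target ORDER BY column2"
).fetchall()
print(rows)
```

The row with column2 = 3 keeps its old value because the grouped subquery has no matching row for it. In PostgreSQL the same statement can alias the updated table (UPDATE target t1 ... WHERE t1.column2 = sq.column2).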
In MariaDB/MySQL, this SQL works:
UPDATE table t1 left join (
SELECT t2.column1, column2
FROM table t2
JOIN table t3 USING (column2)
GROUP BY column2
) AS sq on t1.column2=sq.column2
SET column1=sq.column1;