PostgreSQL query over several very large tables with the same columns: how to optimize it and its code - postgresql

I am running the following "simple query" over tables a1, a2, ..., a20. Each of these tables has millions of rows, and all of them have the same columns X, Y, Z.
CREATE TABLE A_bis as
SELECT
X, Y, Z
FROM a1
WHERE
Y= 3
UNION
SELECT
X, Y, Z
FROM a2
WHERE
Y= 3
UNION
SELECT
X, Y, Z
FROM a3
WHERE
Y= 3
UNION
...
SELECT
X, Y, Z
FROM a20
WHERE
Y= 3
and I get table A_bis, but it takes at least 20 minutes.
I'd like to:
a) optimize the query so it is faster.
b) improve the code (a loop?) so I don't have to literally write 7 lines for each of the tables a1, ..., a20 and end up with 130 lines of code

The comments answered your question A (basically: add an index on each aX table).
For the question B, you can use PostgreSQL inheritance:
CREATE TABLE aParent (x INT, y INT, z INT);
ALTER TABLE a1 INHERIT aParent;
ALTER TABLE a2 INHERIT aParent;
...
ALTER TABLE a20 INHERIT aParent;
Then you can do
SELECT X, Y, Z FROM aParent WHERE Y = 3;
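If changing the schema is not an option, the repetitive UNION text can also be generated instead of typed. Below is a minimal sketch in Python. It runs against an in-memory SQLite database purely so that it is self-contained; the helper name build_union_query and the three tiny tables are made up for illustration, and a PostgreSQL driver would use its own placeholder style.

```python
import sqlite3

def build_union_query(tables, op="UNION"):
    """Build one SELECT per table and join them with the given set operator."""
    # Table names are interpolated into the SQL text, so they must be trusted.
    selects = [f"SELECT X, Y, Z FROM {name} WHERE Y = ?" for name in tables]
    return f"\n{op}\n".join(selects)

# Hypothetical miniature setup: three tiny tables instead of twenty large ones.
tables = [f"a{i}" for i in range(1, 4)]
conn = sqlite3.connect(":memory:")
for name in tables:
    conn.execute(f"CREATE TABLE {name} (X INTEGER, Y INTEGER, Z INTEGER)")
    conn.execute(f"INSERT INTO {name} VALUES (1, 3, 10), (2, 4, 20)")

query = build_union_query(tables)
# One bound parameter per generated SELECT.
rows = conn.execute(query, [3] * len(tables)).fetchall()

# UNION removes duplicate rows across tables; UNION ALL keeps them.
all_rows = conn.execute(
    build_union_query(tables, "UNION ALL"), [3] * len(tables)
).fetchall()
```

Note that UNION has to de-duplicate the combined result, while UNION ALL does not. Whether UNION ALL is acceptable depends on whether duplicate rows across the aX tables matter for A_bis.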

Related

PostgreSQL - How to use index for this kind of query

We got this query:
SELECT * FROM table WHERE A AND ( B = X1 OR B = X2 ) AND ( C = X3 OR D = TRUE ) AND E = 0;
I created this index:
CREATE INDEX _my_index ON public.table USING btree (A, B, C, D, E);
But I don't get any better performances ... how to deal with such queries for indexing ?
Thank you !
I'll assume that X1, X2 and X3 are constants and not table columns.
You won't be able to index C = X3 OR D = TRUE: OR is a frequent source of performance problems.
The condition B = X1 OR B = X2 should be rewritten to B IN (X1, X2).
Then this is the best index:
CREATE INDEX ON "table" (e, a, b);
If you always want to query for truth of a and e = 0, a partial index would be even better:
CREATE INDEX ON "table" (b) WHERE a AND e = 0;
If you need to index the conditions on c and d as well, and the table has a primary key, you can rewrite the query to:
SELECT * FROM "table"
WHERE a AND b IN (X1, X2) AND c = X3 AND e = 0
UNION
SELECT * FROM "table"
WHERE a AND b IN (X1, X2) AND d AND e = 0;
For this query, the following two indexes are recommended:
CREATE INDEX ON "table" (c, a, e, b);
CREATE INDEX ON "table" (e, a, d, b);
Again, you can move certain index columns into a WHERE condition if you always query for a certain value.
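To convince yourself that the UNION rewrite returns the same rows as the original OR condition, here is a small sketch. It uses Python with an in-memory SQLite database only to stay self-contained; the table name t, the constants 10, 20 and 7 (standing in for X1, X2 and X3) and the sample rows are all made up. The primary key matters: it keeps every row distinct, so UNION only removes the rows that matched both branches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table t with a primary key; booleans are stored as 0/1.
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT, c INT, d INT, e INT)"
)
conn.executemany(
    "INSERT INTO t (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 7, 0, 0),  # matches through c = 7
        (1, 20, 0, 1, 0),  # matches through d
        (1, 10, 7, 1, 0),  # matches both branches, must appear only once
        (1, 30, 7, 1, 0),  # b is not in the list
        (0, 10, 7, 1, 0),  # a is false
        (1, 10, 7, 1, 5),  # e is not 0
    ],
)

or_query = """
SELECT * FROM t
WHERE a AND b IN (10, 20) AND (c = 7 OR d) AND e = 0
"""
union_query = """
SELECT * FROM t WHERE a AND b IN (10, 20) AND c = 7 AND e = 0
UNION
SELECT * FROM t WHERE a AND b IN (10, 20) AND d AND e = 0
"""
or_rows = sorted(conn.execute(or_query).fetchall())
union_rows = sorted(conn.execute(union_query).fetchall())
```

Both queries return the first three sample rows. This only checks that the results are equivalent; whether the rewrite is actually faster on PostgreSQL has to be checked with EXPLAIN on the real data.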

Referring to a different column in PostgreSQL column

Consider the following image
If you want to get a result row containing all steps to get the length of the non-labeled sides, you can do the following:
SELECT
5 AS a, --side 1, triangle 1
7 AS b, --side 2, triangle 1
(5*5) AS a2, --a^2
(7*7) AS b2, --b^2
(5*5)+(7*7) AS c2, --a^2 + b^2 = c^2
SQRT((5*5)+(7*7)) AS c, --√c2 = c
19 AS d, --side 1, triangle 2
24 AS e, --side 2 triangle 2
(19*19) AS d2, --d^2
(24*24) AS e2, --e^2
(19*19)+(24*24) AS f2, --d^2 + e^2 = f^2
SQRT((19*19)+(24*24)) AS f, --√f2 = f
(5*5)+(7*7)+(19*19)+(24*24) AS g2, --c^2 + f^2 = g^2
SQRT((5*5)+(7*7)+(19*19)+(24*24)) AS g --√g2 = g
However, that is CLEARLY very ugly. I'd like to use column substitution, like:
SELECT
5 AS a, --side 1, triangle 1
7 AS b, --side 2, triangle 1
(a*a) AS a2, --a^2
(b*b) AS b2, --b^2
a2+b2 AS c2, --a^2 + b^2 = c^2
SQRT(c2) AS c, --√c2 = c
19 AS d, --side 1, triangle 2
24 AS e, --side 2 triangle 2
(d*d) AS d2, --d^2
(e*e) AS e2, --e^2
d2+e2 AS f2, --d^2 + e^2 = f^2
SQRT(f2) AS f, --√f2 = f
c2+f2 AS g2, --c^2 + f^2 = g^2
SQRT(g2) AS g --√g2 = g
Is there any easy way to do this?
PS Please don't explain how this is a ridiculous use of SQL, I know THAT! This was just the simplest way that I could reduce my problem to be understood. In my scenario, it is much more complex calculations with variables coming from many joined tables, and the results need to be inserted into a summary table with a very rigid structure. Currently, I'm bringing the results out to Node, doing the calculations there and inserting the data, but that is very VERY slow, especially since I have to go through the network to get to the database server.
This can be done using common table expressions:
with base_vars (a, b, d, e) as (
values (5, 7, 19, 24)
), var2 (a2, b2, d2, e2) as (
select a*a, b*b, d*d, e*e
from base_vars
), var3 (c2, c, f2, f) as (
select a2+b2, sqrt(a2+b2), d2+e2, sqrt(d2+e2)
from var2
), var4 (g2, g) as (
select c2+f2, sqrt(c2+f2)
from var3
)
select g
from var4;
I am not 100% sure I got all the variables right, but I think you get the idea.
Another option would be to put that into a PL/pgSQL function.
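The chained-CTE idea can be sanity-checked end to end with a small script. This is only a sketch: it runs the chain in an in-memory SQLite database through Python so that it is self-contained, registers sqrt by hand because SQLite does not always ship it, and uses made-up CTE names (squares, hyp).

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register sqrt explicitly: SQLite only provides it when built with math functions.
conn.create_function("sqrt", 1, math.sqrt)

query = """
WITH base_vars (a, b, d, e) AS (
    VALUES (5, 7, 19, 24)
), squares (a2, b2, d2, e2) AS (
    SELECT a*a, b*b, d*d, e*e FROM base_vars
), hyp (c2, f2) AS (
    SELECT a2 + b2, d2 + e2 FROM squares
)
SELECT sqrt(c2) AS c, sqrt(f2) AS f, sqrt(c2 + f2) AS g FROM hyp
"""
# Each CTE only reads from the one before it, so every value is written once.
c, f, g = conn.execute(query).fetchone()
```

Here c, f and g come out as the square roots of 74, 937 and 1011, which is what the long-hand query in the question computes.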
lateral is a bit shorter than CTEs since it is not necessary to refer to a previous CTE. Also, before PostgreSQL 12 the planner could not merge the CTEs and the main query into a single plan.
with t (a,b,d,e) as (values (5,7,19,24))
select c, f, sqrt(c2 + f * f)
from
t
cross join lateral
(select a * a, b * b, d * d, e * e) t1 (a2, b2, d2, e2)
cross join lateral
(select a2 + b2, d2 + e2) t2 (c2, f2)
cross join lateral
(select sqrt(c2), sqrt(f2)) t3 (c, f)
;
c | f | sqrt
------------------+------------------+------------------
8.60232526704263 | 30.6104557300279 | 31.7962261911693

How to get the sum of weighted 'tensor-multiplication' vectors without a loop?

I have a group of scalars and two groups of vectors, respectively:
w1, w2... wn
b1, b2... bn
c1, c2... cn
w1, w2... wn are scalars stored in w,
b1, b2... bn are stored in B and
c1, c2... cn are stored in C. How can I efficiently compute
w1*(b1*c1') + w2*(b2*c2') + ... + wn*(bn*cn')
where bi and ci are vectors, so bi*ci' is a matrix, not a scalar?
Sizes: 1 x N for w, P x N for B and Q x N for C. wi = w(i), bi = B(:, i) and ci = C(:, i)
Simply:
result = B*diag(w)*C';
If N is much bigger than P and Q, you might prefer to build the weight matrix diag(w) in sparse form with spdiags(w', 0, N, N) instead.
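To see why the one-liner equals the weighted sum of outer products, it helps to compare both on a tiny example. The sketch below does this in plain Python rather than MATLAB so it runs anywhere; the function names and the numbers are invented for illustration.

```python
def weighted_outer_sum(w, B, C):
    """Loop version: sum over i of w[i] * (b_i * c_i'), b_i = B[:, i], c_i = C[:, i]."""
    P, Q, N = len(B), len(C), len(w)
    result = [[0.0] * Q for _ in range(P)]
    for i in range(N):
        for p in range(P):
            for q in range(Q):
                result[p][q] += w[i] * B[p][i] * C[q][i]
    return result

def matmul(A, M):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(A[r][k] * M[k][c] for k in range(len(M))) for c in range(len(M[0]))]
        for r in range(len(A))
    ]

def via_diag(w, B, C):
    """Matrix version: B * diag(w) * C'."""
    N = len(w)
    D = [[w[i] if i == j else 0.0 for j in range(N)] for i in range(N)]
    Ct = [list(col) for col in zip(*C)]  # transpose of C, size N x Q
    return matmul(matmul(B, D), Ct)

w = [2.0, -1.0]
B = [[1.0, 2.0], [3.0, 4.0]]              # P = 2, N = 2
C = [[1.0, 0.0], [0.0, 1.0], [2.0, 5.0]]  # Q = 3, N = 2

loop_result = weighted_outer_sum(w, B, C)
diag_result = via_diag(w, B, C)
```

Entry (p, q) of both results is the sum over i of w(i)*B(p,i)*C(q,i), which is exactly what multiplying by diag(w) in the middle produces.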

HIVE : Not in clause

Is there any way to execute the following SQL query in HiveQL?
select * from my_table
where (a,b,c) not in (x,y,z)
where a,b,c correspond respectively to x,y,z
Thanks:)
You'll have to break these down to separate conditions:
SELECT *
FROM my_table
WHERE a != x AND b != y AND c != z
Is this what you intend?
where a <> x or b <> y or c <> z
Or this?
where a not in (x, y, z) and
b not in (x, y, z) and
c not in (x, y, z)
Or some other variation?
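The variants above do not return the same rows, which is easy to check on a few sample rows. This is a sketch in Python with an in-memory SQLite database, not Hive; the constants 1, 2, 3 stand in for x, y, z, the data is made up, and NULLs are left out on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (a INT, b INT, c INT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 2, 9), (1, 8, 9), (7, 8, 9)],
)

def select_where(predicate):
    """Run the same SELECT with a different WHERE predicate."""
    sql = f"SELECT a, b, c FROM my_table WHERE {predicate} ORDER BY a, b, c"
    return conn.execute(sql).fetchall()

# "The tuple (a, b, c) is not (1, 2, 3)" written out column by column.
not_tuple = select_where("NOT (a = 1 AND b = 2 AND c = 3)")
# By De Morgan's law this is the OR form: at least one column differs.
any_differs = select_where("a <> 1 OR b <> 2 OR c <> 3")
# The AND form is stricter: every column has to differ.
all_differ = select_where("a <> 1 AND b <> 2 AND c <> 3")
```

On this data the first two predicates keep three rows, while the AND form keeps only (7, 8, 9). So the right rewrite depends on whether "not in" is meant for the tuple as a whole or for each column separately.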

Unions script based on datasets circles

Can someone tell me if I did this correctly? I am not so sure, especially about one thing shown in the second diagram: does this green region mean values of X AND Z, or rather X OR Z?
I made some corrections to the code, but it seems that I am not using parentheses correctly. I don't know if this code is good.
-- 1
/*
// Values stored in Y, that are parts of X and Z
"Y NOT IN (Y EXCEPT (UNION OF X AND Y))"
*/
SELECT Val FROM Y
EXCEPT
SELECT Val FROM X
EXCEPT
SELECT Val FROM Z
-- 2
/*
// Values stored in Y, that are parts of X and Z
"Y NOT IN (Y EXCEPT (UNION OF X AND Y))"
*/
SELECT VAL FROM Y
INTERSECT (
SELECT Val FROM Y
EXCEPT
SELECT Val FROM X
EXCEPT
SELECT Val FROM Z
)
-- 3
/*
// Values stored in X and Z. that are not a part of Y
"(UNION OF X & Z) EXCEPT Y"
*/
SELECT VAL FROM X
UNION
SELECT VAL FROM Z
EXCEPT
SELECT VAL FROM Y
-- 4
/*
// Every value of X, and same values from Y and Z
"(Y NOT IN (Y EXCEPT (UNION OF X AND Y))) UNION X"
*/
SELECT Val FROM X
UNION(
SELECT Val FROM Y
INTERSECT
SELECT Val FROM Z)
I agree with 1,3 and 4 but 2 should be:
SELECT VAL FROM Y
EXCEPT (
SELECT Val FROM Y
EXCEPT
SELECT Val FROM X
EXCEPT
SELECT Val FROM Z
)
or alternatively:
(SELECT Val FROM Y
INTERSECT
SELECT Val FROM X)
UNION
(SELECT Val FROM Y
INTERSECT
SELECT Val FROM Z)
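A quick way to check these set expressions without a database is to model them with Python sets, since UNION, INTERSECT and EXCEPT all work on distinct values. This is only a sketch with made-up sample values; the variable names are mine.

```python
# Made-up sample data for the three circles.
X = {1, 2, 3, 4}
Y = {3, 4, 5, 6}
Z = {4, 6, 7}

# Query 1: Y EXCEPT X EXCEPT Z, evaluated left to right.
only_y = (Y - X) - Z

# Query 2, nested form: Y EXCEPT (Y EXCEPT X EXCEPT Z).
nested_except = Y - ((Y - X) - Z)
# Query 2, alternative form: (Y INTERSECT X) UNION (Y INTERSECT Z).
intersections = (Y & X) | (Y & Z)

# Query 3: (X UNION Z) EXCEPT Y.
outside_y = (X | Z) - Y

# Query 4: X UNION (Y INTERSECT Z).
x_plus_shared = X | (Y & Z)
```

With this data only_y is {5}, both forms of query 2 give {3, 4, 6}, outside_y is {1, 2, 7} and x_plus_shared is {1, 2, 3, 4, 6}. The two forms of query 2 agree for any input, because removing "the part of Y that is in neither X nor Z" from Y leaves exactly the part of Y that is in X or in Z.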