Referring to a different column in PostgreSQL column - postgresql

Consider the following image
If you want to get a result row containing all steps to get the length of the non-labeled sides, you can do the following:
SELECT
5 AS a, --side 1, triangle 1
7 AS b, --side 2, triangle 1
(5*5) AS a2, --a^2
(7*7) AS b2, --b^2
(5*5)+(7*7) AS c2, --a^2 + b^2 = c^2
SQRT((5*5)+(7*7)) AS c, --√c2 = c
19 AS d, --side 1, triangle 2
24 AS e, --side 2 triangle 2
(19*19) AS d2, --d^2
(24*24) AS e2, --e^2
(19*19)+(24*24) AS f2, --d^2 + e^2 = f^2
SQRT((19*19)+(24*24)) AS f, --√f2 = f
(5*5)+(7*7)+(19*19)+(24*24) AS g2, --c^2 + f^2 = g^2
SQRT((5*5)+(7*7)+(19*19)+(24*24)) AS g --√g2 = g
However, that is CLEARLY very ugly. I'd like to use column substitution, like:
SELECT
5 AS a, --side 1, triangle 1
7 AS b, --side 2, triangle 1
(a*a) AS a2, --a^2
(b*b) AS b2, --b^2
a2+b2 AS c2, --a^2 + b^2 = c^2
SQRT(c2) AS c, --√c2 = c
19 AS d, --side 1, triangle 2
24 AS e, --side 2 triangle 2
(d*d) AS d2, --d^2
(e*e) AS e2, --e^2
d2+e2 AS f2, --d^2 + e^2 = f^2
SQRT(f2) AS f, --√f2 = f
c2+f2 AS g2, --c^2 + f^2 = g^2
SQRT(g2) AS g --√g2 = g
Is there any easy way to do this?
PS Please don't explain how this is a ridiculous use of SQL, I know THAT! This was just the simplest way that I could reduce my problem to be understood. In my scenario, the calculations are much more complex, with variables coming from many joined tables, and the results need to be inserted into a summary table with a very rigid structure. Currently, I'm bringing the results out to Node, doing the calculations there and inserting the data, but that is very VERY slow, especially since I have to go through the network to get to the database server.

This can be done using common table expressions:
with base_vars (a, b, d, e) as (
values (5, 7, 19, 24) -- one row with four columns
), var2 (a2, b2, d2, e2) as (
select a*a, b*b, d*d, e*e
from base_vars
), var3 (c2, c, f2, f) as (
select a2+b2, sqrt(a2+b2), d2+e2, sqrt(d2+e2)
from var2
), var4 (g2, g) as (
select c2+f2, sqrt(c2+f2)
from var3
)
select g
from var4;
I am not 100% sure that I got all the variables right, but I think you get the idea.
Another option would be to put that into a PL/pgSQL function.

lateral is a bit shorter than CTEs, since each subquery can use the columns of the preceding ones directly instead of referring to a previous CTE. Also, in PostgreSQL versions before 12, the planner cannot merge the CTEs and the main query into a single plan.
with t (a,b,d,e) as (values (5,7,19,24))
select c, f, sqrt(c2 + f * f)
from
t
cross join lateral
(select a * a, b * b, d * d, e * e) t1 (a2, b2, d2, e2)
cross join lateral
(select a2 + b2, d2 + e2) t2 (c2, f2)
cross join lateral
(select sqrt(c2), sqrt(f2)) t3 (c, f)
;
        c         |        f         |       sqrt
------------------+------------------+------------------
 8.60232526704263 | 30.6104557300279 | 31.7962261911693
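As a sanity check on the numbers above, the same chain of intermediate values can be reproduced outside the database. This is only a sketch in Python, not part of the original answer; the variable names simply mirror the column aliases used in the query.

```python
import math

# Base values, same as the single VALUES row in the query above.
a, b, d, e = 5, 7, 19, 24

# Each step reuses the names computed in the previous step,
# just as each LATERAL subquery sees the columns defined before it.
a2, b2, d2, e2 = a * a, b * b, d * d, e * e
c2, f2 = a2 + b2, d2 + e2
c, f = math.sqrt(c2), math.sqrt(f2)
g = math.sqrt(c2 + f2)

print(c, f, g)
```

The printed values should agree with the c, f and sqrt columns of the result row, up to floating-point display precision.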

Related

PySpark - Creating a single column from multiple columns with some basic math

Consider the following PySpark dataframe
Col1 | Col2 | Col3
-----+------+-----
A, B | D, G | A, G
C, F | C, D | A, G
C, F | C, D | A, G
I'd like to create a new dataframe with 2 columns, the first with all the different combinations, and the second column is the ratio: Frequency of Combination / Total Number of Combinations. For example,
Combination | Ratio
------------+------------
A, B        | 0.111 (1/9)
C, F        | 0.222 (2/9)
D, G        | 0.111 (1/9)
C, D        | 0.222 (2/9)
A, G        | 0.333 (3/9)
You can unpivot, then group by and count:
from pyspark.sql import functions as F, Window
df2 = df.selectExpr(
'stack(' + str(len(df.columns)) + ', ' + ', '.join(df.columns) + ') as combination'
).groupBy('combination').count().withColumn(
'ratio',
F.col('count') / F.sum('count').over(Window.orderBy())
).drop('count')
df2.show()
+-----------+------------------+
|combination|             ratio|
+-----------+------------------+
|       A, B|0.1111111111111111|
|       C, F|0.2222222222222222|
|       C, D|0.2222222222222222|
|       D, G|0.1111111111111111|
|       A, G|0.3333333333333333|
+-----------+------------------+
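To make the unpivot-then-count logic easier to follow, here is a minimal plain-Python sketch of the same computation on the example data. It does not use Spark at all; the `rows` list is just the example dataframe typed in by hand, so treat it as an illustration of the idea rather than a replacement for the PySpark code.

```python
from collections import Counter

# Rows of the example dataframe (Col1, Col2, Col3).
rows = [
    ("A, B", "D, G", "A, G"),
    ("C, F", "C, D", "A, G"),
    ("C, F", "C, D", "A, G"),
]

# "Unpivot": flatten every cell into one long list of combinations.
cells = [cell for row in rows for cell in row]

# Group and count, then divide each count by the total number of cells.
counts = Counter(cells)
total = len(cells)
ratios = {combo: n / total for combo, n in counts.items()}

for combo, ratio in sorted(ratios.items()):
    print(combo, round(ratio, 3))
```

The window sum over the whole dataframe in the PySpark version plays the role of `total` here.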

PostgreSQL - How to use index for this kind of query

We got this query:
SELECT * FROM table WHERE A AND ( B = X1 OR B = X2 ) AND ( C = X3 OR D = TRUE ) AND E = 0;
I created this index:
CREATE INDEX _my_index ON public.table USING btree (A, B, C, D, E);
But I don't get any better performance. How should such queries be indexed?
Thank you !
I'll assume that X1, X2 and X3 are constants and not table columns.
You won't be able to index C = X3 OR D = TRUE; an OR across different columns is usually a performance problem.
The condition B = X1 OR B = X2 should be rewritten to B IN (X1, X2).
Then this is the best index:
CREATE INDEX ON "table" (e, a, b);
If you always want to query for truth of a and e = 0, a partial index would be even better:
CREATE INDEX ON "table" (b) WHERE a AND e = 0;
If you need to index the conditions on c and d as well, and the table has a primary key, you can rewrite the query to:
SELECT * FROM "table"
WHERE a AND b IN (X1, X2) AND c = X3 AND e = 0
UNION
SELECT * FROM "table"
WHERE a AND b IN (X1, X2) AND d AND e = 0;
For this query, the following two indexes are recommended:
CREATE INDEX ON "table" (c, a, e, b);
CREATE INDEX ON "table" (e, a, d, b);
Again, you can move certain index columns into a WHERE condition if you always query for a certain value.
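If you want to convince yourself that the UNION rewrite returns the same rows as the original OR condition, a small experiment helps. The sketch below uses Python's built-in sqlite3 module with an in-memory table purely for illustration; the table name `t`, the sample rows and the constants 10, 20 and 7 (standing in for X1, X2 and X3) are made up, and SQLite's planner says nothing about how PostgreSQL will use the indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER,"
    " c INTEGER, d INTEGER, e INTEGER)"
)
rows = [
    (1, 1, 10, 7, 0, 0),  # matches through c = 7 only
    (2, 1, 20, 0, 1, 0),  # matches through d only
    (3, 1, 10, 7, 1, 0),  # matches both branches, must appear once
    (4, 1, 30, 7, 1, 0),  # rejected: b not in (10, 20)
    (5, 0, 10, 7, 1, 0),  # rejected: a is false
    (6, 1, 10, 7, 1, 1),  # rejected: e is not 0
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", rows)

or_query = """
    SELECT id FROM t
    WHERE a AND b IN (10, 20) AND (c = 7 OR d) AND e = 0
    ORDER BY id
"""
union_query = """
    SELECT id FROM t WHERE a AND b IN (10, 20) AND c = 7 AND e = 0
    UNION
    SELECT id FROM t WHERE a AND b IN (10, 20) AND d AND e = 0
    ORDER BY id
"""
or_ids = [r[0] for r in conn.execute(or_query)]
union_ids = [r[0] for r in conn.execute(union_query)]
print(or_ids, union_ids)
```

Row 3 satisfies both branches, which is why the rewrite uses UNION rather than UNION ALL and why it relies on the rows being distinguishable (the primary key).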

postgreSQL query over several very large tables with same columns , how to optimize it and its code

I am running the following "simple query" on tables a1, a2, ..., a20. Each of the tables a1, a2, ..., a20 has millions of rows, and they all have the same columns X, Y, Z.
CREATE TABLE A_bis as
SELECT
X, Y, Z
FROM a1
WHERE
Y= 3
UNION
SELECT
X, Y, Z
FROM a2
WHERE
Y= 3
UNION
SELECT
X, Y, Z
FROM a3
WHERE
Y= 3
UNION
...
SELECT
X, Y, Z
FROM a20
WHERE
Y= 3
and I get table A_bis, but it takes at least 20 minutes.
I'd like to:
a) optimize the query so it is faster.
b) improve the code (a loop?) so I don't have to literally write 7 lines for each of the tables a1, ..., a20 and end up with 130 lines of code
Comments answered your question A (Basically : Add an index on each aX table).
For the question B, you can use PostgreSQL inheritance:
CREATE TABLE aParent (x INT, y INT, z INT);
ALTER TABLE a1 INHERIT aParent;
ALTER TABLE a2 INHERIT aParent;
...
ALTER TABLE a20 INHERIT aParent;
Then you can do
SELECT X, Y, Z FROM aParent WHERE Y = 3;
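If changing the table definitions is not an option, another way to avoid typing the statement by hand is to generate it in a client script. The sketch below is an assumption about how that could look, written in Python only for illustration; it just builds the SQL text from the question and leaves executing it to whatever database driver you use.

```python
# Table names a1 .. a20, as in the question.
tables = [f"a{i}" for i in range(1, 21)]

# One SELECT per table, identical except for the table name.
selects = [f"SELECT X, Y, Z FROM {name} WHERE Y = 3" for name in tables]

# Join the pieces with UNION, matching the original statement.
sql = "CREATE TABLE A_bis AS\n" + "\nUNION\n".join(selects) + ";"
print(sql)
```

The table names here come from a fixed, trusted list; names taken from user input would have to be validated or quoted before being interpolated like this.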

Compute the change of basis matrix in Matlab

I have an assignment where I basically need to create a function which, given two bases (which I'm representing as matrices of column vectors), returns the change of basis matrix from one basis to the other.
So far this is the function I came up with, based on the algorithm that I will explain next:
function C = cob(A, B)
% Returns C, which is the change of basis matrix from A to B,
% that is, given basis A and B, we represent B in terms of A.
% Assumes that A and B are square matrices
n = size(A, 1);
% Creates a square matrix full of zeros
% of the same size as the number of rows of A.
C = zeros(n);
for i=1:n
C(i, :) = (A\B(:, i))';
end
end
And here are my tests:
clc
clear out
S = eye(3);
B = [1 0 0; 0 1 0; 2 1 1];
D = B;
disp(cob(S, B)); % Returns cob matrix from S to B.
disp(cob(B, D));
disp(cob(S, D));
Here's the algorithm that I used based on some notes. Basically, if I have two bases B = {b1, ... , bn} and D = {d1, ... , dn} for a certain vector space, and I want to represent basis D in terms of basis B, I need to find a change of basis matrix S. The vectors of these bases are related in the following form:
(d1 ... dn)^T = S * (b1, ... , bn)^T
Or, by splitting up all the rows:
d1 = s11 * b1 + s12 * b2 + ... + s1n * bn
d2 = s21 * b1 + s22 * b2 + ... + s2n * bn
...
dn = sn1 * b1 + sn2 * b2 + ... + snn * bn
Note that d1, b1, d2, b2, etc, are all column vectors. This can be further represented as
d1 = [b1 b2 ... bn] * [s11; s12; ... s1n];
d2 = [b1 b2 ... bn] * [s21; s22; ... s2n];
...
dn = [b1 b2 ... bn] * [sn1; sn2; ... snn];
Let's call the matrix [b1 b2 ... bn], whose columns are the column vectors of B, A, so we have:
d1 = A * [s11; s12; ... s1n];
d2 = A * [s21; s22; ... s2n];
...
dn = A * [sn1; sn2; ... snn];
Note that what we need now to find are all the entries sij for i=1...n and j=1...n. We can do that by left-multiplying both sides by the inverse of A, i.e. by A^(-1).
So, S might look something like this
S = [s11 s12 ... s1n;
s21 s22 ... s2n;
...
sn1 sn2 ... snn;]
If this idea is correct, to find the change of basis matrix S from B to D is really what I'm doing in the code.
Is my idea correct? If not, what's wrong? If yes, can I improve it?
Things become much easier when one has an intuitive understanding of the algorithm.
There are two key points to understand here:
1. C(B,B) is the identity matrix (i.e., do nothing to change from B to B)
2. C(E,D)C(B,E) = C(B,D), think of this as B -> E -> D = B -> D
A direct corollary of 1 and 2 is
C(E,D)C(D,E) = C(D,D), the identity matrix
in other words
C(E,D) = C(D,E)^(-1)
Summarizing.
Algorithm to calculate the matrix C(B,D) to change from B to D:
1. Define C(B,E) = [b1, ..., bn] (column vectors)
2. Define C(D,E) = [d1, ..., dn] (column vectors)
3. Compute C(E,D) as the inverse of C(D,E).
4. Compute C(B,D) as the product C(E,D)C(B,E).
Example
B = {(1,2), (3,4)}
D = {(1,1), (1,-1)}
C(B,E) = | 1  3 |
         | 2  4 |
C(D,E) = | 1  1 |
         | 1 -1 |
C(E,D) = | .5  .5 |
         | .5 -.5 |
C(B,D) = | .5  .5 | | 1  3 | = | 1.5  3.5 |
         | .5 -.5 | | 2  4 |   | -.5  -.5 |
Verification
1.5 d1 + -.5 d2 = 1.5(1,1) + -.5(1,-1) = (1,2) = b1
3.5 d1 + -.5 d2 = 3.5(1,1) + -.5(1,-1) = (3,4) = b2
which shows that the columns of C(B,D) are in fact the coordinates of b1 and b2 in the basis D.
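The worked example can also be checked mechanically. The following is a small sketch in plain Python (not MATLAB, and limited to the 2x2 case for brevity); the helper names `inverse_2x2` and `matmul` are made up for this illustration, and exact fractions are used so the comparison with the hand computation is not blurred by rounding.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as a list of rows."""
    (p, q), (r, s) = m
    det = Fraction(p * s - q * r)
    if det == 0:
        raise ValueError("matrix is singular, so its columns are not a basis")
    return [[s / det, -q / det], [-r / det, p / det]]

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

# Basis vectors stored as columns, as in the example.
C_BE = [[1, 3], [2, 4]]    # B = {(1,2), (3,4)}
C_DE = [[1, 1], [1, -1]]   # D = {(1,1), (1,-1)}

C_ED = inverse_2x2(C_DE)   # C(E,D) = C(D,E)^(-1)
C_BD = matmul(C_ED, C_BE)  # C(B,D) = C(E,D) C(B,E)
print(C_BD)
```

In MATLAB the same two steps would usually be written with the backslash operator instead of an explicit inverse, which is what the question's code already does column by column.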

Solving for a variable in matlab

I have a system of two equations and I need Matlab to solve for a certain variable. The problem is the variable I need is inside an expression, and trig functions. I wrote the following code:
function [ V1, V2 ] = find_voltages( w1, l1, d, w2, G1, G2, m, v, e, h, a, x)
k1 = sqrt((2*V1*e)/(G1^2*m*v^2));
k2 = sqrt((2*V2*e)/(G2^2*m*v^2));
A = h + l1*a;
b = -A*k1*sin(k1*w1) + a*cos(k1*w1);
B = A*cos(k1*w1) + (a/k1)*sin(k1*w1);
C = B + a*b;
c = C*k2*sinh(k2*w2) + b*cosh(k2*w2);
D = C*cosh(k2*w2) + (b/k2)*sinh(k2*w2);
bd = A*k1*sinh(k1*w1) + a*cosh(k1*w1);
Bd = A*cosh(k1*w1) + (a/k1)*sinh(k1*w1);
Cd = Bd + a*bd;
cd = -Cd*k2*sin(k2*w2) + bd*cos(k2*w2);
Dd = Cd*cos(k2*w2) + (bd/k2)*sin(k2*w2);
fsolve([c*(x-(l1+w1+d+w2)) + D == 0, cd*(x-(l1+w1+d+w2)) + Dd == 0], [V1,V2])
end
and got an error because V1 and V2 are not defined. They are part of an expression, and need to be solved for. Is there a way to do this? Also, is it a problem that the functions I put as arguments to solve are conglomerates of the smaller equations above them?
Valid values:
Drift space 1 (l1): 0.11
Quad 1 length (w1): 0.11
Quad 2 length (w2): 0.048
Separation (d): 0.014
Radius of Separation 1 (G1): 0.016
Radius of Separation 2 (G2): 0.01
Voltage 1 (V1): -588.5
Voltage 2 (V2): 418
Kinetic Energy in eV: 15000
Mass (m) 9.109E-31
Kinetic Energy in Joules (K): 2.4E-15
Velocity (v): 72591415.94
Charge on an Electron (e): 1.602E-19
k1^2=(2*V1*e)/(G1^2*m*v^2): 153.4467773
k2^2=(2*V2*e)/(G2^2*m*v^2): 279.015
First, rewrite your function so that it returns the extent to which your equation(s) fail to hold (the residual) for a given guess of [V1,V2]. E.g.,
function gap = voltage_eqn(V, w1, l1, d, w2, G1, G2, m, v, e, h, a, x)
V1 = V(1) ;
V2 = V(2) ;
k1 = sqrt((2*V1*e)/(G1^2*m*v^2));
k2 = sqrt((2*V2*e)/(G2^2*m*v^2));
A = h + l1*a;
b = -A*k1*sin(k1*w1) + a*cos(k1*w1);
B = A*cos(k1*w1) + (a/k1)*sin(k1*w1);
C = B + a*b;
c = C*k2*sinh(k2*w2) + b*cosh(k2*w2);
D = C*cosh(k2*w2) + (b/k2)*sinh(k2*w2);
bd = A*k1*sinh(k1*w1) + a*cosh(k1*w1);
Bd = A*cosh(k1*w1) + (a/k1)*sinh(k1*w1);
Cd = Bd + a*bd;
cd = -Cd*k2*sin(k2*w2) + bd*cos(k2*w2);
Dd = Cd*cos(k2*w2) + (bd/k2)*sin(k2*w2);
gap(2) = c*(x-(l1+w1+d+w2)) + D ;
gap(1) = cd*(x-(l1+w1+d+w2)) + Dd ;
end
Then call fsolve from some initial V0:
Vf = fsolve(@(V) voltage_eqn(V, w1, l1, d, w2, G1, G2, m, v, e, h, a, x), V0) ;
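The pattern itself (a residual function of the unknowns, wrapped in an anonymous function that fixes the other parameters, and handed to a root finder together with an initial guess) is not specific to MATLAB. Below is a toy sketch of it in plain Python. Everything in it is an assumption made for illustration: the system (v1 + v2 = 5, v1 * v2 = 6) is not your voltage equations, and `newton_2d` is a hand-written stand-in for fsolve, not a library function.

```python
def residual(v, target_sum, target_product):
    """Return how far v = (v1, v2) is from satisfying both equations."""
    v1, v2 = v
    return [v1 + v2 - target_sum, v1 * v2 - target_product]

def newton_2d(func, v0, tol=1e-12, max_iter=50, h=1e-7):
    """Tiny Newton iteration with a finite-difference Jacobian (2 unknowns)."""
    x, y = v0
    for _ in range(max_iter):
        f1, f2 = func((x, y))
        if abs(f1) + abs(f2) < tol:
            break
        # Finite-difference partial derivatives of both residuals.
        fx = func((x + h, y))
        fy = func((x, y + h))
        j11, j21 = (fx[0] - f1) / h, (fx[1] - f2) / h
        j12, j22 = (fy[0] - f1) / h, (fy[1] - f2) / h
        det = j11 * j22 - j12 * j21
        # Solve J * step = -f by Cramer's rule.
        dx = (-f1 * j22 + f2 * j12) / det
        dy = (-f2 * j11 + f1 * j21) / det
        x, y = x + dx, y + dy
    return x, y

# The lambda fixes the extra parameters, like @(V) ... in MATLAB.
solution = newton_2d(lambda v: residual(v, 5.0, 6.0), (0.0, 1.0))
print(solution)
```

As with fsolve, which root you end up at (if any) depends on the initial guess, so V0 should be chosen close to the physically expected voltages.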