PostgreSQL, Variables - postgresql

I am using PostgreSQL through Npgsql driver for windows/NET and I see that it is possible to use PL/pgSQL language through it.
So that way I can make use of variables for my calculation scripts which may look like in this example:
DO $$
DECLARE
tlist text='mylistofbills';
tcontent text='mycontentofbills';
BEGIN
CREATE TEMP TABLE tlist
(billno integer, bdate timestamp, rebate double precision)
ON COMMIT DROP;
INSERT INTO tlist
VALUES (1, '10.01.2017. 10:14:56', 10),
(2, '10.01.2017. 11:02:13', 5),
(3, '10.01.2017. 11:45:22', 0),
(4, '10.01.2017. 12:01:01', 6);
CREATE TEMP TABLE tcontent
(billno integer, rowno integer, price double precision, tax double precision)
ON COMMIT DROP;
INSERT INTO tcontent
VALUES (1, 1, 100, 19),
(1, 2, 30, 0),
(2, 1, 20, 19),
(3, 1, 18, 19),
(4, 1, 43, 0);
END $$;
SELECT s.price,
l.rebate,
s.price/100*l.rebate AS valrebate,
s.price-(s.price/100*l.rebate) AS worebate,
((s.price-(s.price/100*l.rebate))/100)*s.tax AS valtax,
s.price-(s.price/100*l.rebate)+(((s.price-(s.price/100*l.rebate))/100)*s.tax) AS finalprice
FROM tlist l, tcontent s
WHERE l.billno=s.billno;
Example is simplified (from real situation) and is suitable for pasting into PgAdmin's SQL editor.
So, now is question: Can I somehow in the body of those code, without adding new functions to server use formulas for writing more elegant and readable code?
If I would be able to add simple formulas like:
rebatec=s.price/100*l.rebate
priceworebate=s.price-rebatec
Then my code may look more readable and less error prone.
Like this:
SELECT s.price,
l.rebate,
rebatec AS valrebate,
priceworebate AS worebate,
(priceworebate/100)*s.tax AS valtax,
priceworebate+((priceworebate/100)*s.tax) AS finalprice
FROM tlist l, tcontent s
WHERE l.billno=s.billno;
If that may be possible where and how to put this formulas so it can be used in my last SELECT code?
SOLUTION:
Based on #Clodoaldo's answer which give something new to me I find a solution which I am able to understand:
SELECT s.price,
l.rebate,
rebatec AS valrebate,
priceworebate AS worebate,
priceworebate/100*s.tax AS valtax,
priceworebate+priceworebate/100*s.tax AS finalprice
FROM tlist l, tcontent s, LATERAL
(SELECT s.price/100*l.rebate AS rebatec,
s.price-s.price/100* l.rebate AS priceworebate
)sub
WHERE l.billno=s.billno;
It works and I hope it is technically correct.

Use lateral:
The LATERAL key word can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list.
select
s.price,
l.rebate,
rebatec as valrebate,
priceworebate as worebate,
priceworebate / 100 * s.tax as valtax,
priceworebate + priceworebate / 100 * s.tax as finalprice
from
tlist l
inner join
tcontent s using (billno)
cross join lateral (
select
s.price / 100 * l.rebate as rebatec,
s.price - s.price / 100 * l.rebate as priceworebate
) cjl
Use the modern join syntax.

You could use a subquery to define those variables:
select var1 * col3
from (
select col1 / col2 as var1
, *
from YourTable
) sub
Or alternatively a common table expression:
with cte as
(
select col1 / col2 as var1
, *
from YourTable
)
select var1 * col3
from cte

Related

PostgreSQL: replace generate_series with an array

I have a working SQL code that creates geometries according to numbers generated from a generate_series function:
CREATE TEMPORARY TABLE catchments ON COMMIT DROP AS (
SELECT lims, ST_ConcaveHull(the_geom, alpha_factor) AS the_geom_overlap FROM (
SELECT lims, ST_MakeValid(ST_Collect(n.the_geom)) AS the_geom
FROM generate_series(1, 10, 2) AS lims, pgr_drivingDistance(
'SELECT id, source, target, cost, reverse_cost FROM edges',
vertex_id, lims, true
) a, nodes n WHERE a.node = n.vid
GROUP BY lims
) AS conv_hull
ORDER BY lims DESC
);
Now I need to replace the fixed interval series by an array with varying intervals, e.g. [1, 2, 5, 7, 8].
Is there a simple way to "convert" the generate_series by an array with the same logic? I would like to avoid using a for loop if possible.
FROM unnest(ARRAY[1,2,5,7,8]) AS lims
should do it.

TSQL - in a string, replace a character with a fixed one every 2 characters

I can't replace every 2 characters of a string with a '.'
select STUFF('abcdefghi', 3, 1, '.') c3,STUFF('abcdefghi', 5, 1,
'.') c5,STUFF('abcdefghi', 7, 1, '.') c7,STUFF('abcdefghi', 9, 1, '.')
c9
if I use STUFF I should subsequently overlap the strings c3, c5, c7 and c9. but I can't find a method
can you help me?
initial string:
abcdefghi
the result I would like is
ab.de.gh.
the string can be up to 50 characters
Create a numbers / tally / digits table, if you don't have one already, then you can use this to target each character position:
with digits as ( /* This would be a real table, here it's just to test */
select n from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))x(n)
), t as (
select 'abcdefghi' as s
)
select String_Agg( case when d.n%3 = 0 then '.' else Substring(t.s, d.n, 1) end, '')
from t
cross apply digits d
where d.n <Len(t.s)
Using for xml with existing table
with digits as (
select n from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))x(n)
),
r as (
select t.id, case when d.n%3=0 then '.' else Substring(t.s, d.n, 1) end ch
from t
cross apply digits d
where d.n <Len(t.s)
)
select result=(select '' + ch
from r r2
where r2.id=r.id
for xml path('')
)
from r
group by r.id
You can try it like this:
Easiest might be a quirky update ike here:
DECLARE #string VARCHAR(100)='abcdefghijklmnopqrstuvwxyz';
SELECT #string = STUFF(#string,3*A.pos,1,'.')
FROM (SELECT TOP(LEN(#string)/3) ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM master..spt_values) A(pos);
SELECT #string;
Better/Cleaner/Prettier was a recursive CTE:
We use a declared table to have some tabular sample data
DECLARE #tbl TABLE(ID INT IDENTITY, SomeString VARCHAR(200));
INSERT INTO #tbl VALUES('')
,('a')
,('ab')
,('abc')
,('abcd')
,('abcde')
,('abcdefghijklmnopqrstuvwxyz');
--the query
WITH recCTE AS
(
SELECT ID
,SomeString
,(LEN(SomeString)+1)/3 AS CountDots
,1 AS OccuranceOfDot
,SUBSTRING(SomeString,4,LEN(SomeString)) AS RestString
,CAST(LEFT(SomeString,2) AS VARCHAR(MAX)) AS Growing
FROM #tbl
UNION ALL
SELECT t.ID
,r.SomeString
,r.CountDots
,r.OccuranceOfDot+2
,SUBSTRING(RestString,4,LEN(RestString))
,CONCAT(Growing,'.',LEFT(r.RestString,2))
FROM #tbl t
INNER JOIN recCTE r ON t.ID=r.ID
WHERE r.OccuranceOfDot/2<r.CountDots-1
)
SELECT TOP 1 WITH TIES ID,Growing
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY OccuranceOfDot DESC);
--the result
1
2 a
3 ab
4 ab
5 ab
6 ab.de
7 ab.de.gh.jk.mn.pq.st.vw.yz
The idea in short
We use a recursive CTE to walk along the string
we add the needed portion together with a dot
We stop, when the remaining length is to short to continue
a little magic is the ORDER BY ROW_NUMBER() OVER() together with TOP 1 WITH TIES. This will allow all first rows (frist per ID) to appear.

How can I calculate the correlation between two columns in Amazon Redshift?

Let's say that I have the following table
create temporary table source( col1 float, col2 float)
;
insert into source(col1,col2)
values (1, 10020)
,(2, 20062)
,(3, 30083)
,(4, 40099)
,(5, 50012)
,(6, 60035)
,(7, 70087)
,(8, 80015)
,(9, 90039)
,(10,100099);
where the col1, col2 are clearly correlated (col2 = col1 * 10000 + small random quantity)
Is there any easy way to calculate any kind of correlation coefficient between columns in redshift?
As far as I know Redshift does not support CORR(x,y) or COVAR_POP(x,y), as it is explicitly mentioned in Unsupported PostgreSQL functions.
The (unsupported) PostgresSQL's corr(y,x) would have calculated the correlation coefficient.
Since that function is not supported in Redshift you can calculate the correlation coefficent in plain SQL with
with t1 as (
select
col1
,avg(col1) over() as avg_col1
,col2
,avg(col2) over() as avg_col2
from source
)
select
sum( (col1-avg_col1) *(col2-avg_col2)) as numerator
,sqrt(sum( (col1-avg_col1)^2)) * sqrt(sum( (col2-avg_col2)^2)) as denominator
,numerator/denominator as correlation_coeff
from t1;
which gives the following result using the OP dataset:
numerator,denominator,correlation_coeff
825098.5,825099.0469993587,0.999999337050066
with values :
drop table source;
create temporary table source( col1 float, col2 float)
;
insert into source(col1,col2)
values (1, 10020)
,(2, 20062)
,(3, 30083)
,(4, 40099)
,(5, 50012)
,(6, 60035)
,(7, 70087)
,(8, 40015)
,(9, 40039)
,(10,70000);
it gives a correlation coefficient of 0.7599329507675849 which matches what I get using R to calculate the same:
var1 <- c(1,2,3,4,5,6,7,8,9,10)
var2 <- c(10020,20062, 30083, 40099, 50012,60035, 70087, 40015, 40039, 70000)
cor.test(var1,var2, method="pearson")$estimate # 0.759933

Using IN with a range of values?

I have an attribute
earlier it could only have one fixed value.
Where xAttribute = fixedValue
But now it could be in a range of values,
I know I can use the IN operator
But How can I return/get the range of values?
I don't think I can use a table value function with IN?
WHERE xAttribute IN (myTableValueFunc())
or to make this actually work,
WHERE xAttribute IN (Select Id from myTableValueFunc())
Is there an another or a, better way?
If your table-valued function returns a table, then you can join with it...
Sample data
CREATE FUNCTION myTableValueFunc()
RETURNS TABLE
AS
RETURN
-- implement your function
SELECT x.Id
FROM (values (100), (200), (300)) x(Id);
create table SomeTable
(
SomeField nvarchar(1),
xAttribute int
);
insert into SomeTable (SomeField, xAttribute) values
('A', 100),
('B', 200),
('C', 300),
('D', 400),
('E', 500);
Solution
select st.SomeField, st.xAttribute
from SomeTable st
join myTableValueFunc() mtvf
on mtvf.Id = st.xAttribute;
Result
SomeField xAttribute
--------- ----------
A 100
B 200
C 300
Fiddle to see things in action.

T-SQL grouping question

Every once and a while I have a scenario like this, and can never come up with the most efficient query to pull in the information:
Let's say we have a table with three columns (A int, B int, C int). My query needs to answer a question like this: "Tell me what the value of column C is for the largest value of column B where A = 5." A real world scenario for something like this would be 'A' is your users, 'B' is the date something happened, and 'C' is the value, where you want the most recent entry for a specific user.
I always end up with a query like this:
SELECT
C
FROM
MyTable
WHERE
A = 5
AND B = (SELECT MAX(B) FROM MyTable WHERE A = 5)
What am I missing to do this in a single query (opposed to nesting them)? Some sort of 'Having' clause?
BoSchatzberg's answer works when you only care about the 1 result where A=5. But I suspect this question is the result of a more general case. What if you want to list the top record for each distinct value of A?
SELECT t1.*
FROM MyTable t1
INNER JOIN
(
SELECT A, MAX(B)
FROM MyTable
GROUP BY A
) t2 ON t1.A = t2.A AND t1.B = t2.B
--
SELECT C
FROM MyTable
INNER JOIN (SELECT A, MAX(B) AS MAX_B FROM MyTable GROUP BY A) AS X
ON MyTable.A = X.A
AND MyTable.B = MAX_B
--
WHERE MyTable.A = 5
In this case the first section (between the comments) can also easily be moved into a view for modularity or reuse.
You can do this:
SELECT TOP 1 C
FROM MyTable
WHERE A = 5
ORDER BY b DESC
I think you are close (and what you have would work). You could use something like the following:
select C
, max(B)
from MyTable
where A = 5
group by C
After a little bit of testing, I don't think that this can be done without doing it the way you're already doing it (i.e. a subquery). Since you need the max of B and you can't get the value of C without also including that in a GROUP BY or HAVING clause, a subquery seems to be the best way.
create table #tempints (
a int,
b int,
c int
)
insert into #tempints values (1, 8, 10)
insert into #tempints values (1, 8, 10)
insert into #tempints values (2, 4, 10)
insert into #tempints values (5, 8, 10)
insert into #tempints values (5, 3, 10)
insert into #tempints values (5, 7, 10)
insert into #tempints values (5, 8, 15)
/* this errors out with "Column '#tempints.c' is invalid in the select list because it is not contained in either an
aggregate function or the GROUP BY clause." */
select t1.c, max(t1.b)
from #tempints t1
where t1.a=5
/* this errors with "An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING
clause or a select list, and the column being aggregated is an outer reference." */
select t1.c, max(t1.b)
from #tempints t1, #tempints t2
where t1.a=5 and t2.b=max(t1.b)
/* errors with "Column '#tempints.a' is invalid in the HAVING clause because it is not contained in either an aggregate
function or the GROUP BY clause." */
select c
from #tempints
group by b, c
having a=5 and b=max(b)
drop table #tempints