How can I gradually add calculated field results from previous rows? E.g.,
row A = 1 + (row A column 1 value * row A column 2 value)
row B = row A + (row B column 1 value * row B column 2 value)
row C = row B + (row C column 1 value * row C column 2 value)
I've been trying for quite some time to work this logic out in Postgres. Here is my pseudocode:
select
case
when row_number() over (...) = 1 then 1 + (colA * colB)
else lag(new_value) over (partition by ...) + (colA * colB)
end as new_value
from
...
Essentially, I think I need a lag function to fetch the previously calculated result, which is held for the next row in the field new_value. Each row would then pass its new_value into the next row's equation, and so on, recursively.
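Worth noting: a window function can't reference the column it is defining, but this particular recursion unrolls on its own. Every row simply adds its col1 * col2 to the previous total, so the result is just 1 plus a running sum, and no lag is needed. A minimal sketch, assuming a table t with an ordering column id (the names are mine):

select id,
       1 + sum(col1 * col2) over (order by id) as new_value
from t;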
EDIT:
row A col_3 = if (row A col_1 <= 1) then 1; else if (row A col_1 = 2) then 6; else 1 * col_2
row B col_3 = if (row B col_1 <= 1) then 1; else if (row B col_1 = 2) then 6; else row A col_3 * col_2
row C col_3 = if (row C col_1 <= 1) then 1; else if (row C col_1 = 2) then 6; else row B col_3 * col_2
Related
How can I dynamically multiply the previous row's results:
row A col_3 = if (row A col_1 <= 1) then 1, else 1 * col_2
row B col_3 = if (row B col_1 <= 1) then 1, else row A col_3 * col_2
row C col_3 = if (row C col_1 <= 1) then 1, else row B col_3 * col_2
I've attempted this in Postgres. Here's what I have:
sum(
  case
    when col_1 <= 1 then 1
    else lag(col_3) over (...) * col_2  -- I'm aware you cannot use a lag function within a sum/window function
  end
) over (order by ...) as col_3
Note: I've asked a similar question here (thanks @Bergi!), but I'm not sure how to implement that answer for this purpose.
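As an aside: when there are no reset conditions, a running product can be computed without lag() by moving into log space, since a product is the exponential of a sum of logarithms. A sketch that I believe works in Postgres, assuming col_2 is strictly positive and id is the ordering column (names are mine):

select id,
       exp(sum(ln(col_2)) over (order by id)) as running_product
from t;

The conditional resets in the EDIT below break this trick, though; those call for a recursive query, sketched after the EDIT.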
EDIT:
Logic:
if (previous_interval is null) previous_interval = 1
if (curr_repetition = 1) interval = 1
else if (curr_repetition = 2) interval = 6
else interval = previous_interval * easiness
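Row-by-row recursion like this is what WITH RECURSIVE is for in Postgres. A minimal sketch, assuming a table reps(id, curr_repetition, easiness) with gapless ids (all names are mine; compute a row_number() first if your ids have gaps):

with recursive calc as (
    select id, curr_repetition, easiness,
           case when curr_repetition = 1 then 1.0
                when curr_repetition = 2 then 6.0
                else 1.0 * easiness  -- previous_interval is null here, so it defaults to 1
           end as new_interval
    from reps
    where id = (select min(id) from reps)
  union all
    select r.id, r.curr_repetition, r.easiness,
           case when r.curr_repetition = 1 then 1.0
                when r.curr_repetition = 2 then 6.0
                else c.new_interval * r.easiness  -- previous row's computed interval
           end
    from calc c
    join reps r on r.id = c.id + 1
)
select id, new_interval from calc;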
I have a data frame like the one given below; essentially it is a time-series-derived data frame:

colA | colC
-----+-----
   5 |    0
   6 |    1
   5 |    0
   7 |    2
   8 |    3
   3 |   -2

My issue is that the formula for the n-th row of Col C is:

ColC(n) = (ColA(n) - ColA(n-1)) + ColC(n-1)

Hence the calculation of Col C references a previous value of Col C itself. I am using Spark SQL; can someone please advise how to proceed with this? For the calculation of Col A I am using the LAG function.
It seems colC is just colA minus the first row's colA: as the recurrence unrolls, the +ColC(n-1) and -ColA(n-1) terms cancel pairwise, leaving ColC(n) = ColA(n) - ColA(1) when ColC(1) = 0.
e.g.
1 = 6-5,
0 = 5-5,
2 = 7-5,
3 = 8-5,
-2 = 3-5
So this query should work:
SELECT colA, colA - FIRST(colA) OVER (ORDER BY id) AS colC
FROM your_table
Your formula is a cumulative sum. Here is a complete example:
SELECT rowid, a, SUM(c0) OVER (ORDER BY rowid) AS c
FROM
(
    SELECT rowid, a, a - LAG(a, 1) OVER (ORDER BY rowid) AS c0
    FROM
    (
        SELECT 1 AS rowid, 5 AS a UNION ALL
        SELECT 2 AS rowid, 6 AS a UNION ALL
        SELECT 3 AS rowid, 5 AS a UNION ALL
        SELECT 4 AS rowid, 7 AS a UNION ALL
        SELECT 5 AS rowid, 8 AS a UNION ALL
        SELECT 6 AS rowid, 3 AS a
    ) t
) t
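One caveat: LAG(a, 1) returns NULL on the first row, so c comes out NULL there rather than 0. If the series should start at 0, wrap the difference in COALESCE in the inner query:

SELECT rowid, a, SUM(c0) OVER (ORDER BY rowid) AS c
FROM
(
    SELECT rowid, a,
           COALESCE(a - LAG(a, 1) OVER (ORDER BY rowid), 0) AS c0  -- 0 instead of NULL on row 1
    FROM
    (
        SELECT 1 AS rowid, 5 AS a UNION ALL
        SELECT 2 AS rowid, 6 AS a UNION ALL
        SELECT 3 AS rowid, 5 AS a UNION ALL
        SELECT 4 AS rowid, 7 AS a UNION ALL
        SELECT 5 AS rowid, 8 AS a UNION ALL
        SELECT 6 AS rowid, 3 AS a
    ) t
) t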
Issue
I'm working on a project with kernels in databases, and my PostgreSQL skills have hit a wall. I am joining a table with itself to compute the dot product between every pair of vectors, i.e.
SELECT (d1.a * d2.a + d1.b * d2.b) AS dot FROM data d1, data d2
This gives me the dot product between all pairs of vectors. With the following data in my table
a | b | c
---+---+---
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
The above command yields
dot
-----
2
4
6
...
If I want to compute the dot product between, say, row 2 with its preceding row and its following row, how would I do that efficiently?
Attempts
I have tried to use window functions, but failed, since they only seem to compute aggregates. I want to join a row with its neighbouring rows (i.e. its window) without computing any aggregate over them. Something along these lines:
SELECT a * a + b * b + c * c
    OVER (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS value
FROM data;
I have also tried row_number() OVER (), which works, but it seems clumsy and inefficient with nested subqueries:
SELECT d1.a * d3.a + d1.b * d3.b + d1.c * d3.c
FROM data d1,
(SELECT * FROM
(SELECT *, row_number() OVER() as index from data) d2
WHERE d2.index >= 1 AND d2.index <=3) d3;
Lastly, I tried to dig into LATERALs with no luck.
Any thoughts?
You can get the values of the preceding/following rows with lag()/lead().
If the order of rows is determined by a, the query would be like:
SELECT
a,
(lag(a, 1, 0) OVER (ORDER BY a)) * (lead(a, 1, 0) OVER (ORDER BY a))
+ (lag(b, 1, 0) OVER (ORDER BY a)) * (lead(b, 1, 0) OVER (ORDER BY a))
+ (lag(c, 1, 0) OVER (ORDER BY a)) * (lead(c, 1, 0) OVER (ORDER BY a)) AS dot_preceding_and_following
FROM ( VALUES
(1, 1, 1),
(2, 2, 2),
(3, 3, 3)
) T(a, b, c)
ORDER BY
a
;
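Note that this pairs each row's preceding row with its following row. If what you want instead is the current row dotted with its preceding row, drop the lead() calls and pair each column with its own lag; a sketch of that variant, using a WINDOW clause to avoid repeating the specification:

SELECT
    a,
    a * lag(a, 1, 0) OVER w
  + b * lag(b, 1, 0) OVER w
  + c * lag(c, 1, 0) OVER w AS dot_with_preceding
FROM data
WINDOW w AS (ORDER BY a)
ORDER BY a;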
I am building a complex report with T-SQL. The user gave me the table that is the source for the report; it has about 10 million rows. The table contains descriptive attributes and numeric columns, something like this:
segment product_group gmis lpt numeric_field1 numeric_field2 numeric_field3
The report has about a thousand rows, and its definition goes row by row, something like this:
'Name of the row one' - sum of the numeric_field1 for segment in 3,4,5 and lpt = 3 and gmis <> 50
'Row number two' - sum of num2 for segment in 1,2,3 and lpt <> 5, and gmis = 7
'Row number 3' - you take num2 + num3 for product_id = 7
'Row number 4' - row 1 + row 2
So I end up with a T-SQL query that has a separate select for each row, combined with union:

select 'Row number 1' as name, (select sum(num1) from source_table where segment in (3,4,5) and lpt = 3 and gmis <> 50) as value
union
select 'Row number 2', (select sum(num2) from source_table where segment in (1,2,3) and lpt <> 5 and gmis = 7)
union
select 'Row number 3', (select sum(num2 + num3) from source_table where product_id = 7)
...
etc.
Am I missing some smarter way to do this kind of query? The report is very slow...
Assuming that you are always selecting from the same base table and always aggregating without grouping, the following should suit your purpose.
Instead of combining multiple selects, you should aim to perform a set of operations on a single table. Take your where clauses and combine them into a select statement as a set of when clauses to calculate the base values used for aggregation. This means that you only have to read the source table once:
Select
Case When segment in (3,4,5) and lpt = 3 and gmis <> 50 Then num1 Else 0 End As row1,
Case When segment in (1,2,3) and lpt <> 5 and gmis = 7 Then num2 Else 0 End As row2,
Case When product_id = 7 Then (num2 + num3) Else 0 End As row3
From
source_table
Use this as a table expression and perform the aggregation selecting from the table expression. In this example, I have used a CTE (Common Table Expression):
;With Column_Values
(
row1,
row2,
row3
)
As
(
Select
Case When segment in (3,4,5) and lpt = 3 and gmis <> 50 Then num1 Else 0 End As row1,
Case When segment in (1,2,3) and lpt <> 5 and gmis = 7 Then num2 Else 0 End As row2,
Case When product_id = 7 Then (num2 + num3) Else 0 End As row3
From
source_table
)
Select
Sum(row1) As [Row number 1],
Sum(row2) As [Row number 2],
Sum(row3) As [Row number 3],
Sum(row1) + Sum(row2) As [Row number 4]
From
Column_Values
You now have the values as a series of columns. To convert them to rows, use the unpivot command:
;With Column_Values
(
row1,
row2,
row3
)
As
(
Select
Case When segment in (3,4,5) and lpt = 3 and gmis <> 50 Then num1 Else 0 End As row1,
Case When segment in (1,2,3) and lpt <> 5 and gmis = 7 Then num2 Else 0 End As row2,
Case When product_id = 7 Then (num2 + num3) Else 0 End As row3
From
source_table
)
Select
name,
value
From
(
Select
Sum(row1) As [Row number 1],
Sum(row2) As [Row number 2],
Sum(row3) As [Row number 3],
Sum(row1 + row2) As [Row number 4]
From
Column_Values
) pvt
Unpivot
(
value For name In
(
[Row number 1],
[Row number 2],
[Row number 3],
[Row number 4]
)
) As upvt
If you are still having performance issues, there are the following options you may wish to consider, which will depend on your requirements:
• If you don't need to report live data, you could pre-calculate the values, store them in another table, and report from that table (a sketch follows this list).
• If you have the Enterprise edition and don't need to report live data, you could put a columnstore index on the source table. In SQL Server 2012 this makes the table read-only, so it would have to be dropped and recreated each time new data is loaded into the source table. Columnstore indexes can offer a massive performance boost when aggregating large tables.
• If you need to report live data, you can use an indexed view. This may improve performance when aggregating data in the view.
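A minimal sketch of the pre-calculation option; dbo.report_snapshot and the rebuild step are my own naming, not anything from your system:

-- Rebuild the snapshot as part of the data load, then point the report
-- at dbo.report_snapshot instead of aggregating 10 million rows per run.
If Object_Id('dbo.report_snapshot') Is Not Null
Drop Table dbo.report_snapshot;

Select name, value
Into dbo.report_snapshot
From
(
Select
Sum(Case When segment in (3,4,5) and lpt = 3 and gmis <> 50 Then num1 Else 0 End) As [Row number 1],
Sum(Case When segment in (1,2,3) and lpt <> 5 and gmis = 7 Then num2 Else 0 End) As [Row number 2],
Sum(Case When product_id = 7 Then (num2 + num3) Else 0 End) As [Row number 3],
Sum(Case When segment in (3,4,5) and lpt = 3 and gmis <> 50 Then num1 Else 0 End)
+ Sum(Case When segment in (1,2,3) and lpt <> 5 and gmis = 7 Then num2 Else 0 End) As [Row number 4]
From
source_table
) pvt
Unpivot
(
value For name In ([Row number 1], [Row number 2], [Row number 3], [Row number 4])
) As upvt;

-- The report itself then becomes a trivial read:
-- Select name, value From dbo.report_snapshot;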
I have this table
col 1 col 2 col 3 .... col N
-------------------------------------
1 A B fooa
10 A foo cc
4 A B fooa
Is it possible, with a T-SQL query, to return only one row, with a value only in the columns where the values are ALL the same?
col 1 col 2 col 3 .... col N
-------------------------------------
-- A -- --
SELECT
CASE WHEN COUNT(col1) = COUNT(*) AND MIN(col1) = MAX(col1) THEN MIN(col1) END AS col1,
CASE WHEN COUNT(col2) = COUNT(*) AND MIN(col2) = MAX(col2) THEN MIN(col2) END AS col2,
...
FROM yourtable
You have to allow for NULLs in the column:
COUNT(*) counts them
COUNT(col1) doesn't count them
That is, a column with a mix of As and NULLs isn't all one value, yet MIN and MAX would both give A because they ignore NULLs.
Edit:
removed DISTINCT so the counts line up for the NULL check
added the MIN/MAX check (as per Mark Byers' deleted answer) to verify uniqueness
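For what it's worth, here is a runnable version of the query against the sample data above, with the column list written out:

SELECT
CASE WHEN COUNT(col1) = COUNT(*) AND MIN(col1) = MAX(col1) THEN MIN(col1) END AS col1,
CASE WHEN COUNT(col2) = COUNT(*) AND MIN(col2) = MAX(col2) THEN MIN(col2) END AS col2,
CASE WHEN COUNT(col3) = COUNT(*) AND MIN(col3) = MAX(col3) THEN MIN(col3) END AS col3
FROM (VALUES
(1, 'A', 'B'),
(10, 'A', 'foo'),
(4, 'A', 'B')
) AS t(col1, col2, col3);
-- Returns NULL, A, NULL: only col2 holds the same value in every row.
-- If one row had NULL in col2, COUNT(col2) would drop below COUNT(*)
-- even though MIN(col2) = MAX(col2) = 'A', so col2 would correctly
-- come back as NULL too.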