PostgreSQL, first date when cumulative sum reaches a mark

I have the following sample table
The output should be the first date (for each id) on which cum_rev reaches the 100 mark.
I tried the following, because I thought that with the GROUP BY trick and the WHERE condition I would only get the first occurrence of a value higher than 100.
SELECT id
     , pd
     , cum_rev
FROM (
    SELECT id
         , pd
         , rev
         , SUM(rev) OVER (
               PARTITION BY id
               ORDER BY pd
           ) AS cum_rev
    FROM tab1
)
WHERE cum_rev >= 100
GROUP BY id
But it is not working, and I get the following error. Adding an alias does not help either.
ERROR:  subquery in FROM must have an alias
LINE 4: FROM (
             ^
HINT:  For example, FROM (SELECT ...) [AS] foo.
So the desired output is:
 id |     pd     | cum_rev
----+------------+---------
  2 | 2015-04-02 |  135.70
  3 | 2015-07-03 |  102.36
Do I need another approach? Can anyone help?
Thanks

demo:db<>fiddle
SELECT id, pd, total
FROM (
    SELECT *,
           SUM(rev) OVER (PARTITION BY id ORDER BY pd) - rev AS prev_total,
           SUM(rev) OVER (PARTITION BY id ORDER BY pd) AS total
    FROM tab1
) s
WHERE total >= 100 AND prev_total < 100
You can use the cumulative SUM() window function for each id group (partition). To find the first row that crosses a threshold, check that the previous cumulative value is still under the threshold while the current one meets it.
PS: You got the error because your subquery is missing an alias. In my example it's just s.
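An equivalent approach (a sketch, assuming the same tab1 columns as above) uses DISTINCT ON to keep only the earliest qualifying row per id:

SELECT DISTINCT ON (id)
       id, pd, cum_rev
FROM (
    SELECT id, pd,
           SUM(rev) OVER (PARTITION BY id ORDER BY pd) AS cum_rev
    FROM tab1
) s
WHERE cum_rev >= 100
ORDER BY id, pd;  -- DISTINCT ON keeps the first row per id in this order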

Related

How can I rank a table in postgresql and then find the rank of a specific row?

I have a postgresql table
cubing=# SELECT * FROM times;
 count |  name  |  time
-------+--------+--------
     4 | sean   |  32.97
     5 | Austin |  15.64
     6 | Kirk   | 117.02
I retrieve all from it with SELECT * FROM times ORDER BY time ASC. But now I want to give the user the option to search for a specific value (say, WHERE name = Austin) and have it tell them what rank they are in the table. Right now, I have SELECT name,time, RANK () OVER ( ORDER BY time ASC) rank_number FROM times. From how I understand it, that is giving me the rank of the entire table. I would like the rank, name, and time of who I am searching for. I am afraid if I added a where clause to my last SELECT statement with the name Austin, it would only find where the name equals Austin and rank those, rather than the rank of Austin in the rest of the table.
thanks for reading
I think the behavior you want here is to first rank your current data, then query it with some WHERE filter:
WITH cte AS (
    SELECT *, RANK() OVER (ORDER BY time) rank_number
    FROM times
)
SELECT count, name, time, rank_number
FROM cte
WHERE name = 'Austin';
The point here is that, by the time we run the query searching for Austin, the ranks for each row in your original table have already been computed.
Edit:
If you're running this query from an application, it would probably be best to avoid CTE syntax. Instead, just inline the CTE as a subquery (before PostgreSQL 12, a CTE is an optimization fence and is always materialized):
SELECT count, name, time, rank_number
FROM (
    SELECT *, RANK() OVER (ORDER BY time) rank_number
    FROM times
) t
WHERE name = 'Austin';
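With the sample data above, Austin has the fastest time, so either form returns:

 count |  name  | time  | rank_number
-------+--------+-------+-------------
     5 | Austin | 15.64 |           1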

SQL query to put a number in a column and put an incremented number when there is a new text in a column

I have a query SELECT * FROM table which gives the result in the table below:
The expected column is as below:
I want to add a new column such that whenever the value is 0, the number is incremented by 1. I tried DENSE_RANK() and ROW_NUMBER() but couldn't get the exact result mentioned. Is that possible in PostgreSQL?
Try This:
SELECT name, value,
       SUM(CASE WHEN value = 0 THEN 1 ELSE 0 END) OVER (ORDER BY "sno")
FROM (
    SELECT ROW_NUMBER() OVER () AS "sno", *
    FROM example
) tab
DEMO
NOTE: there is no guarantee that you will always get the same output, because there is no ordering field in your raw data.
So the better approach is to add some field to your view output by which it can be ordered, and run the query like below (assuming you have an ID field):
SELECT name,
       value,
       SUM(CASE WHEN value = 0 THEN 1 ELSE 0 END) OVER (ORDER BY id)
FROM example
DEMO
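To make the behavior concrete, here is a hypothetical mini-example (the original tables were posted as images, so the data below is invented): every row where value is 0 bumps the running sum by one, so each 0 starts a new group and the rows that follow inherit that number.

 name | value | new_column
------+-------+------------
 a    |     0 |          1
 a    |     5 |          1
 b    |     0 |          2
 b    |     7 |          2
 b    |     0 |          3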

Managing overflows in LISTAGG on Amazon Redshift

Using the example from this post: https://blogs.oracle.com/datawarehousing/entry/managing_overflows_in_listagg
The following statement:
SELECT
    deptno,
    LISTAGG(ename, ';') WITHIN GROUP (ORDER BY empno) AS namelist
FROM emp
GROUP BY deptno;
will generate the following output:
    DEPTNO NAMELIST
---------- ----------------------------------------
        10 CLARK;KING;MILLER
        20 SMITH;JONES;SCOTT;ADAMS;FORD
        30 ALLEN;WARD;MARTIN;BLAKE;TURNER;JAMES
Let's assume that the above statement does not run because each row returned by our LISTAGG function is limited to 15 characters. (The actual limit on Amazon Redshift is 65535.)
We would want the following to be returned in this case:
    DEPTNO NAMELIST
---------- ----------------------------------------
        10 CLARK;KING
        10 MILLER
        20 SMITH;JONES
        20 SCOTT;ADAMS
        20 FORD
        30 ALLEN;WARD
        30 MARTIN;BLAKE
        30 TURNER;JAMES
What would be the best way to recreate this result in Amazon Redshift to avoid any data loss and taking speed into consideration?
It's possible to achieve this with two subqueries:
First:
SELECT id, field,
       SUM(LENGTH(field) + 1) OVER
           (PARTITION BY id ORDER BY RANDOM() ROWS UNBOUNDED PRECEDING) AS total_length_now
FROM my_schema.my_table
Initially we want to calculate how many characters we have for each id in our table. We can use a window function to calculate it incrementally for each row. In the ORDER BY clause you can use any unique field that you have. If you don't have one, you can use a random value or a hash function, but it is mandatory that the ordering be unique; if it is not, the function will not work as we want.
The +1 in the length accounts for the semicolon delimiter that we will use in the LISTAGG function.
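For instance, a sketch of the hash variant (assuming id is numeric and no natural unique key exists; note that a hash only makes ties unlikely, it cannot guarantee uniqueness):

SELECT id, field,
       SUM(LENGTH(field) + 1) OVER
           (PARTITION BY id ORDER BY MD5(id::varchar || field)
            ROWS UNBOUNDED PRECEDING) AS total_length_now
FROM my_schema.my_table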
Second:
SELECT id, field, total_length_now / 65535 as sub_id
FROM (sub_query_1)
Now we create a sub_id based on the length we calculated before. Whenever total_length_now passes a multiple of the limit (in this case 65535), the integer division produces a new sub_id.
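With the size limit of 10 used in the example further down, the arithmetic looks like this:

total_length_now   total_length_now / 10 = sub_id
               5                                0
              11                                1
              18                                1
              23                                2
              29                                2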
Last Step
SELECT id, sub_id, listagg(field, ';') as namelist
FROM (sub_query_2)
GROUP BY id, sub_id
ORDER BY id, sub_id
Now we can simply call the listagg function grouping by id and sub_id, since each group cannot exceed the size limit.
Complete query
SELECT id, sub_id, LISTAGG(field, ';') AS namelist
FROM (
    SELECT id, field, total_length_now / 65535 AS sub_id
    FROM (
        SELECT id,
               field,
               SUM(LENGTH(field) + 1) OVER
                   (PARTITION BY id ORDER BY field ROWS UNBOUNDED PRECEDING) AS total_length_now
        FROM support.test
    ) sub_query_1
) sub_query_2
GROUP BY id, sub_id
ORDER BY id, sub_id
Example with your data (with size limit = 10)
First and second query output:
id, field, total_length_now, sub_id
10,KING,5,0
10,CLARK,11,1
10,MILLER,18,1
20,ADAMS,6,0
20,SMITH,12,1
20,JONES,18,1
20,FORD,23,2
20,SCOTT,29,2
30,JAMES,6,0
30,BLAKE,12,1
30,WARD,17,1
30,MARTIN,24,2
30,TURNER,31,3
30,ALLEN,37,3
Final query output:
id,sub_id,namelist
10,0,KING
10,1,CLARK;MILLER
20,0,ADAMS
20,1,SMITH;JONES
20,2,FORD;SCOTT
30,0,JAMES
30,1,BLAKE;WARD
30,2,MARTIN
30,3,TURNER;ALLEN
It is possible to create a partial list and emit the rest of the values as separate rows in one pass, but if the number of rows is unconstrained you really need a loop: convert one batch into a list, then handle the remaining rows, and so on.
So this is really a task for Apache Spark (or any other map-reduce technology).

How to reference output rows with window functions?

Suppose I have a table with quantity column.
CREATE TABLE transfers (
    user_id integer,
    quantity integer,
    created timestamp DEFAULT now()
);
I'd like to iteratively go thru a partition using window functions, but access the output rows, not the input table rows.
To access the input table rows I could do something like this:
SELECT LAG(quantity, 1, 0)
OVER (PARTITION BY user_id ORDER BY created)
FROM transfers;
I need to access the previous output row to calculate the next output row. How can I access the lag row in the output? Something like:
CREATE VIEW balance AS
SELECT LAG(balance.total, 1, 0) + quantity AS total
OVER (PARTITION BY user_id ORDER BY created)
FROM transfers;
Edit
This is a minimal example to support the question of how to access the previous output row within a window partition. I don't actually want a sum.
It seems you are attempting to calculate a running sum. Luckily, that's just what the SUM() window function does:
WITH transfers AS (
    SELECT i, random() - 0.3 AS quantity
    FROM generate_series(1, 100) AS i
)
SELECT i, quantity, SUM(quantity) OVER (ORDER BY i)
FROM transfers;
I guess, looking at the question, that the only thing you need is to calculate a cumulative sum.
To calculate a cumulative sum, use this query:
SELECT *,
       SUM(CASE WHEN quantity IS NULL THEN 0 ELSE quantity END)
           OVER (PARTITION BY user_id ORDER BY created
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cumulative_sum
FROM transfers
ORDER BY user_id, created;
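As a side note, the CASE expression can be shortened to COALESCE(quantity, 0), and the explicit frame can be dropped: with an ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which differs from ROWS only when created has duplicate values.

SELECT *,
       SUM(COALESCE(quantity, 0))
           OVER (PARTITION BY user_id ORDER BY created) AS cumulative_sum
FROM transfers
ORDER BY user_id, created;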
But if you want more complex calculations, especially ones containing conditions (decisions) that depend on a result from the previous row, then you need a recursive approach.
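For illustration, a minimal sketch of that recursive approach, using a hypothetical rule that a plain window SUM() cannot express (here: the running total is clamped at zero so it never goes negative):

WITH RECURSIVE ordered AS (
    SELECT user_id, quantity, created,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created) AS rn
    FROM transfers
),
running AS (
    -- seed: the first transfer per user
    SELECT user_id, rn, GREATEST(quantity, 0) AS total
    FROM ordered
    WHERE rn = 1

    UNION ALL

    -- each step reads the previous *output* row (r.total)
    SELECT o.user_id, o.rn, GREATEST(r.total + o.quantity, 0)
    FROM running r
    JOIN ordered o ON o.user_id = r.user_id AND o.rn = r.rn + 1
)
SELECT user_id, rn, total
FROM running
ORDER BY user_id, rn;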

Trouble in calculating the field while creating view in postgresql

I have two tables, q1data and q1lookup, in a Postgres database. q1data contains 3 columns (postid, reasonid, other) and q1lookup contains 2 columns (reasonid, reason).
I am trying to create a view which will include 4 columns (reasonid, reason, count, percentage). count is the count of each reason, and percentage should be each count divided by the total count(*) from q1data (i.e. the total number of rows).
But it gives an error saying syntax error near count(*). The following is the code I am using. Please help.
select
    cwfis_web.q1data.reasonid AS reasonid,
    cwfis_web.q1lookup.reason AS reason,
    count(cwfis_web.q1data.reasonid) AS count,
    round(
        (
            (
                count(cwfis_web.q1data.reasonid)
                /
                (select count(0) AS count(*) from cwfis_web.q1data)
            ) * 100
        )
    , 0) AS percentage
from
    cwfis_web.q1data
join
    cwfis_web.q1lookup
    ON cwfis_web.q1data.reasonid = cwfis_web.q1lookup.reasonid
group by
    cwfis_web.q1data.reasonid;
Firstly, you have a completely invalid piece of syntax there: count(0) AS count(*). Replacing that with a plain count(*), and adding the missing GROUP BY entry for reason, gives this:
select
    cwfis_web.q1data.reasonid AS reasonid,
    cwfis_web.q1lookup.reason AS reason,
    count(cwfis_web.q1data.reasonid) AS count,
    round(
        (
            (
                count(cwfis_web.q1data.reasonid)
                /
                (select count(*) from cwfis_web.q1data)
            ) * 100
        )
    , 0) AS percentage
from
    cwfis_web.q1data
join
    cwfis_web.q1lookup
    ON cwfis_web.q1data.reasonid = cwfis_web.q1lookup.reasonid
group by
    cwfis_web.q1data.reasonid,
    cwfis_web.q1lookup.reason;
However, as this live demo shows, this doesn't give the right value for percentage: count(cwfis_web.q1data.reasonid) and (select count(*) from cwfis_web.q1data) are both of type integer, so integer division is performed and the result is truncated. Since each count is smaller than the total, every percentage comes out as 0 (for example, 2 / 7 in integer arithmetic is 0).
If you cast these to numeric (the expected argument type of the two-parameter round() function), you get this:
select
    cwfis_web.q1data.reasonid AS reasonid,
    cwfis_web.q1lookup.reason AS reason,
    count(cwfis_web.q1data.reasonid) AS count,
    round(
        (
            (
                count(cwfis_web.q1data.reasonid)::numeric
                /
                (select count(*) from cwfis_web.q1data)::numeric
            ) * 100
        )
    , 0) AS percentage
from
    cwfis_web.q1data
join
    cwfis_web.q1lookup
    ON cwfis_web.q1data.reasonid = cwfis_web.q1lookup.reasonid
group by
    cwfis_web.q1data.reasonid,
    cwfis_web.q1lookup.reason;
Which, as this live demo shows, gives something more like what you were hoping for. (Alternatively, you can cast to float and drop the ,0 argument to round(), as in this demo.)
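As an aside, here is a sketch of an alternative (same schema assumed) that avoids the scalar subquery by windowing over the aggregate; note it computes the total over the joined rows, which matches the original only if every reasonid in q1data has a match in q1lookup:

select q1data.reasonid,
       q1lookup.reason,
       count(*) AS count,
       round(100.0 * count(*) / sum(count(*)) over (), 0) AS percentage
from cwfis_web.q1data q1data
join cwfis_web.q1lookup q1lookup
    ON q1data.reasonid = q1lookup.reasonid
group by q1data.reasonid, q1lookup.reason;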
Try changing your subquery from
select count(0) AS count(*) from cwfis_web.q1data
to
select count(0) from cwfis_web.q1data
Also, you need to add cwfis_web.q1lookup.reason to the GROUP BY.