How to count rows after the occurrence of a value by group (postgresql) - postgresql

I have for example the following table:
Name | Day | Healthy
:--- | --: | :------
Jon | 1 | No
Jon | 2 | Yes
Jon | 3 | Yes
Jon | 4 | Yes
Jon | 5 | No
Mary | 1 | Yes
Mary | 2 | No
Mary | 3 | Yes
Mary | 4 | No
Mary | 5 | Yes
I want to add a column that counts, for each row, the number of days from day X onward (inclusive) on which the person was healthy:
Name | Day | Healthy | Number of days the person was healthy after day X (incl.)
:--- | --: | :------ | --:
Jon | 1 | No | 3
Jon | 2 | Yes | 3
Jon | 3 | Yes | 2
Jon | 4 | Yes | 1
Jon | 5 | No | 0
Mary | 1 | Yes | 3
Mary | 2 | No | 2
Mary | 3 | Yes | 2
Mary | 4 | No | 1
Mary | 5 | Yes | 1
Is it possible to use some sort of window function to create such a column? Thanks a lot for the help!

There are a couple of ways to do this with a window function. One is to order by day descending and use the default window. The other is to specify the window from the current row to the end of the partition.
This example casts the boolean healthy to an int so that it can be summed. If your table has literal Yes and No strings, you can use sum((healthy = 'Yes')::int) over (...) to achieve the same thing.
select name, day,
sum(healthy::int)
over (partition by name
order by day
rows between current row
and unbounded following) as num_subsequent_health_days
from my_table;
name | day | num_subsequent_health_days
:--- | --: | -------------------------:
Jon | 1 | 3
Jon | 2 | 3
Jon | 3 | 2
Jon | 4 | 1
Jon | 5 | 0
Mary | 1 | 3
Mary | 2 | 2
Mary | 3 | 2
Mary | 4 | 1
Mary | 5 | 1
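
To make the window frame concrete, here is a minimal Python sketch of what "current row to unbounded following" computes per partition. It is not part of the SQL answer; the function name healthy_days_from and the tuple layout are assumptions made for illustration.

```python
from itertools import groupby

# Sample rows from the question: (name, day, healthy)
rows = [
    ("Jon", 1, False), ("Jon", 2, True), ("Jon", 3, True),
    ("Jon", 4, True), ("Jon", 5, False),
    ("Mary", 1, True), ("Mary", 2, False), ("Mary", 3, True),
    ("Mary", 4, False), ("Mary", 5, True),
]

def healthy_days_from(rows):
    """For each row, count healthy days from that day to the end of the person's rows."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    # groupby plays the role of PARTITION BY name
    for name, group in groupby(ordered, key=lambda r: r[0]):
        running = 0
        counts = []
        # Walk backwards so the running total covers the current row and all later ones
        for _, day, healthy in reversed(list(group)):
            running += int(healthy)
            counts.append((name, day, running))
        result.extend(reversed(counts))
    return result

for name, day, count in healthy_days_from(rows):
    print(name, day, count)
```

Walking the partition backwards with a running total is the same idea as ordering by day descending and using the default window.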

I assume your relation has the following schema:
CREATE TABLE test(name text, day int, healthy boolean);
Then this should produce the desired result:
SELECT name, day,
       sum(mapped) OVER (PARTITION BY name
                         ORDER BY day DESC
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM (SELECT name, day,
             CASE WHEN healthy THEN 1 ELSE 0 END AS mapped
      FROM test) sub
ORDER BY name, day;

Related

How would you create a group identifier based on one column, but sorted by another?

I am attempting to create column Group via T-SQL.
If a cluster of identical accounts appears in consecutive rows, consider that one group. If the account is seen again lower in the list (cluster or not), consider it a new group. This seems straightforward, but I cannot see the solution. Below there are three clusters of account 3456, each with a different group number (Groups 1, 4, and 6).
+-------+---------+------+
| Group | Account | Sort |
+-------+---------+------+
| 1 | 3456 | 1 |
| 1 | 3456 | 2 |
| 2 | 9878 | 3 |
| 3 | 5679 | 4 |
| 4 | 3456 | 5 |
| 4 | 3456 | 6 |
| 4 | 3456 | 7 |
| 5 | 1295 | 8 |
| 6 | 3456 | 9 |
+-------+---------+------+
UPDATE: I left this out of the original requirements, but a cluster of accounts could have more than two accounts. I updated the example data to include this scenario.
Here's how I'd do it:
--Sample Data
DECLARE @table TABLE (Account INT, Sort INT);
INSERT @table
VALUES (3456,1),(3456,2),(9878,3),(5679,4),(3456,5),(3456,6),(1295,7),(3456,8);
--Solution
SELECT [Group] = DENSE_RANK() OVER (ORDER BY grouper.groupID), grouper.Account, grouper.Sort
FROM
(
SELECT t.*, groupID = ROW_NUMBER() OVER (ORDER BY t.sort) +
CASE t.Account WHEN LEAD(t.Account,1) OVER (ORDER BY t.sort) THEN 1 ELSE 0 END
FROM @table AS t
) AS grouper;
Results:
Group Account Sort
------- ----------- -----------
1 3456 1
1 3456 2
2 9878 3
3 5679 4
4 3456 5
4 3456 6
5 1295 7
6 3456 8
Update based on OP's comment below (20190508)
I spent a couple of days banging my head on how to handle groups of three or more. It was surprisingly difficult, but what I came up with handles bigger clusters and is much better than my first answer. I updated the sample data to include bigger clusters.
Note that I include a UNIQUE constraint on the Sort column, which creates a unique index. You don't need the constraint for this solution to work, but having an index on that column (clustered, nonclustered unique, or just nonclustered) will improve performance dramatically.
--Sample Data
DECLARE @table TABLE (Account INT, Sort INT UNIQUE);
INSERT @table
VALUES (3456,1),(3456,2),(9878,3),(5679,4),(3456,5),(3456,6),(1295,7),(1295,8),(1295,9),(1295,10),(3456,11);
-- Better solution
WITH Groups AS
(
SELECT t.*, Grouper =
CASE t.Account WHEN LAG(t.Account,1,t.Account) OVER (ORDER BY t.Sort) THEN 0 ELSE 1 END
FROM @table AS t
)
SELECT [Group] = SUM(sg.Grouper) OVER (ORDER BY sg.Sort)+1, sg.Account, sg.Sort
FROM Groups AS sg;
Results:
Group Account Sort
----------- ----------- -----------
1 3456 1
1 3456 2
2 9878 3
3 5679 4
4 3456 5
4 3456 6
5 1295 7
5 1295 8
5 1295 9
5 1295 10
6 3456 11
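
The LAG-plus-running-sum idea above can be sketched outside SQL as well. The following Python version is only an illustration under the same sample data; assign_groups is a hypothetical helper name, not something from the answer.

```python
def assign_groups(rows):
    """rows: list of (account, sort). Returns a list of (group, account, sort)."""
    out = []
    group = 1
    previous = None
    for account, sort in sorted(rows, key=lambda r: r[1]):
        # Equivalent of the LAG comparison: a change of account starts a new group
        if previous is not None and account != previous:
            group += 1
        out.append((group, account, sort))
        previous = account
    return out

sample = [(3456, 1), (3456, 2), (9878, 3), (5679, 4), (3456, 5), (3456, 6),
          (1295, 7), (1295, 8), (1295, 9), (1295, 10), (3456, 11)]

for row in assign_groups(sample):
    print(row)
```

The group counter here is the running sum of the 0/1 flags, plus one for the first group.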

Take new columns as output table - KDB

I have a query that runs on a frequent basis and returns a table of results. The new table will also contain the results of the old table, but I only want to take whatever is new in the most recent run and send that as an email. I already have the line for the email and trade data but just need a way to be able to:
display the results of the new table to be emailed
save the complete results of the new table to be used in the next run of the query
e.g.
Old results: tbl
| idx | name | age |
| 0 | Tom | 30 |
| 1 | Jerry | 25 |
| 2 | Bob | 30 |
| 3 | Ken | 45 |
New results: tbl
| idx | name | age |
| 0 | Tom | 30 |
| 1 | Jerry | 25 |
| 2 | Bob | 30 |
| 3 | Ken | 45 |
| 4 | Sam | 40 |
output required:
| 4 | Sam | 40 |
and then save the New results to be used in the next run
Thanks! :)
If the only change between runs is that records are appended to the new table, you could keep a variable denoting the last index seen and then select only those rows where idx is larger than that.
If the indexes are always increasing, this could be achieved using a query like
lastidx:exec last idx from tbl
select from tbl where idx>lastidx
If the idx values don't always increase monotonically, you could keep a count of the number of rows instead and select only the rows whose row index i is at or beyond that count
lasti:count tbl
select from tbl where i>=lasti
This doesn't require saving the whole table in memory for use in the next iteration.
E.g. to start with, the old table had 4 rows, so lasti = 4
q)tbl
idx name age
-------------
0 Tom 30
1 Jerry 25
2 Bob 30
3 Ken 45
q)lasti
4
The new table comes in and running the command selects the new row
q)tbl
idx name age
-------------
0 Tom 30
1 Jerry 25
2 Bob 30
3 Ken 45
4 Sam 40
q)select from tbl where i>=lasti
idx name age
------------
4 Sam 40
lasti can then be updated to reflect the new count
q)lasti:count tbl
q)lasti
5
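
As a language-neutral illustration of the row-count approach, here is a small Python sketch. It assumes rows are only ever appended; take_new_rows and the tuple layout are hypothetical names chosen for this example.

```python
def take_new_rows(table, last_count):
    """Return (rows added since the previous run, updated row count)."""
    return table[last_count:], len(table)

old_table = [(0, "Tom", 30), (1, "Jerry", 25), (2, "Bob", 30), (3, "Ken", 45)]
last_count = len(old_table)  # 4 rows seen so far

# The next run returns the old rows plus one appended row
new_table = old_table + [(4, "Sam", 40)]
to_email, last_count = take_new_rows(new_table, last_count)
print(to_email)
print(last_count)
```

Only the count has to survive between runs, not the previous table.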
One way to do this, assuming idx is a unique key:
q)old:([] idx:0 1 2 3; name:`T`J`B`K; age: 30 25 30 45)
q)new:old,enlist `idx`name`age!(4; `S;40) //new output from your query
q)out:()
q)if[0<count i:new[`idx] except old[`idx] ; out:new i ; old:new]
q)out
idx name age
------------
4 S 40
Another way, if new records are always appended after the old records:
q)old:([] idx:0 1 2 3; name:`T`J`B`K; age: 30 25 30 45)
q)i:count old
q)new:old,enlist `idx`name`age!(4; `S;40) //new output from your query
q)out:()
q)if[i<c:count new ; out:(i-c)#new ; old:new; i:c]
q)out
idx name age
------------
4 S 40
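
The key-based variant (the "except" on idx) can be sketched in Python as a set difference on the key column. This is an illustration only; rows_with_new_keys and the dict-per-row layout are assumptions, and unlike the count-based approach it does not depend on where the new rows sit in the table.

```python
def rows_with_new_keys(old, new, key="idx"):
    """Return rows of `new` whose key value does not appear in `old`."""
    seen = {row[key] for row in old}
    return [row for row in new if row[key] not in seen]

old = [{"idx": 0, "name": "T", "age": 30}, {"idx": 1, "name": "J", "age": 25},
       {"idx": 2, "name": "B", "age": 30}, {"idx": 3, "name": "K", "age": 45}]
new = old + [{"idx": 4, "name": "S", "age": 40}]

out = rows_with_new_keys(old, new)
if out:
    old = new  # keep the full result for the next run
print(out)
```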

Tableau calculated field summing up the values

I have a table like this
----------------------------------------------
ID Name Value |
---------------------------------------------|
1 Bob 4 |
2 Mary 3 |
3 Bob 5 |
4 Jane 3 |
5 Jane 1 |
----------------------------------------------
Is there any way to write a calculated field that, if the name is "Bob", sums up all the values that have the name "Bob"?
Thanks in advance!
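Create a calculated field with the expression below. It returns Value for Bob's rows and NULL otherwise, so aggregating it with SUM adds up only Bob's values.
IF [Name] = "Bob" THEN [Value] END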
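
A rough Python sketch of the same logic, using the sample table from the question. The helper names are hypothetical; None stands in for the NULL that the calculated field yields on non-matching rows.

```python
rows = [(1, "Bob", 4), (2, "Mary", 3), (3, "Bob", 5), (4, "Jane", 3), (5, "Jane", 1)]

def conditional_value(name, value, target="Bob"):
    """Row-level calculation: the value for matching rows, None (NULL) otherwise."""
    return value if name == target else None

def sum_ignoring_nulls(values):
    """Aggregate that skips None, as a SUM over the calculated field would skip NULL."""
    return sum(v for v in values if v is not None)

bob_total = sum_ignoring_nulls(conditional_value(name, value) for _, name, value in rows)
print(bob_total)
```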

Using DECLARE to create column numbers

When running a query I need column numbers to be applied to each row so that when I use the query to create a report in SSRS I can tell the report which data to put in which column. Example:
Case 1 | Jane Doe | Col 1
Case 1 | John Doe | Col 2
Case 2 | Sally Smith | Col 1 (only name in case)
My current query uses:
DECLARE @NumOfCols int=2;
And then this to tell it how to separate the columns:
(row_number() over (partition by case_num order by child_first) + @NumOfCols - 1) % @NumOfCols + 1 as DisplayCol
The problem is, when I run the query, even if a result only has one name (so only one column is needed) my data is getting duplicated. It seems like it is making it a mandatory column 1 and column 2 even if there is no data for a second column. Like this:
Case 1 | Jane Doe | Col 1
Case 1 | John Doe | Col 2
Case 2 | Sally Smith | Col 1 (only name in case)
Case 2 | Sally Smith | Col 2 (duplicating)
I hope this makes sense. Any ideas on how to eliminate duplicating the data?

Max consecutive years for each customer in Tableau

For each customer, I am trying to find the maximum number of consecutive years in which they bought something. I tried to create a calculated field, but to no avail.
I created two calculated fields
Consecutive: if max([Count])>0 then previous_value(0)+1+index()-index() else 0 end
max: window_max([Consecutive])
My data looks something like:
Year | Customer | Count
1996 | a | 2
1996 | b | 1
1997 | a | 1
1997 | b | 2
1998 | b | 1
So the result would be
a:2
b:3
Use nested table calcs.
The first calc, call it running_good_years, is a running count of consecutive years with sales.
If count(Sales) = 0 then 0 else previous_value(0) + 1 end
The second just returns the max
Window_max(running_good_years)
With table calcs, defining the partitioning and addressing is critical. Partition by Customer and address by Year.
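
For comparison, the nested calculation can be sketched in Python: an inner running count per customer, then the maximum of that count. This is a sketch under one added assumption that the Tableau formula does not make by itself: a year with no row for the customer also breaks the streak. The function name is hypothetical.

```python
def max_consecutive_years(records):
    """records: iterable of (year, customer, count). Returns {customer: longest streak}."""
    years_by_customer = {}
    for year, customer, count in records:
        if count > 0:  # only years with at least one purchase extend a streak
            years_by_customer.setdefault(customer, set()).add(year)

    best = {}
    for customer, years in years_by_customer.items():
        longest = run = 0
        previous = None
        for year in sorted(years):
            # Running count: extend the streak on adjacent years, otherwise restart at 1
            run = run + 1 if previous is not None and year == previous + 1 else 1
            longest = max(longest, run)  # the window_max step
            previous = year
        best[customer] = longest
    return best

data = [(1996, "a", 2), (1996, "b", 1), (1997, "a", 1), (1997, "b", 2), (1998, "b", 1)]
print(max_consecutive_years(data))
```

Grouping by customer corresponds to the partitioning, and iterating over sorted years corresponds to the addressing.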