TSQL - Max per group?

I have a table that looks like this:
GroupID UserID Value
1 1 10
1 2 20
1 3 30
1 4 40
1 5 45
1 6 49
1 7 80
1 8 90
2 1 2
2 2 24
2 3 34
2 4 48
2 5 56
3 1 etc.
3 2
3 3
3 4
4 1
4 2
4 3
I am trying to write a LEAD function that will give me the midpoint between each value. To do this I have written the following:
SELECT
[GroupID]
, [UserID]+0.5
, (LEAD ([Value], 1) OVER (ORDER BY GroupID, UserID) + [Value])/2 as [Value]
from dbo.myTable
The problem with this query is that when it reaches the last User in a group, it gives me a bad value, because it averages the [Value] on the current row with the [Value] from the next row, which belongs to the next group.
What I want to do is stop it when it reaches the maximum UserID for each Group. In other words, when it gets to GroupID = 1 and UserID = 8, it should end and start at the next Group. I do not want a row that looks like this:
GroupID UserID Value
1 8.5 46
I could run a DELETE statement after I INSERT the rows into the original table, but I don't have anything to identify when a row is the "maximum" User for its Group. Ideally, I would like to somehow tell the LEAD expression not to calculate it in the first place.
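One possible approach, sketched here rather than tested against SQL Server: add PARTITION BY GroupID to the window so that LEAD returns NULL on the last row of each group, then filter those NULL rows out. The snippet below runs the idea in SQLite through Python's sqlite3 module purely as a stand-in for T-SQL; the column alias names MidUserID and NextValue and the reduced sample data are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (GroupID INTEGER, UserID INTEGER, Value REAL)")

# A reduced sample: the last two rows of group 1 and the first three of group 2.
rows = [(1, 7, 80), (1, 8, 90), (2, 1, 2), (2, 2, 24), (2, 3, 34)]
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)

query = """
SELECT GroupID,
       UserID + 0.5            AS MidUserID,
       (NextValue + Value) / 2 AS Value
FROM (
    -- PARTITION BY restarts the window per group, so the last user
    -- in each group gets NULL instead of the next group's first value.
    SELECT GroupID, UserID, Value,
           LEAD(Value, 1) OVER (PARTITION BY GroupID ORDER BY UserID) AS NextValue
    FROM myTable
) AS t
WHERE NextValue IS NOT NULL   -- drop the row for each group's maximum UserID
ORDER BY GroupID, MidUserID
"""

midpoints = conn.execute(query).fetchall()
for row in midpoints:
    print(row)
```

With this sample there is no row for GroupID 1, UserID 8.5, because the window never looks past the end of group 1.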

Related

KDB+: How to retrieve the rows immediately before and after a given row that conform to a specific logic?

Given the following table
time kind counter key1 value
----------------------------------------
1 1 1 1 1
2 0 1 1 2
3 0 1 2 3
5 0 1 1 4
5 1 2 2 5
6 0 2 3 6
7 0 2 2 7
8 1 3 3 8
9 1 4 3 9
How would one select, for each row of kind 1 (ordered by time), the value in the rows immediately before and immediately after it that have the same key1 value, i.e.:
time value prevvalue nextvalue
---------------------------------------------
1 1 0n 2
5 5 3 7
8 8 6 0n
9 9 6 0n
Here are some of the things I have tried, though to be honest I have no idea how to canonically achieve something like this in q, where the prior value has a variable offset from the current row.
select
prev[value],
next[value],
by key1 where kind<>1
update 0N^prevval,0N^nextval from update prevval:prev value1,nextval:next value1 by key1 from table
Some advice or a pointer on how to achieve this would be great!
Thanks
I was able to use the following code to return a table meeting your requirements. If this is correct, then the sample table you have provided is incorrect; otherwise, I have misunderstood the question.
q)table:([] time:1 2 3 5 5 6 7 8 9;kind:1 0 0 0 1 0 0 1 1;counter:1 1 1 1 2 2 2 3 4;key1:1 1 2 1 2 3 2 3 3;value1:1 2 3 4 5 6 7 8 9)
q)tab2:update 0N^prevval,0N^nextval from update prevval:prev value1,nextval:next value1 by key1 from table
q)tab3:select from tab2 where kind=1
q)select time,value1,prevval,nextval from tab3
time value1 prevval nextval
---------------------------
1    1              2
5    5      3       7
8    8      6       9
9    9      8
The update statement in tab2:
update 0N^prevval,0N^nextval from update prevval:prev value1,nextval:next value1 by key1 from table
simply adds two columns to the original table, holding the previous and next values for each row within the same key1 group. 0N^ fills the empty fields with nulls.
The select statement in tab3:
tab3:select from tab2 where kind=1
is filtering tab2 for rows where kind=1.
The final select statement:
select time,value1,prevval,nextval from tab3
selects the columns you want returned in the final result.
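For readers who do not use q, the same "previous and next value within each key1 group, then keep kind=1" logic can be sketched in plain Python. This is only an illustration of what the update ... by key1 does, not a q translation; the variable names are assumptions, and None stands in for the q null 0N.

```python
from collections import defaultdict

# Same sample data as the q table above: (time, kind, key1, value1).
rows = [
    (1, 1, 1, 1), (2, 0, 1, 2), (3, 0, 2, 3),
    (5, 0, 1, 4), (5, 1, 2, 5), (6, 0, 3, 6),
    (7, 0, 2, 7), (8, 1, 3, 8), (9, 1, 3, 9),
]

# Collect row positions per key1, preserving table order (like "by key1").
positions_by_key = defaultdict(list)
for index, (_, _, key1, _) in enumerate(rows):
    positions_by_key[key1].append(index)

prevval = [None] * len(rows)
nextval = [None] * len(rows)
for positions in positions_by_key.values():
    for pos, index in enumerate(positions):
        if pos > 0:                      # there is an earlier row with this key1
            prevval[index] = rows[positions[pos - 1]][3]
        if pos + 1 < len(positions):     # there is a later row with this key1
            nextval[index] = rows[positions[pos + 1]][3]

# Keep only rows of kind 1 and project time, value1, prevval, nextval.
result = [
    (time, value1, prevval[i], nextval[i])
    for i, (time, kind, _, value1) in enumerate(rows)
    if kind == 1
]
for row in result:
    print(row)
```

The printed rows correspond to the four-row result table shown earlier in this answer.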
Hope this answers your question.
Thanks,
Caitlin

Recursive Cumulative Sum up to a certain value Postgres

I have my data that looks like this:
user_id touchpoint_number days_difference
1 1 5
1 2 20
1 3 25
1 4 10
2 1 2
2 2 30
2 3 4
I would like to add one more column holding a cumulative sum of days_difference, partitioned by user_id, that resets to 0 whenever the running total reaches 30 and then starts counting again. I have been trying, but I couldn't figure out how to do it in PostgreSQL, because it has to be recursive.
The outcome I would like to have would be something like:
user_id touchpoint_number days_difference cum_sum_upto30
1 1 5 5
1 2 20 25
1 3 25 0 --- new count all over again
1 4 10 10
2 1 2 2
2 2 30 0 --- new count all over again
2 3 4 4
Do you have any cool ideas how this could be done?
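This should do what you want (a, b, and c stand for user_id, touchpoint_number, and days_difference; PostgreSQL requires the RECURSIVE keyword):
with recursive cte as (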
select t.a, t.b, t.c, t.c as sumc
from t
where b = 1
union all
select t.a, t.b, t.c,
(case when t.c + cte.sumc > 30 then 0 else t.c + cte.sumc end)
from t join
cte
on t.b = cte.b + 1 and t.a = cte.a
)
select *
from cte
order by a, b;
Here is a rextester.
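As a way to check the recursion against the sample data, here is a minimal sketch that runs the same query shape in SQLite via Python's sqlite3 module. SQLite is an assumption used only to make the example self-contained; the column names come from the question, and the table name t follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (user_id INTEGER, touchpoint_number INTEGER, days_difference INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 5), (1, 2, 20), (1, 3, 25), (1, 4, 10), (2, 1, 2), (2, 2, 30), (2, 3, 4)],
)

query = """
WITH RECURSIVE cte AS (
    -- Anchor: the first touchpoint of every user starts the running sum.
    SELECT user_id, touchpoint_number, days_difference,
           days_difference AS cum_sum_upto30
    FROM t
    WHERE touchpoint_number = 1
    UNION ALL
    -- Step: add the next touchpoint, resetting to 0 once the sum exceeds 30.
    SELECT t.user_id, t.touchpoint_number, t.days_difference,
           CASE WHEN t.days_difference + cte.cum_sum_upto30 > 30 THEN 0
                ELSE t.days_difference + cte.cum_sum_upto30 END
    FROM t
    JOIN cte ON t.touchpoint_number = cte.touchpoint_number + 1
            AND t.user_id = cte.user_id
)
SELECT * FROM cte ORDER BY user_id, touchpoint_number
"""

running = conn.execute(query).fetchall()
for row in running:
    print(row)
```

The recursion assumes touchpoint_number starts at 1 and has no gaps within a user; a missing number would end that user's chain early.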

PostgreSQL window function & difference between dates

Suppose I have data formatted in the following way (FYI, total row count is over 30K):
customer_id order_date order_rank
A 2017-02-19 1
A 2017-02-24 2
A 2017-03-31 3
A 2017-07-03 4
A 2017-08-10 5
B 2016-04-24 1
B 2016-04-30 2
C 2016-07-18 1
C 2016-09-01 2
C 2016-09-13 3
I need a 4th column, let's call it days_since_last_order, which is 0 when order_rank = 1 and otherwise the number of days since the previous order (the one with rank n-1).
So, the above would return:
customer_id order_date order_rank days_since_last_order
A 2017-02-19 1 0
A 2017-02-24 2 5
A 2017-03-31 3 35
A 2017-07-03 4 94
A 2017-08-10 5 38
B 2016-04-24 1 0
B 2016-04-30 2 6
C 2016-07-18 1 0
C 2016-09-01 2 45
C 2016-09-13 3 12
Is there an easier way to calculate the above with a window function (or similar) rather than join the entire dataset against itself (eg. on A.order_rank = B.order_rank - 1) and doing the calc?
Thanks!
Use the LAG window function:
SELECT
customer_id
, order_date
, order_rank
, COALESCE(
DATE(order_date)
- DATE(LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
, 0) AS days_since_last_order
FROM <table_name>
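To see the effect of PARTITION BY customer_id on a few of the sample rows, here is a small sketch using SQLite through Python's sqlite3 module. It is not PostgreSQL: SQLite has no date subtraction, so julianday() is swapped in for the DATE(...) - DATE(...) expression, and the table name orders is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, order_date TEXT, order_rank INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("A", "2017-02-19", 1), ("A", "2017-02-24", 2), ("A", "2017-03-31", 3),
        ("C", "2016-07-18", 1), ("C", "2016-09-01", 2),
    ],
)

query = """
SELECT customer_id, order_date, order_rank,
       COALESCE(
           -- LAG is NULL on each customer's first order, so COALESCE yields 0 there.
           CAST(julianday(order_date)
                - julianday(LAG(order_date) OVER (
                      PARTITION BY customer_id ORDER BY order_date)) AS INTEGER),
           0) AS days_since_last_order
FROM orders
ORDER BY customer_id, order_date
"""

gaps = conn.execute(query).fetchall()
for row in gaps:
    print(row)
```

Because the window is partitioned by customer, the first order of customer C gets 0 rather than the gap to the previous customer's last order.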

select top n posts by score count

I am trying to get the top n users by post count using Hive. The table looks like this.
Score User
10 1
20 2
50 1
20 2
0 3
3 1
40 2
...
I want to generate output that looks like
Rows Users
3 1
3 2
1 3
here is my query
SELECT * FROM (SELECT COUNT(score) as Score, UserID AS COUNT FROM A WHERE UserID IS NOT NULL GROUP BY UserID,score LIMIT 10) A;
The output I get is something like this
0 0
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
0 9
Can someone show me where I am going wrong?
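Group by UserID only (not by score), and order by the count so that LIMIT returns the top users:
SELECT COUNT(score) as Score, UserID FROM A WHERE UserID IS NOT NULL GROUP BY UserID ORDER BY Score DESC LIMIT 10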
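A quick way to sanity-check the grouping on the sample rows is to run the same shape of query in SQLite through Python's sqlite3 module. This is a sketch, not Hive: the alias post_count and the tie-break on UserID are assumptions added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (score INTEGER, UserID INTEGER)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?)",
    [(10, 1), (20, 2), (50, 1), (20, 2), (0, 3), (3, 1), (40, 2)],
)

query = """
SELECT COUNT(score) AS post_count, UserID
FROM A
WHERE UserID IS NOT NULL
GROUP BY UserID                    -- one output row per user, not per (user, score)
ORDER BY post_count DESC, UserID   -- highest counts first, ties broken by UserID
LIMIT 10
"""

top_users = conn.execute(query).fetchall()
for row in top_users:
    print(row)
```

Grouping by both UserID and score, as in the original query, produces one row per distinct (user, score) pair, which is why the counts did not match the expected output.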

tsql sum data and include default values for missing data

I would like a query that will show a sum of columns with a default value for missing data. For example assume I have a table as follows:
type_lookup:
id name
-----------
1 self
2 manager
3 peer
And a table as follows
data:
id type_lookup_id value
--------------------------
1 1 1
2 1 4
3 2 9
4 2 1
5 2 9
6 1 5
7 2 6
8 1 2
9 1 1
After running a query I would like a result set as follows:
type_lookup_id value
----------------------
1 13
2 25
3 0
I would like all rows in type_lookup table to be included in the result set - even if they don't appear in the data table.
It's a bit hard to read your data layout, but something like the following should do the trick:
SELECT tl.id AS type_lookup_id, tl.name, COALESCE(SUM(da.value), 0) AS how_much
FROM type_lookup tl
LEFT OUTER JOIN data da
ON da.type_lookup_id = tl.id
GROUP BY tl.id, tl.name
ORDER BY tl.id
[EDIT]
...subsequently edited by changing count() to sum().
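To confirm the LEFT JOIN plus COALESCE pattern against the sample data, here is a small sketch that runs it in SQLite via Python's sqlite3 module. SQLite is only a stand-in for SQL Server here; the table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE type_lookup (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE data (id INTEGER, type_lookup_id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO type_lookup VALUES (?, ?)",
    [(1, "self"), (2, "manager"), (3, "peer")],
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 4), (3, 2, 9), (4, 2, 1), (5, 2, 9),
     (6, 1, 5), (7, 2, 6), (8, 1, 2), (9, 1, 1)],
)

query = """
SELECT tl.id AS type_lookup_id, tl.name,
       COALESCE(SUM(da.value), 0) AS how_much   -- SUM over no rows is NULL, so default to 0
FROM type_lookup tl
LEFT OUTER JOIN data da                         -- keeps lookup rows with no data
    ON da.type_lookup_id = tl.id
GROUP BY tl.id, tl.name
ORDER BY tl.id
"""

totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

The peer row appears with a total of 0 even though no row in data refers to it.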