count values over time interval - tsql

Please could someone assist with a SQL View.
I have a table called LoginActivity
CREATE TABLE [dbo].[LoginActivity](
[LoAc_ActivityID] [int] IDENTITY(1,1) NOT NULL,
[LoAc_UserID] [int] NULL,
[LoAc_Login] [datetime] NULL,
[LoAc_logout] [datetime] NULL,
[LoAc_Duration] [numeric](24, 6) NULL,
)
It records login/logout times with the user ID and holds data such as the rows below.
779 1 2017-11-03 08:07:41.000 2017-11-03 08:09:14.000 1.000000
780 1 2017-11-04 08:09:19.000 2017-11-04 08:27:19.000 17.000000
781 2 2017-11-04 08:27:22.000 2017-11-04 08:35:11.000 7.000000
782 3 2017-11-04 08:35:18.000 2017-11-04 08:58:12.000 19.000000
783 4 2017-11-04 08:35:22.000 2017-11-04 08:58:12.000 19.000000
I need to create a view that counts the number of users where LoAc_Login was within the past 1 hour, 2 hours, 3 hours and finally 4 hours. I would like to present the data like below.
WithinHour 1HourAgo 2HoursAgo 3HoursAgo 4HoursAgo
2 3 5 0 2
Thanks.

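Something like this:
SELECT
    SUM(CASE WHEN DATEDIFF(HOUR, LoAc_Login, GETDATE()) = 0 THEN 1 ELSE 0 END) AS WithinHour,
    SUM(CASE WHEN DATEDIFF(HOUR, LoAc_Login, GETDATE()) = 1 THEN 1 ELSE 0 END) AS OneHourAgo,
    SUM(CASE WHEN DATEDIFF(HOUR, LoAc_Login, GETDATE()) = 2 THEN 1 ELSE 0 END) AS TwoHoursAgo,
    SUM(CASE WHEN DATEDIFF(HOUR, LoAc_Login, GETDATE()) = 3 THEN 1 ELSE 0 END) AS ThreeHoursAgo,
    SUM(CASE WHEN DATEDIFF(HOUR, LoAc_Login, GETDATE()) = 4 THEN 1 ELSE 0 END) AS FourHoursAgo
FROM LoginActivity
Not tested, so the syntax may be slightly off, but hopefully you get the idea. Note that DATEDIFF(HOUR, ...) counts the hour boundaries crossed rather than elapsed 60-minute periods, so a login at 08:59 already counts as one hour ago at 09:01.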
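The bucketing idea can also be checked outside the database. The following is a minimal Python sketch, not part of the original answer: it buckets by whole elapsed hours (not clock-hour boundaries), and the function name, the bucket count and the sample "now" value are assumptions made for illustration.

```python
from datetime import datetime

def count_logins_by_hours_ago(logins, now, buckets=5):
    """Count logins per whole elapsed hour before `now`.

    Index 0 is "within the last hour", index 1 is "one hour ago", and so on.
    """
    counts = [0] * buckets
    for login in logins:
        if login is None or login > now:
            continue  # skip NULL logins and anything in the future
        hours_ago = int((now - login).total_seconds() // 3600)
        if hours_ago < buckets:
            counts[hours_ago] += 1
    return counts

# Login times taken from the sample rows; `now` is a hypothetical current time.
now = datetime(2017, 11, 4, 9, 30, 0)
logins = [
    datetime(2017, 11, 3, 8, 7, 41),
    datetime(2017, 11, 4, 8, 9, 19),
    datetime(2017, 11, 4, 8, 27, 22),
    datetime(2017, 11, 4, 8, 35, 18),
    datetime(2017, 11, 4, 8, 35, 22),
]
print(count_logins_by_hours_ago(logins, now))
```

Logins older than the last bucket are simply ignored, which mirrors the ELSE 0 branch of each CASE expression.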

Related

Pandas's `pct_change()` equivalent in postgres

Let's assume I have a table like this:
id date value
1 2021-04-05 100
1 2021-04-04 50
1 2021-04-03 25
1 2021-04-02 5
2 2021-04-05 80
2 2021-04-04 20
2 2021-04-03 15
2 2021-04-02 10
I need to add another column that groups by id and calculates a day-over-day percent change from the value with the date before it. So for this example it would look like this:
id date value pct_change
1 2021-04-05 100 100
1 2021-04-04 50 100
1 2021-04-03 25 400
1 2021-04-02 5 NaN
2 2021-04-05 80 300
2 2021-04-04 20 33.33
2 2021-04-03 15 50
2 2021-04-02 10 NaN
In Python this would be easy; I could do something like this:
df['pct_change'] = df.groupby('id').value.pct_change() * 100
But if I wanted to do this in the Postgres database call, I'd suddenly implode with stupidity... does anybody know how to do this?

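Maybe something like this? LAG already returns the previous row's value within each id, so no frame clause is needed. Multiplying by 100.0 gives a percentage and avoids integer division, and the subquery needs an alias.
SELECT
    id,
    date,
    value,
    100.0 * (value - prev_value) / NULLIF(prev_value, 0) AS pct_change
FROM
(
    SELECT
        id,
        date,
        value,
        LAG(value) OVER (PARTITION BY id ORDER BY date) AS prev_value
    FROM
        your_table
) AS t
ORDER BY id, date DESC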
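To sanity-check the expected numbers without pandas or a database, here is a minimal Python sketch of the same "previous value within each id" logic. The function name and the tuple layout are assumptions for illustration, and None stands in for NaN/NULL.

```python
from itertools import groupby
from operator import itemgetter

def pct_change_by_id(rows):
    """rows: iterable of (id, date, value) with ISO date strings.

    Returns (id, date, value, pct_change) tuples ordered by id, then date.
    pct_change is None for the first row of each id or when the previous
    value is zero.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # ISO dates sort correctly as text
    for _, group in groupby(ordered, key=itemgetter(0)):
        prev = None
        for id_, date, value in group:
            if prev is None or prev == 0:
                change = None
            else:
                change = 100.0 * (value - prev) / prev
            out.append((id_, date, value, change))
            prev = value
    return out

rows = [
    (1, "2021-04-05", 100), (1, "2021-04-04", 50),
    (1, "2021-04-03", 25), (1, "2021-04-02", 5),
    (2, "2021-04-05", 80), (2, "2021-04-04", 20),
    (2, "2021-04-03", 15), (2, "2021-04-02", 10),
]
for row in pct_change_by_id(rows):
    print(row)
```

Sorting first and then walking each group plays the role of PARTITION BY id ORDER BY date in the SQL version.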

Tableau - How sum values with 12 last months

In Tableau I have a table in this form:
Rows: Score.
Columns: MY(month), sum(good), sum(bad).
This is the information I see when I use month 201811:
201611 201612 ... 201801 ... 201811 TOTAL
Score Good Bad Good Bad Good Bad ... Good Bad
1 3 0 7 3 6 3 2 1
2 5 1 1 1 1 1 4 4
3 10 3 2 1 0 3 3 3
I want to use a filter on the 'Month' column so that, when I filter on month = 201811, the Total column (the Good and Bad totals by Score) shows the sums from 201611 to 201711 (last 12 months).
Filter: 201811
Formula: sum(Good) and sum(Bad) from '201611' to '201711'
I tried "IF DATEDIFF('month', [Good], today()) <=12" but it doesn't work.
Thanks for your help.

Try this:
IF DATEDIFF("month", TODAY(), [Your Date Field], "Sunday") >= -12
THEN [Your Date Field] ELSE NULL END
Then use that as your date column. Because TODAY() is the start date, past dates give a negative difference, so >= -12 keeps the dates that are no more than 12 months back. The "Sunday" is supposed to be whatever you consider the starting day of the week. I wasn't sure what your date field is named, so I called it "[Your Date Field]".
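The sign of the month difference is easy to get backwards, so here is a minimal Python sketch of the comparison. It assumes the difference is counted in calendar-month boundaries from the first date to the second; the function names and the sample dates are hypothetical.

```python
from datetime import date

def month_diff(start, end):
    """Number of calendar-month boundaries between start and end (negative if end is earlier)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def in_last_12_months(d, today):
    """True when d is no more than 12 calendar months before today and not in a later month."""
    return -12 <= month_diff(today, d) <= 0

today = date(2018, 11, 15)  # hypothetical "today"
print(in_last_12_months(date(2017, 11, 1), today))   # 12 months back
print(in_last_12_months(date(2017, 10, 31), today))  # 13 months back
```

The upper bound of 0 is an extra guard against future dates; drop it if the data cannot contain any.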

PostgreSQL window function & difference between dates

Suppose I have data formatted in the following way (FYI, total row count is over 30K):
customer_id order_date order_rank
A 2017-02-19 1
A 2017-02-24 2
A 2017-03-31 3
A 2017-07-03 4
A 2017-08-10 5
B 2016-04-24 1
B 2016-04-30 2
C 2016-07-18 1
C 2016-09-01 2
C 2016-09-13 3
I need a 4th column, let's call it days_since_last_order, which is 0 where order_rank = 1 and otherwise the number of days since the previous order (the one with rank n-1).
So, the above would return:
customer_id order_date order_rank days_since_last_order
A 2017-02-19 1 0
A 2017-02-24 2 5
A 2017-03-31 3 35
A 2017-07-03 4 94
A 2017-08-10 5 38
B 2016-04-24 1 0
B 2016-04-30 2 6
C 2016-07-18 1 0
C 2016-09-01 2 45
C 2016-09-13 3 12
Is there an easier way to calculate the above with a window function (or similar) rather than join the entire dataset against itself (eg. on A.order_rank = B.order_rank - 1) and doing the calc?
Thanks!

Use the LAG window function:
SELECT
customer_id
, order_date
, order_rank
, COALESCE(
DATE(order_date)
- DATE(LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
, 0) AS days_since_last_order
FROM <table_name>
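For anyone who wants to verify the expected day counts by hand, this is a minimal Python sketch of the same idea: remember each customer's previous order date and subtract. The function name and the tuple layout are assumptions for illustration only.

```python
from datetime import date

def days_since_last_order(orders):
    """orders: iterable of (customer_id, order_date) pairs.

    Returns (customer_id, order_date, days) tuples sorted by customer and date,
    where days is 0 for each customer's first order.
    """
    result = []
    last_seen = {}  # customer_id -> most recent order_date processed so far
    for customer_id, order_date in sorted(orders):
        prev = last_seen.get(customer_id)
        days = 0 if prev is None else (order_date - prev).days
        result.append((customer_id, order_date, days))
        last_seen[customer_id] = order_date
    return result

orders = [
    ("A", date(2017, 2, 19)), ("A", date(2017, 2, 24)), ("A", date(2017, 3, 31)),
    ("B", date(2016, 4, 24)), ("B", date(2016, 4, 30)),
    ("C", date(2016, 7, 18)), ("C", date(2016, 9, 1)), ("C", date(2016, 9, 13)),
]
for row in days_since_last_order(orders):
    print(row)
```

The dictionary lookup is the counterpart of PARTITION BY customer_id: a customer's first order never sees another customer's date.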

select top n posts by score count

I am trying to get the top n users by post count using Hive. The table looks like this.
Score User
10 1
20 2
50 1
20 2
0 3
3 1
40 2
...
I want to generate output like this:
Rows Users
3 1
3 2
1 3
Here is my query:
SELECT * FROM (SELECT COUNT(score) as Score, UserID AS COUNT FROM A WHERE UserID IS NOT NULL GROUP BY UserID,score LIMIT 10) A;
The output I get is something like this
0 0
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
0 9
Can someone show me where I am going wrong?

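Group by UserID only. Grouping by score as well produces one row per (user, score) pair instead of one row per user. To get the top n, also order by the count before applying the limit:
SELECT COUNT(score) AS post_count, UserID FROM A WHERE UserID IS NOT NULL GROUP BY UserID ORDER BY post_count DESC LIMIT 10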
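The same count-per-user logic, as a minimal Python sketch for checking the expected output against the sample rows. The function name and the (score, user_id) tuple layout are assumptions, not anything from Hive.

```python
from collections import Counter

def top_users_by_post_count(rows, n):
    """rows: iterable of (score, user_id) pairs.

    Returns up to n (user_id, post_count) pairs, highest count first.
    Rows with a missing user or score are skipped, like COUNT(score)
    combined with WHERE UserID IS NOT NULL.
    """
    counts = Counter(
        user_id
        for score, user_id in rows
        if user_id is not None and score is not None
    )
    return counts.most_common(n)

rows = [(10, 1), (20, 2), (50, 1), (20, 2), (0, 3), (3, 1), (40, 2)]
print(top_users_by_post_count(rows, 10))
```

Note that the key is the user alone; adding the score to the key would reproduce the one-row-per-pair behaviour of the original query.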

Select data for 15 minute windows - PostgreSQL

Right so I have a table such as this in PostgreSQL:
timestamp duration
2013-04-03 15:44:58 4
2013-04-03 15:56:12 2
2013-04-03 16:13:17 9
2013-04-03 16:16:30 3
2013-04-03 16:29:52 1
2013-04-03 16:38:25 1
2013-04-03 16:41:37 9
2013-04-03 16:44:49 1
2013-04-03 17:01:07 9
2013-04-03 17:07:48 1
2013-04-03 17:11:00 2
2013-04-03 17:11:16 2
2013-04-03 17:15:17 1
2013-04-03 17:16:53 4
2013-04-03 17:20:37 9
2013-04-03 17:20:53 3
2013-04-03 17:25:48 3
2013-04-03 17:29:26 1
2013-04-03 17:32:38 9
2013-04-03 17:36:55 4
And I would like to get the following output:
timestampwindowstart = 2013-04-03 15:44:58
duration count
1 0
2 1
3 0
4 1
9 0
timestampwindowstart = 2013-04-03 15:59:58
duration count
1 0
2 0
3 0
4 0
9 1
timestampwindowstart = 2013-04-03 16:14:58
duration count
1 1
2 0
3 1
4 0
9 0
timestampwindowstart = 2013-04-03 16:29:58
duration count
1 2
2 0
3 0
4 0
9 1
etc...
So basically it cycles through the timestamps in 15-minute windows and outputs the distinct duration values along with their frequency (count). The timestampwindowstart value is the earliest timestamp for the window (i.e. timestampwindowfinish = timestampwindowstart + 15 minutes).
This is so I can then plot the 15 minute interval histograms...
I have tried reading up but it is a bit complicated for me to get my head around and I don't have much time...
Thanks for any help!

Quick and dirty way: http://sqlfiddle.com/#!1/bd2f6/21 (I named my column tstamp instead of your timestamp.)
with t as (
select
generate_series(mitstamp,matstamp,'15 minutes') as int,
duration
from
(select min(tstamp) mitstamp, max(tstamp) as matstamp from tmp) a,
(select duration from tmp group by duration) b
)
select
int as timestampwindowstart,
t.duration,
count(tmp.duration)
from
t
left join tmp on
(tmp.tstamp >= t.int and
tmp.tstamp < (t.int + interval '15 minutes') and
t.duration = tmp.duration)
group by
int,
t.duration
order by
int,
t.duration
Brief explanation:
1. Calculate the minimum and maximum timestamp.
2. Generate 15-minute intervals between the minimum and maximum.
3. Cross join the results with the unique values of duration.
4. Left join the original data. The left join is important because it keeps every possible combination in the output; there will be a null where a duration does not exist for a given interval.
5. Aggregate the data. count(null) = 0.
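The steps above can also be sketched in plain Python, which may help when checking the histogram values before plotting. This is only an illustration of the same windowing logic, not a replacement for the query; the function name and the (timestamp, duration) tuple layout are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

def window_histograms(rows, width=timedelta(minutes=15)):
    """rows: list of (timestamp, duration) pairs.

    Returns {window_start: {duration: count}} with every distinct duration
    present in every window (zero when nothing falls in that window).
    """
    if not rows:
        return {}
    start = min(ts for ts, _ in rows)
    end = max(ts for ts, _ in rows)
    durations = sorted({d for _, d in rows})
    counts = Counter()
    for ts, d in rows:
        index = (ts - start) // width  # which window this row falls into
        counts[(index, d)] += 1
    result = {}
    index = 0
    # Like generate_series: emit window starts up to and including the maximum.
    while start + index * width <= end:
        result[start + index * width] = {d: counts[(index, d)] for d in durations}
        index += 1
    return result

rows = [
    (datetime(2013, 4, 3, 15, 44, 58), 4), (datetime(2013, 4, 3, 15, 56, 12), 2),
    (datetime(2013, 4, 3, 16, 13, 17), 9), (datetime(2013, 4, 3, 16, 16, 30), 3),
    (datetime(2013, 4, 3, 16, 29, 52), 1), (datetime(2013, 4, 3, 16, 38, 25), 1),
    (datetime(2013, 4, 3, 16, 41, 37), 9), (datetime(2013, 4, 3, 16, 44, 49), 1),
]
for window_start, histogram in window_histograms(rows).items():
    print(window_start, histogram)
```

Building the full grid of windows and durations first, then filling in counts, is what the cross join plus left join achieves in the SQL.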
If you have several tables and the algorithm should be applied to their union, suppose we have three tables tmp1, tmp2, tmp3, all with columns tstamp and duration. Then we can extend the previous solution:
with
tmpout as (
select * from tmp1 union all
select * from tmp2 union all
select * from tmp3
)
,t as (
select
generate_series(mitstamp,matstamp,'15 minutes') as int,
duration
from
(select min(tstamp) mitstamp, max(tstamp) as matstamp from tmpout) a,
(select duration from tmpout group by duration) b
)
select
int as timestampwindowstart,
t.duration,
count(tmp.duration)
from
t
left join tmpout tmp on
(tmp.tstamp >= t.int and
tmp.tstamp < (t.int + interval '15 minutes') and
t.duration = tmp.duration)
group by
int,
t.duration
order by
int,
t.duration
You should really get to know the WITH clause in PostgreSQL. It is an invaluable concept for any data analysis in PostgreSQL.
