Fill in missing historical dates in Postgres

I have a requirement to find the missing days between the start date and end date for a particular ID, and to fill each missing date with the average of the preceding and succeeding rows.
For example:
In the input dataset the date "2023-01-23" is missing; I would like to add this date, with the Value column containing the AVERAGE of the "2023-01-24" and "2023-01-22" values.
Any idea how I can get this done in Postgres (v12.3)?
Expected output
ID Value Date
8445 0.0000 "2023-01-25"
8445 0.0000 "2023-01-24"
8445 0.0000 "2023-01-23". --Value is the average of above and below row
8445 0.0000 "2023-01-22"
8445 0.0000 "2023-01-21" --Value is the average of above and below row
8445 0.0000 "2023-01-20"
Input Dataset
ID Value Date
8445 0.0000 "2023-01-25"
8445 0.0000 "2023-01-24"
8445 0.0000 "2023-01-22"
8445 0.0000 "2023-01-20"
8445 0.0000 "2023-01-19"
8445 0.0000 "2022-12-29"
8445 0.0000 "2022-12-27"
8445 0.0000 "2022-12-26"
8445 0.0000 "2022-12-25"
8445 0.0000 "2022-12-23"
8445 0.0000 "2022-12-22"
8445 0.0000 "2022-12-21"
8445 0.0000 "2022-12-20"
8445 0.0000 "2022-12-18"
8445 0.0000 "2022-12-16"
8445 0.0000 "2022-12-15"
8445 0.0000 "2022-12-14"
8445 0.0000 "2022-12-13"
8445 0.0000 "2022-12-11"
8445 0.0000 "2022-12-10"
8445 0.0000 "2022-12-09"
8445 111.0000 "2022-12-07"
8445 624.0000 "2022-12-06"
8445 1010.0000 "2022-12-05"
8445 1305.0000 "2022-12-04"
8445 1479.0000 "2022-12-02"
8445 1708.0000 "2022-12-01"
8445 1911.0000 "2022-11-30"
8445 2264.0000 "2022-11-29"
8445 2675.0000 "2022-11-28"
8445 3347.0000 "2022-11-27"
8445 3895.0000 "2022-11-26"
8445 7873.0000 "2022-11-24"
8445 8486.0000 "2022-11-22"
8445 8725.0000 "2022-11-20"
8445 9072.0000 "2022-11-19"
8445 9356.0000 "2022-11-18"
8445 9986.0000 "2022-11-17"
8445 10178.0000 "2022-11-16"
8445 10507.0000 "2022-11-15"
8445 10771.0000 "2022-11-14"
8445 11096.0000 "2022-11-13"
8445 11452.0000 "2022-11-12"
8445 11677.0000 "2022-11-11"
8445 11966.0000 "2022-11-10"
8445 12229.0000 "2022-11-09"
8445 13128.0000 "2022-11-08"
8445 13488.0000 "2022-11-07"
8445 14406.0000 "2022-11-05"
8445 14737.0000 "2022-11-03"
8445 15045.0000 "2022-11-02"
8445 15360.0000 "2022-11-01"
8445 15822.0000 "2022-10-31"
8445 16166.0000 "2022-10-30"
8445 16477.0000 "2022-10-29"
8445 16697.0000 "2022-10-28"
8445 16973.0000 "2022-10-27"
8445 17285.0000 "2022-10-26"
8445 17585.0000 "2022-10-25"
8445 17879.0000 "2022-10-24"
8445 18253.0000 "2022-10-23"
8445 18614.0000 "2022-10-22"
8445 18829.0000 "2022-10-21"
8445 19169.0000 "2022-10-20"
8445 19446.0000 "2022-10-19"
8445 11286.0000 "2022-10-18"
8445 11650.0000 "2022-10-17"
8445 2975.0000 "2022-10-16"
8445 3379.0000 "2022-10-15"
8445 1263.0000 "2022-10-14"
8445 267.0000 "2022-10-12"
8445 944.0000 "2022-10-10"
8445 1254.0000 "2022-10-09"
8445 1459.0000 "2022-10-08"
8445 156.0000 "2022-10-07"
8445 469.0000 "2022-10-06"
8445 1076.0000 "2022-10-04"
8445 1447.0000 "2022-10-03"
8445 4856.0000 "2022-09-22"
8445 6019.0000 "2022-09-20"
8445 7027.0000 "2022-09-17"
8445 5248.0000 "2022-09-16"
8445 0.0000 "2022-09-14"

Related

time bucketing with cumsum condition

Hello Fellow Kdb Mortals :D
Stuck on a pretty weird problem here. I have a table like
The time column is xbar-ed into 5-minute buckets:
time code name count
--------------------------------
00:00 SPY S&P.. 15
00:00 QQQ ... 88
00:00 IWM ... 100
00:00 XLE ... 80
00:05 QQQ ... 20
00:05 SPY ... 75
00:10 QQQ ... 22
00:10 XLE ... 10
00:15 SPY ... 23
.....
.....
23:40 XLE ... 11
23:50 SPY ... 16
23:55 IWM ... 100
23:55 QQQ ... 10
What I want returned is a table like (from ascending time):
code name stime etime cumcount
------------------------------------------------
SPY S&P... 00:00 00:15 123 <-- 15+75+23
QQQ ... 00:00 00:05 108 <-- 88+20
IWM ... 00:00 00:00 100 <-- 100
XLE ... 00:00 23:40 101 <-- 80+10+11
Notice there is a condition on this time bucket: it ends at the first point where the cumulative sum by (code,name) is greater than or equal to 100.
I can also generate another table from the bottom up (descending time):
code name stime etime cumcount
------------------------------------------------
SPY ... 23:50 20:10 103
QQQ ... 23:55 21:45 118
IWM ... 23:55 23:55 100
XLE ... 23:40 00:00 101 <-- 11+10+80
I have been at this for a couple of hours, but can't get this working. Basic select and sums don't get me anywhere. I could use loops but thought I should check in here first before I go down that lane.
Any help is appreciated :D
Assuming you have a table sorted ascending on time i.e.:
`time xasc `t
Something like this could work
q)t1:update cumcount:sums cnt,stime:first time by code,name from t
q)select code,name,stime,etime:time, cumcount from t1 where cumcount>=100,i=(first;i) fby ([]code;name)
Notice that I have relabelled count as cnt to prevent a clash with the count function that already exists in the q language.
So first you calculate your cumulative count in the update statement.
Then select from the resulting table in such a way that you first pull out only those records where the cumulative count is at least 100, and then use fby to filter that down again, pulling out the first such record for each distinct (code;name) pair.
In this example stime is the time of the first entry for each (code;name) pair and etime is the time when the cumulative count first reaches 100.
I prefer Sean's solution, but for the sake of an alternative (binr does a binary search, returning the index of the first element that is at least the given value, here the first cumulative count to reach 100):
q)t:update name:string lower code from([]time:"u"$0 0 0 0 5 5 10 10 15 1420 1430 1435 1435;code:`SPY`QQQ`IWM`XLE 0 1 2 3 1 0 1 3 0 3 0 2 1;cnt:15 88 100 80 20 75 22 10 23 11 16 100 10);
q)exec{x x[`cumcnt]binr 100}[([]stime:first time;etime:time;cumcnt:sums cnt)]by code,name from t
code name | stime etime cumcnt
----------| ------------------
IWM "iwm"| 00:00 00:00 100
QQQ "qqq"| 00:00 00:05 108
SPY "spy"| 00:00 00:15 113
XLE "xle"| 00:00 23:40 101
Summing from the bottom would be:
q)exec{x x[`cumcnt]binr 100}[([]stime:last time;etime:reverse time;cumcnt:sums reverse cnt)]by code,name from t
code name | stime etime cumcnt
----------| ------------------
IWM "iwm"| 23:55 23:55 100
QQQ "qqq"| 23:55 00:00 140
SPY "spy"| 23:50 00:05 114
XLE "xle"| 23:40 00:00 101

Update column in KDB table based on condition of multiple columns in table

When all of the columns (d1;d2;d3;d4) are "NC" for a pair, I am looking to update adjustment to "NO ADJ".
In the below case only the first row satisfies the condition and should be updated:
t:flip (`pair`d1`d2`d3`d4`vol`adjustment)!(`pair1`pair2`pair3`pair4;("NC";"3/-0.09";"1/-0.09";"NC");("NC";"4/-0.09";"-1/-0.09";"NC");("NC";"2/-0.09";"1/0.09";"2/0.3");("NC";"4/-0.09";"0/-0.09";"NC");0 89.68 78.3 0;("0.1bp";"0.1bp";"0.1bp";"0.1bp"))
Thanks in advance!
While Eliot's answer is perfectly correct for the question at face value, if you have many columns with names like "d*" you can generalise to this form, which will check all of the columns (d1,d2,...,dn) for "NC":
![`t;{(like;x;"NC")}each ((cols t) where (cols t) like "d*");0b;(enlist `adjustment)!enlist (enlist;"NO ADJ")]
This update takes each column whose name matches "d*" and, where all of those columns are "NC", changes adjustment to "NO ADJ" as requested.
This update statement can be modified as necessary for different column groups.
Edit: As Dunny's comment suggests, having `t in ![`t;...] makes the change in place. If you would like to test and see what the result looks like without changing the table in memory, change `t to t.
This update statement will do what you need:
update adjustment:enlist"NO ADJ" from t where all(d1;d2;d3;d4)~\:\:"NC"
Two each-lefts are needed since (d1;...;d4) is a list of lists of strings. The result of this comparison is
q)(t`d1;t`d2;t`d3;t`d4)~\:\:"NC"
1001b
1001b
1000b
1001b
then applying the all keyword "squashes" these boolean vectors into one:
q)all(t`d1;t`d2;t`d3;t`d4)~\:\:"NC"
1000b
Finally, you need to use enlist on the string "NO ADJ", otherwise there will be a length error due to kdb trying to update the adjustment column pairwise for each character in the string "NO ADJ".
This should provide the solution for the data you provided. You need to use each-left to apply the like comparison across all columns, which leaves you with a list of four boolean vectors showing where the condition is met in each row. We then use the all keyword as well, to ensure we only update the rows that match in every column.
q)update adjustment:enlist"NO ADJ" from t where all (d1;d2;d3;d4) like\:"NC"
pair d1 d2 d3 d4 vol adjustment
---------------------------------------------------------------
pair1 "NC" "NC" "NC" "NC" 0 "NO ADJ"
pair2 "3/-0.09" "4/-0.09" "2/-0.09" "4/-0.09" 89.68 "0.1bp"
pair3 "1/-0.09" "-1/-0.09" "1/0.09" "0/-0.09" 78.3 "0.1bp"
pair4 "NC" "NC" "2/0.3" "NC" 0 "0.1bp"
One way of doing this
q)t
pair d1 d2 d3 d4 vol adjustment
---------------------------------------------------------------
pair1 "NC" "NC" "NC" "NC" 0 "0.1bp"
pair2 "3/-0.09" "4/-0.09" "2/-0.09" "4/-0.09" 89.68 "0.1bp"
pair3 "1/-0.09" "-1/-0.09" "1/0.09" "0/-0.09" 78.3 "0.1bp"
pair4 "NC" "NC" "2/0.3" "NC" 0 "0.1bp"
q)update adjustment:enlist"NO ADJ" from t where([]d1;d2;d3;d4)~\:`d1`d2`d3`d4!4#enlist"NC"
pair d1 d2 d3 d4 vol adjustment
---------------------------------------------------------------
pair1 "NC" "NC" "NC" "NC" 0 "NO ADJ"
pair2 "3/-0.09" "4/-0.09" "2/-0.09" "4/-0.09" 89.68 "0.1bp"
pair3 "1/-0.09" "-1/-0.09" "1/0.09" "0/-0.09" 78.3 "0.1bp"
pair4 "NC" "NC" "2/0.3" "NC" 0 "0.1bp"
This works by first creating an intermediary dictionary
q)`d1`d2`d3`d4!4#enlist"NC"
d1| "NC"
d2| "NC"
d3| "NC"
d4| "NC"
and then checking whether each element (row) of the table ([]d1;d2;d3;d4) is exactly equal to this dictionary,
e.g. similar to constructs like
1 2 3~\:1
In this case we're using the fact that a table is just a list of dictionaries; with some lateral thinking, we can often find ways to exploit this fact.
So far, all of the currently suggested solutions fail when multiple rows need replacing:
q)t:100000 # flip (`pair`d1`d2`d3`d4`vol`adjustment)!(`pair1`pair2`pair3`pair4;("NC";"3/-0.09";"1/-0.09";"NC");("NC";"4/-0.09";"-1/-0.09";"NC");("NC";"2/-0.09";"1/0.09";"2/0.3");("NC";"4/-0.09";"0/-0.09";"NC");0 89.68 78.3 0;("0.1bp";"0.1bp";"0.1bp";"0.1bp"))
q)update adjustment:enlist"NO ADJ" from t where all(d1;d2;d3;d4)~\:\:"NC"
'length
[0] update adjustment:enlist"NO ADJ" from t where all(d1;d2;d3;d4)~\:\:"NC"
^
q)update adjustment:enlist"NO ADJ" from t where all (d1;d2;d3;d4) like\:"NC"
'length
[0] update adjustment:enlist"NO ADJ" from t where all (d1;d2;d3;d4) like\:"NC"
q)![`t;{(like;x;"NC")}each ((cols t) where (cols t) like "d*");0b;(enlist `adjustment)!enlist (enlist;"NO ADJ")]
'length
[0] ![`t;{(like;x;"NC")}each ((cols t) where (cols t) like "d*");0b;(enlist `adjustment)!enlist (enlist;"NO ADJ")]
^
This is because there is no easy way to determine the number of "NO ADJ" strings you need to substitute in. For example:
q)update adjustment:enlist "adfjkl" from t where i in 1 2 3
'length
[0] update adjustment:enlist "adfjkl" from t where i in 1 2 3
^
q)update adjustment:3#enlist "adfjkl" from t where i in 1 2 3
pair d1 d2 d3 d4 vol adjustment
---------------------------------------------------------------
pair1 "NC" "NC" "NC" "NC" 0 "0.1bp"
pair2 "3/-0.09" "4/-0.09" "2/-0.09" "4/-0.09" 89.68 "adfjkl"
pair3 "1/-0.09" "-1/-0.09" "1/0.09" "0/-0.09" 78.3 "adfjkl"
pair4 "NC" "NC" "2/0.3" "NC" 0 "adfjkl"
pair1 "NC" "NC" "NC" "NC" 0 "0.1bp"
The best way to handle this type of vector replacement is through a vector conditional:
q)update adjustment:?[&/[{"NC" ~/:x} each (d1;d2;d3;d4)];(count adjustment)#enlist "NO ADJ";adjustment] from t
pair d1 d2 d3 d4 vol adjustment
---------------------------------------------------------------
pair1 "NC" "NC" "NC" "NC" 0 "NO ADJ"
pair2 "3/-0.09" "4/-0.09" "2/-0.09" "4/-0.09" 89.68 "0.1bp"
pair3 "1/-0.09" "-1/-0.09" "1/0.09" "0/-0.09" 78.3 "0.1bp"
pair4 "NC" "NC" "2/0.3" "NC" 0 "0.1bp"
pair1 "NC" "NC" "NC" "NC" 0 "NO ADJ"
The performance of this will also generally be excellent.
Examining my conditional:
?[&/[{"NC" ~/:x} each (d1;d2;d3;d4)];(count adjustment)#enlist "NO ADJ";adjustment]
In my where clause, {"NC" ~/:x} each (d1;d2;d3;d4) produces four boolean vectors; I then collapse them with & (and) and the over iterator (&/) to get the rows where all conditions are true.
The other two components are my replacement vectors; they must be of equal length. This requirement avoids the pitfall of the other attempts, where you cannot know the number of replacements required.

Pandas convert integer into date

So I have a DataFrame object called 'df' and I'm trying to convert the 'timestamp' column into an actual readable date.
timestamp
0 1465893683657
1 1457783741932
2 1459730006393
3 1459744745346
4 1459744756375
I've tried
df['timestamp'] = pd.to_datetime(df['timestamp'],unit='s')
but this gives
timestamp
0 1970-01-01 00:24:25.893683657
1 1970-01-01 00:24:17.783741932
2 1970-01-01 00:24:19.730006393
3 1970-01-01 00:24:19.744745346
4 1970-01-01 00:24:19.744756375
which is clearly wrong, since I know the dates should be either this year or last year.
What am I doing wrong?
The values are Unix timestamps in milliseconds, not seconds, so use unit='ms':
print (pd.to_datetime(df.timestamp, unit='ms'))
0 2016-06-14 08:41:23.657
1 2016-03-12 11:55:41.932
2 2016-04-04 00:33:26.393
3 2016-04-04 04:39:05.346
4 2016-04-04 04:39:16.375
Name: timestamp, dtype: datetime64[ns]
You can reduce the timestamps to whole seconds with integer division, or better, use @jezrael's unit='ms'.
In [133]: pd.to_datetime(df.timestamp // 10**3, unit='s')
Out[133]:
0 2016-06-14 08:41:23
1 2016-03-12 11:55:41
2 2016-04-04 00:33:26
3 2016-04-04 04:39:05
4 2016-04-04 04:39:16
Name: timestamp, dtype: datetime64[ns]

SQL Server: calculating how many instances occur on a day

I have a table that has ID, start date, and end date
Start_date End_Date ID
2016-03-01 06:30:00.000 2016-03-07 17:30:00.000 782772
2016-03-01 09:09:00.000 2016-03-07 10:16:00.000 782789
2016-03-01 11:17:00.000 2016-03-08 20:10:00.000 782882
2016-03-01 12:22:00.000 2016-03-21 19:40:00.000 782885
2016-03-01 13:15:00.000 2016-03-24 13:37:00.000 783000
2016-03-01 13:23:00.000 2016-03-07 19:15:00.000 782964
2016-03-01 13:55:00.000 2016-03-14 15:45:00.000 782972
2016-03-01 14:05:00.000 2016-03-07 20:32:00.000 783065
2016-03-01 18:06:00.000 2016-03-09 12:42:00.000 782988
2016-03-01 19:05:00.000 2016-04-01 20:00:00.000 782942
2016-03-01 19:15:00.000 2016-03-10 13:30:00.000 782940
2016-03-01 19:15:00.000 2016-03-07 18:00:00.000 783111
2016-03-01 20:10:00.000 2016-03-08 14:05:00.000 783019
2016-03-01 22:15:00.000 2016-03-24 12:46:00.000 782979
2016-03-02 08:00:00.000 2016-03-08 09:02:00.000 783222
2016-03-02 09:31:00.000 2016-03-15 09:16:00.000 783216
2016-03-02 11:04:00.000 2016-03-19 18:49:00.000 783301
2016-03-02 11:23:00.000 2016-03-14 19:49:00.000 783388
2016-03-02 11:46:00.000 2016-03-08 18:10:00.000 783368
2016-03-02 12:03:00.000 2016-03-23 08:44:00.000 783246
2016-03-02 12:23:00.000 2016-03-11 14:45:00.000 783302
2016-03-02 12:24:00.000 2016-03-12 15:30:00.000 783381
2016-03-02 12:30:00.000 2016-03-09 13:58:00.000 783268
2016-03-02 13:00:00.000 2016-03-10 11:30:00.000 783391
2016-03-02 13:35:00.000 2016-03-17 04:40:00.000 783309
2016-03-02 15:05:00.000 2016-04-04 11:52:00.000 783295
2016-03-02 15:08:00.000 2016-03-15 16:15:00.000 783305
2016-03-02 15:32:00.000 2016-03-08 14:20:00.000 783384
2016-03-02 16:49:00.000 2016-03-08 11:40:00.000 783367
2016-03-02 16:51:00.000 2016-03-11 16:00:00.000 783387
2016-03-02 18:00:00.000 2016-03-10 17:00:00.000 783242
2016-03-02 18:37:00.000 2016-03-25 13:30:00.000 783471
2016-03-02 18:45:00.000 2016-03-11 20:15:00.000 783498
2016-03-02 19:41:00.000 2016-03-17 12:34:00.000 783522
2016-03-02 20:08:00.000 2016-03-22 15:30:00.000 783405
2016-03-02 20:16:00.000 2016-03-31 12:30:00.000 783512
2016-03-02 21:45:00.000 2016-03-15 12:25:00.000 783407
2016-03-03 09:59:00.000 2016-03-09 15:00:00.000 783575
2016-03-03 11:18:00.000 2016-03-16 10:30:00.000 783570
2016-03-03 11:27:00.000 2016-03-15 17:28:00.000 783610
2016-03-03 11:36:00.000 2016-03-11 16:05:00.000 783572
2016-03-03 11:55:00.000 2016-03-10 20:15:00.000 783691
2016-03-03 12:10:00.000 2016-03-09 19:50:00.000 783702
2016-03-03 12:11:00.000 2016-03-15 14:08:00.000 783611
2016-03-03 12:55:00.000 2016-03-10 11:50:00.000 783571
2016-03-03 13:20:00.000 2016-04-20 20:37:00.000 783856
2016-03-03 14:08:00.000 2016-03-10 16:00:00.000 783728
2016-03-03 15:10:00.000 2016-03-10 17:00:00.000 783727
2016-03-03 15:20:00.000 2016-03-17 15:14:00.000 783768
2016-03-03 16:55:00.000 2016-03-09 14:09:00.000 783812
2016-03-03 17:00:00.000 2016-03-12 12:33:00.000 783978
2016-03-03 17:17:00.000 2016-03-10 16:00:00.000 783729
2016-03-03 17:42:00.000 2016-03-10 12:13:00.000 783975
2016-03-03 18:23:00.000 2016-03-09 17:00:00.000 783820
2016-03-03 18:31:00.000 2016-03-11 14:00:00.000 783891
2016-03-03 18:59:00.000 2016-03-10 17:00:00.000 783772
2016-03-03 19:48:00.000 2016-03-11 17:30:00.000 783724
2016-03-03 19:50:00.000 2016-03-09 18:00:00.000 783829
2016-03-03 20:48:00.000 2016-03-11 11:04:00.000 783745
2016-03-03 23:00:00.000 2016-03-13 10:59:00.000 783983
2016-03-04 02:50:00.000 2016-03-10 10:45:00.000 783991
2016-03-04 11:25:00.000 2016-03-14 14:50:00.000 784102
2016-03-04 11:28:00.000 2016-03-18 16:21:00.000 784011
2016-03-04 12:01:00.000 2016-03-11 13:20:00.000 784014
2016-03-04 12:15:00.000 2016-03-11 08:00:00.000 784004
2016-03-04 13:06:00.000 2016-03-11 15:00:00.000 784012
2016-03-04 13:37:00.000 2016-03-10 18:00:00.000 784200
2016-03-04 13:52:00.000 2016-04-22 21:30:00.000 784132
2016-03-04 14:11:00.000 2016-03-14 19:00:00.000 784136
2016-03-04 14:17:00.000 2016-03-11 16:52:00.000 784176
2016-03-04 14:42:00.000 2016-03-13 15:25:00.000 784070
2016-03-04 16:00:00.000 2016-03-11 17:30:00.000 784655
2016-03-04 16:30:00.000 2016-03-10 23:30:00.000 784652
2016-03-04 17:25:00.000 2016-03-22 14:00:00.000 784028
2016-03-04 19:50:00.000 2016-03-10 12:42:00.000 784303
2016-03-04 20:00:00.000 2016-03-10 16:13:00.000 784006
2016-03-04 21:30:00.000 2016-03-10 18:00:00.000 784042
2016-03-04 22:25:00.000 2016-04-02 19:40:00.000 784044
2016-03-04 22:40:00.000 2016-03-15 17:30:00.000 784276
2016-03-04 22:55:00.000 2016-03-13 13:50:00.000 784257
2016-03-04 23:10:00.000 2016-03-15 13:19:00.000 784266
2016-03-05 10:30:00.000 2016-03-11 07:45:00.000 784295
2016-03-05 10:30:00.000 2016-03-16 19:00:00.000 784305
2016-03-05 11:05:00.000 2016-03-17 15:26:00.000 784320
2016-03-05 12:30:00.000 2016-03-14 11:25:00.000 784368
2016-03-05 12:50:00.000 2016-03-17 13:27:00.000 784419
2016-03-05 13:01:00.000 2016-03-11 17:00:00.000 784298
2016-03-05 14:34:00.000 2016-03-11 19:00:00.000 784286
2016-03-05 14:45:00.000 2016-04-07 12:01:00.000 784316
2016-03-05 16:00:00.000 2016-03-24 17:00:00.000 784334
2016-03-05 19:22:00.000 2016-04-12 15:56:00.000 784335
2016-03-05 19:25:00.000 2016-03-14 11:59:00.000 784346
2016-03-05 19:25:00.000 2016-03-11 16:10:00.000 784399
2016-03-05 20:15:00.000 2016-03-15 16:20:00.000 784362
2016-03-05 20:26:00.000 2016-03-12 15:03:00.000 784347
2016-03-05 23:30:00.000 2016-03-17 16:45:00.000 784476
2016-03-06 11:57:00.000 2016-03-15 21:00:00.000 784524
2016-03-06 13:17:00.000 2016-03-29 08:09:00.000 784472
2016-03-06 14:07:00.000 2016-03-15 13:55:00.000 784497
2016-03-06 15:00:00.000 2016-03-16 12:24:00.000 784474
What I am looking to do is get, for every day, a count of how many entries are open on that day (i.e. the day falls between the entry's start and end dates).
Example Output
date Instances
01/03/2016 113
02/03/2016 100
03/03/2016 106
04/03/2016 127
05/03/2016 81
06/03/2016 59
07/03/2016 115
08/03/2016 104
09/03/2016 92
10/03/2016 105
11/03/2016 128
12/03/2016 71
13/03/2016 64
14/03/2016 99
15/03/2016 106
16/03/2016 101
17/03/2016 96
18/03/2016 127
19/03/2016 75
20/03/2016 62
21/03/2016 93
22/03/2016 109
23/03/2016 102
24/03/2016 104
25/03/2016 85
26/03/2016 87
27/03/2016 72
28/03/2016 61
29/03/2016 86
30/03/2016 90
31/03/2016 122
This is the query I am using:
with [dates] as (
select convert(datetime, '2016-01-01') as [date] --start
union all
select dateadd(day, 1, [date])
from [dates]
where [date] < GETDATE() --end
)
select [date]
,Sum (Case when [date] between ws._start_dttm and Case when Cast(ws.End_DTTM as date) is null then [date]
else Cast(ws._End_DTTM as date) end then 1 else 0 end)
from [dates]
Join [STAYS] ws on Case when Cast(ws.End_DTTM as date) is null then GETDATE()-1
else Cast(ws.End_DTTM as date) end = dates.date
where END_DTTM between '2016-01-01' and GETDATE()
Group BY date
Order by [date]
option (maxrecursion 0)
However, I am not getting the right answer. This is the expected result, as currently calculated in Excel:
Date Instances
01/03/2016 343
02/03/2016 326
03/03/2016 327
04/03/2016 332
05/03/2016 318
06/03/2016 317
07/03/2016 337
08/03/2016 332
09/03/2016 345
10/03/2016 349
11/03/2016 341
12/03/2016 323
13/03/2016 333
14/03/2016 349
15/03/2016 344
16/03/2016 358
17/03/2016 349
18/03/2016 350
19/03/2016 347
20/03/2016 351
21/03/2016 371
22/03/2016 369
23/03/2016 340
24/03/2016 335
25/03/2016 319
26/03/2016 341
27/03/2016 355
28/03/2016 351
29/03/2016 367
30/03/2016 379
31/03/2016 385
Update as per the OP's comment:
In summary, for the row below
Start_date End_Date ID
2016-03-01 06:30:00.000 2016-03-07 17:30:00.000 782772
Expected output would be:
01/03/2016 1
02/03/2016 1
03/03/2016 1
04/03/2016 1
05/03/2016 1
06/03/2016 1
07/03/2016 1
Like this, I want to calculate for all rows, per date.
select convert(varchar(10),startdate,103) as datee,count(*) as occurences
from table
group by convert(varchar(10),startdate,103)
Update:
Try this
;with cte
as
(
select
startdate,
enddate,
datediff(day, startdate, enddate) as cnt
from
table
)
select
convert(varchar(10),startdate,103) as date,
sum(cnt)
from
cte
group by
convert(varchar(10),startdate,103)
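For the per-day overlap count itself, a calendar CTE joined on the date range is more direct. A sketch, assuming the table is named STAYS with the Start_date/End_Date columns shown above, and treating a NULL end date as a still-open stay:
;WITH dates AS (
    SELECT CAST('2016-03-01' AS date) AS d      -- start of range
    UNION ALL
    SELECT DATEADD(day, 1, d) FROM dates WHERE d < '2016-03-31'
)
SELECT d.d AS [date],
       COUNT(s.ID) AS Instances                 -- counts only matched stays
FROM dates d
LEFT JOIN STAYS s
       ON d.d BETWEEN CAST(s.Start_date AS date)
                  AND CAST(COALESCE(s.End_Date, GETDATE()) AS date)
GROUP BY d.d
ORDER BY d.d
OPTION (MAXRECURSION 0);
Each stay then contributes one Instance to every day between its start and end dates inclusive, which is what the Excel figures are counting.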

How to resample time vector data in MATLAB

I have to resample the following cell array:
dateS =
'2004-09-02 06:00:00'
'2004-09-02 07:30:00'
'2004-09-02 12:00:00'
'2004-09-02 18:00:00'
'2004-09-02 19:30:00'
'2004-09-03 00:00:00'
'2004-09-03 05:30:00'
'2004-09-03 06:00:00'
following an irregular spacing, e.g. between the 1st and 2nd rows there are 5 readings, while between the 2nd and 3rd there are 10. The number of intermediate 'readings' is stored in a vector 'v'. So, what I need is a new vector with all the intermediate dates/times, in the same format as dateS.
EDIT:
There's 1h30min = 90min between the first 2 readings in the list. Five intervals between them amounts to 90 min / 5 = 18 min. Now insert the intermediate 'readings' between (1) and (2), each separated by 18 min. I need to do that for all of dateS.
Any ideas? Thanks!
You can interpolate the serial dates with interp1():
% Inputs
dates = [
'2004-09-02 06:00:00'
'2004-09-02 07:30:00'
'2004-09-02 12:00:00'
'2004-09-02 18:00:00'
'2004-09-02 19:30:00'
'2004-09-03 00:00:00'
'2004-09-03 05:30:00'
'2004-09-03 06:00:00'];
v = [5 4 3 2 4 5 3];
% Serial dates
serdates = datenum(dates,'yyyy-mm-dd HH:MM:SS');
% Interpolate
x = cumsum([1 v]);
resampled = interp1(x, serdates, x(1):x(end))';
The result:
datestr(resampled)
ans =
02-Sep-2004 06:00:00
02-Sep-2004 06:18:00
02-Sep-2004 06:36:00
02-Sep-2004 06:54:00
02-Sep-2004 07:12:00
02-Sep-2004 07:30:00
02-Sep-2004 08:37:30
02-Sep-2004 09:45:00
02-Sep-2004 10:52:30
02-Sep-2004 12:00:00
02-Sep-2004 14:00:00
02-Sep-2004 16:00:00
02-Sep-2004 18:00:00
02-Sep-2004 18:45:00
02-Sep-2004 19:30:00
02-Sep-2004 20:37:30
02-Sep-2004 21:45:00
02-Sep-2004 22:52:30
03-Sep-2004 00:00:00
03-Sep-2004 01:06:00
03-Sep-2004 02:12:00
03-Sep-2004 03:18:00
03-Sep-2004 04:24:00
03-Sep-2004 05:30:00
03-Sep-2004 05:40:00
03-Sep-2004 05:50:00
03-Sep-2004 06:00:00
The following code does what you want (I picked arbitrary values for v; as long as the number of elements in vector v is one less than the number of entries in dateS, this should work):
dateS = [
'2004-09-02 06:00:00'
'2004-09-02 07:30:00'
'2004-09-02 12:00:00'
'2004-09-02 18:00:00'
'2004-09-02 19:30:00'
'2004-09-03 00:00:00'
'2004-09-03 05:30:00'
'2004-09-03 06:00:00'];
% "stations":
v = [6 5 4 3 5 6 4];
dn = datenum(dateS);
df = diff(dn)'./v;
newDates = dn(1); % start from the first reading
for ii = 1:numel(v)
% append the v(ii) points after dn(ii), ending exactly at dn(ii+1),
% so interior boundary points are not duplicated
newDates = [newDates dn(ii) + (1:v(ii))*df(ii)];
end
newStrings = datestr(newDates, 'yyyy-mm-dd HH:MM:SS');
The array newStrings ends up containing the following; for example, you can see that the interval between the first and second times has been split into six 15-minute segments:
2004-09-02 06:00:00
2004-09-02 06:15:00
2004-09-02 06:30:00
2004-09-02 06:45:00
2004-09-02 07:00:00
2004-09-02 07:15:00
2004-09-02 07:30:00
2004-09-02 08:24:00
2004-09-02 09:18:00
2004-09-02 10:12:00
2004-09-02 11:06:00
2004-09-02 12:00:00
2004-09-02 13:30:00
2004-09-02 15:00:00
2004-09-02 16:30:00
2004-09-02 18:00:00
2004-09-02 18:30:00
2004-09-02 19:00:00
2004-09-02 19:30:00
2004-09-02 20:24:00
2004-09-02 21:18:00
2004-09-02 22:12:00
2004-09-02 23:06:00
2004-09-03 00:00:00
2004-09-03 00:55:00
2004-09-03 01:50:00
2004-09-03 02:45:00
2004-09-03 03:40:00
2004-09-03 04:35:00
2004-09-03 05:30:00
2004-09-03 05:37:30
2004-09-03 05:45:00
2004-09-03 05:52:30
2004-09-03 06:00:00
The code relies on a few concepts:
A date can be represented as a string or as a datenum; I use built-in functions to convert between them
Once you have the date/time as a number, it is easy to interpolate
I use the diff function to find the difference between successive times
I don't attempt to "vectorize" the code - you were not asking for efficient code, and for an example like this the clarity of a for loop trumps everything.