It is hard for me to describe in words what this function does, but I have some working code.
f:{[n;k] sum flip k </: til n}
i:3 4 6 7 13;
f[30;i]
0 0 0 0 1 2 2 3 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5i
I am concerned that the flip operation may be expensive for large inputs. Is there a more efficient way to do this without the flip?
Being just as concise would be a nice-to-have.
For your inputs you can achieve the same result with
{[n;k] k binr til n}
This should work so long as k remains in ascending order.
Docs for binr are here: https://code.kx.com/q/ref/bin/
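To see why a binary search gives the same counts, here is a minimal sketch in Python rather than q. It assumes k is sorted ascending, and the function names are made up for illustration; bisect_left stands in for binr, since both return the index of the first item that is greater than or equal to the probe value.

```python
from bisect import bisect_left

def count_less_than(n, k):
    """For each x in range(n), count the items of ascending k that are < x."""
    # bisect_left gives the index of the first item >= x; in a sorted list
    # that index equals the number of items strictly less than x.
    return [bisect_left(k, x) for x in range(n)]

def count_less_than_naive(n, k):
    """Same result by comparing every item of k with every x."""
    return [sum(item < x for item in k) for x in range(n)]

k = [3, 4, 6, 7, 13]
result = count_less_than(30, k)
print(result)
```

The naive version does len(k) comparisons per value, while the binary search does about log2(len(k)), which is where the expected saving comes from.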
Sean's answer is probably more efficient (test it), but why not change your each-right to each-left to avoid the flip?
q)g:{[n;k] sum k<\:til n}
q)f[30;i]~g[30;i]
1b
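The difference between the two iterators is only the orientation of the comparison table. A rough Python sketch of that idea follows (the helper names are hypothetical, and this is not how q evaluates it internally):

```python
def each_right(k, xs):
    # One row per x: compare every item of k with that single x.
    return [[item < x for item in k] for x in xs]

def each_left(k, xs):
    # One row per item of k: compare that item with every x.
    return [[item < x for x in xs] for item in k]

def add_rows(rows):
    # Element-wise sum of the rows, like q's sum on a list of equal-length lists.
    return [sum(col) for col in zip(*rows)]

def transpose(rows):
    return [list(col) for col in zip(*rows)]

k = [3, 4, 6, 7, 13]
xs = list(range(30))
f_result = add_rows(transpose(each_right(k, xs)))  # needs the extra transpose
g_result = add_rows(each_left(k, xs))              # already in the right shape
print(f_result == g_result)
```

Each-left builds the table with one row per item of k from the start, so the element-wise sum already has one entry per value of til n.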
There is already a moving average in kdb/q:
https://code.kx.com/q/ref/avg/#mavg
But how do I compute moving median?
Here is a naive approach. It starts with an empty list and null median and iterates over the list feeding in a new value each time.
sublist is used to fix the window size, and this window is passed along with the median as the state into the next iteration.
At the end, scan (\) outputs the state at every iteration, from which we take the median (the first element) of each one.
mmed:{{(med l;l:neg[x] sublist last[y],z)}[x]\[(0n;());y][;0]}
q)mmed[5;til 10]
0 0.5 1 1.5 2 3 4 5 6 7
q)i:4 9 2 7 0 1 9 2 1 8
q)mmed[3;i]
4 6.5 4 7 2 1 1 2 2 2
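For readers who do not read q fluently, the same naive idea can be sketched in Python. This is an illustration under my own naming (moving_median is not a library function): a bounded deque keeps the last few values and the median is recomputed at every step.

```python
from collections import deque
from statistics import median

def moving_median(window, values):
    """Median of the last `window` values at each position (shorter at the start)."""
    buf = deque(maxlen=window)  # the oldest value is dropped once the deque is full
    out = []
    for v in values:
        buf.append(v)
        out.append(median(buf))  # recompute from scratch: simple, not the fastest
    return out

print(moving_median(5, range(10)))
print(moving_median(3, [4, 9, 2, 7, 0, 1, 9, 2, 1, 8]))
```

As in the q version, the first few results use a window that is still filling up rather than padding with zeros.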
There's also a generic "sliding window" function here which you can pass your desired aggregator into: https://code.kx.com/q/kb/programming-idioms/#how-do-i-apply-a-function-to-a-sequence-sliding-window
q)swin:{[f;w;s] f each { 1_x,y }\[w#0;s]}
q)swin[avg; 3; til 10]
0 0.33333333 1 2 3 4 5 6 7 8
q)update newcol:swin[med;10;mycol] from tab
I'm struggling to understand this q code programming idiom from the kx cookbook:
q)swin:{[f;w;s] f each { 1_x,y }\[w#0;s]}
q)swin[avg; 3; til 10]
0 0.33333333 1 2 3 4 5 6 7 8
The notation is confusing. Is there an easy way to break it down as a beginner?
I get that the compact notation for the function is probably equivalent to this
swin:{[f;w;s] f each {[x; y] 1_x, y }\[w#0;s]}
w#0 means repeat 0 w times (w is some filler for the first couple of observations?), and 1_x, y means join x, after dropping the first observation, to y. But I don't understand how this then plays out with f = avg applied with each. Is there a way to understand this easily?
http://code.kx.com/q/ref/adverbs/#converge-iterate
Scan (\) on a binary (two-param) function takes the first argument as the seed value - in this case 3#0 - and iterates through each of the items in the second list - in this case til 10 - applying the function (append new value, drop first).
q){1_x,y}\[3#0;til 10]
0 0 0
0 0 1
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
So now you have ten lists and you can apply a function to each list. In this case it is avg, but it could be any other function that applies to a list:
q)med each {1_x,y}\[3#0;til 10]
0 0 1 2 3 4 5 6 7 8f
q)
q)first each {1_x,y}\[3#0;til 10]
0 0 0 1 2 3 4 5 6 7
q)
q)last each {1_x,y}\[3#0;til 10]
0 1 2 3 4 5 6 7 8 9
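If it helps to see the same mechanics outside q, here is a small Python sketch using itertools.accumulate as a stand-in for scan. The names windows and swin are just illustrative, and note one difference: accumulate with initial= also emits the seed itself, so the sketch drops that first item to match the ten rows above.

```python
from itertools import accumulate
from statistics import mean

def windows(w, seq):
    """Sliding windows of width w over seq, starting from a seed of w zeros."""
    seed = (0,) * w
    # One step of the scan: append the new value, then drop the oldest one.
    step = lambda win, y: (win + (y,))[1:]
    # accumulate(..., initial=seed) yields the seed first; skip it.
    return list(accumulate(seq, step, initial=seed))[1:]

def swin(f, w, seq):
    """Apply f to each window, like `f each ...` in the q idiom."""
    return [f(win) for win in windows(w, seq)]

for win in windows(3, range(10)):
    print(win)
print(swin(mean, 3, range(10)))
```

The aggregation is a separate second pass over the list of windows, which is why any list function (avg, med, first, last) can be swapped in.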
I have the following vector a:
a=[8,8,9,9,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8]
From a I want to delete all "adjacent" repetitions to obtain:
b=[8,9,1,2,3,4,5,6,7,8]
However, when I do:
unique(a,'stable')
ans =
8 9 1 2 3 4 5 6 7
You see, unique only really gets the unique elements of a, whereas what I want is to delete the "duplicates"... How do I do this?
It looks like a run-length-encoding problem (check here). You can modify Mohsen's solution to get the desired output. (i.e. I claim no credit for this code, yet the question is not a duplicate in my opinion).
Here is the code:
a =[8,8,9,9,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8]
F=find(diff([a(1)-1, a]));
Since diff(a) returns an array of length length(a)-1, we prepend a value to get a result the same size as a. We prepend a(1)-1 rather than a(1) so that, as mentioned by @surgical_tubing, the first difference is nonzero: find only returns the positions of nonzero elements, so this makes sure the first element of a is picked up.
Hence diff([a(1)-1, a]) looks like this:
Columns 1 through 8
1 0 1 0 -8 0 1 0
Columns 9 through 16
1 0 1 0 1 0 1 0
Columns 17 through 20
1 0 1 0
The nonzero entries mark the position where each run of repeated values starts, so we index back into a with the positions returned by find:
newa=a(F)
and output:
newa =
Columns 1 through 8
8 9 1 2 3 4 5 6
Columns 9 through 10
7 8
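For comparison, the same steps can be sketched in Python (not MATLAB; the function name is my own, and the input is assumed to be numeric so that a[0] - 1 makes sense):

```python
from itertools import groupby

def drop_adjacent_repeats(a):
    """Keep the first element of every run of equal adjacent values."""
    if not a:
        return []
    # Prepend a value guaranteed to differ from a[0], mirroring [a(1)-1, a].
    padded = [a[0] - 1] + list(a)
    # Difference between each element and its predecessor, like diff(...).
    diffs = [padded[i + 1] - padded[i] for i in range(len(a))]
    # Positions of nonzero differences, like find(...), here 0-based.
    starts = [i for i, d in enumerate(diffs) if d != 0]
    return [a[i] for i in starts]

a = [8, 8, 9, 9, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8]
print(drop_adjacent_repeats(a))
# itertools.groupby also groups adjacent equal items, so this is equivalent:
print([key for key, _ in groupby(a)])
```

Unlike unique, both variants keep the trailing 8 because it belongs to a separate run.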
In my data I have stock volumes with order sequences and times. I need to go through each part of an order and find when it ends by grabbing the time of the next part of the chain.
I am just starting in Python. My plan was to subset each stock into its own pool and then run another loop to find the time of the next order in that sequence. In R/Matlab you could write something like X$time[1:end-1] <- X$time[2:end,]
My question: can I use the df.groupby['sequence'].{for each entry get the time from the subsequent entry}?
I think last() would give me the last value of the entire sequence, but I would like the time at which the next entry of that sequence appears.
I have a set of type:
sequence time
a 1
b 1
a 3
a 5
b 2
I would like
sequence time nexttime
a 1 3
b 1 2
a 3 5
a 5 999
b 2 999
In [24]: df
Out[24]:
sequence time
0 a 1
1 b 1
2 a 3
3 a 5
4 b 2
In [25]: df['nexttime'] = df.groupby('sequence').time.shift(-1).fillna(999)
In [26]: df
Out[26]:
sequence time nexttime
0 a 1 3
1 b 1 2
2 a 3 5
3 a 5 999
4 b 2 999
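To make it clearer what the groupby plus shift(-1) combination computes, here is a plain-Python sketch of the same logic without pandas. The function name and the tuple layout are assumptions for illustration only; in practice the one-liner above is the way to go.

```python
def next_time_per_group(rows, fill=999):
    """rows is a list of (sequence, time); returns (sequence, time, nexttime)."""
    result = [None] * len(rows)
    upcoming = {}  # sequence -> time of the nearest later row with that sequence
    # Walk backwards so that, for each row, `upcoming` already holds the time
    # of the next row belonging to the same sequence (if there is one).
    for i in range(len(rows) - 1, -1, -1):
        seq, t = rows[i]
        result[i] = (seq, t, upcoming.get(seq, fill))
        upcoming[seq] = t
    return result

rows = [("a", 1), ("b", 1), ("a", 3), ("a", 5), ("b", 2)]
for row in next_time_per_group(rows):
    print(row)
```

The last row of each sequence has no successor, which is where shift(-1) leaves a missing value and fillna(999) supplies the placeholder.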