groupby functions to get subsequent value

In my data I have stock volumes by order sequence and time. I need to go through each part of an order and find when it ends, by grabbing the time of the next part of the chain.
I am just starting in Python, and I would do this by subsetting each stock into its own pool, then doing another loop to find the time of the next order for that sequence. In R/Matlab you could write something like X$time[1:end-1] <- X$time[2:end,]
My question: can I use df.groupby['sequence'].{for each entry get the time from the subsequent entry}?
I think last() would give me the last value of the entire sequence; what I want is the time at which the next entry of that sequence starts/appears.
I have a set of type:
sequence time
a 1
b 1
a 3
a 5
b 2
I would like
sequence time nexttime
a 1 3
b 1 2
a 3 5
a 5 999
b 2 999

In [24]: df
Out[24]:
sequence time
0 a 1
1 b 1
2 a 3
3 a 5
4 b 2
In [25]: df['nexttime'] = df.groupby('sequence').time.shift(-1).fillna(999)
In [26]: df
Out[26]:
sequence time nexttime
0 a 1 3
1 b 1 2
2 a 3 5
3 a 5 999
4 b 2 999
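For anyone who wants to see what the grouped shift(-1) is doing, here is a minimal plain-Python sketch of the same logic (no pandas). The function name next_time_per_group and the tuple layout are my own illustration, not part of pandas, and the 999 fill value is taken from the question.

```python
def next_time_per_group(rows, fill=999):
    """rows is a list of (sequence, time) tuples in their original order.

    Returns, for each row, the time of the next row with the same sequence,
    or `fill` when the row is the last one of its sequence.
    """
    result = [fill] * len(rows)
    last_index = {}  # sequence -> index of the most recent row seen for it
    for i, (seq, time) in enumerate(rows):
        if seq in last_index:
            # The current row is the "next" row for the previous one in this group.
            result[last_index[seq]] = time
        last_index[seq] = i
    return result


rows = [("a", 1), ("b", 1), ("a", 3), ("a", 5), ("b", 2)]
next_times = next_time_per_group(rows)
print(next_times)
```

This makes a single pass and keeps the original row order, which is the same behaviour the grouped shift gives you.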

Related

Counting presence or absence of tallies by rows

Hi, I am not sure how to explain what I need, but I'll try. I need a query (if there is one) that counts whether a tally was present at a point (column), so that a species with more than one tally at a point counts only once.
This is how the data looks:
Sp Site Pnt1 Pnt2 Pnt3 Total
A 1 1 1 1 3
A 2 1 1 2
A 3 1 1
B 1 1 1 1 3
B 2 1 1
C 1 1 1 2
C 2 0
I want to count whether the sites have a tally or not, and if tallies are repeated at the same point (for the same species) I want to count them as one. I would like the resulting table to look like this:
Sp Pnt1 Pnt2 Pnt3 Total
A 1 1 1 3
B 1 1 1 3
C 0 1 1 2
Thanks for all the help you can provide.
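One way to think about this is "for each species and point, 1 if any site has a tally, else 0", then sum the flags for the total. Below is a rough Python sketch of that idea rather than a query. Big assumption: the blank cells were lost in the table above, so the positions of the None values in rows are my guess (only species C is pinned down by the desired output). The function name presence_by_species is also just for illustration.

```python
from collections import defaultdict

# (species, site, Pnt1, Pnt2, Pnt3); None marks a blank cell.
# ASSUMPTION: the placement of the blanks for A and B is guessed.
rows = [
    ("A", 1, 1, 1, 1),
    ("A", 2, 1, 1, None),
    ("A", 3, 1, None, None),
    ("B", 1, 1, 1, 1),
    ("B", 2, 1, None, None),
    ("C", 1, None, 1, 1),
    ("C", 2, None, None, None),
]


def presence_by_species(rows, n_points=3):
    """Return {species: [flag per point..., total]} with presence/absence flags."""
    presence = defaultdict(lambda: [0] * n_points)
    for species, _site, *points in rows:
        flags = presence[species]  # also registers species with no tallies at all
        for i, tally in enumerate(points):
            if tally:  # any non-zero, non-blank tally counts once
                flags[i] = 1
    return {sp: flags + [sum(flags)] for sp, flags in presence.items()}


summary = presence_by_species(rows)
for sp, values in summary.items():
    print(sp, *values)
```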

How do I implement convolution in KDB?

I'm trying to convolve a small vector (kernel) across a longer series.
The simplest possible kernel is (-1 1), which is the equivalent of the diff operator.
My failed attempt is:
{sum (x;y)*(-1;1)} scan til 10
0 1 1 2 2 3 3 4 4 5
This doesn't work, as it supplies the previous evaluation as the right component of the binary fn. What I should be doing is evaluating the function on each pair and storing the result. The result I'm looking for is:
1 1 1 1 1 1 1 1 1
What I can't figure out is an elegant way of doing this calculation in KDB. Is there one?
You could use the each prior iterator to do this - https://code.kx.com/q/ref/maps/#each-prior
q)-':[til 10]
0 1 1 1 1 1 1 1 1 1
Maybe something like:
q)1_sum(1;-1)*1 prev\til 10
1 1 1 1 1 1 1 1 1
Can you provide more examples?
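For a kernel longer than two elements, the idea is the same: slide the kernel along the series and take the weighted sum of each window. Here is a small Python sketch of that, only to illustrate the calculation (it is not q). The name sliding_dot is mine. Note that it applies the kernel as written, like the (-1;1) example in the question; a textbook convolution would reverse the kernel first.

```python
def sliding_dot(series, kernel):
    """Weighted sum of the kernel against every full window of the series."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, series[i:i + k]))
        for i in range(len(series) - k + 1)
    ]


diffs = sliding_dot(list(range(10)), [-1, 1])
print(diffs)

# A three-element kernel works the same way (second difference).
second = sliding_dot([0, 1, 4, 9, 16], [1, -2, 1])
print(second)
```

Only full windows are evaluated, so the output has len(series) - len(kernel) + 1 elements, which matches the nine 1s asked for in the question.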

KDB/Q how do we calculate the moving median

There is already a moving average in kdb/q:
https://code.kx.com/q/ref/avg/#mavg
But how do I compute moving median?
Here is a naive approach. It starts with an empty list and a null median, and iterates over the input, feeding in one new value each time.
sublist is used to fix the window size, and this window is passed, along with the median, as the state into the next iteration.
At the end, scan (\) outputs the state at every iteration, and we take the median (the first element) from each one.
mmed:{{(med l;l:neg[x] sublist last[y],z)}[x]\[(0n;());y][;0]}
q)mmed[5;til 10]
0 0.5 1 1.5 2 3 4 5 6 7
q)i:4 9 2 7 0 1 9 2 1 8
q)mmed[3;i]
4 6.5 4 7 2 1 1 2 2 2
There's also a generic "sliding window" function here which you can pass your desired aggregator into: https://code.kx.com/q/kb/programming-idioms/#how-do-i-apply-a-function-to-a-sequence-sliding-window
q)swin:{[f;w;s] f each { 1_x,y }\[w#0;s]}
q)swin[avg; 3; til 10]
0 0.33333333 1 2 3 4 5 6 7 8
q)update newcol:swin[med;10;mycol] from tab
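To make the windowing logic in mmed easier to follow, here is the same idea as a Python sketch using only the standard library. The name moving_median is mine. It behaves like mmed (the window grows until it is full); the swin idiom above instead starts from a window of zeros, so its first few outputs would differ.

```python
from collections import deque
from statistics import median


def moving_median(window, values):
    """Median of the last `window` values seen so far, for each input value."""
    buf = deque(maxlen=window)  # old values fall off the left automatically
    out = []
    for v in values:
        buf.append(v)
        out.append(median(buf))
    return out


m5 = moving_median(5, range(10))
m3 = moving_median(3, [4, 9, 2, 7, 0, 1, 9, 2, 1, 8])
print(m5)
print(m3)
```

This re-sorts the window on every step, so like the q version it is a naive approach; it is fine for small windows.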

(q/kdb+) Merge items in a list

I have a list of items and need to merge them into a single column
using the list
list:(1 2;3 4 5 7;0 1 3)
index value
0 1 2
1 3 4 5 7
2 0 1 3
my goal is
select from list2
value
1
2
3
4
5
7
0
1
3
The raze function flattens one level of a list.
q)raze (1 2;3 4 5 7;0 1 3)
1 2 3 4 5 7 0 1 3
If you have a list nested more than one level deep, use the over adverb with raze:
q) (raze/)(1 2 3;(11 12;33 44);5 6)
To convert that to table column:
q) t:([]c:raze list)
ungroup would also work provided your table doesn't have multiple columns with different nesting (or strings)
q)ungroup ([]list)
list
----
1
2
3
4
5
7
0
1
3
If you just wanted your list to appear like that I would do the following.
1 cut raze list
I see that you have used a select statement; however, if you want the column defined like this in your table, do the following:
a:raze list
tab:([] b:a)
Your output from this should look like this
q)tab
b
-
1
2
3
4
5
7
0
1
3
Overall, a more concise way to achieve what you want would be
select from ([]raze list)
To avoid errors, you should not name the column 'value', as this is a reserved word in kdb+; when you try to reassign it as a column header, kdb+ will throw an assign error:
'assign
Hope this helps
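If it helps to see the difference between flattening one level (raze) and flattening all the way down (raze/), here is a rough Python analogy. It is only an illustration of the two behaviours, not q, and the helper name flatten_deep is made up for this example.

```python
from itertools import chain

nested = [[1, 2], [3, 4, 5, 7], [0, 1, 3]]

# One level of flattening, comparable to raze.
flat = list(chain.from_iterable(nested))
print(flat)


def flatten_deep(items):
    """Flatten lists nested to any depth, comparable to raze/."""
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(flatten_deep(item))
        else:
            out.append(item)
    return out


deep = flatten_deep([[1, 2, 3], [[11, 12], [33, 44]], [5, 6]])
print(deep)
```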

Delete adjacent repeated terms

I have the following vector a:
a=[8,8,9,9,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8]
From a I want to delete all "adjacent" repetitions to obtain:
b=[8,9,1,2,3,4,5,6,7,8]
However, when I do:
unique(a,'stable')
ans =
8 9 1 2 3 4 5 6 7
You see, unique only really gets the unique elements of a, whereas what I want is to delete the "duplicates"... How do I do this?
It looks like a run-length-encoding problem (check here). You can modify Mohsen's solution to get the desired output. (i.e. I claim no credit for this code, yet the question is not a duplicate in my opinion).
Here is the code:
a =[8,8,9,9,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8]
F=find(diff([a(1)-1, a]));
Since diff(a) returns an array of length length(a)-1, we prepend one value so that the result has the same size as a. We prepend a(1)-1 rather than a(1) itself so that, as mentioned by @surgical_tubing, the first difference is guaranteed to be non-zero: find only returns the positions of non-zero elements, and this makes sure the first element of a is kept.
Hence diff([a(1)-1, a]) looks like this:
Columns 1 through 8
1 0 1 0 -8 0 1 0
Columns 9 through 16
1 0 1 0 1 0 1 0
Columns 17 through 20
1 0 1 0
The non-zero entries mark where the value changes, i.e. the first element of each run. Having found those positions with find, we index back into a:
newa=a(F)
and output:
newa =
Columns 1 through 8
8 9 1 2 3 4 5 6
Columns 9 through 10
7 8
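The same "keep an element only if it differs from the one before it" logic, written as a short Python sketch for comparison. This is not MATLAB, and the function name drop_adjacent_repeats is just for illustration.

```python
def drop_adjacent_repeats(values):
    """Keep the first element of every run of equal adjacent values."""
    return [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]


a = [8, 8, 9, 9, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8]
b = drop_adjacent_repeats(a)
print(b)
```

Unlike unique, this keeps the trailing 8, because only neighbouring elements are compared.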