KDB+/Q: How to create a column that increments the occurrence of unique values of another column?

I am trying to create a column that increments each time the value in another column changes (i.e. differs from the previous value), like this:
x | y
-----
1 | 0
1 | 0
2 | 1
4 | 2
1 | 3
How could one achieve this functionality in kdb+?
Thanks

Does this work?
q)t:([]x:1 1 2 4 1)
q)update y:(sums 0b,1_differ x)from t
x y
---
1 0
1 0
2 1
4 2
1 3
differ looks at a list (or a column of a table) and returns a boolean list that is 1b in every position where the item differs from the item before it. Its first element is always 1b, though, so we drop it with 1_ and put a 0b at the beginning with 0b,. Then we just take the running sum using sums.
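For readers who want to see the `0b,1_differ` plus `sums` steps outside q, here is a minimal Python sketch of the same idea on a plain list. It is not kdb+ code: the function name `change_counter` is hypothetical, and a Python list stands in for the table column.

```python
from itertools import accumulate


def change_counter(xs):
    """Running count of value changes, mirroring q's: sums 0b,1_differ x."""
    if not xs:
        return []
    # differ: True where an item differs from the one before it;
    # like q's differ, the first item is always flagged True
    differ = [True] + [xs[i] != xs[i - 1] for i in range(1, len(xs))]
    # 0b,1_ : drop that leading True and prepend False instead
    flags = [False] + differ[1:]
    # sums: running total of the flags (True counts as 1)
    return list(accumulate(int(f) for f in flags))


print(change_counter([1, 1, 2, 4, 1]))  # [0, 0, 1, 2, 3]
```

The printed list matches the y column in the q output above.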

Related

Consecutive episode

Good afternoon.
I have data like this:
ID Indicator
1 0
1 1
1 0
1 1
1 0
1 1
2 0
2 1
2 1
2 1
2 1
2 1
2 1
2 1
I need to get the IDs that have at least 4 consecutive indicators equal to 1. In this example I should get ID = 2, since it has at least 4 consecutive indicators equal to 1. Please help me do this in SPSS Modeler. Thank you so much for your help. For ID 1 the indicators alternate: row 1 = 0, row 2 = 1, row 3 = 0, row 4 = 1, row 5 = 0, row 6 = 1. For ID 2 the first indicator is 0 and all the others are 1. There are two columns, ID and Indicator; ID 1 has 6 rows and ID 2 has 8 rows.
To be precise: I want to output the IDs that have 4 or more indicators set to 1 consecutively.
What you first need is a way to count the number of consecutive Indicator = 1 records for the same ID.
For this, you can use the "Derive" node with the following settings:
Set the 'Derive as' option to Count
Set the 'Increment when' to ID = @OFFSET(ID, 1) and INDICATOR = 1
Set the 'Increment by' to 1
Set the 'Reset when' to INDICATOR = 0
Following the 'Derive' node, you can then use a 'Select' node to only select the records where the number of consecutive 1's is equal to 4, and finally, use a 'Distinct' node to keep only one record for each ID.
I have shared a sample stream that shows the process here.
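The Derive/Select/Distinct steps are SPSS Modeler configuration rather than code, but the counting rule itself can be checked with a short script. The Python sketch below is not Modeler code, and `ids_with_run` is a hypothetical name. It applies the same rule (add 1 for each Indicator = 1 within the same ID, reset on 0) and also starts a fresh count whenever the ID changes; it assumes all rows for one ID are next to each other.

```python
from itertools import groupby


def ids_with_run(rows, min_run=4):
    """Return the IDs that have at least min_run consecutive Indicator == 1 rows.

    Assumes rows is a list of (id, indicator) pairs in file order, with all
    rows for the same ID next to each other.
    """
    result = []
    # groupby starts a new group whenever the ID changes, so a run of 1s
    # can never carry over from one ID to the next
    for id_, group in groupby(rows, key=lambda row: row[0]):
        run = best = 0
        for _, indicator in group:
            # extend the current run on a 1, reset it on anything else
            run = run + 1 if indicator == 1 else 0
            best = max(best, run)
        if best >= min_run:
            result.append(id_)
    return result


rows = [(1, 0), (1, 1), (1, 0), (1, 1), (1, 0), (1, 1),
        (2, 0), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1), (2, 1)]
print(ids_with_run(rows))  # [2]
```

Keeping the longest run per ID and comparing it with 4 has the same effect as selecting the records where the count reaches 4 and then keeping one record per ID.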

(q/kdb+) Merge items in a list

I have a list of lists and need to merge the items into a single column. Here is the list:
list:(1 2;3 4 5 7;0 1 3)
index value
0 1 2
1 3 4 5 7
2 0 1 3
My goal is:
select from list2
value
1
2
3
4
5
7
0
1
3
The raze function flattens one level of nesting:
q)raze (1 2;3 4 5 7;0 1 3)
1 2 3 4 5 7 0 1 3
If your list has more than one level of nesting, use raze with the over adverb:
q)(raze/)(1 2 3;(11 12;33 44);5 6)
1 2 3 11 12 33 44 5 6
To convert the result to a table column:
q)t:([]c:raze list)
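To make the difference between a single `raze` and `raze/` concrete, here is a Python sketch on nested lists. `raze` and `raze_over` are hypothetical helper names, not a Python API. `raze_over` imitates the way `/` applied to a unary function in q keeps calling it until the result stops changing.

```python
def raze(xs):
    """Flatten one level, like q's raze: sub-lists are spliced in, atoms kept."""
    out = []
    for item in xs:
        if isinstance(item, list):
            out.extend(item)
        else:
            out.append(item)
    return out


def raze_over(xs):
    """Apply raze repeatedly until the result stops changing, like (raze/)."""
    while True:
        nxt = raze(xs)
        if nxt == xs:
            return nxt
        xs = nxt


print(raze([[1, 2], [3, 4, 5, 7], [0, 1, 3]]))       # [1, 2, 3, 4, 5, 7, 0, 1, 3]
print(raze_over([[1, 2, 3], [[11, 12], [33, 44]], [5, 6]]))  # [1, 2, 3, 11, 12, 33, 44, 5, 6]
```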
ungroup would also work, provided your table doesn't have multiple columns with different nesting (or string columns):
q)ungroup ([]list)
list
----
1
2
3
4
5
7
0
1
3
If you just want your list to appear like that, I would do the following:
1 cut raze list
I see that you have used a select statement; however, if you want the column defined like this in a table, do the following:
a:raze list
tab:([] b:a)
Your output from this should look like this:
q)tab
b
-
1
2
3
4
5
7
0
1
3
Overall, a more concise way to achieve what you want to do would be
select from ([]raze list)
To avoid errors, you should not name the column 'value': value is a reserved word in kdb+, and when you try to use it as a column name, kdb+ throws an assign error:
`assign
Hope this helps

Update multiple columns based on a single condition in kdb

I have a table:
q)t
a b c
--------
1 10 100
3 20 200
2 30 300
1 40 400
2 50 500
I wish to update the values of columns b and c based on a single 'if' condition on column a. For example:
t:update b:0 from t where a=1
t:update c:0 from t where a=1
I could use a vector conditional, but I don't want to, as it would evaluate the condition twice for each row and my table has a large number of rows:
update b:?[a=1;0;b], c:?[a=1;0;c] from t
Is there any way I can do this so that the 'a=1' condition is evaluated only once for each row?
Edit: I forgot to mention earlier that I want 'b' and 'c' to take some other values in the 'else' case, not just retain their original values:
update b:?[a=1;0;-1], c:?[a=1;0;-1] from t
update b:0, c:0 from t where a=1
If you'd like to use a vector conditional without evaluating the condition twice, you can evaluate it first, e.g.:
q)x:t.a=1
q)x
10010b
q)update b:?[x;0;-1],c:?[x;0;-1] from t
a b c
--------
1 0 0
3 -1 -1
2 -1 -1
1 0 0
2 -1 -1
Here you evaluate the condition once, store the result in a variable, and then reuse that variable in both vector conditionals.
Alternatively, you could use two update statements, e.g.:
t:update b:0, c:0 from t where a=1
t:update b:-1, c:-1 from t where a<>1
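The evaluate-once approach is language-independent, so here is a Python sketch of it. The table is modelled as a dict of equal-length column lists (an assumption for illustration, not a kdb+ structure) and `update_bc` is a hypothetical name; the mask is computed once and reused for both columns, just like `x` in the q answer.

```python
def update_bc(t):
    """Set b and c from a single evaluation of the condition a == 1.

    t is a dict of equal-length column lists, standing in for the q table.
    """
    mask = [v == 1 for v in t["a"]]  # the condition is evaluated once per row
    out = dict(t)                    # shallow copy; the input dict is not modified
    out["b"] = [0 if m else -1 for m in mask]
    out["c"] = [0 if m else -1 for m in mask]
    return out


t = {"a": [1, 3, 2, 1, 2], "b": [10, 20, 30, 40, 50], "c": [100, 200, 300, 400, 500]}
print(update_bc(t)["b"])  # [0, -1, -1, 0, -1]
```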
You can also build, for each column, a dictionary that maps every value of a to the new value, and index it with a:
update b:![1 2 3;-1 0 1]a,c:![1 2 3;-10 0 10]a from t
a b c
--------
1 -1 -10
3 1 10
2 0 0
1 -1 -10
2 0 0
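The same lookup-table idea can be sketched in Python, again modelling the table as a dict of column lists (an assumption for illustration); `update_by_lookup` is a hypothetical name. Each dictionary holds the same key/value pairs as the corresponding `![keys;values]` in the q answer.

```python
def update_by_lookup(t):
    """Map each value of column a to new b and c values via dictionaries."""
    b_map = {1: -1, 2: 0, 3: 1}    # same pairs as ![1 2 3;-1 0 1]
    c_map = {1: -10, 2: 0, 3: 10}  # same pairs as ![1 2 3;-10 0 10]
    out = dict(t)
    # .get returns None for an unmapped value, loosely like q returning a null
    out["b"] = [b_map.get(v) for v in t["a"]]
    out["c"] = [c_map.get(v) for v in t["a"]]
    return out


t = {"a": [1, 3, 2, 1, 2], "b": [10, 20, 30, 40, 50], "c": [100, 200, 300, 400, 500]}
print(update_by_lookup(t)["c"])  # [-10, 10, 0, -10, 0]
```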

Create a Boolean column displaying comparison between 2 other columns in kdb+

I'm currently learning kdb+/q.
I have a table of data. I want to take 2 columns of data (just numbers), compare them and create a new Boolean column that will display whether the value in column 1 is greater than or equal to the value in column 2.
I am comfortable using the update command to create a new column, but I don't know how to ensure that it is Boolean, how to compare the values, or how to display the "greater-than-or-equal-to-ness". Is it possible to get a simple Y/N output for that?
Thanks.
/ dummy data
q) show t:([] a:1 2 3; b: 0 2 4)
a b
---
1 0
2 2
3 4
/ add a column named 'ge' holding b>=a (use a>=b to test column a against column b)
q) update ge:b>=a from t
a b ge
------
1 0 0
2 2 1
3 4 1
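The point of this answer is that a comparison applied across two columns already yields one boolean per row. A Python sketch of the same thing, with the table modelled as a dict of column lists (an assumption) and a hypothetical `add_ge` helper:

```python
def add_ge(t, left, right, name="ge"):
    """Return a copy of t with a boolean column: t[left][i] >= t[right][i]."""
    out = dict(t)
    # compare the two columns row by row; each result is a bool
    # (zip stops at the shorter column, whereas q would signal 'length)
    out[name] = [x >= y for x, y in zip(t[left], t[right])]
    return out


t = {"a": [1, 2, 3], "b": [0, 2, 4]}
print(add_ge(t, "b", "a")["ge"])  # [False, True, True]
```

Swapping the column names, as in `add_ge(t, "a", "b")`, tests the first column against the second instead.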
Use a vector conditional:
http://code.kx.com/q/ref/lists/#vector-conditional
q)t:([]c1:1 10 7 5 9;c2:8 5 3 4 9)
q)r:update goe:?[c1>=c2;1b;0b] from t
q)r
c1 c2 goe
-------------
1 8 0
10 5 1
7 3 1
5 4 1
9 9 1
Use meta to confirm the goe column is of boolean type:
q)meta r
c | t f a
-------| -----
c1 | j
c2 | j
goe | b
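Comparison operators such as <= work element-wise on vectors, but when a function needs atoms as input to perform its operation, you might want to apply it with ' (each-both).
For example, to compare the length of each symbol's string with the value in another column (applied to the whole column at once, count string y would return the number of symbols rather than the length of each one):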
q)f:{x<=count string y}
q)f[3;`ab]
0b
q)t:([] l:1 2 3; s: `a`bc`de)
q)update r:f'[l;s] from t
l s r
------
1 a 1
2 bc 1
3 de 0
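The each-both pattern can be sketched in Python to show why the per-item application matters. Symbols are modelled as Python strings, and `each_both` is a hypothetical helper mirroring `f'[xs;ys]`; `f` follows the q lambda above.

```python
def f(x, y):
    """Atom-level check, like q's {x<=count string y}: x <= length of y."""
    return x <= len(y)


def each_both(fn, xs, ys):
    """Apply fn to matching items of two lists, like q's fn'[xs;ys]."""
    if len(xs) != len(ys):
        # q signals a 'length error when the two lists differ in length
        raise ValueError("length")
    return [fn(x, y) for x, y in zip(xs, ys)]


l_col = [1, 2, 3]
s_col = ["a", "bc", "de"]  # symbols modelled as Python strings
print(each_both(f, l_col, s_col))  # [True, True, False]
print(f(3, s_col))                 # True: len(s_col) is the row count, not a symbol length
```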

Delete adjacent repeated terms

I have the following vector a:
a=[8,8,9,9,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8]
From a I want to delete all "adjacent" repetitions to obtain:
b=[8,9,1,2,3,4,5,6,7,8]
However, when I do:
unique(a,'stable')
ans =
8 9 1 2 3 4 5 6 7
You see, unique only returns the distinct elements of a, whereas what I want is to delete the adjacent "duplicates". How do I do this?
It looks like a run-length-encoding problem (check here). You can modify Mohsen's solution to get the desired output. (I claim no credit for this code, but in my opinion the question is not a duplicate.)
Here is the code:
a =[8,8,9,9,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8]
F=find(diff([a(1)-1, a]));
Since diff(a) returns an array of length length(a)-1, we prepend a value derived from a(1) to get a vector the same length as a. We use a(1)-1 so that, as mentioned by @surgical_tubing, the first difference is nonzero (it equals 1): find returns the indices of nonzero elements, so this guarantees that the first element of a is always kept.
Hence diff([a(1)-1, a]) looks like this:
1 0 1 0 -8 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
Now that find has returned the position where each new run of equal values starts, we index back into a with those positions:
newa=a(F)
which outputs:
newa =
8 9 1 2 3 4 5 6 7 8
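For comparison, here is a Python sketch of the same "keep the first element of each run" rule. It is not MATLAB code, and `drop_adjacent_repeats` is a hypothetical name. Instead of prepending a(1)-1, it simply always keeps index 0.

```python
def drop_adjacent_repeats(a):
    """Keep a[i] only where it differs from a[i-1]; a[0] is always kept.

    Mirrors the MATLAB a(find(diff([a(1)-1, a]))) idea: a nonzero difference
    marks the first element of each run of equal values.
    """
    starts = [i for i in range(len(a)) if i == 0 or a[i] != a[i - 1]]
    return [a[i] for i in starts]


a = [8, 8, 9, 9, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8]
print(drop_adjacent_repeats(a))  # [8, 9, 1, 2, 3, 4, 5, 6, 7, 8]
```

`[k for k, _ in itertools.groupby(a)]` gives the same result, since groupby starts a new group at each change of value.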