Update multiple columns based on a single condition in kdb

I have a table -
q)t
a b c
--------
1 10 100
3 20 200
2 30 300
1 40 400
2 50 500
I wish to update column b and c values based on a single 'if' condition on column a. For example -
t:update b:0 from t where a=1
t:update c:0 from t where a=1
I could use a vector conditional, but I would rather not: it would evaluate the condition twice for each row, and my table has a large number of rows.
update b:?[a=1;0;b], c:?[a=1;0;c] from t
Is there any way to do it so that the 'a=1' condition is evaluated only once for each row?
Edit: I earlier forgot to mention that I want 'b' and 'c' to take some other values in the 'else' case, not just retain their original values -
update b:?[a=1;0;-1], c:?[a=1;0;-1] from t

update b:0, c:0 from t where a=1

If you'd like to use a vector conditional without evaluating the condition twice, you can evaluate it first e.g.
q)x:t.a=1
q)x
10010b
q)update b:?[x;0;-1],c:?[x;0;-1] from t
a b c
--------
1 0 0
3 -1 -1
2 -1 -1
1 0 0
2 -1 -1
Here you evaluate the condition once, store the result in a variable, and then reuse that variable in each vector conditional.
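For readers who want to check the logic outside q, the same "evaluate once, reuse the mask" idea can be sketched in plain Python. This is only an illustration: the lists a, b and c are hypothetical stand-ins for the table columns, not q code.

```python
# Hypothetical column data mirroring column a of table t in the question.
a = [1, 3, 2, 1, 2]

# Evaluate the condition a=1 once and keep the boolean mask.
mask = [value == 1 for value in a]

# Reuse the same mask for both columns: 0 where it is true, -1 otherwise.
b = [0 if m else -1 for m in mask]
c = [0 if m else -1 for m in mask]

print(mask)
print(b)
print(c)
```

The comparison runs once per row; building b and c only reads the stored booleans.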
Alternatively you could do two update statements e.g.
t:update b:0, c:0 from t where a=1
t:update b:-1, c:-1 from t where a<>1

You can make a dictionary in your update with associated values for each column related to the a column.
update b:![1 2 3;-1 0 1]a,c:![1 2 3;-10 0 10]a from t
a b c
--------
1 -1 -10
3 1 10
2 0 0
1 -1 -10
2 0 0
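As a rough cross-check of the lookup idea, here is a minimal Python sketch that maps each value of a through one dictionary per target column. The names b_lookup and c_lookup are made up for the illustration; the keys and values are the ones used in the update above.

```python
# Hypothetical column data mirroring column a of table t.
a = [1, 3, 2, 1, 2]

# One lookup table per target column, keyed on the values of a.
b_lookup = {1: -1, 2: 0, 3: 1}
c_lookup = {1: -10, 2: 0, 3: 10}

# Each row's new value is found by looking up its a value.
b = [b_lookup[value] for value in a]
c = [c_lookup[value] for value in a]

print(b)
print(c)
```

One difference to keep in mind: a missing key raises KeyError in Python, whereas a q dictionary lookup returns a null for a key it does not contain.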

Related

KDB+/Q: How to create a column that increments the occurrence of unique values of another column?

I am trying to create a column that increments the occurrence of unique (not the same as the previous) values in another column as such:
x y
=====
1 | 0
1 | 0
2 | 1
4 | 2
1 | 3
How could one achieve this functionality in kdb+?
Thanks
Does this work?
q)t:([]x:1 1 2 4 1)
q)update y:(sums 0b,1_differ x)from t
x y
---
1 0
1 0
2 1
4 2
1 3
differ looks at a list (or column of a table) and returns a list that is 1b in positions where the item is different to the item before that. It always starts with 1b though, so we have to drop the first element of the list using 1_ and add a 0b at the beginning with 0b,. Then we just take the running sum using sums.
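The explanation above can be traced step by step in a short Python sketch. It is not q: the "differs from the previous item" flags are built by hand, with the first flag forced to false to mirror the 0b,1_ step, and itertools.accumulate plays the role of sums.

```python
from itertools import accumulate

# Column x from the example table.
x = [1, 1, 2, 4, 1]

# True where an item differs from the one before it. The first item has no
# predecessor, so its flag is set to False (the role of 0b,1_ in the q code).
changed = [False] + [cur != prev for prev, cur in zip(x, x[1:])]

# Running sum of the flags gives the incrementing counter.
y = list(accumulate(int(flag) for flag in changed))

print(changed)
print(y)
```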

Comparing elements in each row of a matrix and count the similar values

I have a matrix like this:
line=[1 3 5 0 0 4 2;
1 3 8 0 8 2 2 ]
I want to compare the rows in this matrix. If the 1st column of the first row is the same as 1st column of second row then increase a counter. But if the value is zero, then the counter should not be increased.
For the example above I expect the output to be match = 3
where the matching values are 1,3,2 so the match = 3
I would go for this:
match = sum((line(1, :) == line(2, :)) & (line(1, :) ~= 0))
The array comparison line(1, :) == line(2, :) gives you (logical) 1 at the positions where both rows have identical values:
ans =
1 1 0 1 0 0 1
Next, you need to exclude the zero values. That can be done by finding the non-zero elements in the first row alone, (line(1, :) ~= 0), and then combining the two results with the & operator. (The ~= operator works in both MATLAB and Octave, whereas != is Octave-only.) You'll get:
ans =
1 1 0 0 0 0 1
At last, you just have to count the ones using sum.
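The three steps (compare, mask out zeros, count) can be followed in a small Python sketch using the two rows from the question. The variable names are chosen for the illustration only.

```python
# The two rows of the matrix from the question.
row1 = [1, 3, 5, 0, 0, 4, 2]
row2 = [1, 3, 8, 0, 8, 2, 2]

# Step 1: positions where both rows hold the same value.
equal = [p == q for p, q in zip(row1, row2)]

# Step 2: positions where the first row is non-zero.
nonzero = [p != 0 for p in row1]

# Step 3: count positions that satisfy both conditions.
match_count = sum(e and n for e, n in zip(equal, nonzero))

print(match_count)
```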
Alternatively, with the matrix stored in x, you can check whether the sum of each column divided by its first-row entry equals 2.
So:
count = sum(sum(x)./x(1,:)==2)
Since 0/0 evaluates to NaN, which never equals 2, columns where both entries are 0 are not counted.

Create a Boolean column displaying comparison between 2 other columns in kdb+

I'm currently learning kdb+/q.
I have a table of data. I want to take 2 columns of data (just numbers), compare them and create a new Boolean column that will display whether the value in column 1 is greater than or equal to the value in column 2.
I am comfortable using the update command to create a new column, but I don't know how to ensure that it is Boolean, how to compare the values and a method to display the "greater-than-or-equal-to-ness" - is it possible to do a simple Y/N output for that?
Thanks.
/ dummy data
q) show t:([] a:1 2 3; b: 0 2 4)
a b
---
1 0
2 2
3 4
/ add column name 'ge' with value from b>=a
q) update ge:b>=a from t
a b ge
------
1 0 0
2 2 1
3 4 1
Use a vector conditional:
http://code.kx.com/q/ref/lists/#vector-conditional
q)t:([]c1:1 10 7 5 9;c2:8 5 3 4 9)
q)r:update goe:?[c1>=c2;1b;0b] from t
c1 c2 goe
-------------
1 8 0
10 5 1
7 3 1
5 4 1
9 9 1
Use meta to confirm the goe column is of boolean type:
q)meta r
c | t f a
-------| -----
c1 | j
c2 | j
goe | b
Comparison operators such as <= work directly on vectors, but when a function needs atoms as input, you may want to apply it pairwise with ' (the each-both operator).
e.g. to compare a number with the length of a symbol's string, taken from another column:
q)f:{x<=count string y}
q)f[3;`ab]
0b
q)t:([] l:1 2 3; s: `a`bc`de)
q)update r:f'[l;s] from t
l s r
------
1 a 1
2 bc 1
3 de 0
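For comparison, applying a two-argument function pairwise across two columns looks like this in Python, where zip pairs the items up. This is only an analogy to each-both, not q; the function f and the lists l and s mirror the example above, with the symbols written as plain strings.

```python
# Atom-level function: is x at most the length of the text y?
def f(x, y):
    return x <= len(str(y))

# Hypothetical column data mirroring columns l and s of the table.
l = [1, 2, 3]
s = ["a", "bc", "de"]

# Apply f to corresponding items of the two columns, one pair at a time.
r = [f(x, y) for x, y in zip(l, s)]

print(r)
```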

How to select specific rows based upon column attribute values in matlab?

I have a sentences-by-words logical matrix, where a value of 1 shows the presence of a word in that sentence and 0 shows its absence, as follows:
0 0 1 1
1 0 1 0
0 0 0 1
1 1 0 0
I have done some processing and selected specific words i.e. 2 & 3
result = 2 3
Now, I want to select only those rows in which the value of word 2 or word 3 is equal to 1, and return their row numbers as follows:
row = 1 2 4
This should be done for every word that is in result variable - thanks.
Think you are looking for something like this, assuming A as the input binary array -
result = [2 3]; %// select words by IDs
row = find(any(A(:,result),2))
Sample run -
A =
0 0 1 1
1 0 1 0
0 0 0 1
1 1 0 0
row =
1
2
4
For fun, you can also use matrix multiplication as an alternative approach -
row = find(A(:,result)*ones(numel(result),1))
First choose the columns that you want to extract and create a matrix that concatenates all of these columns together. Next, use any and operate along the columns in combination with find to obtain the desired locations.
Therefore, given your logical matrix stored in X, do:
ind = [2 3];
matr = X(:,ind);
vals = find(any(matr, 2));
With your above example, we get:
vals =
1
2
4
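The same selection can be traced in plain Python: keep a row if any of the chosen word columns is set. The matrix and the word IDs come from the question; the word IDs are 1-based as in MATLAB, so they are shifted by one when indexing the Python lists.

```python
# Logical sentences-by-words matrix from the question.
A = [
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

# Selected word IDs (1-based, as in the MATLAB code).
result = [2, 3]

# Row numbers (1-based) where at least one selected word is present.
rows = [i + 1 for i, row in enumerate(A) if any(row[w - 1] for w in result)]

print(rows)
```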

How to count patterns columnwise in Matlab?

I have a matrix S in Matlab that looks like the following:
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1
I would like to count patterns of values column-wise. I am interested in the frequency of the numbers that follow right after the number 3 in any of the columns. For instance, the number 3 occurs three times in the first column. The first time we observe it, it is followed by 3, the second time it is followed by 3 again, and the third time it is followed by 4. Thus, the frequencies for the patterns observed in the first column would look like:
3-3: 66.66%
3-4: 33.33%
3-1: 0%
3-2: 0%
To generate the output, you could use the convenient tabulate function:
S = [
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1];
idx = find(S(1:end-1,:)==3);
S2 = S(2:end,:);
tabulate(S2(idx))
Value Count Percent
1 0 0.00%
2 0 0.00%
3 4 66.67%
4 2 33.33%
Here's one approach: find the 3's, then look at the digits that follow them.
[i,j]=find(S==3);
k=i+1<=size(S,1);
T=S(sub2ind(size(S),i(k)+1,j(k))) %// the elements of S that are just below a 3
R=arrayfun(@(x) sum(T==x)./sum(k),1:max(S(:))).' %// relative frequency of each digit among those elements
I'm going to restate your problem statement in a way that I can understand and my solution will reflect this new problem statement.
For a particular column, locate the locations that contain the number 3.
Look at the row immediately below these locations and look at the values at these locations
Take these values and tally up the total number of occurrences found.
Repeat these for all of the columns and update the tally, then determine the percentage of occurrences for the values.
We can do this by the following:
A = [2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]; %// Define your matrix
[row,col] = find(A(1:end-1,:) == 3);
vals = A(sub2ind(size(A), row+1, col));
h = 100*accumarray(vals, 1) / numel(vals)
h =
0
0
66.6667
33.3333
Let's go through the above code slowly. The first few lines define your example matrix A. Next, we look at all of the rows except the last row of the matrix and use find to see where the number 3 is located. We skip the last row to stay within the bounds of the matrix: if a 3 sat in the last row, looking up the value below it would index past the end of the matrix and raise an error, because there's nothing there!
Once we do this, we take a look at those values in the matrix that are 1 row beneath those that have the number 3. We use sub2ind to help us facilitate this. Next, we use these values and tally them up using accumarray then normalize them by the total sum of the tallying into percentages.
The result would be a 4 element array that displays the percentages encountered per number.
To double check against the matrix: a 3 is followed by another 3 a total of 4 times (the 3s at rows 3 and 4 of the first column, row 2 of the second column, and row 6 of the third column). A 3 is followed by a 4 two times (the 3s at row 5 of the first column and row 3 of the second column).
In total, we counted 6 numbers, so dividing by 6 gives us 4/6 or 66.67% for the number 3 and 2/6 or 33.33% for the number 4.
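The tally described above can be reproduced in a few lines of plain Python as a sanity check. This is a sketch of the same counting logic rather than a translation of the accumarray call: collections.Counter does the tallying, and the names followers and percentages are made up for the illustration.

```python
from collections import Counter

# Example matrix from the question, one inner list per row.
S = [
    [2, 2, 1, 2],
    [2, 3, 1, 1],
    [3, 3, 1, 1],
    [3, 4, 1, 1],
    [3, 1, 2, 1],
    [4, 1, 3, 1],
    [1, 1, 3, 1],
]

# Values located directly below a 3; the last row is skipped because
# nothing lies below it.
followers = [
    S[r + 1][c]
    for r in range(len(S) - 1)
    for c in range(len(S[0]))
    if S[r][c] == 3
]

# Tally the followers and convert the tallies to percentages for 1..4.
counts = Counter(followers)
percentages = {v: 100 * counts[v] / len(followers) for v in range(1, 5)}

print(counts)
print(percentages)
```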
If I got the problem statement correctly, you could efficiently implement this with MATLAB's logical indexing and an approach that is essentially of two lines -
%// Input 2D matrix
S = [
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]
Labels = [1:4]'; %//'# Label array
counts = histc(S([false(1,size(S,2)) ; S(1:end-1,:) == 3]),Labels)
Percentages = 100*counts./sum(counts)
Verify/Present results
The styles listed next for presenting the output use MATLAB's table to show the data in a human-readable format.
Style #1
>> table(Labels,Percentages)
ans =
Labels Percentages
______ ___________
1 0
2 0
3 66.667
4 33.333
Style #2
You can do some fancy string operations to present the results in a more "representative" manner -
>> Labels_3 = strcat('3-',cellstr(num2str(Labels','%1d')'));
>> table(Labels_3,Percentages)
ans =
Labels_3 Percentages
________ ___________
'3-1' 0
'3-2' 0
'3-3' 66.667
'3-4' 33.333
Style #3
If you want to present them in descending sorted manner based on the percentages as listed in the expected output section of the question, you can do so with an additional step using sort -
>> [Percentages,idx] = sort(Percentages,'descend');
>> Labels_3 = strcat('3-',cellstr(num2str(Labels(idx)','%1d')'));
>> table(Labels_3,Percentages)
ans =
Labels_3 Percentages
________ ___________
'3-3' 66.667
'3-4' 33.333
'3-1' 0
'3-2' 0
Bonus Stuff: Finding frequency (counts) for all cases
Now, let's suppose you would like to repeat this process for 1, 2 and 4 as well, i.e. find the occurrences after 1, 2 and 4 respectively. In that case, you can iterate the above steps over all cases, and for that you can use arrayfun -
%// Get counts
C = cell2mat(arrayfun(@(n) histc(S([false(1,size(S,2)) ; S(1:end-1,:) == n]),...
1:4),1:4,'Uni',0))
%// Get percentages
Percentages = 100*bsxfun(@rdivide, C, sum(C,1))
Giving us -
Percentages =
90.9091 20.0000 0 100.0000
9.0909 20.0000 0 0
0 60.0000 66.6667 0
0 0 33.3333 0
Thus, in Percentages, the first column holds the percentages of [1,2,3,4] that occur right after a 1 somewhere in the input matrix. As an example, column 3 of Percentages is what you had in the sample output when looking for elements right after a 3 in the input matrix.
If you want to compute frequencies independently for each column:
S = [2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]; %// data: matrix
N = 3; %// data: number
r = max(S(:));
[R, C] = size(S);
[ii, jj] = find(S(1:end-1,:)==N); %// step 1
count = full(sparse(S(ii+1+(jj-1)*R), jj, 1, r, C)); %// step 2
result = bsxfun(@rdivide, count, sum(S(1:end-1,:)==N)); %// step 3
This works as follows:
find is first applied to determine row and col indices of occurrences of N in S except its last row.
The values in the entries right below the indices of step 1 are accumulated for each column, in variable count. The very convenient sparse function is used for this purpose. Note that this uses linear indexing into S.
To obtain the frequencies for each column, count is divided (with bsxfun) by the number of occurrences of N in each column.
The result in this example is
result =
0 0 0 NaN
0 0 0 NaN
0.6667 0.5000 1.0000 NaN
0.3333 0.5000 0 NaN
Note that the last column correctly contains NaNs because the frequency of the sought patterns is undefined for that column.
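To see where the NaNs come from, the per-column computation can be sketched in plain Python. This follows the three steps above with explicit loops in place of sparse and bsxfun; the variable names are illustrative. A column that contains no N above its last row gets math.nan, which mirrors the 0/0 division in the MATLAB result.

```python
import math

# Example matrix from the question, one inner list per row.
S = [
    [2, 2, 1, 2],
    [2, 3, 1, 1],
    [3, 3, 1, 1],
    [3, 4, 1, 1],
    [3, 1, 2, 1],
    [4, 1, 3, 1],
    [1, 1, 3, 1],
]
N = 3

n_rows, n_cols = len(S), len(S[0])
max_value = max(max(row) for row in S)

# count[v - 1][c]: how often value v sits directly below an N in column c.
count = [[0] * n_cols for _ in range(max_value)]
for r in range(n_rows - 1):  # skip the last row: nothing lies below it
    for c in range(n_cols):
        if S[r][c] == N:
            count[S[r + 1][c] - 1][c] += 1

# Number of N occurrences (above the last row) in each column.
occurrences = [sum(count[v][c] for v in range(max_value)) for c in range(n_cols)]

# Per-column frequencies; undefined (NaN) where a column has no N.
result = [
    [count[v][c] / occurrences[c] if occurrences[c] else math.nan
     for c in range(n_cols)]
    for v in range(max_value)
]

for row in result:
    print(row)
```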