Distribute elements of one list over elements of another list - kdb

I have two lists:
l1:`a`b`c;
l2: til 20;
I am trying to create a dictionary 'd' that has the elements of 'l1' as keys and the elements of 'l2' evenly distributed over them, like this:
d:(`a`b`c)!(0j, 3j, 6j, 9j, 12j, 15j, 18j;1j, 4j, 7j, 10j, 13j, 16j, 19j;2j, 5j, 8j, 11j, 14j, 17j)
The order of the elements is not relevant, I just need them balanced. I was able to achieve that in an iterative way (happy to add the code, if that's considered helpful), but there must be a more elegant way (potentially with adverbs?).

It can be done using group:
q)group (count[l2]#l1)
(`a`b`c)!(0j, 3j, 6j, 9j, 12j, 15j, 18j;1j, 4j, 7j, 10j, 13j, 16j, 19j;2j, 5j, 8j, 11j, 14j, 17j)
If your l2 is something other than til 20, you have to look the items back up in l2 after grouping:
q)l2: 20#.Q.a
q)l2
"abcdefghijklmnopqrst"
q)l2 group (count[l2]#l1) // lookup the items back from l2 after grouping
(`a`b`c)!("adgjmps";"behknqt";"cfilor")
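For readers coming from other languages, here is a rough Python analogue of the group idiom. It is a sketch only. `cycle_keys` and `distribute` are hypothetical helpers. `cycle_keys` plays the role of count[l2]#l1, since take cycles the keys out to the length of l2. Grouping then collects the values found at each key's positions. Because the values are grouped directly, the same function also covers the "look the items back up" case.

```python
from collections import defaultdict
from itertools import cycle, islice


def cycle_keys(keys, n):
    """Repeat keys until there are n of them, like q's n#keys."""
    return list(islice(cycle(keys), n))


def distribute(keys, values):
    """Group values by the cycled key at each position (round-robin)."""
    groups = defaultdict(list)
    for key, value in zip(cycle_keys(keys, len(values)), values):
        groups[key].append(value)
    return dict(groups)


l1 = ["a", "b", "c"]
print(distribute(l1, list(range(20))))
# {'a': [0, 3, 6, 9, 12, 15, 18], 'b': [1, 4, 7, 10, 13, 16, 19], 'c': [2, 5, 8, 11, 14, 17]}
print({k: "".join(v) for k, v in distribute(l1, "abcdefghijklmnopqrst").items()})
# {'a': 'adgjmps', 'b': 'behknqt', 'c': 'cfilor'}
```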

You can use the reshape functionality of the take operator #. It takes two arguments: a left-hand side giving the shape (at least two dimensions) and the list to reshape.
For example, (3;4)#til 12 will reshape the list 0 1 ... 11 into a 3 by 4 matrix.
In our case, the number of elements in l1 will not necessarily divide the number of elements in l2 exactly, so a rectangular matrix won't work. Instead we can supply a null (0N) as the second dimension, which takes care of distributing the remainder.
q) l1!(count[l1];0N)#l2
a| 0 1 2 3 4 5
b| 6 7 8 9 10 11 12
c| 13 14 15 16 17 18 19
This method performs very well for larger input lists.
As a side note, when using .Q.fc to split a vector argument over n slaves for multi-threading, kdb uses the # operator to reshape the vector into n vectors, one for each slave.
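The 6/7/7 split shown above is consistent with cutting the list at positions floor(i*n/k) for i = 0..k-1. The Python sketch below assumes that rule. It is inferred from the output above, not taken from the q reference. `reshape_rows` is a hypothetical name.

```python
def reshape_rows(k, items):
    """Split items into k consecutive chunks whose lengths differ by at most one.

    Chunk i starts at floor(i * n / k), which reproduces the 6/7/7 split
    that (3;0N)#til 20 produced above.
    """
    n = len(items)
    cuts = [i * n // k for i in range(k)] + [n]
    return [items[cuts[i]:cuts[i + 1]] for i in range(k)]


keys = ["a", "b", "c"]
d = dict(zip(keys, reshape_rows(len(keys), list(range(20)))))
for key, chunk in d.items():
    print(key, chunk)
# a [0, 1, 2, 3, 4, 5]
# b [6, 7, 8, 9, 10, 11, 12]
# c [13, 14, 15, 16, 17, 18, 19]
```

Because the chunks are contiguous slices, every item appears exactly once and the original order is preserved.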

q)d:`a`b`c!{a where x = (a:til 20) mod y}'[til 3;3]
q)d
a| 0 3 6 9 12 15 18
b| 1 4 7 10 13 16 19
c| 2 5 8 11 14 17

Related

Applying elements to object vs applying them to object's symbol name

I saw a technique for accessing directory elements by the directory's symbol rather than its name (see q.k):
`.q `svar`sdev`scov`med / instead of .q `svar`sdev`scov`med
Why, and when, is this approach useful?
Also, for some reason the behaviour is reversed compared to @ apply:
q)l:til 5
q)`l[2 3]: 20 30 / 'assign, `l not changed
/ (upd: `l is not just a symbol here; it really does refer to the list: `l[2 3] returns elements of l)
q)l[2 3]: 21 31; l / l changed
0 1 21 31 4
But when we use @ apply syntax, the result is reversed:
q)@[l;2 3;:;22 32]; l / l not changed
0 1 21 31 4
q)@[`l;2 3;:;23 33]; l / `l changed
0 1 23 33 4
upd:
Applying indexes to a symbol does not work in shakti; it looks like this idea hasn't withstood the test of time.
By directory, I think you mean dictionary as this is the data structure which is returned when you call .q.
You can access dictionary elements a number of ways:
q)d:`a`b`c!1 2 3
q)d
a| 1
b| 2
c| 3
q)d`a
1
q)`d `a
1
q)d[`a]
1
q)@[d;`a]
1
q)@[`d;`a]
1
q) / etc ...
These are all equivalent ways of writing the same lookup; which one to use depends on which you prefer (or what the situation calls for).
In the code below,
q)`l[2 3]:20 30
'assign
[0] `l[2 3]:20 30
`l is simply the symbol `l, not a reference to the list l, which is why you get an assign error.
The @ operator is slightly different:
q)@[l;0;:;20]
20 1 2 3 4
q)l
0 1 2 3 4
q) / -vs-
q)@[`l;0;:;20]
`l
q)l
20 1 2 3 4
Adding the backtick to l tells q that you want to update l itself, rather than just apply the operation to the list and return the result.
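The difference can be modelled in Python by using a dictionary to stand in for q's global namespace. This is only an analogy, and `amend_at` and `ns` are hypothetical names. If you pass a value, you get back an amended copy and the original is left alone. If you pass a name (a str playing the role of a q symbol), the name is rebound to the amended list and the name itself is returned. This mirrors how the symbol form of amend returns `l.

```python
def amend_at(target, index, value, namespace):
    """Rough analogue of q's amend-at with the assign (:) operator.

    If target is a str, it is treated as a name in namespace: the stored list
    is replaced by an amended copy and the name is returned.
    Otherwise an amended copy of target is returned and nothing is stored.
    """
    if isinstance(target, str):
        amended = list(namespace[target])
        amended[index] = value
        namespace[target] = amended
        return target
    amended = list(target)
    amended[index] = value
    return amended


ns = {"l": [0, 1, 2, 3, 4]}
print(amend_at(ns["l"], 0, 20, ns))  # [20, 1, 2, 3, 4]
print(ns["l"])                       # [0, 1, 2, 3, 4]  (unchanged)
print(amend_at("l", 0, 20, ns))      # l
print(ns["l"])                       # [20, 1, 2, 3, 4]
```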

How can we use iterators in q to apply a list of functions to each of a list of arguments?

In q/kdb, we can apply a function to a number of arguments, as below:
f each (1;2;3)
We can also apply a defined argument to a list of functions:
f1:{x+y+z}; f2:{x+y-z}; f3:{x-y+z}; flist:(f1;f2;f3);
flist .\: 1 2 3
What is the most efficient way to combine the two, so that every function in a list is applied to each value in a list? For example, applying 3 unary functions f1, f2 and f3 to a list containing the values 1, 2 and 3 (producing 9 calls).
Any help with this is much appreciated!
You can use the eachboth (') operator:
q)f1:1+;f2:2+;f3:3+
q)(f1;f2;f3) @' 10 20 30
11 22 33
or in the case of multi-argument functions,
q)g1:+;g2:-;g3:*
q)(g1;g2;g3) .' (2 1;3 2;2 2)
3 1 4
and if you want to apply each function to each value, you need to form a cross product first:
q)(@/)each(f1;f2;f3) cross 10 20 30
11 21 31 12 22 32 13 23 33
You can use the unary apply-at @ (since you are dealing with unary functions), in combination with each-left and each-right. For example:
q)({x+1};{neg x};{x*x}) @\:/: (1 2 3)
2 -1 1
3 -2 4
4 -3 9
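In Python terms, each-both pairs the i-th function with the i-th argument. The each-left/each-right form, by contrast, builds the full table of function-value combinations (9 calls for 3 functions and 3 values). Below is a minimal sketch that uses the same example functions as the answers above. The ordering of the cross-product list follows itertools.product.

```python
from itertools import product

funcs = [lambda x: x + 1, lambda x: x + 2, lambda x: x + 3]  # f1, f2, f3
values = [10, 20, 30]

# Each-both: the i-th function is applied to the i-th value.
pairwise = [f(x) for f, x in zip(funcs, values)]
print(pairwise)  # [11, 22, 33]

# Cross product: every function with every value, functions varying slowest.
crossed = [f(x) for f, x in product(funcs, values)]
print(crossed)   # [11, 21, 31, 12, 22, 32, 13, 23, 33]

# Table with one row per argument and one column per function,
# the shape produced by the each-left/each-right combination.
table_funcs = [lambda x: x + 1, lambda x: -x, lambda x: x * x]
table = [[g(x) for g in table_funcs] for x in [1, 2, 3]]
print(table)     # [[2, -1, 1], [3, -2, 4], [4, -3, 9]]
```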

Matlab, Sum Function for a Matrix row

Basically, the sum function calculates the sum of each column; that is, for a 4x4 matrix we get a 1x4 vector:
A = magic(4)
A =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
sum(A)
ans =
34 34 34 34
But if I want the sums of the rows, I have two methods. The first is to transpose the matrix, sum the transposed matrix, and finally transpose the result. The second is to use the dimension argument of sum, sum(A,2):
A = magic(4)
A =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
sum(A,2)
ans =
34
34
34
34
The problem is that I cannot understand how this is done. Could anyone please explain the idea/concept behind this method?
It's hard to tell exactly how sum internally works, but we can guess it does something similar to this.
Matlab stores matrices (or N-dimensional arrays) in memory using column-major order. This means the order for the elements in memory for a 3 x 4 matrix is
1 4 7 10
2 5 8 11
3 6 9 12
So it first stores element (1,1), then (2,1), then (3,1), then (1,2), ...
In fact, this is the order you use when you apply linear indexing (that is, index a matrix with a single number). For example, let
A = [7 8 6 2
9 0 3 5
6 3 2 1];
Then A(4) gives 8.
With this in mind, it's easy to guess that what sum(A,1) does is traverse elements consecutively: A(1)+A(2)+A(3) to obtain the sum of the first column, then A(4)+A(5)+A(6) to sum the second column, etc. In contrast, sum(A,2) proceeds in steps of size(A,1) (3 in this example): A(1)+A(4)+A(7)+A(10) to compute the sum of the first row, etc.
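To make the column-major picture concrete, here is a small Python sketch. It is an illustration of the idea, not MATLAB's actual implementation. It lays out the 3x4 example A in column-major order and reproduces A(4) = 8 with 1-based linear indexing. It then forms column sums from consecutive runs, and row sums from elements spaced size(A,1) apart.

```python
A = [[7, 8, 6, 2],
     [9, 0, 3, 5],
     [6, 3, 2, 1]]
rows, cols = len(A), len(A[0])

# Column-major layout: walk down each column before moving to the next.
linear = [A[r][c] for c in range(cols) for r in range(rows)]


def at(k):
    """1-based linear index, like MATLAB's A(k)."""
    return linear[k - 1]


print(at(4))  # 8

# sum(A,1): each column is a run of `rows` consecutive elements.
col_sums = [sum(linear[c * rows:(c + 1) * rows]) for c in range(cols)]
# sum(A,2): each row takes every `rows`-th element, starting at offset r.
row_sums = [sum(linear[r::rows]) for r in range(rows)]
print(col_sums)  # [22, 11, 11, 8]
print(row_sums)  # [23, 17, 12]
```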
As a side note, this is probably related to the observed fact that sum(A,1) is faster than sum(A,2).
I'm really not sure what you are asking. sum takes two inputs, the first of which is a multidimensional array A, say.
Now let's take sA = size(A), and d between 1 and ndims(A).
To understand what B = sum(A,d) does, first we find out what the size of B is.
That's easy, sB = sA; sB(d) = 1;. So in a way, it will "reduce" the size of A along dimension d.
The rest is trivial: every element in B is the sum of elements in A along dimension d.
Basically, sum(A) is the same as sum(A,1), which outputs the sum of each column of the matrix, and sum(A,2) outputs the sum of each row. For any dimension beyond 2, such as sum(A,3), sum returns the matrix unchanged, because a matrix has only two dimensions (rows and columns).
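The size rule sB = sA; sB(d) = 1 can be checked with a small pure-Python sketch for the 2-D case. `sum_dim` is a hypothetical helper, not a MATLAB function. For d = 1 it returns a 1-by-n result, and for d = 2 an m-by-1 result. For any higher d it returns a copy of the matrix, since summing along a singleton dimension leaves each element as it is.

```python
def sum_dim(A, d):
    """Mimic MATLAB's sum(A, d) for a 2-D list of lists (1-based d)."""
    if d < 1:
        raise ValueError("d must be a positive integer")
    m, n = len(A), len(A[0])
    if d == 1:  # collapse dimension 1: result is 1-by-n (column sums)
        return [[sum(A[i][j] for i in range(m)) for j in range(n)]]
    if d == 2:  # collapse dimension 2: result is m-by-1 (row sums)
        return [[sum(row)] for row in A]
    return [row[:] for row in A]  # size(A,d) is already 1 for d > 2


magic4 = [[16, 2, 3, 13],
          [5, 11, 10, 8],
          [9, 7, 6, 12],
          [4, 14, 15, 1]]
print(sum_dim(magic4, 1))  # [[34, 34, 34, 34]]
print(sum_dim(magic4, 2))  # [[34], [34], [34], [34]]
```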

Finding index of vector from its original matrix

I have a 2-D matrix; let's assume its values are:
a =
17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
17 24 1 8 15
11 18 25 2 9
This matrix is going to be divided randomly into three different matrices, let's say:
b =
17 24 1 8 15
23 5 7 14 16
c =
4 6 13 20 22
11 18 25 2 9
d =
10 12 19 21 3
17 24 1 8 15
How can I find the index in the original matrix a of each row vector in, for example, matrix d? Note that rows of the matrix can be duplicated.
For example, what if I want to know the index of [10 12 19 21 3] in matrix a?
Or the index of [17 24 1 8 15] in matrix a, where the answer should be only one index value?
I would appreciate it so much if you can help me with this. Thank you in advance
You can use ismember with the 'rows' option. For example:
tf = ismember(a, c, 'rows')
Should produce:
tf =
0
0
1
0
0
1
To get the indices of the rows, you can apply find to the result of ismember (note that this is redundant if you're planning to use the vector for matrix indexing). Here find(tf) returns the vector [3; 6].
If you want to know the number of the row in matrix a that matches a single vector, you can either use the method explained above and apply find, or use the second output of ismember. For example:
[tf, loc] = ismember([10 12 19 21 3], a, 'rows')
returns loc = 4 for your example. Note that here a is the second argument, so that the output variable loc holds a meaningful result.
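The same row lookup can be sketched in Python to show what the two outputs mean. `ismember_rows` is a hypothetical helper. tf marks which rows of the first argument appear in the second. loc gives the 1-based index of the first matching row, or 0 when there is none. Returning the first match is a choice made for this sketch, and it answers the follow-up about the duplicated row 17 24 1 8 15 with a single index.

```python
def ismember_rows(A, B):
    """Return (tf, loc) for each row of A, searching the rows of B.

    tf[i] is 1 if A[i] occurs as a row of B; loc[i] is the 1-based index of
    the first such row in B, or 0 if there is none.
    """
    first_index = {}
    for j, row in enumerate(B, start=1):
        first_index.setdefault(tuple(row), j)  # keep the first occurrence only
    loc = [first_index.get(tuple(row), 0) for row in A]
    tf = [1 if k else 0 for k in loc]
    return tf, loc


a = [[17, 24, 1, 8, 15],
     [23, 5, 7, 14, 16],
     [4, 6, 13, 20, 22],
     [10, 12, 19, 21, 3],
     [17, 24, 1, 8, 15],
     [11, 18, 25, 2, 9]]
c = [[4, 6, 13, 20, 22],
     [11, 18, 25, 2, 9]]

tf, _ = ismember_rows(a, c)
print(tf)                                          # [0, 0, 1, 0, 0, 1]
print([i for i, t in enumerate(tf, 1) if t])       # [3, 6], like find(tf)
print(ismember_rows([[10, 12, 19, 21, 3]], a)[1])  # [4]
print(ismember_rows([[17, 24, 1, 8, 15]], a)[1])   # [1]
```

Exact tuple comparison is fine for integer data like this. The floating-point caveat discussed next applies just as much to this sketch.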
Handling floating-point numbers
If your data contains floating-point numbers, the ismember approach may fail because exact floating-point comparisons are unreliable. Here's a shorter variant of Amro's solution:
x = reshape(c', size(c, 2), 1, []);
tf = any(all(abs(bsxfun(@minus, a', x)) < eps), 3)';
Essentially this is a one-liner, but I've split it into two commands for clarity:
x holds the target rows to be searched, concatenated along the third dimension.
bsxfun subtracts each target row in turn from all rows of a, and the magnitude of the result is compared to a small threshold value (e.g. eps). If all elements in a row fall below it, that row is marked as "1".
It depends on how you build those divided matrices. For example:
a = magic(5);
d = a([2 1 2 3],:);
then the matching rows are obviously: 2 1 2 3
EDIT:
Let me expand on the idea of using ismember shown by @EitanT to handle floating-point comparisons:
tf = any(cell2mat(arrayfun(@(i) all(abs(bsxfun(@minus, a, d(i,:)))<1e-9,2), ...
1:size(d,1), 'UniformOutput',false)), 2)
not pretty but works :) This would be necessary for comparisons such as: 0.1*3 == 0.3
(basically it compares each row of d against all rows of a using an absolute difference)
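The tolerance idea that both answers rely on is easy to demonstrate outside MATLAB. In the Python sketch below, `rows_close` is a hypothetical helper, and its default threshold of 1e-9 mirrors the answer above. The point is that 0.1*3 == 0.3 is false, so exact matching misses the row, while an absolute-difference test finds it.

```python
def rows_close(a_rows, targets, tol=1e-9):
    """Mark each row of a_rows that matches some target row within tol.

    A row matches when every element differs from the target's element by
    less than tol in absolute value.
    """
    return [
        int(any(all(abs(x - y) < tol for x, y in zip(row, t)) for t in targets))
        for row in a_rows
    ]


a = [[0.1 * 3, 1.0], [0.5, 2.0], [0.3, 4.0]]
target = [[0.3, 1.0]]
print(0.1 * 3 == 0.3)                 # False
print([int(r in target) for r in a])  # [0, 0, 0] with exact comparison
print(rows_close(a, target))          # [1, 0, 0]
```

The right threshold depends on the scale of the data. An absolute tolerance such as eps only makes sense when the values are of order 1.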

Find the increasing and decreasing trend in a curve MATLAB

a=[2 3 6 7 2 1 0.01 6 8 10 12 15 18 9 6 5 4 2].
Here is an array; I need to extract the exact values where each increasing and decreasing trend starts.
The output for the array a should be [2(first element) 2 6 9].
These are the values at positions 1, 5, 8 and 14 of a.
Kindly help me get this result in MATLAB for any similar array.
You just have to find where the sign of the difference between consecutive numbers changes.
With some common sense and the functions diff, sign and find, you get this solution:
a = [2 3 6 7 2 1 0.01 6 8 10 12 15 18 9 6 5 4 2];
sda = sign(diff(a));
idx = [1 find(sda(1:end-1)~=sda(2:end))+2 ];
result = a(idx);
EDIT:
The sign function messes things up when there are two consecutive numbers which are the same, because sign(0) = 0, which is falsely identified as a trend change. You'd have to filter these out. You can do this by first removing the consecutive duplicates from the original data. Since you only want the values where the trend change starts, and not the position where it actually starts, this is easiest:
a(diff(a)==0) = [];
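Here is the same sign-of-difference logic as a Python sketch, using 1-based positions to match MATLAB. `trend_starts` is a hypothetical name. As the edit suggests, it first drops consecutive duplicates. It then keeps position 1, plus every position two past a sign change in the differences.

```python
def trend_starts(a):
    """Return (positions, values) where a new rising or falling run begins.

    Positions are 1-based and refer to the list after duplicates are dropped.
    """
    # Drop a[i] when it equals a[i+1], like a(diff(a)==0) = [] in MATLAB.
    deduped = [x for i, x in enumerate(a) if i + 1 == len(a) or a[i + 1] != x]
    signs = [(y > x) - (y < x) for x, y in zip(deduped, deduped[1:])]  # sign(diff)
    # MATLAB: [1 find(sda(1:end-1)~=sda(2:end))+2]; k is find's 1-based index.
    idx = [1] + [k + 2 for k in range(1, len(signs)) if signs[k - 1] != signs[k]]
    return idx, [deduped[i - 1] for i in idx]


a = [2, 3, 6, 7, 2, 1, 0.01, 6, 8, 10, 12, 15, 18, 9, 6, 5, 4, 2]
print(trend_starts(a))  # ([1, 5, 8, 14], [2, 2, 6, 9])
```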
This is a great place to use the diff function.
Your first step will be to do the following:
B = [0 diff(a)]
The reason we add the 0 there is to keep the result the same length as a, because of the way the diff function works. It starts with the first element and reports the difference between it and the next element. There's no element before the first one, so the result is one element shorter than the input. We add a zero because there is no change at the starting element.
If you look at the results in B now, it is quite obvious where the turning points are (where the values switch between positive and negative).
To pull these out programmatically there are a number of things you can do. I tend to use a little multiplication and the find command:
Result = find(B(1:end-1).*B(2:end)<0)
This returns the index of each turning point, that is, each local maximum or minimum of a. In this case it will be:
ans =
4 7 13
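Here is the product test from this answer, again as a Python sketch with 1-based output. `turning_points` is a hypothetical name. Padding the differences with a leading 0 keeps B the same length as a, and a negative product of neighbouring entries flags each turning point.

```python
def turning_points(a):
    """1-based indices of local extrema, via B = [0 diff(a)] as in MATLAB."""
    B = [0] + [y - x for x, y in zip(a, a[1:])]
    # find(B(1:end-1).*B(2:end) < 0), reported with 1-based indexing.
    return [i + 1 for i in range(len(B) - 1) if B[i] * B[i + 1] < 0]


a = [2, 3, 6, 7, 2, 1, 0.01, 6, 8, 10, 12, 15, 18, 9, 6, 5, 4, 2]
print(turning_points(a))  # [4, 7, 13]
```

Like the sign-based answer, a zero difference (two equal neighbours) gives a zero product. Such a turning point is therefore missed rather than flagged.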