PDL pairwise row comparison - perl

I have created a PDL matrix. I need to do a pairwise comparison between each pair of rows. Currently I am using the 'where' and 'cov' commands to return the pairwise comparison for two slices (generated in a Perl loop).
My question: How can I use 'range' and 'slice' to loop over the rows in a pairwise fashion? How can I return my index position? I have looped over the matrix using Perl. I have read that looping with Perl really cripples the power of PDL.
Desired output:
indexA indexB Value
pos1 pos5 1
pos1 pos6 5
pos1 pos0 7
To be clear I only want to use PDL functionality.
Here is some pseudocode that will (hopefully) illustrate my point better.
p $b
[
[1 0 3 0]
[0 1 0 1]
[1 3 1 3] <- example piddle y
[0 1 0 1] <- example piddle z
]
my concept function{
slice $b (grab row z) - works fine
slice $b (grab row y) - works fine
($a, $b) = where($a,$b, $a < 3 && $b < 3 ) - works fine
p $a [1 1]
p $b [0 0]
cov($a $b) - works just fine.
}
I just need a way to execute pairwise across all rows. I will need to do factorial(n rows) comparisons.

PDL threading is the concept you are looking for here. The general technique for looping along dimensions is to add dummy dimensions in the appropriate places so that the calculation generates the implicit threadloops it needs. For a multi-dimensional problem, there can be a number of different ways to add dims and hence to create the threadloops.
For your pairwise row calculation, one option is two nested Perl loops over the row indexes, slicing out one pair of rows at a time, so that PDL threads only along each row. Another option is a single Perl loop over one row index, taking advantage of implicit threadlooping to compare that row against all rows at once.
A fully PDL-threadloop computation would be to add a dummy dimension for the loop over rows for each of the arguments so that you would calculate the entire N**2 row calculations at once. Here is an example for a shape [4,3] array with the calculation being the == operator:
pdl> $b = floor(random(4,3)*5)
pdl> p $b
[
[0 4 3 3]
[3 3 4 2]
[4 0 1 4]
]
pdl> p $b(,*3)==$b(,,*3)
[
[
[1 1 1 1]
[0 0 0 0]
[0 0 0 0]
]
[
[0 0 0 0]
[1 1 1 1]
[0 0 0 0]
]
[
[0 0 0 0]
[0 0 0 0]
[1 1 1 1]
]
]
The result is a piddle of shape [4,3,3]: the 0th dimension runs along the elements of each pairwise row comparison, and the 1st and 2nd dims correspond to the indexes of the two rows involved in the == operation.
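For readers less familiar with dummy dimensions, the same all-pairs structure can be sketched in plain Python. This is only an illustration of what the threaded == computes, not PDL code; the names rows and eq are made up for the example, and the data is the $b shown above.

```python
# Illustration only: emulate the all-pairs row comparison with nested loops.
rows = [
    [0, 4, 3, 3],
    [3, 3, 4, 2],
    [4, 0, 1, 4],
]

# eq[i][j][k] is 1 when rows[i][k] == rows[j][k], else 0.
eq = [
    [[int(a == b) for a, b in zip(row_i, row_j)] for row_j in rows]
    for row_i in rows
]

# Print each (row index, row index, comparison) triple.
for i, block in enumerate(eq):
    for j, cmp_row in enumerate(block):
        print(i, j, cmp_row)
```

The two outer indexes play the role of the 1st and 2nd dims of the PDL result, and the innermost list plays the role of the 0th dim.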
If you need an index value from or for one of these threadloop calculations, use the xvals, yvals, zvals, or axisvals to generate a piddle with the index values corresponding to that array axis.
pdl> p $b->xvals
[
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
]
pdl> p $b->yvals
[
[0 0 0 0]
[1 1 1 1]
[2 2 2 2]
]
There are a lot of details relating to the implementation of the PDL threading (not the same as perl threading or posix threads). I recommend the perldl mailing list for reference and discussion with other PDL users and developers. Also, see the first on-line draft of the PDL Book which has more comprehensive coverage of PDL computation and threading.

I think what you're looking for is a method to find all different pairs of rows in the array and then process each pair using cov? If that's correct then I haven't heard of cov and a quick search through the documentation doesn't help. However I can say a few things that may help.
I think you're being overly cautious about dropping out of PDL into Perl code, which will be fine if all you are doing is looping over the indices of all row pairs and pulling those rows out using slice. This is shown in the sample code below.
Also you can't call where like that as $a < 3 etc. are piddles themselves and the boolean operator won't do what you want on them. Use the & operator instead, and add some parentheses to make sure the expression gets executed in the right order.
Beyond that I can't help unless you correct my understanding of your question or direct me to some documentation of the cov subroutine.
use strict;
use warnings;
use PDL;
my $dat = pdl <<END;
[
[1 0 3 0]
[0 1 0 1]
[1 3 1 3]
[0 1 0 1]
]
END
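# Index of the last row (dim 1 counts rows)
my $max2 = $dat->dim(1) - 1;

# Visit every distinct pair of rows ($i < $j) exactly once
for my $i (0 .. $max2 - 1) {
    for my $j ($i + 1 .. $max2) {
        my $row1 = $dat->slice(",($i)");
        my $row2 = $dat->slice(",($j)");
        # Keep only the positions where both rows are below 3
        ($row1, $row2) = where($row1, $row2, ($row1 < 3) & ($row2 < 3));
        # cov is the asker's own subroutine; it is not defined here
        cov($row1, $row2);
    }
}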
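The loop structure above can also be sketched outside PDL, which makes the requested "indexA indexB Value" output explicit. The following Python sketch is an illustration under assumptions: cov is not defined anywhere in the question, so a sample covariance (n - 1 denominator) is assumed here, and the names dat, sample_cov and results are hypothetical. Note that the number of distinct row pairs is n(n-1)/2, not factorial(n).

```python
from itertools import combinations

dat = [
    [1, 0, 3, 0],
    [0, 1, 0, 1],
    [1, 3, 1, 3],
    [0, 1, 0, 1],
]

def sample_cov(xs, ys):
    """Assumed stand-in for cov: sample covariance, None if under 2 points."""
    n = len(xs)
    if n < 2:
        return None
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

results = []
# combinations yields each pair (i, j) with i < j exactly once.
for i, j in combinations(range(len(dat)), 2):
    # Keep only positions where both rows are below 3 (the 'where' step).
    kept = [(a, b) for a, b in zip(dat[i], dat[j]) if a < 3 and b < 3]
    xs = [a for a, _ in kept]
    ys = [b for _, b in kept]
    results.append((i, j, sample_cov(xs, ys)))

print("indexA indexB value")
for i, j, value in results:
    print(i, j, value)
```

For four rows this produces six index pairs, one per line, in the same order as the nested Perl loops.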

Related

assign new matrix values based on row and column index vectors

New to MATLAB here (R2015a, Mac OS 10.10.5), and hoping to find a solution to this indexing problem.
I want to change the values of a large 2D matrix, based on one vector of row indices and one of column indices. For a very simple example, if I have a 3 x 2 matrix of zeros:
A = zeros(3, 2)
0 0
0 0
0 0
I want to change A(1, 1) = 1, and A(2, 2) = 1, and A(3, 1) = 1, such that A is now
1 0
0 1
1 0
And I want to do this using vectors to indicate the row and column indices:
rows = [1 2 3];
cols = [1 2 1];
Is there a way to do this without looping? Remember, this is a toy example that needs to work on a very large 2D matrix. For extra credit, can I also include a vector that indicates which value to insert, instead of fixing it at 1?
My looping approach is easy, but slow:
for i = 1:length(rows)
A(rows(i), cols(i)) = 1;
end
sub2ind can help here,
A = zeros(3,2)
rows = [1 2 3];
cols = [1 2 1];
A(sub2ind(size(A),rows,cols))=1
A =
1 0
0 1
1 0
with a vector to 'insert'
b = [1,2,3];
A(sub2ind(size(A),rows,cols))=b
A =
1 0
0 2
3 0
I found this answer online when checking on the speed of sub2ind.
idx = rows + (cols - 1) * size(A, 1);
therefore
A(idx) = 1 % or b
Five tests on a big matrix (~5 second operations) show it's 20% faster than sub2ind.
There is code for an n-dimensional problem here too.
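To see why the formula rows + (cols - 1) * size(A, 1) works, here is a small Python sketch of column-major storage. It is an illustration only: the helper linear_index and the flat list are assumptions made for the example, and the 1-based indexes are kept to match MATLAB.

```python
def linear_index(row, col, n_rows):
    """1-based column-major linear index: row + (col - 1) * n_rows."""
    return row + (col - 1) * n_rows

n_rows, n_cols = 3, 2
rows = [1, 2, 3]
cols = [1, 2, 1]
values = [1, 2, 3]

# Column-major storage: all of column 1 first, then column 2.
flat = [0] * (n_rows * n_cols)
for r, c, v in zip(rows, cols, values):
    flat[linear_index(r, c, n_rows) - 1] = v  # -1 because Python lists are 0-based

# Rebuild the matrix row by row for display.
A = [[flat[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]
print(A)
```

Each column contributes n_rows consecutive slots, so moving one column to the right advances the linear index by n_rows.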
What you have is basically a sparse definition of a matrix. Thus, an alternative to sub2ind is sparse. It will create a sparse matrix; use full to convert it to a full matrix.
A=full(sparse(rows,cols,1,3,2))

Adding additional ones that surround other values of one in a vector in MATLAB

Given a vector of zeros and ones in MATLAB, where the ones represent an event in time, I would like to add additional ones before and after the existing ones in order to capture additional variation.
Example: I would like to turn [0;0;1;0;0] into [0;1*;1;1*;0] where 1* are newly added ones.
Assuming A to be the input column vector -
%// Find all neighbouring indices with a window of [-1 1]
%// around the positions/indices of the existing ones
neigh_idx = bsxfun(@plus,find(A),[-1 1])
%// Select the valid indices and set them in A to be ones as well
A(neigh_idx(neigh_idx>=1 & neigh_idx<=numel(A))) = 1
Or use imdilate from Image Processing Toolbox with a vector kernel of ones of length 3 -
A = imdilate(A,[1;1;1])
You can do it by convolving with [1 1 1] and setting to 1 all values greater than 0. This works for column or row vectors.
x = [0;0;1;0;0];
y = double(conv(x, [1 1 1],'same')>0)
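The convolution idea can be sketched in a few lines of Python. This is an illustration of the technique rather than a MATLAB translation: the function name dilate_ones is hypothetical, and the zero padding stands in for what the 'same' option does at the boundaries for a length-3 kernel.

```python
def dilate_ones(x):
    """Set each element to 1 if it or either neighbour is nonzero."""
    n = len(x)
    padded = [0] + list(x) + [0]  # zero padding handles the two ends
    # Sliding sum over a window of three: x[i-1] + x[i] + x[i+1].
    sums = [padded[i] + padded[i + 1] + padded[i + 2] for i in range(n)]
    return [1 if s > 0 else 0 for s in sums]

print(dilate_ones([0, 0, 1, 0, 0]))
```

The same function covers the boundary cases, since a one in the first or last position only has a single in-range neighbour to fill.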
Purely by logical indexing:
>> A = [0 1 1 0 0];
>> A([A(2:end) 0] == 1 | [0 A(1:end-1)] == 1) = 1;
>> disp(A);
1 1 1 1 0
This probably merits an explanation. The fact that it's a 3 element local neighbourhood makes this easy. Essentially, take two portions of the input array:
Portion #1: A starting from the second element to the last element
Portion #2: A starting from the first element to the second-last element
We place the first portion into a new array and add 0 at the end of this array, and check to see which locations are equal to 1 in this new array. This essentially shifts the array A over to the left by 1. Whichever locations in this first portion are equal to 1, we set the corresponding locations in A to be 1. The same thing for the second portion where we are effectively shifting the array A over to the right by 1. To shift to the right by 1, we prepend a 0 at the beginning, then extract out the second portion of the array. Whichever locations in this second portion are equal to 1 are also set to 1.
At the end of this operation, you would essentially shift A to the left by 1 and save this as a separate array. Also, you would shift to the right by 1 and save this as another array. With these two, you simply overlap on top of the original to obtain the final result.
The benefit of this method over its predecessors in this post is that this doesn't require computations of any kind (bsxfun, conv, imdilate etc.) and purely relies on indexing into arrays and using logical operators (see footnote 1). This also handles boundary conditions and can work on either row or column vectors.
Some more examples with boundary cases
>> A = [0 0 1 1 0];
>> A([A(2:end) 0] == 1 | [0 A(1:end-1)] == 1) = 1
A =
0 1 1 1 1
>> A = [0 0 0 0 1];
>> A([A(2:end) 0] == 1 | [0 A(1:end-1)] == 1) = 1
A =
0 0 0 1 1
>> A = [1 0 1 0 1];
>> A([A(2:end) 0] == 1 | [0 A(1:end-1)] == 1) = 1
A =
1 1 1 1 1
1: This post is dedicated to Troy Haskin, one who believes that almost any question (including this one) can be answered by logical indexing.

How to generate random numbers from fixed set

I've got fixed numbers: -3, -1, 1, 3. How do I randomly generate a matrix like the following?
1 -1 -3 -1
3 -3 -3 3
3 3 1 -1
3 -3 3 -1
Use randi to create random index values into your vector of possible values:
x = [-3 -1 1 3]
y = randi(length(x),[5 5]);
y = x(y);
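The same index-then-lookup idea, sketched in Python for comparison. This is an illustration only; the names values and matrix and the fixed seed are assumptions for the example, and the 4 x 4 size follows the matrix shown in the question.

```python
import random

values = [-3, -1, 1, 3]
n_rows, n_cols = 4, 4

rng = random.Random(0)  # seeded only so the sketch is repeatable

# Draw a random index into values for every cell, then look the value up.
matrix = [
    [values[rng.randrange(len(values))] for _ in range(n_cols)]
    for _ in range(n_rows)
]

for row in matrix:
    print(row)
```

Because each cell is drawn independently, values can repeat, which is what the example matrix in the question requires.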
Although @nkjt's answer is probably the way to go, if you have the Statistics Toolbox you can simplify a little using randsample (with replacement):
result = NaN(3,6); %// define required size
result(:) = randsample([-3 -1 1 3], numel(result), true);
Or, if the original numbers are equally spaced as in your example, you can solve it in one line:
result = 2*randi(4,[3 6])-5; %// "2" and "5" as per your original values
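You can use randperm (random permutation), for example to shuffle an existing set of values. Note that randperm never repeats an index, so on its own it does not sample with replacement.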

Two FOR statements coupled into one

Is it possible to put two for statements into one statement? Something like
A = [ 0 0 0 5
0 2 0 0
1 3 0 0
0 0 4 0];
a=size(A);
b=size(A);
ind=0;
c=0;
for ({i=1:a},{j=1:b})
end
Your question is very broad, but one thing to consider is that in MATLAB you can often take advantage of linear indexing (instead of subscripting), without actually having to reshape the array. For example,
>> A = [ 0 0 0 5
0 2 0 0
1 3 0 0
0 0 4 0];
>> A(3,2)
ans =
3
>> A(7) % A(3+(2-1)*size(A,1))
ans =
3
You can often use this to your advantage in a for loop over all the elements:
for ii=1:numel(A),
A(ii) = A(ii) + 1; % or something more useful
end
Is the same as:
for ii=1:size(A,2),
for jj=1:size(A,1),
A(jj,ii) = A(jj,ii) + 1;
end
end
But to address your specific goal, as you stated in the comments ("I am storing the non zero elements in another matrix; with elements like the index number, value, row number and column number."), which amounts to building a sparse matrix representation, it comes to this:
>> [i,j,s] = find(A);
>> [m,n] = size(A);
>> S = sparse(i,j,s,m,n)
S =
(3,1) 1
(2,2) 2
(3,2) 3
(4,3) 4
(1,4) 5
But that's not really relevant to the broader question.
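For reference, the (row, column, value) triplets that find returns can be reproduced with a short Python sketch. It is an illustration only: the name triplets is made up, and the 1-based indexes and column-major scan order are chosen to match the MATLAB output above.

```python
A = [
    [0, 0, 0, 5],
    [0, 2, 0, 0],
    [1, 3, 0, 0],
    [0, 0, 4, 0],
]
n_rows, n_cols = len(A), len(A[0])

# Scan column by column (column-major), reporting 1-based (row, col, value).
triplets = [
    (r + 1, c + 1, A[r][c])
    for c in range(n_cols)
    for r in range(n_rows)
    if A[r][c] != 0
]

for row, col, value in triplets:
    print(row, col, value)
```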
Actually, you can combine multiple loops into one for. MATLAB's for iterates over the columns of its argument, so you loop over a matrix whose columns hold every index combination rather than over the individual index ranges.
Here is a way to do it:
iRange = 1:2;
jRange = 1:3;
[iL jL] = ndgrid(iRange,jRange);
ijRange = [iL(:) jL(:)]';
for ij = ijRange
i = ij(1); j = ij(2);
end
Note that looping over the variables may be simpler, but perhaps this method has some advantages as well.
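The idea of flattening two loops into one by precomputing the index pairs looks like this as a Python sketch. It is an illustration only; the names i_range, j_range and pairs are assumptions, and the ordering (first index varying fastest) is chosen to match what ndgrid followed by (:) produces.

```python
i_range = range(1, 3)  # corresponds to 1:2
j_range = range(1, 4)  # corresponds to 1:3

# Every (i, j) combination, with i varying fastest as in ndgrid + (:).
pairs = [(i, j) for j in j_range for i in i_range]

# A single loop now visits all combinations.
for i, j in pairs:
    print(i, j)
```

The trade-off is the same as in the MATLAB version: one flat loop at the cost of materialising all index pairs up front.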
No
read this http://www.mathworks.com/help/matlab/matlab_prog/loop-control-statements.html
I also don't see any added value even if it were possible.
No I don't think you can put two for loops in one line.
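Depending on your operation, you may be able to reshape the array and use a single for loop. If you are doing something as simple as printing out all elements (note that a and b in the question are both size(A), so numel(A) is used for the element count here):
B = reshape(A,numel(A),1);
for i=1:numel(A)
c = B(i);
...
end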

MATLAB - Efficient methods for populating matrices using information in other (sparse) matrices?

Apologies for the awkward title, here is a more specific description of the problem. I have a large (e.g. 10^6 x 10^6) sparse symmetric matrix which defines bonds between nodes.
e.g. The matrix A = [0 1 0 0 0; 1 0 0 2 3; 0 0 0 4 0; 0 2 4 0 5; 0 3 0 5 0] would describe a 5-node system, such that nodes 1 and 2 are connected by bond number A(1,2) = 1, nodes 3 and 4 are connected by bond number A(3,4) = 4, etc.
I want to form two new matrices. The first, B, would list the nodes connected to each node (i.e. each row i of B has elements given by find(A(i,:)), and padded with zeros at the end if necessary) and the second, C, would list the bonds connected to that node (i.e. each row i of C has elements given by nonzeros(A(i,:)), again padded if necessary).
e.g. for the matrix A above, I would want to form B = [2 0 0; 1 4 5; 4 0 0; 2 3 5; 2 4 0] and C = [1 0 0; 1 2 3; 4 0 0; 2 4 5; 3 5 0]
The current code is:
B=zeros(length(A), max(sum(spones(A))))
C=zeros(length(A), max(sum(spones(A))))
for i=1:length(A)
B(i,1:length(find(A(i,:)))) = find(A(i,:));
C(i,1:length(nonzeros(A(i,:)))) = nonzeros(A(i,:));
end
which works, but is slow for large length(A). I have tried other formulations, but they all include for loops and don't give much improvement.
How do I do this without looping through the rows?
Hmm. Not sure how to vectorize (find returns linear indices when given a matrix, which is not what you want), but have you tried this:
B=zeros(length(A), 0);
C=zeros(length(A), 0);
for i=1:length(A)
Bi = find(A(i,:));
B(i,1:length(Bi)) = Bi;
Ci = nonzeros(A(i,:));
C(i,1:length(Ci)) = Ci;
end
I made two changes:
removed call to spones (seems unnecessary; the performance hit needed to expand the # of columns in B and C is probably minimal)
cached result of find() and nonzeros() so they're not called twice
I know it's hard to read, but this code is a vectorized version of your code:
[ i j k ] = find(A);
A2=(A~=0);
j2=nonzeros(cumsum(A2,2).*A2);
C2=accumarray([i,j2],k)
k2=nonzeros(bsxfun(@times,1:size(A,2),A2));
B2=accumarray([i,j2],k2);
Try it and tell me if it works for you.
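The bookkeeping behind the vectorized version is easier to follow as a plain Python sketch: each nonzero is placed in its row at a position equal to its rank among that row's nonzeros, which is what cumsum(A2,2).*A2 supplies to accumarray. This is an illustration using explicit loops, not a vectorized or sparse implementation; the names counts and width are assumptions.

```python
A = [
    [0, 1, 0, 0, 0],
    [1, 0, 0, 2, 3],
    [0, 0, 0, 4, 0],
    [0, 2, 4, 0, 5],
    [0, 3, 0, 5, 0],
]
n_rows, n_cols = len(A), len(A[0])

# Padded width: the largest number of nonzeros in any row.
width = max(sum(1 for v in row if v != 0) for row in A)
B = [[0] * width for _ in range(n_rows)]  # connected node numbers
C = [[0] * width for _ in range(n_rows)]  # bond numbers
counts = [0] * n_rows                     # nonzeros placed so far per row

for c in range(n_cols):
    for r in range(n_rows):
        if A[r][c] != 0:
            B[r][counts[r]] = c + 1  # 1-based column index, as find gives
            C[r][counts[r]] = A[r][c]
            counts[r] += 1

print(B)
print(C)
```

On the five-node example this reproduces the B and C given in the question.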