How can I vectorize the entropy calculation?

I was trying to compute the entropy for every column of a matrix that looks like this:
0.5 0.3333 0.2
0 0.3333 0.4
0.5 0.3333 0.4
Every column adds up to one. However, there are some zeros in the matrix, so if I just take log2(arr(i,:)) there will be a -Inf in the result and the whole thing won't work.
In practice I have a huge matrix and I want the program to run fast. Is there a workaround?
Here's my solution. Does it run as fast as p .* log2(p)?
log2p = log2(p);
log2p(log2p==-Inf)=0;
entropy = entropy - p .* log2p;

In MATLAB, 0^0 is equal to 1, and log2(1)==0, so you can use this and rewrite the entropy term as
p.*log2(p) = log2(p.^p)
Then for your example we get
>> log2(p.^p)
ans =
-0.5000 -0.5283 -0.4644
0 -0.5283 -0.5288
-0.5000 -0.5283 -0.5288
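To turn the element-wise terms above into one entropy value per column, sum down the rows and flip the sign. A minimal sketch, assuming p holds the example matrix from the question (the variable name H is my own choice, not from the original post):

```matlab
% Example matrix from the question; each column sums to one
p = [0.5 1/3 0.2;
     0   1/3 0.4;
     0.5 1/3 0.4];

% p.^p equals 1 wherever p is 0, so log2 contributes 0 there instead of NaN
H = -sum(log2(p.^p), 1);   % 1-by-3 row vector of column entropies, in bits
disp(H)
```

The first column is a fair two-way split, so its entropy should come out as 1 bit.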

Use isinf:
log2p = log2(p);
log2p( isinf(log2p) ) = 0;
entropy = -sum( p.*log2p , 1 )
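A variation on the same idea is to never evaluate log2 at zero in the first place, by indexing with a logical mask. This is only a sketch under the assumption that p contains no negative entries; the names nz, terms and H are hypothetical:

```matlab
% Example matrix from the question
p = [0.5 1/3 0.2;
     0   1/3 0.4;
     0.5 1/3 0.4];

nz = p > 0;                          % logical mask of the nonzero probabilities
terms = zeros(size(p));              % zero entries contribute nothing
terms(nz) = p(nz) .* log2(p(nz));    % log2 is only evaluated where p > 0
H = -sum(terms, 1);                  % entropy of each column, in bits
disp(H)
```

Whether this is faster than patching the -Inf values afterwards probably depends on how sparse p is, so it is worth timing on your own data.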

You could use isnan in combination with the fact that 0*(-Inf) is NaN:
E = p.*log2(p);
valid = ~isnan(E);
entropy(valid) = entropy(valid) - E(valid);
clear E valid
As found here, this should work without warnings if you have MATLAB newer than R2007a.

Use eps:
eps is the spacing between 1 and the next larger double (2^-52, about 2.2e-16), not the smallest representable number, but it is small enough that adding it to p barely changes your results (an almost infinitesimal change).
log2(p)
ans =
-1.0000 -1.5851 -2.3219
-Inf -1.5851 -1.3219
-1.0000 -1.5851 -1.3219
log2(p+eps)
ans =
-1.0000 -1.5851 -2.3219
-52.0000 -1.5851 -1.3219
-1.0000 -1.5851 -1.3219
p2=p+eps;
entropy=-sum(p2.*log2(p2),1)
entropy =
1.0000 1.5849 1.5219
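To get a feel for how small the change is, the eps-shifted result can be compared against a version where the zero terms are dropped explicitly. A rough sketch, assuming the example matrix from the question (H_eps, H_ref and err are illustrative names):

```matlab
% Example matrix from the question
p = [0.5 1/3 0.2;
     0   1/3 0.4;
     0.5 1/3 0.4];

% Entropy with the eps shift, as in the answer above
p2 = p + eps;
H_eps = -sum(p2 .* log2(p2), 1);

% Reference: compute the terms directly and zero out the NaN from 0*log2(0)
terms = p .* log2(p);
terms(p == 0) = 0;
H_ref = -sum(terms, 1);

err = max(abs(H_eps - H_ref));   % largest deviation caused by the shift
disp(err)
```

For this matrix the deviation should be many orders of magnitude below the four decimals shown above, although the shifted columns no longer sum exactly to one.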

Related

(-8)^(-2/3) returns wrong result

I'm using Matlab R2020b (Mac OS 12.0.1).
When I enter (-8)^(-2/3), it returns:
ans =
-0.1250 - 0.2165i
Shouldn't it be 0.2500, instead?

Raising negative numbers to fractional powers is a complex multi-valued operation. MATLAB is simply picking one of the solutions for you. For example, to find the solutions to (-8)^(1/3), start with the roots of the following polynomial equation:
x^3+8=0
Using MATLAB for this:
>> roots([1 0 0 8])
ans =
-2.0000 + 0.0000i
1.0000 + 1.7321i
1.0000 - 1.7321i
Then raising this result to the -2 power yields:
>> ans.^-2
ans =
0.2500 + 0.0000i
-0.1250 - 0.2165i
-0.1250 + 0.2165i
MATLAB happened to give you the second solution above for the (-8)^(-2/3) calculation.
BOTTOM LINE: Whenever you are dealing with complex multi-valued operations, if you want specific results you will need to account for that in your code, because MATLAB might pick something else.

Depends on the order of calculation.
(-8)^(-2/3) means divide -2 by 3 and then raise -8 to that power.
But if you do ((-8)^(-2))^(1/3) or nthroot((-8)^(-2),3) instead, you'll get 0.25.
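If the real-valued result is what you are after, one option is to take the real cube root first and apply the integer power afterwards. A small sketch (the variable names are mine):

```matlab
x = -8;

% The ^ operator returns the principal complex value
principal = x^(-2/3);          % approximately -0.1250 - 0.2165i

% nthroot returns the real root of a negative number for odd n,
% so the integer power can be applied to a real value
realval = nthroot(x, 3)^(-2);  % (-2)^(-2)

disp(principal)
disp(realval)
```

This makes the choice of branch explicit in the code instead of relying on which solution MATLAB picks.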

Using Matlab's backslash operator to invert sparse matrices is leading to some entries being rounded down to zero

I am using Matlab's backslash operator to solve a system of equations written as two matrices M1 and M2. These two matrices are square and tridiagonal, and so I have defined them as sparse. For example, with the dimensions of each being 5x5, they are defined as follows, with the values in each entry being dependent on some constant a:
N = 5;
a = 1e10;
M1 = spdiags([-a*ones(N,1)... % Sub diagonal
(1 + 2*a)*ones(N,1)... % Main Diagonal
-a*ones(N,1)],... % Super diagonal
-1:1,N,N);
M2 = spdiags([+a*ones(N,1)...
(1 - 2*a)*ones(N,1)...
+a*ones(N,1)],...
-1:1,N,N);
M_out = M1\M2;
So for example, M1 looks like the following in full form:
>> full(M1)
ans =
1.0e+10 *
2.0000 -1.0000 0 0 0
-1.0000 2.0000 -1.0000 0 0
0 -1.0000 2.0000 -1.0000 0
0 0 -1.0000 2.0000 -1.0000
0 0 0 -1.0000 2.0000
Now, if I examine the number of non-zero entries in the result M_out, then I can see they are all non-zero, which is fine:
>> nnz(M_out)
ans =
25
The problem is that I also need to do this for larger values of the constant a. However, if, for example, a=1e16 instead, then the off-diagonal entries of M_out are automatically set to zero, presumably because they have become too small:
>> nnz(M_out)
ans =
5
Is there a better way in Matlab of going about this problem of inverting sparse matrices? Or am I using the backslash operator in the wrong way?

If the size of your matrices doesn't grow too much, I recommend doing a full symbolic computation:
N = 5;
syms a
M1 = diag(-a*ones(N-1,1),-1) + diag((1 + 2*a)*ones(N,1),0) + diag(-a*ones(N-1,1),+1);
M2 = diag(+a*ones(N-1,1),-1) + diag((1 - 2*a)*ones(N,1),0) + diag(+a*ones(N-1,1),+1);
M_out = M1\M2;
M_num_1e10 = subs(M_out,a,1e10);
M_num_1e16 = subs(M_out,a,1e16);
vpa(M_num_1e10)
vpa(M_num_1e16)
In that case, you will need the Symbolic Math Toolbox. If you don't have it, I think you should consider migrating to Python and working with SymPy.
EDIT:
Considering the way you defined your problem, you need extended precision for your computations. The double precision isn't enough. For example, in double precision (1e16+1) has to be rounded to (1e16), in other words (1e16+1)-(1e16) is equal to zero. So your problem starts in the main diagonal of your matrices. MATLAB only provides extended precision through its symbolic toolbox.
If you want to stick with double precision, you can extend it yourself by relying on so-called double-double arithmetic. I say you will have to do it yourself because I don't think there is an open-source double-double library for MATLAB.
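The loss of the "+1" on the main diagonal can be checked directly in double precision. A minimal sketch using the values of a from the question (the other variable names are hypothetical):

```matlab
a = 1e16;

lost = (a + 1) - a;            % the added 1 is rounded away
gap  = eps(a);                 % spacing between adjacent doubles near 1e16
same = (1 + 2*a) == 2*a;       % main diagonal of M1 collapses to 2*a

a_small = 1e10;
kept = (1 + 2*a_small) - 2*a_small;   % the 1 survives at this magnitude

fprintf('lost = %g, gap = %g, same = %d, kept = %g\n', lost, gap, same, kept);
```

Once the 1 is lost, the entries of M2 are exactly the negatives of those of M1, so M1\M2 is the negated identity, which is consistent with the 5 nonzeros reported in the question.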

Write function for bsxfun with if/else

In my code, I need to divide each value of a matrix by the corresponding value of another. I could use A./B, but some elements in B are 0. I know that if B(i,j) = 0 then A(i,j) = 0 too, and I want 0/0 to give 0. So I wrote a function div and called it through bsxfun, but instead of 0 I get NaN:
A = [1,0;1,1];
B = [1,0;1,2];
function n = div(a,b)
if(b==0)
n = 0;
else
n = a./b;
end
end
C = bsxfun(@div,A,B);

Why not just replace the unwanted values after?
C=A./B;
C(A==0 & B==0)=0;
You could do C(isnan(C))=0;, but this will replace all NaN, even the ones not created by 0/0. If zeros always happen together then just C(B==0)=0; will do
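Another way to get the same result is to avoid the 0/0 division altogether by dividing only where B is nonzero. A sketch with the matrices from the question (nz is a name I made up):

```matlab
A = [1,0;1,1];
B = [1,0;1,2];

C = zeros(size(A));        % entries with B == 0 stay at 0
nz = B ~= 0;               % logical mask of valid denominators
C(nz) = A(nz) ./ B(nz);    % divide only where it is well defined
disp(C)
```

Since no NaN is ever created, nothing has to be cleaned up afterwards; note that this also maps x/0 to 0 for nonzero x, which is only acceptable if the zeros always coincide.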

If you know your non-zero values in B are never smaller than a very small number eps (for example 1e-300), a simple trick is to add eps to B. All non-zero values are effectively unchanged, while all zero values become eps. When dividing 0/eps you get the desired result.

This happens because bsxfun doesn't process the arrays element-wise. Consequently, your function doesn't receive two scalars; it is actually called only once, with the full arrays. Your if statement does not work for non-scalar values of b.
Replacing bsxfun with arrayfun will call your function with scalar inputs, and will yield the expected result:
>> C = arrayfun(@div,A,B)
C =
1.0000 0
1.0000 0.5000
Nonetheless, either of the other two answers will be more efficient:
>> C = A./B;
>> C(B==0) = 0 % Ander's answer
C =
1.0000 0
1.0000 0.5000
or
C = A./(B+eps) % user10259794's answer
C =
1.0000 0
1.0000 0.5000

Matlab's way of getting p-values for correlation

I have a vector A of size N and I want to calculate a correlation coefficient and p-value for the correlation of A with some other vector B.
I used corrcoef in Matlab, something like this:
[R, P] = corrcoef(A, B)
And from what I understand, doing a t-test for this correlation R(1,2) to get a p-value equal to P(1,2) would mean calculating a test statistic
t = sqrt(N-2)*R./sqrt(1-R.^2)
and getting the p-value by
P = 1 - tcdf(t, N-2).
However, if I proceed in this way, the p-value that I get is not the same as the p-value Matlab calculated. Could someone explain why, or what am I missing in the calculation?
Thanks!
EDIT: Even if I do a two-sided test (P = 2*(1-tcdf(abs(t), N-2))), there's still a lot of differences in mine and Matlab's result.

I think you may have computed the formula for your t-statistic incorrectly. Looking at a basic stats page, the formula for the t-statistic is t = r*sqrt((n-2)/(1-r^2)).
It looks like you're doing an element-wise operation when one is not necessary.
Here is a test in MATLAB to show this.
>> a=rand(14,1)
a =
0.6110
0.7788
0.4235
0.0908
0.2665
0.1537
0.2810
0.4401
0.5271
0.4574
0.8754
0.5181
0.9436
0.6377
>> b=rand(14,1)
b =
0.0358
0.1759
0.7218
0.4735
0.1527
0.3411
0.6074
0.1917
0.7384
0.2428
0.9174
0.2691
0.7655
0.1887
I first create two random vectors for a and b.
>> [R,p]=corrcoef(a,b)
R =
1.0000 0.2428
0.2428 1.0000
p =
1.0000 0.4030
0.4030 1.0000
R(1,2) is our rho in this case and my formula is computed exactly as above.
t=R(1,2)*sqrt((length(a)-2)/(1-R(1,2)^2))
t =
0.8670
>> p=2*(1-tcdf(t,length(a)-2))
p =
0.4030
You can see that the correlation coefficient does a 2 sided test.

I checked the relevant source code of MATLAB and Octave for the p-value. The Octave source code is clearer.
Changing
P = 2*(1-tcdf(abs(t), N-2))
to
s = tcdf(t,N-2);
P = 2 * min(s,1-s);
does the trick. Then you get same p results as corrcoef.
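If the Statistics Toolbox (and therefore tcdf) is not available, the two-sided p-value can also be written with the regularized incomplete beta function from base MATLAB. This is a sketch based on the standard identity p = I_x(nu/2, 1/2) with x = nu/(nu + t^2), not on what corrcoef does internally; the numbers are taken from the worked example in the first answer:

```matlab
R12 = 0.2428;   % correlation coefficient R(1,2) from the example above
N   = 14;       % number of samples in that example

nu = N - 2;                              % degrees of freedom
t  = R12 * sqrt(nu / (1 - R12^2));       % t-statistic of the correlation

% Two-sided p-value of Student's t via the regularized incomplete beta function
p = betainc(nu / (nu + t^2), nu/2, 0.5);

fprintf('t = %.4f, p = %.4f\n', t, p);
```

Because the expression depends on t only through t^2, it is two-sided by construction and needs no abs or min.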

What's the most efficient/elegant way to delete elements from a matrix in MATLAB?

I want to delete several specific values from a matrix (if they exist). It is highly probable that there are multiple copies of the values in the matrix.
For example, consider an N-by-2 matrix intersections. If the pairs of values [a b] and [c d] exist as rows in that matrix, I want to delete them.
Let's say I want to delete rows like [-2.0 0.5] and [7 7] in the following matrix:
intersections =
-4.0000 0.5000
-2.0000 0.5000
2.0000 3.0000
4.0000 0.5000
-2.0000 0.5000
So that after deletion I get:
intersections =
-4.0000 0.5000
2.0000 3.0000
4.0000 0.5000
What's the most efficient/elegant way to do this?

Try this one-liner (where A is your intersection matrix and B is the value to remove):
A = [-4.0 0.5;
-2.0 0.5;
2.0 3.0;
4.0 0.5;
-2.0 0.5];
B = [-2.0 0.5];
A = A(~all(A == repmat(B,size(A,1),1),2),:);
Then just repeat the last line for each new B you want to remove.
EDIT:
...and here's another option:
A = A((A(:,1) ~= B(1)) | (A(:,2) ~= B(2)),:);
WARNING: The answers here are best used for cases where small floating point errors are not expected (i.e. with integer values). As noted in this follow-up question, using the "==" and "~=" operators can cause unwanted results. In such cases, the above options should be modified to use relational operators instead of equality operators. For example, the second option I added would be changed to:
tolerance = 0.001; % Or whatever limit you want to set
A = A((abs(A(:,1)-B(1)) > tolerance) | (abs(A(:,2)-B(2)) > tolerance),:);
Just a quick heads-up! =)
SOME RUDIMENTARY TIMING:
In case anyone was really interested in efficiency, I just did some simple timing for three different ways to get the subindex for the matrix (the two options I've listed above and Fanfan's STRMATCH option):
>> % Timing for option #1 indexing:
>> tic; for i=1:10000, index = ~all(A == repmat(B,size(A,1),1),2); end; toc;
Elapsed time is 0.262648 seconds.
>> % Timing for option #2 indexing:
>> tic; for i=1:10000, index = (A(:,1) ~= B(1)) | (A(:,2) ~= B(2)); end; toc;
Elapsed time is 0.100858 seconds.
>> % Timing for STRMATCH indexing:
>> tic; for i=1:10000, index = strmatch(B,A); end; toc;
Elapsed time is 0.192306 seconds.
As you can see, the STRMATCH option is faster than my first suggestion, but my second suggestion is the fastest of all three. Note however that my options and Fanfan's do slightly different things: my options return logical indices of the rows to keep, and Fanfan's returns linear indices of the rows to remove. That's why the STRMATCH option uses the form:
A(index,:) = [];
while mine use the form:
A = A(index,:);
However, my indices can be negated to use the first form (indexing rows to remove):
A(all(A == repmat(B,size(A,1),1),2),:) = []; % For option #1
A((A(:,1) == B(1)) & (A(:,2) == B(2)),:) = []; % For option #2

The simple solution here is to look to set membership functions, i.e., setdiff, union, and ismember.
A = [-4 0.5;
-2 0.5;
2 3;
4 0.5;
-2 0.5];
B = [-2 .5;7 7];
See what ismember does with the two arrays. Use the 'rows' option.
ismember(A,B,'rows')
ans =
0
1
0
0
1
Since we wish to delete rows of A that are also in B, just do this:
A(ismember(A,B,'rows'),:) = []
A =
-4 0.5
2 3
4 0.5
Beware that set membership functions look for an EXACT match. Integers or multiples of 1/2 such as are in A satisfy that requirement. They are exactly represented in floating point arithmetic in MATLAB.
Had these numbers been real floating point numbers, I'd have been more careful. There I'd have used a tolerance on the difference. In that case, I might have computed the interpoint distance matrix between the two sets of numbers, removing a row of A only if it fell within some given distance of one of the rows of B.
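The tolerance-based variant described above could look roughly like this. It is a sketch that compares every row of A with every row of B using a per-column absolute tolerance instead of a true distance; the value of tol and the small perturbation in B are assumptions chosen for illustration:

```matlab
A = [-4 0.5;
     -2 0.5;
      2 3;
      4 0.5;
     -2 0.5];
B = [-2 + 1e-12, 0.5;    % slightly off, so an exact match would miss it
      7,         7];
tol = 1e-9;              % hypothetical tolerance, pick one that fits your data

% D(i,j,k) = |A(i,k) - B(j,k)| for every pair of rows (i,j) and column k
D = abs(bsxfun(@minus, permute(A, [1 3 2]), permute(B, [3 1 2])));

% Row i of A matches row j of B when all columns agree within tol
match = all(D <= tol, 3);      % size(A,1)-by-size(B,1) logical matrix

A(any(match, 2), :) = [];      % delete rows of A that match any row of B
disp(A)
```

The intermediate array D has size(A,1)*size(B,1)*size(A,2) elements, so for very large inputs a loop over the rows of B may be the more memory-friendly choice.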

You can also abuse the strmatch function to suit your needs: the following code removes all occurrences of a given row b in a matrix A
A(strmatch(b, A),:) = [];
If you need to delete more than one row, such as all rows from matrix B, iterate over them:
for b = B'
A(strmatch(b, A),:) = [];
end

Not sure when this function was introduced (using 2012b) but you can just do:
setdiff(A, B, 'rows')
ans =
-4.0000 0.5000
2.0000 3.0000
4.0000 0.5000
Based on:
A = [-4.0 0.5;
-2.0 0.5;
2.0 3.0;
4.0 0.5;
-2.0 0.5];
B = [-2.0 0.5];
