I have a matrix:
S = [ -1.0400 4.9100 4.1000 -3.5450 -0.6600 -0.9300 4.3950 -1.0650 2.9850 -4.9800 0.2100;
-0.5200 -4.3150 -3.0950 0.5700 4.4700 1.1500 3.1350 0.6450 0.3750 -4.9150 -2.1150;
5.0000 5.0000 5.0000 5.0000 5.0000 5.0000 5.0000 5.0000 5.0000 5.0000 5.0000 ];
I want to convert the columns to unit vectors, so I use a for loop:
for i=1:size(S,2)
S(:,i) = S(:,i) / norm( S(:,i) );
end
Is there a way to do this more efficiently in MATLAB?
TLDR
If you have MATLAB R2016b or newer, and no compatibility concerns, I would use
S = S ./ sqrt(sum(S.^2,1));
Edit: See the benchmark at the bottom for a performance comparison of the alternatives.
Context
We can just manually calculate the norm and divide column-wise.
By definition, for a vector x, norm(x) = sqrt( sum( x(:).^2 ) ). I've used (:) here to show that the sum runs over every element. What's useful for us is that sum works column-wise by default, so the column-wise norm can be computed like so:
nrm = sqrt( sum( x.^2 ) );
Note that if there's a possibility of your matrix S having only 1 row, you should explicitly enforce column-wise summation using nrm = sqrt(sum(x.^2,1)).
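A quick illustration of that edge case (a sketch):
S1 = [3 4];              % a single row: two columns
sqrt(sum(S1.^2))         % returns 5, the norm of the whole row -- not what we want
sqrt(sum(S1.^2, 1))      % returns [3 4], the per-column norms, as intended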
Now we have several options for division:
Implicit expansion (MATLAB R2016b or newer)
S = S ./ nrm;
Implicit expansion using bsxfun (all MATLAB versions)
S = bsxfun( @rdivide, S, nrm );
Manual expansion using repmat (all MATLAB versions)
S = S ./ repmat(nrm, size(S,1), 1);
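All three divisions give identical results; here's a quick sanity check (a sketch, reusing nrm from above; the first line needs R2016b+):
S1 = S ./ nrm;                         % implicit expansion
S2 = bsxfun(@rdivide, S, nrm);         % bsxfun
S3 = S ./ repmat(nrm, size(S,1), 1);   % repmat
isequal(S1, S2, S3)                    % returns logical 1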
If you have MATLAB R2017b or newer, and again no compatibility concerns, you can use vecnorm, which can be used in place of the manual norm calculation
S = S ./ vecnorm(S, 2, 1);
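Whichever variant you use, you can verify the result afterwards; every column norm should equal 1 up to floating-point error (a sketch):
max(abs(vecnorm(S, 2, 1) - 1))   % on the order of eps, i.e. ~1e-16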
Benchmark:
Since you asked about performance, here is a simple benchmark testing the speed of these methods: specifically, the original loop from your question versus implicit expansion with either vecnorm or the manual calculation.
Results (run using R2017b)
size(S):            1e3*1e2   1e5*1e3   1e3*1e6
Looping:             0.0005    1.0186   12.7788
Implicit manual:     0.0001    1.1236   10.4031
Implicit vecnorm:    0.0002    0.5774    6.8058
Conclusions
For relatively small arrays, all of the methods are very fast and I would opt for code clarity over performance.
If you only need to support MATLAB versions that have it, vecnorm is approximately twice as quick as the other methods for large matrices.
For matrices of the order 1e5*1e3, looping is comparable to implicit expansion.
Code
function benchie()
S = rand( 1e3, 1e2 )*5;
f1 = @() loopingNorm(S);
f2 = @() implicitManual(S);
f3 = @() implicitVecnorm(S);
fprintf( 'Looping: %.4f\nImplicit manual: %.4f\nImplicit vecnorm: %.4f\n', ...
timeit(f1), timeit(f2), timeit(f3) );
end
function S = loopingNorm(S)
for ii = 1:size(S,2)
S(:,ii) = S(:,ii) / norm( S(:,ii) );
end
end
function S = implicitManual(S)
S = S ./ sqrt(sum(S.^2,1));
end
function S = implicitVecnorm(S)
S = S ./ vecnorm( S, 2, 1 );
end
Related
I am using Matlab's backslash operator to solve a system of equations written as two matrices M1 and M2. These two matrices are square and tridiagonal, and so I have defined them as sparse. For example, with the dimensions of each being 5x5, they are defined as follows, with the values in each entry being dependent on some constant a:
N = 5;
a = 1e10;
M1 = spdiags([-a*ones(N,1)... % Sub diagonal
(1 + 2*a)*ones(N,1)... % Main Diagonal
-a*ones(N,1)],... % Super diagonal
-1:1,N,N);
M2 = spdiags([+a*ones(N,1)...
(1 - 2*a)*ones(N,1)...
+a*ones(N,1)],...
-1:1,N,N);
M_out = M1\M2;
So for example, M1 looks like the following in full form:
>> full(M1)
ans =
1.0e+10 *
2.0000 -1.0000 0 0 0
-1.0000 2.0000 -1.0000 0 0
0 -1.0000 2.0000 -1.0000 0
0 0 -1.0000 2.0000 -1.0000
0 0 0 -1.0000 2.0000
Now, if I examine the number of non-zero entries in the result M_out, then I can see they are all non-zero, which is fine:
>> nnz(M_out)
ans =
25
The problem is that I also need to do this for larger values of the constant a. However, if, for example, a=1e16 instead, then the off-diagonal entries of M_out are automatically set to zero, presumably because they have become too small:
>> nnz(M_out)
ans =
5
Is there a better way in Matlab of going about this problem of inverting sparse matrices? Or am I using the backslash operator in the wrong way?
If the size of your matrices doesn't grow too much, I recommend doing a full symbolic computation:
N = 5;
syms a
M1 = diag(-a*ones(N-1,1),-1) + diag((1 + 2*a)*ones(N,1),0) + diag(-a*ones(N-1,1),+1);
M2 = diag(+a*ones(N-1,1),-1) + diag((1 - 2*a)*ones(N,1),0) + diag(+a*ones(N-1,1),+1);
M_out = M1\M2;
M_num_1e10 = subs(M_out,a,1e10);
M_num_1e16 = subs(M_out,a,1e16);
vpa(M_num_1e10)
vpa(M_num_1e16)
In that case, you will need the Symbolic Math Toolbox. If you don't have it, I think you should consider migrating to Python and working with SymPy.
EDIT:
Considering the way you defined your problem, you need extended precision for your computations; double precision isn't enough. For example, in double precision 1e16+1 has to be rounded to 1e16, in other words (1e16+1)-(1e16) is equal to zero. So your problem starts in the main diagonal of your matrices. MATLAB only provides extended precision through its Symbolic Math Toolbox.
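You can see this rounding directly at the MATLAB prompt:
(1e16 + 1) - 1e16   % returns 0: 1e16+1 is rounded back to 1e16
(1e10 + 1) - 1e10   % returns 1: still exactly representable
eps(1e16)           % returns 2: the spacing between doubles near 1e16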
If you want to stick with double precision, you can extend it yourself by relying on so-called double-double arithmetic. I say that you will have to do it yourself because I don't think there is an open-source double-double library for MATLAB.
I am trying to take advantage of vectorization in MATLAB for this, but I might have to resort to for loops. I really don't want to do that! Time to learn algorithms.
Given this (11-by-3) array:
x = [...
4.9000 -0.1000 -5.1000
4.6000 -0.4000 -5.4000
3.0000 -2.0000 -7.0000
2.9000 -2.1000 -7.1000
2.9000 -2.1000 -7.1000
2.9000 -2.1000 -7.1000
2.8000 -2.2000 -7.2000
2.7000 -2.3000 -7.3000
2.7000 -2.3000 -7.3000
2.2000 -2.8000 -7.8000
1.8000 -3.2000 -8.2000
];
I want to find all of the 3^11 = 177147 possible sums of 11 elements in the array, where each of the 11 elements comes from a different row. I want to then store the sums that exceed a threshold value of 16.0, along with the 11 elements that make up each of those sums, in a (12-by-?) array.
Any ideas to get me started? Thanks for the help.
Here's how to do it in a vectorized way:
TR = 16;
sets = num2cell(single(x),2);
c = cell(1, numel(sets));
[c{:}] = ndgrid( sets{:} );
cartProd = cell2mat( cellfun(@(v)v(:), c, 'UniformOutput',false) );
validRows = cartProd(sum(cartProd,2) > TR,:); % output is [353x11]
Notice how I use single to save space and make the computation slightly faster.
The above solution is an adaptation of this answer.
Upon further contemplation, I think I've come up with a way that should be both faster and more memory efficient. We do this by indexing x, and then doing the previous process on the indices. Why is this better, you might ask? This is because we can store the indices as uint8, which consumes considerably less memory than double or even single. Also we get to keep the full double precision of x. Thus:
function validRows = q42933114(x,thresh)
%% Input handling
if nargin < 2
thresh = 16;
end
if nargin < 1
x = [...
4.9000 -0.1000 -5.1000
4.6000 -0.4000 -5.4000
3.0000 -2.0000 -7.0000
2.9000 -2.1000 -7.1000
2.9000 -2.1000 -7.1000
2.9000 -2.1000 -7.1000
2.8000 -2.2000 -7.2000
2.7000 -2.3000 -7.3000
2.7000 -2.3000 -7.3000
2.2000 -2.8000 -7.8000
1.8000 -3.2000 -8.2000
];
end
I = reshape(uint8(1:numel(x)),size(x));
sets = num2cell(I,2);
c = cell(1, numel(sets));
[c{:}] = ndgrid( sets{:} );
cartProd = cell2mat( cellfun(@(v)v(:), c, 'UniformOutput',false) );
validRows = x(cartProd(sum(x(cartProd),2) > thresh,:));
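A call could then look like this (a sketch using the built-in defaults):
validRows = q42933114();   % uses the example x and thresh = 16
size(validRows)            % [353 11], matching the first method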
Memory consumption comparison:
Method 1 (old):
>> whos
Name Size Bytes Class Attributes
c 1x11 7795700 cell
cartProd 177147x11 7794468 single
sets 11x1 1364 cell
validRows 353x11 15532 single
Method 2 (new):
>> whos
Name Size Bytes Class Attributes
c 1x11 1949849 cell
cartProd 177147x11 1948617 uint8
sets 11x1 1265 cell
validRows 353x11 31064 double
We see that the memory consumption is indeed smaller (by about a factor of 4), as expected.
Runtime comparison:
Method 1 -- 0.0110
Method 2 -- 0.0186
Here we see that the 2nd method is actually a bit slower. Profiling shows the cause is the indexing operation x(...), which is relatively expensive.
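For reference, a harness along these lines produces such timings (a sketch; implicitCartesian is a hypothetical wrapper around the first, single-based solution):
f1 = @() implicitCartesian(x, 16);   % hypothetical wrapper for method 1
f2 = @() q42933114(x, 16);           % the uint8 index-based method 2 above
fprintf('Method 1 -- %.4f\nMethod 2 -- %.4f\n', timeit(f1), timeit(f2));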
I did it this way. There is obviously room for improvement in the variable names.
Notice there are 353 matching rows, which agrees with the answer from @Dev-iL.
p = 11;
[a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11] = ...
ndgrid(x(1,:),x(2,:),x(3,:),x(4,:),x(5,:),x(6,:),x(7,:),x(8,:),x(9,:),x(10,:),x(11,:));
a = a1+a2+a3+a4+a5+a6+a7+a8+a9+a10+a11;
y = spalloc(p+1,3^p,(p+1)*3^p);
for i = 1:3^p
if a(i) >= 16.1
y(:,i) = [a1(i),a2(i),a3(i),a4(i),a5(i),a6(i),a7(i),a8(i),a9(i),a10(i),a11(i),a(i)];
end
end
nnz(y(p+1,:)); % 353 rows matching the criteria
I don't think you'll have better luck than using a for loop. There could be a MATLAB function for generating all 3^11 combinations, which you could use as a sort of index, but that would consume a lot of memory, and the code would be hard to read as well.
However, recent versions of MATLAB don't behave that badly with for-loops, because they JIT-compile the code; previously the code was simply interpreted, or JIT-ing was used only for specific purposes. You still wouldn't want to reimplement the matrix routines in MATLAB because of this, but for simple code like this one a loop should perform well.
Given two vectors containing numerical values, say for example
a=1.:0.1:2.;
b=a+0.1;
I would like to select only the differing values. For this, MATLAB provides the function setdiff. In the above example it is obvious that setdiff(a,b) should return 1.0 and setdiff(b,a) should give 2.1. However, due to limited computational precision (see the questions here or here), the results differ. I get
>> setdiff(a,b)
ans =
1.0000 1.2000 1.4000 1.7000 1.9000
MATLAB provides a function which returns a lower limit on this precision error, eps. This allows us to estimate a tolerance like tol = 100*eps;
My question now: is there an intelligent and efficient way to select only those values whose difference is below tol? Or, said differently: how do I write my own version of setdiff that includes a tolerance limit and returns both values and indices?
I don't like the way it is answered in this question, since MATLAB already provides part of the required functionality.
Introduction and custom function
In the general case of floating-point precision issues, the advice is to compare against a small tolerance rather than testing for exact zeros, and a reasonably robust method bases that tolerance on eps. Now, since setdiff essentially performs subtractions, we can use eps directly here: any absolute difference less than or equal to eps is treated as zero.
This forms the basis of a modified setdiff for floating point numbers shown here -
function [C,IA] = setdiff_fp(A,B)
%SETDIFF_FP Set difference for floating point numbers.
%   C = SETDIFF_FP(A,B) for vectors A and B, returns the values in A that
%   are not in B with no repetitions. C will be sorted.
%
%   [C,IA] = SETDIFF_FP(A,B) also returns an index vector IA such that
%   C = A(IA). If there are repeated values in A that are not in B, then
%   the index of the first occurrence of each repeated value is returned.

% Get a 2D matrix of the absolute difference between each element of A
% and each element of B
abs_diff_mat = abs(bsxfun(@minus,A,B.'));

% Compare each element against eps to "negate" the floating point
% precision issues. This gives a binary array of true comparisons.
abs_diff_mat_epscmp = abs_diff_mat<=eps;

% Find indices of A that are exclusive to it
A_ind = ~any(abs_diff_mat_epscmp,1);

% Get unique (to account for no repetitions and sorted output) exclusive
% A elements for the final output, along with their indices
[C,IA] = intersect(A,unique(A(A_ind)));
Example runs
Case 1 (with integers)
This will verify that setdiff_fp works with integer arrays just the way setdiff does.
A = [2 5];
B = [9 8 8 1 2 1 1 5];
[C_setdiff,IA_setdiff] = setdiff(B,A)
[C_setdiff_fp,IA_setdiff_fp] = setdiff_fp(B,A)
Output
A =
2 5
B =
9 8 8 1 2 1 1 5
C_setdiff =
1 8 9
IA_setdiff =
4
2
1
C_setdiff_fp =
1 8 9
IA_setdiff_fp =
4
2
1
Case 2 (with floating point numbers)
This is to show that setdiff_fp produces the correct results, while setdiff doesn't. Additionally, this will also test out the output indices.
A=1.:0.1:1.5
B=[A+0.1 5.5 5.5 2.6]
[C_setdiff,IA_setdiff] = setdiff(B,A)
[C_setdiff_fp,IA_setdiff_fp] = setdiff_fp(B,A)
Output
A =
1.0000 1.1000 1.2000 1.3000 1.4000 1.5000
B =
1.1000 1.2000 1.3000 1.4000 1.5000 1.6000 5.5000 5.5000 2.6000
C_setdiff =
1.2000 1.4000 1.6000 2.6000 5.5000
IA_setdiff =
2
4
6
9
7
C_setdiff_fp =
1.6000 2.6000 5.5000
IA_setdiff_fp =
6
9
7
For a tolerance of 1 eps, this should work:
a=1.0:0.1:2.0;
b=a+0.1;
b=[b b-eps b+eps];
c=setdiff(a,b)
The idea is to expand b so that it also includes the floating-point neighbours of its values.
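If you're on R2015a or newer, ismembertol compares with a tolerance natively, so a tolerant setdiff can be sketched without the expansion trick (tol as suggested in the question; note ismembertol scales the tolerance by the largest input magnitude by default):
tol = 100*eps;
c = a(~ismembertol(a, b, tol))   % elements of a with no counterpart in b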
I have a set of data and I want to use the Curve Fitting Toolbox in MATLAB to plot a spline graph for the data. I have done this:
x =
Columns 1 through 10
0 1.2500 1.8800 2.5000 5.0000 6.2500 6.8800 7.1900 7.5000 10.0000
Columns 11 through 13
12.5000 15.0000 20.0000
y =
Columns 1 through 10
-85.9300 -78.8200 -56.9500 -34.5600 -33.5700 -39.6400 -41.9600 -49.2800 -66.6000 -66.6100
Columns 11 through 13
-59.1600 -48.7800 -41.5300
cftool
[breaks,coefs,l,k,d] = unmkpp(pp)
breaks =
Columns 1 through 10
0 1.2500 1.8800 2.5000 5.0000 6.2500 6.8800 7.1900 7.5000 10.0000
Columns 11 through 13
12.5000 15.0000 20.0000
coefs =
-4.8535 30.6309 -25.0170 -85.9300
-4.8535 12.4304 28.8095 -78.8200
-11.9651 3.2573 38.6927 -56.9500
3.0330 -18.9977 28.9337 -34.5600
-0.2294 3.7501 -9.1852 -33.5700
-11.6351 2.8899 -0.8852 -39.6400
-68.6157 -19.1004 -11.0978 -41.9600
130.6350 -82.9130 -42.7220 -49.2800
-6.3971 38.5776 -56.4659 -66.6000
1.6010 -9.4008 16.4760 -66.6100
-0.2967 2.6064 -0.5099 -59.1600
-0.2967 0.3814 6.9597 -48.7800
l =
12
k =
4
d =
1
Correct me if I am wrong: is the command [breaks,coefs,l,k,d] = unmkpp(pp) able to help me get piecewise equations from the spline graph I obtained? If so, how do I understand the command and the significance of the values in coefs, k, and d? Basically, I want to obtain an equation (or equations) describing the spline graph I obtained through the Curve Fitting Toolbox. Any help would be greatly appreciated!
This tries to explain how you can pick apart and display splines generated in Matlab.
Generate mock data
xx = [1:10];
yy = cos(xx);
Fit the data with a cubic spline
pp = spline(xx,yy);
Interpolate with the piecewise polynomial, evaluating it over a finer grid in x
xxf = linspace(min(xx),max(xx),100);
yyf=ppval(pp,xxf);
Start by inspecting pp, which contains all of the information about the piecewise polynomial:
pp =
form: 'pp'
breaks: [1 2 3 4 5 6 7 8 9 10]
coefs: [9x4 double]
pieces: 9
order: 4
dim: 1
The function
[breaks,coefs,l,k,d] = unmkpp(pp)
merely unwraps the contents of structure pp, such that:
d = pp.dim;
l = pp.pieces;
breaks = pp.breaks;
coefs = pp.coefs;
k = pp.order;
Therefore it isn't necessary to call unmkpp if pp is a structure containing all of the info (as above), and you just want the coefficients and the breaks. Instead you can just type
breaks = pp.breaks;
coefs = pp.coefs;
and continue working with this information, as shown below.
Note that for a cubic spline, the order of the polynomials is 4, since the polynomials have the form
C(1)*X^(K-1) + C(2)*X^(K-2) + ... + C(K-1)*X + C(K)
with K = 4, and therefore each polynomial has 4 coefficients C. The highest order term X^3 is consistent with the spline being cubic.
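If the end goal is an explicit equation per piece, you can print one straight from its coefficient row (a sketch for piece 1, written in the local variable t = x - breaks(1)):
cf = pp.coefs(1,:);   % the four coefficients C(1)..C(4) of the first piece
fprintf('p1(t) = %g*t^3 + %g*t^2 + %g*t + %g\n', cf)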
To evaluate the piecewise polynomials:
(1) choose the piece over which you want to evaluate the polynomial, defined by breaks;
(2) pick the correct coefficients for that piece, stored in the corresponding row of coefs.
Because the pp-form stores each piece in local coordinates (x minus the left break of the piece), and the breaks here are unit-spaced, we can evaluate each polynomial over the local range 0-1 and then shift it to the actual x-interval. For this we use polyval, the standard function for evaluating a polynomial with known coefficients over a range of interest.
So we find the coefficients cf corresponding to the piece and evaluate the polynomial at points xev:
xev = linspace(0,1,100);
cf = pp.coefs(1,:);
yyp=polyval(cf,xev);
We keep some additional info for plotting:
br = pp.breaks(1:2); % find the breaks (beginning and end of stretch of interest)
xxp = linspace(br(1),br(2),100);
We can generalize this procedure. Thus for the nth piece (say #6):
n = 6;
cf = pp.coefs(n,:);
yyp2=polyval(cf,xev);
br = pp.breaks(n:n+1);
xxp2 = linspace(br(1),br(2),100);
Of course you can skip the above and just use ppval (a function dedicated to working with the spline family of functions), which will do the same for you, say for the 3rd piece:
br = pp.breaks(3:4); % limits of the piece
xxp3 = linspace(br(1),br(2),100);
yyp3=ppval(pp,xxp3);
Finally we plot all of the polynomials evaluated above
plot(xx,yy,'.')
hold on
plot(xxf,ppval(pp,xxf),'k:')
plot(xxp,yyp,'g-','linewidth',2)
plot(xxp2,yyp2,'r-','linewidth',2) % <-- generated with polyval
plot(xxp3,yyp3,'c-','linewidth',2) % <-- generated with ppval
axis tight
I want to delete several specific values from a matrix (if they exist). It is highly probable that there are multiple copies of the values in the matrix.
For example, consider an N-by-2 matrix intersections. If the pairs of values [a b] and [c d] exist as rows in that matrix, I want to delete them.
Let's say I want to delete rows like [-2.0 0.5] and [7 7] in the following matrix:
intersections =
-4.0000 0.5000
-2.0000 0.5000
2.0000 3.0000
4.0000 0.5000
-2.0000 0.5000
So that after deletion I get:
intersections =
-4.0000 0.5000
2.0000 3.0000
4.0000 0.5000
What's the most efficient/elegant way to do this?
Try this one-liner (where A is your intersection matrix and B is the value to remove):
A = [-4.0 0.5;
-2.0 0.5;
2.0 3.0;
4.0 0.5;
-2.0 0.5];
B = [-2.0 0.5];
A = A(~all(A == repmat(B,size(A,1),1),2),:);
Then just repeat the last line for each new B you want to remove.
EDIT:
...and here's another option:
A = A((A(:,1) ~= B(1)) | (A(:,2) ~= B(2)),:);
WARNING: The answers here are best used for cases where small floating point errors are not expected (i.e. with integer values). As noted in this follow-up question, using the "==" and "~=" operators can cause unwanted results. In such cases, the above options should be modified to use relational operators instead of equality operators. For example, the second option I added would be changed to:
tolerance = 0.001; % Or whatever limit you want to set
A = A((abs(A(:,1)-B(1)) > tolerance) | (abs(A(:,2)-B(2)) > tolerance),:);
Just a quick heads-up! =)
SOME RUDIMENTARY TIMING:
In case anyone was really interested in efficiency, I just did some simple timing for three different ways to get the subindex for the matrix (the two options I've listed above and Fanfan's STRMATCH option):
>> % Timing for option #1 indexing:
>> tic; for i=1:10000, index = ~all(A == repmat(B,size(A,1),1),2); end; toc;
Elapsed time is 0.262648 seconds.
>> % Timing for option #2 indexing:
>> tic; for i=1:10000, index = (A(:,1) ~= B(1)) | (A(:,2) ~= B(2)); end; toc;
Elapsed time is 0.100858 seconds.
>> % Timing for STRMATCH indexing:
>> tic; for i=1:10000, index = strmatch(B,A); end; toc;
Elapsed time is 0.192306 seconds.
As you can see, the STRMATCH option is faster than my first suggestion, but my second suggestion is the fastest of all three. Note however that my options and Fanfan's do slightly different things: my options return logical indices of the rows to keep, and Fanfan's returns linear indices of the rows to remove. That's why the STRMATCH option uses the form:
A(index,:) = [];
while mine use the form:
A = A(index,:);
However, my indices can be negated to use the first form (indexing rows to remove):
A(all(A == repmat(B,size(A,1),1),2),:) = []; % For option #1
A((A(:,1) == B(1)) & (A(:,2) == B(2)),:) = []; % For option #2
The simple solution here is to look to set membership functions, i.e., setdiff, union, and ismember.
A = [-4 0.5;
-2 0.5;
2 3;
4 0.5;
-2 0.5];
B = [-2 .5;7 7];
See what ismember does with the two arrays. Use the 'rows' option.
ismember(A,B,'rows')
ans =
0
1
0
0
1
Since we wish to delete rows of A that are also in B, just do this:
A(ismember(A,B,'rows'),:) = []
A =
-4 0.5
2 3
4 0.5
Beware that set membership functions look for an EXACT match. Integers or multiples of 1/2, such as those in A, satisfy that requirement: they are exactly representable in MATLAB's floating point arithmetic.
Had these numbers been arbitrary floating point numbers, I'd have been more careful and used a tolerance on the difference. In that case, I might have computed the interpoint distance matrix between the two sets of rows, removing a row of A only if it fell within some given distance of one of the rows of B.
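A sketch of that distance-based variant (assuming R2016b+ for implicit expansion, and a placeholder tolerance tol):
tol = 1e-8;   % assumed tolerance, pick to suit your data
% Pairwise Euclidean distances between the rows of A (N-by-2) and B (M-by-2)
D = sqrt(sum((permute(A,[1 3 2]) - permute(B,[3 1 2])).^2, 3));
A(any(D < tol, 2), :) = [];   % drop rows of A that are near any row of B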
You can also abuse the strmatch function to suit your needs: the following code removes all occurrences of a given row b in a matrix A
A(strmatch(b, A),:) = [];
If you need to delete more than one row, such as all rows from matrix B, iterate over them:
for b = B'
A(strmatch(b, A),:) = [];
end
Not sure when this function was introduced (I'm using 2012b), but you can just do:
setdiff(A, B, 'rows')
ans =
-4.0000 0.5000
2.0000 3.0000
4.0000 0.5000
Based on:
A = [-4.0 0.5;
-2.0 0.5;
2.0 3.0;
4.0 0.5;
-2.0 0.5];
B = [-2.0 0.5];
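Note that setdiff sorts its output rows; if you need to keep the original row order, the 'stable' flag (available since the R2012a set-function overhaul) should do it:
setdiff(A, B, 'rows', 'stable')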