3D matrix Indexing using 2D matrix - matlab

Could anyone shed some light on how this for loop can be replaced by a single command in MATLAB?
for i = 1 : size(w,3)
x=w(:,:,i);
w1(i,:)=x(B(i),:);
end
clear x
Here, w is a 3D (x by y by z) matrix and B (1 by z) is a vector that holds, for each layer of w, the row to extract from that layer. This for loop takes about 150 seconds to execute when w is 500000 layers deep. I tried using:
Q = w(B,:,:);
Q = reshape(Q(1,:),[500000,2])';
This creates a matrix Q of size 500000 x 2 x 500000, and MATLAB throws an out-of-memory error. Any help would be appreciated!

You are creating intermediate variables (such as x) and using a for loop. The core idea of the following approach is to first pre-populate the indices used and then use linear indexing to access all the elements at once. Then, we can reshape to get the desired result.
ind = [B(1)*ones(size(w,2),1) (1:size(w,2)).' 1*ones(size(w,2),1)];
ind = [ind; [B(2)*ones(size(w,2),1) (1:size(w,2)).' 2*ones(size(w,2),1)]];
ind = [ind; [B(3)*ones(size(w,2),1) (1:size(w,2)).' 3*ones(size(w,2),1)]];
lin_ind = sub2ind(size(w), ind(:,1), ind(:,2), ind(:,3));
w1 = reshape(w(lin_ind),size(w,2),size(w,3)).'
On my system, this matches the w1 computed by the loop in your question. The three ind lines above hard-code three layers only because I was experimenting with small matrices. For a general w you could build ind in a loop. However, those lines can also be written so that no loop is needed at all and the code still works for any size. I will leave that up to you.
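The answer leaves the loop-free version of the three ind lines as an exercise. Below is a minimal sketch of one way to write it. It assumes only that B holds one valid row index per layer; the small sizes of w are made up for testing. All subscripts are built with ndgrid and repmat. The block then shows an equivalent direct column-major computation with bsxfun that skips sub2ind.

```matlab
w = rand(4, 3, 6);                       % small stand-in for the real data
B = randi(size(w,1), 1, size(w,3));      % one row index per layer
[nr, nc, nz] = size(w);

% Subscripts for every element w(B(k), j, k), laid out as nc-by-nz arrays
[J, K] = ndgrid(1:nc, 1:nz);             % column and layer subscripts
I = repmat(B(:).', nc, 1);               % row B(k) repeated for every column of layer k
lin_ind = sub2ind(size(w), I, J, K);
w1 = w(lin_ind).';                       % nz-by-nc: row k is w(B(k),:,k)

% Equivalent without sub2ind: linear index = B(k) + (j-1)*nr + (k-1)*nr*nc
lin2 = bsxfun(@plus, B(:) + (0:nz-1).'*nr*nc, (0:nc-1)*nr);
w1b = w(lin2);                           % also nz-by-nc
```

Both index arrays have only z*y elements, so for the question's 500000 x 2 case they stay small. This avoids the huge intermediate array that w(B,:,:) creates.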

Related

How to make vectors be of same length in my Matlab script?

I am trying to plot some data. The script I wrote below has worked fine before, but now I have no idea why it's not working.
Here is the code:
x = [335,41,14,18,15,9,7,9,20607,5,5,143,3,5,72,134,2,28,172,3,72,173,280,186,20924,1,1,22,3,3,1,2,13,1,3,2,11,66,12983,176,123,192,64,258,182,123,299,58,198,7,113,342,72,8376,122,20,19,2,3,28,8,36,8,56,43,2,48,127,395,4664,186,46,236,219,258,69,203,189,169,72,100,78,109,46,112,3929,272,40,4,31,2,97,36,5,35,56,2,237,1672,256,224,28,163,341,151,263,157,397,94,380,173,75,87,272,1194,133,6,112,1,6,2,26,25,64,8,40,57,106,525,150,248,125,269,264,256,357,153,64,152,283,1,2,2,454,154,39,1,1,64,151,242,1,18,99,1,36,607,55,54,110,225,108,37,1,144,162,137,107,21,360,362,18,51,25,43,1,3,6,1,27,7,45,326,32,103,50,124,155,39,180,143,33,116,46,7,151,120,19,4,2,4,110,2,7,4,9,4,27,216,323,148,1,1,2,1,47,113,150,1,2,144,16,4827,1,1,1,14];
size = length(x);
disp(size);
z = 0;
for i = 1:size
z = z + 1;
y(i) = z;
end
scatter(x,y);
This code should ensure that y is the same length as x. The for loop runs from 1 to size, where size is the number of elements in x, so y should be filled in only as many times as x has elements. But I keep getting an error. I checked with disp, and it turns out that my x and y vectors have different lengths: x has 227 elements and y has 256. Can anyone help me out with this trivial issue?
This is most likely because y already existed with a different size before you ran the piece of code you showed us. Somewhere earlier in your script, y was created as a 256-element vector. This part of the code reuses that variable by assigning into its elements. Because x has 227 elements, the loop overwrites only the first 227 elements of y, and the remaining 29 elements from before are still there. That reuse of y is why the sizes of the two variables no longer match and your script fails. As such, explicitly recreate y before calling scatter.
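Here is a minimal reproduction of the stale-variable problem. It assumes y was left over from earlier in the script as a 256-element vector; both the leftover y and the random x are made up:

```matlab
y = zeros(1, 256);        % hypothetical leftover: y created earlier in the script
x = rand(1, 227);         % stand-in for the 227-element x
for i = 1:numel(x)
    y(i) = i;             % overwrites only y(1:227); y(228:256) survive
end
staleLength = numel(y);   % still 256, so x and y differ in length

y = 1:numel(x);           % recreating y explicitly fixes the mismatch
```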
Actually, that for loop is not needed at all. The purpose of the loop is to create an increasing array from 1 up to as many elements as you have in x.
Just do this instead:
y = 1:size;
I also do not like that you are creating a variable called size. It shadows the function size, which returns the number of elements along each dimension of the input array.
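A small illustration of the shadowing problem when the code runs as a script (the value 5 is arbitrary). Indexing the variable with an all-ones array silently succeeds instead of raising an error. That is exactly the kind of bug that is hard to spot:

```matlab
size = 5;                  % a variable that shadows the built-in size function
r = size(ones(2,3));       % indexes the variable: returns a 2-by-3 array of 5s
clear size                 % remove the variable; size is the built-in again
d = size(ones(2,3));       % calls the function: returns [2 3]
```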
With the recommendations I've stated above, replace your entire code with this:
x = [335,41,14,18,15,9,7,9,20607,5,5,143,3,5,72,134,2,28,172,3,72,173,280,186,20924,1,1,22,3,3,1,2,13,1,3,2,11,66,12983,176,123,192,64,258,182,123,299,58,198,7,113,342,72,8376,122,20,19,2,3,28,8,36,8,56,43,2,48,127,395,4664,186,46,236,219,258,69,203,189,169,72,100,78,109,46,112,3929,272,40,4,31,2,97,36,5,35,56,2,237,1672,256,224,28,163,341,151,263,157,397,94,380,173,75,87,272,1194,133,6,112,1,6,2,26,25,64,8,40,57,106,525,150,248,125,269,264,256,357,153,64,152,283,1,2,2,454,154,39,1,1,64,151,242,1,18,99,1,36,607,55,54,110,225,108,37,1,144,162,137,107,21,360,362,18,51,25,43,1,3,6,1,27,7,45,326,32,103,50,124,155,39,180,143,33,116,46,7,151,120,19,4,2,4,110,2,7,4,9,4,27,216,323,148,1,1,2,1,47,113,150,1,2,144,16,4827,1,1,1,14];
numX = numel(x);
y = 1 : numX;
scatter(x,y);
The vector y is now created explicitly instead of reusing a variable that was created earlier with a different size. It also uses the colon operator to build the sequence directly, so the for loop is not needed. numel returns the total number of elements in the input array. As a personal preference, I don't use length, because it returns the number of elements along the largest dimension. That works fine for vectors, but it has caused some hard-to-spot bugs in code I've written in the past.
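A quick check of how length and numel differ on a non-vector input (the 3x4 size is arbitrary):

```matlab
M = zeros(3, 4);
lenM = length(M);   % 4: number of elements along the largest dimension
numM = numel(M);    % 12: total number of elements
```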

Maintaining original order and dimensions of a 3D matrix while using sort

I'm working with a fairly large 3D matrix (32x87x378), and I want to be able to extract every Nth element of a matrix, while keeping them in the same order. Similar to a previous question I asked: Matlab: Extracting Nth element of a matrix, while maintaining the original order of matrix
The method I was given was quite practical (and simple) and works well in most instances. For example, with a random 1x20 matrix I wanted the 4th and 5th of every 5 values. That leaves a 1x8 matrix ab made of elements 4, 5, 9, 10, 14, 15, 19 and 20. It is done as follows:
r = rand(1,20);
n = 5;
ab = r(sort([4:n:numel(r) 5:n:numel(r)]))
My question is: how can this method be applied to the 3rd dimension of a 3D matrix r (or can it?), such as this:
r = rand(2,5,20);
It should be fairly simple, such as this:
n = 5;
ab = r(sort([4:n:numel(r) 5:n:numel(r)],3));
However, this will then give me a 1x80 matrix, as it does not preserve the original dimensions. Is there a way to correct this using the sort function? I'm also open to other suggestions, but I just want to be sure I am not missing anything.
Thanks in advance.
See if this is what you are after -
ab = r(:,:,sort([4:n:size(r,3) 5:n:size(r,3)]))
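Below is a quick check of that answer on the question's 2x5x20 example. It also shows a sort-free alternative that builds a logical mask of the pages to keep with mod. The mask is an addition for illustration, not part of the answer:

```matlab
r = rand(2, 5, 20);
n = 5;
pages = sort([4:n:size(r,3) 5:n:size(r,3)]);   % 4 5 9 10 14 15 19 20
ab = r(:,:,pages);                             % 2-by-5-by-8, page order preserved

% Alternative: keep pages whose position within each group of n is 4 or 5
k = 1:size(r,3);
keep = ismember(mod(k-1, n) + 1, [4 5]);       % logical mask over the 3rd dimension
ab2 = r(:,:,keep);
```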

Matlab fast neighborhood operation

I have a problem. I have a matrix A with integer values between 0 and 5, for example:
x=randi(5,10,10)
Now I want to apply a 3x3 sliding filter that gives me the most common value (the mode) in each neighborhood.
I have tried 2 solutions:
fun = @(z) mode(z(:));
y1 = nlfilter(x,[3 3],fun);
which takes very long...
and
y2 = colfilt(x,[3 3],'sliding',@mode);
which also takes long.
I have some really big matrices and both solutions take a long time.
Is there any faster way?
+1 to @Floris for the excellent suggestion to use hist. It's very fast. You can do a bit better, though: hist is based on histc, which can be used directly instead. histc is a compiled function, i.e., not written in MATLAB, which is why the solution is much faster.
Here's a small function that attempts to generalize what @Floris did (that solution also returns a vector rather than the desired matrix) and achieve what you're doing with nlfilter and colfilt. It doesn't require the input to have particular dimensions, and it uses im2col to rearrange the data efficiently. In fact, the first three lines and the call to im2col are virtually identical to what colfilt does in your case.
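function a=intmodefilt(a,nhood)
% Sliding mode filter for integer-valued arrays (borders are zero-padded)
[ma,na] = size(a);
aa(ma+nhood(1)-1,na+nhood(2)-1) = 0;                                 % zero-padded working array
aa(floor((nhood(1)-1)/2)+(1:ma),floor((nhood(2)-1)/2)+(1:na)) = a;   % copy a into its center
lo = min(a(:))-1;                                                    % lowest histogram edge
[~,a(:)] = max(histc(im2col(aa,nhood,'sliding'),lo:max(a(:))));     % bin with the largest count per neighborhood
a = a+lo-1;                                                          % map bin index back to a value (a-1 only when min(a(:)) is 1)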
Usage:
x = randi(5,10,10);
y3 = intmodefilt(x,[3 3]);
For large arrays, this is over 75 times faster than colfilt on my machine. Replacing hist with histc is responsible for a factor of two speedup. There is of course no input checking so the function assumes that a is all integers, etc.
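The core of intmodefilt is the histc/max step on the columns that im2col produces. The sketch below runs just that step on a hand-made 9-by-2 matrix, so im2col is not needed; the data are invented. It maps each bin index back to a value through the edges vector rather than by subtracting a constant:

```matlab
% Each column stands in for one 3x3 neighborhood, as im2col would lay it out
cols = [1 1 1 1 1 2 2 3 4;
        5 5 4 4 4 3 2 1 1].';     % 9-by-2
edges = 0:5;                      % one bin per integer value 0..5
counts = histc(cols, edges);      % 6-by-2: counts per column
[~, idx] = max(counts);           % index of the (first) largest bin in each column
modes = edges(idx);               % bin index -> value
```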
Lastly, note that randi(IMAX,N,N) returns values in the range 1:IMAX, not 0:IMAX as you seem to state.
One suggestion would be to reshape your array so that each 3x3 block becomes a column vector. If your array dimensions are divisible by 3, this is simple; if they are not, you need to work a little harder. You also need to repeat this nine times, starting at different offsets into the matrix. I will leave that as an exercise.
Here is some code that shows the basic idea (using only functions available in FreeMat - I don't have Matlab on my machine at home...):
N = 100;
A = randi([0 5], 3*N, 3*N);   % MATLAB syntax; the FreeMat call was randi(0,5*ones(3*N,3*N))
B = reshape(permute(reshape(A,[3 N 3 N]),[1 3 2 4]), [ 9 N*N]);
hh = hist(B, 0:5); % histogram of each 3x3 block: bin with largest value is the mode
[mm mi] = max(hh); % mi will contain bin with largest value
figure; hist(B(:),0:5); title 'histogram of B'; % flat, as expected
figure; hist(mi-1, 0:5); title 'histogram of mi' % not flat?...
(The two plots, 'histogram of B' and 'histogram of mi', are not reproduced here.)
The strange thing when you run this code is that the distribution of mi is not flat but skewed towards smaller values. When you inspect the histograms, you will see that this happens because you frequently have more than one bin with the maximum count. In that case, max returns the first such bin. This is obviously going to skew your results badly. A better filter might be a median filter, which picks the value with equal numbers of neighboring pixels above and below it. The median has a unique solution, while the mode can have up to four values for nine pixels (namely, four bins with two values each). Something to think about.
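A tiny example of the tie problem on one hypothetical 3x3 neighborhood, written as a vector, in which four values each occur twice:

```matlab
v = [1 1 2 2 3 3 4 4 5];        % four values tied at a count of 2
counts = histc(v, 0:5);         % [0 2 2 2 2 1]
[~, firstMax] = max(counts);    % 2, i.e. value 1: max picks the first maximal bin
m = mode(v);                    % 1: mode also returns the smallest tied value
med = median(v);                % 3: the median is unique
```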
I can't show you a MEX example today (wrong computer), but there are plenty of good examples on the MathWorks website (and all over the web) that are quite easy to follow. See for example http://www.shawnlankton.com/2008/03/getting-started-with-mex-a-short-tutorial/

How to generate random matlab vector with these constraints

I'm having trouble creating a random vector V in MATLAB subject to the following set of constraints (given parameters N, D, L, and theta):
1. The vector V must be N units long.
2. The elements must have an average of theta.
3. No 2 successive elements may differ by more than +/-10.
4. D == sum(L*cosd(V-theta))
I'm having the most problems with the last one. Any ideas?
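A small validity check makes the four constraints concrete. This is a sketch that assumes angles in degrees, as the cosd in constraint 4 suggests, and a tolerance tol for the two equality constraints. The values of N, L, theta and tol are made up:

```matlab
N = 10; L = 1; theta = 45; tol = 1e-9;   % hypothetical parameters
% true when V satisfies constraints 1-4 for a given D
isValid = @(V, D) numel(V) == N ...
    && abs(mean(V) - theta) < tol ...
    && all(abs(diff(V)) <= 10) ...
    && abs(D - sum(L*cosd(V - theta))) < tol;

Vconst = theta*ones(1, N);               % constant vector: valid only when D == N*L
okConst = isValid(Vconst, N*L);
Vzig = theta + 5*(-1).^(1:N);            % alternating -5/+5 degrees around theta
okZig = isValid(Vzig, N*L*cosd(5));
```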
Edit
Solutions in other languages or equation form are equally acceptable. Matlab is just a convenient prototyping tool for me, but the final algorithm will be in java.
Edit
From the comments and initial answers I want to add some clarifications and initial thoughts.
I am not seeking a 'truly random' solution from any standard distribution. I want a pseudo randomly generated sequence of values that satisfy the constraints given a parameter set.
The system I'm trying to approximate is a chain of N links of link length L where the end of the chain is D away from the other end in the direction of theta.
My initial insight here is that theta can be removed from consideration until the end, since (2) in essence adds theta to every element of a 0 mean vector V (shifting the mean to theta) and (4) simply removes that mean again. So, if you can find a solution for theta=0, the problem is solved for all theta.
As requested, here is a reasonable range of parameters (not hard constraints, but typical values):
5<N<200
3<D<150
L==1
0 < theta < 360
I would start by creating a "valid" vector. That should be possible, for example by computing one in which every entry has the same value.
Once you have that vector, I would apply some transformations to "shuffle" it. "Rejection sampling" is the keyword: if a shuffle would violate one of your rules, you just don't do it.
As transformations I come up with:
switch two entries
modify the value of one entry and modify a second one so that the 4th condition still holds (theoretically you could just shuffle two entries until the condition is fulfilled, but the chance of that happening is quite low)
But maybe you can find some more.
Do this reasonably often and you get a "valid" random vector. Theoretically you should be able to reach all valid vectors. In practice, you could construct several "start" vectors so that it doesn't take too long.
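Below is a minimal sketch of the "switch two entries" transformation with rejection. It assumes a hand-built start vector that already satisfies all four constraints; N, theta, L and the start values are made up. A swap keeps the mean and the cosine sum unchanged, so only the successive-difference rule needs checking:

```matlab
N = 8; theta = 30; L = 1; maxStep = 10;
V = theta + [0 5 10 5 0 -5 -10 -5];    % hypothetical valid start: mean theta, steps of 5
D = sum(L*cosd(V - theta));            % D chosen so that constraint 4 holds
for iter = 1:1000
    ij = randperm(N, 2);               % two distinct positions
    Vnew = V;
    Vnew(ij) = V(fliplr(ij));          % swap them: mean and cosine sum are unchanged
    if all(abs(diff(Vnew)) <= maxStep) % reject swaps that break constraint 3
        V = Vnew;
    end
end
```

Because a swap only permutes the existing values, this move alone never leaves the permutations of the start vector. The second transformation in the list is what changes the values themselves.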
Here's a way of doing it. It is clear that not all combinations of theta, N, L and D are valid. It is also clear that you're trying to simulate random objects that are quite complex. You will probably have a hard time showing anything useful with respect to these vectors.
The series you're trying to simulate seems similar to a Wiener process, so I started with that; you can start with anything that is random yet reasonable. I then use that as the starting point for an optimization that tries to satisfy constraints 2, 3 and 4. The closer your initial value is to a valid vector (one satisfying all your conditions), the better the convergence.
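function series = generate_series(D, L, N, theta)
% Random-walk (Wiener-like) initial guess, refined by unconstrained optimization
s = zeros(1, N);
s(1) = theta;
for i = 2:N
    s(i) = s(i-1) + randn(1,1);
end
f = @(x) objective(x, D, L, N, theta);
q = optimset('Display','iter','TolFun',1e-10,'MaxFunEvals',Inf,'MaxIter',Inf);
[sf, val] = fminunc(f, s, q);   % fminunc is part of the Optimization Toolbox
val
series = sf;

function value = objective(s, D, L, N, theta)
% Squared penalties for constraint 2 (a), constraint 4 (b) and constraint 3 (c)
a = abs(mean(s) - theta);
b = abs(D - sum(L*cosd(s - theta)));   % cosd, since the angles are in degrees
c = 0;
for i = 2:N
    u = abs(s(i) - s(i-1));
    if u > 10
        c = c + u;
    end
end
value = a^2 + b^2 + c^2;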
It seems like you're trying to simulate something very complex/strange (a path of a given curvature?); see the questions by other commenters. Still, you will have to use your domain knowledge to connect D and L with a reasonable mu and sigma for the Wiener process used as the initialization.
So based on your new requirements, it seems like what you're actually looking for is an ordered list of random angles, with a maximum change in angle of 10 degrees (which I first convert to radians), such that the distance and direction from start to end and link length and number of links are specified?
Simulate an initial guess. It will not satisfy the D and theta constraints (i.e. the specified D and the specified theta).
angles = zeros(N, 1);
for link = 2:N
    % each step changes the angle by at most +/-5 degrees (in radians)
    angles(link) = angles(link - 1) + (rand() - 0.5)*(10*pi/180);
end
Use genetic algorithm (or another optimization) to adjust the angles based on the following cost function:
dx = sum(L*cos(angles));
dy = sum(L*sin(angles));
D = sqrt(dx^2 + dy^2);
theta = atan2(dy, dx);
the cost is now just the difference between the vector given by my D and theta above and the vector given by the specified D and theta (i.e. the inputs).
You will still have to enforce the max change of 10 degrees rule, perhaps that should just make the cost function enormous if it is violated? Perhaps there is a cleaner way to specify sequence constraints in optimization algorithms (I don't know how).
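Below is a minimal sketch of that cost function as MATLAB anonymous functions. It assumes angles in radians, as in the answer. The penalty weight 1e6 is an arbitrary way to make violations of the 10-degree rule "enormous". The names endError, stepPenalty and chainCost are hypothetical:

```matlab
maxStep = 10*pi/180;                                   % 10 degrees in radians
% Distance between the chain's end point and the target end point
endError = @(angles, L, Dt, tt) norm([sum(L*cos(angles)), sum(L*sin(angles))] ...
                                     - Dt*[cos(tt), sin(tt)]);
% Large penalty proportional to how far any step exceeds maxStep
stepPenalty = @(angles) 1e6*sum(max(abs(diff(angles)) - maxStep, 0));
chainCost = @(angles, L, Dt, tt) endError(angles, L, Dt, tt) + stepPenalty(angles);

straight = chainCost(0.3*ones(1, 4), 1, 4, 0.3);       % straight chain hits the target
kinked = chainCost([0 0.5], 1, 2, 0);                  % a 0.5 rad step exceeds 10 degrees
```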
I feel like if you can find the right optimization with the right parameters this should be able to simulate your problem.
You don't give us a lot of detail to work with, so I'll assume the following:
random numbers are to be drawn from [-127+theta +127-theta]
all random numbers will be drawn from a uniform distribution
all random numbers will be of type int8
Then, for the first 3 requirements, you can use this:
N = 1e4;
theta = 40;
diffVal = 10;
g = @() randi([intmin('int8')+theta intmax('int8')-theta], 'int8') + theta;
V = [g(); zeros(N-1,1, 'int8')];
for ii = 2:N
V(ii) = g();
while abs(V(ii)-V(ii-1)) > diffVal
V(ii) = g();
end
end
Inline the anonymous function for more speed.
Now, the last requirement,
D == sum(L*cos(V-theta))
is a bit of a strange one... cos(V-theta) is a specific way to re-scale the data to the [-1 +1] interval, which the multiplication with L then scales to [-L +L]. At first sight, you'd expect the sum to average out to 0.
More precisely, the expected value of cos(x) for a uniformly distributed x depends on the interval: it is 0 over a full period such as [0 2*pi], and 2/pi over [-pi/2 pi/2]. Ignoring for the moment that our limits differ from these, the expected value of sum(L*cos(V-theta)) reduces to a constant determined by the distribution (for example, 2*N*L/pi in the second case).
How you can force this to equal some other constant D is beyond me...can you perhaps elaborate on that a bit more?

MATLAB/General CS: Sampling Without Replacement From Multiple Sets (+Keeping Track of Unsampled Cases)

I am currently implementing an optimization algorithm that requires me to sample without replacement from several sets. Although I am coding in MATLAB, this is essentially a CS question.
The situation is as follows:
I have a finite number of sets (A, B, C) each with a finite but possibly different number of elements (a1,a2...a8, b1,b2...b10, c1, c2...c25). I also have a vector of probabilities for each set which lists a probability for each element in that set (i.e. for set A, P_A = [p_a1 p_a2... p_a8] where sum(P_A) = 1). I normally use these to create a probability generating function for each set, which given a uniform number between 0 to 1, can spit out one of the elements from that set (i.e. a function P_A(u), which given u = 0.25, will select a2).
I am looking to sample without replacement from the sets A, B, and C. Each "full sample" is a sequence of one element from each of the sets, e.g. (a1, b3, c2). Note that the space of full samples is the Cartesian product of A, B, and C. In the example above, this space is (a1,a2...a8) x (b1,b2...b10) x (c1, c2...c25), and there are 8*10*25 = 2000 unique "full samples" in my space.
The annoying part of sampling without replacement with this setup is that if my first sample is (a1, b3, c2), that does not mean I cannot sample the element a1 again. It just means that I cannot sample the full sequence (a1, b3, c2) again. Another annoying part is that the algorithm I am working with requires me to do a function evaluation for all permutations of elements that I have not sampled.
The best method at my disposal right now is to keep track of the sampled cases. This is a little inefficient, since my sampler is forced to reject any case that has been sampled before (because I'm sampling without replacement). I then do the function evaluations for the unsampled cases. I go through each permutation (ax, by, cz) with nested for loops and only do the function evaluation if that combination is not among the sampled cases. Again, this is a little inefficient, since I have to "check" whether each permutation (ax, by, cz) has already been sampled.
I would appreciate any advice on this problem. In particular, I am looking for a method to sample without replacement and keep track of unsampled cases without explicitly listing out the full sample space. I usually work with 10 sets of 10 elements each, so listing out the full sample space would require a 10^10 x 10 matrix. I realize that this may be impossible, though finding an efficient way to do it would allow me to demonstrate the true limits of the algorithm.
Do you really need to keep track of all of the unsampled cases? Even if you had a 1-by-10^10 vector that stored a logical value of true or false indicating if that permutation had been sampled or not, that would still require about 10 GB of storage, and MATLAB is likely to either throw an "Out of Memory" error or bring your entire machine to a screeching halt if you try to create a variable of that size.
An alternative to consider is storing a sparse vector of indicators for the permutations you've already sampled. Let's consider your smaller example:
A = 1:8;
B = 1:10;
C = 1:25;
nA = numel(A);
nB = numel(B);
nC = numel(C);
beenSampled = sparse(1,nA*nB*nC);
The 1-by-2000 sparse matrix beenSampled is empty to start (i.e. it contains all zeroes) and we will add a one at a given index for each sampled permutation. We can get a new sample permutation using the function RANDI to give us indices into A, B, and C for the new set of values:
indexA = randi(nA);
indexB = randi(nB);
indexC = randi(nC);
We can then convert these three indices into a single unique linear index into beenSampled using the function SUB2IND:
index = sub2ind([nA nB nC],indexA,indexB,indexC);
Now we can test the indexed element in beenSampled to see if it has a value of 1 (i.e. we sampled it already) or 0 (i.e. it is a new sample). If it has been sampled already, we repeat the process of finding a new set of indices above. Once we have a permutation we haven't sampled yet, we can process it:
while beenSampled(index)
indexA = randi(nA);
indexB = randi(nB);
indexC = randi(nC);
index = sub2ind([nA nB nC],indexA,indexB,indexC);
end
beenSampled(index) = 1;
newSample = [A(indexA) B(indexB) C(indexC)];
%# ...do your subsequent processing...
The use of a sparse array will save you a lot of space if you're only going to end up sampling a small portion of all of the possible permutations. For smaller total numbers of permutations, like in the above example, I would probably just use a logical vector instead of a sparse vector.
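The question also needs a function evaluation for every unsampled combination. Linear indexing can replace the nested loops for that too: find the indices that are still unset and convert them back to subscripts with ind2sub. The sketch below uses the small 8 x 10 x 25 example with a logical vector, as suggested for small spaces. The three sampled combinations are made up:

```matlab
A = 1:8; B = 1:10; C = 1:25;
nA = numel(A); nB = numel(B); nC = numel(C);
beenSampled = false(1, nA*nB*nC);           % logical vector for this small space
% Mark three hypothetical sampled combinations: (1,3,2), (3,10,25), (8,1,7)
beenSampled(sub2ind([nA nB nC], [1 3 8], [3 10 1], [2 25 7])) = true;
unsampled = find(~beenSampled);             % linear indices not yet sampled
[iA, iB, iC] = ind2sub([nA nB nC], unsampled);
rest = [A(iA).' B(iB).' C(iC).'];           % one row per unsampled combination
```

This still materializes every unsampled combination, so it only helps when the space is small enough to enumerate.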
Check the MATLAB documentation for the randi function; you'll just want to use it in conjunction with the length function to choose random entries from each vector. Keeping track of each sampled vector should be as simple as concatenating it to a matrix:
used_values = zeros(0, 3);                      % one row per sampled combination, initially empty
current_values = [5 89 45];                     % let's say this is your current sample set
used_values = [used_values; current_values];    % append it
% wash, rinse, repeat
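With this approach, sampling without replacement means checking each new combination against the rows already stored. The sketch below does that with ismember and its 'rows' option. The sets are the question's small example as plain numbers, and the 50 draws are arbitrary:

```matlab
sets = {1:8, 1:10, 1:25};          % hypothetical stand-ins for A, B and C
used_values = zeros(0, 3);         % one row per sampled combination
drawOne = @() cellfun(@(s) s(randi(numel(s))), sets);   % one random element per set
for k = 1:50
    current_values = drawOne();
    while ismember(current_values, used_values, 'rows')  % reject repeats
        current_values = drawOne();
    end
    used_values = [used_values; current_values];
end
```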