Finding the rows of a matrix with specified elements - matlab

I want to find the rows of a matrix which contain specified elements of another matrix.
For example, a=[1 2 3 4 5 6 7] and b=[1 2 0 4;0 9 10 11;3 1 2 12]. Now, I want to find the rows of b which contain at least three elements of a. For this purpose, I used bsxfun as follows:
c=find(sum(any(bsxfun(@eq, b, reshape(a,1,1,[])), 2), 3)>=3);
It works well for small matrices, but when I use it on large ones, for example when b has 192799 rows, MATLAB gives the following error:
Requested 192799x4x48854 (35.1GB) array exceeds maximum array size preference.
Creation of arrays greater than this limit may take a long time and cause MATLAB
to become unresponsive. See array size limit or preference panel for more information.
Is there another command that does this task on large matrices without creating such a huge intermediate array?

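A possible solution is ismember, which tests every element of b for membership in a and never builds the 3-D array:
a=[1 2 3 4 5 6 7];
b=[1 2 0 4;0 9 10 11;3 1 2 12];
i=ismember(b,a);   % logical matrix, same size as b
idx = sum(i,2);    % number of entries in each row of b that occur in a
idx = find(idx>=3) % rows with at least three such entries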
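One difference from the bsxfun version is worth checking: ismember counts every entry of a row that occurs in a, so a row with repeated values is counted several times, whereas the original expression counts distinct values of a. The sketch below shows both counts side by side; the fourth row of b and the names idxEntries, count and idxDistinct are added here for illustration and are not from the question.

```matlab
a = [1 2 3 4 5 6 7];
b = [1 2 0 4; 0 9 10 11; 3 1 2 12; 1 1 1 0];   % last row added to show the difference

% Count entries of each row that occur in a (repeated values count repeatedly).
idxEntries = find(sum(ismember(b, a), 2) >= 3);

% Count distinct values of a present in each row, like the bsxfun version,
% but one value at a time so that no 3-D array is created.
count = zeros(size(b, 1), 1);
for v = a
    count = count + any(b == v, 2);
end
idxDistinct = find(count >= 3);
```

If the rows of b never contain repeated values, the two results coincide and the ismember one-liner is enough.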

Related

MatLAB help: shuffling a predefined vector without consecutively repeating numbers (with equal occurrences of all values)

I'm having trouble randomly shuffling a vector so that no number is immediately repeated (e.g. 1 1 is not acceptable but 1 2 is), given that each value occurs equally often.
More specifically, I would like to repeat the vector [1:4] ten times (40 elements in total) so that 1, 2, 3 and 4 each appear 10 times without ever being consecutive.
If there is any clarification needed please let me know, I hope this question was clear.
This is what I have so far:
cond_order = repmat([1:4],10,1); %make matrix
cond_order = cond_order(:); %make sequence
I know randperm is quite relevant but I'm not sure how to use it with the one condition of non-repeating numbers.
EDIT: Thank you for all the responses.
I realize I was quite unclear. These are the examples I would like to reject [1 1 2 2 4 4 4...].
So it doesn't matter if [1 2 3 4] occurs in that order as long as individual values are not repeated. (so both [1 2 3 4 1 2 3 4...] and [4 3 1 2...] are acceptable)
Preferably I am looking for a shuffled vector meeting these criteria:
it is random
there are no consecutively repeating values (e.g. 1 1 4 4)
all four values appear an equal number of times
This follows the rejection sampling idea: keep drawing permutations with randperm until one is found that has no consecutive repeated values.
cond_order = repmat(1:4,10,1); % make matrix
N = numel(cond_order); % number of elements
sequence_found = false;
while ~sequence_found
    candidate = cond_order(randperm(N));
    if all(diff(candidate) ~= 0) % check that there are no repeated values
        sequence_found = true;
    end
end
result = candidate;
The solution from mikkola is methodically right, but I think there is a more efficient way:
He chose to sample sequences with equal counts and then check the differences. I chose to do it the other way round and ended up with a solution requiring far fewer iterations.
n=4;
k=10;
d=42; % dummy value so that the first check fails
while ~all(sum(bsxfun(@eq,d,(1:n).'),2)==k) % check that every number appears k times
    d=mod(cumsum([randi(n,1,1),randi(n-1,1,(n*k)-1)]),n)+1; % new random sample; every step is 1..n-1 (mod n), so neighbours always differ
end
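To see why this construction can never produce consecutive repeats, one draw can be looked at in isolation. This is only a sketch; the names steps, noRepeats and counts are introduced here for illustration.

```matlab
n = 4;    % number of distinct values
k = 10;   % desired repetitions per value

% The first term picks a starting value, the remaining terms are increments in 1..n-1.
steps = [randi(n, 1, 1), randi(n - 1, 1, n*k - 1)];
d = mod(cumsum(steps), n) + 1;

% An increment is never a multiple of n, so neighbouring values always differ.
noRepeats = all(diff(d) ~= 0);

% The counts per value are not controlled, which is why the loop above retries.
counts = accumarray(d(:), 1, [n 1]).';
```

Only the equal-count condition can fail here, so the loop rejects on that condition instead of on repeats.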
A subtle but important distinction: does the author need an equal probability of picking any feasible sequence?
A number of people have mentioned answers of the form, "Let's use randperm and then rearrange the sequence so that it's feasible." That may not work. What will make this problem quite hard is if the author needs an equal chance of choosing any feasible sequence. Let me give an example to show the problem.
Imagine the set of numbers [1 2 2 3 4]. First let's enumerate the set of feasible sequences:
6 sequences beginning with 1: [1 2 3 2 4], [1 2 3 4 2], [1 2 4 2 3], [1 2 4 3 2], [1 3 2 4 2], [1 4 2 3 2].
Then there are 6 sequences beginning with [2 1]: [2 1 2 3 4], [2 1 2 4 3], [2 1 3 2 4], [2 1 3 4 2], [2 1 4 2 3], [2 1 4 3 2]. By symmetry, there are 18 sequences beginning with 2 (i.e. 6 of [2 1], 6 of [2 3], 6 of [2 4]).
By symmetry there are 6 sequences beginning with 3 and another 6 starting with 4.
Hence there are 6 * 3 + 18 = 36 possible sequences.
Sampling uniformly from feasible sequences, the probability that the first number is 2 is 18/36 = 50 percent! BUT if you just went with a random permutation, the probability that the first number is 2 would be 40 percent (2 of the 5 numbers in the set are 2).
If equal probability of any feasible sequence is required, the first number should be 2 with probability 50 percent, but naive use of randperm followed by rejiggering the numbers at 2:end to make the sequence feasible gives a 40 percent probability of the first number being 2.
Note that rejection sampling would get the probabilities right as every feasible sequence would have an equal probability of being accepted. (Of course rejection sampling becomes very slow as probability of being accepted goes towards 0.)
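The counts in this example are small enough to check by brute force. The following is a sketch with illustrative variable names (P, feasible, nFeasible, pFirstIs2, pNaive) that enumerates all distinct orderings of [1 2 2 3 4] and keeps the feasible ones.

```matlab
% All distinct orderings of the multiset (perms treats the two 2s as different).
P = unique(perms([1 2 2 3 4]), 'rows');

% Feasible sequences: no two neighbouring entries are equal.
feasible = P(all(diff(P, 1, 2) ~= 0, 2), :);
nFeasible = size(feasible, 1);

% Share of sequences starting with 2, among feasible ones and among all orderings.
pFirstIs2 = mean(feasible(:, 1) == 2);
pNaive    = mean(P(:, 1) == 2);
```

Under these definitions nFeasible comes out as 36, pFirstIs2 as 0.5 and pNaive as 0.4, matching the figures above.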
Following some of the discussion on here, I think that there is a trade-off between performance and the theoretical requirements of the application.
If a completely uniform draw from the set of all valid permutations is required, then pure rejection sampling method will probably be required. The problem with this of course is that as the size of the problem is increased, the rejection rate will become very high. To demonstrate this, if we consider the base example in the question being n multiples of [1 2 3 4] then we can see the number of samples rejected for each valid draw as follows (note the log y axis):
My alternative method is to randomly shuffle the array and then, whenever a repeated value is detected, randomly reshuffle the remaining elements from that position onwards:
cond_order = repmat(1:4,10,1); % make matrix
cond_order = cond_order(:); % make sequence
cond_order = cond_order(randperm(numel(cond_order)));
i = 2;
while i <= numel(cond_order) % <= so that the last pair is checked too
    if cond_order(i) ~= cond_order(i - 1)
        i = i + 1;
    elseif all(cond_order(i:end) == cond_order(i - 1))
        % reshuffling the tail cannot help, so start again
        cond_order = cond_order(randperm(numel(cond_order)));
        i = 2;
    else
        tmp = cond_order(i:end);
        cond_order(i:end) = tmp(randperm(numel(tmp)));
    end
end
cond_order
Note that there is no guarantee that this will converge, but in the case where it becomes clear that it will not converge, we can just start again and it will still be better than re-computing the whole sequence every time.
This definitely meets the last two requirements of the question:
B) there are no consecutively repeated values
C) all 4 values appear an equal number of times
The question is whether it meets the first requirement, that the result is random.
If we take the simplest version of the problem, with the input of [1 2 3 4 1 2 3 4] then there are 864 valid permutations (empirically determined!). If we run both methods over 100,000 runs, then we would expect a Gaussian distribution around 115.7 draws per permutation.
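The count of valid permutations for this small case can be reproduced by enumeration. A minimal sketch, assuming the eight-element input [1 1 2 2 3 3 4 4] (the same multiset as [1 2 3 4 1 2 3 4]); the names P, valid, nValid and expectedPerPerm are illustrative.

```matlab
% All distinct orderings of two copies each of 1..4 (perms yields 40320 rows).
P = unique(perms([1 1 2 2 3 3 4 4]), 'rows');

% Keep the orderings with no two equal neighbours.
valid = all(diff(P, 1, 2) ~= 0, 2);
nValid = nnz(valid);

% Expected number of hits per permutation for 100,000 uniform draws.
expectedPerPerm = 100000 / nValid;
```

This gives nValid = 864 and an expectation of roughly 115.7 draws per permutation, in line with the numbers quoted above.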
As expected, the pure rejection sampling method gives this:
However, my algorithm does not:
There is clearly a bias towards certain samples.
In the end, it depends on the requirements. Both methods sample over the whole distribution so both fill the core requirements of the problem. I have not included performance comparisons, but for anything other than the simplest of cases, I am confident that my algorithm would be much faster. However, the distribution of the draws is not perfectly uniform. Whether it is good enough is dependent on the application and the size of the actual problem.

How to extend the rows of a matrix in MATLAB filling the added rows with the first row's values efficiently [duplicate]

This question already has answers here:
Building a matrix by merging the same row vector multiple times
(2 answers)
Closed 8 years ago.
I have a matrix myVel that is of size [1 501] meaning 1 row and 501 columns.
I want to extend this matrix so that the matrix will be of size [N 501], where N is an arbitrary number.
Each of the values in the columns need to be the same (meaning that all the values in the first column are all say, x and all of the values in the second column are say, y and so on).
This means that each row would consist of the same values.
How can I achieve this efficiently?
Divakar's solution is one way to do it, and the link he referenced shows some great ways to duplicate an array. That post, however, asks how to do it without the built-in function repmat, which is the easiest solution. Because there is no such restriction here, I recommend repmat: keep the number of columns the same and duplicate the row as many times as you want. In other words:
myVelDup = repmat(myVel, N, 1);
Example:
myVel = [1 2 3 4 5 6];
N = 4;
myVelDup = repmat(myVel, N, 1);
Output:
>> myVel
myVel =
1 2 3 4 5 6
>> myVelDup
myVelDup =
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
In general, repmat is called in the following way:
out = repmat(in, M, N);
Here in is the matrix or vector of values you want duplicated; it is tiled M times vertically (down the rows) and N times horizontally (across the columns). In your case the row vector has to be stacked N times vertically (your N, not the N of the general signature), so the first argument is N. The number of columns stays the same, so the second argument is 1, and that gives the call to repmat you see above.
For more information on repmat, check out this link: http://www.mathworks.com/help/matlab/ref/repmat.html
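A quick way to check the argument order is to look at the size of the result. This is a small sketch with made-up values; in and out follow the general signature above, and outSize is an illustrative name.

```matlab
in = [1 2 3];            % 1-by-3 row vector
out = repmat(in, 2, 3);  % 2 copies down the rows, 3 copies across the columns
outSize = size(out);     % 2-by-9
```

The first count multiplies the number of rows and the second multiplies the number of columns, so repmat(myVel, N, 1) turns a 1-by-501 vector into an N-by-501 matrix.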

Accessing indexes as first columns of matrix in Matlab

I have data that is output from a computational chemistry program (Gaussian09) which contains sets of Force Constant data. The data is arranged with indexes as the first 2-4 columns (quadratic, cubic and quartic FC's are calculated). As an example the cubic FC's look something like this, and MatLab has read them in successfully so I have the correct matrix:
cube=[
1 1 1 5 5 5
1 1 2 6 6 6
.
.
4 1 1 8 8 8
4 2 1 9 9 9
4 3 1 7 7 7 ]
I need a way to access the last 3 columns when feeding in the indices of the first 3 columns. Something along the lines of
>>index=find(cube(:,1)==4 & cube(:,2)==3 & cube(:,3)==1);
Which would give me the row number of the data that is index [ 4 3 1 ] and allow me to read out the values [7 7 7] which I need within loops to calculate anharmonic frequencies.
Is there a way to do this without a bunch of loops?
Thanks in advance,
Ben
You have already found one way to solve this, by using & in your expression (allowing you to make non-scalar comparisons).
Another way is to use ismember:
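index = find(ismember(cube(:,1:3),[4 3 1],'rows'));
The 'rows' flag is needed so that each row is compared as a whole; without it ismember tests every element separately. Note that in many cases you may not even need the call to find: the logical vector returned by the comparisons or by ismember can be used directly to index into another array.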
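As a sketch of that last point, the logical mask can pull out the force-constant columns directly. The five rows below are the ones visible in the question (the elided rows are left out), and the names mask and vals are illustrative.

```matlab
cube = [1 1 1 5 5 5
        1 1 2 6 6 6
        4 1 1 8 8 8
        4 2 1 9 9 9
        4 3 1 7 7 7];

% Logical column vector, true where the first three columns equal [4 3 1].
mask = ismember(cube(:, 1:3), [4 3 1], 'rows');

% Use the mask directly as a row index; no find is needed.
vals = cube(mask, 4:6);
```

Here vals is [7 7 7]; if the index triple is absent, vals is simply empty, which is easy to test for inside a loop.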

How to find the unique rows in a matrix in matlab, where the order of numbers in row is NOT important?

I have a matrix of following form in matlab:
3 4
4 3
5 6
6 5
I would like to have the rows 1 and 2 to be considered a duplicate, where the elements of the two rows are the same but not in the same order. Similarly rows 3 and 4 should be considered the same. So, given the matrix above, I would like to have the following as the result:
3 4
5 6
I have tried the unique function but it cannot help me for this purpose.
My actual matrix is quite large, and I don't want to solve the problem with an exhaustive pairwise search, since it is extremely time consuming.
Is there an elegant way of achieving my goal?
This is one way of doing this:
X = [3 4
4 3
5 6
6 5];
X = sort(X, 2);
UniqueRows = unique(X, 'rows');
UniqueRows =
3 4
5 6
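Sorting each row changes the stored order of its elements. If the rows should come back exactly as they appear in the input, the index output of unique can be used instead. This is a sketch that assumes a MATLAB release where unique accepts the 'stable' option; the example matrix and the names ia and firstRows are illustrative.

```matlab
X = [4 3
     3 4
     6 5
     5 6];

% Compare the rows in sorted form, but keep the index of each first occurrence.
[~, ia] = unique(sort(X, 2), 'rows', 'stable');

% Pick those rows from the untouched input.
firstRows = X(ia, :);
```

For this input firstRows is [4 3; 6 5], the first representative of each unordered pair.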

Scramble an nx1 matrix in matlab efficiently?

I need to randomly scramble the values of an nx1 matrix in matlab. I'm not sure how to do this efficiently, and I need to do it many times for n > 40,000.
Example
Matrix before:
1 2 2 2 3 4 5 5 4 3 2 1
Scrambled:
3 5 2 1 2 2 3 4 1 4 5 2
thank you
If your data is stored in matrix data, then you can generate "scrambled" data using randperm like so:
scrambled = data(randperm(numel(data)));
This is sampling without replacement, so every element of data appears exactly once in scrambled.
For sampling with replacement (values in data may appear in scrambled multiple times and some may not appear at all), you could use randi like this:
scrambled = data(randi(numel(data),1,numel(data)));
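The difference between the two calls can be checked on the example from the question. A minimal sketch; resampled is an illustrative name for the with-replacement result.

```matlab
data = [1 2 2 2 3 4 5 5 4 3 2 1].';   % n-by-1 column, as in the question

% Without replacement: a pure reordering of data.
scrambled = data(randperm(numel(data)));

% With replacement: each entry is drawn independently from data.
resampled = data(randi(numel(data), numel(data), 1));

% A reordering has the same sorted contents as the original.
sameContents = isequal(sort(scrambled), sort(data));
```

sameContents is always true for the randperm version, while resampled usually has different value counts from data.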