Adding numbers based on 2 matrices according to a constraint matlab - matlab

I have 2 matrices; Matrix A and Matrix B.
Matrix A = [1 3 6 2 7;
2 1 5 3 4;
8 3 7 2 1]
Matrix B = [0 0 1 0 0;
0 0 0 0 1;
0 1 0 0 0]
and I want to check if the '1' in matrix B is placed in a place in matrix A where it is greater than or equal to 6 then leave it as it is. But if it is smaller than 6, then go to the place of the number that is less than this number in matrix A and put a '1' in this place in matrix B and add the 2 numbers and recheck if the sum is equal to or greater than 6 and so on.
As you can see in matrix B row 2 the 1 is put in the place of 4 in matrix A. Since the 4 is less than 6 then I will go to the second smaller number than the 4 which in this case is 3 and add 3 and 4 together. This will give us 7 which is greater than 6 so we will stop. So here for example the output matrix will be:
Matrix output = [0 0 1 0 0;
0 0 0 1 1;
0 1 0 1 1]
The steps:
Go to the number that is just smaller than it. In this case go to 3 as it is the one that is just smaller than the 4. I can explain more:
check the place of the 1 in Matrix B and see its value in Matrix A.
If the number in Matrix A is greater than 6, leave it as it is and leave the 1 in Matrix B as it is and go to another row.
If the number in Matrix A is smaller than 6, then what we want is that we want add this number to another number and make it equal to or greater than 6.
This number is the one that is just smaller than it. For example if the row has [2 5 6 1 3] and the 1 is placed in the place of the 5 and 5 is less than the constraint. So we have to go to the 3 as it is the one that is just smaller than the 5 and add them together.
After adding them put 1's in the places of both numbers and check the constraints again. If it satisfies the constraint leave them and go to another row. If not go the one that is just smaller than the number again and do the same.
Thank you so much.
This code is working when matrix B is empty and it puts the 1 in the place of the highest number and it checks the constraint. If it is less than the number it will go to the second highest number and add and recheck and so on.. But what I want now is to solve it with predefined 0s and 1s
B=zeros(size(A));
for k=1:size(A,1)
a=A(k,:)
[b,ia]=sort(a,'descend')
c=cumsum(b)
jj=find(c>=6,1)
idx=ia(1:jj)
B(k,idx)=1
end

This one took a while, but I think I got it in the end...
Doing most of the process without loop except the final stage, plugging in the index row by row to change B which can be done by a arrayfun. I think there might be a few redundant steps, but I think it is pretty fast.
C = A';
D = B' > 0 ;
E = repmat(max(C(D),1),[1 size(A,2)]);
F = A-E<=0;
G = A.*F;
[H ind] = sort (G,2,'descend');
I = (cumsum(H,2) >=6)*-1 +1;
Indent = ones(size(A,1),1);
J = [Indent I];
K = J(:,1:size(A,2)).*ind;
for t= 1:size(A,1)
B(t,K(t,K(t,:)~=0)) = 1;
end
>> B =
0 0 1 0 0
0 0 0 1 1
0 1 0 1 1

Related

Remove single elements from a vector

I have a vector M containing single elements and repeats. I want to delete all the single elements. Turning something like [1 1 2 3 4 5 4 4 5] to [1 1 4 5 4 4 5].
I thought I'd try to get the count of each element then use the index to delete what I don't need, something like this:
uniq = unique(M);
list = [uniq histc(M,uniq)];
Though I'm stuck here and not sure how to go forward. Can anyone help?
Here is a solution using unique, histcounts and ismember:
tmp=unique(M) ; %finding unique elements of M
%Now keeping only those elements in tmp which appear only once in M
tmp = tmp(histcounts(M,[tmp tmp(end)])==1); %Thanks to rahnema for his insight on this
[~,ind] = ismember(tmp,M); %finding the indexes of these elements in M
M(ind)=[];
histcounts was introduced in R2014b. For earlier versions, hist can be used by replacing that line with this:
tmp=tmp(hist(M,tmp)==1);
You can get the result with the following code:
A = [a.', ones(length(a),1)];
[C,~,ic] = unique(A(:,1));
result = [C, accumarray(ic,A(:,2))];
a = A(~ismember(A(:,1),result(result(:,2) == 1))).';
The idea is, add ones to the second column of a', then accumarray base on the first column (elements of a). After that, found the elements in first column which have accum sum in the second column. Therefore, these elements repeated once in a. Finally, removing them from the first column of A.
Here is a cheaper alternative:
[s ii] = sort(a);
x = [false s(2:end)==s(1:end-1)]
y = [x(2:end)|x(1:end-1) x(end)]
z(ii) = y;
result = a(z);
Assuming the input is
a =
1 1 8 8 3 1 4 5 4 6 4 5
we sort the list s and get index of the sorted list ii
s=
1 1 1 3 4 4 4 5 5 6 8 8
we can find index of repeated elements and for it we check if an element is equal to the previous element
x =
0 1 1 0 0 1 1 0 1 0 0 1
however in x the first elements of each block is omitted to find it we can apply [or] between each element with the previous element
y =
1 1 1 0 1 1 1 1 1 0 1 1
we now have sorted logical index of repeated elements. It should be reordered to its original order. For it we use index of sorted elements ii :
z =
1 1 1 1 0 1 1 1 1 0 1 1
finally use z to extract only the repeated elements.
result =
1 1 8 8 1 4 5 4 4 5
Here is a result of a test in Octave* for the following input:
a = randi([1 100000],1,10000000);
-------HIST--------
Elapsed time is 5.38654 seconds.
----ACCUMARRAY------
Elapsed time is 2.62602 seconds.
-------SORT--------
Elapsed time is 1.83391 seconds.
-------LOOP--------
Doesn't complete in 15 seconds.
*Since in Octave histcounts hasn't been implemented so instead of histcounts I used hist.
You can test it Online
X = [1 1 2 3 4 5 4 4 5];
Y = X;
A = unique(X);
for i = 1:length(A)
idx = find(X==A(i));
if length(idx) == 1
Y(idx) = NaN;
end
end
Y(isnan(Y)) = [];
Then, Y would be [1 1 4 5 4 4 5]. It detects all single elements, and makes them as NaN, and then remove all NaN elements from the vector.

How to count patterns columnwise in Matlab?

I have a matrix S in Matlab that looks like the following:
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1
I would like to count patterns of values column-wise. I am interested into the frequency of the numbers that follow right after number 3 in any of the columns. For instance, number 3 occurs three times in the first column. The first time we observe it, it is followed by 3, the second time it is followed by 3 again and the third time it is followed by 4. Thus, the frequency for the patters observed in the first column would look like:
3-3: 66.66%
3-4: 33.33%
3-1: 0%
3-2: 0%
To generate the output, you could use the convenient tabulate
S = [
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1];
idx = find(S(1:end-1,:)==3);
S2 = S(2:end,:);
tabulate(S2(idx))
Value Count Percent
1 0 0.00%
2 0 0.00%
3 4 66.67%
4 2 33.33%
Here's one approach, finding the 3's then looking at the following digits
[i,j]=find(S==3);
k=i+1<=size(S,1);
T=S(sub2ind(size(S),i(k)+1,j(k))) %// the elements of S that are just below a 3
R=arrayfun(#(x) sum(T==x)./sum(k),1:max(S(:))).' %// get the number of probability of each digit
I'm going to restate your problem statement in a way that I can understand and my solution will reflect this new problem statement.
For a particular column, locate the locations that contain the number 3.
Look at the row immediately below these locations and look at the values at these locations
Take these values and tally up the total number of occurrences found.
Repeat these for all of the columns and update the tally, then determine the percentage of occurrences for the values.
We can do this by the following:
A = [2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]; %// Define your matrix
[row,col] = find(A(1:end-1,:) == 3);
vals = A(sub2ind(size(A), row+1, col));
h = 100*accumarray(vals, 1) / numel(vals)
h =
0
0
66.6667
33.3333
Let's go through the above code slowly. The first few lines define your example matrix A. Next, we take a look at all of the rows except for the last row of your matrix and see where the number 3 is located with find. We skip the last row because we want to be sure we are within the bounds of your matrix. If there is a number 3 located at the last row, we would have undefined behaviour if we tried to check the values below the last because there's nothing there!
Once we do this, we take a look at those values in the matrix that are 1 row beneath those that have the number 3. We use sub2ind to help us facilitate this. Next, we use these values and tally them up using accumarray then normalize them by the total sum of the tallying into percentages.
The result would be a 4 element array that displays the percentages encountered per number.
To double check, if we look at the matrix, we see that the value of 3 follows other values of 3 for a total of 4 times - first column, row 3, row 4, second column, row 2 and third column, row 6. The value of 4 follows the value of 3 two times: first column, row 6, second column, row 3.
In total, we have 6 numbers we counted, and so dividing by 6 gives us 4/6 or 66.67% for number 3 and 2/6 or 33.33% for number 4.
If I got the problem statement correctly, you could efficiently implement this with MATLAB's logical indexing and an approach that is essentially of two lines -
%// Input 2D matrix
S = [
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]
Labels = [1:4]'; %//'# Label array
counts = histc(S([false(1,size(S,2)) ; S(1:end-1,:) == 3]),Labels)
Percentages = 100*counts./sum(counts)
Verify/Present results
The styles for presenting the output results listed next use MATLAB's table for a well human-readable format of data.
Style #1
>> table(Labels,Percentages)
ans =
Labels Percentages
______ ___________
1 0
2 0
3 66.667
4 33.333
Style #2
You can do some fancy string operations to present the results in a more "representative" manner -
>> Labels_3 = strcat('3-',cellstr(num2str(Labels','%1d')'));
>> table(Labels_3,Percentages)
ans =
Labels_3 Percentages
________ ___________
'3-1' 0
'3-2' 0
'3-3' 66.667
'3-4' 33.333
Style #3
If you want to present them in descending sorted manner based on the percentages as listed in the expected output section of the question, you can do so with an additional step using sort -
>> [Percentages,idx] = sort(Percentages,'descend');
>> Labels_3 = strcat('3-',cellstr(num2str(Labels(idx)','%1d')'));
>> table(Labels_3,Percentages)
ans =
Labels_3 Percentages
________ ___________
'3-3' 66.667
'3-4' 33.333
'3-1' 0
'3-2' 0
Bonus Stuff: Finding frequency (counts) for all cases
Now, let's suppose you would like repeat this process for say 1, 2 and 4 as well, i.e. find occurrences after 1, 2 and 4 respectively. In that case, you can iterate the above steps for all cases and for the same you can use arrayfun -
%// Get counts
C = cell2mat(arrayfun(#(n) histc(S([false(1,size(S,2)) ; S(1:end-1,:) == n]),...
1:4),1:4,'Uni',0))
%// Get percentages
Percentages = 100*bsxfun(#rdivide, C, sum(C,1))
Giving us -
Percentages =
90.9091 20.0000 0 100.0000
9.0909 20.0000 0 0
0 60.0000 66.6667 0
0 0 33.3333 0
Thus, in Percentages, the first column are the counts of [1,2,3,4] that occur right after there is a 1 somewhere in the input matrix. As as an example, one can see column -3 of Percentages is what you had in the sample output when looking for elements right after 3 in the input matrix.
If you want to compute frequencies independently for each column:
S = [2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]; %// data: matrix
N = 3; %// data: number
r = max(S(:));
[R, C] = size(S);
[ii, jj] = find(S(1:end-1,:)==N); %// step 1
count = full(sparse(S(ii+1+(jj-1)*R), jj, 1, r, C)); %// step 2
result = bsxfun(#rdivide, count, sum(S(1:end-1,:)==N)); %// step 3
This works as follows:
find is first applied to determine row and col indices of occurrences of N in S except its last row.
The values in the entries right below the indices of step 1 are accumulated for each column, in variable count. The very convenient sparse function is used for this purpose. Note that this uses linear indexing into S.
To obtain the frequencies for each column, count is divided (with bsxfun) by the number of occurrences of N in each column.
The result in this example is
result =
0 0 0 NaN
0 0 0 NaN
0.6667 0.5000 1.0000 NaN
0.3333 0.5000 0 NaN
Note that the last column correctly contains NaNs because the frequency of the sought patterns is undefined for that column.

Count the number of the first zero elements

I would line to find the number of the first consecutive zero elements. For example in [0 0 1 -5 3 0] we have two zero consecutive elements that appear first in the vector.
could you please suggest a way without using for loops?
V=[0 0 1 -5 3 0] ;
k=find(V);
Number_of_first_zeros=k(1)-1;
Or,
Number_of_first_zeros=find(V,1,'first')-1;
To solve #The minion comment (if that was the purpose):
Number_of_first_zeros=find(V(find(~V,1,'first'):end),1,'first')-find(~V,1,'first');
Use a logical array to find the zeros and then look at where the zeros and ones are alternating.
V=[1 2 0 0 0 3 5123];
diff(V==0)
ans =
0 1 0 0 -1 0
Create sample data
V=[1 2 0 0 0 3 5123];
Find the zeros. The result will be a logical array where 1 represents the location of the zeros
D=V==0
D =
0 0 1 1 1 0 0
Take the difference of that array. 1 would then represent the start and -1 would represent the end.
T= diff(D)
ans =
0 1 0 0 -1 0
find(T==1) would give you the start and find(T==-1) would give you the end. The first index+1 of T==1 would be the start of the first set of zeros and the first index of T==-1 would be the end of the first set of zeros.
You could find position the first nonzero element using find.
I=find(A, 1);
The number of leading zeros is then I-1.
My solution is quite complex yet it doesn't use the loops and it does the trick. I am pretty sure, that there is a more direct approach.
Just in case no one else posts a working solution here my idea.
x=[1 2 4 0 20 0 10 1 23 45];
x1=find(x==0);
if numel(x1)>1
x2=[x1(2:end), 0];
x3=x2-x1;
y=find(x3~=1);
y(1)
elseif numel(x1)==1
display(1)
else
display('No zero found')
end
x is the dataset. x1 contains the index of all zero elements. x2 contains all those indices except the first one (because matrix dimensions must agree, one zero is added. x3 is the difference between the index and the previous index of zeros in your dataset. Now I find all those differences which are not 1 (do not correspond to sequences of zeros) and the first index (of this data is the required result. The if case is needed in case you have only one or no zero at all.
I'm assuming your question is the following: for the following vector [0 0 1 -5 3 0], I would like to find the index of the first element of a pair of 0 values. Is this correct? Therefore, the desired output for your vector would be '1'?
To extend the other answers to find any such pairs, not just 0 0 (eg. 0 1, 0 2, 3 4 etc), then this might help.
% define the pattern
ptrn = [ 0 0 ];
difference = ptrn(2) - ptrn(1)
V = [0 0 1 -5 3 0 0 2 3 4 0 0 1 0 0 0]
x = diff(V) == difference
indices = find(x)
indices =
1 6 11 14 15

Count non-zero entries in each column of a matrix

If I have a matrix:
A = [1 2 3 4 5; 1 1 6 1 2; 0 0 9 0 1]
A =
1 2 3 4 5
1 1 6 1 2
0 0 9 0 1
How can I count the number of non-zero entries for each column? For example the desired output for this matrix would be:
2, 2, 3, 2, 3
I am not sure how to do this as size, length or numel do not appear to meet the requirements. Perhaps it would be best to remove zero entries first?
It's simply
> A ~= 0
ans =
1 1 1 1 1
1 1 1 1 1
0 0 1 0 1
> sum(A ~= 0, 1)
ans =
2 2 3 2 3
Here's another solution I can suggest that isn't very speed worthy for dense matrices but quite fast for sparse matrices (thanks #user1877862!). This also would mimic how one might do this in a compiled language, like C or Java, and perhaps for research purposes too. First find the row and column locations that are non zero, then do a histogram on just the column locations to count the frequency of how often you see a non-zero in each column. In other words:
[~,col] = find(A ~= 0);
counts = histc(col, 1:size(A,2));
find outputs the row and column locations of where a matrix satisfies some Boolean condition inside the argument of the function. We ignore the first output as we aren't concerned with the row locations.
The output we get is:
counts =
2
2
3
2
3

Creating a variable with unequal rows

I want to create a variable that finds a pattern (let's say [1 1]) in different rows of a matrix (A). Of course there aren't an equal number of occurrences of this string in each row.
A = [ 0 0 0 1 1
1 1 1 0 0
0 1 0 1 1
1 1 1 0 0
0 1 0 0 1
1 0 1 1 1
0 1 0 1 0
1 1 1 0 1];
I could do:
for i = 1:n
var(i,:) = strfind(A(i,:),[1 1]);
end
but then both sides of the equation won't be equal.
ERROR: ??? Subscripted assignment dimension mismatch.
I try to preallocate. I create a matrix with what I think would be the maximum number of occurrences of this string in each row of matrix A (let's say 50).
for i = 1:n
var(i, :) = NaN(1,50)
end
That's followed by the previous bit of code and it's no good either.
I've also tried:
for i = 1:n
var(i,1:numel(strfind(A(i,:),[1 1])) = strfind(A(i,:),[1 1])
end
Error: The expression to the left of the equals sign is not a valid
target for an assignment.
How should I go about doing this?
The output I expect is a matrix var(i,:) that gives me the position in the matrix where each of these patterns occur. It works fine for just one row.
For example:
var(1,:) = [1 2 5 8 10 22 48]
var(2,:) = [2 3 4 7 34 45 NaN]
var(3,:) = [4 5 21 32 33 NaN]
Thanks!
In your first try: you tried to build a matrix with different length of rows.
In your second try: you pre-allocated, but then run it over by re-definning var(i,:), while you tried to put there your desired result.
In your third try: unfortunately you just missed one brackets- ) at the end of left expression.
This code suppose to work (what you did at 2nd and 3rd attempts, with pre-allocate and fixed brackets):
var=NaN(1,50);
for i = 1:n
var(i,1:numel(strfind(A(i,:),[1 1]))) = strfind(A(i,:),[1 1])
end