Extract sub-matrix based on conditions of specific columns in matlab - matlab

I want to select a sub-matrix x7992 based on conditions on certain columns of the original data matrix. Specifically, the original matrix is 23166-by-9, and the original Gauss code is
x7992 =selif(data,data[.,col_coh].==0 .and data[.,col_year].<=1992);
I rewrite this in matlab with
x7992 = data(data(:,col_coh)==0 & data(:,col_year)<=1992);
col_coh and col_year are predefined column numbers.
However, rather than giving me a sub-matrix, the line above only gives me a single column vector (23166-by-1). That is not what I want, and it is not the real result for this condition. How can I fix it? Thank you.
--- Update -----
The data matrix looks like this (I omit the other columns because only the first 3 columns are relevant to the selection); the first column is the id for individuals:
1 1979 0
1 1980 0
1 1981 1
1 1982 0
1 1983 1
2 1990 0
2 1991 0
2 1992 0
2 1993 1
3 1985 0
3 1986 0
3 1987 0
Based on the conditions, what I want is a sub-matrix of data that excludes the rows with a value > 1992 in the second column or a value of 1 in the third.

You only get a single column vector because you index data with one logical vector (23166x1). With a single index MATLAB uses linear indexing, so the mask only picks elements out of the first column.
To get the entire row of values, add a colon as the second index.
I've split the example into two lines to make it more readable.
condIdx = data(:,col_coh)==0 & data(:,col_year)<=1992;
x7992 = data( condIdx, :);
If you want only specific columns in the result, put the column numbers in a vector in place of the colon.
colsInResult = [1 2 3];
x7992 = data( condIdx, colsInResult);
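To see the difference on the small sample from the question, here is a minimal sketch. It assumes col_year = 2 and col_coh = 3, which matches the sample layout but is only a guess about the real 9-column data; the names wrong and right are just for illustration.

```matlab
% Sample rows from the question: [id, year, coh]
data = [1 1979 0; 1 1980 0; 1 1981 1; 1 1982 0; 1 1983 1; ...
        2 1990 0; 2 1991 0; 2 1992 0; 2 1993 1; ...
        3 1985 0; 3 1986 0; 3 1987 0];
col_year = 2;   % assumed column positions, taken from the sample layout
col_coh  = 3;

condIdx = data(:,col_coh)==0 & data(:,col_year)<=1992;

wrong = data(condIdx);      % single index: linear indexing, column 1 values only
right = data(condIdx, :);   % row mask plus colon: whole matching rows

disp(size(wrong));   % 9-by-1
disp(size(right));   % 9-by-3
```

The single-index version silently returns the ids of the matching rows, which is why the output looked like a plausible but wrong vector.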

Based on the example that you have given, the following will do this:
data(data(:,2)<=1992 & data(:,3)~=1,:)
which gives this output:
1 1979 0
1 1980 0
1 1982 0
2 1990 0
2 1991 0
2 1992 0
3 1985 0
3 1986 0
3 1987 0

Related

Sum second column elements based on first column elements - MATLAB

I have two matrices, each with more than 1000 rows and two columns.
Every time the first column is '0', the second has a value, and every time the first column is '1', the second is zero.
Example:
M = [0 23;0 35;1 0;1 0;0 2;1 0]
M =
0 23
0 35
1 0
1 0
0 2
1 0
Think of the second column as a non-periodic cycle.
What I want is, every time the first column is 0 (until it is 1 again), to be able to analyse the size and sum of the second column. In the end I want to know which cycle is biggest in size and in sum. (In the example matrix, the expected output is that the first cycle is the biggest, with a sum of 58.)
Assuming A as the input two-column array, here's one approach with accumarray -
%// Create ID array for using with accumarray
id = cumsum([1;diff(A(:,1))~=0]);
%// Get summations and counts for all IDs
sums = accumarray(id,A(:,2));
counts = accumarray(id,1);
%// Get offset in case the starting element in first column is not 0
offset = (A(1,1)~=0)+1;
%// Keep every second ID starting at offset, i.e. only the runs of 0s in the first column
sums = sums(offset:2:end)
counts = counts(offset:2:end)
Sample run -
A =
1 34
1 45
0 23
0 35
1 0
1 0
0 2
0 8
0 6
1 9
sums =
58
16
counts =
2
3
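The question also asks which cycle is the biggest. A minimal sketch of that last step, reusing the sample run above and applying max to the two vectors (the names maxSum, kSum, maxCount and kCount are made up for this example):

```matlab
% Sample input from the run above
A = [1 34; 1 45; 0 23; 0 35; 1 0; 1 0; 0 2; 0 8; 0 6; 1 9];

id     = cumsum([1; diff(A(:,1))~=0]);   % run label for every row
sums   = accumarray(id, A(:,2));         % total of column 2 per run
counts = accumarray(id, 1);              % number of rows per run
offset = (A(1,1)~=0) + 1;                % first run of 0s is run 1 or run 2
sums   = sums(offset:2:end);             % keep only the runs of 0s
counts = counts(offset:2:end);

[maxSum, kSum]     = max(sums);     % largest total and which 0-cycle has it
[maxCount, kCount] = max(counts);   % longest 0-cycle and which one it is
fprintf('largest sum %d in cycle %d, longest cycle has %d rows (cycle %d)\n', ...
        maxSum, kSum, maxCount, kCount);
```

Note that max returns the first maximum on ties, so equal cycles would need extra handling.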

MatLab - Cellfun where func = strcmp Find where str changes in a cell array

I have a cell array of strings, and I want to detect the number of times the string changes and get the indices of the changes. Given MATLAB's cellfun function, I am trying to use it instead of looping. Here is all the code. I appreciate your time, feedback, and comments.
% Cell Array Example
names(1:10)={'OFF'};
names(11:15)={'J1 - 1'};
names(16:22)={'J1 - 2'};
names(23:27)={'J2 - 1'};
names(28)={'Off'};
names=names';
% My cellfun code
cellfun(@(x,y) strcmp(x,y), names(1:2:end),names(2:2:end));
My expected result is a vector of length 27 (length(names)-1), where there are 4 zeros in the vector indicating the strcmp func found 4 cases where the comparison was not equal.
The actual result is a vector of length 14 and has only 2 zeros. I'd really appreciate an explanation, why this unexpected result is occurring.
Thank You
The answer provided by Matt correctly shows the issue with your code. However, you can use strcmp directly, because it accepts two cell arrays of strings as input:
>> strcmp(names(1:end-1), names(2:end))
ans =
Columns 1 through 14
1 1 1 1 1 1 1 1 1 0 1 1 1 1
Columns 15 through 27
0 1 1 1 1 1 1 0 1 1 1 1 0
You could transform the strings into numeric labels using unique, and then apply diff to detect changes:
[~, ~, u] = unique(names);
result = ~diff(u);
If I understand your question correctly, you should be comparing names(1:end-1) with names(2:end). That is, compare string 1 with string 2, compare string 2 with string 3, and so on. You are instead using a stride of 2, comparing string 1 with string 2, string 3 with string 4, and so on. You can fix this by changing your last line to:
cellfun(@(x,y) strcmp(x,y), names(1:end-1),names(2:end))
The result is then:
Columns 1 through 20:
1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1
Columns 21 through 27:
1 0 1 1 1 1 0
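The question also asks for the number of changes and their indices. A short sketch building on the direct strcmp call; the names same, changeIdx and numChanges are made up for this example:

```matlab
% Cell array from the question, built directly as a column
names = cell(28,1);
names(1:10)  = {'OFF'};
names(11:15) = {'J1 - 1'};
names(16:22) = {'J1 - 2'};
names(23:27) = {'J2 - 1'};
names(28)    = {'Off'};

same       = strcmp(names(1:end-1), names(2:end));  % 27 comparisons of neighbours
changeIdx  = find(~same);       % k means names{k} differs from names{k+1}
numChanges = numel(changeIdx);

disp(changeIdx.');   % 10 15 22 27
disp(numChanges);    % 4
```

strcmp is case-sensitive, so 'OFF' and 'Off' would count as different strings if they were ever neighbours.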

How to count patterns columnwise in Matlab?

I have a matrix S in Matlab that looks like the following:
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1
I would like to count patterns of values column-wise. I am interested in the frequency of the numbers that follow right after the number 3 in any of the columns. For instance, the number 3 occurs three times in the first column. The first time we observe it, it is followed by 3, the second time it is followed by 3 again, and the third time it is followed by 4. Thus, the frequencies for the patterns observed in the first column would look like:
3-3: 66.66%
3-4: 33.33%
3-1: 0%
3-2: 0%
To generate the output, you could use the convenient tabulate
S = [
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1];
idx = find(S(1:end-1,:)==3);
S2 = S(2:end,:);
tabulate(S2(idx))
Value Count Percent
1 0 0.00%
2 0 0.00%
3 4 66.67%
4 2 33.33%
Here's one approach, finding the 3's then looking at the following digits
[i,j]=find(S==3);
k=i+1<=size(S,1);
T=S(sub2ind(size(S),i(k)+1,j(k))) %// the elements of S that are just below a 3
R=arrayfun(@(x) sum(T==x)./sum(k),1:max(S(:))).' %// get the relative frequency of each digit
I'm going to restate your problem statement in a way that I can understand and my solution will reflect this new problem statement.
For a particular column, locate the locations that contain the number 3.
Look at the row immediately below these locations and look at the values at these locations
Take these values and tally up the total number of occurrences found.
Repeat these for all of the columns and update the tally, then determine the percentage of occurrences for the values.
We can do this by the following:
A = [2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]; %// Define your matrix
[row,col] = find(A(1:end-1,:) == 3);
vals = A(sub2ind(size(A), row+1, col));
h = 100*accumarray(vals, 1) / numel(vals)
h =
0
0
66.6667
33.3333
Let's go through the above code slowly. The first few lines define your example matrix A. Next, we take a look at all of the rows except for the last row of your matrix and see where the number 3 is located with find. We skip the last row because we want to be sure we stay within the bounds of your matrix. If there were a 3 in the last row, looking one row below it would index past the end of the matrix and raise an error, because there's nothing there!
Once we do this, we take a look at those values in the matrix that are 1 row beneath those that have the number 3. We use sub2ind to help us facilitate this. Next, we use these values and tally them up using accumarray then normalize them by the total sum of the tallying into percentages.
The result would be a 4 element array that displays the percentages encountered per number.
To double check, if we look at the matrix and note the position of each leading 3, we see that a 3 is followed by another 3 a total of 4 times: first column, rows 3 and 4; second column, row 2; and third column, row 6. A 3 is followed by a 4 two times: first column, row 5, and second column, row 3.
In total, we have 6 numbers we counted, and so dividing by 6 gives us 4/6 or 66.67% for number 3 and 2/6 or 33.33% for number 4.
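One detail worth a hedged note: accumarray(vals, 1) sizes its output by the largest value in vals, so the result is shorter than 4 elements whenever the largest label never follows the searched number. A sketch that pins the length with accumarray's third (size) argument; the names targets, nLabels and H are made up for this example:

```matlab
A = [2 2 1 2; 2 3 1 1; 3 3 1 1; 3 4 1 1; 3 1 2 1; 4 1 3 1; 1 1 3 1];
targets = [3 4];            % values whose followers we tally
nLabels = max(A(:));        % fixed output length, here 4
H = zeros(nLabels, numel(targets));

for k = 1:numel(targets)
    [row, col] = find(A(1:end-1,:) == targets(k));   % skip the last row
    vals = A(sub2ind(size(A), row+1, col));          % values one row below
    % the size argument keeps the tally at nLabels rows even if large labels are absent
    H(:,k) = 100 * accumarray(vals, 1, [nLabels 1]) / numel(vals);
end
disp(H);
```

For the value 4, only the label 1 ever follows, so without the size argument the tally would collapse to a single element instead of a 4-element column.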
If I got the problem statement correctly, you could efficiently implement this with MATLAB's logical indexing and an approach that is essentially of two lines -
%// Input 2D matrix
S = [
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]
Labels = [1:4]'; %//'# Label array
counts = histc(S([false(1,size(S,2)) ; S(1:end-1,:) == 3]),Labels)
Percentages = 100*counts./sum(counts)
Verify/Present results
The styles for presenting the output results listed next use MATLAB's table for a human-readable format of the data.
Style #1
>> table(Labels,Percentages)
ans =
Labels Percentages
______ ___________
1 0
2 0
3 66.667
4 33.333
Style #2
You can do some fancy string operations to present the results in a more "representative" manner -
>> Labels_3 = strcat('3-',cellstr(num2str(Labels','%1d')'));
>> table(Labels_3,Percentages)
ans =
Labels_3 Percentages
________ ___________
'3-1' 0
'3-2' 0
'3-3' 66.667
'3-4' 33.333
Style #3
If you want to present them in descending sorted manner based on the percentages as listed in the expected output section of the question, you can do so with an additional step using sort -
>> [Percentages,idx] = sort(Percentages,'descend');
>> Labels_3 = strcat('3-',cellstr(num2str(Labels(idx)','%1d')'));
>> table(Labels_3,Percentages)
ans =
Labels_3 Percentages
________ ___________
'3-3' 66.667
'3-4' 33.333
'3-1' 0
'3-2' 0
Bonus Stuff: Finding frequency (counts) for all cases
Now, let's suppose you would like to repeat this process for, say, 1, 2 and 4 as well, i.e. find occurrences after 1, 2 and 4 respectively. In that case, you can iterate the above steps over all cases, and for that you can use arrayfun -
%// Get counts
C = cell2mat(arrayfun(@(n) histc(S([false(1,size(S,2)) ; S(1:end-1,:) == n]),...
1:4),1:4,'Uni',0))
%// Get percentages
Percentages = 100*bsxfun(@rdivide, C, sum(C,1))
Giving us -
Percentages =
90.9091 20.0000 0 100.0000
9.0909 20.0000 0 0
0 60.0000 66.6667 0
0 0 33.3333 0
Thus, in Percentages, the first column holds the percentages of [1,2,3,4] that occur right after a 1 somewhere in the input matrix. As an example, one can see that column 3 of Percentages is what you had in the sample output when looking for elements right after 3 in the input matrix.
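As a cross-check of the all-cases table, here is a sketch that builds the same counts in one accumarray call instead of histc (which recent MATLAB documentation lists as not recommended). This is a swapped-in technique rather than the method above, and the names from, to and nLabels are made up for this example:

```matlab
S = [2 2 1 2; 2 3 1 1; 3 3 1 1; 3 4 1 1; 3 1 2 1; 4 1 3 1; 1 1 3 1];
nLabels = max(S(:));

from = S(1:end-1, :);   % every value that has a value below it
to   = S(2:end, :);     % the value directly below it

% C(i,j) = number of times value i appears right below value j
C = accumarray([to(:), from(:)], 1, [nLabels nLabels]);

% Normalise each column so it sums to 100
Percentages = 100 * bsxfun(@rdivide, C, sum(C, 1));
disp(Percentages);
```

A value that never has anything below it would give a zero column sum and therefore NaNs in that column; that case does not arise in this sample.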
If you want to compute frequencies independently for each column:
S = [2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]; %// data: matrix
N = 3; %// data: number
r = max(S(:));
[R, C] = size(S);
[ii, jj] = find(S(1:end-1,:)==N); %// step 1
count = full(sparse(S(ii+1+(jj-1)*R), jj, 1, r, C)); %// step 2
result = bsxfun(@rdivide, count, sum(S(1:end-1,:)==N)); %// step 3
This works as follows:
find is first applied to determine row and col indices of occurrences of N in S except its last row.
The values in the entries right below the indices of step 1 are accumulated for each column, in variable count. The very convenient sparse function is used for this purpose. Note that this uses linear indexing into S.
To obtain the frequencies for each column, count is divided (with bsxfun) by the number of occurrences of N in each column.
The result in this example is
result =
0 0 0 NaN
0 0 0 NaN
0.6667 0.5000 1.0000 NaN
0.3333 0.5000 0 NaN
Note that the last column correctly contains NaNs because the frequency of the sought patterns is undefined for that column.
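For readers who find the sparse and linear-indexing version hard to follow, a plain loop that should give the same per-column frequencies can serve as a sanity check. This is only a sketch; the loop variables rows and followers are made up for this example:

```matlab
S = [2 2 1 2; 2 3 1 1; 3 3 1 1; 3 4 1 1; 3 1 2 1; 4 1 3 1; 1 1 3 1];
N = 3;                      % number whose followers we count
r = max(S(:));
nCols = size(S, 2);
result = nan(r, nCols);     % NaN marks columns where the frequency is undefined

for c = 1:nCols
    rows = find(S(1:end-1, c) == N);    % occurrences of N, last row excluded
    if isempty(rows)
        continue;                       % no N in this column: leave NaN
    end
    followers = S(rows + 1, c);         % values right below each N
    result(:, c) = accumarray(followers, 1, [r 1]) / numel(rows);
end
disp(result);
```

The loop is slower on large inputs but makes the three steps (find, look one row down, normalise per column) explicit.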

MATLAB: how to compare two matrices to get the indices of the elements that differ from one to another

I'm using MATLAB with very big, similar multidimensional matrices and I'd like to find the differences between them.
The two matrices have the same size.
Here is an example:
A(:,:,1) =
1 1 1
1 1 1
1 1 1
A(:,:,2) =
1 1 1
1 1 1
1 1 1
A(:,:,3) =
1 1 1
1 1 1
1 1 1
B(:,:,1) =
1 1 99
1 1 99
1 1 1
B(:,:,2) =
1 1 1
1 1 1
1 1 1
B(:,:,3) =
1 1 99
1 1 1
1 1 1
I need a function that gives me the indices of the values that differ; in this example this would be:
output =
1 3 1
1 3 3
2 3 1
I know that I can use functions like find(B~=A) or find(~ismember(B, A)), but I don't know how to turn their output into the indices I want.
Thank you all!
You almost have it correct! Remember that find finds column major indices of where in your matrix (or vector) the Boolean condition you want to check for is being satisfied. If you want the actual row/column/slice locations, you need to use ind2sub. You would call it this way:
%// To reproduce your problem
A = ones(3,3,3);
B = ones(3,3,3);
B(7:8) = 99;
B(25) = 99;
%// This is what you call
[row,col,dim] = ind2sub(size(A), find(A ~= B));
The first parameter to ind2sub is the size of the matrix you're searching. Since A and B have the same dimensions, we can use either one, and size gives us those dimensions. The second input is the set of column-major indices at which we want to access the matrix. These are simply the result of find.
row, col, and dim will give you the rows, columns and slices of which elements in your 3D matrix were not equal. Also note that these will be column vectors, as the output of find will produce a column vector of column-major indices. As such, we can concatenate each of the column vectors into a single matrix and display your information. Therefore:
locations = [row col dim];
disp(locations);
1 3 1
2 3 1
1 3 3
As such, the first column of this matrix tells you the row locations of where the matrix values are unequal, the second column of this matrix tells you the column locations of where the matrix values are unequal, and finally the third column tells you the slices of where the matrix values are unequal. Therefore, we have three points in this matrix that are unequal, which are located at (1,3,1), (2,3,1) and (1,3,3) respectively. Note that this is unsorted due to the nature of find as it searches amongst the columns of your matrix first. If you want to have this sorted like you have in your example output, use sortrows. If we do this, we get:
sortrows(locations)
ans =
1 3 1
1 3 3
2 3 1
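If the number of dimensions is not known in advance, the three named outputs can be replaced by a cell array that collects one subscript vector per dimension. A minimal sketch on the same example; the name subs is made up for this example:

```matlab
% Same test data as above
A = ones(3,3,3);
B = A;
B([7 8 25]) = 99;

subs = cell(1, ndims(A));                      % one output slot per dimension
[subs{:}] = ind2sub(size(A), find(A ~= B));    % fill all slots at once
locations = sortrows([subs{:}]);               % one row per differing element

disp(locations);
```

Each entry of subs is a column vector, so concatenating them side by side gives one row of subscripts per mismatch, whatever the dimensionality of A and B.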

Resorting rows in matrix in non-decreasing order based on the entries of an arbitrary column

I am working with nx2 matrices in Matlab, and what I'm trying to do is fairly simple in principle. I randomly generate a square matrix, I run it through a series of functions and I get an mx2 matrix. I use the unique function on the rows to get rid of repeated rows and I end up with an nx2 matrix. What I'm having trouble doing is further reducing this matrix so that for all entries in the first column that have the exact same entry, only keep the row with the highest number on the second column.
I was using a loop to check the ith and (i+1)th entries of the first column and store the rows with the highest value in the second column, but I am trying to avoid for-loops as much as possible.
If anyone has an idea or suggestion please let me know!
Example:
A =
0 0
0 1
0 3
1 0
1 0
0 0
2 1
1 1
2 4
unique(A, 'rows') =
0 0
0 1
0 3
1 0
1 1
2 1
2 4
What I want =
0 3
1 1
2 4
What you are looking for is:
[u,~,n] = unique(A(:,1));
B = [u, accumarray(n, A(:,2), [], @max)];
I don't exactly understand your problem description, but it sounds like sortrows() may be of some help to you.
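Both suggestions can be tried on the example from the question. The sketch below runs the accumarray version and one possible sortrows-based variant; the second is my reading of how sortrows could be used here, not something spelled out in the answer, and the names sorted, firstIdx and B2 are made up for this example:

```matlab
A = [0 0; 0 1; 0 3; 1 0; 1 0; 0 0; 2 1; 1 1; 2 4];

% accumarray approach: group column 2 by the unique keys of column 1, keep the max
[u, ~, n] = unique(A(:,1));
B = [u, accumarray(n, A(:,2), [], @max)];

% sortrows variant: column 1 ascending, column 2 descending,
% then keep the first row of every key
sorted = sortrows(A, [1 -2]);
[~, firstIdx] = unique(sorted(:,1), 'first');
B2 = sorted(firstIdx, :);

disp(B);    % rows: 0 3, 1 1, 2 4
```

Neither version needs the earlier unique(A, 'rows') step, since duplicate rows do not change the maximum per key.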