Deleting rows from a Matlab cell matrix which match a given pattern - matlab

I have a cell matrix with two columns (no headers). Column one contains ticker symbols, e.g. AAPL, GS, etc. Column two contains either 0 or 1.
How can I delete all rows that contain '1' in column 2? Then how can I get an output of the remaining ticker symbols separately in a different m file?
Help please!

Does this do what you need?
>> a = {'AAPL', 1; 'MSFT', 0; 'GOOG' 1; 'IBM', 0} % Make some data like the OP's
a =
'AAPL' [1]
'MSFT' [0]
'GOOG' [1]
'IBM' [0]
>> toDelete = cell2mat(a(:,2)) == 1; % Extract which rows have a 1 in column 2
>> a(toDelete,:) = []; % Delete those rows
>> remainingTickers = a(:,1) % Extract column 1 from the remaining rows
remainingTickers =
'MSFT'
'IBM'
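If every flag in column 2 is a numeric scalar, the same selection can also be written without cell2mat, by building a mask of the rows to keep rather than deleting in place. This is only a sketch under that assumption, and the name keep is introduced here for illustration. To use the result from a different m-file, one option is to wrap these lines in a function that returns remainingTickers.

```matlab
% Sample data in the same shape as the question (ticker, 0/1 flag).
a = {'AAPL', 1; 'MSFT', 0; 'GOOG', 1; 'IBM', 0};

% [a{:,2}] concatenates the scalar flags into a 1-by-4 numeric row vector.
keep = [a{:,2}] ~= 1;

% Logical indexing keeps only the rows whose flag is not 1.
remainingTickers = a(keep, 1);
```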

Related

MATLAB For Loop to count #s in one column based off of entries in another

I have 50 spreadsheets with multiple scored columns:
One column (AG) has numbers coded 1:13, the other, (SEC) has numbers coded 1:6.
Ex:
AG SEC
1 1
2 1
4 1
13 1
3 2
12 2
I want to write a for loop that counts all the 1s in .SEC that correspond to #s 1:5 in .AG (the output would be 3; it wouldn't count the 1 corresponding to 13). I need this to happen for all #s in .SEC (1:6). The final output would have the spreadsheet name in the first column, and the counts for .SEC=1,2,3,4,5,6 in each of the following columns.
My current code creates a variable for total .AG counts in .SEC, but is nondiscriminatory (counts the amount of times any number is given in .AG instead of counting for specific values)
scoringfiles is a 50-item path list (when I do readtable(scoringfiles) it iterates through the list and reads the Excel files). filelist is a 50-item list with just the filenames.
for i=1:length(scoringfiles)
if contains(filelist(i,:),"sheet")
disp(i)
sheetnum=[sheetnum extractBetween(filelist{i},1,4)]
s1=[s1 length(find(readtable(scoringfiles(i,:)).SEC==1))]
s2=[s2 length(find(readtable(scoringfiles(i,:)).SEC==2))]
s3=[s3 length(find(readtable(scoringfiles(i,:)).SEC==3))]
s4=[s4 length(find(readtable(scoringfiles(i,:)).SEC==4))]
s5=[s5 length(find(readtable(scoringfiles(i,:)).SEC==5))]
s6=[s6 length(find(readtable(scoringfiles(i,:)).SEC==6))]
elseif contains(filelist(i,:),"graph")
disp("not sheet")
end
end
In MATLAB, i and j are the imaginary unit. To avoid redefining it, you should make a habit of using ii and jj as your loop variable instead of i and j.
Now back to the main question:
Let's assume you've read the file contents into the data variable. This is going to be an N-by-2 array, with AG in column 1 and SEC in column 2.
You only care about AG when it is in the range 1:5. Let's create a filter array with true where AG is in the range and false elsewhere.
filter = data(:, 1) >= 1 & data(:, 1) <= 5;
Let's first split the columns into two variables for legibility. Use the filter to select just the rows that match our criteria.
ag = data(filter, 1);
sec = data(filter, 2);
Now you want to go through each unique value in sec, and count the number of ag entries.
unique_sec = unique(sec);
counts = zeros(size(unique_sec)); % Preallocate a zeros array to save our answer in
for ii = 1:length(unique_sec)
sec_value = unique_sec(ii); % Get the value of SEC
matches = sec == sec_value; % Make a filter for rows that have this value
% matches is a logical array. true = 1, false = 0. sum gives number of trues.
counts(ii) = sum(matches);
end
Alternatively, you could perform the filter for 1 <= AG <= 5 inside the loop if you don't want to filter before:
ag = data(:, 1);
sec = data(:, 2);
unique_sec = unique(sec);
counts = zeros(size(unique_sec));
for ii = 1:length(unique_sec)
sec_value = unique_sec(ii);
matches = sec == sec_value & ag >= 1 & ag <= 5; % Add more conditions to the filter
counts(ii) = sum(matches);
end
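Note that unique(sec) only returns the codes that actually occur, so the length of counts can differ from file to file. If a fixed slot per SEC code 1..6 is wanted, accumarray is one possible alternative to the loop. The sketch below assumes the SEC codes are positive integers no larger than 6, and uses the example data from the question; inRange is a name introduced here.

```matlab
% Example data from the question: column 1 is AG, column 2 is SEC.
data = [1 1; 2 1; 4 1; 13 1; 3 2; 12 2];

ag  = data(:, 1);
sec = data(:, 2);
inRange = ag >= 1 & ag <= 5;          % rows where AG is in 1:5

% Count occurrences of each SEC code among the in-range rows.
% The [6 1] size argument forces one slot per code 1..6, so codes that
% never occur get a count of 0 instead of being dropped.
counts = accumarray(sec(inRange), 1, [6 1]);
```

For this data, counts(1) is 3, matching the result the question expects for SEC = 1.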
If you want to do this for multiple files, iterate over them and read the files into the data variable.
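A minimal sketch of that outer loop follows. To keep it self-contained, the file contents are represented by a hypothetical cell array datasets of N-by-2 matrices; in practice each matrix would come from reading one spreadsheet once and extracting its AG and SEC columns. The second matrix is made-up data for illustration.

```matlab
% Stand-ins for the contents of two files (hypothetical data).
datasets = { [1 1; 2 1; 4 1; 13 1; 3 2; 12 2], ...
             [5 6; 6 6; 2 3] };

numCodes = 6;                                  % SEC codes 1..6
counts = zeros(numel(datasets), numCodes);     % one row per file

for ii = 1:numel(datasets)
    data = datasets{ii};                       % one read per file
    inRange = data(:, 1) >= 1 & data(:, 1) <= 5;
    for code = 1:numCodes
        % Rows with this SEC code whose AG value is in 1:5
        counts(ii, code) = sum(data(:, 2) == code & inRange);
    end
end
```

Each row of counts then holds the six per-code totals for one file, which can be paired with the file name for the final table.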
I figured out how to apply a filter thanks to the help of Pranav. It is as simple as adding the filter to each line of the for loop as it iterates through reading my spreadsheets. See below:
THIS EXAMPLE ONLY LOOKS AT S1 and S2. Realistically, I have this for 6 different #s creating 6 tables with counts per spreadsheet.
for i=1:length(scoringfiles)
filter1 = readtable(scoringfiles(i,:)).AG >= 1;
filter2 = readtable(scoringfiles(i,:)).AG <= 5;
if contains(filelist(i,:),"sheet")
disp(i)
sheetnum=[sheetnum extractBetween(filelist{i},1,4)]
s1=[s1 length(find(readtable(scoringfiles(i,:)).SEC==1 & filter1 & filter2))]
s2=[s2 length(find(readtable(scoringfiles(i,:)).SEC==2 & filter1 & filter2))]
elseif contains(filelist(i,:),"graph")
disp("not sheet")
end
end

Remove zeros column and rows from a matrix matlab

I would like to remove some columns and rows from a big matrix. Those are the columns and the rows which have all zeros values. Is there any function in MATLAB that can do it for you quite fast? My matrices are sparse. I am doing this way:
% To remove all zero columns from A
ind = find(sum(A,1)==0) ;
A(:,ind) = [] ;
% To remove all zeros rows from A
ind = find(sum(A,2)==0) ;
A(ind,:) = [] ;
It would be nice to have a line of code for this as I may do this kind of task repeatedly. Thanks
A single line of code would be:
A=A(any(A,2),any(A,1))
There is no need to use find like you did, you can directly index using logical vectors. The any function finds the rows or columns with any non-zero elements.
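As a small illustration, the sketch below uses a made-up full matrix whose first row sums to zero without being all zeros. It shows that any keeps that row, whereas a test based on sum(A,2)==0 would flag it for removal. The variable names are chosen here for the example.

```matlab
A = [1 -1 0;
     0  0 0;
     2  0 0];

rowsToKeep = any(A, 2);    % true for rows with at least one nonzero entry
colsToKeep = any(A, 1);    % true for columns with at least one nonzero entry
B = A(rowsToKeep, colsToKeep);

% For comparison: the first row sums to zero although it is not all zeros.
rowSums = sum(A, 2);
```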
1 Dimension:
I'll first show a simpler example based on another duplicate question, which asked how to remove only the rows that contain all zeros.
Given the matrix A=[1,2;0,0];
To remove the rows of 0, you can:
sum the absolute value of each row (to avoid having a zero sum from a mix of negative and positive numbers), which gives you a column vector of the row sums.
keep the index of each line where the sum is non-zero.
in code:
A=[1,2;0,0];
% sum each row of the matrix, then find rows with non-zero sum
idx_nonzerolines = sum(abs(A),2)>0 ;
% Create matrix B containing only the non-zero lines of A
B = A(idx_nonzerolines,:) ;
will output:
>> idx_nonzerolines = sum(abs(A),2)>0
idx_nonzerolines =
1
0
>> B = A(idx_nonzerolines,:)
B =
1 2
2 Dimensions:
The same method can be used for 2 dimensions:
A=[ 1,2,0,4;
0,0,0,0;
1,3,0,5];
idx2keep_columns = sum(abs(A),1)>0 ;
idx2keep_rows = sum(abs(A),2)>0 ;
B = A(idx2keep_rows,idx2keep_columns) ;
outputs:
>> B = A(idx2keep_rows,idx2keep_columns)
B =
1 2 4
1 3 5
Thanks to @Adriaan in comments for spotting the edge case ;)

How to compare columns of a binary matrix and compare elements in matlab?

I have a [sentences*words] matrix as shown below
out = 0 1 1 0 1
1 1 0 0 1
1 0 1 1 0
0 0 0 1 0
I want to process this matrix so that it tells me that W1 & W2 occur with the same value in "sentence number 2" and "sentence number 4", i.e. 1 1 and 0 0. The output should be as follows:
output{1,2}= 2 4
output{1,2} tells that word numbers 1 and 2 occur in sentence numbers 2 and 4 with the same values.
After comparing W1 & W2, the next candidate should be W1 & W3, which occur with the same value in sentence 3 & sentence 4:
output{1,3}= 3 4
and so on, until every word has been compared with every other word and the results saved.
This would be one vectorized approach -
%// Get number of columns in input array for later usage
N = size(out,2);
%// Get indices for pairwise combinations between columns of input array
[idx2,idx1] = find(bsxfun(@gt,[1:N]',[1:N])); %//'
%// Get indices for matches between the paired columns out(:,idx1) and
%// out(:,idx2). The row indices represent the occurrence values for the
%// final output, and the column indices identify the pair they belong to.
[R,C] = find(out(:,idx1) == out(:,idx2))
%// Form cells off each unique C (these will be final output values)
output_vals = accumarray(C(:),R(:),[],@(x) {x})
%// Setup output cell array
output = cell(N,N)
%// Indices for places in output cell array where occurrence values are to be put
all_idx = sub2ind(size(output),idx1,idx2)
%// Finally store the output values at appropriate indices
output(all_idx(1:max(C))) = output_vals
You can get a logical matrix of size #words-by-#words-by-#sentences easily using bsxfun:
coc = bsxfun( @eq, permute( out, [3 2 1]), permute( out, [2 3 1] ) );
In this logical array, coc( wi, wj, si ) is true iff word wi and word wj occur in sentence si with the same value.
To get the output cell array from coc you need
nw = size( out, 2 ); %// number of words
output = cell(nw,nw);
for wi = 1:(nw-1)
for wj = (wi+1):nw
output{wi,wj} = find( coc(wi,wj,:) );
output{wj,wi} = output{wi,wj}; %// you can force it to be symmetric if you want
end
end
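As a sanity check on the expected results, here is a plain pairwise version that compares two columns at a time, run on the matrix from the question. It is a sketch rather than a replacement for the vectorized answers. The pairs variable and the use of nchoosek to enumerate word pairs are choices made here for illustration.

```matlab
out = [0 1 1 0 1;
       1 1 0 0 1;
       1 0 1 1 0;
       0 0 0 1 0];

nw = size(out, 2);                % number of words (columns)
pairs = nchoosek(1:nw, 2);        % each row is one (wi, wj) pair with wi < wj
output = cell(nw, nw);
for k = 1:size(pairs, 1)
    wi = pairs(k, 1);
    wj = pairs(k, 2);
    % Sentences (rows) in which the two words have the same value
    output{wi, wj} = find(out(:, wi) == out(:, wj));
end
```

With this data, output{1,2} contains 2 and 4, and output{1,3} contains 3 and 4, as the question requires.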

Exporting blank values into a .txt file - MATLAB

I'm currently trying to export multiple matrices of unequal lengths into a delimited .txt file thus I have been padding the shorter matrices with 0's such that dlmwrite can use horzcat without error:
dlmwrite(filename{1},[a,b],'delimiter','\t')
However ideally I do not want the zeroes to appear in the .txt file itself - but rather the entries are left blank.
Currently the .txt file looks like this:
55875 3.1043e+05
56807 3.3361e+05
57760 3.8235e+05
58823 4.2869e+05
59913 4.3349e+05
60887 0
61825 0
62785 0
63942 0
65159 0
66304 0
67509 0
68683 0
69736 0
70782 0
But I want it to look like this:
55875 3.1043e+05
56807 3.3361e+05
57760 3.8235e+05
58823 4.2869e+05
59913 4.3349e+05
60887
61825
62785
63942
65159
66304
67509
68683
69736
70782
Is there any way I can do this? Is there an alternative to dlmwrite which will mean I do not need to have matrices of equal lengths?
If a is always longer than b you could split vector a into two vectors of same length as vector b and the rest:
a = [1 2 3 4 5 6 7 8]';
b = [9 8 7 ]';
len = numel(b);
dlmwrite( 'foobar.txt', [a(1:len), b ], 'delimiter', '\t' );
dlmwrite( 'foobar.txt', a(len+1:end), 'delimiter', '\t', '-append');
You can read in the numeric data and convert to string and then add proper whitespaces to have the final output as string based cell array, which you can easily write into the output text file.
Stage 1: Get the cell of strings corresponding to the numeric data from column vector inputs a, b, c and so on -
%// Concatenate all arrays into a cell array with numeric data
A = [{a} {b} {c}] %// Edit this to add more columns
%// Create a "regular" 2D shaped cell array to store the cells from A
lens = cellfun('length',A)
max_lens = max(lens)
A_reg = cell(max_lens,numel(lens))
A_reg(:) = {''}
A_reg(bsxfun(@le,[1:max_lens]',lens)) = cellstr(num2str(vertcat(A{:}))) %//'
%// Create a char array that has string data from input arrays as strings
wsp = repmat({' '},max_lens,1) %// Create whitespace cell array
out_char = [];
for iter = 1:numel(A)
out_char = [out_char char(A_reg(:,iter)) char(wsp)]
end
out_cell = cellstr(out_char)
Stage 2: Now that you have out_cell as the cell array holding the strings to be written to the text file, you have two options for the writing operation itself.
Option 1 -
dlmwrite('results.txt',out_cell(:),'delimiter','')
Option 2 -
outfile = 'results.txt';
fid = fopen(outfile,'w');
for row = 1:numel(out_cell)
fprintf(fid,'%s\n',out_cell{row});
end
fclose(fid);
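For the two-column case in the question, the fprintf route can also be used without padding at all, by building each line as a string and leaving a field empty once the shorter vector runs out. The sketch below makes several assumptions: the data values are shortened stand-ins, the output path under tempdir is hypothetical, and the '%g' format (which keeps about six significant digits) is just one possible choice of number format.

```matlab
a = [55875; 56807; 57760];        % longer column
b = [310430; 333610];             % shorter column, no zero padding needed

n = max(numel(a), numel(b));
rows = cell(n, 1);
for k = 1:n
    sa = '';
    sb = '';
    if k <= numel(a), sa = sprintf('%g', a(k)); end
    if k <= numel(b), sb = sprintf('%g', b(k)); end
    rows{k} = [sa, sprintf('\t'), sb];   % tab-delimited; blank if missing
end

outfile = fullfile(tempdir, 'blank_entries_example.txt');  % hypothetical path
fid = fopen(outfile, 'w');
fprintf(fid, '%s\n', rows{:});    % one line per entry of rows
fclose(fid);
```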

cell to matrix matching / map / cellOperations (MATLAB)

I cannot find the string equivalent of the finalAnswer using the data below. Please, I cannot use if/for loops! A final answer is preferred with each element as an array (i.e. the format of mainData)
mainData = {'IBM' [201] [1] ;
'GE' [403] [1] ;
'MSFT' [502] [3] ;
'GM' [101] [2] } ;
finalAns = [ 101 2 0.5; 403 1 0.6 ] ;
%% I tried doing this ->
temp = cell2mat(mainData(:,[2 3])) ;
tf = ismember(temp, finalAns(:,[1 2]), 'rows') ;
secIDs = mainData(tf) ;
In order to get the entries in each row of mainData that match those in finalAns (based on the last two columns of mainData and the first two columns of finalAns) and to get them in the same order that they appear in finalAns and with the last column of finalAns appended, you can do this:
>> temp = cell2mat(mainData(:,2:3));
>> [isThere,index] = ismember(finalAns(:,1:2),temp,'rows');
>> output = [mainData(index(isThere),:) num2cell(finalAns(isThere,3))]
output =
'GM' [101] [2] [0.5000]
'GE' [403] [1] [0.6000]
The output is a 2-by-4 cell array with each value in a separate cell. If you want the last three columns to be collected in a vector, you can replace the calculation of output above with this:
>> temp = [temp(index(isThere),:) finalAns(isThere,3)];
>> output = [mainData(index(isThere),1) num2cell(temp,2)]
output =
'GM' [1x3 double]
'GE' [1x3 double]
Note that now you have a 2-by-2 cell array where cells in the second column contain 1-by-3 double arrays.
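If only the matching ticker strings are needed, the same ismember call can be reduced to the sketch below. The third row of finalAns is an extra row added here purely for illustration; it has no match in mainData and shows why index is filtered with isThere before it is used. The names keys and secIDs are chosen for the example (secIDs follows the question's attempt).

```matlab
mainData = {'IBM', 201, 1; 'GE', 403, 1; 'MSFT', 502, 3; 'GM', 101, 2};
% Third row added for illustration: it has no match in mainData.
finalAns = [101 2 0.5; 403 1 0.6; 999 9 0.7];

keys = cell2mat(mainData(:, 2:3));      % 4-by-2 numeric lookup keys
[isThere, index] = ismember(finalAns(:, 1:2), keys, 'rows');

% index is 0 where isThere is false, so filter before indexing.
secIDs = mainData(index(isThere), 1);
```

Here secIDs holds 'GM' and 'GE', in the order in which the matches appear in finalAns.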