Performing find and replace functions on elements of a table in Matlab - matlab

I am working with a 400x1200 imported table (readtable generated from an .xls) which contains strings, doubles, dates, and NaNs. Each column is typed consistently. I am looking for a way to locate all instances in the table of any given string ('Help me please') and replace them all with a double (1). Doing this in Matlab will save me loads of work making changes to the approach used on the rest of this project.
Unfortunately, all of the options I've looked at (regexp, strrep, etc) can only take a string as a replacement. Strfind was similarly unhelpful, because of the typing across the table. The lack of cellfun has also made this harder than it should be. I know the solution should have something to do with finding the indices of the strings I want and then just looping DataFile{subscript} = [1], but I can't find a way to do it.

First you should transform your table at a cell array.
Then, you can use the strrep along with str2num, e.g.
% For a given cell index
strrep(yourCellIndexVariable, "Help me please", "1");
str2num(yourCellIndexVariable);
This will replace the string "Help me please" with the string "1" (the strrep function) and the str2num will change the cell index to the double value according to the string.
By yourCellIndexVariable I mean an element from the cell array. There are several ways to get all cells from a cell array, but I think that you have solved that part already.

What you can do is as follows:
[rows, cols] = size(table); % Get the size of your table
YourString = 'Help me please'; % Create your string
Strmat = repmat(YourString,rows,cols); % Stretch to fill a matrix of table size
TrueString = double(strcmp(table,Strmat)); % Compares all entries with one another
TrueString now contains logicals, 1 where the string 'Help me please' is located, and 0 where it is not.
If you have a table containing multiple classes it might be handy to switch to cells though.

Thank you very much everyone for helping think through to a solution. Here's what I ended up with:
% Reads data
[~, ~, raw] = xlsread ( 'MyTable.xlsx');
MyTable = raw;
% Makes a backup of the data in table form
MyTableBackup = readtable( 'MyTable.xlsx' );
% Begin by ditching 1st row with variable names
MyTable(1,:) = [];
% wizard magic - find all cells with strings
StringIndex = cellfun('isclass', MyTable, 'char');
% strrep goes here to recode bad strings. For example:
MyTable(StringIndex) = strrep(MyTable(StringIndex), 'PlzHelpMe', '1');
% Eventually, we are done, so convert back to table
MyTable = cell2table(MyTable);
% Uses backup Table to add variable names
% (the readtable above means the bad characters in variable names are already escaped!)
MyTable.Properties.VariableNames = MyTableBackup.Properties.VariableNames;
This means the new values exist as strings ('1', not 1 as a double), so now I just str2double when I access them for analysis. My takeaway - Matlab is for numbers. Thanks again all!

Related

Matlab add string variable as column in table

I'm trying to use YOLOv4 in MATLAB R2022b to carry out detections on all images in a directory, and append results to a text file.
I can append just the detection results to each line, but when I try to add the filename I get this error:
You might have intended to create a one-row table with the character vector '000001.jpg' as one of its variables. To store text data in a table, use a string array or a cell array of character vectors rather than character arrays. Alternatively, create a cell array with one row, and convert that to a table using CELL2TABLE.
I understand that the filename is a string, and the values returned by YOLO are a categorical array, but I don't understand the most efficient way to deal with this.
filesDir = dir("/home/ADL-Rundle-1/img1/");
for k=1:length(filesDir)
baseFileName=filesDir(k).name
fullFileName = fullfile(filesDir(k).folder, baseFileName);
if isfile(fullFileName)
img = imread(fullFileName);
[bboxes,scores,labels] = detect(detector,img);
T = table(baseFileName, labels, bboxes, scores);
writetable(T,'/home/tableDataPreTrained.txt','WriteMode','Append','WriteVariableNames',0);
end
end
The format of results from YOLO is
And I'd like a file with
000001.jpg, 1547.3, 347.35, 355.64, 716.94, 0.99729
000001.jpg, 717.81, 370.64, 76.444, 108.92, 0.61191
000002.jpg, 1, 569.5, 246.49, 147.25,0.56831
baseFileName is a char vector.
The error message is telling you to use a cell array of char vectors:
T = table({baseFileName}, labels, bboxes, scores);
or a string array:
T = table(string(baseFileName), labels, bboxes, scores);
I would use the string array, it's the more modern MATLAB, and the table looks prettier when displayed. But both accomplish the same thing.
Given that labels and the other two variables have multiple rows, you need to replicate the file name that number of times:
frame = repmat(string(baseFileName), size(labels,1), 1);
T = table(frame, labels, bboxes, scores);

Matlab, Convert cell to matrix

Hope some of you can help me. I have converted a pdf with a lot of txt and tables to .txt file. I did this because three values of the pdf has to be writen into exel. This has to be done more than a thusind times a mounth, therefore i thought there has to be a better eay than doing it manually. The only things that has to be extracted is the Date, Repport number and a single volume. I found out that the date and repport number always is at the same line, so thats pretty easy to extract, even though its readen into a 145x1 cell. But this brings me to my first question.
Each of the cells looks like this:
Date 23/4-2015
Repportnumber 8
How do i remove the whitespace?
I also have to extract the volume. this was more difficult, cause the linemunber of the volume differentiates from one pdf to another, therefore i created a searchfunction, which works and founds the volume, which is created to a cellarry looking like this:
[233.4 452.2 94.6]
I only needs the middlenumber, so how do i create this into a matrix?
Keep in mind it is a 1x1 cell, with whitespace!
Hope some of you guys can help me.
For your first question, you can remove the spaces by searching the line of characters and identifying the spaces with strcmp, then setting those elements of the character string to be empty ([]). Here is an example of the code for that:
% number of character
N = length(my_string);
% character to remove (initialize all 0)
icut = zeros(1,N);
% check each character
for i = 1:N;
% if character is a space, tag for removal
if strcmp(my_string(i),' ');
icut(i) = 1;
end
end
% remove space characters
my_string(icut == 1) = [];
For your second question, you can convert the contents of the cell to a numeric array then simply take the 2nd element.
% convert the cell contents to an array of numbers
cell_array = str2num(my_cell{1});
% get the middle value
middle_value = cell_array(2);
This assumes the cell contains the array of values as a string, as in my_cell = {'[233.4 452.2 94.6]'};.
You can remove the whitespace from a string using strrep. This works on cells containing strings or on char arrays and returns the same object type that it was applied to. If you pass in a cell to strrep it will return a cell, if you pass in a char array it will return a char array.
>> C = {'Date 23/4-2015 Repportnumber 8'};
>> strrep(C, ' ', '') % Cell containing string (char array)
ans =
'Date23/4-2015Repportnumber8'
>> strrep(C{1}, ' ', '') % String (char array)
ans =
Date23/4-2015Repportnumber8
To convert the version cell array to a matrix you can use str2num. Then you can use linear indexing to extract the correct version.
>> C = {'[233.4 452.2 94.6]'};
>> C = str2num(C{1});
>> C(2)
ans =
452.2000

Variable labels in MATLAB

I have a huge table data= {1000 x 1000} of binary data.
They table's variable names are encoded for eg D1,D2,...,DA2,DA3,... with their real labels given in a .txt file.
The .txt file also consists of some text for eg:
D1: Age
Mean age: 33
Median :
.
.
.
D2: weight
I would just like to pick out these names from the text file and create a table with the real variable names.
Any suggestions?
If there is a specific number of lines between each of those labels, then you can extract them by reading in the file, and looping over the relevant lines. For each label, it simple to extract the label with strsplit()
e.g. Let's say there's 5 lines between each label
uselessLines = 5;
% imports as a vertical matrix with each line from the file.
dataLabelsFile = importdata(filename);
% get the total number of lines
numLines = size(dataLabelsFile);
% pre-allocate array for labels, a cell is used for a string
dataLabels = cell(ceil(numLines/(uselessLines+1)));
% use a seperate counting variable
m = 1;
% now, for each label, we add it to the dataLabels matrix
for i=1:(uselessLines+1):numLines
line = strsplit(dataLabelsFile{i}); % by default splits on whitespace
dataLabels(m) = line(2);
m = m + 1;
end
By the end of that loop you should have a variable called dataLabels that holds all of the labels. Now, you can actually very easily work out which label goes with which set of data
provided they are still in the same order. The indexes will be the same for the label to the data.
This is a method you could try if the labels are evenly spaced.
However, if the labels are a random number of lines, then you probably want to do a check with a regular expression like the person below me has suggested. Then you just replace the last two lines of the loop with something like this.
...
if (regular expression matched)
dataLabels(m) = line(2);
m = m + 1;
end
...
That being said, while regular expressions are flexible, if you can get away with replacing it with literally one function call, it's usually better to do that. Regex efficiencies are determined by the skill of the programmer, while in-built functions have generally been tested by some of the better programmers in the world. Additionally, Regex's are harder to understand if you ever want to go back and change it.
Of course there are times when Regex's are amazing, I'm just not convinced this is one of those times.
An implemention of the approach in my earlier comment:
fid = fopen(filename);
varNames = cell(0);
proceed = true;
while proceed
line = fgetl(fid);
if ischar(line)
startIdx = regexp(line,'(?<=^[A-Z]*\d*:)\s');
if ~isempty(startIdx)
varNames{end+1} = strtrim(line(startIdx:end)); %#ok<SAGROW>
end
else
proceed = false;
end
end
fclose(fid);
I cant put the resulting varNames in a table for you, since I have a version of Matlab that does not support tables.

Extracting a matrix of data from a matrix of structs

I have a matrix of structs. I'm trying to extract from that matrix a matrix the same size
with only one of the fields as values.
I've been trying to use struct2cell and similar functions without success.
How can this be done?
If I understand you correctly, you have an array of struct like e.g this
s(1:2,1:3) = struct('a',1,'b',2);
Now you want a different struct that only has the field b
[newS(1:2,1:3).b] = deal(s.b);
edit
If all you need is the output (and if the field values are scalar), you can do the following:
out = zeros(size(s));
out(:) = cat(1,s.b)
I'll borrow Jonas' example. You can use the [] to gather a particular field.
% Create structure array
s(1:2,1:3) = struct('a',1,'b',2);
% Change values
for idx = 1:prod(size(s))
s(idx).a = idx;
s(idx).b = idx^2;
end
% Gather a specific field and reshape it to the size of the original matrix
A = reshape([s.a],size(s));
B = reshape([s.b],size(s));
I have a similar problem, but the contents of the field in my structure array are varying length strings that I use to tag my data, so when I extract the contents of the field, I want a cell of varying length strings.
This code using getfield and arrayfun does the job, but I think it is more complicated than it needs to be.
sa = struct('name', {'ben' 'frank', 'betty', 'cybil', 'jack'}, 'value', {1 1 2 3 5})
names = arrayfun(#(x) getfield(x, 'name'), sa, 'UniformOutput', false)
Can anyone suggest cleaner alternative? extractfield in the mapping toolbox seems to do the job, but it is not part of the base MATLAB system.
Update: I have answered my own embedded question.
names = {sa.name}

the function and use of cell(m,n) in matlab

I was reading that in MatLab, if you are going to fill a larger matrix, its more computational efficient to declare it's size from before with the use of the cell command; E.g.
X = cell(500,90);
but when I try to add values to it, like
X(i;) = x
where i is vector of double of length 90, and i an integer, I get
conversion from cell to double is not possible
Is my understanding of the cell function correct?
Cell contents are being addressed using curly braces, for example:
X{1,1}=1:8;
cell command creates an empty array:
C = cell(3,4,2);
% Or alternatively:
C{3,4,2} = [];
What you do with the cell array is up to you. But most likely it is not what you want - see Rasman's comment.
Have a look at some more examples either at MathWorks or other tutorials.