How to read data from a string - matlab

I have a string in the following format :
fileName.jpg,10,20,10,10,...,12,14,True
Basically, I have a string with comma separated values. The first value is a string, then it follows an array of 100 values and lastly another string being true or false.
Is there a way or directly reading these values into 3 variable? Two strings and an array?
The array of values might contain n\a values which I want to treat as -1 or something similar or by using a cell array and having an empty cell for those? Can you recommend me something for this type of problem?

You can use textscan:
n = 100; % number of integers between filename and logical values
M = textscan(str, ['%s' repmat('%d',1, n) '%s'], 'delimiter', ',',...
'TreatAsEmpty', 'n\a', 'EmptyValue', -1, 'CollectOutput', true);
The result M is a cell array with the file name in the first cell, the 100 integer values in the second, and a string containing the logical value in the last cell.

You can use strsplit and extract the values from your String and store them in separate variables
Code Sample:
a = strsplit("fileName.jpg,10,20,10,10,...,12,14,True",",")
fileName = a(1)
flag = a(end)
data = a(2:end-1)

Related

MATLAB: extract numerical data from alphanumerical table and save as double

I created a list of names of data files, e.g. abc123.xml, abc456.xml, via
list = dir('folder/*.xml').
Matlab starts this out as a 10x1 struct with 5 fields, where the first one is the name. I extracted the needed data with struct2table, so I now got a 10x1 table. I only need the numerical value as 10x1 double. How can I get rid of the alphanumerical stuff and change the data type?
I tried regexp (Undefined function 'regexp' for input arguments of type 'table') and strfind (Conversion to double from table is not possible). Couldn't come up with anything else, as I'm very new to Matlab.
You can extract the name fields and place them in a cell array, use regexp to capture the first string of digits it finds in each name, then use str2double to convert those to numeric values:
strs = regexp({list.name}, '(\d+)', 'once', 'tokens');
nums = str2double([strs{:}]);

MatLab find column number of text cell array

I have a cell data type matrix containing a header and a large number of rows.
sample data:
set press dp
32.7045 17.805965 123.75047
32.690094 17.80584 123.74992
32.6232 17.815094 123.790115
I am trying to find the index of a specific column using the strcmp command to search through the all the data.
dpCol = strcmp([data{:}], 'dp')
This always returns
dpCol =
0
Am I using the data cell type wrong or something? Thank you!
Try using cell notation to yield just the 1st row, EG:
data(1,:) = {'set','press','dp'}
instead of unpacking* the entire cell array since strcmp can operate on cell arrays.
>>> data = {'set' 'press' 'dp'
32.7045 17.805965 123.75047
32.690094 17.80584 123.74992
32.6232 17.815094 123.790115}
data =
'set' 'press' 'dp'
[32.7045] [17.8060] [123.7505]
[32.6901] [17.8058] [123.7499]
[32.6232] [17.8151] [123.7901]
>>> col_idx = strcmp(data(1,:),'dp')
col_idx=
0 0 1
Then return the dp using the logical indices and cell2mat...
>>> dp = cell2mat(data(2:end,col_idx))
dp =
123.7505
123.7499
123.7901
or unpack* and concatenate the comma separated list
>>> dp = [data{2:end,col_idx}]
dp =
123.7505 123.7499 123.7901
As an alternative try cell2struct.
>>> datastruct = cell2struct(data(2:end,:),data(1,:),2)
datastruct =
3x1 struct array with fields:
set
press
dp
Then dp is ...
>>> dp = [datastruct.dp]
dp =
123.7505 123.7499 123.7901
* Using the colon operator inside curly braces unpacks an cell array into a comma separated list. Using square brackets horizontally concatenates the comma separated list which returns a character array set pressdp{{{ since the first item in the list is a character array. The garbage characters between and after 'set', 'press' and 'dp' are caused by reading the doubles as char. IE: char(32.7045) is the ASCII equivalent of whitespace. The arrays always get unpacked as column.

How to load a cell array that has both strings and numbers?

I have a cell array that has both strings and numbers. I want to load all the elements of the cell array. For the same I used the following method:
load(filename);
This command is loading only strings and excluding the columns that has numbers. Basically since my file is not .mat extension, it is treating it as ASCII file and loading only text.
I tried importdata(filename). But that gives me struct of 1*1. I need the elements to be imported into another cell array of same dimension.
Is there a way to load all the values?
load is used to import .mat-files with workspace variables. Since your data is not an actual .mat-file, you need to use a different method.
Let's assume you have the file filename with tab-delimited data:
str1 1
str2 2
str3 3
str4 4
To get a cell-array where the first column is a string (using %s) and the second a double (using %f), you can use textscan. Check out the result, maybe it's already what you're searching for.
filename = 'data';
F = fopen(filename, 'r');
data = textscan(F, '%s %f', 'Delimiter', '\t');
If not, you can create a cell-array CA where the first column is a string (using cellstr) and the second one is a double (using num2cell).
CA = cell(size(data{1},1),2);
CA(:,1) = cellstr(data{1});
CA(:,2) = num2cell(data{2});
Result:
CA =
'str1' [1]
'str2' [2]
'str3' [3]
'str4' [4]

Reading a string from csv file in MATLAB

I have a CSV file which contains numbers,date and text. I have to extract the column which contain the text.
For example, sample csv file look like
1-1-2000,1,2.3,TRUE
2-1-2000,1,2.3,FALSE
I want to extract the column containing the TRUE/FALSE values.
I want to convert the TRUE into 1 and FALSE into 0.
Please suggest some functions and sample code to do this
You can try this:
[~,~,~,bools] = textread('filename.csv', '%s%d%f%s', 'delimiter', ',');
bools = cellfun(#strtrim, bool, 'uniformoutput', false);
bools = strcmp(bools, 'TRUE');
the line with strtrim might not be necessary if you know beforehand that there are never any trailing spaces. The line with strcmp outputs a logical array for all entries that string-compare to the literal TRUE, which implies that all other entries are false. Meaning:
1-1-2000,1,2.3,TRUE
2-1-2000,1,2.3,BANANAS
would produce the same logical vector. If you also want to explicitly compare to the string literal FALSE, use something like this:
a = NaN(size(bools));
a(strcmp(bools, 'TRUE')) = 1;
a(strcmp(bools, 'FALSE')) = 0;
if ~any(isnan(a))
bools = logical(a);
clear a
else
%# handle the error
end

Convert nonuniform cell array to numeric array

I am using xlsread in MATLAB to read in sheets from an excel file. My goal is to have each column of the excel sheet read as a numeric array. One of the columns has a mix of numbers and numbers+char. For example, the values could be 200, 300A, 450, 500A, 200A, 100. here is what I have so far:
[num, txt, raw] = xlsread(fileIn, sheets{ii}); % Reading in each sheet from a for loop
myCol = raw(:, 4) % I want all rows of column 4
for kk=1:numel(myCol)
if iscellstr(myCol(kk))
myCol(kk) = (cellfun(#(x)strrep(x, 'A', ''), myCol(kk), 'UniformOutput', false));
end
end
myCol = cell2mat(myCol);
This is able to strip off the char from the number but then I am left with:
myCol =
[200]
'300'
[450]
'500'
'200'
[100]
which errors out on cell2mat with:
cell2mat(myCol)
??? Error using ==> cell2mat at 46
All contents of the input cell array must be of the same data type.
I feel like I am probably mixing up () and {} somewhere. Can someone help me out with this?
Let me start from reading the file
[num, txt, raw] = xlsread('test.xlsx');
myCol = raw(:, 4);
idx = cellfun(#ischar,myCol ); %# find strings
data = zeros(size(myCol)); %# preallocate matrix for numeric data
data(~idx) = cell2mat(myCol(~idx)); %# convert numeric data
data(idx) = str2double(regexprep(myCol(idx),'\D','')); %# remove non-digits and convert to numeric
The variable myCol is initially a cell array containing both numbers and strings, something like this in your example:
myCol = {200; '300A'; 450; '500A'; '200A'; 100};
The steps you have to follow to convert the string entries into numeric values is:
Identify the cell entries in myCol that are strings. You can use a loop to do this, as in your example, or you can use the function CELLFUN to get a logical index like so:
index = cellfun(#ischar,myCol);
Remove the letters. If you know the letters to remove will always be 'A', as in your example, you can use a simple function like STRREP on all of your indexed cells like so:
strrep(myCol(index),'A','')
If you can have all sorts of other characters and letters in the string, then a function like REGEXPREP may work better for you. For your example, you could do this:
regexprep(myCol(index),'\D','')
Convert the strings of numbers to numeric values. You can do this for all of your indexed cells using the function STR2DOUBLE:
str2double(regexprep(myCol(index),'\D',''))
The final result of the above can then be combined with the original numeric values in myCol. Putting it all together, you get the following:
>> index = cellfun(#ischar,myCol);
>> result(index,1) = str2double(regexprep(myCol(index),'\D',''));
>> result(~index) = [myCol{~index}]
result =
200
300
450
500
200
100