I have a CSV file that contains numbers, dates, and text. I have to extract the column that contains the text.
For example, a sample CSV file looks like this:
1-1-2000,1,2.3,TRUE
2-1-2000,1,2.3,FALSE
I want to extract the column containing the TRUE/FALSE values.
I want to convert the TRUE into 1 and FALSE into 0.
Please suggest some functions and sample code to do this.
You can try this:
[~,~,~,bools] = textread('filename.csv', '%s%d%f%s', 'delimiter', ',');
bools = cellfun(@strtrim, bools, 'uniformoutput', false);
bools = strcmp(bools, 'TRUE');
The line with strtrim might not be necessary if you know beforehand that there are never any leading or trailing spaces. The line with strcmp outputs a logical array that is true for every entry that string-compares equal to the literal TRUE, which means that all other entries are false. So this input:
1-1-2000,1,2.3,TRUE
2-1-2000,1,2.3,BANANAS
would produce the same logical vector. If you also want to explicitly compare to the string literal FALSE, use something like this:
a = NaN(size(bools));
a(strcmp(bools, 'TRUE')) = 1;
a(strcmp(bools, 'FALSE')) = 0;
if ~any(isnan(a))
bools = logical(a);
clear a
else
%# handle the error
end
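As a self-contained sketch of the same idea, the snippet below writes the two sample rows to a temporary file and reads them back. It uses textscan rather than textread (a substitution, not the method shown above), and the names fname, C, and flags are made up for illustration.

```matlab
% Write the two sample rows to a temporary file (hypothetical file name).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '1-1-2000,1,2.3,TRUE\n2-1-2000,1,2.3,FALSE\n');
fclose(fid);

% Read all four columns; C{4} is a cell array holding the TRUE/FALSE text.
fid = fopen(fname, 'r');
C = textscan(fid, '%s%d%f%s', 'Delimiter', ',');
fclose(fid);
delete(fname);

% Trim whitespace, then map TRUE to 1 and every other entry to 0.
flags = double(strcmp(strtrim(C{4}), 'TRUE'));
```

Here flags is a numeric column of ones and zeros; drop the double(...) call if a logical vector is enough.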
I have a cell array of filenames - things like '20160303_144045_4.dat', '20160303_144045_5.dat', which I need to separate into separate arrays by the last digit before the '.dat'; one cell array of '...4.dat's, one of '...5.dat's, etc.
My code is below; it uses regex to split the file around the '.dat', reshapes a bit then regexes again to pull out the last number of the filename, builds a cell to store the filenames in then, and then I get a tad stuck. I have an array produced such as '1,0,1,0,1,0..' of required cell indexes which I thought might be trivial to pull out, but I'm struggling to get it to do what I want.
numFiles = length(sampleFile); %sampleFile is the input cell array
splitFiles = regexp(sampleFile,'.dat','split');
column = vertcat(splitFiles{:});
column = column(:,1);
splitNums = regexp(column,'_','split');
splitNums = splitNums(:,1);
column = vertcat(splitNums{:});
column = column(:,3);
column = cellfun(@str2double,column); %produces column array of values - 3,4,3,4,3,4, etc
uniqueVals = unique(column);
numChannels = length(uniqueVals);
fileNameCell = cell(ceil(numFiles/numChannels),numChannels);
for i = 1:numChannels
column(column ~= uniqueVals(i)) = 0;
column = column / uniqueVals(i); %e.g. 1,0,1,0,1,0
%fileNameCell(i)
end
I feel there should be an easier way than my hodge-podge of code, and I don't want to throw together a ton of messy for-loops if I can avoid it; I definitely believe I've overcomplicated this problem massively.
We can neaten your code quite a bit.
Take some example data:
files = {'abc4.dat';'abc5.dat';'def4.dat';'ghi4.dat';'abc6.dat';'def5.dat';'nonum.dat'};
You can get the final numbers using regexp and matching one or more digits followed by '.dat', then using strrep to remove the '.dat'.
filenums = cellfun(@(r) strrep(regexp(r, '\d+\.dat', 'match', 'once'), '.dat', ''), ...
files, 'uniformoutput', false);
Now we can put these in a structure, using the unique numbers (prefixed by a letter because fields can't start with numbers) as field names.
% Get unique file numbers and set up the output struct
ufilenums = unique(filenums);
filestruct = struct;
% Loop over file numbers
for ii = 1:numel(ufilenums)
% Get files which have this number
idx = cellfun(@(r) strcmp(r, ufilenums{ii}), filenums);
% Assign the identified files to their struct field
filestruct.(['x' ufilenums{ii}]) = files(idx);
end
Now you have a neat output
% Files with numbers before .dat given a field in the output struct
filestruct.x4 = {'abc4.dat' 'def4.dat' 'ghi4.dat'}
filestruct.x5 = {'abc5.dat' 'def5.dat'}
filestruct.x6 = {'abc6.dat'}
% Files without numbers before .dat also captured
filestruct.x = {'nonum.dat'}
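If a struct is not needed, a loop-free variant is possible. The sketch below is only one way to do it: it assumes every file name ends in a single digit followed by '.dat' (so the 'nonum.dat' case is left out), and the names tok, digits, udigits, grp, and groups are invented for the example.

```matlab
files = {'abc4.dat';'abc5.dat';'def4.dat';'ghi4.dat';'abc6.dat';'def5.dat'};

% Capture the last digit before '.dat'; each result is a 1x1 cell such as {'4'}.
tok = regexp(files, '(\d)\.dat$', 'tokens', 'once');
digits = cellfun(@(t) t{1}, tok, 'UniformOutput', false);

% grp(k) is the index into udigits of the digit that belongs to files{k}.
[udigits, ~, grp] = unique(digits);

% One cell per unique digit, each holding the matching file names.
groups = arrayfun(@(k) files(grp == k), 1:numel(udigits), 'UniformOutput', false);
```

groups{k} then holds the files whose digit is udigits{k}, in their original order.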
I have a .txt file with rows consisting of three elements, a word and two numbers, separated by commas.
For example:
a,142,5
aa,3,0
abb,5,0
ability,3,0
about,2,0
I want to read the file and put the words in one variable, the first numbers in another, and the second numbers in another but I am having trouble with textscan.
This is what I have so far:
File = [LOCAL_DIR 'filetoread.txt'];
FID_File = fopen(File,'r');
[words,var1,var2] = textscan(File,'%s %f %f','Delimiter',',');
fclose(FID_File);
I can't seem to figure out how to use a delimiter with textscan.
horchler is indeed correct. You first need to open up the file with fopen which provides a file ID / pointer to the actual file. You'd then use this with textscan. Also, you really only need one output variable because each "column" will be placed as a separate column in a cell array once you use textscan. You also need to specify the delimiter to be the , character because that's what is being used to separate between columns. This is done by using the Delimiter option in textscan and you specify the , character as the delimiter character. You'd then close the file after you're done using fclose.
As such, you just do this:
File = [LOCAL_DIR 'filetoread.txt'];
f = fopen(File, 'r');
C = textscan(f, '%s%f%f', 'Delimiter', ',');
fclose(f);
Take note that the formatting string has no spaces because the delimiter flag will take care of that work. Don't add any spaces. C will contain a cell array of columns. Now if you want to split up the columns into separate variables, just access the right cells:
names = C{1};
num1 = C{2};
num2 = C{3};
These are what the variables look like now by putting the text you provided in your post to a file called filetoread.txt:
>> names
names =
'a'
'aa'
'abb'
'ability'
'about'
>> num1
num1 =
142
3
5
3
2
>> num2
num2 =
5
0
0
0
0
Take note that names is a cell array of names, so accessing a particular name is done with n = names{ii}; where ii is the index of the name you want to access. You'd access the values in the other two variables using normal indexing notation (i.e. n = num1(ii); or n = num2(ii);).
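To try the indexing without creating a file, textscan also accepts the text itself as a character array. A minimal sketch, where txt, n, and v are illustrative names and the data is the first three rows from the question:

```matlab
% Build the sample text in memory instead of reading filetoread.txt.
txt = sprintf('a,142,5\naa,3,0\nabb,5,0\n');
C = textscan(txt, '%s%f%f', 'Delimiter', ',');

names = C{1};   % cell array of words
num1 = C{2};    % numeric column
num2 = C{3};    % numeric column

ii = 2;
n = names{ii};  % curly braces return the char array itself
v = num1(ii);   % parentheses index a numeric array
```

Using names(ii) with parentheses instead would return a 1x1 cell rather than the string.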
I have a cell array like:
>>text
'Sentence1'
'Sentence2'
'Sentence3'
Whenever I use
sprintf(fid,'%s\n',text)
I get an error saying:
'Function is not defined for 'cell' inputs.'
But if I use:
sprintf(fid,'%s\n',char(text))
It works, but in the file all the sentences appear jumbled together and make no sense.
Can you recommend what to do?
Whenever I display text I get:
>>text
'Title '
'Author'
'comments '
{3x1} cell
That is why I can not use text{:}.
If you issue
sprintf('%s\n', text)
you are saying "print a string with a newline. The string is this cell array". That's not correct; a cell-array is not a string.
If you issue
sprintf('%s\n', char(text))
you are saying "print a string with a newline. The string is this cell array, which I convert to character array.". The thing is, that conversion results in a single character array, and sprintf will re-use the %s\n format only for multiple inputs. Moreover, it writes that single character array by column, meaning, all characters in the first column, concatenated horizontally with all characters from the second column, concatenated with all characters from the third column, etc.
Therefore, the appropriate call is one with multiple inputs. Since you are writing to a file identifier, use fprintf (sprintf does not accept a file identifier):
fprintf(fid, '%s\n', text{:})
because the cell-expansion text{:} creates a comma-separated list from all entries in the cell-array, which is exactly what sprintf/fprintf expects.
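The difference between the two calls is easy to see on a tiny example. This is only an illustration: the variable is called sentences (an invented name, chosen to avoid shadowing the built-in text function), and sprintf is used so the result can be inspected as a string instead of being written to a file.

```matlab
sentences = {'ab'; 'cd'};

% char(sentences) is the 2x2 matrix ['ab'; 'cd'], which is consumed
% column by column as one argument, so the letters come out interleaved.
s1 = sprintf('%s\n', char(sentences));

% sentences{:} expands to two separate arguments, so the format
% '%s\n' is applied once per sentence.
s2 = sprintf('%s\n', sentences{:});
```

Here s1 contains the scrambled text on a single line, while s2 contains each sentence on its own line.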
EDIT: As you indicate, you have non-char entries in text. You have to check for that. If you want to pass only the strings into fprintf, use
fprintf(fid, '%s\n', text{ cellfun('isclass', text, 'char') })
If that {3x1 cell} is again a set of strings, and you want to write all strings recursively, then use something like this:
function textWriter
text = {...
'Title'
'Author'
'comments'
{...
'Title2'
'Author2'
'comments2'
{...
'Title3'
'Author3'
'comments3'
}
}
}
text = cell2str(text);
fid = 1; %# assumption: 1 is standard output; use your own fopen'ed file ID instead
fprintf(fid, '%s\n', text{:});
end
function out = cell2str(c)
out = {};
for ii = c(:)'
if iscell(ii{1})
S = cell2str(ii{1});
out(end+1:end+numel(S)) = S;
else
out{end+1} = ii{1};
end
end
end
I have a string in the following format :
fileName.jpg,10,20,10,10,...,12,14,True
Basically, I have a string with comma separated values. The first value is a string, then it follows an array of 100 values and lastly another string being true or false.
Is there a way of directly reading these values into 3 variables: two strings and an array?
The array of values might contain n\a values, which I want to treat as -1 or something similar, or alternatively store in a cell array with an empty cell for those. Can you recommend something for this type of problem?
You can use textscan:
n = 100; % number of integers between filename and logical values
M = textscan(str, ['%s' repmat('%d',1, n) '%s'], 'delimiter', ',',...
'TreatAsEmpty', 'n\a', 'EmptyValue', -1, 'CollectOutput', true);
The result M is a cell array with the file name in the first cell, the 100 integer values in the second, and a string containing the logical value in the last cell.
You can use strsplit to extract the values from your string and store them in separate variables.
Code Sample:
a = strsplit("fileName.jpg,10,20,10,10,...,12,14,True",",")
fileName = a(1)
flag = a(end)
data = a(2:end-1)
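The strsplit approach leaves the numbers as text. A possible way to finish the job, sketched on a shortened string with only three values (the names parts, isTrue, and values are illustrative), is to convert with str2double, which returns NaN for anything it cannot parse, such as n\a:

```matlab
str = 'fileName.jpg,10,n\a,12,True';

parts = strsplit(str, ',');            % cell array of the individual fields
fileName = parts{1};                   % first field as a char array
isTrue = strcmpi(parts{end}, 'true');  % last field as a logical

% Unparseable entries such as 'n\a' become NaN, then get replaced by -1.
values = str2double(parts(2:end-1));
values(isnan(values)) = -1;
```

Note that this maps every unparseable entry to -1, not only n\a, so check the input first if other text can appear.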
I am using xlsread in MATLAB to read in sheets from an Excel file. My goal is to have each column of the Excel sheet read as a numeric array. One of the columns has a mix of numbers and numbers followed by a character. For example, the values could be 200, 300A, 450, 500A, 200A, 100. Here is what I have so far:
[num, txt, raw] = xlsread(fileIn, sheets{ii}); % Reading in each sheet from a for loop
myCol = raw(:, 4) % I want all rows of column 4
for kk=1:numel(myCol)
if iscellstr(myCol(kk))
myCol(kk) = (cellfun(@(x)strrep(x, 'A', ''), myCol(kk), 'UniformOutput', false));
end
end
myCol = cell2mat(myCol);
This is able to strip off the char from the number but then I am left with:
myCol =
[200]
'300'
[450]
'500'
'200'
[100]
which errors out on cell2mat with:
cell2mat(myCol)
??? Error using ==> cell2mat at 46
All contents of the input cell array must be of the same data type.
I feel like I am probably mixing up () and {} somewhere. Can someone help me out with this?
Let me start from reading the file
[num, txt, raw] = xlsread('test.xlsx');
myCol = raw(:, 4);
idx = cellfun(@ischar,myCol); %# find strings
data = zeros(size(myCol)); %# preallocate matrix for numeric data
data(~idx) = cell2mat(myCol(~idx)); %# convert numeric data
data(idx) = str2double(regexprep(myCol(idx),'\D','')); %# remove non-digits and convert to numeric
The variable myCol is initially a cell array containing both numbers and strings, something like this in your example:
myCol = {200; '300A'; 450; '500A'; '200A'; 100};
The steps you have to follow to convert the string entries into numeric values are:
Identify the cell entries in myCol that are strings. You can use a loop to do this, as in your example, or you can use the function CELLFUN to get a logical index like so:
index = cellfun(@ischar,myCol);
Remove the letters. If you know the letters to remove will always be 'A', as in your example, you can use a simple function like STRREP on all of your indexed cells like so:
strrep(myCol(index),'A','')
If you can have all sorts of other characters and letters in the string, then a function like REGEXPREP may work better for you. For your example, you could do this:
regexprep(myCol(index),'\D','')
Convert the strings of numbers to numeric values. You can do this for all of your indexed cells using the function STR2DOUBLE:
str2double(regexprep(myCol(index),'\D',''))
The final result of the above can then be combined with the original numeric values in myCol. Putting it all together, you get the following:
>> index = cellfun(@ischar,myCol);
>> result(index,1) = str2double(regexprep(myCol(index),'\D',''));
>> result(~index) = [myCol{~index}]
result =
200
300
450
500
200
100
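One caveat with the '\D' pattern is that it also strips decimal points, so a value like '7.25A' would become 725. If the column can hold non-integer values, a pattern that keeps digits and dots may be a better fit. The sketch below assumes positive values and made-up sample data; isText and result are illustrative names.

```matlab
% Hypothetical column mixing numbers and text, including decimals.
myCol = {200; '300A'; 12.5; '7.25A'};

isText = cellfun(@ischar, myCol);      % logical index of the string entries
result = zeros(size(myCol));           % preallocate the numeric output
result(~isText) = [myCol{~isText}];    % copy the entries that are already numeric

% Remove everything except digits and the decimal point, then convert.
result(isText) = str2double(regexprep(myCol(isText), '[^0-9.]', ''));
```

Negative numbers or exponents would need a wider character class.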