Reading a text file with a variable number of entries per row in MATLAB

I have a text file that looks like the following, for example:
1 12 34
67 56 78 98 98
...
Basically, each row is a list of numbers, but the row length isn't fixed.
Is there a quick way to read this in MATLAB, and maybe store the content in a cell array?

importdata would read your file and fill the empty spaces with NaNs; that could be used.
Or you can parse the file directly:
str = fileread('file.txt'); % read the file into a string
data = cellfun(@(line) cellfun(@str2double, strsplit(line, ' '), 'UniformOutput', false), strsplit(str, '\r\n'), 'UniformOutput', false);
The outer strsplit splits the text at the line breaks, which gives the first cell array ('\r\n' matches Windows line endings; use '\n' for files with Unix line endings). Each of those cells is then split at the spaces and converted with str2double.
You could also apply str2double to the split cell array directly:
data = cellfun(@(line) str2double(strsplit(line, ' ')), strsplit(str, '\r\n'), 'UniformOutput', false);
but then each row is a normal numeric array, which means that the access would be data{1}(2) instead of data{1}{2}, and I would prefer to keep the styles aligned.
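As an alternative to splitting one big string, the file can be read line by line. The following is only a minimal sketch, not part of the answer above: it assumes whitespace-separated numbers, and the temporary sample file and the names tline and rows are made up for illustration.

```matlab
% Hypothetical sample file, written here only so the sketch is self-contained
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1 12 34\n67 56 78 98 98\n');
fclose(fid);

% Read line by line; each row becomes one numeric vector in a cell array
fid = fopen(fname, 'r');
rows = {};
tline = fgetl(fid);                 % fgetl returns -1 (not char) at end of file
while ischar(tline)
    rows{end+1, 1} = sscanf(tline, '%f').';   % row vector of any length
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Here rows{2}(4) would access the fourth number of the second line, the same access style as the str2double variant.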

Try textscan:
fid = fopen('test.txt');
data = textscan(fid,'%s','Delimiter','\n'); % one string per line
%data = textscan(fid,'%f'); % alternative: all numbers in a single column
fclose(fid);
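With the '%s' format, textscan returns the lines as strings, so a conversion step is still needed to get numbers. A possible sketch, assuming whitespace-separated numbers; lineStrings is a hypothetical stand-in for data{1}:

```matlab
% Stand-in for data{1} as returned by textscan with '%s' and Delimiter '\n'
lineStrings = {'1 12 34'; '67 56 78 98 98'};

% Convert each line to a numeric row vector; rows may differ in length
nums = cellfun(@(s) sscanf(s, '%f').', lineStrings, 'UniformOutput', false);
```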

writetable without dimension names

I'm trying to write a CSV from a table using writetable(..., 'WriteRowNames', true), but when I do so, MATLAB defaults to putting Row in the (1,1) cell of the CSV. I know I can change Row to another string by setting myTable.Properties.DimensionNames{1}, but I can't set that to be blank, so it seems I'm forced to have some text in that (1,1) cell.
Is there a way to leave the (1,1) element of my CSV blank and still write the row names?
There doesn't appear to be any way to set any of the character arrays in the 'DimensionNames' field to either empty or whitespace. One option is to create your .csv file as you do above, then use xlswrite to clear that first cell:
xlswrite('your_file.csv', {''}, 1, 'A1');
Even though the xlswrite documentation states that the file argument should be a .xls, it still works properly for me.
Another approach could use memmapfile to modify the leading bytes of the file in memory.
For example:
% Set up data
LastName = {'Smith';'Johnson';'Williams';'Jones';'Brown'};
Age = [38;43;38;40;49];
Height = [71;69;64;67;64];
Weight = [176;163;131;133;119];
BloodPressure = [124 93; 109 77; 125 83; 117 75; 122 80];
T = table(Age, Height, Weight, BloodPressure, 'RowNames', LastName);
% Write data to CSV
fname = 'asdf.csv';
writetable(T, fname, 'WriteRowNames', true)
% Overwrite row dimension name in the first row
% Use memmapfile to map only the dimension name to memory
tmp = memmapfile(fname, 'Writable', true, 'Repeat', numel(T.Properties.DimensionNames{1}));
tmp.Data(:) = 32; % Change to the ASCII code for a space
clear('tmp'); % Clean up
Which brings us from:
Row,Age,Height,Weight,BloodPressure_1,BloodPressure_2
Smith,38,71,176,124,93
Johnson,43,69,163,109,77
Williams,38,64,131,125,83
Jones,40,67,133,117,75
Brown,49,64,119,122,80
To:
   ,Age,Height,Weight,BloodPressure_1,BloodPressure_2
Smith,38,71,176,124,93
Johnson,43,69,163,109,77
Williams,38,64,131,125,83
Jones,40,67,133,117,75
Brown,49,64,119,122,80
Unfortunately the name is not quite deleted, only overwritten with spaces, but it's a fun approach.
Alternatively, you can use MATLAB's low level file IO to copy everything after the row dimension name to a new file, then overwrite the original:
fID = fopen(fname, 'r');
fID2 = fopen('tmp.csv', 'w');
fseek(fID, numel(T.Properties.DimensionNames{1}), 'bof');
fwrite(fID2, fread(fID));
fclose(fID);
fclose(fID2);
movefile('tmp.csv', fname);
Which produces:
,Age,Height,Weight,BloodPressure_1,BloodPressure_2
Smith,38,71,176,124,93
Johnson,43,69,163,109,77
Williams,38,64,131,125,83
Jones,40,67,133,117,75
Brown,49,64,119,122,80
No, that is currently not supported. The only workaround I see is to use a placeholder as dimension name and to programmatically remove it from the file afterwards.
writetable(T,fileFullPath,'WriteVariableNames',false);
When you specify 'WriteVariableNames' as false (the default is true), the variable/dimension names will NOT be written to the output file. Note that this drops the whole header row, not only the (1,1) cell.
Ref link: https://uk.mathworks.com/help/matlab/ref/writetable.html
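The "placeholder, then remove it programmatically" workaround mentioned above could look roughly like this. It is a sketch under assumptions: the small table and temporary file name are made up, and it relies on the dimension name being the first field of the first line.

```matlab
% Hypothetical small table, used only to make the sketch self-contained
T = table([38; 43], [71; 69], 'VariableNames', {'Age', 'Height'}, ...
          'RowNames', {'Smith'; 'Johnson'});
fname = [tempname '.csv'];
writetable(T, fname, 'WriteRowNames', true);

% Read the file back, drop everything before the first comma, rewrite it
txt = fileread(fname);
txt = regexprep(txt, '^[^,]*', '', 'once');   % '^' anchors at the start of the text
fid = fopen(fname, 'w');
fprintf(fid, '%s', txt);
fclose(fid);

written = fileread(fname);   % header now starts with ',Age,Height'
delete(fname);
```

Unlike the memmapfile variant, this removes the name entirely instead of overwriting it with spaces, at the cost of rewriting the whole file.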

skip header in non-rectangular matrix

I have consecutive .dat files which I want to read and input into a single matrix by concatenating the files vertically. The code I have so far works fine for simple numeric files with only tabs as delimiter.
import=[];
data=[];
for i = 1:32
data1=[import dlmread(sprintf('%d.dat',i))];
data=vertcat(data, data1);
clear data1;
end
and I get the correct output in the data matrix. But my file format is as follows:
first second third
0 11/15 08:57:42.000 54 67 82
1 11/15 09:48:47.010 49 32 31
...
As you can see I have three delimiters (: \t /) and headers only in the last three columns which are essentially the ones I want to read, that is I want a matrix:
54 67 82
49 32 31
...
I tried specifying the delimiters in dlmread, along with how many rows/columns to skip, but an error occurs at the sprintf call inside it ('delimiter = sprintf(delimiter); % Interpret \t (if necessary)'). Does anyone have any idea how to go about this?
UPDATE:
I managed to get a little further
data=[];
for i = 1:32
filename = sprintf( '%d.dat',i );
data1=importdata(filename);%creates a cell array
data2=cell2mat(data1(3:end,:));%converts it to char
%The data, without the header, start from the 3rd row.
data=vertcat(data, data2); %concatenate vertically all the files
clear data1; clear data2;
end
%the data
a1=str2num(data(1:end,20:25));%the first data column is in char 20-25
a2=str2num(data(1:end,30:35));%the second data column is in char 30-35
The problem is that the last part takes too much time; over an hour had passed when I stopped it manually. Does anyone know a simpler and faster way to do this?
I managed to solve this myself, so I am posting it here for future reference:
data = [];
for i = 1:32
    filename = sprintf('%d.dat', i);
    data1 = dlmread(filename, '', 2, 3); % zero-based offsets: skip the first 2 rows and the first 3 columns
    data = vertcat(data, data1);
    clear data1;
end
Now the data matrix contains only my data columns and it runs in a few seconds.
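For comparison, the same columns can probably also be extracted with textscan by skipping the unwanted fields with %*s. This is only a sketch: the sample file below is made up from the rows shown in the question, and it assumes a single header line and whitespace-separated fields.

```matlab
% Hypothetical sample file mimicking the format from the question
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'first second third\n');
fprintf(fid, '0 11/15 08:57:42.000 54 67 82\n');
fprintf(fid, '1 11/15 09:48:47.010 49 32 31\n');
fclose(fid);

% %*s skips a field; 'CollectOutput' merges the three %f columns into one matrix
fid = fopen(fname, 'r');
C = textscan(fid, '%*s %*s %*s %f %f %f', 'HeaderLines', 1, 'CollectOutput', true);
fclose(fid);
delete(fname);

data = C{1};   % numeric matrix holding only the last three columns
```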

How do I read comma separated values from a .txt file in MATLAB using textscan()?

I have a .txt file with rows consisting of three elements, a word and two numbers, separated by commas.
For example:
a,142,5
aa,3,0
abb,5,0
ability,3,0
about,2,0
I want to read the file and put the words in one variable, the first numbers in another, and the second numbers in another but I am having trouble with textscan.
This is what I have so far:
File = [LOCAL_DIR 'filetoread.txt'];
FID_File = fopen(File,'r');
[words,var1,var2] = textscan(File,'%s %f %f','Delimiter',',');
fclose(FID_File);
I can't seem to figure out how to use a delimiter with textscan.
horchler is indeed correct. textscan needs the file ID returned by fopen, not the file name, so pass that ID as the first argument. Also, you really only need one output variable, because textscan returns a cell array in which each "column" of the file is stored in its own cell. You also need to specify the delimiter to be the , character because that's what separates the columns. This is done with the Delimiter option of textscan. You'd then close the file with fclose once you're done.
As such, you just do this:
File = [LOCAL_DIR 'filetoread.txt'];
f = fopen(File, 'r');
C = textscan(f, '%s%f%f', 'Delimiter', ',');
fclose(f);
Take note that the formatting string has no spaces because the delimiter flag will take care of that work. Don't add any spaces. C will contain a cell array of columns. Now if you want to split up the columns into separate variables, just access the right cells:
names = C{1};
num1 = C{2};
num2 = C{3};
These are what the variables look like now by putting the text you provided in your post to a file called filetoread.txt:
>> names
names =
'a'
'aa'
'abb'
'ability'
'about'
>> num1
num1 =
142
3
5
3
2
>> num2
num2 =
5
0
0
0
0
Take note that names is a cell array of names, so accessing the right name is done by simply doing n = names{ii}; where ii is the index of the name you want to access. You'd access the values in the other two variables using the normal indexing notation (i.e. n = num1(ii); or n = num2(ii);).

How to load a cell array that has both strings and numbers?

I have a cell array that has both strings and numbers. I want to load all the elements of the cell array. For this I used the following:
load(filename);
This command loads only the strings and excludes the columns that have numbers. Basically, since my file does not have a .mat extension, it is treated as an ASCII file and only the text is loaded.
I tried importdata(filename), but that gives me a 1x1 struct. I need the elements to be imported into another cell array of the same dimensions.
Is there a way to load all the values?
load is used to import .mat-files with workspace variables. Since your data is not an actual .mat-file, you need to use a different method.
Let's assume you have the file filename with tab-delimited data:
str1 1
str2 2
str3 3
str4 4
To get a cell-array where the first column is a string (using %s) and the second a double (using %f), you can use textscan. Check out the result, maybe it's already what you're searching for.
filename = 'data';
F = fopen(filename, 'r');
data = textscan(F, '%s %f', 'Delimiter', '\t');
fclose(F);
If not, you can create a cell-array CA where the first column is a string (using cellstr) and the second one is a double (using num2cell).
CA = cell(size(data{1},1),2);
CA(:,1) = cellstr(data{1});
CA(:,2) = num2cell(data{2});
Result:
CA =
'str1' [1]
'str2' [2]
'str3' [3]
'str4' [4]

Convert nonuniform cell array to numeric array

I am using xlsread in MATLAB to read in sheets from an Excel file. My goal is to have each column of the Excel sheet read as a numeric array. One of the columns has a mix of numbers and numbers+char. For example, the values could be 200, 300A, 450, 500A, 200A, 100. Here is what I have so far:
[num, txt, raw] = xlsread(fileIn, sheets{ii}); % Reading in each sheet from a for loop
myCol = raw(:, 4) % I want all rows of column 4
for kk=1:numel(myCol)
if iscellstr(myCol(kk))
myCol(kk) = (cellfun(@(x)strrep(x, 'A', ''), myCol(kk), 'UniformOutput', false));
end
end
myCol = cell2mat(myCol);
This is able to strip off the char from the number but then I am left with:
myCol =
[200]
'300'
[450]
'500'
'200'
[100]
which errors out on cell2mat with:
cell2mat(myCol)
??? Error using ==> cell2mat at 46
All contents of the input cell array must be of the same data type.
I feel like I am probably mixing up () and {} somewhere. Can someone help me out with this?
Let me start from reading the file
[num, txt, raw] = xlsread('test.xlsx');
myCol = raw(:, 4);
idx = cellfun(@ischar,myCol ); %# find strings
data = zeros(size(myCol)); %# preallocate matrix for numeric data
data(~idx) = cell2mat(myCol(~idx)); %# convert numeric data
data(idx) = str2double(regexprep(myCol(idx),'\D','')); %# remove non-digits and convert to numeric
The variable myCol is initially a cell array containing both numbers and strings, something like this in your example:
myCol = {200; '300A'; 450; '500A'; '200A'; 100};
The steps you have to follow to convert the string entries into numeric values are:
Identify the cell entries in myCol that are strings. You can use a loop to do this, as in your example, or you can use the function CELLFUN to get a logical index like so:
index = cellfun(@ischar,myCol);
Remove the letters. If you know the letters to remove will always be 'A', as in your example, you can use a simple function like STRREP on all of your indexed cells like so:
strrep(myCol(index),'A','')
If you can have all sorts of other characters and letters in the string, then a function like REGEXPREP may work better for you. For your example, you could do this:
regexprep(myCol(index),'\D','')
Convert the strings of numbers to numeric values. You can do this for all of your indexed cells using the function STR2DOUBLE:
str2double(regexprep(myCol(index),'\D',''))
The final result of the above can then be combined with the original numeric values in myCol. Putting it all together, you get the following:
>> index = cellfun(@ischar,myCol);
>> result(index,1) = str2double(regexprep(myCol(index),'\D',''));
>> result(~index) = [myCol{~index}]
result =
200
300
450
500
200
100
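One caveat with the '\D' pattern is that it also strips decimal points and minus signs. If the column can contain such values, a character class that keeps them may be safer. A minimal sketch, with made-up sample values and the hypothetical name isText:

```matlab
% Hypothetical column mixing numbers and strings with trailing letters
myCol = {200; '300A'; 45.5; '12.5B'; '-7A'; 100};

isText = cellfun(@ischar, myCol);           % logical index of the string entries
result = zeros(size(myCol));                % preallocate the numeric output
result(~isText) = [myCol{~isText}];         % copy the entries that are already numeric
% Keep digits, '.' and '-'; remove everything else before converting
result(isText) = str2double(regexprep(myCol(isText), '[^0-9.\-]', ''));
```

str2double returns NaN for anything it still cannot parse, so checking any(isnan(result)) afterwards is a cheap way to spot unexpected entries.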