Read a text file into MATLAB

I have a .txt file containing multiple questions, each followed by an answer (T for True, F for False), like this:
A ball is squared. F
My computer is slow. T
etc.
I want to write a function [Q,A] = load.test(filename), where
Q = a cell array containing N strings;
A = a logical vector of N elements.
I have tried different ways, but none seem to work.
[Q,A] = textread(filename,'%s %s');
The closest I've come produces this output:
'A'
'is'
'F'
'My'
'is'
'T'
What do I need to do?

If a sentence contains more than one '.', Silas's solution won't work, and it also loses the dot. You can do it as follows instead:
fid = fopen('questions.txt');
data = textscan(fid, '%s', 'delimiter', '\n');   % one string per line
fclose(fid);
Q = cellfun(@(x) x(1:end-2), data{1}, 'uni', 0); % drop the trailing " T" / " F"
A = cellfun(@(x) x(end), data{1}, 'uni', 0);     % last character: 'T' or 'F'
Alternatively, use
A = cellfun(@(x) x(end) == 'T', data{1});
to get the desired logical vector.
For a text file with the content:
The globe is a disk. F
42 is the answer to everything. T
you get
Q{1} = The globe is a disk.
Q{2} = 42 is the answer to everything.
A =
0
1
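The same idea can be packaged to produce exactly the Q and A asked for in the question. This is a minimal sketch, assuming every non-empty line ends with whitespace followed by T or F. The temporary file and its contents are hypothetical and only make the example self-contained; 'Whitespace', '' is added so that spaces inside a line are kept, and strtrim also removes a stray carriage return from Windows line endings:

```matlab
% Write a small sample file (hypothetical contents) so the sketch runs on its own
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'The globe is a disk. F\n42 is the answer to everything. T\n');
fclose(fid);

% Read one string per line, keeping the spaces inside each line
fid = fopen(fname);
data = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);
delete(fname);

lines = strtrim(data{1});                  % remove leading/trailing blanks (e.g. \r)
lines = lines(~cellfun(@isempty, lines));  % ignore blank lines
Q = cellfun(@(x) strtrim(x(1:end-1)), lines, 'UniformOutput', false);  % question text
A = cellfun(@(x) upper(x(end)) == 'T', lines);                          % logical answers
```

Using strtrim before taking x(end) means a trailing space after the T or F does not break the answer detection.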

According to the documentation, you should use textscan instead of textread.
If you know that the strings are separated by a '.' or another specific delimiter, you can do the following (where file is a file identifier returned by fopen):
parsed = textscan(file, '%s %s', 'delimiter', '.');
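When a question can itself contain periods, splitting on '.' breaks. One alternative, sketched here with regexp instead of textscan and assuming the answer is always a single T or F at the end of the line, is to split each line at its last run of whitespace. The sample lines are hypothetical:

```matlab
% Hypothetical sample lines; in practice these would come from the file
lines = {'Version 2.0 is out. T'; 'A ball is squared. F'};

% Token 1: everything up to the last non-blank character before the answer
% Token 2: the final T or F
tok = regexp(lines, '^(.*\S)\s+([TF])\s*$', 'tokens', 'once');

Q = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
A = cellfun(@(t) t{2} == 'T', tok);
```

A line that does not match the pattern yields an empty entry in tok and makes the cellfun calls fail, which at least flags malformed input instead of silently misreading it.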

Related

Convert .csv to .out (complex numbers)

I have a csv file that contains complex numbers.
This is a sample of some of the numbers in the csv file:
(0.12825663763789857+0.20327998150393212j),(0.21890748607218197+0.160563964013564j),(0.28205414129281525+0.09884068776334366j),(0.030927026479380615+0.26334550583848626j)
I want to read this file and save all the real parts in the first column and all the imaginary parts in the second column of a .out file (without the imaginary unit j).
Here is one attempt. It is slightly more complicated due to the ( and ) that surround your numbers.
First, use textscan to read the file. Since I guess you don't know how many numbers are in the file, read everything as strings. This works with multiple lines, too:
filename = 'data.csv';
fid = fopen(filename);
content = textscan(fid, '%s');
fclose(fid);
content is now a slightly odd cell array: a 1x1 cell whose only element is a cell array of strings (see the textscan documentation for details). Initialize the variable nums, which will store the numbers, and loop through content{1} (if you know more about your csv file, you can preallocate nums):
nums = [];
for c1 = 1:numel(content{1})
Next, split the string at every occurrence of ',':
string_list = strsplit(content{1}{c1},',');
This gives another cell array. Loop through it to convert the strings to numbers (and end the outer loop):
for c2 = 1 : numel(string_list)
nums(end+1) = str2num(string_list{c2});
end
end
Last, just store the real and the imaginary part of the numbers in separate columns:
out = [];
out(:,1) = real(nums);
out(:,2) = imag(nums);
and save it to data.out.
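As an alternative to the str2num loop above (str2num calls eval internally), here is a vectorized sketch that strips the parentheses and lets str2double parse the a+bj text. It assumes the numbers on a line are separated only by commas and wrapped only in parentheses; the sample line uses values from the question:

```matlab
% One line of the file (sample values from the question)
txt = '(0.12825663763789857+0.20327998150393212j),(0.030927026479380615+0.26334550583848626j)';

parts = strsplit(txt, ',');           % one token per complex number
parts = regexprep(parts, '[()]', ''); % drop the surrounding parentheses
nums = str2double(parts);             % str2double accepts text such as 1.5+2.5j
out = [real(nums(:)), imag(nums(:))]; % N-by-2: real parts | imaginary parts
```

Unparseable tokens become NaN rather than raising an error, so checking any(isnan(nums)) is a cheap sanity test.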
Update: since you mentioned precision, you could use
dlmwrite('data.out', out, 'precision','%.20f');
However, here you need to understand the floating-point representation in MATLAB. In particular, try to understand the following:
>> a = 0.12825663763789857
a =
0.1283
>> fprintf('%.20f\n', a)
0.12825663763789857397
>> eps(a)
ans =
2.7756e-17
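A practical consequence: 17 significant digits are always enough to write a double and read back exactly the same value, so a format such as '%.17g' avoids the spurious trailing digits that '%.20f' prints. A small check using the value from above:

```matlab
a = 0.12825663763789857;
s = sprintf('%.17g', a);   % 17 significant digits identify a double uniquely
b = str2double(s);         % parse the text back
roundTrip = isequal(a, b)  % true: no information lost
```

For the file itself, the same format string can be passed as dlmwrite('data.out', out, 'precision', '%.17g').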
Note that one could have done this without converting the strings to numbers, but the approach above lets you use the data in MATLAB instead of just saving it.
Here is an attempt without converting your strings to numbers, so you don't have to deal with precision at all. It works with negative real and imaginary parts, too: + signs are removed when written to the new file, while - signs are preserved:
filename = 'data.csv';
fid = fopen(filename);
content = textscan(fid, '%s');
fclose(fid);
fid = fopen('data.out','w');
pattern = '(?<real>-{0,1}\d+\.\d+)(?<imag>[+-]\d+\.\d+)j';   % '\.' matches only a literal decimal point
for c1 = 1:numel(content{1})
result = regexp(content{1}{c1}, pattern, 'names');
for c2 = 1:numel(result)
fprintf(fid, '%s,%s\n', strrep(result(c2).real,'+',''), strrep(result(c2).imag,'+',''));
end
end
fclose(fid);
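To see what the named tokens capture, here is the pattern (with escaped dots) applied to a short hypothetical string containing a negative real part and a negative imaginary part. Numbers in scientific notation (e.g. 1e-05) are not matched by this pattern, so plain decimal input is an assumption here:

```matlab
pattern = '(?<real>-?\d+\.\d+)(?<imag>[+-]\d+\.\d+)j';
txt = '(-0.5+1.25j),(2.0-3.5j)';          % hypothetical sample
result = regexp(txt, pattern, 'names');  % 1-by-2 struct array with fields real and imag
for k = 1:numel(result)
    % prints "-0.5,1.25" and then "2.0,-3.5"
    fprintf('%s,%s\n', result(k).real, strrep(result(k).imag, '+', ''));
end
```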

Loading data to vector rather than cell in MATLAB

I load data from text files into my MATLAB function using this code:
data = cell(h.numDirs, numDataFilesInFirstDir);
for d = 1:h.numDirs
% Code to set fileNames, iDir
for t = 1:size(fileNames,1)
fId = fopen([iDir, '/', fileNames{t}]);
% Drop the first two lines (column headers)
for skip = 1:2
fgets(fId);
end
U_temp = fscanf(fId, '%f %f', [2, Inf]);
U_temp = U_temp'; % transpose to N-by-2 (one row per data line)
data(d, t) = {U_temp(:,2)};
fclose(fId);
end
end
The files should all have the same length (at least across t, and usually across d as well, or else I run into problems later).
Should I simplify this code to avoid (possibly unnecessary) cells, and if so, how?
I could scan the first data set, then use something like
data = zeros(h.numDirs, numDataFilesInFirstDir, lengthOfFirstFile)
but I don't know whether that would be any better. Is that a 'better' solution?
I would use dlmread instead of fscanf. Choosing a data type is hard since your dimensions vary. I wouldn't pad arrays: any benefit from not using cells would be outweighed by the extra complexity and memory cost. Cell arrays are a reasonable choice, and I wouldn't worry too much about preallocation in this case. Below is a similar option using structs with dynamic field names that embed the source directory and filename, for later reference.
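data = struct();
for d = 1: ...
for t = 1: ...
file = fullfile(iDir, fileNames{t});
dlm = ' ';
% R1 = 2, C1 = 0 are zero-based offsets: skip the two header lines, start at the first column
Utemp = dlmread(file, dlm, 2, 0);
% dynamic field names must be valid identifiers, so sanitize the directory and file names
data.(matlab.lang.makeValidName(iDir)).(matlab.lang.makeValidName(fileNames{t})) = Utemp(:, 2);
end
end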
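If all files do turn out to have the same length, the cell array from the question can still be converted afterwards into the numeric array of size h.numDirs x numDataFilesInFirstDir x lengthOfFirstFile proposed there. A sketch, with a small hypothetical 2x2 cell array standing in for data:

```matlab
% Hypothetical stand-in for the question's data cell: data{d,t} is a column vector
data = {[1;2;3], [4;5;6]; [7;8;9], [10;11;12]};

lens = cellfun(@numel, data);
assert(all(lens(:) == lens(1)), 'All files must have the same length');

% Columns of cell2mat follow data's column-major order: data{1,1}, data{2,1}, ...
stacked = cell2mat(data(:).');                               % len-by-(nDirs*nFiles)
M = reshape(stacked, lens(1), size(data, 1), size(data, 2)); % M(k,d,t)
M = permute(M, [2 3 1]);                                     % M(d,t,k) = data{d,t}(k)
```

Converting once at the end keeps the simple cell-based loading while still giving a numeric array for later vectorized work.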

Read CSV with semicolon separated data with comma as decimal mark in MATLAB

My problem is that I have CSV data in the following format:
1,000333e+003;6,620171e+001
1,001297e+003;6,519699e+001
1,002261e+003;6,444984e+001
I want to read the data into MATLAB, but csvread requires comma-separated values, and I have not been able to find a way to handle the comma used as the decimal mark. I guess I can use textscan in some way?
I'm sorry to ask such an (I think) easy question, but I hope someone can help. None of the other questions and answers here seem to deal with this combination of comma and semicolon.
EDIT3 (ACCEPTED ANSWER): Using the Import Data button in the Variable section of the Home toolbar, you can customise how the data is imported. Once that is done, you can click the arrow beneath Import Selection and generate a script or function that follows the same rules defined in the Import Data window.
--------------------------------------------------kept for reference--------------------------------------------------
You can use dlmread, which works as follows:
M = dlmread(filename,';')
where filename is a string with the full path of the file, unless the file is in the current working directory, in which case the file name alone is enough. Note that dlmread by itself does not interpret the comma as a decimal mark; the edits below deal with that.
EDIT1: To use textscan instead, the following code should do the trick, or at least most of it.
% 'rt' opens the file for reading (r) in text mode (t)
csv_file = fopen('D:\Dev\MATLAB\stackoverflow_tests\1.csv','rt');
% formatSpec describes what textscan looks for: two string fields per line
formatSpec = '%s%s';
% name-value arguments come in pairs; here the file is scanned with the
% format defined above and a semicolon delimiter
C = textscan(csv_file, formatSpec, 'Delimiter', ';');
fclose(csv_file);
The result:
C{1}{1} =
1,000333e+003
C{1}{2} =
1,001297e+003
C{1}{3} =
1,002261e+003
C{2}{1} =
6,620171e+001
C{2}{2} =
6,519699e+001
C{2}{3} =
6,444984e+001
EDIT2: To replace the comma with a dot and convert to numbers of type double:
converted_data = cell(size(C));
for kk = 1 : numel(C)
% swap the decimal comma for a dot, then convert each string to a double
converted_data{kk} = str2double(strrep(C{kk}, ',', '.'));
end
celldisp(converted_data)
Result:
converted_data{1} =
1.0e+03 *
1.0003
1.0013
1.0023
converted_data{2} =
66.2017
65.1970
64.4498
% Data is in C:\temp\myfile.csv
fid = fopen('C:\temp\myfile.csv');
data = textscan(fid, '%s%s', 'delimiter', ';');
fclose(fid);
% Convert ',' to '.'
data = cellfun(@(x) str2double(strrep(x, ',', '.')), data, 'uniformoutput', false);
data =
[3x1 double] [3x1 double]
data{1}
ans =
1.0e+03 *
1.000333000000000
1.001297000000000
1.002261000000000
data{2}
ans =
66.201710000000006
65.196990000000000
64.449839999999995
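Another option, sketched under the assumption that commas occur only as decimal marks (the fields are separated by ';'), is to replace every comma in the text first and then let textscan parse the numbers directly with %f. The sample text below stands in for the file; for a real file, fileread would supply it:

```matlab
% Sample text standing in for the file contents (first two rows from the question)
txt = sprintf('1,000333e+003;6,620171e+001\n1,001297e+003;6,519699e+001\n');
% For a real file: txt = fileread('C:\temp\myfile.csv');

txt = strrep(txt, ',', '.');                  % decimal comma -> decimal point
C = textscan(txt, '%f%f', 'Delimiter', ';');  % textscan also accepts a char array
M = [C{:}];                                   % N-by-2 numeric matrix
```

This avoids the intermediate cell arrays of strings, at the cost of reading the whole file into memory as text.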

Save a sparse array in csv

I have a huge sparse matrix a that I want to save as a .csv file. I cannot call full(a) because I do not have enough RAM, so calling dlmwrite with full(a) as the argument is not possible. Note also that dlmwrite does not work with sparse matrices.
The desired .csv format is shown below. The first row and the first column, which contain the letter labels, must be included in the file, and so must the semicolon in the top-left (0,0) position.
;A;B;C;D;E
A;0;1.5;0;1;0
B;2;0;0;0;0
C;0;0;1;0;0
D;0;2.1;0;1;0
E;0;0;0;0;0
Could you please help me to tackle this problem and finally save the sparse matrix in the desired form?
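You can use the csvwrite function:
csvwrite('matrix.csv',a)
Note that this writes comma-separated values without the row and column labels requested above.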
You could do this iteratively, as follows:
A = sprand(20,30000,.1);
delimiter = ';';
filename = 'filecontaininghugematrix.csv';
dims = size(A);
N = max(dims);
% create names first
idx = 1:26;
alphabet = dec2base(9+idx,36);
n = ceil(log(N)/log(26));
q = 26.^(1:n);
names = cell(sum(q),1);
p = 0;
for ii = 1:n
temp = repmat({idx},ii,1);
names(p+(1:q(ii))) = num2cell(alphabet(fliplr(combvec(temp{:})')),2);
p = p + q(ii);
end
names(N+1:end) = [];
% formats for writing
headStr = repmat(['%s' delimiter],1,dims(2));
headStr = [delimiter headStr(1:end-1) '\n'];
lineStr = repmat(['%f' delimiter],1,dims(2));
lineStr = ['%s' delimiter lineStr(1:end-1) '\n'];
fid = fopen(filename,'w');
% write header
header = names(1:dims(2));
fprintf(fid,headStr,header{:});
% write matrix rows
for ii = 1:dims(1)
row = full(A(ii,:));
fprintf(fid, lineStr, names{ii}, row);
end
fclose(fid);
The names cell array is quite memory-demanding for this example (and combvec, used to build it, is part of the Deep Learning Toolbox). I don't have time to fix that now, so think about this part yourself if it is really a problem ;) Hint: just write the header element-wise, first A;, then B;, and so on. For the rows, you can write a function that maps the index ii to the desired label, in which case the whole first part is unnecessary.
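The mapping from an index to a label mentioned in the hint can be sketched as follows. It converts a 1-based index into a spreadsheet-style label (1 -> A, 26 -> Z, 27 -> AA, ...) with plain arithmetic, so no names cell array is needed. The indices below are hypothetical test values:

```matlab
ks = [1 26 27 52 53 702 703];      % hypothetical indices to convert
labels = cell(size(ks));
for i = 1:numel(ks)
    n = ks(i);
    label = '';
    while n > 0
        r = mod(n - 1, 26);             % 0..25: offset of the current letter
        label = [char('A' + r) label];  % prepend: letters are produced right to left
        n = (n - 1 - r) / 26;           % drop the letter just emitted
    end
    labels{i} = label;
end
```

Inside the row loop of the answer, the same computation for ii could replace names{ii}, and the header could be written one label at a time with fprintf(fid, ';%s', label) followed by a final '\n'.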

Reading parameters from a text file into the workspace

I have a file which has the following information:
% ---------------------- location details --------------------------
%
% lat : latitude [minimum = -90, maximum = 90, unit =
% degrees north]
% lon : longitude [ minimum = -360, maximum = 360, unit =
% deg east]
% z: altitude (above sea level, m)
%---------------------------------------------------------------
% location:
lat = 54.35
lon = -2.9833
This is a small section of the file.
I would like to read some of this information into MATLAB and then use it to perform some calculations. The parts of the file I want to read are the lines that are not commented out (i.e. that do not start with a %), and the variables should then be saved in the workspace. For example, I would like to have:
lat = 54.35
lon = -2.9833
in the workspace.
How would I go about this? I have read about textscan and fopen, although these don't really seem to help me in this instance.
The quick-and-dirty approach
The simplest solution I could think of for reading this file does indeed use textscan :) and since the lines are conveniently written in valid MATLAB syntax, you could then use eval to evaluate them. Start by reading each line as one string (ignoring the comment lines in the header):
fid = fopen(filename);
C = textscan(fid, '%s', 'Delimiter', '', 'CommentStyle', '%')
fclose(fid);
Then feed the lines one by one into eval to create the variables in the MATLAB workspace:
cellfun(@eval, C{1});
What this does is interpret each line as a MATLAB command, i.e. create the variables named in the file and assign them the appropriate values. If you want to suppress the output of eval, you can use evalc instead to "absorb" the output:
cellfun(@evalc, C{1}, 'UniformOutput', false);
This should work for your basic example, but it would fail if you have more than one instance of any parameter. Also note that the eval family is notoriously slow.
A more robust approach
If the lines in your file follow the pattern parameter name = number, you can read them more intelligently:
fid = fopen(filename);
C = textscan(fid, '%[^= ]%*[= ]%f', 'CommentStyle', '%')
fclose(fid);
The %[^= ] in the format matches characters up to the first space or equals sign. The %*[= ] skips the equals sign and any surrounding spaces, and then the numerical value is matched with %f. The resulting cell array C stores the parameter names in its first cell and the corresponding values in its second cell.
Now it's up to you to manipulate the parsed data. For instance, to extract all values of lat and lon, you can do this:
lat = C{2}(strcmp(C{1}, 'lat'));
lon = C{2}(strcmp(C{1}, 'lon'));
If you have more than one "lat" line, lat will be an array holding all these values.
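Building on the same textscan format, the names and values can also be collected into a struct without eval. A sketch using cell2struct on a hypothetical inline copy of the file's last lines; it assumes each parameter name appears only once and is a valid MATLAB identifier (cell2struct errors on duplicates):

```matlab
% Inline sample standing in for the file (a comment line plus two parameters)
txt = sprintf('%% location:\nlat = 54.35\nlon = -2.9833\n');
C = textscan(txt, '%[^= ]%*[= ]%f', 'CommentStyle', '%');

% One field per parameter name, holding its numeric value
params = cell2struct(num2cell(C{2}), C{1}, 1);
```

Afterwards params.lat and params.lon hold the values, and fieldnames(params) lists every parameter that was found.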
Here's another quick-and-dirty way:
fp = fopen('foo.txt');
found = 1;
while ~feof(fp)
line = fgetl(fp);
% fgetl returns -1 at end of file; also skip empty lines and comment lines
if ischar(line) && ~isempty(line) && line(1) ~= '%'
value(found) = sscanf(line,'%*s %*s %f');
found = found + 1;
end
end
fclose(fp);
The two %*s conversions skip the name ('lat' or 'lon') and the '='.
The example you provided is fairly well-behaved, so the following solution might need some tailoring. However, I would recommend it over any eval()-based approach:
% Read whole file ignoring lines that start with '%' and using '=' as delimiter
fid = fopen('test.txt');
s = textscan(fid,'%s%f', 'CommentStyle','%','Delimiter','=');
fclose(fid);
% Identify lines with latitude and those with longitude
idxLat = strncmpi('lat',s{1},3);
idxLon = strncmpi('lon',s{1},3);
% Store all latitudes and longitudes
lat = s{2}(idxLat);
lon = s{2}(idxLon);
This gets you a structure whose field names match the parameter names, and it accepts comma-separated lists. List any parameters that should stay as strings in char_params:
char_params={};
file_struct = struct();
fid = fopen(filename);
% Load lines into cell (1x1) containing cell array s (Nx1),
% skipping lines starting with % and cutting off anything after % in a line
s = textscan(fid,'%s', 'CommentStyle','%','Delimiter','%');
fclose(fid);
% access the lines strings s{1}, split across '=' and remove whitespace on both sides
s=strtrim(split(s{1},'='));
% Interpret parameters and save to structure
for ind=1:size(s,1)
% User says which parameters are strings
if any(strcmpi(s{ind,1},char_params))
file_struct.(s{ind,1})=s{ind,2};
% Otherwise, assume they are numbers or numeric row arrays
else
% remove parentheses and brackets
trim_s=regexprep(s(ind,2),'[\[\]\(\)]','');
% convert comma-separated lists into row arrays
file_struct.(s{ind,1})=str2double(split(trim_s{1},',')).';
end
end