Matlab reading txt formatted file - matlab

Suppose I have a .txt file in which every line has the format
Name, Home, 1, 2, 3, 3, 3, 3
That is, the first two columns are strings and the rest are integers.
How do I read the first two columns as vectors of strings and the remaining columns as a numeric matrix?

One way of doing this so you know exactly what's happening line by line is in the following piece of code:
fid = fopen('textfile.txt');
clear data
tline = fgetl(fid);
n = 1;
while ischar(tline)
data(n,:) = strsplit(tline(1:end),', ');
n=n+1;
tline = fgetl(fid);
end
fclose(fid);
dataStrings = data(:,1:2);
dataValues = str2double(data(:,3:end));
where data is a cell array holding every field as a character vector, dataStrings is a cell array with only the first two columns, and dataValues is a matrix of type double with the rest of the columns.
This way the numeric part ends up in a plain matrix, and cell arrays are needed only for the text columns. Note that the loop assumes every line has the same number of fields.

Use textscan:
fileID = fopen('sometextfile.txt');
C = textscan(fileID,'%s %s %f %f %f %f %f %f','Delimiter',','); % assuming you want double data types, change as required
fclose(fileID);
celldisp(C) % C is a cell array
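To end up with the two text columns and one numeric matrix, as the question asks, the 'CollectOutput' option can group neighbouring columns of the same type. The sketch below is only an illustration: it parses a made-up two-line sample held in a character vector (txt is a stand-in for the file contents, and the second row is invented) rather than a real file.

```matlab
% Hypothetical sample standing in for the contents of the text file
txt = sprintf('Name, Home, 1, 2, 3, 3, 3, 3\nBob, Work, 4, 5, 6, 7, 8, 9\n');

% CollectOutput merges consecutive columns of the same class:
% the two %s columns become one N-by-2 cell array, the six %f columns one N-by-6 matrix
C = textscan(txt, '%s %s %f %f %f %f %f %f', 'Delimiter', ',', 'CollectOutput', true);

names  = C{1}(:,1);   % first text column
homes  = C{1}(:,2);   % second text column
values = C{2};        % numeric matrix of type double
```

With a file on disk, pass the fileID returned by fopen in place of txt.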

Related

matlab reading mixed data from file

I am pretty new to MATLAB. I've been reading the documentation but can't figure out why MATLAB does not correctly read the string from the file. What I am trying to do is read mixed data types from a file. Some sample data is:
t a e incl lasc aper meanan truean rupnode rdnnode name
0.000000 1.2712052487 0.8899021688 22.2458 265.2511471042 322.1539251184 -13.6281352271 -130.986 0.155342 0.889756 phaet_000018
0.000000 1.2712052478 0.8899021575 22.2458 265.2511428392 322.1539270642 -13.6281369694 -130.986 0.155342 0.889756 phaet_000044
0.000000 1.2712052496 0.8899021868 22.2458 265.2511587897 322.1539149438 -13.6281365049 -130.986 0.155342 0.889755 phaet_000006
The first line is header. So here is what I've done so far:
fid = fopen('data.dat');
header = fgetl(fid); % I read the header
Now I read the data:
data = fscanf(fid,'%f %f %f %f %f %f %f %f %f %f %s',[11 inf]);
data1 = data';
fclose(fid);
I can now access the first element as:
data1(1,1)
However, when I do:
data1(1,11)
instead of phaet_000018 I am getting a number (112). Any idea what I am doing wrong?
There are a few issues with your code.
First, your sizeA input to fscanf is backwards. sizeA with a vector input is defined as:
Read at most m*n numeric values or character fields. n can be Inf, but m cannot. The output, A, is m-by-n, filled in column order.
So you've asked fscanf to give you 11 rows and whatever number of columns. You can't have an Inf row specification so you'll want to remove the third input entirely and reshape your data afterwards.
For example:
fid = fopen('data.dat');
header = fgetl(fid);
data = fscanf(fid,'%f %f %f %f %f %f %f %f %f %f %s');
fclose(fid);
% We just happen to know this explicitly, not knowledge to generally assume
ncols = 22;
% Reshape and transpose
data = reshape(data, ncols, []).';
Gives us a 3 x 22 data array, which is kinda sorta what we want.
So where are the extra columns coming from? For %s fields, fscanf reads the string until it encounters whitespace. Because the output of fscanf is a numeric array it must convert this string into a numeric value, so it converts each character to its numeric equivalent (double(letter)) and outputs that into the matrix.
Using our above data matrix as an example, we have:
>> char(data(1, 11:end))
ans =
phaet_000018
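The 112 seen in the question is simply the character code of the letter p. A small check, using only the sample name from the data above (the variable names are illustrative):

```matlab
name      = 'phaet_000018';
codes     = double(name);   % one numeric character code per character
firstCode = codes(1);       % 112, the code for 'p'
restored  = char(codes);    % converting back recovers the original text
```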
With this in mind, your initial code only happens to work because all of your strings are the same length. If we change the length of one or more of the strings, this data import will fail:
Error using reshape
Product of known dimensions, 22, not divisible into total number of elements, 65.
Error in testcode (line 11)
data = reshape(data, ncols, []).';
So what can we do instead? If you need this string from your data I would recommend trying textscan:
fid = fopen('data.dat');
header = fgetl(fid);
data = textscan(fid, '%f %f %f %f %f %f %f %f %f %f %s');
fclose(fid);
This will read your data into a 1x11 cell array, where each cell corresponds to a column in your data:
>> data{1} % t
ans =
0
0
0
To collect your numeric data you can iterate through the cell array, or you can use the 'CollectOutput' flag in textscan:
fid = fopen('data.dat');
header = fgetl(fid);
data = textscan(fid, '%f %f %f %f %f %f %f %f %f %f %s', 'CollectOutput', true);
fclose(fid);
Which will output a 1x2 cell array, where data{1} is your numeric array and data{2} is a cell array containing your strings:
>> data{1} % Numeric data
ans =
0 1.2712 0.8899 22.2458 265.2511 322.1539 -13.6281 -130.9860 0.1553 0.8898
0 1.2712 0.8899 22.2458 265.2511 322.1539 -13.6281 -130.9860 0.1553 0.8898
0 1.2712 0.8899 22.2458 265.2512 322.1539 -13.6281 -130.9860 0.1553 0.8898
>> data{2} % Strings
ans =
3×1 cell array
'phaet_000018'
'phaet_000044'
'phaet_000006'

Matlab - string containing a number and equal sign

I have a data file that contains parameter names and values with an equal sign in between them. It's like this:
A = 1234
B = 1353.335
C =
D = 1
There is always one space before and after the equal sign. The problem is some variables don't have values assigned to them like "C" above and I need to weed them out.
I want to read the data file (text) into a cell and just remove the lines with those invalid statements or just create a new data file without them.
Whichever is easier, but I will eventually read the file into a cell with textscan command.
The values (numbers) will be treated as double precision.
Please, help.
Thank you,
Eric
Try this:
fid = fopen('file.txt'); %// open file
x = textscan(fid, '%s', 'delimiter', '\n'); %// or '\r'. Read each line into a cell
fclose(fid); %// close file
x = x{1}; %// each cell of x contains a line of the file
ind = ~cellfun(@isempty, regexp(x, '=\s[\d\.]+$')); %// desired lines: space, numbers, end
x = x(ind); %// keep only those lines
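Once the incomplete lines are gone, the names and values can be separated. A possible continuation, assuming (as stated in the question) exactly one space on each side of the equal sign; the sample lines are the ones from the question, and the names tok, names and values are illustrative only:

```matlab
% Lines as they would come out of textscan(fid, '%s', 'delimiter', '\n')
x = {'A = 1234'; 'B = 1353.335'; 'C ='; 'D = 1'};

% Keep only lines that end in "= <number>"
ind  = ~cellfun(@isempty, regexp(x, '=\s[\d\.]+$'));
kept = x(ind);

% Split each kept line into a name token and a value token
tok    = regexp(kept, '^(\w+) = (.+)$', 'tokens', 'once');
names  = cellfun(@(t) t{1}, tok, 'UniformOutput', false);   % cell array of names
values = cellfun(@(t) str2double(t{2}), tok);               % column vector of doubles
```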
If you just want to get the variables, and reject lines that do not have any character, this might work (the data.txt is just a txt generated by the example of data you have given):
fid = fopen('data.txt');
tline = fgets(fid);
while ischar(tline)
tmp = cell2mat(regexp(tline,'\=(.*)','match'));
b=str2double(tmp(2:end));
if ~isnan(b)
disp(b)
end
tline = fgets(fid);
end
fclose(fid);
I am reading the txt file line by line, using a regular expression to strip everything up to the equal sign, and then converting the remaining text to double. Lines without a value give NaN and are skipped.

Read in a file and skip lines that start with a specific string

I'm trying to read in a text file with MATLAB.
The file is in this format:
string number number
string number number
....
I'd like to skip the lines which start with a specific string. For any other string, I want to save the two numbers in that line.
Let's take this sample file file.txt:
badstring 1 2
badstring 3 4
goodstring 5 6
badstring 7 8
goodstring 9 10
If a line starts with badstring we skip it, otherwise we store the two numbers following the string.
fid = fopen('file.txt');
nums = textscan(fid, '%s %f %f');
fclose(fid);
ind = find(strcmp(nums{1},'badstring'));
nums = cell2mat(nums(:,2:end));
nums(ind,:) = [];
display(nums)
This will read the entire file into a cell array, then convert it to a matrix (without the strings), and then kill any row which originally started with badstring. Alternatively, if the file is very large, you can avoid the temporary storage of all the lines with this iterative solution:
fid = fopen('file.txt');
tline = fgetl(fid);
numbers = [];
while ischar(tline) % fgetl returns -1 (not a char) at EOF
parts = textscan(tline, '%s %f %f'); % parse the current line only
if ~strcmp(parts{1}, 'badstring')
numbers = [numbers; parts{2} parts{3}];
end
tline = fgetl(fid);
end
fclose(fid);
disp(numbers)
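The same filtering can also be written with a logical mask instead of find and row deletion. A small sketch on the sample data, parsed from a character vector standing in for the file (keep is an illustrative name):

```matlab
% Sample contents of file.txt, held in memory for the sake of the example
txt = sprintf('badstring 1 2\nbadstring 3 4\ngoodstring 5 6\nbadstring 7 8\ngoodstring 9 10\n');

C    = textscan(txt, '%s %f %f');        % C{1}: strings, C{2} and C{3}: numbers
keep = ~strcmp(C{1}, 'badstring');       % true for rows that should be kept
nums = [C{2}(keep), C{3}(keep)];         % N-by-2 matrix of the kept numbers
```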

read a txt file to matrix and cellarray Matlab

I have a txt file with the entries below, and I would like to know how to get the first column into a cell array and the numerical values from the second column through the last column into a matrix.
I've tried importdata and fscanf, but I don't understand what's going on.
CP6 7,2 -2,7 6,6
P5 -5,8 -5,9 5,8
P6 5,8 -5,9 5,8
AF7 -5,0 7,2 3,6
AF8 5,0 7,2 3,6
FT7 -7,6 2,8 3,6
This should give you what you want based on the text sample you supplied.
fileID = fopen('x.txt'); %open file x.txt
m=textscan(fileID,'%s %d ,%d %d ,%d %d ,%d');
fclose(fileID); %close file
col1 = m{1,1}; %get first column into cell array col1
colRest = cell2mat(m(1,2:end)); %convert all six integer columns into matrix colRest
Look up textscan for more information on reading specially formatted data.
This function should do the trick. It reads your file and scans it according to your pattern. Then, put the first column in a cell array and the others in a matrix.
function [ C1,A ] = scan_your_txt_file( filename )
fid = fopen(filename,'rt');
C = textscan(fid, '%s %d,%d %d,%d %d,%d');
fclose(fid);
C1 = C{1};
A = cell2mat(C(2:size(C,2)));
end
Have you tried xlsread? It makes a numeric array and two non-numeric arrays.
[N,T,R]=xlsread('yourfilename.txt')
but your data is not comma delimited. It also looks like you are using a comma to represent a decimal point. Does this array have 7 columns or 4? Because I'm in the US, I'm going to assume you have paired coordinates and the comma is one kind of delimiter while the space is a second one.
So here is something klugy, but it works. It is a gross ugly hack, but it works.
%housekeeping
clc
%get name of raw file
d=dir('*22202740*.txt')
%translate from comma-as-decimal to period-as-decimal
fid = fopen(d(1).name,'r') %source
fid2= fopen('myout.txt','w+') %sink
while 1
tline = fgetl(fid); %read
if ~ischar(tline), break, end %end loop
fprintf(fid2,'%s\r\n',strrep(tline,',','.')) %write updated line to output
end
fclose(fid)
fclose(fid2)
%open, gulp, parse/store, close
fid3 = fopen('myout.txt','r');
C=textscan(fid3,'%s %f %f %f ');
fclose(fid3);
%measure waist size and height
[n,m]=size(C);
n=length(C{1});
%put in slightly more friendly form
temp=zeros(n,m);
for i=2:m
t0=C{i};
temp(:,i)=t0;
end
%write to excel
xlswrite('myout_22202740.xlsx',temp(:,2:end),['b1:' char(96+m) num2str(n)]);
xlswrite('myout_22202740.xlsx',C{1},['a1:a' num2str(n)])
%read from excel
[N,T,R]=xlsread('myout_22202740.xlsx')
If you want those commas to be decimal points, then that is a different question.
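If the commas are in fact decimal separators, a shorter route than the temporary file might be to replace them in memory and parse the result directly. This is only a sketch under that assumption; raw holds three of the sample lines, and with a real file it could be obtained with fileread('x.txt').

```matlab
% Three sample lines from the question, standing in for the file contents
raw = sprintf('CP6 7,2 -2,7 6,6\nP5 -5,8 -5,9 5,8\nAF7 -5,0 7,2 3,6\n');

fixed = strrep(raw, ',', '.');   % comma-as-decimal -> period-as-decimal

% One text column followed by three numeric columns, grouped by type
C = textscan(fixed, '%s %f %f %f', 'CollectOutput', true);
labels = C{1};   % N-by-1 cell array of electrode names
coords = C{2};   % N-by-3 matrix of type double
```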

Reading CSV with mixed type data

I need to read the following csv file in MATLAB:
2009-04-29 01:01:42.000;16271.1;16271.1
2009-04-29 02:01:42.000;2.5;16273.6
2009-04-29 03:01:42.000;2.599609;16276.2
2009-04-29 04:01:42.000;2.5;16278.7
...
I'd like to have three columns:
timestamp;value1;value2
I tried the approaches described here:
Reading date and time from CSV file in MATLAB
modified as:
filename = 'prova.csv';
fid = fopen(filename, 'rt');
a = textscan(fid, '%s %f %f', ...
'Delimiter',';', 'CollectOutput',1);
fclose(fid);
But it returns a 1x2 cell whose first element is a{1}='ÿþ2'; the others are empty.
I had also tried to adapt to my case the answers to these questions:
importing data with time in MATLAB
Read data files with specific format in matlab and convert date to matal serial time
but I didn't succeed.
How can I import that csv file?
EDIT: After the answer of @macduff I tried copy-pasting the data reported above into a new file and used:
a = textscan(fid, '%s %f %f','Delimiter',';');
and it works.
Unfortunately that didn't solve the problem because I have to process csv files generated automatically, which seems to be the cause of the strange MATLAB behavior.
What about trying:
a = textscan(fid, '%s %f %f','Delimiter',';');
For me I get:
a =
{4x1 cell} [4x1 double] [4x1 double]
So each element of a corresponds to a column in your csv file. Is this what you need?
Thanks!
Seems you're going about it the right way. The example you provide poses no problems here, I get the output you desire. What's in the 1x2 cell?
If I were you I'd try again with a smaller subset of the file, say 10 lines, and see if the output changes. If yes, then try 100 lines, etc., until you find where the 4x1 cell + 4x2 array breaks down into the 1x2 cell. It might be that there's an empty line or a single empty field or whatever, which forces textscan to collect data in an additional level of cells.
Note that 'CollectOutput',1 will collect the last two columns into a single array, so you'll end up with 1 cell array of 4x1 containing strings, and 1 array of 4x2 containing doubles. Is that indeed what you want? Otherwise, see @macduff's post.
I've had to parse large files like this, and I found I didn't like textscan for this job. I just use a basic while loop to parse the file, and I use datevec to extract the timestamp components into a 6-element time vector.
%% Optional: initialize for speed if you have large files
n = 1000 %% <# of rows in file - if known>
timestamp = zeros(n,6);
value1 = zeros(n,1);
value2 = zeros(n,1);
fid = fopen(fname, 'rt');
if fid < 0
error('Error opening file %s\n', fname); % exit point
end
cntr = 0
while true
tline = fgetl(fid); %% get one line
if ~ischar(tline), break; end; % break out of loop at end of file
cntr = cntr + 1;
splitLine = strsplit(tline, ';'); %% split the line on ; delimiters
timestamp(cntr,:) = datevec(splitLine{1}, 'yyyy-mm-dd HH:MM:SS.FFF'); %% using datevec to parse time gives you a standard timestamp vector
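value1(cntr) = str2double(splitLine{2}); %% fields are text, so convert them to double
value2(cntr) = str2double(splitLine{3});
end
fclose(fid);
%% Drop unused preallocated rows, then concatenate if you like
result = [timestamp(1:cntr,:) value1(1:cntr) value2(1:cntr)];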
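A note on the 'ÿþ2' reported in the question: ÿ and þ are the bytes 0xFF 0xFE, the byte order mark of a UTF-16 little-endian file, which suggests the automatically generated files are UTF-16 rather than plain ASCII. One possible workaround, sketched here under that assumption, is to decode the bytes first and then hand the text to textscan. The bytes below are built in memory to stand in for the file; with a real file they could be read with fread(fid, '*uint8')' instead.

```matlab
% Stand-in for the raw file bytes: UTF-16LE text preceded by its BOM (FF FE)
sample = sprintf('2009-04-29 01:01:42.000;16271.1;16271.1\n2009-04-29 02:01:42.000;2.5;16273.6\n');
bytes  = [uint8([255 254]), unicode2native(sample, 'UTF-16LE')];

% Decode to ordinary MATLAB text and drop the BOM character if it survives decoding
txt = native2unicode(bytes, 'UTF-16LE');
if ~isempty(txt) && double(txt(1)) == 65279
    txt(1) = [];
end

% Parse the decoded text exactly as in the answers above
a = textscan(txt, '%s %f %f', 'Delimiter', ';');
stamps = a{1};   % timestamps as character vectors
value1 = a{2};
value2 = a{3};
```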