MATLAB: reading mixed data from a file

I am pretty new to MATLAB. I've been reading the documentation but can't figure out why MATLAB does not correctly read the string from the file. What I am trying to do is read mixed data types from a file. Some sample data:
t a e incl lasc aper meanan truean rupnode rdnnode name
0.000000 1.2712052487 0.8899021688 22.2458 265.2511471042 322.1539251184 -13.6281352271 -130.986 0.155342 0.889756 phaet_000018
0.000000 1.2712052478 0.8899021575 22.2458 265.2511428392 322.1539270642 -13.6281369694 -130.986 0.155342 0.889756 phaet_000044
0.000000 1.2712052496 0.8899021868 22.2458 265.2511587897 322.1539149438 -13.6281365049 -130.986 0.155342 0.889755 phaet_000006
The first line is a header. So here is what I've done so far:
fid = fopen('data.dat');
header = fgetl(fid); % I read the header
Now I read the data:
data = fscanf(fid,'%f %f %f %f %f %f %f %f %f %f %s',[11 inf]);
data1 = data';
fclose(fid);
I can now access the first element as:
data1(1,1)
However, when I do:
data1(1,11)
instead of phaet_000018 I am getting a number (112). Any idea what I am doing wrong?

There are a few issues with your code.
First, your sizeA input to fscanf is backwards. sizeA with a vector input is defined as:
Read at most m*n numeric values or character fields. n can be Inf, but m cannot. The output, A, is m-by-n, filled in column order.
So you've asked fscanf to give you 11 rows and whatever number of columns. You can't have an Inf row specification so you'll want to remove the third input entirely and reshape your data afterwards.
For example:
fid = fopen('data.dat');
header = fgetl(fid);
data = fscanf(fid,'%f %f %f %f %f %f %f %f %f %f %s');
fclose(fid);
% We just happen to know this explicitly, not knowledge to generally assume
ncols = 22;
% Reshape and transpose
data = reshape(data, ncols, []).';
Gives us a 3 x 22 data array, which is kinda sorta what we want.
So where are the extra columns coming from? For %s fields, fscanf reads the string until it encounters whitespace. Because the output of fscanf is a numeric array it must convert this string into a numeric value, so it converts each character to its numeric equivalent (double(letter)) and outputs that into the matrix.
Using our above data matrix as an example, we have:
>> char(data(1, 11:end))
ans =
phaet_000018
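The character-code conversion described above can be reproduced without any file. This is a minimal sketch using one name from the sample data. It also shows why each record occupies 22 values: 10 numbers plus 12 character codes.

```matlab
% Each character of a string maps to a numeric character code.
name  = 'phaet_000018';
codes = double(name);       % 1x12 row vector of character codes
disp(codes(1));             % code of the first character, 'p'
disp(numel(codes));         % number of extra values this name contributes
restored = char(codes);     % converting back recovers the text
disp(restored);
```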
With this in mind, your initial code only happens to work because all of your strings are the same length. If we change the length of one or more of the strings, this data import will fail:
Error using reshape
Product of known dimensions, 22, not divisible into total number of elements, 65.
Error in testcode (line 11)
data = reshape(data, ncols, []).';
So what can we do instead? If you need this string from your data I would recommend trying textscan:
fid = fopen('data.dat');
header = fgetl(fid);
data = textscan(fid, '%f %f %f %f %f %f %f %f %f %f %s');
fclose(fid);
This will read your data into a 1x11 cell array, where each column corresponds to a column in your data:
>> data{1} % t
ans =
0
0
0
To collect your numeric data you can iterate through the cell array, or you can use the 'CollectOutput' flag in textscan:
fid = fopen('data.dat');
header = fgetl(fid);
data = textscan(fid, '%f %f %f %f %f %f %f %f %f %f %s', 'CollectOutput', true);
fclose(fid);
Which will output a 1x2 cell array, where data{1} is your numeric array and data{2} is a cell array containing your strings:
>> data{1} % Numeric data
ans =
0 1.2712 0.8899 22.2458 265.2511 322.1539 -13.6281 -130.9860 0.1553 0.8898
0 1.2712 0.8899 22.2458 265.2511 322.1539 -13.6281 -130.9860 0.1553 0.8898
0 1.2712 0.8899 22.2458 265.2512 322.1539 -13.6281 -130.9860 0.1553 0.8898
>> data{2} % Strings
ans =
3×1 cell array
'phaet_000018'
'phaet_000044'
'phaet_000006'
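If you also want to use the header line you read with fgetl, one option is to turn the two cells into a table. The following is only a sketch under some assumptions: it writes two of the sample rows to a temporary file so that it runs on its own, and it assumes a MATLAB release that provides strsplit and array2table. The names fname, colNames and T are made up for the illustration.

```matlab
% Write a header and two sample rows to a temporary file (illustration only).
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 't a e incl lasc aper meanan truean rupnode rdnnode name\n');
fprintf(fid, '0.000000 1.2712052487 0.8899021688 22.2458 265.2511471042 322.1539251184 -13.6281352271 -130.986 0.155342 0.889756 phaet_000018\n');
fprintf(fid, '0.000000 1.2712052478 0.8899021575 22.2458 265.2511428392 322.1539270642 -13.6281369694 -130.986 0.155342 0.889756 phaet_000044\n');
fclose(fid);

% Read it back: header line first, then 10 numeric columns and 1 string column.
fid = fopen(fname);
header = fgetl(fid);
data = textscan(fid, [repmat('%f ', 1, 10) '%s'], 'CollectOutput', true);
fclose(fid);
delete(fname);

% Use the header words as column names.
colNames = strsplit(strtrim(header));          % 1x11 cell of names
T = array2table(data{1}, 'VariableNames', colNames(1:10));
T.name = data{2};                              % add the string column
disp(T);
```

With this layout, T.incl gives the inclination column and T.name{1} gives the first name.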

Related

Dimensions of matrices being concatenated are not consistent

I read a CSV file with textscan, and when I try to write to a file I get this error: Error using horzcat. Dimensions of matrices being concatenated are not consistent.
If I change the first format specifier in textscan from %s to %f, the error vanishes.
The error occurs when MATLAB tries to build [datatest{1} probability].
probability is a 1000*1 double
datatest{1} is a 1000*1 cell
datatest=textscan(FileID,'%s %*f %f %f %*s %*s %*s %*s %*s %*s %*s %*s %*s %f %f %f %f %f %f %f %f %f %f',1000,'headerlines',1,'delimiter',',');
csvwrite('output.csv',[datatest{1} probability]);
Your variable datatest{1} contains 1000 cells, each of which contains a string (the strings may or may not all be the same length).
In the statement [datatest{1} probability] you are trying to concatenate cells (containing strings) with a double array. This does not work: the concatenation operator needs operands of compatible types.
Even if you created a cell array containing all your desired columns, myCellArray={datatest{1} probability}, it would not help, because a cell array cannot be passed to csvwrite.
csvwrite, and its more flexible sibling dlmwrite, do not accept cell arrays. You would have to convert the cell values into numeric values. Since you want to write both strings and numeric values, the straightforward way is to use a low-level function such as fprintf.
In your case, the following code writes the file you were expecting.
col1 = datatest{1} ; %// extract the column of interest for easier indexing later on
fidw = fopen('output.csv','w') ; %// get a handle on a file to write (necessary with "fprintf")
for iline = 1:numel(probability) %// loop on each line
fprintf( fidw , '%s, %f\n' , col1{iline} , probability(iline) ) ; %// write the line
end
fclose(fidw) ; %// close the file - IMPORTANT - (necessary with "fprintf")
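As an alternative to the fprintf loop, releases that include the table type can write mixed columns with writetable. This is a sketch rather than a drop-in replacement: the three names and probabilities are invented stand-ins for datatest{1} and probability, and the output goes to a temporary file.

```matlab
% Hypothetical stand-ins for datatest{1} (cell of strings) and probability.
names       = {'alpha'; 'beta'; 'gamma'};
probability = [0.10; 0.25; 0.65];

% A table can hold a text column next to a numeric column.
T = table(names, probability);

% Write the table as comma-separated text, then read it back to inspect it.
outFile = [tempname '.csv'];
writetable(T, outFile);
txt = fileread(outFile);
disp(txt);
delete(outFile);
```

Note that writetable adds a header row with the variable names, which the fprintf loop does not.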

Reading large text files with insufficient RAM in Matlab

I want to read a large text file of about 2 GB and perform a series of operations on that data. The following approach
tic
fid=fopen(strcat(Name,'.dat'));
C=textscan(fid, '%d%d%f%f%f%d');
fclose(fid);
%Extract cell values
y=C{1}(1:Subsampling:end)/Subsampling;
x=C{2}(1:Subsampling:end)/Subsampling;
%...
Reflectanse=C{6}(1:Subsampling:end);
Overlap=round(Overlap/Subsampling);
fails immediately after reading C (C=textscan(fid, '%d%d%f%f%f%d');) with a strange peak in my memory usage.
What would be the best way to import a file of this size? Is there a way to use textscan() to read individual parts of a text file, or are there any other functions better suited for this task?
Edit: Every column in the textscan call contains one field of information for the 3D points:
width height X Y Z Grayscale
345 453 3.422 53.435 0.234 200
346 453 3.545 52.345 0.239 200
... % and so on for ~40 million points
If you can process each row individually, the following code will let you do that. I've included start_line and end_line in case you want to restrict processing to a block of data rows (counted from the first row after the header).
headerSpec = '%s %s %s %s %s %s';
dataSpec = '%f %f %f %f %f %f';
fid = fopen('data.dat');
% Read header
names = textscan(fid, headerSpec, 1, 'delimiter', '\t');
k = 0;
% specify the first and last data row of the block to process
start_line = 2;
end_line = 3;
while ~feof(fid)
    k = k+1;
    if k > end_line
        break;
    end
    % read one row; this also advances the file position
    C = textscan(fid, dataSpec, 1, 'delimiter', '\t');
    if k < start_line
        continue; % row lies before the block, discard it
    end
    row = [C{1:6}]; % convert data from cell to vector
    % do what you want with the row
end
fclose(fid);
Reading the entire file at once is also possible, but that depends on how much memory is available and on any limits MATLAB has in place. On Windows, you can check this by typing memory in the Command Window.
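A middle ground between one row at a time and the whole file is to read fixed-size blocks, since textscan accepts a repeat count and resumes from the current file position on the next call. The sketch below assumes a small two-column temporary file with no header. blockSize, nRows and total are illustrative names, and summing a column stands in for whatever processing you need.

```matlab
% Create a small two-column file to read back (illustration only).
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '%d %d\n', [1 2 3 4 5; 10 20 30 40 50]);
fclose(fid);

blockSize = 2;   % rows per block; a real file would use a much larger value
nRows = 0;       % running count of rows seen
total = 0;       % running sum of the second column

fid = fopen(fname);
while ~feof(fid)
    % Read at most blockSize rows, continuing from the current position.
    C = textscan(fid, '%f %f', blockSize);
    if isempty(C{1})
        break;   % nothing left to read
    end
    nRows = nRows + numel(C{1});
    total = total + sum(C{2});
end
fclose(fid);
delete(fname);
fprintf('%d rows, column-2 sum = %g\n', nRows, total);
```

Only one block is held in memory at a time, so peak memory use is governed by blockSize rather than by the file size.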

Matlab reading txt formatted file

Suppose a .txt file has the format
Name, Home, 1, 2, 3, 3, 3, 3
That is, the first two columns are strings and the rest are integers.
How do I read the first two columns as vectors of strings and the rest as a numeric matrix?
One way of doing this so you know exactly what's happening line by line is in the following piece of code:
fid = fopen('textfile.txt');
clear data
tline = fgetl(fid);
n = 1;
while ischar(tline)
data(n,:) = strsplit(tline(1:end),', ');
n=n+1;
tline = fgetl(fid);
end
fclose(fid);
dataStrings = data(:,1:2);
dataValues = str2double(data(:,3:end));
Here data holds everything as strings (in a cell array), dataStrings holds only the first two columns as a cell array of strings, and dataValues holds the remaining columns as a numeric matrix of type double.
This way the numeric values end up in a plain matrix; only the text columns remain in a cell array.
Use textscan:
fileID = fopen('sometextfile.txt');
C = textscan(fileID,'%s %s %f %f %f %f %f %f','Delimiter',','); % assuming you want double data types, change as required
fclose(fileID);
celldisp(C) % C is a cell array
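To see what the textscan approach returns for the sample line, it can be run directly on the text instead of a file. This is a sketch with two additions that are not in the answer above: 'CollectOutput' groups the two string columns and the six numeric columns, and strtrim guards against any blanks left around the strings.

```matlab
% The sample line from the question, parsed as in-memory text.
txt = 'Name, Home, 1, 2, 3, 3, 3, 3';
C = textscan(txt, '%s %s %f %f %f %f %f %f', ...
    'Delimiter', ',', 'CollectOutput', true);

dataStrings = strtrim(C{1});   % 1x2 cell array with the two strings
dataValues  = C{2};            % 1x6 numeric row
disp(dataStrings);
disp(dataValues);
```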

textscan Unexpected Empty Cell with valid Format string

I am reading a tab-delimited file. Five representative lines of this file are:
Date Time Property Path 1 Path 2 Path 3 Path 4 Path 5 Path 6 Path 7 Path 8
Lev 1 Lev 1 Lev 1 Lev 1 Lev 1 Lev 1 Lev 1 Lev 1
1/1 00:00:00 F1 (sm³/s) -1.3405E-003 -1.1170E-002 -1.0123E-004 9.7769E-003 -8.4673E-004 1.1710E-003 2.6890E-004 2.2413E-003
1/1 01:00:00 F1 (sm³/s) 1.9988E-004 1.6655E-003 2.2252E-004 1.6883E-003 1.8612E-003 2.0221E-004 2.0795E-004 1.7333E-003
1/1 02:00:00 F1 (sm³/s) -4.0722E-004 -3.3931E-003 -4.4324E-004 -2.1177E-003 -3.7075E-003 -2.5364E-004 -3.7330E-004 -3.1115E-003
When I use the following format string I get the expected results:
test = '1/1 00:00:00 F1 (sm³/s) -1.3405E-003 -1.1170E-002 -1.0123E-004 9.7769E-003 -8.4673E-004 1.1710E-003 2.6890E-004 2.2413E-003';
textscan(test, '%*s %*s %*s %*s %f %f %f %f %f %f %f %f')
Gives me:
ans =
[-0.0013] [-0.0112] [-1.0123e-04] [0.0098] [-8.4673e-04] [0.0012] [2.6890e-04] [0.0022]
Which is what I want, but when I attempt:
C = textscan(fid,...
'%*s %*s %*s %*s %f %f %f %f %f %f %f %f',...
'CollectOutput', false,...
'Headerlines', 2);
I get a 1x8 cell of empty cells.
What is the error in the format string translation?
I don't think there's anything wrong with your format string specifically.
Try pulling in the lines individually with fgetl or similar and just check that there's nothing you weren't expecting in the file. For example - your code seems to work for me but I can replicate your error by putting an additional blank line at the start of the file, which causes textscan to try and read the second header line as a line of data (and fail inelegantly). That particular error can be removed by increasing the value of HeaderLines.
fid = fopen('test.txt');
fgetl(fid) % repeat until you see your first line of data
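The inspection step can be put in a short loop. The sketch below is self-contained under one assumption: it first creates a temporary file whose first line is blank, mimicking the situation described above. nPeek and firstLines are illustrative names.

```matlab
% Create a test file with an unexpected blank first line (illustration only).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '\nHeader 1\nHeader 2\n1.5 2.5\n');
fclose(fid);

% Print the first few lines in brackets so that blank lines are visible.
nPeek = 4;
firstLines = cell(nPeek, 1);
fid = fopen(fname);
for k = 1:nPeek
    firstLines{k} = fgetl(fid);
    fprintf('%d: [%s]\n', k, firstLines{k});
end
fclose(fid);
delete(fname);
```

Here the data only starts on line 4, so three lines would have to be skipped rather than two.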
I tried your code and it works:
file = 'd.txt';
fid = fopen(file);
C = textscan(fid,...
'%*s %*s %*s %*s %f %f %f %f %f %f %f %f',...
'CollectOutput', false,...
'Headerlines', 2);
fclose(fid);
Output:
celldisp(C)
C{1} =
-0.0013405
0.00019988
-0.00040722
C{2} =
-0.01117
0.0016655
-0.0033931
C{3} =
-0.00010123
0.00022252
-0.00044324
C{4} =
0.0097769
0.0016883
-0.0021177
C{5} =
-0.00084673
0.0018612
-0.0037075
C{6} =
0.001171
0.00020221
-0.00025364
C{7} =
0.0002689
0.00020795
-0.0003733
C{8} =
0.0022413
0.0017333
-0.0031115
I came across a problem where textscan would only return empty cell arrays, and a Google search led me here. I solved it by calling fgetl(fid) a couple of times and then frewind(fid) (fid being the file identifier returned by fopen). Something about reading the lines first made it easier to bring in the values.

Read part of a text-based file

I have a text-based file with a .ptx suffix. It contains point cloud information; see the following example:
100
50
0.352 -5.207 -0.823 0.238 61 61 61
0.345 -5.202 -0.824 0.234 60 60 60
...
Question:
How can I load the file from the third row (ignoring the first two rows) and save it as a matrix?
I would recommend using textscan.
Something like:
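fid = fopen('sample.ptx');
in = textscan(fid,'%f %f %f %f %f %f %f','HeaderLines',2); fclose(fid);
Note that textscan expects a file identifier from fopen (or the text itself), not a file name. You can specify a number of header lines to skip using 'HeaderLines'. The %f refers to the format of the input data. Hope that helps.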
Here is a full example of how to apply textscan and transform the result in to a matrix:
fid = fopen('textscantest.txt','r');
assert(fid~=-1); % verify file is opened (fopen returns -1 on failure)
C = textscan(fid,'%f %f %f %f %f %f %f','HeaderLines',2);
fclose(fid);
M = [C{:}]
Note that since you want everything in one matrix, every column must have the same data type, which is why %f is used for each column.
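For a version that can be run without an existing file, the sketch below first writes the two header lines and the two sample rows from the question to a temporary .ptx file, then applies the same textscan call. The name fname is made up for the illustration.

```matlab
% Write the sample from the question to a temporary file (illustration only).
fname = [tempname '.ptx'];
fid = fopen(fname, 'w');
fprintf(fid, '100\n50\n');
fprintf(fid, '0.352 -5.207 -0.823 0.238 61 61 61\n');
fprintf(fid, '0.345 -5.202 -0.824 0.234 60 60 60\n');
fclose(fid);

% Skip the two header lines and read seven numeric columns.
fid = fopen(fname, 'r');
assert(fid ~= -1);                       % fopen returns -1 on failure
C = textscan(fid, repmat('%f ', 1, 7), 'HeaderLines', 2);
fclose(fid);
delete(fname);

% Concatenate the seven column vectors into one numeric matrix.
M = [C{:}];
disp(size(M));
```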