MATLAB Unwanted conversion from double to int32 after indexing

I have this code to load data from a text file whose columns are a string, three integers, and a decimal number:
fileID = fopen(Files(j,1).name);
formatSpec = '%*s %d %d %d %f';
data = zeros(0, 4, 'double');
k = 0;
while ~feof(fileID)
k = k+1;
temp = textscan(fileID,formatSpec,10,'HeaderLines',1);
data = [data; [ temp{:} ] ] ;
end
fclose(fileID);
In the temp variable, the last column is stored as a decimal number. However, the command
data = [data; [ temp{:} ] ]
rounds it, so I get 0 or 1 in the last column.
Why is that?
temp looks like (1x4 cell):
[5248000;5248100;....] [5248100;5248200;....]
[111;95;....] [0,600000000000000;0,570000000000000;....]
data then looks like (an N-by-4 matrix):
5248000 5248100 111 1
5248100 5248200 95 1
EDIT:
Retesting (recreation of the same variable, just copied numbers from variable editor)
temp=cell(1,4);
temp{1}=[0;100;200;300;400;500;600;700;800;900];
temp{2}=[100;200;300;400;500;600;700;800;900;1000];
temp{3}=[143;155;150;128;121;122;137;145;145;126];
temp{4}=[0.340000000000000;0.450000000000000;0.370000000000000;...
0.570000000000000;0.570000000000000;0.580000000000000;...
0.590000000000000;0.500000000000000;0.460000000000000;0.480000000000000];
tempx=[temp{:}]
This works correctly! The last column is decimal.
But why doesn't it work on "real" data from the textscan function?

Consider reading everything as floating point values:
change
formatSpec = '%*s %d %d %d %f';
to
formatSpec = '%*s %f %f %f %f';
It works in your EDIT because your variables are already of type double.

Concatenating double and int data casts to the int type. For example:
>> [pi; int16(5)]
ans =
2×1 int16 column vector
3
5
To avoid that, you can cast to double before concatenating. In your case, use something like the following to convert each cell's content to double (thanks to @CrisLuengo for a correction):
tempDouble = cellfun(@double, temp, 'UniformOutput', false);
data = [data; [tempDouble{:}]];
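A minimal sketch of the effect, using a hypothetical stand-in for one chunk returned by textscan with the '%*s %d %d %d %f' format (the values are made up for illustration):

```matlab
% Hypothetical stand-in for one textscan chunk: three int32 columns, one double
temp = {int32([0; 100]), int32([100; 200]), int32([143; 155]), [0.34; 0.45]};

% Direct concatenation: the integer class wins and the fractions are rounded
mixed = [temp{:}];
disp(class(mixed))        % int32

% Cast every column to double first, then concatenate
tempDouble = cellfun(@double, temp, 'UniformOutput', false);
fixed = [tempDouble{:}];
disp(class(fixed))        % double
```

Casting inside the loop keeps data as double throughout, so the preallocation with zeros(0, 4, 'double') stays consistent with what is appended.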

Related

matlab reading mixed data from file

I am pretty new to MATLAB. I've been reading the documentation but can't figure out why MATLAB does not correctly read the string from the file. What I am trying to do is read mixed data types from a file. Some sample data:
t a e incl lasc aper meanan truean rupnode rdnnode name
0.000000 1.2712052487 0.8899021688 22.2458 265.2511471042 322.1539251184 -13.6281352271 -130.986 0.155342 0.889756 phaet_000018
0.000000 1.2712052478 0.8899021575 22.2458 265.2511428392 322.1539270642 -13.6281369694 -130.986 0.155342 0.889756 phaet_000044
0.000000 1.2712052496 0.8899021868 22.2458 265.2511587897 322.1539149438 -13.6281365049 -130.986 0.155342 0.889755 phaet_000006
The first line is header. So here is what I've done so far:
fid = fopen('data.dat');
header = fgetl(fid); % I read the header
Now I read the data:
data = fscanf(fid,'%f %f %f %f %f %f %f %f %f %f %s',[11 inf]);
data1 = data';
fclose(fid);
I can now access the first element as:
data1(1,1)
However, when I do:
data(1,11)
instead of phaet_000018 I am getting a number (112). Any idea what I am doing wrong?
There are a few issues with your code.
First, your sizeA input to fscanf is backwards. sizeA with a vector input is defined as:
Read at most m*n numeric values or character fields. n can be Inf, but m cannot. The output, A, is m-by-n, filled in column order.
So you've asked fscanf to give you 11 rows and whatever number of columns. You can't have an Inf row specification so you'll want to remove the third input entirely and reshape your data afterwards.
For example:
fid = fopen('data.dat');
header = fgetl(fid);
data = fscanf(fid,'%f %f %f %f %f %f %f %f %f %f %s');
fclose(fid);
% We just happen to know this explicitly, not knowledge to generally assume
ncols = 22;
% Reshape and transpose
data = reshape(data, ncols, []).';
Gives us a 3 x 22 data array, which is kinda sorta what we want.
So where are the extra columns coming from? For %s fields, fscanf reads the string until it encounters whitespace. Because the output of fscanf is a numeric array it must convert this string into a numeric value, so it converts each character to its numeric equivalent (double(letter)) and outputs that into the matrix.
Using our above data matrix as an example, we have:
>> char(data(1, 11:end))
ans =
phaet_000018
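The same conversion can be seen in isolation with sscanf, which follows the same rule for mixed numeric and %s specifiers. This is only a small sketch; the input string is made up for illustration:

```matlab
% One numeric field followed by a %s field; the output must be numeric,
% so each character of 'abc' is stored as its character code
v = sscanf('1.5 abc', '%f %s');
disp(v.')                      % 1.5  97  98  99

% Recover the text from the trailing character codes
name = char(v(2:end).');
disp(name)                     % abc
```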
With this in mind, your initial code only happens to work because all of your strings are the same length. If we change the length of one or more of the strings, this data import will fail:
Error using reshape
Product of known dimensions, 22, not divisible into total number of elements, 65.
Error in testcode (line 11)
data = reshape(data, ncols, []).';
So what can we do instead? If you need this string from your data I would recommend trying textscan:
fid = fopen('data.dat');
header = fgetl(fid);
data = textscan(fid, '%f %f %f %f %f %f %f %f %f %f %s');
fclose(fid);
This will read your data into a 1x11 cell array, where each column corresponds to a column in your data:
>> data{1} % t
ans =
0
0
0
To collect your numeric data you can iterate through the cell array, or you can use the 'CollectOutput' flag in textscan:
fid = fopen('data.dat');
header = fgetl(fid);
data = textscan(fid, '%f %f %f %f %f %f %f %f %f %f %s', 'CollectOutput', true);
fclose(fid);
Which will output a 1x2 cell array, where data{1} is your numeric array and data{2} is a cell array containing your strings:
>> data{1} % Numeric data
ans =
0 1.2712 0.8899 22.2458 265.2511 322.1539 -13.6281 -130.9860 0.1553 0.8898
0 1.2712 0.8899 22.2458 265.2511 322.1539 -13.6281 -130.9860 0.1553 0.8898
0 1.2712 0.8899 22.2458 265.2512 322.1539 -13.6281 -130.9860 0.1553 0.8898
>> data{2} % Strings
ans =
3×1 cell array
'phaet_000018'
'phaet_000044'
'phaet_000006'

Skip all type of characters matlab while using fscanf or sscanf

I need to read a text that has mix of numerical values and characters. Here is an example:
% Loc : LAT = -19.6423 LON = -70.817 DEP = 21.5451196625
I only need to read numerical fields.
Typically I used this:
x = fgetl(fid);
out = sscanf(x,'%% Loc : LAT = %f LON = %f DEP = %f\n');
It works, but not all the files have a fixed format, and sometimes the letters are written in upper or lower case. In such cases this approach does not work.
I tried skipping all characters using
out = sscanf(x,'%[a-zA-Z.=\t\b]s %f %[a-zA-Z.=\t\b]s %f %[a-zA-Z.=\t\b]s %f\n');
but it does not work!
Please note that the lines of a file are not all the same, and each line can have a different number of numerical fields.
I'm still a little unclear on your file format, but it seems like you could do this much easier using textscan instead of the lower level functions.
Something like this should work:
while (~feof(fid))
textscan(fid, '%s :'); % Read the part of the line through the colon
data = textscan(fid, '%s = %f');
% Do something with the data here
end
The variable fid is a file identifier that you get from calling fopen, and you'll need to call fclose when you're done.
I don't think this is going to exactly fix your problem, but hopefully it will get you on a track that's much shorter and cleaner. You'll have to play with this to make sure that you actually get to the end of file, for example, and that there are not corner cases that trip up the pattern matching.
In C, *scanf() uses a format string like "%d", not a multi-character constant like '%d'.
Detail: " vs. '.
"%[]" does not use a trailing 's' as the OP used in '%[a-zA-Z.=\t\b]s'.
"%n" records the int count of characters scanned so far.
Suggested:
// Adjust these as needed
#define SKIPCHAR " %*[a-zA-Z.%:]"
#define EQUALFLOAT " =%f"
int n = 0;
float lat, lon, dep;
sscanf(x, SKIPCHAR EQUALFLOAT SKIPCHAR EQUALFLOAT SKIPCHAR EQUALFLOAT " %n",
&lat, &lon, &dep, &n);
if (n > 0 && x[n] == '\0') Success();
else Fail();
To cope with different number of numbers in a line:
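#define XN 100
float val[XN];        // parsed numbers; x remains the input line
const char *p = x;    // current scan position within the input line
size_t i;
for (i=0; i<XN; i++) {
  int n = 0;
  sscanf(p, SKIPCHAR " %n", &n);   // skip the label, if any
  p += n;
  n = 0;
  sscanf(p, EQUALFLOAT " %n", &val[i], &n);
  if (n == 0) break;               // no further "= number" found
  p += n;
}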
I've found a possible solution. It is certainly not elegant, but it seems to work.
It is based on the following process:
read the file line by line using fgets
parse each line using strtok
try converting each token to a number with str2num
if the token is actually a number (i.e. str2num does not return an empty array), insert the number in the output matrix
The output matrix is initialized (to NaN) at the beginning of the script, big enough to have:
a number of rows greater than or equal to the number of rows of the input file (if it is not known in advance, a "reasonable" value should be defined)
a number of columns greater than or equal to the maximum number of numeric values that can be present in a row of the input file (if it is not known in advance, a "reasonable" value should be defined).
Once you've read the whole input file, you can "clean" the output matrix by removing the excess all-NaN rows and columns.
Below you can find the script, the input file I've used, and the output matrix (looking at it should make clearer why it is initialized to NaN).
Notice that the identification of the numbers and their extraction (using strtok) relies on the format of your example row: in particular, it assumes that all the tokens of the string are separated by a space.
This means that the code is not able to identify =123.456 as a number.
If your input file has tokens such as =123.456, the code has to be modified.
% Initialize rows counter
r_cnt=0;
% Initialize column counter
c_cnt=0;
% Define the number of rows of the input file (if it is not known in advance,
% put a "reasonable" value) - Used to initialize the output matrix
file_rows=5;
% Define the number of numeric values to be extracted from the input file
% (if it is not known in advance, put a "reasonable" value) - Used to
% initialize the output matrix
max_col=5;
% Initialize the variable holding the maximum number of column. Used to
% "clean" the output matrix
max_n_col=-1;
% Initialize the output matrix
m=nan(file_rows,max_col);
% Open the input file
fp=fopen('char_and_num.txt','rt');
% Get the first row
tline = fgets(fp);
% Loop to read line by line the input file
while ischar(tline)
% Increment the row counter
r_cnt=r_cnt+1;
% Parse the line looking for numeric values
while(true)
[str, tline] = strtok(tline);
if(isempty(str))
break
end
% Try to convert the string into a number
tmp_val=str2num(str);
if(~isempty(tmp_val))
% If the token is a number, increment the column counter and
% insert the number in the output matrix
c_cnt=c_cnt+1;
m(r_cnt,c_cnt)=tmp_val;
end
end
% Track the largest number of non-NaN columns seen in the output matrix
% so far
max_n_col=max(max_n_col,c_cnt);
% Reset the column counter before the next iteration
c_cnt=0;
% Read next line of the input file
tline = fgets(fp);
end
% After having read all the input file, close it
fclose(fp)
% Clean the output matrix removing the exceeding full NaN rows and columns
m(r_cnt+1:end,:)=[];
m(:,max_n_col+1:end)=[];
m
Input file
% Loc : LAT = -19.6423 LON = -70.817 DEP = 21.5451196625
% Loc : xxx = -1.234 yyy = -70.000 WIDTH = 333.369 DEP = 456.5451196625
% Loc : zzz = 1.23
Output
m =
-19.6423 -70.8170 21.5451 NaN
-1.2340 -70.0000 333.3690 456.5451
1.2300 NaN NaN NaN
Hope this helps.
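As a possible alternative to token-by-token parsing, here is a rough sketch that uses regexp instead of strtok. This is a different technique from the script above. It assumes every wanted number follows an '=' sign, and it ignores the labels entirely, so their case does not matter. The variable names and the second sample line are made up for illustration.

```matlab
% Sample lines; in practice each would come from fgetl(fid)
lines = {'% Loc : LAT = -19.6423 LON = -70.817 DEP = 21.5451196625', ...
         '% loc : zzz = 1.23'};

% Capture a signed decimal number (optional exponent) that follows '='
numPat = '=\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)';

vals = cell(size(lines));
for k = 1:numel(lines)
    tok = regexp(lines{k}, numPat, 'tokens');        % one 1x1 cell per match
    vals{k} = cellfun(@(c) str2double(c{1}), tok);   % row vector of doubles
end
```

Storing each line's numbers in its own cell avoids having to guess the maximum number of values per line in advance.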

MATLAB - Error while writing output in a CSV file using fprintf

I am consistently getting an error while writing output in a CSV file using fprintf. I actually want to write my results in a CSV file. I have tried different lengths of the matrix, and I get the same error even with two columns. What's the mistake here and how can I resolve this error?
Sample code:
colname = {'col1' 'col2' 'col3'};
fid = fopen('test.csv','w');
fprintf(fid, '%s, %s, %s\n', colname{1:});
for p=1:5
% <Some code>
fname = %reading image name from a directory
% <Some code>
val1 = %calculating value1
val2 = %calculating value2
datacol = {fname val1 val2};
fprintf(fid, '%s, %f, %f\n', datacol{p+1:});
end
fclose(fid);
Error:
??? Index exceeds matrix dimensions. at fprintf(fid, '%s, %f, %f\n', datacol{p+1:});
P.S.: Writing "datacol = {fname val1 val2};" as "datacol = {fname,val1,val2};" brought the same error message.
You are indexing the cell contents of datacol.
If I am not mistaken, datacol looks something like this:
{'some_string_for_the_name', 1, 2}
Where 1 and 2 are val1 and val2.
During your loop you access datacol{p+1}, which is datacol{4} for p = 3.
Since your cell only has three elements, indexing a fourth will result in an error. What you probably would like to do is print the lines of val1 and val2, no?
Changing your fprintf to
fprintf(fid, '%s, %f, %f\n', datacol{1}, datacol{2}, datacol{3});
should solve your problem.
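A self-contained sketch of the corrected loop. The file names and values are hypothetical placeholders for whatever the elided code computes, and the output is written to the temporary directory only to keep the example side-effect free:

```matlab
colname = {'col1', 'col2', 'col3'};
names = {'a.png', 'b.png'};          % hypothetical image names
vals  = [1.5 2.5; 3.5 4.5];          % hypothetical val1/val2 for each image

outFile = fullfile(tempdir, 'test.csv');
fid = fopen(outFile, 'w');
fprintf(fid, '%s, %s, %s\n', colname{:});        % header row
for p = 1:numel(names)
    datacol = {names{p}, vals(p, 1), vals(p, 2)};
    fprintf(fid, '%s, %f, %f\n', datacol{:});    % expand all three cells
end
fclose(fid);
```

Here datacol{:} expands to the same three arguments as datacol{1}, datacol{2}, datacol{3}, and the index no longer depends on the loop counter.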

Matlab reading txt formatted file

If there is a .txt file in the format
Name, Home, 1, 2, 3, 3, 3, 3
This means the first two columns are strings and the rest are integers.
How do I read the first two columns as vectors of strings and the rest as a numeric matrix?
One way of doing this so you know exactly what's happening line by line is in the following piece of code:
fid = fopen('textfile.txt');
clear data
tline = fgetl(fid);
n = 1;
while ischar(tline)
data(n,:) = strsplit(tline(1:end),', ');
n=n+1;
tline = fgetl(fid);
end
fclose(fid);
dataStrings = data(:,1:2);
dataValues = str2double(data(:,3:end));
where data contains everything in string type, dataStrings contains only first 2 columns as strings, and dataValues contains the rest of the columns as type double.
This way you get simple matrices, meaning you don't have to worry yourself with structures or cell arrays.
Use textscan:
fileID = fopen('sometextfile.txt');
C = textscan(fileID,'%s %s %f %f %f %f %f %f','Delimiter',','); % assuming you want double data types, change as required
fclose(fileID);
celldisp(C) % C is a cell array
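If separate string and numeric blocks are preferred over eight individual cells, the 'CollectOutput' option groups adjacent columns of the same type. A small sketch follows, reading from a character vector rather than a file so it is self-contained. The second data row is made up for illustration.

```matlab
% Two sample lines in the question's format (second row is hypothetical)
txt = sprintf('Name, Home, 1, 2, 3, 3, 3, 3\nAnn, Oslo, 4, 5, 6, 7, 8, 9\n');

C = textscan(txt, '%s %s %f %f %f %f %f %f', ...
             'Delimiter', ',', 'CollectOutput', true);

dataStrings = C{1};   % N-by-2 cell array holding the two string columns
dataValues  = C{2};   % N-by-6 double matrix holding the numeric columns
```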

Reading data into MATLAB from a textfile

I have a textfile with the following structure:
1999-01-04
1,100.00
1,060.00
1,092.50
0
6,225
1,336,605
37
1999-01-05
1,122.50
1,087.50
1,122.50
0
3,250
712,175
14
...
The file contains repeated sets of eight values (a date followed by seven numbers, each on their own line).
I want to read it into MATLAB and get the values into different vectors. I've tried to accomplish this with several different methods, but none have worked - all output some sort of error.
In case it's important, I'm doing this on a Mac.
EDIT: This is a shorter version of the code I previously had in my answer...
If you'd like to read your data file directly, without having to preprocess it first as dstibbe suggested, the following should work:
fid = fopen('datafile.txt','rt');
data = textscan(fid,'%s %s %s %s %s %s %s %s','Delimiter','\n');
fclose(fid);
data = [datenum(data{1}) cellfun(@str2double,[data{2:end}])]';
The above code places each set of 8 values into an 8-by-N matrix, with N being the number of 8 line sets in the data file. The date is converted to a serial date number so that it can be included with the other double-precision values in the matrix. The following functions (used in the above code) may be of interest: TEXTSCAN, DATENUM, CELLFUN, STR2DOUBLE.
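The two conversions this relies on can be checked in isolation. A brief sketch, using values copied from the sample file; the explicit date format string is an addition here, not part of the code above:

```matlab
% str2double accepts commas as thousands separators
raw = {'1,100.00'; '6,225'; '1,336,605'; '37'};
vals = str2double(raw);

% Convert the ISO-style date text to a serial date number
d = datenum('1999-01-04', 'yyyy-mm-dd');
```

This is why the values can be read as strings and converted afterwards without first stripping the commas.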
I propose yet another solution. This one is the shortest in MATLAB code. First, using sed, we format the file as a CSV file (comma separated, with each record on one line):
cat a.dat | sed -e 's/,//g ; s/[ \t]*$/,/g' -e '0~8 s/^\(.*\),$/\1\n/' |
sed -e :a -e '/,$/N; s/,\n/,/; ta' -e '/^$/d' > file.csv
Explanation: First we get rid of the thousands comma separator and trim spaces at the end of each line, adding a comma. Then we remove that ending comma on every 8th line. Finally we join the lines and remove empty ones.
The output will look like this:
1999-01-04,1100.00,1060.00,1092.50,0,6225,1336605,37
1999-01-05,1122.50,1087.50,1122.50,0,3250,712175,14
Next in MATLAB, we simply use textscan to read each line, with the first field as a string (to be converted to num), and the rest as numbers:
fid = fopen('file.csv', 'rt');
a = textscan(fid, '%s %f %f %f %f %f %f %f', 'Delimiter',',', 'CollectOutput',1);
fclose(fid);
M = [datenum(a{1}) a{2}]
and the resulting matrix M is:
730124 1100 1060 1092.5 0 6225 1336605 37
730125 1122.5 1087.5 1122.5 0 3250 712175 14
Use a script to modify your text file into something that Matlab can read.
eg. make it a matrix:
M = [
1999-01-04
1,100.00
1,060.00
1,092.50
0
6,225
1,336,605; <-- notice the ';'
37
1999-01-05
1,122.50
1,087.50
1,122.50
0
3,250; <-- notice the ';'
712,175
14
...
]
import this into matlab and read the various vectors from the matrix.
Note: my MATLAB is a bit rusty. This might contain errors.
It isn't entirely clear what form you want the data to be in once you've read it. The code below puts it all in one matrix, with each row representing a group of 8 rows in your text file. You may wish to use different variables for different columns, or (if you have access to the Statistics toolbox) use a dataset array.
% Read file as text
text = fileread('c:/data.txt');
% Split by line
x = regexp(text, '\n', 'split');
% Remove commas from numbers
x = regexprep(x, ',', '');
% Number of items per object
n = 8;
% Get dates
index = 1:length(x);
dates = datenum(x(rem(index, n) == 1));
% Get other numbers
nums = str2double(x(rem(index, n) ~= 1));
nums = reshape(nums, (n-1), length(nums)/(n-1))';
% Combine dates and numbers
thedata = [dates nums];
You could also look into the function textscan for alternative ways of solving the problem.
Similar to Richie's. Using str2double to convert the file strings to doubles. This implementation processes line by line instead of breaking the file up with a regular expression. The output is a cell array of individual vectors.
function vectors = readdata(filename)
fid=fopen(filename);
tline = fgetl(fid);
counter = 0;
vectors = cell(7,1);
while ischar(tline)
disp(tline)
if counter > 0
vectors{counter} = [vectors{counter} str2double(tline)];
end
counter = counter + 1;
if counter > 7
counter = 0;
end
tline = fgetl(fid);
end
fclose(fid);
This has regular expression checking to make sure your data is formatted well.
fid = fopen('data.txt','rt');
%these will be your 8 value arrays
val1 = [];
val2 = [];
val3 = [];
val4 = [];
val5 = [];
val6 = [];
val7 = [];
val8 = [];
linenum = 0; % line number in file
valnum = 0; % number of value (1-8)
while 1
line = fgetl(fid);
linenum = linenum+1;
if valnum == 8
valnum = 1;
else
valnum = valnum+1;
end
%-- if reached end of file, end
if isempty(line) | line == -1
fclose(fid);
break;
end
switch valnum
case 1
pat = '(?<yr>\d{4})-(?<mo>\d{2})-(?<dy>\d{2})'; % val1 (e.g. 1999-01-04)
case 2
pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val2 (e.g. 1,100.00) [valid up to 1billion-1]
case 3
pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val3 (e.g. 1,060.00) [valid up to 1billion-1]
case 4
pat = '(?<val>\d*[,]*\d*[,]*\d*[.]\d{2})'; % val4 (e.g. 1,092.50) [valid up to 1billion-1]
case 5
pat = '(?<val>\d+)'; % val5 (e.g. 0)
case 6
pat = '(?<val>\d*[,]*\d*[,]*\d+)'; % val6 (e.g. 6,225) [valid up to 1billion-1]
case 7
pat = '(?<val>\d*[,]*\d*[,]*\d+)'; % val7 (e.g. 1,336,605) [valid up to 1billion-1]
case 8
pat = '(?<val>\d+)'; % val8 (e.g. 37)
otherwise
error('bad linenum')
end
l = regexp(line,pat,'names'); % l is for line
if length(l) == 1 % match
if valnum == 1
serialtime = datenum(str2num(l.yr),str2num(l.mo),str2num(l.dy)); % convert to matlab serial date
val1 = [val1;serialtime];
else
this_val = strrep(l.val,',',''); % strip out comma and convert to number
eval(['val',num2str(valnum),' = [val',num2str(valnum),';',this_val,'];']) % save this value into appropriate array
end
else
warning(['line number ',num2str(linenum),' skipped! [didnt pass regexp]: ',line]);
end
end