Specify decimal separator for .dat file in matlab [duplicate] - matlab

I've got a bunch of .dat files, where the decimal separator is comma instead of dot. Is there any function in MATLAB to set comma as the separator?

You will have to read the data in as text (with textscan, textread, fileread, etc.) and then convert it to numeric.
Say you have read the data into a cell array with each number in a cell:
>> C = {'1,2345','3,14159','2,7183','1,4142','0,7071'}
C =
'1,2345' '3,14159' '2,7183' '1,4142' '0,7071'
Use strrep and str2double as follows:
>> x = str2double(strrep(C,',','.'))
x =
1.2345 3.1416 2.7183 1.4142 0.7071
For your example data from comments, you have a file "1.dat" formatted similarly to:
1,2 3,4
5,6 7,8
Here the delimiter is a space. By default, textscan treats whitespace as a delimiter, so no delimiter option is needed. The only thing you need to adapt below is the format specifier: repeat %s once per column in your data (e.g. here we need '%s%s' for two columns):
>> fid = fopen('1.dat','r');
>> C = textscan(fid,'%s%s')
C =
{2x1 cell} {2x1 cell}
>> fclose(fid);
The output of textscan is a cell array for each column delimited by whitespace. Combine the columns into a single cell array and run the commands to convert to numeric:
>> C = [C{:}]
C =
'1,2' '3,4'
'5,6' '7,8'
>> x = str2double(strrep(C,',','.'))
x =
1.2000 3.4000
5.6000 7.8000
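If you would rather skip the cell-array round trip, here is a minimal sketch of another route: read the whole file as text, swap the commas for dots, and let textscan parse the numbers directly. It assumes commas appear only as decimal separators (never as column delimiters or thousands separators), and the temporary file is just a stand-in for 1.dat.

```matlab
% Write a small sample file so the sketch is self-contained
% (the temp file stands in for '1.dat' from the question).
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '1,2 3,4\n5,6 7,8\n');
fclose(fid);

% Read the file as one char vector and replace decimal commas with dots.
% Assumption: commas are used ONLY as decimal separators in the file.
txt = strrep(fileread(fname), ',', '.');

% textscan also accepts text in place of a file identifier.
C = textscan(txt, '%f%f');   % one %f per column
x = [C{:}];                  % 2x2 double matrix

delete(fname);
```

As with the '%s%s' version, the format string needs one %f per column.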

Related

Matlab fscanf read two column character/hex data from text file

Need to read in data stored as two columns of hex values in text file temp.dat into a Matlab variable with 8 rows and two columns.
Would like to stick with the fscanf method.
temp.dat looks like this (8 rows, two columns):
0000 7FFF
30FB 7641
5A82 5A82
7641 30FB
7FFF 0000
7641 CF05
5A82 A57E
30FB 89BF
% Matlab code
fpath = './';
fname = 'temp.dat';
fid = fopen([fpath fname],'r');
% Matlab treats hex as a character string
formatSpec = '%s %s';
% Want the output variable to be 8 rows two columns
sizeA = [8,2];
A = fscanf(fid,formatSpec,sizeA)
fclose(fid);
Matlab is producing the following which I don't expect.
A = 8×8 char array
'03577753'
'00A6F6A0'
'0F84F48F'
'0B21F12B'
'77530CA8'
'F6A00F59'
'F48F007B'
'F12B05EF'
In another variation, I attempted changing the format string like this
formatSpec = '%4c %4c';
Which produced this output:
A =
8×10 char array
'0↵45 F7↵78'
'031A3F65E9'
'00↵80 4A↵B'
'0F52F0183F'
'7BA7B0C20 '
'F 86↵0F F '
'F724700AB '
'F6 1F↵55 '
Still another variation like this:
formatSpec = '%4c %4c';
sizeA = [8,16];
A = fscanf(fid,formatSpec);
Produces a one by 76 character array:
A =
'00007FFF
30FB 7641
5A82 5A827641 30FB
7FFF 0000
7641CF05
5A82 A57E
30FB 89BF'
Would like and expect Matlab to produce a workspace variable with 8 rows and 2 columns.
Have followed the example on the Matlab help area here:
https://www.mathworks.com/help/matlab/ref/fscanf.html
My Matlab code is based on the 'read file contents into an array' section about 1/3 of the way down the page. The example I reference is doing something very similar except that the two columns are one int and one float rather than two characters.
Running Matlab R2017a on Redhat.
Here is the complete code with the solution provided by Azim and comments about
what I learned as a result of posting the question.
fpath = './';
fname = 'temp.dat';
fid = fopen([fpath fname],'r');
formatSpec = '%9c\n';
% specify the output size as the input transposed, NOT the input.
sizeA = [9,8];
A = fscanf(fid,formatSpec,sizeA);
% A' is an 8 by 9 character array, which is the goal matrix size.
% B is an 8 by 1 cell array, each member has this format 'dead beef'.
%
% Cell arrays are data types with indexed data containers called cells,
% where each cell can contain any type of data.
B = cellstr(A');
% split divides str at whitespace characters.
S = split(B)
fclose(fid)
S =
8×2 cell array
'0000' '7FFF'
'30FB' '7641'
'5A82' '5A82'
'7641' '30FB'
'7FFF' '0000'
'7641' 'CF05'
'5A82' 'A57E'
'30FB' '89BF'
It is likely that your 8x2 MATLAB variable will end up being a cell array. This can be done in two steps.
First, your lines have 9 characters so you could use formatSpec = '%9c\n' to read each line. Next you need to adjust the size parameter to read 9 rows and 8 columns; sizeA = [9 8]. This will read in all 9 characters into columns of the output; transposing the output will get you closer.
In the second step you need to convert the result of fscanf into your 8x2 cell array. Since you have R2017a you can then use cellstr and split to get your result.
Finally, if you need the integer values of each hex value you can use hex2dec on each cell in the cell-array.
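A minimal sketch of that last step, using a few rows of the 8x2 cell array from the question. The signed conversion at the end is an assumption on my part (the question does not say whether the words are signed 16-bit values), so drop it if the data is unsigned.

```matlab
% A few rows of the 8x2 cell array S shown in the question.
S = {'0000' '7FFF'; '30FB' '7641'; '7641' 'CF05'};

% hex2dec accepts a cell array of character vectors; passing S(:) and
% reshaping keeps the result 2-D regardless of how the output is oriented.
vals = reshape(hex2dec(S(:)), size(S));

% Assumption (not stated in the question): the words are 16-bit two's
% complement, so values of 32768 and above represent negative numbers.
signedVals = vals;
isNeg = vals >= 32768;
signedVals(isNeg) = vals(isNeg) - 65536;
```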

splitting a character array into a cell array and matrix

I have a text file.
In the file is approx 20,000 rows of data. Each row has one column & contains 256 characters (which are all numbers).
I need to split each row into a cell array or matrix. So each 8 characters are "one piece" of information. I want to split the first 3 characters into a cell array and the next 5 characters into a double, then same again for the next 8 characters.
example
1653256719812345
myCellArray (1 x 2) myDoubleArray (1 x 2)
[165, 198] [32567, 12345]
What is the best way to do this?
Use textscan.
fid = fopen('MyFileName.txt');
data = textscan(fid, '%3d%5d', 'Delimiter', '');
fclose(fid);
testing:
% Test with string of 256 random digits that all happen to be 1:8 repeated 32 times
x = '1234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678123456781234567812345678';
>> y = textscan(x, '%3d%5d', 'Delimiter', '')
y =
[32x1 int32] [32x1 int32]
>> y{1}
ans =
123
123
123
123
...
I don't know the exact format of your files, so you may have to do this line-by-line within a loop (in which case you would get each line using fgetl and then replace fid in the textscan statement with the output from fgetl).
In general, whenever you find yourself having to read in data that was produced by FORTRAN code (fixed field width text files), textscan's 'Delimiter', '' and 'Whitespace', '' parameters are your friend.
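If you do end up going line by line, a different technique from textscan is to reshape each row into 8-character blocks and slice the blocks. This is only a sketch: it assumes every row length is an exact multiple of 8, and the variable names are made up for illustration.

```matlab
% One sample row; the real rows are 256 characters (32 blocks of 8).
row = '1653256719812345';

% Assumption: the row length is an exact multiple of 8.
blocks = reshape(row, 8, []).';   % one 8-character block per row

% First 3 characters of each block, kept as text in a cell array.
codes = cellstr(blocks(:, 1:3)).';

% Next 5 characters of each block, converted to double.
values = str2double(cellstr(blocks(:, 4:8))).';
```

For the sample row this gives codes = {'165','198'} and values = [32567 12345], matching the layout asked for in the question.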
Use regexp. If the file data.txt contains
1653256719812345
1563256719812345
1233256719812345
1463256719812345
Then the following MATLAB statements will read the numbers.
>> txt = fileread('data.txt') % Read entire file in txt
>> out = regexp(txt,'(\d{3})(\d{5})(\d{3})(\d{5})','tokens') % Match regex capturing groups
out =
{1x4 cell} {1x4 cell} {1x4 cell} {1x4 cell}
Each cell in out is a row from the file containing the parsed numbers as strings. You can use str2double to convert the numbers to a numeric data type in MATLAB:
>> nums = cellfun(@str2double,out,'uni',0)
nums =
[1x4 double] [1x4 double] [1x4 double] [1x4 double]
Iterate over your rows one by one and run something like the following code.
>> k = int2str(1653256719812345);
>> myCellArray{1} = k(1:3)
myCellArray =
'165'
>> mydoublearray(1) = str2num(k(4:8))
mydoublearray =
32567
If there's some formulaic pattern you should incorporate that instead of manually hard coding it.

Exporting blank values into a .txt file - MATLAB

I'm currently trying to export multiple matrices of unequal lengths into a delimited .txt file thus I have been padding the shorter matrices with 0's such that dlmwrite can use horzcat without error:
dlmwrite(filename{1},[a,b],'delimiter','\t')
However ideally I do not want the zeroes to appear in the .txt file itself - but rather the entries are left blank.
Currently the .txt file looks like this:
55875 3.1043e+05
56807 3.3361e+05
57760 3.8235e+05
58823 4.2869e+05
59913 4.3349e+05
60887 0
61825 0
62785 0
63942 0
65159 0
66304 0
67509 0
68683 0
69736 0
70782 0
But I want it to look like this:
55875 3.1043e+05
56807 3.3361e+05
57760 3.8235e+05
58823 4.2869e+05
59913 4.3349e+05
60887
61825
62785
63942
65159
66304
67509
68683
69736
70782
Is there any way I can do this? Is there an alternative to dlmwrite which will mean I do not need to have matrices of equal lengths?
If a is always longer than b you could split vector a into two vectors of same length as vector b and the rest:
a = [1 2 3 4 5 6 7 8]';
b = [9 8 7 ]';
len = numel(b);
dlmwrite( 'foobar.txt', [a(1:len), b ], 'delimiter', '\t' );
dlmwrite( 'foobar.txt', a(len+1:end), 'delimiter', '\t', '-append');
You can convert the numeric data to strings and pad the missing entries with whitespace, which gives you a cell array of strings that is easy to write to the output text file.
Stage 1: Get the cell of strings corresponding to the numeric data from column vector inputs a, b, c and so on -
%// Concatenate all arrays into a cell array with numeric data
A = [{a} {b} {c}] %// Edit this to add more columns
%// Create a "regular" 2D shaped cell array to store the cells from A
lens = cellfun('length',A)
max_lens = max(lens)
A_reg = cell(max_lens,numel(lens))
A_reg(:) = {''}
A_reg(bsxfun(@le,[1:max_lens]',lens)) = cellstr(num2str(vertcat(A{:}))) %//'
%// Create a char array that has string data from input arrays as strings
wsp = repmat({' '},max_lens,1) %// Create whitespace cell array
out_char = [];
for iter = 1:numel(A)
out_char = [out_char char(A_reg(:,iter)) char(wsp)]
end
out_cell = cellstr(out_char)
Stage 2: Now that you have out_cell as the cell array holding the strings to be written to the text file, you have two options for the writing operation itself.
Option 1 -
dlmwrite('results.txt',out_cell(:),'delimiter','')
Option 2 -
outfile = 'results.txt';
fid = fopen(outfile,'w');
for row = 1:numel(out_cell)
fprintf(fid,'%s\n',out_cell{row});
end
fclose(fid);
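For the simple two-column case in the question, a plain fprintf loop may be enough. This is a sketch rather than a general solution: it assumes a is at least as long as b, and the output path is a throwaway temp file.

```matlab
a = [55875 56807 57760 58823]';   % longer column
b = [3.1043e5 3.3361e5]';         % shorter column

% Assumption: numel(a) >= numel(b).
outfile = [tempname '.txt'];      % hypothetical output location
fid = fopen(outfile, 'w');
for k = 1:numel(a)
    if k <= numel(b)
        fprintf(fid, '%g\t%g\n', a(k), b(k));
    else
        fprintf(fid, '%g\t\n', a(k));   % leave the second field blank
    end
end
fclose(fid);
```

The tab is still written on the short rows, so every line keeps two fields and the file stays easy to import elsewhere.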

reading in a file with textscan and ignoring certain lines

I have an input file which has rows of integers like this:
1 2 3
4 5 6
7 8 9
I want to read in the file. I have used the textscan function for this kind of task before.
But there are a few lines in the data (at random positions) which contain double numbers, for example
<large number of integer lines>
0.12 12.44 65.34
<large number of integer lines>
When reading in the file, I want to ignore these lines. What's the best approach to do this? Can I tell textscan to ignore certain patterns?
The formatSpec argument could be the one you're searching for:
http://www.mathworks.de/de/help/matlab/ref/textscan.html#inputarg_formatSpec
textscan stops reading as soon as the content no longer matches the given format. If you call textscan a second time with the same file identifier, it resumes reading at the point where it stopped.
From linked site:
If you resume a text scan of a file by calling textscan with the same
file identifier (fileID), then textscan automatically resumes reading
at the point where it terminated the last read.
One option is to simply just read everything in as floats - use either textscan or if your data is all numeric dlmread or similar might be simpler.
Then just remove the lines you don't want:
data =
1.0000 2.0000 3.0000
4.0000 5.0000 6.0000
0.1200 12.4400 65.3400
7.0000 8.0000 9.0000
data(data(:,1)~=round(data(:,1)),:)=[]
data =
1 2 3
4 5 6
7 8 9
If your later code requires that the type of your data matrix is non-float, use uint8 or similar to convert at this point.
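Note that the line above only tests the first column. A small variation, sketched here with made-up sample data, drops a row if any of its entries has a fractional part. Either way, a row written as 1.0 2.0 3.0 cannot be told apart from 1 2 3 once everything has been read as floats.

```matlab
% Sample data: rows 3 and 4 contain non-integer values.
data = [1 2 3; 4 5 6; 0.12 12.44 65.34; 7 8.5 9; 7 8 9];

% Flag rows in which any entry differs from its rounded value.
hasFraction = any(data ~= round(data), 2);

% Delete the flagged rows, keeping only the all-integer rows.
data(hasFraction, :) = [];
```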
Assuming that you don't know the location and number of the lines with floats, and that you don't want lines such as 1.0 2.0 3.0 or 1 2 3.0 my idea would be to read the file line by line and not store lines which contain a . character.
fid = fopen('file.txt');
nums = [];
line = fgetl(fid);
while ischar(line) % fgetl returns -1 (numeric) at end of file
if isempty(strfind(line, '.'))
line = textscan(line, '%d %d %d');
nums = [nums; line{:}];
end
line = fgetl(fid);
end
fclose(fid);
nums is the matrix containing your data.

Use textscan in Matlab to output data

I've got a large text file with some headers and numerical data. I want to ignore the header lines and specifically output the data in columns 2 and 4.
Example data
[headers]
line1
line2
line3
[data]
1 2 3 4
5 6 7 8
9 10 11 12
I've tried using the following code:
FID = fopen('datafile.dat');
data = textscan(FID,'%f',4,'delimiter',' ','headerLines',4);
fclose(FID);
I only get an output of 0x1 cell
Try this:
FID = fopen('datafile.dat');
data = textscan(FID,'%f %f %f %f', 'headerLines', 6);
fclose(FID);
data will be a 1x4 cell array. Each cell will contain a 3x1 array of double values, which are the values in each column of your data.
You can access the 2nd and 4th columns of your data by executing data{2} and data{4}.
With your original code, the main issue is that the data file has 6 header lines but you've specified that there are only 4.
Additionally, though, you'll run into problems with the specification of the number of times to match the formatSpec. Take for instance the following code
data = textscan(FID,'%f',4);
which specifies that you will attempt to match a floating-point value 4 times. Keep in mind that after matching 4 values, textscan will stop. So for the sake of simplicity, let's imagine that your data file only contained the data (i.e. no header lines), then you would get the following results when executing that code, multiple times:
>> FID = fopen('datafile_noheaders.dat');
>> data_line1 = textscan(FID,'%f', 4)
data_line1 =
[4x1 double]
>> data_line1{1}'
ans =
1 2 3 4
>> data_line2 = textscan(FID,'%f', 4)
data_line2 =
[4x1 double]
>> data_line2{1}'
ans =
5 6 7 8
>> data_line3 = textscan(FID,'%f', 4)
data_line3 =
[4x1 double]
>> data_line3{1}'
ans =
9 10 11 12
>> data_line4 = textscan(FID,'%f', 4)
data_line4 =
[0x1 double]
>> fclose(FID);
Notice that textscan picks up where it "left off" each time it is called. In this case, the first three calls each return one row from your data file (in the form of a cell containing a 4x1 column of data), and the fourth call returns a cell containing an empty array. For the use case you described, this format is not particularly helpful.
The example given at the top should return data in a format that is much easier to work with for what you are trying to accomplish. In this case it will match four floating point values in each of your rows of data, and will continue with each line of text until it can no longer match this pattern.
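Since only columns 2 and 4 are wanted, the unwanted fields can also be skipped in the format string itself. A minimal sketch, assuming a sample file written on the fly: the HeaderLines value must equal the number of lines before the data in the actual file, which is 5 for the sample created here and may differ in yours.

```matlab
% Build a sample file with header lines followed by four data columns.
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '[headers]\nline1\nline2\nline3\n[data]\n');
fprintf(fid, '1 2 3 4\n5 6 7 8\n9 10 11 12\n');
fclose(fid);

fid = fopen(fname, 'r');
% %*f reads a field and discards it, so only columns 2 and 4 are returned.
% HeaderLines must match the file: this sample has 5 lines before the data.
cols = textscan(fid, '%*f %f %*f %f', 'HeaderLines', 5);
fclose(fid);
delete(fname);

col2 = cols{1};   % second column of the file
col4 = cols{2};   % fourth column of the file
```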