Read a .txt file with both numerical data and words in MATLAB

I want to read a .txt file in MATLAB that contains both numerical data and words.
The contents of the .txt file are:
(title "Particle Tracks")
(labels "Time" "Particle Velocity Magnitude")
((xy/key/label "particle-1")
1e-06 45.4551
2e-06 40.3895
2e-06 44.0437
3e-06 34.9606
4e-06 33.1695
4e-06 35.3499
5e-06 29.9504
6e-06 28.0226
6e-06 35.1794
7e-06 41.2255
....
((xy/key/label "particle-2")
1e-06 43.7789
1e-06 45.0513
2e-06 44.1221
3e-06 37.8328
3e-06 43.6451
4e-06 29.1166
5e-06 41.3342
6e-06 28.7241
6e-06 36.3779
7e-06 31.9631
8e-06 29.2826
9e-06 24.7755
9e-06 24.9516
1e-05 22.7528
1e-05 26.6802
1.1e-05 34.4668
The file continues like this for 100 particles. The 1st column is time and the 2nd column is velocity.
I intend to find the mean velocity of all the particles at each time in column 1. So basically I want to add the corresponding column 2 values, divide them by one hundred, and display the result against the column 1 values, which are the same for all hundred particles.
thanks

The best way to read text data with a complex structure like this is to use the fscanf function in MATLAB. Follow the documentation and you should be able to read the data into an array that you can use to compute the statistics you wish to find.
Another option might be to read the data in line-by-line and use regular expressions with the regexpi function to extract the data you need.
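A minimal sketch of the line-by-line idea, assuming the block headers always start with ((xy/key/label as in the sample. The cell array lines is a hypothetical in-memory stand-in for what fgetl would return, and the name tracks is my own choice. regexp is used instead of regexpi because the match does not need to be case-insensitive.

```matlab
% Hypothetical stand-in for lines read from the file with fgetl
lines = { ...
    '(title "Particle Tracks")', ...
    '((xy/key/label "particle-1")', ...
    '1e-06 45.4551', ...
    '2e-06 40.3895', ...
    '((xy/key/label "particle-2")', ...
    '1e-06 43.7789'};

tracks = {};   % one N-by-2 [time velocity] matrix per particle
for k = 1:numel(lines)
    ln = lines{k};
    if ~isempty(regexp(ln, '^\(\(xy/key/label', 'once'))
        tracks{end+1} = zeros(0, 2);        % a label line starts a new particle
    else
        nums = sscanf(ln, '%f %f');         % empty for non-numeric header lines
        if numel(nums) == 2 && ~isempty(tracks)
            tracks{end}(end+1, :) = nums.'; % append [time velocity] to current particle
        end
    end
end
```

Each tracks{p} can then be processed on its own, which keeps the particles separate instead of flattening everything into two long vectors.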

Suppose your input file is input.txt, then use textscan as follows:
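fid = fopen('input.txt');
%# text following '(' up to the end of the line is ignored, which skips the header lines
C = textscan(fid, '%n %n', 'commentStyle', '(');
fclose(fid);
a = C{1};   %# time
b = C{2};   %# velocity
%# do your computations on vectors a and b
%# for example:
ma = mean(a)
mb = mean(b)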
You can use the vectors as you wish, e.g. you can process them 100 by 100 elements. That's up to you.
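To get from the vectors a and b to one mean velocity per time value, one possible sketch is to group the rows by time stamp with unique and average each group with accumarray. The short vectors below are hypothetical stand-ins for the textscan output. Note that this averages over however many samples share a time stamp, rather than dividing by a fixed 100.

```matlab
% Hypothetical stand-ins for a (time) and b (velocity) as returned by textscan
a = [1e-06; 2e-06; 2e-06; 3e-06; 1e-06; 1e-06; 2e-06; 3e-06];
b = [45.4551; 40.3895; 44.0437; 34.9606; 43.7789; 45.0513; 44.1221; 37.8328];

[t, ~, idx] = unique(a);                  % t: sorted distinct times, idx: group of each row
meanVel = accumarray(idx, b, [], @mean);  % mean of b within each time group
disp([t meanVel])                         % column 1: time, column 2: mean velocity
```

Grouping with unique relies on equal time stamps being parsed to exactly the same double, which holds when they come from identical text such as 2e-06.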

Related

Optimizing reading the data in Matlab

I have a large data file with text formatted as a single column with n rows. Each row is either a real number or the string No Data. I have imported this text as an n-by-1 cell array named Data. Now I want to filter the data and create an n-by-1 numeric array from it, with NaN values in place of No Data. I have managed to do it using a simple loop (see below), but the problem is that it is quite slow.
z = zeros(n,1);
for i = 1:n
    if Data{i}(1)~='N'
        z(i) = str2double(Data{i});
    else
        z(i) = NaN;
    end
end
Is there a way to optimize it?
Actually, the whole parsing can be performed with a one-liner using a properly parametrized readtable function call (no iterations, no sanitization, no conversion, etc...):
data = readtable('data.txt','Delimiter','\n','Format','%f','ReadVariableNames',false,'TreatAsEmpty','No data');
Here is the content of the text file I used as a template for my test:
9.343410
11.54300
6.733000
-135.210
No data
34.23000
0.550001
No data
1.535000
-0.00012
7.244000
9.999999
34.00000
No data
And here is the output (which can be retrieved in the form of a vector of doubles using data.Var1):
ans =
9.34341
11.543
6.733
-135.21
NaN
34.23
0.550001
NaN
1.535
-0.00012
7.244
9.999999
34
NaN
Delimiter: specified as a line break since you are working with a single column. This prevents No data from being split into two columns because of the whitespace.
Format: you want numerical values.
TreatAsEmpty: this tells the function to treat a specific string as empty, and empty doubles are set to NaN by default.
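If the text has already been imported as the cell array Data from the question, a simpler alternative to the loop may be to pass the whole cell array to str2double, which returns NaN for any entry it cannot convert. This is a sketch on a hypothetical four-row sample, not a benchmark.

```matlab
% Hypothetical sample of the n-by-1 cell array described in the question
Data = {'9.343410'; 'No Data'; '-135.210'; '0.550001'};

z = str2double(Data);   % vectorized: non-numeric entries such as 'No Data' become NaN
disp(z)
```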
If you run this you can find out which approach is faster. It creates an 11MB text file and reads it with the various approaches.
filename = 'data.txt';
%% generate data
fid = fopen(filename,'wt');
N = 1E6;
for ct = 1:N
    val = rand(1);
    if val<0.01
        fwrite(fid,sprintf('%s\n','No Data'));
    else
        fwrite(fid,sprintf('%f\n',val*1000));
    end
end
fclose(fid)
%% Tommaso Belluzzo
tic
data = readtable(filename,'Delimiter','\n','Format','%f','ReadVariableNames',false,'TreatAsEmpty','No Data');
toc
%% Camilo Rada
tic
[txtMat, nLines]=txt2mat(filename);
NoData=txtMat(:,1)=='N';
z = zeros(nLines,1);
z(NoData)=nan;
toc
%% Gelliant
tic
fid = fopen(filename,'rt');
z= textscan(fid, '%f', 'Delimiter','\n', 'whitespace',' ', 'TreatAsEmpty','No Data', 'EndOfLine','\n','TextType','char');
z=z{1};
fclose(fid);
toc
result:
Elapsed time is 0.273248 seconds.
Elapsed time is 0.304987 seconds.
Elapsed time is 0.206315 seconds.
txt2mat is slow, even without converting resulting string matrix to numbers it is outperformed by readtable and textscan. textscan is slightly faster than readtable. Probably because it skips some of the internal sanity checks and does not convert the resulting data to a table.
Depending on how big your files are and how often you read such files, you might want to go beyond readtable, which can be quite slow.
EDIT: After tests, with a file this simple the method below provides no advantage. The method was developed to read RINEX files, which are large and complex in the sense that they are alphanumeric, with different numbers of columns and different delimiters in different rows.
The most efficient way I've found is to read the whole file as a char matrix; then you can easily find your "No data" lines. And if your real numbers are formatted with a fixed width, you can transform them from char into numbers much more efficiently than with str2double or similar functions.
The function I wrote to read a text file into a char matrix is:
function [txtMat, nLines]=txt2mat(filename)
% txt2mat Read the content of a text file to a char matrix
% Read all the content of a text file to a matrix as wide as the longest
% line on the file. Shorter lines are padded with blank spaces. New lines
% are not included in the output.
% New lines are identified by new line \n characters.
% Reading the whole file in a string
fid=fopen(filename,'r');
fileData = char(fread(fid));
fclose(fid);
% Finding new lines positions
newLines= fileData==sprintf('\n');
linesEndPos=find(newLines)-1;
% Calculating number of lines
nLines=length(linesEndPos);
% Calculating the width (number of characters) of each line
linesWidth=diff([-1; linesEndPos])-1;
% Number of characters per row including new lines
charsPerRow=max(linesWidth)+1;
% Initializing output var with blank spaces
txtMat=char(zeros(charsPerRow,nLines,'uint8')+' ');
% Computing a logical index to all characters of the input string to
% their final positions
charIdx=false(charsPerRow,nLines);
% Indexes of all new lines
linearInd = sub2ind(size(txtMat), (linesWidth+1)', 1:nLines);
charIdx(linearInd)=true;
charIdx=cumsum(charIdx)==0;
% Filling output matrix
txtMat(charIdx)=fileData(~newLines);
% Cropping the last row corresponding to new lines characters and transposing
txtMat=txtMat(1:end-1,:)';
end
Then, once you have all your data in a matrix (let's assume it is named txtMat), you can do:
NoData=txtMat(:,1)=='N';
And if your number fields have a fixed width, you can transform them to numbers far more efficiently than with str2num, using something like
values=((txtMat(:,1:10)-'0')*[1e6; 1e5; 1e4; 1e3; 1e2; 10; 1; 0; 1e-1; 1e-2]);
Where I've assumed the numbers have 7 digits and two decimal places, but you can easily adapt it for your case.
And to finish you need to set the NaN values with:
values(NoData)=NaN;
This is more cumbersome than readtable or similar functions, but if you are looking to optimize the reading, this is WAY faster. And if you don't have fixed-width numbers you can still do it this way by adding a couple of lines to count the number of digits and find the position of the decimal point before doing the conversion, but that will slow things down a little. However, I think it will still be faster.
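To make the fixed-width conversion concrete, here is a small worked sketch. The three-row txtMat is hypothetical and assumes zero-padded fields of exactly 7 integer digits, a decimal point, and 2 decimals; space-padded numbers would need extra handling.

```matlab
% Hypothetical char matrix as txt2mat would return it (all rows 10 chars wide)
txtMat = ['0000123.45'; ...
          'No Data   '; ...
          '0009876.50'];

NoData  = txtMat(:,1) == 'N';                                % rows holding "No Data"
weights = [1e6; 1e5; 1e4; 1e3; 1e2; 10; 1; 0; 1e-1; 1e-2];   % weight 0 cancels the '.'
values  = (txtMat(:,1:10) - '0') * weights;                  % digit codes times place values
values(NoData) = NaN;                                        % overwrite the meaningless rows
disp(values)
```

Subtracting '0' turns each digit character into its numeric value, so the matrix product sums digit times place value for every row in one step.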

GNU-Octave: load data which contains geometric lines as blocks of coordinates

I would like to know if there is a way to load data stored like this in a file:
$ cat foo
12.108 24.21; 89.02 17.3131; 93.192368 13.10012; ....
10.3069 41.7442; 90.1277 19.351; 93.192368 13.10012; 91.1956 15.29712; ...
...
So the form is:
x y; x y; x y; and so on.
Each point defined by a couple of x y values is a point constituting a geometric line.
Each line of the file contains a unique geometric line which is defined by its sequence of points. Some lines are made of only two points, others of several. It varies. So as there is no constant number of fields I'm now unable to load that file.
Ideally, I'd like to store each line in a variable, or better, all lines in a kind of indexed structure, cell or nD-matrix so that I can further easily loop on their segments (a segment is defined by 2 consecutive points within a line).
Thanks.
Storing the data in a cell array seems to be an elegant solution:
fid=fopen("File.csv");
tline=fgetl(fid);
ix=1;
while ischar(tline)
    A{ix}=str2num(tline);
    tline=fgetl(fid);
    ix=ix+1;
end
fclose(fid);
Open the file using fopen.
Use fgetl to initialize tline with the first line of the file.
Set an iteration counter, here ix, to 1 (avoiding i is a good idea as it is also the imaginary unit).
While tline is a char (remember, at the end of file fgetl returns -1), convert the line to numeric values using str2num, store the result in A{ix}, and read the next line with fgetl.
Don't forget to close the file with fclose.
Thus, A{ix} holds the ix-th line of the input file as a matrix with one (x,y) point per row:
> A{1}
ans =
12.108 24.21
89.02 17.3131
93.192368 13.10012
...
> class(A{1})
ans = double
https://www.gnu.org/software/octave/doc/v4.0.0/Opening-and-Closing-Files.html
https://www.gnu.org/software/octave/doc/v4.0.0/Line_002dOriented-Input.html
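Once the lines are in the cell array A, looping over the segments (pairs of consecutive points) could look like the sketch below. The two matrices are hypothetical stand-ins for what the reading loop produces, and computing segment lengths is only an example of per-segment work; lineLength is a name of my own.

```matlab
% Hypothetical stand-in for the cell array A built by the reading loop
A = {[12.108 24.21; 89.02 17.3131; 93.192368 13.10012], ...
     [0 0; 3 4]};

lineLength = zeros(numel(A), 1);
for k = 1:numel(A)
    P = A{k};                        % N-by-2 matrix, one (x,y) point per row
    d = diff(P, 1, 1);               % row k of d: vector from point k to point k+1
    segLen = sqrt(sum(d.^2, 2));     % Euclidean length of each segment
    lineLength(k) = sum(segLen);     % total length of the k-th geometric line
end
disp(lineLength)
```

This runs unchanged in both Octave and MATLAB, and it works for lines with any number of points because each cell holds its own matrix.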

Performing logical OR between multiple CSVs with 32 bit hex values using MATLAB

I am trying to read multiple (50+) CSV files within the same folder using MATLAB. Each CSV contains three 32-bit hex values per row, and the format of the data is the same for all files: 2 rows and 3 columns with no headers. For example:
00000800,D404002C,4447538F
000008FF,D404002C,4447538F
After ORing the 2 rows across all files, the final 2 rows of three 32-bit hex values need to be written out to a CSV.
Now, before jumping in at the deep end and trying to process multiple files, I have started by just trying to OR row 1 with row 2 of the same file. So, 00000800 | 000008FF, D404002C | D404002C, and so on. I have been able to convert them to binary and do a logical OR between the 3 values, however I currently have the following issues:
1) If the most significant hex digit is 3 or 4 (binary 0011 or 0100), the leading 0's are dropped, and if the hex value happens to be 800, the leading 00000's are dropped.
2) I cannot convert the integer cell array back to hex.
I have seen many posts on stackoverflow and matlabcentral about reading CSVs with MATLAB, separating the data, and so on, but I have not been able to adapt any of them to solve my issue. Any help would be much appreciated. Below is what I have so far:
fid = fopen('File1.csv');
c = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
contents = c{1};
row1 = strsplit(contents{1},',','CollapseDelimiters',0);
row2 = strsplit(contents{2},',','CollapseDelimiters',0);
x = 1;
y = 1;
while x <= length(row1)
    column1{x} = hex2dec(row1(x));
    column2{x} = hex2dec(row2(x));
    x = x + 1;
end
while y <= length(column1)
    bin1{y} = zeros(1,32);
    bin2{y} = zeros(1,32);
    bin1{y} = dec2bin(column1{y});
    bin2{y} = dec2bin(column2{y});
    result{y} = bitor(bitget(uint8(bin1{y}),1),bitget(uint8(bin2{y}),1));
    y = y+ 1 ;
end
Also, eventually need to be able to do this process with multiple CSVs so I have attached link to File1.csv and File2.csv if someone wants to try to OR row 1 of File1 with row 2 of File2.csv and so on.
CSV Files
Apologies if I have missed anything, Please leave a comment and I'll try to explain it further.
Thanks!
EDIT: Hope the image below explains what I am trying to do better.
You can try the following approach:
use the dir function to get the list of files to be processed
create a loop to go through the files to be processed. In the loop:
    read the input file
    convert the hexadecimal values read from the file into a matrix of characters using the char function
    convert the data stored in the char matrix from hex to dec and then to uint32 using the functions hex2dec and uint32
    perform the OR using the bitor function
    go to the next iteration
at the end of the loop, write the output
The approach described above has been implemented in the following code:
% Get the list of CSV files
hex_files=dir('O_File*.csv');
% Open the output file
fp_out=fopen('new_hex_file.csv','wt');
% Loop over the CSV files
for i=1:length(hex_files)
    % Read the i-th CSV file
    fid = fopen(hex_files(i).name);
    c = textscan(fid,'%s','Delimiter','\n');
    fclose(fid);
    % Get the 2 rows
    contents = c{1};
    row_1=char(strsplit(contents{1},',','CollapseDelimiters',0));
    row_2=char(strsplit(contents{2},',','CollapseDelimiters',0));
    % Convert from hex to uint32
    row_d_1=uint32(hex2dec(row_1));
    row_d_2=uint32(hex2dec(row_2));
    if(i == 1)
        % Store the row of the first file and continue
        tmp_var_1=row_d_1;
        tmp_var_2=row_d_2;
        continue
    else
        % OR the two rows
        tmp_var_1=bitor(tmp_var_1,row_d_1);
        tmp_var_2=bitor(tmp_var_2,row_d_2);
    end
end
% Write the OR values into the new file
fprintf(fp_out,'%08X,%08X,%08X\n',tmp_var_1);
fprintf(fp_out,'%08X,%08X,%08X\n',tmp_var_2);
% Close the output file
fclose(fp_out);
The following input files have been used to test it:
File1.csv
00000800,D404002C,4447538F
000008FF,D404002C,4447538F
File2.csv
000008FF,D404DD2C,49475115
11100800,D411EC2C,3ACD1266
File3.csv
123456FF,ABCDEF2C,369ABC15
01012369,00110033,36936966
The output is:
12345EFF,FFCDFF2C,7FDFFF9F
11112BFF,D415EC3F,7EDF7BEF
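For the two issues raised in the question (lost leading zeros and converting back to hex), a minimal sketch on the two rows of File1.csv is shown below. It ORs row 1 with row 2 directly on uint32 values, with no detour through binary strings, and pads the result back to 8 hex digits with dec2hex. The variable names are my own.

```matlab
% The two rows of File1.csv from the question, already split on commas
row1 = {'00000800', 'D404002C', '4447538F'};
row2 = {'000008FF', 'D404002C', '4447538F'};

v1 = uint32(hex2dec(row1));           % hex2dec accepts a cell array of hex strings
v2 = uint32(hex2dec(row2));
ored = bitor(v1, v2);                 % element-wise bitwise OR on 32-bit integers

hexOut  = dec2hex(double(ored), 8);   % second argument pads each value to 8 digits
csvLine = strjoin(cellstr(hexOut).', ',');
disp(csvLine)
```

Because the OR is done on integers, there is no string length to keep track of; the padding only matters at the very end when formatting the output.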
Hope this helps.
Qapla'

Create a 2 column matrix with 2 different format types

I am very new to MATLAB and I'm having trouble reading a binary file into a matrix. The problem is that I am trying to read the binary file into a two-column matrix (with 100000's of rows) where each column has a different format type.
I want column 1 to be in 'int8' format and column 2 to be a 'float'.
This is my attempt so far:
FileID= fopen ('MyBinaryFile.example');
[A,count] = fread(FileID,[nrows, 2],['int8','float'])
This is not working because I get the error message 'Error using fread' 'Invalid Precision'
I will then go on to plot once I have successfully done this.
Probably a very easy solution to someone with matlab experience but I haven't been successful at finding a solution on the internet.
Thanks in advance to anyone who can help.
You should be aware that MATLAB cannot hold different data types in one matrix (it can do so in a cell array, but that is another topic). So there is no point trying to read your mixed-type file in one go into one single matrix; it is not possible.
Unless you want a cell array, you will have to use 2 different variables for your 2 columns of different types. Once this is established, there are many ways to read such a file.
For the purpose of the example, I had to create a binary file as you described. It is done this way:
%% // write example file
A = int8(-5:5) ; %// a few "int8" data
B = single(linspace(-3,1,11)) ; %// a few "float" (=single) data
fileID = fopen('testmixeddata.bin','w');
for il=1:11
    fwrite(fileID,A(il),'int8');
    fwrite(fileID,B(il),'single');
end
fclose(fileID);
This creates a 2-column binary file, with first column: 11 values of type int8 going from -5 to +5, and second column: 11 values of type float going from -3 to 1.
In each of the solutions below, the first column will be read into a variable called C, and the second column into a variable called D.
1) Read all data in one go - convert to proper type after
%% // Read all data in one go - convert to proper type after
fileID = fopen('testmixeddata.bin');
R = fread(fileID,'uint8=>uint8') ; %// read all values, most basic data type (unsigned 8 bit integer)
fclose(fileID);
R = reshape( R , 5 , [] ) ; %// reshape data into a matrix (5 is because 1+4byte=5 byte per column)
temp = R(1,:) ; %// extract data for first column into temporary variable (OPTIONAL)
C = typecast( temp(:) , 'int8' ) ; %// convert into "int8" (as a column vector, like D)
temp = R(2:end,:) ; %// extract data for second column
D = typecast( temp(:) , 'single' ) ; %// convert into "single/float"
This is my favourite method, especially for speed, because it minimizes the read/seek operations on disk, and most of the post-processing is done in memory (much, much faster than disk operations).
Note that the temporary variable I used was only for clarity; you can avoid it altogether if you get your indexing into the raw data right.
The key thing to understand is the use of the typecast function. And the good news is that it got even faster since 2014b.
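A quick way to convince yourself that the reshape and typecast steps recover the original values is a round trip through a temporary file. This is only a sketch: the file name comes from tempname, and the record layout (1 byte of int8 followed by 4 bytes of single) is the same assumption as in the example file above.

```matlab
% Write a small mixed-type file: each record is 1 int8 followed by 1 single
fname = [tempname '.bin'];           % hypothetical temporary file name
A = int8(-5:5);
B = single(linspace(-3, 1, 11));
fid = fopen(fname, 'w');
for k = 1:numel(A)
    fwrite(fid, A(k), 'int8');
    fwrite(fid, B(k), 'single');
end
fclose(fid);

% Read everything as raw bytes, then reinterpret in memory
fid = fopen(fname);
R = fread(fid, 'uint8=>uint8');
fclose(fid);
delete(fname);

R = reshape(R, 5, []);                               % one 5-byte record per column
C = typecast(R(1,:), 'int8').';                      % byte 1 of each record
D = typecast(reshape(R(2:end,:), [], 1), 'single');  % bytes 2..5 of each record

disp(isequal(C, A.') && isequal(D, B.'))             % 1 means the round trip is exact
```

typecast reinterprets the bytes without converting the values, which is why the comparison can be exact rather than approximate.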
2) Read column by column (using "skipvalue") - 2 pass approach
%% // Read column by column (using "skipvalue") - 2 pass approach
col1size = 1 ; %// size of data in column 1 (in [byte])
col2size = 4 ; %// size of data in column 2 (in [byte])
fileID = fopen('testmixeddata.bin');
C = fread(fileID,'int8=>int8',col2size) ; %// read all "int8" values, skipping all "float"
fseek(fileID,col1size,'bof') ; %// rewind to beginning of column 2 at the top of the file
D = fread(fileID,'single=>single',col1size) ; %// read all "float" values, skipping all "int8"
fclose(fileID);
That works too. It works fine, but it is probably much slower than the method above. Although it may be clearer code for someone else to read, I find it ugly (and yet I used this way for several years until I started using the method above).
3) Read element by element
%% // Read element by element (slow - not recommended)
fileID = fopen('testmixeddata.bin');
C=[];D=[];
while ~feof(fileID)
    try
        C(end+1) = fread(fileID,1,'int8=>int8') ;
        D(end+1) = fread(fileID,1,'single=>single') ;
    catch
        disp('reached End Of File')
    end
end
fclose(fileID);
Talking about ugly code ... that does work too, and if you were writing C code it would be more than ok. But in Matlab ... please avoid ! (well, your choice ultimately)
Merging in one variable
If you really want all of that in one single variable, it could be a structure or a cell array. For a cell array (to keep matrix indexing style), simply use:
%% // Merge into one "cell array"
Data = { C , D } ;
Data =
[11x1 int8] [11x1 single]

Merging multiple files into one file by using MATLAB

Here I am sharing one of my data files, which are in .dat format. I have 16162 different files. I merged them all into one file and want to read it in MATLAB, extract three parameter values from each file's data, and arrange them either row-wise or column-wise. I can do it using C# code, but I want to do it using MATLAB. Can anybody help me write the code, please?
Here is the data of one sample file:
DISTRIBUTION: monomodal log-normal
n : 1.000
r_mod: .010
sigma: 1.400
number conc., surface. conc., volume conc.
.1087E+01 .1866E-02 .7878E-05
part. ave. radius, surf. ave. radius, vol. ave. radius :
.1149E-01 .1169E-01 .1201E-01
surface mean radius, volume mean radius :
.1267E-01 .1392E-01
eff. variance :
.9939E-01
Let's say I want to extract three parameters (r_mod, sigma, surface mean radius). The corresponding values for these three parameters in the file shown above are .010, 1.400, .1267E-01.
The output I want should be:
r_mod    sigma    surface mean radius
.01      1.4      1.27E-02
.02      1.4      2.67E-02
.03      1.4      3.98E-02
...      ..       ..
I have thousands of similar files in the same directory. I want to read all those files in MATLAB and write the output in this form to a single file.
Given that all files have the exact same structure, the following will do the job (just make sure to read the comments in the code; you will need to adapt the file names and the number of files to read):
n = 2; % Number of files you want to go through
vals = zeros(1,3*n);
str = 'r_mod sigma surface mean radius ';
k = 1;
for i = 1:n
    fname = ['myFile',num2str(i),'.dat']; % change this to fit your file names
    fid = fopen(fname, 'rb');
    data = textscan(fid,'%s');   % split the file into whitespace-separated tokens
    fclose(fid);
    data = data{1};
    vals(k) = str2double(data{8});     % r_mod
    vals(k+1) = str2double(data{10});  % sigma
    vals(k+2) = str2double(data{40});  % surface mean radius
    k = k+3;
end
out = [str, num2str(vals)];
fid = fopen('output.txt', 'w');
fprintf(fid,'%s',out);   % print out as data, not as a format string
fclose(fid);
The file output.txt now contains your desired line. You may change the format if you want the output file to be .dat as well.
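If you would rather have one row per input file, as in the table in the question, a possible variation is to collect the values in an n-by-3 matrix and let fprintf cycle through it. In this sketch the matrix vals and the temporary output file are hypothetical; in practice row i of vals would be filled inside the reading loop.

```matlab
% Hypothetical values for two files: [r_mod, sigma, surface mean radius] per row
vals = [0.01 1.4 1.267e-2; ...
        0.02 1.4 2.670e-2];

outFile = [tempname '.txt'];         % hypothetical output location
fid = fopen(outFile, 'w');
fprintf(fid, '%s\t%s\t%s\n', 'r_mod', 'sigma', 'surface mean radius');
% fprintf consumes data column by column, so transpose to emit one row per file
fprintf(fid, '%g\t%g\t%.4E\n', vals.');
fclose(fid);

disp(fileread(outFile))
```

The format string is reused for every three values, so the same two fprintf calls work for any number of files.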