I have a huge text file that needs to be read and processed in MATLAB. At some points the file contains text lines that indicate a new data series has started.
I have searched here but can't find a simple solution.
What I want to do is read the data in the file, put it in a table with three columns, and start a new table whenever a text line is found. This should repeat until the entire document has been scanned.
This is what the document looks like:
time V(A,B) I(R1)
Step Information: X=1 (Run: 1/11)
0.000000000000000e+000 -2.680148e-016 0.000000e+00
9.843925313007988e-012 -4.753470e-006 2.216314e-011
1.000052605772457e-011 -4.835427e-006 2.552497e-011
1.031372754715773e-011 -4.999340e-006 -3.042096e-012
1.094013052602406e-011 -5.327165e-006 -1.206968e-011
Step Information: X=1 (Run: 2/11)
0.000000000000000e+000 -2.680148e-016 0.000000e+000
9.843925313007988e-012 -4.753470e-006 2.216314e-011
1.000052605772457e-011 -4.835427e-006 2.552497e-011
1.031372754715773e-011 -4.999340e-006 -3.042096e-012
1.094013052602406e-011 -5.327165e-006 -1.206968e-011
A rather crude approach is to read the file line by line and check if the line consists of three numbers. If it does, then append this to a temporary matrix. When you finally get to a line that doesn't contain three numbers, append this matrix as an element in a cell array, clear the temporary matrix and continue.
Something like this would work, assuming that the file is stored in 'file.txt':
%// Open the file
f = fopen('file.txt', 'r');
%// Initialize empty cell array
data = {};
%// Initialize temporary matrix
temp = [];
%// Loop over the file...
while true
    %// Get a line from the file
    line = fgetl(f);
    %// If we reach the end of the file (fgetl returns -1, not text), get out
    if ~ischar(line)
        %// Last check before we break
        %// Check if the temporary matrix isn't empty and add
        if ~isempty(temp)
            data = [data; temp];
        end
        break;
    end
    %// Else, check to see if this line contains three numbers
    numbers = textscan(line, '%f %f %f');
    %// If this line doesn't consist of three numbers...
    if all(cellfun(@isempty, numbers))
        %// If the temporary matrix is empty, skip
        if isempty(temp)
            continue;
        end
        %// Concatenate to cell array
        data = [data; temp];
        %// Reset temporary matrix
        temp = [];
    %// If this does, then create a row vector and concatenate
    else
        temp = [temp; numbers{:}];
    end
end
%// Close the file
fclose(f);
The code is fairly self-explanatory, but let's walk through it to be sure you know what's going on. First, open the file with fopen to get a "pointer" to the file, then initialize the cell array that will contain our matrices, as well as the temporary matrix used to accumulate rows between header lines. After that, we loop over the file and grab one line at a time with fgetl, using the file pointer we created. We then check whether we have reached the end of the file. If we have, we check whether the temporary matrix has any numerical data in it; if it does, we add it to our cell array, and then we get out of the loop. Once the loop is done, fclose closes the file and cleans things up.
Now the heart of the operation is what follows after this check. We use textscan to search for three numbers separated by spaces, which is what the '%f %f %f' format specifier does. If the line is numeric, this gives you a cell array of three elements, each holding one number. In that case, convert the cell array into a row of numbers and concatenate it onto the temporary matrix. Doing temp = [temp; numbers{:}]; performs this concatenation: numbers{:} places the three numbers side by side to form a single row, and that row is then appended as a new row of the temporary matrix.
Should we get to a line that is all text, all three elements of the cell array returned by textscan will be empty. That's the purpose of the all and cellfun call: we check each element in the cell to see if it's empty. If every element is empty, the line is text. When this happens, take the temporary matrix and add it as a new entry in your cell array, then reset the temporary matrix and start the logic over again.
However, we also have to take into account that there may be several consecutive lines of text. That's what the additional if statement inside the first if block is for. If a line of text directly follows another line of text, the temporary matrix will still be empty, so you should check whether it is empty before you try to concatenate it. If it is empty, don't bother and just continue.
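As a small illustration of the check described above, here is a minimal sketch that runs textscan on one numeric line and one header line taken from the sample file. The variable names (numLine, txtLine, isData, isText, row) are made up for this example and are not part of the script.

```matlab
% Two sample lines copied from the file shown in the question
numLine = '9.843925313007988e-012 -4.753470e-006 2.216314e-011';
txtLine = 'Step Information: X=1 (Run: 1/11)';

% Try to parse three floating-point numbers from each line
a = textscan(numLine, '%f %f %f');
b = textscan(txtLine, '%f %f %f');

% A line counts as text when every parsed cell is empty
isData = ~all(cellfun(@isempty, a));
isText = all(cellfun(@isempty, b));

% For a numeric line, unpack the three cells into a 1-by-3 row
row = [a{:}];
```

The row built this way is what gets appended to the temporary matrix in the loop.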
After running the full script above on the sample file, I get the following for the data cell array:
>> format long g
>> celldisp(data)
data{1} =
0 -2.680148e-16 0
9.84392531300799e-12 -4.75347e-06 2.216314e-11
1.00005260577246e-11 -4.835427e-06 2.552497e-11
1.03137275471577e-11 -4.99934e-06 -3.042096e-12
1.09401305260241e-11 -5.327165e-06 -1.206968e-11
data{2} =
0 -2.680148e-16 0
9.84392531300799e-12 -4.75347e-06 2.216314e-11
1.00005260577246e-11 -4.835427e-06 2.552497e-11
1.03137275471577e-11 -4.99934e-06 -3.042096e-12
1.09401305260241e-11 -5.327165e-06 -1.206968e-11
To access a particular "table", use data{ii}, where ii is the index of the table you want, counted from top to bottom in your text file.
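Since the question asks for tables with three columns, each matrix can also be wrapped in a MATLAB table. The following is only a sketch: it assumes a release that has array2table (R2013b or later), it uses a small stand-in for data, and the column names time, V_AB and I_R1 are my own choice because V(A,B) and I(R1) are not valid variable names.

```matlab
% Stand-in for the cell array produced by the reading loop
% (assumed shape: each entry is an N-by-3 matrix)
data = {[0 -2.680148e-16 0; 9.843925313007988e-12 -4.753470e-06 2.216314e-11]; ...
        [0 -2.680148e-16 0; 1.000052605772457e-11 -4.835427e-06 2.552497e-11]};

% Hypothetical column names chosen for this example
names = {'time', 'V_AB', 'I_R1'};

% Convert every matrix into a table with named columns
tables = cellfun(@(m) array2table(m, 'VariableNames', names), ...
                 data, 'UniformOutput', false);

% Example access: the time column of the first table
t1 = tables{1}.time;
```

Each tables{ii} then holds one data series, and its columns can be addressed by name.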
The most versatile way is to read line by line using textscan. If you want to speed this process up, you can do a dummy read first:
i.e., you loop through all the lines without storing the data, decide which lines are text and which are numbers, and record the number of data lines in each block.
You then have enough information about the data to fill the arrays quickly, which can considerably reduce the time it takes to store the data in your new arrays.
Your second loop is the one that actually reads the data into the array(s). You now know which lines to skip, and you can also pre-allocate the arrays within the data cell if you wish.
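fid = fopen('file.txt','r');
data = {};
nlines = [];
% First pass: count the data lines that follow each text line.
% This assumes the file starts with a text line, as in the sample.
k = 0; % counter for data sets
while ~feof(fid)
    line = fgetl(fid);
    % check if the line is data or text (data lines may contain signs and exponents)
    if ~isempty(line) && all(ismember(line,' 0123456789+-.eE')) % is it data
        nlines(k) = nlines(k)+1;
    else % is it text
        k = k+1;
        nlines(k) = 0;
    end
end
frewind(fid); % go back to start of file
% Second pass: now get the data
for aa = 1 : length(nlines)
    fgetl(fid); % skip the text line that opens this block
    if nlines(aa)==0
        continue % consecutive text lines: nothing to read
    end
    block = zeros(nlines(aa),3); % preallocate this data set
    for r = 1 : nlines(aa)
        vals = textscan(fgetl(fid),'%f %f %f');
        block(r,:) = [vals{:}];
    end
    data{end+1} = block;
end
fclose(fid);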
I have a .mat data file and extract 8 features from it.
I need to arrange these features as a cell and repeat that for 12 categories.
How can I consolidate this code into a single, uniform piece of code?
feature_mean1=zeros(12,15);
for vmean1= 1:12
feature_mean1(vmean1,:)= mean(Catrgoryy1{vmean1});
end
feature_mean2=zeros(12,15);
for vmean2= 1:12
feature_mean2(vmean2,:)= mean(Catrgoryy2{vmean2});
end
%**********************
%***************
feature_min1=zeros(12,15);
for vmin1= 1:12
feature_min1(vmin1,:)= min(Catrgoryy1{vmin1});
end
feature_min2=zeros(12,15);
for vmin2= 1:12
feature_min2(vmin2,:)= min(Catrgoryy2{vmin2});
end
%***************
X=zeros(30,4);
h=1;
X_1=[feature_mean1(1,:)',feature_std1(1,:)',feature_min1(1,:)',feature_max1(1,:)',feature_mean2(1,:)',feature_std2(1,:)',feature_min2(1,:)',feature_max2(1,:)'];%
Y_1=repmat(1,length(X_1),1);
%%%**************222*************
X_2=[feature_mean1(2,:)',feature_std1(2,:)',feature_min1(2,:)',feature_max1(2,:)',feature_mean2(2,:)',feature_std2(2,:)',feature_min2(2,:)',feature_max2(2,:)'];
Y_2=repmat(2,length(X_2),1);
%%%**************333**************
.
.
.
X_12=[feature_mean1(12,:)',feature_std1(12,:)',feature_min1(12,:)',feature_max1(12,:)',feature_mean2(12,:)',feature_std2(12,:)',feature_min2(12,:)',feature_max2(12,:)'];
Y_12=repmat(12,length(X_12),1);
First, form 8 arrays, one for each of the features,
then combine all of them inside a for loop:
for o=1:12
Xf(o,:)=[feature_mean11{o},feature_std11{o},feature_min11{o},feature_max11{o},feature_mean22{o},feature_std22{o},feature_min22{o},feature_max22{o}];
end
That's all.
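One possible way to unify the repeated blocks is to loop over function handles instead of writing one loop per feature. This is a sketch under assumptions: the cell array src and the random matrices in it are hypothetical stand-ins for Catrgoryy1 and Catrgoryy2, each assumed to hold 12 matrices with 15 columns, and std and max are assumed to be the features behind feature_std and feature_max.

```matlab
% Hypothetical stand-in data: 2 sources, 12 categories, each a 20-by-15 matrix
nCat = 12; nCols = 15;
src = cell(1, 2);
for s = 1:2
    src{s} = cell(1, nCat);
    for c = 1:nCat
        src{s}{c} = rand(20, nCols);
    end
end

% Feature functions, applied column-wise to each category matrix
funcs = {@mean, @std, @min, @max};

X = cell(nCat, 1);   % X{c} plays the role of X_c
Y = cell(nCat, 1);   % Y{c} plays the role of Y_c
for c = 1:nCat
    Xc = zeros(nCols, 2*numel(funcs));
    col = 0;
    for s = 1:2
        for f = 1:numel(funcs)
            col = col + 1;
            % One 15-by-1 feature column per (source, function) pair
            Xc(:, col) = funcs{f}(src{s}{c})';
        end
    end
    X{c} = Xc;
    Y{c} = repmat(c, nCols, 1);
end
```

The column order matches the question (mean, std, min, max for the first source, then the same for the second), and X{c} and Y{c} replace the numbered variables X_1 ... X_12 and Y_1 ... Y_12.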
I have large data files and I would like to import 12 columns of data for further use. However, the number of rows will be different in each file. I could import only the selected columns, but below the data I need there are some blank rows followed by extra numbers which aren't necessary, so I'm wondering how to import just the data I need. I don't mind specifying an end row, but this would be different for each case, and I'm not sure if I'm missing anything obvious! To help, I've attached a screenshot of an example of the data I'm working with:
To summarise, I only require the "blue" data above the purple boxes. Each file I use will have the same layout, except that there may be more or fewer rows of data.
EDIT
I have updated the code to give you a better understanding of the process:
% An empty array:
importedarray = [];
% Open the data file to read from it:
fid = fopen( 'dummydata.txt', 'r' );
% Check that fid is not -1 (error opening the file).
% Read each line of the data into a cell array:
cac = textscan( fid, '%s', 'Delimiter', '\n' );
% size(cac{1},1) must equal the # of rows in your data file.
totalRows = size(cac{1},1);
fprintf('Imported %d rows of data!\n',totalRows)
% Close the file as we don't need it anymore:
fclose( fid );
% for total rows in data
for k=1:totalRows
    fprintf('Parsing data on row %d of %d...\n',k,totalRows);
    currentRow = cac{1}{k,1};
    fprintf('Row contains:\n%s\n',currentRow);
    % finish (break from loop) when encountering an empty row:
    if isempty(currentRow)
        fprintf('Empty row encountered (#%d). Exiting the loop...\n',k);
        break;
    end
    eachRowElement = strsplit(currentRow, ' ');
    fprintf('Splitting row to %d elements...\n',length(eachRowElement));
    fprintf('Converting row to floats...');
    eachRowElement2num = cellfun(@str2num,eachRowElement,'UniformOutput',false);
    fprintf('Done!\n');
    fprintf('Converting cell to matrix...');
    importedarray(k,:) = cell2mat(eachRowElement2num);
    fprintf('Done!\n');
end
clearvars cac k fid totalRows currentRow eachRowElement eachRowElement2num;
Given your example image (all the columns of each row are filled with floats, and you stop at an empty row), this should do the job, printing information along the way. If it doesn't, you will be able to tell what the issue is by looking at the line where the code stopped. I included code to clear the unnecessary variables after importing. This must be done manually, or you can create a function to perform the task (a function's workspace is separate, so its temporary variables are deleted when the function returns; see: http://www.mathworks.com/help/matlab/ref/function.html). Hope this helps.
PS. In your example you keep 12 columns skipping the first two. The above code will import the whole row. You can choose what columns to keep later by using matrix indexing, like:
importedarray = importedarray(:,3:14);
If these columns don't change, you can incorporate this into your function.
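For comparison, the stop-at-first-blank-row idea and the column selection can be sketched in a few lines once the rows are in memory. This is only an illustration: the cell array rows is a made-up stand-in for cac{1}, and the kept columns 2:3 are arbitrary.

```matlab
% Hypothetical stand-in for cac{1}: data rows, a blank row, then unwanted numbers
rows = {'1 2 3 4'; '5 6 7 8'; ''; '99 98'};

% Index of the first empty row (or one past the end if there is none)
stopAt = find(cellfun(@isempty, rows), 1);
if isempty(stopAt)
    stopAt = numel(rows) + 1;
end

% Keep only the rows above the first blank one
kept = rows(1:stopAt-1);

% Parse each kept row into a numeric row vector and stack them
M = cell2mat(cellfun(@(r) sscanf(r, '%f')', kept, 'UniformOutput', false));

% Select the columns of interest by matrix indexing
M = M(:, 2:3);
```

This assumes every kept row has the same number of values; otherwise cell2mat will raise an error, which also points at the offending data.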
I have a large matrix in xlsx file which contains chars as following for example:
1,26:45:32.350,6,7,8,9,9,0,0,0
1,26:45:32.409,5,7,8,9,9,0,75,89
I want to turn the 2nd column (the one which contains 26:45:32.350)
into a time vector and all the rest into a double matrix.
I tried the following code on about 50,000 rows and it worked.
[FileName PathName] = uigetfile('*.xlsx','XLSX Files');
fid = fopen(FileName);
T=char(importdata(FileName));
Time=T(:,5:16);
Data=str2double(T);
However, when I tested it on the whole matrix (about 500,000 rows), I received Data=[] instead of a matrix.
Is there anything else I can do so that 'Data' will be a double matrix even for a large matrix?
The Excel file contains 1 column and around 500,000 rows, so the whole line 1,26:45:32.350,6,7,8,9,9,0,0,0 is inside 1 cell.
I also wrote another piece of code, which works but takes a lot of time to run.
[FileName PathName] = uigetfile('*.xlsx','XLSX Files');
fid = fopen(FileName);
T=importdata(FileName);
h = waitbar(0,'Converting Data to cell array, please wait...');
for i=1:length(T)
delimiter_index=[0 find(T{i,1}(:)==char(44))'];
for j=1:length(delimiter_index)-1
Data{i,j}=T{i,1}(delimiter_index(j)+1:delimiter_index(j+1)-1);
end
waitbar(i/length(T));
end
close(h)
h = waitbar(0,'Seperating Data to time and data, please wait...');
for i=1:length(T)
Full_Time(i,:)=Data{i,2};
Data{i,2}=Data{i,1};
Data{i,1}=Full_Time(i,:);
waitbar(i/length(T));
end
close(h)
Data(:,1)=[];
h = waitbar(0,'Changing data cell to mat, please wait...');
for i=1:size(Data,1)
for j=1:size(Data,2)
Matrix(i,j)=str2num(Data{i,j});
end
waitbar(i/size(Data,1));
end
close(h)
Running this code on about 20,000 rows shows that the most time-consuming functions are (slowest to fastest):
waitbar
allchild
str2num
importdata
So basically I can remove this waitbar, but allchild (not sure what it is) and str2num take most of the time.
Is there anything I can do to make it run faster?
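One direction that might help, sketched under assumptions: if the lines are already in memory as a cell array of strings (called lines here, a hypothetical stand-in for what importdata returns), they can be joined and parsed with a single textscan call instead of calling str2num on every element. The sketch assumes every line has exactly 10 comma-separated fields and a release that has strjoin (R2013a or later). Whether this is fast enough for 500,000 rows would need to be measured.

```matlab
% Hypothetical stand-in for the imported column of text lines
lines = {'1,26:45:32.350,6,7,8,9,9,0,0,0'; ...
         '1,26:45:32.409,5,7,8,9,9,0,75,89'};

% Join all lines into one newline-separated string
joined = strjoin(lines', sprintf('\n'));

% One number, one time string, then eight more numbers per line
fmt = ['%f %s' repmat(' %f', 1, 8)];
C = textscan(joined, fmt, 'Delimiter', ',');

% Time strings as a cell array, everything else as a double matrix
Time = C{2};
Data = [C{[1 3:end]}];
```

Here Time keeps the second field of every line as text, and Data holds the remaining nine fields as doubles, one row per line.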
The following piece of code works when data is passed as a 1x50 array.
(Data is in fact a struct that passes several other parameters too). In the 1x50 case a 4x1 array of parameters is returned for each i (the value of de.nP is 600).
However I want to change it so that I can pass a matrix of data say d dates so that the matrix has dimension dx50. This will then return a 4xd array for each i.
My question is should I use a cell array or a 3D array to store the values?
Seems to me both methods could do the job?
for i=1:de.nP
betas(:,i)=NSS_betas(P1(:,i),data);
end
Going further into the code I will need to use
Params=vertcat(betas,P1);
Here P1 is a 2x1 array. So for each date (i) I need to concatenate the contents of P1 to all the betas for that date.
Will this affect the choice between a cell array and a 3D array?
It seems to me a cell array is better suited to vectorised code (which is what I am trying to use as much as possible), but a 3D array might be easier to use with functions like vertcat?
Here is the whole code
mats=[1:50];
mats2=[2 5 10 30];
betaTRUE=[5 -2 5 -5 1 3; 4 -3 6 -1 2 4];
for i=1:size(betaTRUE,1)
yM(i,:)=NSS(betaTRUE(i,:),mats);
y2(i,:)=NSS(betaTRUE(i,:),mats2);
end
dataList=struct('yM',yM,'mats',mats,'model',@NSS,'mats2',mats2,'y2',y2);
de=struct('min',[0; 2.5],'max', [2.5;5],'d',2,'nP',200,'nG',300,'ww',0.1,'F',0.5,'CR',0.99,'R',0,'oneElementfromPm',1);
beta=DElambdaVec(de,dataList,@OF);
function [output]=DElambdaVec(de,data,OF)
P1=zeros(de.d,de.nP);
Pu=zeros(de.d,de.nP);
for i=1:de.d
P1(i,:)=de.min(i,1)+(de.max(i,1)-de.min(i,1))*rand(de.nP,1);
end
P1(:,1:de.d)=diag(de.max);
P1(:,de.d+1:2*de.d)=diag(de.min);
for i=1:de.nP
betas(:,i)=NSS_betas(P1(:,i),data);
end
Params=vertcat(betas,P1);
Fbv=NaN(de.nG,1);
Fbest=realmax;
F=zeros(de.nP,1);
P=zeros(de.nP,1);
for i=1:de.nP
F(i)=OF(Params(:,i)',data);
P(i)=pen(P1(:,i),de,F(i));
F(i)=F(i)+P(i);
end
[Fbest indice] =min(F);
xbest=Params(:,indice);
%vF=vF+vP;
%NaN(de.nG,de.nP);
Col=1:de.nP;
for g=1:de.nG
P0=P1;
rowS=randperm(de.nP)';
colS=randperm(4)';
RS=circshift(rowS,colS(1));
R1=circshift(rowS,colS(2));
R2=circshift(rowS,colS(3));
R3=circshift(rowS,colS(4));
%mutate
Pm=P0(:,R1)+de.F*(P0(:,R2)-P0(:,R3));
%extra mutation
if de.R>0
Pm=Pm+de.R*randn(de.d,de.nP);
end
%crossover
PmElements=rand(de.d,de.nP)<de.CR;
%mPv(MI)=mP(Mi);
if de.oneElementfromPm
Row=unidrnd(de.d,1,de.nP);
ExtraPmElements=sparse(Row,Col,1,de.d,de.nP);
PmElements=PmElements|ExtraPmElements;
end
P0_Elements=~PmElements;
Pu(:,RS)=P0(:,RS).*P0_Elements+PmElements.*Pm;
for i=1:de.nP
betasPu(:,i)=NSS_betas(Pu(:,i),data);
end
ParamsPu=vertcat(betasPu,Pu);
flag=0;
for i=1:de.nP
Ftemp=OF(ParamsPu(:,i)',data);
Ptemp=pen(Pu(:,i),de,F(i));
Ftemp=Ftemp+Ptemp;
if Ftemp<=F(i);
P1(:,i)=Pu(:,i);
F(i)=Ftemp;
if Ftemp < Fbest
Fbest=Ftemp; xbest=ParamsPu(:,i); flag=1;
end
else
P1(:,i)=P0(:,i);
end
end
if flag
Fbv(g)=Fbest;
end
end
output.Fbest=Fbest; output.xbest=xbest; output.Fbv=Fbv;
end
function penVal=pen(mP,pso,vF)
minV=pso.min;
maxV=pso.max;
ww=pso.ww;
A=mP-maxV;
A=A+abs(A);
B=minV-mP;
B=B+abs(B);
C=ww*((mP(1,:)+mP(2,:))-abs(mP(1,:)+mP(2,:)));
penVal=ww*sum(A+B,1)*vF-C;
end
function betas=NSS_betas(lambda,data)
mats=data.mats2';
lambda=lambda;
yM=data.y2';
nObs=size(yM,1);
G= [ones(nObs,1) (1-exp(-mats./lambda(1)))./(mats./lambda(1)) ((1-exp(- mats./lambda(1)))./(mats./lambda(1))-exp(-mats./lambda(1))) ((1-exp(- mats./lambda(2)))./(mats./lambda(2))-exp(-mats./lambda(2)))];
betas=G\yM;
end
This does the trick, though it will require extensive recoding in the rest of the function!
betas=zeros(4,size(data.y2,1),de.nP);
for i=1:de.nP
betas(:,:,i)=NSS_betas(P1(:,i),data);
end
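Building on the 3D layout, the later vertcat step can be kept by replicating P1 along the date dimension. This is a sketch with random stand-in values; the sizes d = 3 and nP = 5 and the name P1rep are illustrative assumptions, not taken from the original function.

```matlab
% Illustrative sizes: d dates, nP population members
d = 3; nP = 5;
betas = rand(4, d, nP);   % stand-in for the NSS_betas results
P1 = rand(2, nP);         % stand-in for the lambda population

% Give P1 a singleton date dimension, then copy it for every date
P1rep = repmat(reshape(P1, 2, 1, nP), [1 d 1]);

% Stack along the first dimension: result is 6-by-d-by-nP
Params = cat(1, betas, P1rep);

% Parameters for date 2, population member 4, as a 6-by-1 column
p = Params(:, 2, 4);
```

With this layout, Params(:, j, i) gives the four betas for date j followed by the two entries of P1(:, i), so the objective function can still be called one column at a time.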