Merge all data from xyz files in Matlab

I have a set of 501 XYZ files which I load in as
for k = 1:501
% AIS SEC data
AIS_SEC{k} = importdata(['AIS_SEC(' num2str(k) ').xyz']);
end
This generates a 1x501 cell array in which all the data are stored (I uploaded this file as an attachment at https://nl.mathworks.com/matlabcentral/answers/486579-how-to-merge-multiple-xyz-files-into-1-large-array). How can I merge all these data into one large XYZ file? The ultimate goal is to have an nx3 array in which the data from all the separate xyz files are merged.
For example, to concatenate the X data, I tried:
for k = 1:501
my_field = sprintf('X%d', k);
variable.(my_field) = ([AIS_SEC{1,k}.data(:,1)]);
end
But I get the error: Dot indexing is not supported for variables of this type.
Thanks!

There are several things wrong with your code:
First, the error Struct contents reference from a non-struct array object. (this is how older MATLAB releases word the same error) first shows up at index k=33, because the imported structure has no data field (the import was probably empty or failed).
Checking for the presence of the field lets the code run through to completion. You'll then notice that you have 8 empty rows.
load('AIS_SEC.mat')
n=numel(AIS_SEC) ;
EmptyRows = false(n,1) ;
for k = 1:n
my_field = sprintf('X%03d', k);
if isfield( AIS_SEC{1,k} , 'data')
variable.(my_field) = AIS_SEC{1,k}.data;
else
variable.(my_field) = [] ;
EmptyRows(k) = true ;
end
end
fprintf('Number of empty rows encountered: %u\n',sum(EmptyRows))
I took the liberty of also removing the unnecessary parentheses, adding a flag for the empty rows, and adjusting the sprintf output format so that all your field names have the same length (even the first fields get leading zeros: 'X001' to 'X500' instead of 'X1' to 'X500').
Also, your code only retrieved the first column of each structure, so I modified it to retrieve all three x, y, z columns. If you really want only the first column, just replace variable.(my_field) = AIS_SEC{1,k}.data by variable.(my_field) = AIS_SEC{1,k}.data(:,1).
Now this gives you a long structure with 500 fields (each representing one imported variable). Your question is not clear enough on that point, but if you want a single array in which all the values are merged, then you have 2 options:
1) Directly after the code above, convert your structure to a merged array:
vararray = cell2mat(struct2cell(variable)) ;
2) If the intermediate structure (the final variable structure above) is not something you need to keep, then you can avoid building it in the first place:
load('AIS_SEC.mat')
n = numel(AIS_SEC) ; % number of fields to import
% first pass: count how many data points (if any) each structure has
EmptyRows = false(n,1) ;
npts = zeros(n,1) ;
for k = 1:n
if isfield( AIS_SEC{1,k} , 'data')
npts(k) = size( AIS_SEC{1,k}.data , 1 ) ;
else
EmptyRows(k) = true ;
end
end
% fprintf('Number of empty rows encountered: %u\n',sum(EmptyRows))
% The above allows us to preallocate the output matrix at the right size
cumpts = cumsum(npts) ;
totalNumberOfPoints = cumpts(end) ;
vararray = zeros(totalNumberOfPoints,3) ;
% second pass: copy each block into its preallocated slot
% (starting each slot at cumpts(k)-npts(k)+1 also works if the first cell is empty)
for k = 1:n
idx = (cumpts(k)-npts(k)+1):cumpts(k) ;
if ~isempty(idx)
vararray(idx,:) = AIS_SEC{1,k}.data ;
end
end
In this version there are 2 passes of the loop across the structure. Counterintuitively, this is done for better performance. The first pass only counts the number of data points in each structure (and also flags the empty ones). Thanks to the counts returned by the first pass, we can preallocate the output matrix before the second pass and assign each structure's data to the right place in the merged array without having to resize the output array at each iteration.
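As a compact cross-check of the two options, here is a minimal sketch. It assumes a small synthetic AIS_SEC (made-up data standing in for the real 1x501 cell array) and compares the structure-based route with a direct cellfun/vertcat route:

```matlab
% Synthetic stand-in for the real AIS_SEC cell array (hypothetical data):
% two successful imports and one failed import without a .data field.
AIS_SEC = { struct('data', [1 2 3; 4 5 6]), [], struct('data', [7 8 9]) };

% Flag the cells that actually carry a 'data' field
hasData = cellfun(@(c) isfield(c, 'data'), AIS_SEC);

% Direct route: pull out each n-by-3 block and stack the blocks vertically
blocks = cellfun(@(c) c.data, AIS_SEC(hasData), 'UniformOutput', false);
merged = vertcat(blocks{:});

% Structure route (option 1), for comparison
variable = struct();
for k = 1:numel(AIS_SEC)
    name = sprintf('X%03d', k);
    if hasData(k)
        variable.(name) = AIS_SEC{k}.data;
    else
        variable.(name) = [];
    end
end
vararray = cell2mat(struct2cell(variable));

% Both routes should give the same n-by-3 array
disp(isequal(merged, vararray))
```

To write the merged array back to disk, something like writematrix(merged, 'merged.xyz', 'FileType', 'text', 'Delimiter', ' ') should work in R2019a or later; the output file name is only an example.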

Related

Importing a txt file into Matlab and reading the data

I need to import a .txt datafile into Matlab. The file has been made into 3 columns. Each column has specific numbers for a given variable. The script code must be able to do the following,
Requirement
1) import the data from txt into Matlab
2) Matlab should remove the values from the columns if the values are out of a certain range
3) Matlab should tell which line and what type of error.
My Approach
I have tried using the following approach,
function data = insertData(filename)
filename = input('Insert the name of the file: ', 's');
data = load(filename);
Column1 = data(:,1);
Column2 = data(:,2);
Column3 = data(:,3);
%Ranges for each column
nclm1 = Column1(Column1>0);
nclm2 = Column2(Column2 >= 10 & Column2 <= 100);
nclm3 = Column3(Column3>0);
%Final new data columns within the ranges
final = [nclm1, nclm2, nclm3];
end
Problem
The above code has the following problems:
1) Matlab is not saving the imported data as 'data' after the user inserts the name of the file. Hence I don't know why my code is wrong.
filename =input('Insert the name of the file: ', 's');
data = load(filename);
2) The columns in the end do not have the same dimensions because I can see that Matlab removes values from the columns independently. Therefore is there a way in which I can make Matlab remove values/rows from a matrix rather than the three 'vectors', given a range.
1) Not sure what you mean by this. I created a sample text file and Matlab imports the data as data just fine. However, you are only returning the original unfiltered data, so maybe that is what you mean? I modified the function to return both the original data and the filtered data.
2) You need to OR the bad indices together so that the whole row is removed whenever any one column fails its check. Note that I made some other edits; see the comments in the code below:
function [origData, filteredData]= insertData(filename)
% You pass in filename then overwrite it ...
% Modified to only prompt if not passed in.
if ~exist('filename','var') || isempty(filename)
filename = input('Insert the name of the file: ', 's');
end
origData = load(filename);
% Range check for each column: a value is "bad" when it is NOT in the allowed range
% Note: return these if you want to know which data was filtered for
% which reason
badIdx1 = ~(origData(:,1) > 0);
badIdx2 = ~(origData(:,2) >= 10 & origData(:,2) <= 100);
badIdx3 = ~(origData(:,3) > 0);
totalBad = badIdx1 | badIdx2 | badIdx3;
%Final new data columns within the ranges
filteredData = origData(~totalBad,:);
end
Note: you mentioned that you want to know which line has which type of error. That information is now contained in badIdx1, badIdx2 and badIdx3, so you can return them, print a message to the screen, or do whatever you need to display that information.
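One possible way to report the line and the type of error is sketched below. The sample matrix is made up for illustration, and the range rules are the ones from the question (column 1 > 0, 10 <= column 2 <= 100, column 3 > 0):

```matlab
% Hypothetical sample data: rows 2 and 4 each violate one range rule
origData = [ 1  50  2 ;
            -3  20  5 ;
             4  60  1 ;
             2 150  7 ];

% One logical column per check; true means "this value failed its check"
bad = [ ~(origData(:,1) > 0), ...
        ~(origData(:,2) >= 10 & origData(:,2) <= 100), ...
        ~(origData(:,3) > 0) ];

% Report every offending line together with the column that failed
[rowIdx, colIdx] = find(bad);
for k = 1:numel(rowIdx)
    fprintf('Line %d: column %d out of range (value %g)\n', ...
            rowIdx(k), colIdx(k), origData(rowIdx(k), colIdx(k)));
end

% Keep only the rows where no check failed
filteredData = origData(~any(bad, 2), :);
```

Keeping the three checks as columns of one logical matrix means any(bad, 2) plays the role of totalBad, and find gives the line/column pairs directly.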

Append data to 2-D array Netcdf format Matlab

I have a function which generates yout_new(5000,1) at every iteration, and I want to store this data in a netCDF file and append the new data generated at every iteration to this existing file. At the 2nd iteration the stored variable size should be yout_new(5000,2). Here is my attempt, which doesn't work. Is there a nice way to do it?
neq=5000;
filename='thrust.nc';
if ~exist(filename, 'file')
%% create file
ncid=netcdf.create(filename,'NC_WRITE');
%%define dimension
tdimID = netcdf.defDim(ncid,'t',...
netcdf.getConstant('NC_UNLIMITED'));
ydimID = netcdf.defDim(ncid,'y',neq);
%%define varibale
varid = netcdf.defVar(ncid,'yout','NC_DOUBLE',[ydimID tdimID]);
netcdf.endDef(ncid);
%%put variables from workspace ( i is the iteration)
netcdf.putVar(ncid,varid,[ 0 0 ],[ neq 0],yout_new);
%%close the file
netcdf.close(ncid);
else
%% open the existing file
ncid=netcdf.open(filename,'NC_WRITE');
%Inquire variables
[varname,xtype,dimids,natts] = netcdf.inqVar(ncid,0);
varid = netcdf.inqVarID(ncid,varname);
%Enquire current dimension length
[dimname, dimlen] = netcdf.inqDim(ncid,0);
% Append new data to existing variable.
netcdf.putVar(ncid,varid,dimlen,numel(yout_new),yout_new);
netcdf.close(ncid);
There are easier functions in MATLAB for dealing with netCDF. You can read about ncdisp, ncinfo, nccreate, ncread and ncwrite. Coming to the question: you said you need to write two columns. I will make the number of columns unlimited (Inf), so that you can append a new column at every iteration. Check the code below:
N = 3 ; % number of columns
rows = 5000 ; % number of rows
ncfile = 'myfile.nc' ; % my ncfile name
nccreate(ncfile,'yout_new','Dimensions',{'row',rows,'col',Inf},'DeflateLevel',5) ; % create nc file
% generate your data in loop and write to nc file
for i = 1:N
yout_new = rand(rows,1) ;
ncwrite(ncfile,'yout_new',yout_new,[1,i]) ;
end
Please note that it is not mandatory to make the number of columns unlimited; you can fix it to your desired number instead of Inf.
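If the file already exists from an earlier run, the next free column can be looked up with ncinfo before writing. The following is only a sketch under assumptions: it uses a scratch file from tempname and a small row count so that it runs quickly, and the variable name yout_new is taken from the code above:

```matlab
% Scratch file and small size, chosen only for this illustration
ncfile = [tempname '.nc'];
rows   = 5;

% Create the variable with an unlimited second dimension and write 3 columns
nccreate(ncfile, 'yout_new', 'Dimensions', {'row', rows, 'col', Inf});
for i = 1:3
    ncwrite(ncfile, 'yout_new', i * ones(rows, 1), [1, i]);
end

% Later run: ask the file how many columns it currently holds ...
info    = ncinfo(ncfile, 'yout_new');
nextCol = info.Size(2) + 1;

% ... and append the new column right after the last one
ncwrite(ncfile, 'yout_new', nextCol * ones(rows, 1), [1, nextCol]);

% Read back only the column that was just appended
info    = ncinfo(ncfile, 'yout_new');
lastCol = ncread(ncfile, 'yout_new', [1, info.Size(2)], [rows, 1]);
disp(info.Size)

delete(ncfile)   % clean up the scratch file
```

This way the iteration counter does not have to be stored anywhere else; the file itself records how many columns have been written.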

Only Import File when it contains certain numbers from a Table

I got a couple 100 sensor measurement files all containing the date and time of measurement. All the files have names that include date and time. Example:
07-06-2016_17-58-32.wf
07-06-2016_18-02-32.wf
...
...
08-06-2016_17-48-26.wf
I have a function (importfile) and a loop that imports my data. The loop looks like this:
Files = dir('C:\Osci\User\*.waveform');
numFiles = length(Files);
Data = cell(1, numFiles);
for fileNum = 1:numFiles
Data{fileNum} = importfile(Files(fileNum).name);
end
Not all of these waveform files are useful. The measurement files are only useful if they were generated in a certain time period. I got a table that shows my allowed time periods:
07-Jun-2016 18:00:01
07-Jun-2016 18:01:31
07-Jun-2016 18:02:01
...
I want to modify my loop, so that the files (.waveform files) are only imported if the numbers for day (first number), hour (4th number) and minute (5th number) from the files match the numbers of the table containing the allowed time periods.
EDIT: Rather than a scalar hour, minute, and second, there is a vector of each. In my case, MyDay, MyHour and MyMinute are 1100x1 matrices while fileTimes only consists of 361 rows.
So, using the provided example the loop should only import file
07-06-2016_18-02-32.wf
since it is the only one where the numbers match (in this case 7, 18, 02).
EDIT2: Using @erfan's answer (and changing some directories and variable names) I have the following working code:
fmtstr = 'O:\\Basic_Research_All\\Lange\\Skripe ISAT\\Rohdaten\\*_%02i-*-*_%02i-%02i-*.wf';
Files = struct([]);
n = size(MyDayMyHourMyMinute);
for N = 1:n;
Files = [Files; dir(sprintf(fmtstr, MyDayMyHourMyMinute(N,:)))];
end
numFiles = length(Files);
WaveformData = cell(1, numFiles);
for fileNum = 1:numFiles
WaveformData{fileNum} = importfile(Files(fileNum).name);
end
Since your filenames are pretty well defined as dates and times, you can prefilter your list by turning them into actual dates and times:
% Get the file list
Files = dir('C:\Osci\User\*.waveform');
% You only need the names
Files = {Files.name};
% Get just the filename w/o the extension
[~, baseFileNames] = cellfun(@(x) fileparts(x), Files, 'UniformOutput', false);
% Your filename is just a date, so parse it as such
fileTimes = datevec(baseFileNames, 'dd-mm-yyyy_HH-MM-SS'); % day comes first in these filenames
% Now pick out the files you want
% goodFiles = fileTimes(:, 4) == myHour & fileTimes(:, 5) == myMinute & fileTimes(:, 6) == mySecond;
goodFiles = ismember(fileTimes(:, 4:6), [myHour(:), myMinute(:), mySecond(:)], 'rows');
% Pare down your list of filenames
Files = Files(goodFiles);
% Preallocate your data cell
Data = cell(1, numel(Files));
% Now do your loop
for idx = 1:numel(Data)
Data{idx} = importfile(Files{idx});
end
You will, of course, need to define myHour, myMinute and mySecond. Of course, using the logical indexing in goodFiles, you could impose any sort of time criteria, like time or date range. If you find that your filenames aren't so well defined, you could parse out the filename using textscan or strfind to get the bits you want. The important thing is that cell arrays can be indexed into in much the same way as numerical or string arrays and it's often better to vectorize your filter criteria and then only do the loop on the parts you have to.
The OP indicated in a comment below that rather than a scalar hour, minute, and second, there is a vector of each. In that case, use ismember to match the two time vectors and return a logical index vector. With 2015a, MathWorks introduced the function ismembertol, which allows one to check membership within a certain tolerance.
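For the day/hour/minute match described in the question, the ismember approach could look like the sketch below. It is an illustration only: the file names and allowed times are the examples from the question, and it assumes the names follow a dd-mm-yyyy_HH-MM-SS pattern:

```matlab
% Example base file names (extension already stripped), taken from the question
baseFileNames = {'07-06-2016_17-58-32', ...
                 '07-06-2016_18-02-32', ...
                 '08-06-2016_17-48-26'};

% Parse into date vectors: one row per file, columns [Y M D H MI S]
fileTimes = datevec(baseFileNames, 'dd-mm-yyyy_HH-MM-SS');

% Allowed time periods from the question's table, parsed the same way
allowed = datevec({'07-Jun-2016 18:00:01', ...
                   '07-Jun-2016 18:01:31', ...
                   '07-Jun-2016 18:02:01'}, 'dd-mmm-yyyy HH:MM:SS');

% Compare day, hour and minute (columns 3, 4 and 5) row by row
goodFiles = ismember(fileTimes(:, 3:5), allowed(:, 3:5), 'rows');

% Names of the files that would be imported
selected = baseFileNames(goodFiles);
disp(selected)
```

Because ismember with 'rows' compares whole rows, the two lists can have different lengths (for example 361 files against 1100 allowed times).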
You can apply your selection from the beginning. Imagine the acceptable values for day, hour and minute are saved in acc as an n*3 matrix. If you replace the first line of your code with:
fmtstr = 'C:\\Osci\\User\\%02i-*-*_%02i-%02i-*.wf'; % backslashes doubled because sprintf treats \ as an escape character
Files = struct([]);
for ii = 1:n
Files = [Files; dir(sprintf(fmtstr, acc(ii,:)))];
end
Then you have already applied your criteria to Files. The rest is the same.
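To see what dir is actually asked for, it can help to print the generated wildcard patterns first. A small sketch, assuming a made-up acc matrix with [day hour minute] rows; note the doubled backslashes, which sprintf needs in order to produce a literal backslash:

```matlab
% Hypothetical accepted [day hour minute] combinations
acc = [ 7 18  2 ;
        8 17 48 ];

% Doubled backslashes: sprintf would otherwise read \O and \U as escapes
fmtstr = 'C:\\Osci\\User\\%02i-*-*_%02i-%02i-*.wf';

% Build one wildcard pattern per accepted combination
patterns = cell(size(acc, 1), 1);
for ii = 1:size(acc, 1)
    patterns{ii} = sprintf(fmtstr, acc(ii, :));
end
disp(patterns)
```

Each pattern can then be passed to dir exactly as in the loop above.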

Create a 2 column matrix with 2 different format types

I'm very new to Matlab and I'm having trouble reading a binary file into a matrix. The problem is that I am trying to read the binary file into a two-column matrix (which has 100000s of rows) where each column has a different format type.
I want column 1 to be in 'int8' format and column 2 to be a 'float'
This is my attempt so far:
FileID= fopen ('MyBinaryFile.example');
[A,count] = fread(FileID,[nrows, 2],['int8','float'])
This is not working because I get the error message 'Error using fread' 'Invalid Precision'
I will then go on to plot once I have successfully done this.
Probably a very easy solution to someone with matlab experience but I haven't been successful at finding a solution on the internet.
Thanks in advance to anyone who can help.
You should be aware that Matlab cannot hold different data types in one matrix (it can do so in a cell array, but that is another topic). So there is no point trying to read your mixed-type file in one go into one single matrix: it is not possible.
Unless you want a cell array, you will have to use 2 different variables for your 2 columns of different type. Once this is established, there are many ways to read such a file.
For the purpose of the example, I had to create a binary file as you described. This is done this way:
%% // write example file
A = int8(-5:5) ; %// a few "int8" data
B = single(linspace(-3,1,11)) ; %// a few "float" (=single) data
fileID = fopen('testmixeddata.bin','w');
for il=1:11
fwrite(fileID,A(il),'int8');
fwrite(fileID,B(il),'single');
end
fclose(fileID);
This creates a 2-column binary file, with first column: 11 values of type int8 going from -5 to +5, and second column: 11 values of type float going from -3 to 1.
In each of the solutions below, the first column will be read into a variable called C, and the second column into a variable called D.
1) Read all data in one go - convert to proper type after
%% // Read all data in one go - convert to proper type after
fileID = fopen('testmixeddata.bin');
R = fread(fileID,'uint8=>uint8') ; %// read all values, most basic data type (unsigned 8 bit integer)
fclose(fileID);
R = reshape( R , 5 , [] ) ; %// reshape data into a matrix (5 because each record is 1+4 = 5 bytes; one record per column of R)
temp = R(1,:) ; %// extract data for first column into temporary variable (OPTIONAL)
C = typecast( temp , 'int8' ) ; %// convert into "int8"
temp = R(2:end,:) ; %// extract data for second column
D = typecast( temp(:) , 'single' ) ; %// convert into "single/float"
This is my favourite method, especially for speed, because it minimizes the read/seek operations on disk, and most of the post-processing is done in memory (much, much faster than disk operations).
Note that the temporary variable I used is only there for clarity; you can avoid it altogether if you get your indexing into the raw data right.
The key thing to understand is the use of the typecast function. And the good news is it got even faster since 2014b.
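The byte-level idea behind this method can be checked entirely in memory, without touching a file. The sketch below is an illustration under assumptions: it builds the interleaved byte stream by hand (1 byte of int8 followed by 4 bytes of single per record) and then decodes it with reshape and typecast:

```matlab
% Small made-up data set: 3 records of (int8, single)
A = int8([-5 0 5]);
B = single([-3 0.5 1]);

% Build the raw byte stream as it would sit on disk:
% row 1 holds the int8 byte, rows 2:5 hold the 4 bytes of each single
raw = [ typecast(A, 'uint8') ; reshape(typecast(B, 'uint8'), 4, []) ];
raw = raw(:);                       % column-major = record after record

% Decode: one 5-byte record per column
R = reshape(raw, 5, []);
C = typecast(R(1, :), 'int8');                        % first byte of each record
D = typecast(reshape(R(2:5, :), 1, []), 'single');    % remaining 4 bytes

% The round trip should reproduce the original values exactly
disp(isequal(C, A) && isequal(D, B))
```

typecast only reinterprets the bytes and never converts values, which is why the data must first be read as uint8 without any conversion ('uint8=>uint8').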
2) Read column by column (using "skipvalue") - 2 pass approach
%% // Read column by column (using "skipvalue") - 2 pass approach
col1size = 1 ; %// size of data in column 1 (in [byte])
col2size = 4 ; %// size of data in column 2 (in [byte])
fileID = fopen('testmixeddata.bin');
C = fread(fileID,'int8=>int8',col2size) ; %// read all "int8" values, skipping all "float"
fseek(fileID,col1size,'bof') ; %// rewind to beginning of column 2 at the top of the file
D = fread(fileID,'single=>single',col1size) ; %// read all "float" values, skipping all "int8"
fclose(fileID);
That works too. It works fine, but it is probably much slower than the method above. It may be clearer code for someone else to read, but I find it ugly (and yet I used this approach for several years before I switched to the method above).
3) Read element by element
%% // Read element by element (slow - not recommended)
fileID = fopen('testmixeddata.bin');
C=[];D=[];
while ~feof(fileID)
try
C(end+1) = fread(fileID,1,'int8=>int8') ;
D(end+1) = fread(fileID,1,'single=>single') ;
catch
disp('reached End Of File')
end
end
fclose(fileID);
Talking about ugly code ... that does work too, and if you were writing C code it would be more than ok. But in Matlab ... please avoid ! (well, your choice ultimately)
Merging in one variable
If you really want all of that in one single variable, it could be a structure or a cell array. For a cell array (to keep matrix indexing style), simply use:
%% // Merge into one "cell array"
Data = { C , D } ;
Data =
[11x1 int8] [11x1 single]

How to import a sequence of Excel Files in matlab as a column vectors or as a cell array?

I want to import a sequence of Excel files with a large amount of data in them. The problem is that I want to process the data in each file one at a time and store the output in a variable, but each time I try to process a different file the variable gets overwritten in the workspace. Is there any way I could store these files and process each file one at a time?
numFiles = 1;
range = 'A2:Q21';
sheet = 1;
myData = cell(1,numFiles); % Importing data from Excel
for fileNum = 1:numFiles
fileName = sprintf('myfile%02d.xlsx',fileNum);
myData{fileNum} = importfile3(fileName,sheet,range);
end
data = cell2mat(myData);
The actual data import is performed by importfile3 which is, for the most part, a wrapper for the xlsread function that returns a matrix corresponding to the specified range of excel data.
function data = importfile3(workbookFile, sheetName, range)
% If no sheet is specified, read first sheet
if nargin == 1 || isempty(sheetName)
sheetName = 1;
end
% If no range is specified, read all data
if nargin <= 2 || isempty(range)
range = '';
end
%% Import the data
[~, ~, raw] = xlsread(workbookFile, sheetName, range);
%% Replace non-numeric cells with 0.0
R = cellfun(@(x) ~isnumeric(x) || isnan(x),raw); % Find non-numeric cells
raw(R) = {0.0}; % Replace non-numeric cells
%% Create output variable
data = cell2mat(raw);
The issue that you are running into is a result of cell2mat concatenating all of the data in your cells into one large 2-dimensional matrix. If you were to import two Excel files with 20 rows and 17 columns each, this would result in a 2-dimensional matrix of size [20 x 34]. The doc for cell2mat has a nice visual describing this.
I see that your importfile3 function returns a matrix, and based on your use of cell2mat in your final line of code, it looks like you would like to have your final result be in the form of a matrix. So I think the easiest way to go about this is to just bypass the intermediate myData cell array.
In the example code below, the resulting data is a 3-dimensional matrix. The 1st dimension indicates row number, 2nd dimension is column number, and 3rd dimension is file number. Cell arrays are very useful for "jagged" data, but based on the code you provided, each excel data set that you import will have the same number of rows and columns.
numFiles = 2;
range = 'A2:Q21';
sheet = 1;
% Number of rows and cols known before data import
numRows = 20;
numCols = 17;
data = zeros(numRows,numCols,numFiles);
for fileNum = 1:numFiles
fileName = sprintf('myfile%02d.xlsx',fileNum);
data(:,:,fileNum) = importfile3(fileName,sheet,range);
end
Accessing this data is now very straight-forward.
data(:,:,1) returns the data imported from your first excel file.
data(:,:,2) returns the data imported from your second excel file.
etc.
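The difference between the two layouts, and what the 3-dimensional form makes easy, can be sketched without any Excel files. The matrices below are made-up stand-ins for what importfile3 would return:

```matlab
numRows = 20;  numCols = 17;  numFiles = 2;

% Stand-ins for the imported matrices (file k is filled with the value k)
myData = cell(1, numFiles);
for fileNum = 1:numFiles
    myData{fileNum} = fileNum * ones(numRows, numCols);
end

% cell2mat glues the files side by side into one 2-D matrix
flat = cell2mat(myData);            % 20-by-34

% Stacking along the 3rd dimension keeps each file as its own page
data = cat(3, myData{:});           % 20-by-17-by-2

% Per-file processing is then a matter of reducing along a dimension
meanAcrossFiles = mean(data, 3);            % 20-by-17, average over files
colTotals       = squeeze(sum(data, 1));    % 17-by-2, column sums per file

disp(size(flat))
disp(size(data))
```

If the intermediate cell array already exists, cat(3, myData{:}) gives the same 3-dimensional array as the preallocated loop above, as long as every file has the same number of rows and columns.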