I need to import a .txt datafile into Matlab. The file has been made into 3 columns. Each column has specific numbers for a given variable. The script code must be able to do the following,
Requirement
1) import the data from txt into Matlab
2) Matlab should remove the values from the columns if the values are out of a certain range
3) Matlab should tell which line and what type of error.
My Approach
I have tried using the following approach,
function data = insertData(filename)
filename = input('Insert the name of the file: ', 's');
data = load(filename);
Column1 = data(:,1);
Column2 = data(:,2);
Column3 = data(:,3);
%Ranges for each column
nclm1 = Column1(Column1>0);
nclm2 = Column2(Column2 >= 10 & Column2 <= 100);
nclm3 = Column3(Column3>0);
%Final new data columns within the ranges
final = [nclm1, nclm2, nclm3];
end
Problem
The above code has the following problems:
1) Matlab is not saving the imported data as 'data' after the user inserts the name of the file. Hence I don't know why my code is wrong.
filename =input('Insert the name of the file: ', 's');
data = load(filename);
2) The columns in the end do not have the same dimensions because I can see that Matlab removes values from the columns independently. Therefore is there a way in which I can make Matlab remove values/rows from a matrix rather than the three 'vectors', given a range.
1) Not sure what you mean by this. I created a sample text file and Matlab imports the data as data just fine. However, you are only returning the original unfiltered data so maybe that is what you mean??? I modified it to return the original data and the filtered data.
2) You need to or the bad indices together so that they are removed from each column like this. Note I made some other edits ... see comments in the code below:
function [origData, filteredData]= insertData(filename)
% You pass in filename then overwrite it ...
% Modified to only prompt if not passed in.
if ~exist('filename','var') || isempty(filename)
filename = input('Insert the name of the file: ', 's');
end
origData = load(filename);
% Ranges check for each column
% Note: return these if you want to know what data was filter for
% which reason
badIdx1 = origData(:,1) > 0;
badIdx2 = origData(:,2) >= 10 & origData(:,2) <= 100;
badIdx3 = origData(:,3)>0;
totalBad = badIdx1 | badIdx2 | badIdx3;
%Final new data columns within the ranges
filteredData = origData(~totalBad,:);
end
Note: you mentioned you want to know which line for which type of error. That information is now contained in badIDx1,2, 3. So you can return them, print a message to the screen, or whatever you need to display that information.
Related
This function reads the data from multiple mat files and save them in multiple txt files. But the data (each value) are saved one value in one column and so on. I want to save the data in a form of three columns (coordinates) in the text files, so each row has three values separated by space. Reshape the data before i save them in a text file doesn't work. I know that dlmwrite should be modified in away to make newline after three values but how?
mat = dir('*.mat');
for q = 1:length(mat)
load(mat(q).name);
[~, testName, ~] = fileparts(mat(q).name);
testVar = eval(testName);
pos(q,:,:) = testVar.Bodies.Positions(1,:,:);
%pos=reshape(pos,2,3,2000);
filename = sprintf('data%d.txt', q);
dlmwrite(filename , pos(q,:,:), 'delimiter','\t','newline','pc')
end
My data structure:
These data should be extracted from each mat file and stored in the corresponding text files like this:
332.68 42.76 42.663 3.0737
332.69 42.746 42.655 3.0739
332.69 42.75 42.665 3.074
A TheMathWorks-trainer once told me that there is almost never a good reason nor a need to use eval. Here's a snippet of code that should solve your writing problem using writematrix since dlmwrite is considered to be deprecated.
It further puts the file-handling/loading on a more resilient base. One can access structs dynamically with the .(FILENAME) notation. This is quite convenient if you know your fields. With who one can list variables in the workspace but also in .mat-files!
Have a look:
% path to folder
pFldr = pwd;
% get a list of all mat-files (returns an array of structs)
Lst = dir( fullfile(pFldr,'*.mat') );
% loop over files
for Fl = Lst.'
% create path to file
pFl = fullfile( Fl.folder, Fl.name );
% variable to load
[~, var2load, ~] = fileparts(Fl.name);
% get names of variables inside the file
varInfo = who('-file',pFl);
% check if it contains the desired variables
if ~all( ismember(var2load,varInfo) )
% display some kind of warning/info
disp(strcat("the file ",Fl.name," does not contain all required varibales and is therefore skipped."))
% skip / continue with loop
continue
end
% load | NO NEED TO USE eval()
Dat = load(pFl, var2load);
% DO WHATEVER YOU WANT TO DO
pos = squeeze( Dat.(var2load)(1,:,1:2000) );
% create file name for text file
pFl2save = fullfile( Fl.folder, strrep(Fl.name,'.mat','.txt') );
writematrix(pos,pFl2save,'Delimiter','\t')
end
To get your 3D-matrix data into a 2D matrix that you can write nicely to a file, use the function squeeze. It gets rid of empty dimensions (in your case, the first dimension) and squeezes the data into a lower-dimensional matrix
Why don't you use writematrix() function?
mat = dir('*.mat');
for q = 1:length(mat)
load(mat(q).name);
[~, testName, ~] = fileparts(mat(q).name);
testVar = eval(testName);
pos(q,:,:) = testVar(1,:,1:2000);
filename = sprintf('data%d.txt', q);
writematrix(pos(q,:,:),filename,'Delimiter','space');
end
More insight you can find here:
https://www.mathworks.com/help/matlab/ref/writematrix.html
I have a set of 501 XYZ files which I load in as
for k = 1:501
% AIS SEC data
AIS_SEC{k} = importdata(['AIS_SEC(' num2str(k) ').xyz']);
end
This generates an 1x501 cell array in which all data are stored (I uploaded this file as in attachment at https://nl.mathworks.com/matlabcentral/answers/486579-how-to-merge-multiple-xyz-files-into-1-large-array). How can I merge all these data to have 1 large XYZ file? The ultimate goal is the have a nx3 array where all of the data from the seperate xyz files are merged in to 1.
For example, to concentrate X data, I tried:
for k = 1:501
my_field = sprintf('X%d', k);
variable.(my_field) = ([AIS_SEC{1,k}.data(:,1)]);
end
BUT: Dot indexing is not supported for variables of this type.
Thanks!
There are several wrong things with your code:
First thing, the error Struct contents reference from a non-struct array object. shows up first at index k=33 because the structure imported has no data field (the import was probably empty or failed).
Checking for the presence of the field let the code run through to completion. You'll then notice that you have 8 rows which are empty.
load('AIS_SEC.mat')
n=numel(AIS_SEC) ;
EmptyRows = false(n,1) ;
for k = 1:n
my_field = sprintf('X%03d', k);
if isfield( AIS_SEC{1,k} , 'data')
variable.(my_field) = AIS_SEC{1,k}.data;
else
variable.(my_field) = [] ;
EmptyRows(k) = true ;
end
end
fprintf('Number of empty rows encountered: %u\n',sum(EmptyRows))
I took the liberty to also remove unnecessary parenthesis, added an empty row index counter, and adjusted the sprintf output format so all your field will have the same length (even the first field will have the leading zeros. 'X001' to 'X500' instead of 'X1' to 'X500').
Also, this was only retrieving the first column out of each structure, so I modified it to retrieve the 3 x,y,z columns. If you really wanted only the first column, just replace variable.(my_field) = AIS_SEC{1,k}.data by variable.(my_field) = AIS_SEC{1,k}.data(:,1).
Now this gives you a long structure with 500 fields (each representing one imported variable). Your question is not clear enough on that point but if you want to have a single array where all the values are merged, then you have 2 options:
1) Directly after the code above, convert your structure to a merged array:
vararray = cell2mat(struct2cell(variable)) ;
2) If the steps above (the final variable structure) is not something you need to keep, then you can avoid it in the first place:
load('AIS_SEC.mat')
n = numel(AIS_SEC) ; % number of field to import
% first pass we count how many data point (if any) each structure has
EmptyRows = false(n,1) ;
npts = zeros(n,1) ;
for k = 1:n
if isfield( AIS_SEC{1,k} , 'data')
npts(k) = size( AIS_SEC{1,k}.data , 1 ) ;
else
EmptyRows(k) = true ;
end
end
% fprintf('Number of empty rows encountered: %u\n',sum(EmptyRows))
% The above allows us to preallocate the output matrix at the right size
cumpts = cumsum(npts) ;
totalNumberOfPoints = cumpts(end) ;
vararray = zeros(totalNumberOfPoints,3) ;
% first field to import
vararray( 1:cumpts(1) , : ) = AIS_SEC{1,1}.data ;
% now all the remaining ones
for k = 2:n
idx = (cumpts(k-1)+1):cumpts(k) ;
if ~isempty(idx)
vararray(idx,:) = AIS_SEC{1,k}.data ;
end
end
In this version there are 2 passes of the loop accross the structure. Counter intuitively this is done for better performances. The first pass is only to count the number of data points in each structure (and also to flag the empty ones). Thanks to the number returned by the first pass, we can preallocate the output matrix before the second pass and assign each structure data to the merged array in the right place without having to resize the output array at each iteration.
I have several datasets, called '51.raw' '52.raw'... until '69.raw' and after I run these datasets in my code the size of these datasets changes from 375x91x223 to sizes with varying y-dimensions (i.e. '51.raw' output: 375x45x223; '52.raw' output: 375x50x223, ... different with each dataset).
I want to later save the '.raw' file name with this information (i.e. '51_375x45x223.raw') and also want to use the new dataset size to later reshape the dataset within my code. I have attempted to do this but need help:
for k=51:69
data=reshape(data,[375 91 223]); % from earlier in the code after importing data
% then executes code with dimensions of 'data' chaging to 375x45x223, ...
length=size(data); dimensions.([num2str(k)]) = length; %save size in 'dimensions'.
path=['C:\Example\'];
name= sprintf('%d.raw',k);
write([path name], data);
% 'write' is a function to save the dat in specified path and name (value of k). I don't know how to add the size of the dataset to the name.
Also later I want to reshape the dataset 'data' for this iteration and do a reshape with the new y dimensions value.
i.e. data=reshape(data,[375 new y-dimension 223]);
Your help will be appreciated. Thanks.
You can easily convert your dimensions to a string which will be saved as a file.
% Create a string of the form: dim1xdim2xdim3x...
dims = num2cell(size(data));
dimstr = sprintf('%dx', dims{:});
dimstr = dimstr(1:end-1);
% Append this to your "normal" filename
folder = 'C:\Example\';
filename = fullfile(folder, sprintf('%d_%s.raw', k, dimstr));
write(filename, data);
That being said, it is far better include this dimension information within the file itself rather than relying on the filename.
As a side note, avoid using names of internal functions as variable names such as length, and path. This can potentially result in strange and unexpected behavior in the future.
Update
If you need to parse the filename, you could use textscan to do that:
filename = '1_2x3x4.raw';
ndims = sum(filename == 'x') + 1;
fspec = repmat('%dx', [1 ndims]);
parts = textscan(filename, ['%d_', fspec(1:end-1)]);
% Then load your data
% Now reshape it based on the filename
data = reshape(data, parts{2:end});
How can I assign a name to a matrix being written from MATLAB to Excel using xlswrite?
This is equivalent to opening the xls sheet, selecting the data and right-click->define name. Here I can assign a string to the matrix. How can I do this from MATLAB?
Another application I use reads the data from the xls file based on named range.
You cannot do that from the function xlswrite as far as i know. You have to use an excel COM server from within Matlab.
The example below is derived from the Mathworks example, but adapted to your need.
Note that you won't need to use xlswrite this way, you input your data directly into the excel sheet.
%// First, open an Excel Server.
e = actxserver('Excel.Application');
%// Insert a new workbook.
eWorkbook = e.Workbooks.Add;
e.Visible = 1;
%// Make the first sheet active.
eSheets = e.ActiveWorkbook.Sheets;
eSheet1 = eSheets.get('Item', 1);
eSheet1.Activate;
%// Name the range you want:
e.Names.Add( 'TestRange' , '=Sheet1!$A$1:$B$2' ) ;
%// since you are here, directly put the MATLAB array into the range.
A = [1 2; 3 4];
eActivesheetRange = e.Activesheet.get('Range', 'TestRange');
eActivesheetRange.Value = A ;
%// Now, save the workbook.
eWorkbook.SaveAs('myfile.xls');
Edit to answer your question in comment:
If you want to programatically define the range address, you have to build the address as a string.
The row index is easy enough, but excel activeX in Matlab does not allow referencing cells/column/range by index, so there is a bit more fiddling to get the address of the column right (in case you overflow on the AAA... ranges.
The bit below should let you do that:
%// test data
A = rand(4,32);
%// input the upper left start of the range you want
sheetName = 'Sheet1' ;
startCol = 'Y' ;
startRow = 4 ;
%// calculate the end coordinates of the range
endRow = startRow + size(A,1)-1 ;
%// a bit more fiddling for the columns, as they could have many "characters"
cols = double(startCol)-64 ;
colIndex = sum(cols .* 26.^(length(cols)-1:-1:0)) ; %// index of starting column
endCol = eSheet1.Columns.Item(colIndex+size(A,2)-1).Address ;
endCol = endCol(2:strfind(endCol, ':')-1) ; %// column string reference
%// build range string in excel style
rngString = sprintf('=%s!$%s$%d:$%s$%d',sheetName,startCol,startRow,endCol,endRow) ;
myRange = eSheet1.Range(rngString) ; %// reference a range object
myRange.Value = A ; %// assign your values
%// Add the named range
e.Names.Add( 'TestRange' , myRange ) ;
The test data deliberately overflow from column Y to column BD to make sure that the address translation could handle (i) adding a character, and (ii) overflowing from Ax to By. This seems to work good, but if your data arrays are not that wide then you don't need to worry about it anyway.
I want to import a sequence of excel files with a large amount of data in them. The problem that I have is I want to process the data in each file at a time and store the output from this into a variable, but each time I try to process a different file the variable gets overwritten in the variable workspace. Is there anyway I could store these files and process each file at a time?
numFiles = 1;
range = 'A2:Q21';
sheet = 1;
myData = cell(1,numFiles); % Importing data from Excel
for fileNum = 1:numFiles
fileName = sprintf('myfile%02d.xlsx',fileNum);
myData{fileNum} = importfile3(fileName,sheet,range);
end
data = cell2mat(myData);
The actual data import is performed by importfile3 which is, for the most part, a wrapper for the xlsread function that returns a matrix corresponding to the specified range of excel data.
function data = importfile3(workbookFile, sheetName, range)
% If no sheet is specified, read first sheet
if nargin == 1 || isempty(sheetName)
sheetName = 1;
end
% If no range is specified, read all data
if nargin <= 2 || isempty(range)
range = '';
end
%% Import the data
[~, ~, raw] = xlsread(workbookFile, sheetName, range);
%% Replace non-numeric cells with 0.0
R = cellfun(#(x) ~isnumeric(x) || isnan(x),raw); % Find non-numeric cells
raw(R) = {0.0}; % Replace non-numeric cells
%% Create output variable
data = cell2mat(raw);
The issue that you are running in to is a result of cell2mat concatenating all of the data in your cells in to one large 2-dimensional matrix. If you were to import two excel files with 20 rows and 17 columns, each, this would result in a 2-dimensional matrix of size [20 x 34]. The doc for cell2mat has a nice visual describing this.
I see that your importfile3 function returns a matrix, and based on your use of cell2mat in your final line of code, it looks like you would like to have your final result be in the form of a matrix. So I think the easiest way to go about this is to just bypass the intermediate myData cell array.
In the example code below, the resulting data is a 3-dimensional matrix. The 1st dimension indicates row number, 2nd dimension is column number, and 3rd dimension is file number. Cell arrays are very useful for "jagged" data, but based on the code you provided, each excel data set that you import will have the same number of rows and columns.
numFiles = 2;
range = 'A2:Q21';
sheet = 1;
% Number of rows and cols known before data import
numRows = 20;
numCols = 17;
data = zeros(numRows,numCols,numFiles);
for fileNum = 1:numFiles
fileName = sprintf('myfile%02d.xlsx',fileNum);
data(:,:,fileNum) = importfile3(fileName,sheet,range);
end
Accessing this data is now very straight-forward.
data(:,:,1) returns the data imported from your first excel file.
data(:,:,2) returns the data imported from your second excel file.
etc.