Remove '#NA' from Matlab Table - matlab

I have the following code to clean table data of error terms:
errorTerms={'#NA', '#NA', 'ActiveX VT_ERROR: '};
inputData=readtable(inputFile,'TreatAsEmpty',errorTerms);
However '#NA' terms remain.
I can get rid of them in this way:
inputData.GICS1=strrep(inputData.GICS1,'#NA','NaN');
But this requires several independent loops as I have many tables of different sizes.
Is there a more elegant way to import this data as tables? Or clean it?
The data looks like this:
Id Avg GICS1
a 3.0 #NA
b 5.6 Consumer Staples
c 4.8 Materials
d 3.1 Health Care
e 1.6 Energy
f 9.3 #NA
g 8.5 Industrials
h 7.0 Consumer Discretionary

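You can loop over your table columns and apply a regular expression to each one, using regexprep and your errorTerms array. A varfun variant, which only works when every column holds string data, is left commented out at the end: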
inputData = readtable('test.xlsx');
errorTerms = {'#NA', '#NA', 'ActiveX VT_ERROR: '};
expression = sprintf('(%s)', strjoin(errorTerms, '|'));
% Explicit loop
varnames = inputData.Properties.VariableNames;
for ii = 1:length(varnames)
try
inputData.(varnames{ii}) = regexprep(inputData.(varnames{ii}), expression, 'NaN');
catch err
switch err.identifier
case 'MATLAB:UndefinedFunction'
% Do nothing, skip loop iteration
otherwise
rethrow(err)
end
end
end
% % Only works for string data
% varnames = inputData.Properties.VariableNames;
% inputData = varfun(@(x) regexprep(x, expression, 'NaN'), inputData);
% inputData.Properties.VariableNames = varnames; % Variable names overwritten by varfun, set them back
Edit: I have added a try/catch block to account for mixed data types in your columns. One caveat: this is a fairly greedy implementation, since it skips any column that raises 'MATLAB:UndefinedFunction'. A more robust method would be to compare the error message to make sure regexprep is what is causing the issue, but I'm lazy.
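A variant without try/catch is to test each column's type before calling regexprep. The sketch below assumes the text columns are imported as cell arrays of character vectors; the sample table is made up to mirror the data in the question, and the duplicate '#NA' entry is dropped from errorTerms.

```matlab
% Hypothetical sample table standing in for the imported data.
Id    = {'a'; 'b'; 'c'};
Avg   = [3.0; 5.6; 4.8];
GICS1 = {'#NA'; 'Consumer Staples'; 'Materials'};
inputData = table(Id, Avg, GICS1);

errorTerms = {'#NA', 'ActiveX VT_ERROR: '};
% Escape regex metacharacters so each term is matched literally.
escaped    = cellfun(@(t) regexptranslate('escape', t), errorTerms, ...
                     'UniformOutput', false);
expression = strjoin(escaped, '|');

varnames = inputData.Properties.VariableNames;
for ii = 1:numel(varnames)
    column = inputData.(varnames{ii});
    if iscellstr(column)   % numeric columns are left untouched
        inputData.(varnames{ii}) = regexprep(column, expression, 'NaN');
    end
end
```

Because the type check is explicit, an unrelated error inside the loop is no longer swallowed by the catch branch.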

Related

Matlab: read multiple files

My MATLAB script reads several wav files contained in a folder.
Each signal that is read is stored in the cell array "mat", and each signal is then copied into its own array. For example,
with 3 wav files, I read the files and the signals are saved in arrays "a", "b" and "c".
I want to apply another function that takes as input each signal (a, b and c) and the name of the corresponding
file.
dirMask = '\myfolder\*.wav';
fileRoot = fileparts(dirMask);
Files=dir(dirMask);
N = natsortfiles({Files.name});
C = cell(size(N));
D = cell(size(N));
for k = 1:numel(N)
str =fullfile(fileRoot, Files(k).name);
[C{k},D{k}] = audioread(str);
mat = [C(:)];
fs = [D(:)];
a=mat{1};
b=mat{2};
c=mat{3};
myfunction(a,Files(1).name);
myfunction(b,Files(2).name);
myfunction(c,Files(3).name);
end
My script doesn't work because myfunction considers only the last wav file contained in the folder, although
arrays a, b and c contain the three different signals.
If I read only one wav file, the script works well. What's wrong in the for loop?
As Cris noticed, there are some issues with how you structured your for loop. You are trying to use 'b' and 'c' before they have been given any data: on the first and second passes through the loop, only some of the files have been read, yet 'myfunction' is already called on all three signals. Assuming that you have a reason for structuring your program the way you do, the following should work. (I would rewrite the loop so that it does not use 'a', 'b' or 'c' at all, and just send 'myfunction' the appropriate index of 'mat'.)
dirMask = '\myfolder\*.wav';
fileRoot = fileparts(dirMask);
Files=dir(dirMask);
N = natsortfiles({Files.name});
C = cell(size(N));
D = cell(size(N));
a = {};
b = {};
c = {};
for k = 1:numel(N)
str =fullfile(fileRoot, Files(k).name);
[C{k},D{k}] = audioread(str);
mat = [C(:)];
fs = [D(:)];
a=mat{1};
b=mat{2};
c=mat{3};
end
myfunction(a,Files(1).name);
myfunction(b,Files(2).name);
myfunction(c,Files(3).name);
EDIT
I wanted to take a moment to clarify what I meant by saying that I would not use the a, b, or c variables. Please note that I could be missing something in what you were asking so I might be explaining things you already know.
In certain scenarios like this one it is possible to say exactly how many variables you will be using. In your case, you know that you have exactly 3 audio files to process, so variables a, b, and c are enough. Great, but what if you have to throw another audio file in? Now you need to go back and add a 'd' variable and another call to 'myfunction'. There is a better way, which not only reduces complexity but also makes the program work for any number of files. See the following code:
%same as your code
dirMask = '\myfolder\*.wav';
fileRoot = fileparts(dirMask);
Files = dir(dirMask);
%slight variable name change, k->idx, slightly more meaningful.
%also removed N, simplifying things a little.
for idx = 1:numel(Files)
%meaningful variable name change str -> filepath.
filepath = fullfile(fileRoot, Files(idx).name);
%It was unclear if you were actually using the Fs component returned
%from the 'audioread' call. I wanted to make sure that we kept access
%to that data. Note that we have removed 'mat' and 'fs'. We can hold
%all of that data inside one variable, 'audio', which simplifies the
%program.
[audio{idx}.('data'), audio{idx}.('rate')] = audioread(filepath);
%this function call sends exactly the same data that your version did
%but note that we have to unpack it a little by adding the .('data').
myfunction(audio{idx}.('data'), Files(idx).name);
end
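To try the per-file loop end to end without the original recordings, the sketch below first writes three short test tones to a scratch folder. The folder, the file names and the `nSamples` bookkeeping are assumptions made for illustration; in the real script the body of the loop would call myfunction instead.

```matlab
% Create a scratch folder with three short tones (hypothetical test data).
folder = tempname;
mkdir(folder);
fs = 8000;                          % sample rate in Hz
t  = (0:fs-1)' / fs;                % one second of samples
for n = 1:3
    tone = 0.5 * sin(2*pi*220*n*t);
    audiowrite(fullfile(folder, sprintf('tone%d.wav', n)), tone, fs);
end

% Read every file and handle it inside the same loop iteration.
files    = dir(fullfile(folder, '*.wav'));
nSamples = zeros(numel(files), 1);
for idx = 1:numel(files)
    [signal, rate] = audioread(fullfile(folder, files(idx).name));
    % Stand-in for myfunction(signal, files(idx).name):
    nSamples(idx) = numel(signal);
    fprintf('%s: %d samples at %d Hz\n', files(idx).name, nSamples(idx), rate);
end
```

Each signal is used in the iteration that reads it, so nothing depends on how many files the folder holds.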

Make the basis of a function from nest loop outer components

I have a segment of code where a composition of nested loops needs to be run at various times; however, each time the operations within the nested loops are different. Is there a way to make the outer portion (loop composition) somehow a functional piece, so that the internal operations are variable. For example, below, two code blocks are shown which both use the same loop introduction, but have different purposes. According to the principle of DRY, how can I improve this, so as not to need to repeat myself each time a similar loop needs to be used?
% BLOCK 1
for a = 0:max(aVec)
for p = find(aVec'==a)
iDval = iDauVec{p};
switch numel(iDval)
case 2
r = rEqVec(iDval);
qVec(iDval(1)) = qVec(p) * (r(2)^0.5 / (r(1)^0.5 + r(2)^0.5));
qVec(iDval(2)) = qVec(p) - qVec(iDval(1));
case 1
qVec(iDval) = qVec(p);
end
end
end
% BLOCK 2
for gen = 0:max(genVec)-1
for p = find(genVec'==gen)
iDval = iDauVec{p};
QinitVec(iDval) = QinitVec(p)/numel(iDval);
end
end
You can write your loop structure as a function, which takes a function handle as one of its inputs. Within the loop structure, you can call this function to carry out your operation.
It looks as if the code inside the loop needs the values of p and iDval, and needs to assign to different elements of a vector variable in the workspace. In that case a suitable function definition might be something like this:
function vec = applyFunctionInLoop(aVec, vec, iDauVec, funcToApply)
for a = 0:max(aVec)
for p = find(aVec'==a)
iDval = iDauVec{p};
vec = funcToApply(vec, iDval, p);
end
end
end
You would need to put the code for each different operation you want to carry out in this way into a function with suitable input and output arguments:
function qVec = myFunc1(qVec, iDval, p)
switch numel(iDval)
case 2
r = rEqVec(iDval); % see note
qVec(iDval(1)) = qVec(p) * (r(2)^0.5 / (r(1)^0.5 + r(2)^0.5));
qVec(iDval(2)) = qVec(p) - qVec(iDval(1));
case 1
qVec(iDval) = qVec(p);
end
end
function v = myFunc2(v, ix, q)
v(ix) = v(q)/numel(ix);
end
Now you can use your loop structure to apply each function:
qVec = applyFunctionInLoop(aVec, qVec, iDauVec, @myFunc1);
QinitVec = applyFunctionInLoop(aVec, QinitVec, iDauVec, @myFunc2);
and so on.
In most of the answer I've kept to the same variable names you used in your question, but in the definition of myFunc2 I've changed the names to emphasise that these variables are local to the function definition - the function is not operating on the variables you passed in to it, but on the values of those variables, which is why we have to pass the final value of the vector out again.
Note that if you want to use the values of other variables in your functions, such as rEqVec in myFunc1, you need to think about whether those variables will be available in the function's workspace. I recommend reading these help pages on the Mathworks site:
Share Data Between Workspaces
Dynamic Function Creation with Anonymous and Nested Functions
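One way to make rEqVec available is to capture it in an anonymous function when the handle is created. The sketch below is a minimal example with made-up data; `splitFlow` and `splitByR` are hypothetical names, and local functions at the end of a script file require MATLAB R2016b or later.

```matlab
% Hypothetical data: element 1 feeds elements 2 and 3.
aVec    = [0; 1; 1];          % column vector, so aVec' is a row
iDauVec = {[2 3], [], []};
rEqVec  = [1 4 9];
qVec    = [10 0 0];

% The anonymous function captures the current value of rEqVec.
splitByR = @(v, iDval, p) splitFlow(v, iDval, p, rEqVec);
qVec = applyFunctionInLoop(aVec, qVec, iDauVec, splitByR);
disp(qVec)

function vec = applyFunctionInLoop(aVec, vec, iDauVec, funcToApply)
    % Same loop structure as in the answer above.
    for a = 0:max(aVec)
        for p = find(aVec' == a)
            iDval = iDauVec{p};
            vec = funcToApply(vec, iDval, p);
        end
    end
end

function v = splitFlow(v, iDval, p, rEqVec)
    % Same operation as myFunc1, with rEqVec passed in explicitly.
    switch numel(iDval)
        case 2
            r = rEqVec(iDval);
            v(iDval(1)) = v(p) * (r(2)^0.5 / (r(1)^0.5 + r(2)^0.5));
            v(iDval(2)) = v(p) - v(iDval(1));
        case 1
            v(iDval) = v(p);
    end
end
```

With this data, element 1 holds 10 and is split between elements 2 and 3 in the ratio 3:2, so qVec ends up as approximately [10 6 4].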

Dynamically check for existence of structure field name with hierarchy

As a follow-up to my previous question about how to assign fields to a structure variable with a dynamic hierarchy, I would now like to be able to query those fields with isfield. However, isfield will only take one argument, not a list as with setfield.
To summarize my problem:
I have a function that organizes data into a structure variable. Depending on certain flags, the data is saved into the substructures with a different number of levels.
For instance, the accepted answer to my previous question has me doing this to build my structure:
foo = struct();
% Pick one...
true_false_statement = true;
% true_false_statement = false;
if true_false_statement
extra_level = {};
else
extra_level = {'baz'};
end
foo = setfield(foo, extra_level{:}, 'bar1', 1);
which gives me foo.bar1 = 1 if true_false_statement is true, and foo.baz.bar1 = 1 otherwise.
Now I want to test for the existence of the field (for instance to pre-allocate an array). If I do this:
if ~isfield(foo, extra_level{:}, 'bar1')
foo = setfield(foo, extra_level{:}, 'bar1', zeros(1,100));
end
I get an error because isfield will only accept two arguments.
The best I've been able to come up with is to write a separate function with a try...catch block.
function tf = isfield_dyn(structure_variable, intervening_levels, field)
try
getfield(structure_variable, intervening_levels{:}, field);
tf = true;
catch err
if strcmpi(err.identifier, 'MATLAB:nonExistentField')
tf = false;
else
rethrow(err);
end
end
As mentioned below in the comments, this is a hacky hack way to do this, and it doesn't even work all that well.
Is there a more elegant built-in way to do this, or some other more robust way to write a custom function to do this?
You might find the private utility functions getsubfield, setsubfield, rmsubfield, and issubfield from the FieldTrip toolbox very handy. From the documentation of getsubfield:
% GETSUBFIELD returns a field from a structure just like the standard
% GETFIELD function, except that you can also specify nested fields
% using a '.' in the fieldname. The nesting can be arbitrary deep.
%
% Use as
% f = getsubfield(s, 'fieldname')
% or as
% f = getsubfield(s, 'fieldname.subfieldname')
%
% See also GETFIELD, ISSUBFIELD, SETSUBFIELD
I am somewhat confused because
isfield(foo, 'bar1')
isfield(foo, 'baz')
seem to work just fine on your example struct.
Of course, if you want to test more fields, just write a loop over those fieldnames and test them one by one. That may not look vectorized, but is definitely better than abusing a try-catch block to guide your flow.
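The loop suggested above could look like the following sketch. `hasNestedField` is a hypothetical helper name, and the field path is assumed to be a cell array of names with the outermost level first, matching the extra_level convention from the question. Local functions in a script file need MATLAB R2016b or later; otherwise save the function in its own file.

```matlab
foo = struct('baz', struct('bar1', 1));

extra_level = {'baz'};
tfNested = hasNestedField(foo, [extra_level, {'bar1'}]);   % foo.baz.bar1
extra_level = {};
tfFlat   = hasNestedField(foo, [extra_level, {'bar1'}]);   % foo.bar1

function tf = hasNestedField(s, fieldPath)
    % Walk down the structure one level at a time with isfield.
    tf = true;
    for k = 1:numel(fieldPath)
        if isstruct(s) && isscalar(s) && isfield(s, fieldPath{k})
            s = s.(fieldPath{k});
        else
            tf = false;
            return
        end
    end
end
```

Here tfNested is true and tfFlat is false, and no error handling is involved in deciding either result.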

Workaround for dynamically generating a structure - Matlab/Octave issues

I have many, many sets of data in .csv format that I've organized by a file name standard so I can use regular expressions for the second time ever. I have, however, run into a slight problem. My data files are titled things like, "2012001_C335_2000MHZ_P_1111.CSV". There are four years of interest, two frequencies, and four different C335-style labels to describe locations. I have a significant amount of data processing to do on each of these files, so I'd like to read them all into one giant structure and then do my processing on different parts of it. I'm writing:
for ix_id = 1:length(ids)
for ix_year = 1:2:length(ids_years{ix_id})
for ix_frq = 1:length(frqs)
st = [ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids_frqs{ix_id}{ix_frq} '_P_1111.CSV'];
data.(ids_frqs{ix_id}{ix_frq}).(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}]) =...
dlmread(st);
end
end
end
All ids variables are 1x4 cell arrays where each cell contains strings.
This produces the errors:
"Error: a cs-list cannot be further indexed"
and
"Error: invalid assignment to cs-list outside multiple assignment"
I did an internet search for these errors and found a few posts with dates ranging from 2010 to 2012, such as this one and this one, where the author suggests it's a problem with Octave itself. I can do a workaround that involves defining two separate structures by removing the innermost for loop over ix_frq and replacing the lines beginning with "st" and "data" with
data.1500.(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}]) = ...
dlmread([ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids{ix_id} '_1500MHZ_P_1111.CSV']);
data.2000.(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}]) = ...
dlmread([ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids{ix_id} '_2000MHZ_P_1111.CSV']);
so it seems that the trouble arises when I try to make a more nested structure. I'm wondering if this is unique to Octave or the same in Matlab, and also if there's a slicker workaround than defining two separate structures since I'd like this to be as portable as possible. If you have any insight as to the meaning of the error message, I'm interested in that too. Thanks!
EDIT: Here is the full script - now generates a few dummy .csv files. Runs on Octave v. 3.8
clear all
%this program tests the creation of various structures. The end goal is to have a structure of the format frequency.beamname.year(1) = matrix of the appropriate file
A = rand(3,2);
csvwrite('2009103_C115_1500MHZ.CSV',A)
csvwrite('2009103_C115_2000MHZ.CSV',A)
csvwrite('2010087_C115_1500MHZ.CSV',A)
csvwrite('2010087_C115_2000MHZ.CSV',A)
csvwrite('2009103_C335_1500MHZ.CSV',A)
csvwrite('2009103_C335_2000MHZ.CSV',A)
csvwrite('2010087_C335_1500MHZ.CSV',A)
csvwrite('2010087_C335_2000MHZ.CSV',A)
data = dir('*.CSV'); %imports all of the files of a directory
files = {data.name}; %cell array of filenames
nfiles = numel(files);
%find all the years
years = unique(cellfun(@(x)x{1},regexp(files,'\d{7}','match'),'UniformOutput',false));
%find all the beam names
ids = unique(cellfun(@(x)x{1},regexp(files,'([C-I]\d{3})|([C-I]\d{1}[C-I]\d{2})','match'),'UniformOutput',false));
%find all the frequencies
frqs = unique(cellfun(@(x)x{1},regexp(files,'\d{4}MHZ','match'),'UniformOutput',false));
%now, vectorize to cover all the beams
for id_ix = 1:length(ids)
expression_yrs = ['(\d{7})(?=_' ids{id_ix} ')'];
listl_yrs = regexp(files,expression_yrs,'match');
ids_years{id_ix} = unique(cellfun(@(x)x{1},listl_yrs(cellfun(@(x)~isempty(x),listl_yrs)),'UniformOutput',false)); %returns the years for data collected with both the 1500 and 2000 MHZ antennas along each of the beams
expression_frqs = ['(?<=' ids{id_ix} '_)(\d{4}MHZ)'];
listfrq = regexp(files,expression_frqs,'match'); %finds every frequency that was collected for C115, C335
ids_frqs{id_ix} = unique(cellfun(@(x)x{1},listfrq(cellfun(@(x)~isempty(x),listfrq)),'UniformOutput',false));
end
%% finally, dynamically generate a structure data.Beam.Year.Frequency
%this works
for ix_id = 1:length(ids)
for ix_year = 1:length(ids_years{ix_id})
data1500.(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}])=dlmread([ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids_frqs{1}{1} '.CSV']);
data2000.(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}])=dlmread([ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids_frqs{1}{2} '.CSV']);
end
end
%this doesn't work
for ix_id=1:length(ids)
for ix_year=1:length(ids_years{ix_id})
for ix_frq = 1:numel(frqs)
data.(['F' ids_frqs{ix_id}{ix_frq}]).(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}])=dlmread([ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids_frqs{ix_id}{ix_frq} '.CSV']);
end
end
end
Hopefully, that helps clarify the question - I am not sure of the etiquette here with posting edits and code.
The problem is that when you get to the for loop that is causing a problem, data already exists and is a struct array.
octave> data
data =
8x1 struct array containing the fields:
name
date
bytes
isdir
datenum
statinfo
When you select a field from a struct array you get a cs-list (comma-separated list), unless you also index which of the structs in the struct array you mean. See:
octave> data.name
ans = 2009103_C115_1500MHZ.CSV
ans = 2009103_C115_2000MHZ.CSV
ans = 2009103_C335_1500MHZ.CSV
ans = 2009103_C335_2000MHZ.CSV
ans = 2010087_C115_1500MHZ.CSV
ans = 2010087_C115_2000MHZ.CSV
ans = 2010087_C335_1500MHZ.CSV
ans = 2010087_C335_2000MHZ.CSV
octave> data(1).name
ans = 2009103_C115_1500MHZ.CSV
So when you do:
data.(...) = dlmread (...);
you don't get what you were expecting on the left-hand side; you get a cs-list instead. I'm guessing this is accidental, since at that point data only holds the directory listing with the file names, so simply create a new empty struct:
data = struct (); # this will clear your previous data
for ix_id=1:length(ids)
for ix_year=1:length(ids_years{ix_id})
for ix_frq = 1:numel(frqs)
data.(['F' ids_frqs{ix_id}{ix_frq}]).(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}])=dlmread([ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids_frqs{ix_id}{ix_frq} '.CSV']);
end
end
end
I would also recommend rethinking your current solution. This code looks overcomplicated to me.
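As one possible simplification, the year, beam and frequency can be taken from each file name with a single regexp call that returns tokens. This is a sketch only: the pattern follows the dummy file names from the full script (without the `_P_1111` suffix), `result` is a hypothetical variable name chosen so it cannot clash with the dir output, and the stored value is a placeholder for dlmread(files{k}).

```matlab
files = {'2009103_C115_1500MHZ.CSV', '2009103_C115_2000MHZ.CSV', ...
         '2010087_C335_1500MHZ.CSV'};

result = struct();
for k = 1:numel(files)
    % Tokens: 7-digit year/day, beam name, frequency.
    tok = regexp(files{k}, '^(\d{7})_([^_]+)_(\d{4}MHZ)\.CSV$', ...
                 'tokens', 'once');
    if isempty(tok)
        continue   % skip names that do not follow the convention
    end
    year = tok{1};
    beam = tok{2};
    frq  = tok{3};
    % Placeholder value; the real script would store dlmread(files{k}).
    result.(['F' frq]).(beam).(['Y' year]) = k;
end
```

Driving the loop from the file list means only combinations that exist on disk are read, so the separate ids_years and ids_frqs lookups are not needed.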

Using a string to refer to a structure array - matlab

I am trying to take the averages of a pretty large set of data, so I have created a function to do exactly that.
The data is stored in some struct1.struct2.data(:,column).
There are 4 struct1 and each of these has between 20 and 30 sub-struct2.
The data that I want to average is always stored in column 7, and I want to output the average of each struct2.data(:,column) into an N-by-2 array/double (column 1 of this output is a reference to each sub-struct2, column 2 is the average).
The only problem is, I can't find a way (after lots and lots of reading) to point at each structure properly. I am using a string to refer to the structures, but I get the error "Attempt to reference field of non-structure array". So clearly it doesn't like this. Here is what I used (excuse the inelegance):
function [avrg] = Takemean(prefix,numslits)
% place holder arrays
avs = [];
slits = [];
% iterate over the sub-struct (struct2)
for currslit=1:numslits
dataname = sprintf('%s_slit_%02d',prefix,currslit);
% slap the average and slit ID on the end
avs(end+1) = mean(prefix.dataname.data(:,7));
slits(end+1) = currslit;
end
% transpose the arrays
avs = avs';
slits = slits';
avrg = cat(2,slits,avs); % slap them together
It falls over at the line avs(end+1) = mean(prefix.dataname.data(:,7)); because, as you can see, prefix and dataname are strings. After hunting around I tried turning these strings into variables with genvarname(), but still no luck!
I have spent hours on what should have been 5min of coding. :'(
Edit: Oh prefix is a string e.g. 'Hs' and the structure of the structures (lol) is e.g. Hs.Hs_slit_XX.data() where XX is e.g. 01,02,...27
Edit: If I just run mean(Hs.Hs_slit_01.data(:,7)) it works fine... but then I can't iterate over all of the _slit_XX
If you simply want to iterate over the fields with the name pattern <something>_slit_<something>, you need neither the prefix string nor numslits for this. Pass the actual structure to your function, extract the desired fields and then iterate over them:
function avrg = Takemean(s)
%// Extract only the "_slit_" fields
names = fieldnames(s);
names = names(~cellfun('isempty', strfind(names, '_slit_')));
%// Iterate over fields and calculate means
avrg = zeros(numel(names), 2);
for k = 1:numel(names)
avrg(k, :) = [k, mean(s.(names{k}).data(:, 7))];
end
This method uses dynamic field referencing to access fields in structs using strings.
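Dynamic field referencing also works with names built by sprintf, as in the question. The following is a small sketch with made-up data, assuming the outer structure Hs itself is available as a variable rather than only its name:

```matlab
% Hypothetical data shaped like Hs.Hs_slit_XX.data in the question.
Hs.Hs_slit_01.data = ones(3, 7);
Hs.Hs_slit_02.data = 2 * ones(3, 7);

prefix   = 'Hs';
numslits = 2;
avrg = zeros(numslits, 2);
for currslit = 1:numslits
    dataname = sprintf('%s_slit_%02d', prefix, currslit);
    % s.(name) looks up the field whose name is stored in the string.
    avrg(currslit, :) = [currslit, mean(Hs.(dataname).data(:, 7))];
end
```

Only the field name is taken from a string here; the structure is passed around as an ordinary variable, so no eval is needed.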
First of all, think twice before you use string construction to access variables.
If you really really need it, here is how it can be used:
a.b=123;
s1 = 'a';
s2 = 'b';
eval([s1 '.' s2])
In your case probably something like:
Hs.Hs_slit_01.data= rand(3,7);
avs = [];
dataname = 'Hs_slit_01';
prefix = 'Hs';
eval(['avs(end+1) = mean(' prefix '.' dataname '.data(:,7))'])