Matlab loop through functions using an array in a for loop - matlab

I am writing a code to do some very simple descriptive statistics, but I found myself being very repetitive with my syntax.
I know there's a way to shorten this code and make it more elegant and time efficient with something like a for-loop, but I am not quite keen enough in coding (yet) to know how to do this...
I have three variables, or groups (all data, condition 1, and condition 2). I also have 8 MATLAB functions that I need to apply to each of the three groups (e.g. mean, median). I am saving all of the results in a table where each column corresponds to one of the functions (e.g. mean) and each row is that function applied to the corresponding group (e.g. (1,1) is the mean of 'all data', (2,1) is the mean of 'cond 1', and (3,1) is the mean of 'cond 2'). It is important to preserve this structure, as I am outputting to a csv file that I can open in Excel. The columns, again, are labeled according to the function, and the rows are ordered as 1) all data, 2) cond 1, and 3) cond 2.
The data I am working with is in the second column of these matrices, by the way.
So here is the tedious way I am accomplishing this:
x = cell(3,8);
x{1,1} = mean(alldata(:,2));
x{2,1} = mean(cond1data(:,2));
x{3,1} = mean(cond2data(:,2));
x{1,2} = median(alldata(:,2));
x{2,2} = median(cond1data(:,2));
x{3,2} = median(cond2data(:,2));
x{1,3} = std(alldata(:,2));
x{2,3} = std(cond1data(:,2));
x{3,3} = std(cond2data(:,2));
x{1,4} = var(alldata(:,2)); % variance
x{2,4} = var(cond1data(:,2));
x{3,4} = var(cond2data(:,2));
x{1,5} = range(alldata(:,2));
x{2,5} = range(cond1data(:,2));
x{3,5} = range(cond2data(:,2));
x{1,6} = iqr(alldata(:,2)); % inter quartile range
x{2,6} = iqr(cond1data(:,2));
x{3,6} = iqr(cond2data(:,2));
x{1,7} = skewness(alldata(:,2));
x{2,7} = skewness(cond1data(:,2));
x{3,7} = skewness(cond2data(:,2));
x{1,8} = kurtosis(alldata(:,2));
x{2,8} = kurtosis(cond1data(:,2));
x{3,8} = kurtosis(cond2data(:,2));
% write output to .csv file using cell to table conversion
T = cell2table(x, 'VariableNames',{'mean', 'median', 'stddev', 'variance', 'range', 'IQR', 'skewness', 'kurtosis'});
writetable(T,'descriptivestats.csv')
I know there is a way to loop through this stuff and get the same output in a much shorter code. I tried to write a for-loop but I am just confusing myself and not sure how to do this. I'll include it anyway so maybe you can get an idea of what I'm trying to do.
x = cell(3,8);
data = [alldata, cond2data, cond2data];
dfunction = ['mean', 'median', 'std', 'var', 'range', 'iqr', 'skewness', 'kurtosis'];
for i = 1:8,
for y = 1:3
x{y,i} = dfucntion(i)(data(1)(:,2));
x{y+1,i} = dfunction(i)(data(2)(:,2));
x{y+2,i} = dfunction(i)(data(3)(:,2));
end
end
T = cell2table(x, 'VariableNames',{'mean', 'median', 'stddev', 'variance', 'range', 'IQR', 'skewness', 'kurtosis'});
writetable(T,'descriptivestats.csv')
Any ideas on how to make this work??

You want to use a cell array of function handles. The easiest way to create a function handle is with the @ operator, as in
dfunctions = {@mean, @median, @std, @var, @range, @iqr, @skewness, @kurtosis};
Also, you want to combine your three data variables into one variable, to make it easier to iterate over them. There are two choices I can see. If your data variables all have the same M-by-2 dimensions, you could concatenate them into an M-by-2-by-3 three-dimensional array. You could do that with
data = cat(3, alldata, cond1data, cond2data);
The indexing expression into data that retrieves the values you want would be data(:, 2, y). That said, I think this approach would have to copy a lot of data around and probably isn't the best for performance. The other way to combine the data is in a 1-by-3 cell array, like this:
data = {alldata, cond1data, cond2data};
The indexing expression into data that retrieves the values you want in this case would be data{y}(:, 2).
Since you are looping from y == 1 to y == 3, you only need one line in your inner loop body, not three.
for y = 1:3
x{y, i} = dfunctions{i}(data{y}(:,2));
end
Finally, to get the cell array of strings containing function names to pass to cell2table, you can use cellfun to apply func2str to each element of dfunctions:
funcnames = cellfun(@func2str, dfunctions, 'UniformOutput', false);
The final version looks like this:
dfunctions = {@mean, @median, @std, @var, @range, @iqr, @skewness, @kurtosis};
data = {alldata, cond1data, cond2data};
x = cell(length(data), length(dfunctions));
for i = 1:length(dfunctions)
    for y = 1:length(data)
        x{y, i} = dfunctions{i}(data{y}(:,2));
    end
end
funcnames = cellfun(@func2str, dfunctions, 'UniformOutput', false);
T = cell2table(x, 'VariableNames', funcnames);
writetable(T,'descriptivestats.csv');
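As a variation on the loop above, here is a minimal sketch that replaces the inner loop with one cellfun call per group and also labels the rows. It is only an illustration under stated assumptions: the three data variables are synthetic stand-ins, only base MATLAB functions are used so that it runs without the Statistics Toolbox, and the row labels in groupnames are hypothetical.

```matlab
% Synthetic stand-ins for the three groups (assumption: real data is M-by-2)
rng(0);
alldata   = [(1:10).', randn(10, 1)];
cond1data = alldata(1:5, :);
cond2data = alldata(6:10, :);

dfunctions = {@mean, @median, @std, @var};   % base MATLAB functions only
data       = {alldata, cond1data, cond2data};
groupnames = {'all', 'cond1', 'cond2'};      % hypothetical row labels

% Each cellfun call applies every handle to one group's second column
% and returns a 1-by-4 row of statistics
x = zeros(numel(data), numel(dfunctions));
for y = 1:numel(data)
    col = data{y}(:, 2);
    x(y, :) = cellfun(@(f) f(col), dfunctions);
end

funcnames = cellfun(@func2str, dfunctions, 'UniformOutput', false);
T = array2table(x, 'VariableNames', funcnames, 'RowNames', groupnames);
writetable(T, 'descriptivestats.csv', 'WriteRowNames', true);
```

Because every statistic here is a scalar, a numeric matrix with array2table is enough; the cell array in the answer above is the safer choice if some functions might return non-scalar results.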

You can create a cell array of function handles using str2func:
function_string = {'mean', 'median', 'std', 'var', 'range', 'iqr', 'skewness', 'kurtosis'};
dfunction = cell(size(function_string));
for ii = 1:length(function_string)
    dfunction{ii} = str2func(function_string{ii});
end
Then you can apply the handles to your data, stored in a cell array so that each group can be indexed with braces:
data = {alldata, cond1data, cond2data};
x = cell(length(data), length(dfunction));
for ii = 1:length(dfunction)
    for y = 1:length(data)
        x{y,ii} = dfunction{ii}(data{y}(:,2));
    end
end
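The conversion from names to handles can also be written without an explicit loop. This is a small sketch, restricted to base MATLAB function names so it runs without extra toolboxes; the sample vector is made up for illustration.

```matlab
function_string = {'mean', 'median', 'std', 'var'};

% str2func is applied to each name; 'UniformOutput', false keeps the
% resulting function handles in a cell array
dfunction = cellfun(@str2func, function_string, 'UniformOutput', false);

sample = [1 2 3 4 100];               % made-up data for illustration
m  = dfunction{1}(sample);            % calls mean(sample)
md = dfunction{2}(sample);            % calls median(sample)
```

Keeping the names as strings has the side benefit that function_string can be passed directly as the 'VariableNames' argument of cell2table.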

Related

All possible combinations of many parameters MATLAB

I have a list of parameters and I need to evaluate my method over this list. Right now, I am doing it this way
% Parameters
params.corrAs = {'objective', 'constraint'};
params.size = {'small', 'medium', 'large'};
params.density = {'uniform', 'non-uniform'};
params.k = {3,4,5,6};
params.constraintP = {'identity', 'none'};
params.Npoints_perJ = {2, 3};
params.sampling = {'hks', 'fps'};
% Select the current parameter
for corrAs_iter = params.corrAs
for size_iter = params.size
for density_iter = params.density
for k_iter = params.k
for constraintP_iter = params.constraintP
for Npoints_perJ_iter = params.Npoints_perJ
for sampling_iter = params.sampling
currentParam.corrAs = corrAs_iter;
currentParam.size = size_iter;
currentParam.density = density_iter;
currentParam.k = k_iter;
currentParam.constraintP = constraintP_iter;
currentParam.Npoints_perJ = Npoints_perJ_iter;
currentParam.sampling = sampling_iter;
evaluateMethod(currentParam);
end
end
end
end
end
end
end
I know it looks ugly, and if I want to add a new type of parameter, I have to write another for loop. Is there any way I can vectorize this, or use 2 for loops instead of so many?
I tried the following, but it doesn't result in what I need.
for i = 1:numel(fields)
% if isempty(params.(fields{i}))
param.(fields{i}) = params.(fields{i})(1);
params.(fields{i})(1) = [];
end
What you need is all combinations of your input parameters. Unfortunately, if you store every combination explicitly, the storage requirements grow quickly as you add more parameters (and you'll have to use a large indexing matrix).
Instead, here is an idea which uses the linear indices of a (never created) n1*n2*...*nm array, where ni is the number of elements in field i, for m fields.
It is flexible enough to cope with any number of fields being added to params. It is not performance tested, and as with any "all combinations" operation you should be wary of how quickly the number of iterations, prod(sz), grows as you add more fields to params.
The indexing itself should be cheap, but the overall performance will depend entirely on which operations you do in the loop.
% Add parameters here
params.corrAs = {'objective', 'constraint'};
params.size = {'small', 'medium', 'large'};
params.density = {'uniform', 'non-uniform'};
% Setup
f = fieldnames( params );
nf = numel(f);
sz = NaN( nf, 1 );
% Loop over all parameters to get sizes
for jj = 1:nf
sz(jj) = numel( params.(f{jj}) );
end
% Loop for every combination of parameters
idx = cell(1,nf);
for ii = 1:prod(sz)
% Use ind2sub to switch from a linear index to the combination set
[idx{:}] = ind2sub( sz, ii );
% Create currentParam from the combination indices
currentParam = struct();
for jj = 1:nf
currentParam.(f{jj}) = params.(f{jj}){idx{jj}};
end
% Do something with currentParam here
% ...
end
Asides:
I'm using dynamic field name references for indexing the fields
I'm passing multiple outputs into a cell array from ind2sub, so you can handle a variable number of field names when ind2sub has one output for each dimension (or field in this use-case).
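To make the ind2sub step concrete, here is a small sketch that lists the subscripts produced for every linear index of a 2-by-3-by-2 grid. The sizes are chosen only for illustration; combos is a hypothetical helper variable that the answer's code never needs to build.

```matlab
sz  = [2 3 2];                  % number of options per parameter
n   = prod(sz);                 % total number of combinations (12)
idx = cell(1, numel(sz));

combos = zeros(n, numel(sz));   % one row of subscripts per combination
for ii = 1:n
    % One output per dimension, captured in the cell array
    [idx{:}] = ind2sub(sz, ii);
    combos(ii, :) = [idx{:}];
end

disp(combos(1:4, :));           % the first subscript varies fastest
```

The first subscript cycles fastest, so the first field in params changes on every iteration and the last field changes least often.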
Here is a vectorized solution:
names = fieldnames(params).';
paramGrid = cell(1,numel(names));
cp = struct2cell(params);
[paramGrid{:}] = ndgrid(cp{:});
ng = [names;paramGrid];
st = struct(ng{:});
for param = st(:).'
currentParam = param;
end
Instead of nested loops, we can use ndgrid to create the Cartesian product of the cell entries, so all combinations are found without writing one loop per parameter.
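A closely related sketch applies ndgrid to numeric index vectors and then looks the values up, which relies only on ndgrid's numeric behaviour. This is an illustrative variant, not the code above: the two-field params struct is a cut-down example, and ranges, g, and combos are hypothetical names.

```matlab
params = struct();
params.corrAs = {'objective', 'constraint'};
params.k      = {3, 4, 5};

f = fieldnames(params);

% One index vector per field: 1:2 for corrAs, 1:3 for k
ranges = cellfun(@(name) 1:numel(params.(name)), f, 'UniformOutput', false);

% ndgrid expands the index vectors into the full Cartesian product
g = cell(1, numel(f));
[g{:}] = ndgrid(ranges{:});

% Row r of combos holds the r-th combination of parameter values
nCombos = numel(g{1});
combos  = cell(nCombos, numel(f));
for jj = 1:numel(f)
    vals = params.(f{jj});
    combos(:, jj) = reshape(vals(g{jj}(:)), [], 1);
end
```

The grids hold every combination in memory at once, so this trades memory for brevity as the number of parameters grows.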

Sorting several functions in one function

I have a function where I'm receiving input data and other data from four random sources. This function must be repeated for 12 times and 12 should be set so that. This function should also be repeated 10 times. Is there a more compact way to perform what I'm doing below?
for ii=1:10
Percent=0.7;
num_points1 = size(X_1,1);
split_point1 = round(num_points1*Percent);
sequence1 = randperm(num_points1);
X1_train{ii} = X_1(sequence1(1:split_point1),:);
Y1_train{ii} = Y_1(sequence1(1:split_point1));
X1_test{ii} = X_1(sequence1(split_point1+1:end),:);
Y1_test{ii}= Y_1(sequence1(split_point1+1:end));
num_points2 = size(X_2,1);
split_point2 = round(num_points2*Percent);
sequence2 = randperm(num_points2);
X2_train{ii} = X_2(sequence2(1:split_point2),:);
Y2_train{ii} = Y_2(sequence2(1:split_point2));
X2_test{ii} = X_2(sequence2(split_point2+1:end),:);
Y2_test{ii}= Y_2(sequence2(split_point2+1:end));
.
.
.
.
num_points12 = size(X_12,1);
split_point12 = round(num_points12*Percent);
sequence12 = randperm(num_points12);
X12_train{ii} = X_12(sequence12(1:split_point12),:);
Y12_train{ii} = Y_12(sequence12(1:split_point12));
X12_test{ii} = X_12(sequence12(split_point12+1:end),:);
Y12_test{ii}= Y_12(sequence12(split_point12+1:end));
end
The biggest problem you have currently is that you have 12 separate variables to do 12 related operations. Don't do that. Consolidate all of the variables into one container then iterate over the container.
I have the following suggestions for you:
1. Combine X_1, X_2, ... X_12 into one container. A cell array or structure may be prudent to use here. I'm going to use cell arrays in this case as your code currently employs them and it's probably the easiest thing for you to transition to.
2. Create four master cell arrays for the training and test set data and labels. Within each master cell array are nested cell arrays that contain each trial.
3. Loop over the cell array created in step #1 and assign the results to each of the four master cell arrays.
Therefore, something like this comes to mind:
X = {X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8, X_9, X_10, X_11, X_12};
Y = {Y_1, Y_2, Y_3, Y_4, Y_5, Y_6, Y_7, Y_8, Y_9, Y_10, Y_11, Y_12};
N = numel(X);
num_iterations = 10;
X_train = cell(1, num_iterations);
Y_train = cell(1, num_iterations);
X_test = cell(1, num_iterations);
Y_test = cell(1, num_iterations);
Percent = 0.7;
for ii = 1 : num_iterations
for jj = 1 : N
vals = X{jj};
labels = Y{jj};
num_points = size(vals,1);
split_point = round(num_points*Percent);
sequence = randperm(num_points);
X_train{ii}{jj} = vals(sequence(1:split_point),:);
Y_train{ii}{jj} = labels(sequence(1:split_point));
X_test{ii}{jj} = vals(sequence(split_point+1:end),:);
Y_test{ii}{jj} = labels(sequence(split_point+1:end));
end
end
As such, to access the training data for a particular iteration, you would do:
data = X_train{ii};
ii is the iteration you want to access. data would now be a cell array, so if you want to access the training data for a particular group, you would now do:
group = data{jj};
jj is the group you want to access. However, you can combine this into one step by:
group = X_train{ii}{jj};
You'll see this syntax in various parts of the code I wrote above. You'd do the same for the other data in your code (X_test, Y_train, Y_test).
I think you'll agree that this is more compact and to the point.
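To check what a single pass of the inner loop produces, here is a minimal sketch of one split on synthetic data. The two random matrices and the integer labels are hypothetical stand-ins for X_1, X_2 and Y_1, Y_2, and the _jj suffixes are names invented for this illustration.

```matlab
rng(1);                                  % fixed seed so the sketch is repeatable
X = {rand(10, 3), rand(20, 3)};          % hypothetical stand-ins for X_1, X_2
Y = {(1:10).', (1:20).'};                % hypothetical labels
Percent = 0.7;

jj = 2;                                  % inspect the second group
num_points  = size(X{jj}, 1);
split_point = round(num_points * Percent);
sequence    = randperm(num_points);

% The first 70% of the shuffled indices train, the rest test
train_idx = sequence(1:split_point);
test_idx  = sequence(split_point+1:end);

X_train_jj = X{jj}(train_idx, :);
Y_train_jj = Y{jj}(train_idx);
X_test_jj  = X{jj}(test_idx, :);
Y_test_jj  = Y{jj}(test_idx);
```

Since randperm returns each index exactly once, the training and test rows never overlap and together cover the whole group.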

Looping a process, outputting numerically labelled variables each time

I have about 50 different arrays and I want to perform the following operation on all of them:
data1(isnan(data1)) = 0;
coldata1 = nonzeros(data1);
avgdata1 = mean(coldata1);
and so on for data2, data3 etc... the goal being to turn data1 into a vector without NaNs and then take a mean, saving the vector and the mean into coldata1 and avgdata1.
I'm looking for a way to automate this for all 50, rather than copy it 50 times and change the numbers... any ideas? I've been playing with eval but no luck so far. Also tried:
for y = 1:50
data(y)(isnan(data(y))) = 0;
coldata(y) = nonzeros(data(y));
avgdata(y) = mean(coldata(y));
end
You can do it with eval but really should not. Rather use a cell array as suggested here: Create variables with names from strings
i.e.
for y = 1:50
data{y}(isnan(data{y})) = 0;
coldata{y} = nonzeros(data{y});
avgdata{y} = mean(coldata{y});
end
Also read How can I create variables A1, A2,...,A10 in a loop? for alternative options.
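Once the arrays live in a cell array, the loop can also be collapsed with cellfun. This is a sketch under one loudly stated assumption: it drops NaNs with a logical mask instead of zeroing them and calling nonzeros, so genuine zeros in the data are kept, which differs from the original code if the data contains real zeros. The two sample arrays are made up.

```matlab
% Made-up stand-ins for data1, data2, ... collected in one cell array
data = {[1 NaN 3], [NaN 2; 4 6]};

% Keep only the non-NaN entries of each array (zeros are NOT removed here,
% unlike the nonzeros-based approach)
coldata = cellfun(@(d) d(~isnan(d)), data, 'UniformOutput', false);

% One mean per array, returned as an ordinary numeric vector
avgdata = cellfun(@mean, coldata);
```

Here avgdata(y) plays the role of the numbered variable avgdata1, avgdata2, and so on.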

How to use function return values as matrix rows?

I was trying to plot function return values, one based on another. My function definition is:
function [final_speed, voltage] = find_final_speed(simulink_output)
As you can see, it returns two variables. I need a matrix that looks like this:
final_speed_1 voltage_1
final_speed_2 voltage_2
final_speed_3 voltage_3
final_speed_4 voltage_4
final_speed_5 voltage_5
In the end, voltages should be plotted on X axis, speeds on Y axis.
I originally tried this:
speedpervoltage = [find_final_speed(DATA_1); find_final_speed(DATA_2); ... ];
But that would only result in this matrix, all voltage info gone:
final_speed_1
final_speed_2
...
After all google searches and attempts failed, I did this:
[s1 v1] = find_final_speed(DATA_1);
[s2 v2] = find_final_speed(DATA_2);
[s... v...] = find_final_speed(DATA_...);
speedpervoltage = [0 0;s1 v1;s2 v2;s... v....;];
% Just contains the figure call along with graph properties.
plot_speedpervoltage(speedpervoltage);
This is really not an optimal or practical solution. How can I do this more dynamically? Ideally, I'd like to have a function create_speedpervoltage which would take an array of data matrices as its argument:
plot_speedpervoltage(create_speedpervoltage([DATA_1 DATA_2 ...]));
If you know how many datasets you have, you can encapsulate everything in a for loop like this (the datasets go in a cell array so that each one can be retrieved whole with Data{i}):
Data = {DATA_1, DATA_2,....DATA_N} ;
outMat = [] ;
for i = 1 : numel(Data)
    [s, v] = find_final_speed(Data{i});
    outMat = [outMat ; s, v];
end
There is an easy way of doing this in MATLAB. This answer differs from User1551892's answer in that it preallocates the outputs rather than growing a variable on every iteration, which results in faster performance. The code is as follows.
% Preallocate the return vectors
n = numel( data );
final_speed = zeros(n,1);
voltage = zeros(n,1);
% Loop through each data point
for i = 1:n
    [final_speed(i),voltage(i)] = find_final_speed( data(i) );
end
This assumes that data is a vector in which each element is one input to find_final_speed, and that each call returns a scalar final speed and a scalar voltage.
EDIT:
Another option is arrayfun. Assuming your data is 1D, you can pass the function as a handle to arrayfun and replace both the preallocation and the 3-line loop with this single line. It is shorter, although it is not guaranteed to be faster than the preallocated loop.
[final_speed,voltage] = arrayfun( @find_final_speed, data );
Here is a solution with cellfun.
[s, v] = cellfun(@find_final_speed, [{DATA_1}, {DATA_2},... {DATA_N}]);
speedpervoltage = [s(:) v(:)];
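The two-output cellfun pattern can be tried without the real function. In this sketch find_final_speed is a hypothetical stand-in (an anonymous wrapper around max, which returns a value and its position), and the three datasets are made up; only the calling pattern matches the answer above.

```matlab
% Hypothetical stand-in with two outputs: the maximum and its position
find_final_speed = @(d) max(d(:));

% Made-up datasets in a cell array, one cell per dataset
datasets = {[1 5 2], [7 3], [4 4 9 1]};

% cellfun collects the first outputs in s and the second outputs in v
[s, v] = cellfun(find_final_speed, datasets);

% One row per dataset: [first_output second_output]
speedpervoltage = [s(:) v(:)];
```

This works as long as both outputs are scalars for every dataset; otherwise cellfun needs 'UniformOutput', false and returns cell arrays.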

Most efficient way to store numbers as strings inside a table

I want to store efficiently some numbers as strings (with different lengths) into a table. This is my code:
% Table with numbers
n = 5;
m = 5;
T_numb = array2table((rand(n,m)));
% I create a table with empty cells (to store strings)
T_string = array2table(cell(n,m));
for i = 1:height(T_numb)
for ii = 1:width(T_numb)
T_string{i,ii} = cellstr(num2str(T_numb{i,ii}, '%.2f'));
end
end
What could I do to improve it? Thank you.
I don't have access to the function cell2table right now, but using the undocumented function sprintfc might work well here (check here for details).
For instance:
%// 2D array
a = magic(5)
b = sprintfc('%0.2f',a)
generates a cell array like this:
b =
'17.00' '24.00' '1.00' '8.00' '15.00'
'23.00' '5.00' '7.00' '14.00' '16.00'
'4.00' '6.00' '13.00' '20.00' '22.00'
'10.00' '12.00' '19.00' '21.00' '3.00'
'11.00' '18.00' '25.00' '2.00' '9.00'
which you can convert to a table using cell2table.
So in 1 line:
YourTable = cell2table(sprintfc('%0.2f',a))
This seems to be quite fast -
T_string = cell2table(reshape(strtrim(cellstr(num2str(A(:),'%.2f'))),size(A)))
Or with regexprep to replace strtrim -
cell2table(reshape(regexprep(cellstr(num2str(A(:),'%.2f')),'\s*',''),size(A)))
Here, A is the 2D input numeric array.
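For comparison, here is a sketch that uses only documented functions (arrayfun with sprintf). It is offered as a readable baseline, with no claim about its speed relative to the approaches above; the 3-by-3 magic square is just sample input.

```matlab
A = magic(3);   % sample numeric input: [8 1 6; 3 5 7; 4 9 2]

% Format every element separately; the cell array keeps the shape of A
C = arrayfun(@(v) sprintf('%.2f', v), A, 'UniformOutput', false);

% Each column of the table holds character vectors of varying length
T_string = cell2table(C);
```

Because sprintf formats one value at a time, no padding is introduced and no strtrim or regexprep cleanup is needed.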