How can I optimize machine learning hyperparameters to be reused in multiple models? - matlab

I have a number of datasets, to each of which I want to fit a Gaussian process regression model. The default hyperparameters selected by fitrgp seem, subjectively, to produce less-than-ideal models. Enabling hyperparameter optimization tends to give a meaningful improvement, but it occasionally produces extreme, overfitted values, and it is computationally expensive enough that running a separate optimization for every model is impractical anyway.
Since fitrgp simply wraps bayesopt for its hyperparameter optimization, is it possible to call bayesopt directly to minimize some aggregate of the loss for multiple models (say, the mean) rather than the loss for one model at a time?
For example, if each dataset is contained in a cell array of tables tbls, I want to find a single value for sigma which can be imposed in calls to fitrgp for each table:
gprMdls = cellfun(@(tbl) {fitrgp(tbl,'ResponseVarName', 'Sigma',sigma)}, tbls);
Where numel(tbls) == 1 the process would be equivalent to:
gprMdl = fitrgp(tbls{1},'ResponseVarName', 'OptimizeHyperparameters','auto');
sigma = gprMdl.Sigma;
but this implementation doesn't naturally extend to optimizing a single Sigma value across multiple models.

I managed this in the end by directly intervening in the built-in optimization routines.
By placing a breakpoint at the start of bayesopt (via edit bayesopt) and calling fitrgp with a single input dataset, I was able to determine from the Function Call Stack that the objective function used by bayesopt is constructed with a call to classreg.learning.paramoptim.createObjFcn. I also captured and stored the remaining input arguments to bayesopt to ensure my function call would be exactly analogous to one constructed by fitrgp.
Placing a breakpoint at the start of classreg.learning.paramoptim.createObjFcn and making a fresh call to fitrgp, I was able to capture and store the input arguments to this function, so I could then create objective functions for different tables of predictors.
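For example, while paused at the createObjFcn breakpoint, the inputs can be saved to a MAT-file so they remain available once the debugger exits (a minimal sketch; the file name is arbitrary and the variable names are those visible in the createObjFcn workspace):
% Run from the debugger while stopped inside createObjFcn.
save('createObjFcnArgs.mat', 'BOInfo', 'FitFunctionArgs', 'Response', ...
    'ValidationMethod', 'ValidationVal', 'Repartition', 'Verbose');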
For my cell array of tables tbls, and all other variables kept as named in the captured createObjFcn scope:
objFcns = cell(size(tbls));
for ii = 1:numel(tbls)
    objFcns{ii} = classreg.learning.paramoptim.createObjFcn( ...
        BOInfo, FitFunctionArgs, tbls{ii}, Response, ...
        ValidationMethod, ValidationVal, Repartition, Verbose);
end
An overall objective function can then be constructed by taking the mean of the objective functions for each dataset:
objFcn = @(varargin) mean(cellfun(@(f) f(varargin{:}), objFcns));
I was then able to call bayesopt with this objFcn along with the remaining arguments captured from the original call. This produced a set of hyperparameters as required and they seem to perform well for all datasets.
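For illustration, a hedged sketch of that final call, where VariableDescriptions and bayesoptArgs are placeholder names for the remaining arguments captured at the bayesopt breakpoint, and assuming Sigma is among the optimized variables:
% Placeholder names: VariableDescriptions and bayesoptArgs stand in for the
% arguments captured from the original bayesopt call made by fitrgp.
results = bayesopt(objFcn, VariableDescriptions, bayesoptArgs{:});
sigma = results.XAtMinObjective.Sigma;   % best shared Sigma across all datasets
% Fit one model per dataset with the shared, optimized Sigma imposed.
gprMdls = cellfun(@(tbl) {fitrgp(tbl, 'ResponseVarName', 'Sigma', sigma)}, tbls);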

Related

MATLAB - Use a user-defined class method in parallel on GPU

Let's say I have a class named Stack with a method that takes some other properties of the Stack object (vectors and scalars) as input arguments. The input vectors and the output matrix change size depending on the object.
classdef Stack
    ...
    methods
        function out = some_method(scalar1,scalar2,...,vec1,vec2...)
            % out has a different number of columns depending on
            % the size of vec1, vec2, etc.
            ...
            out = ... % size (4,m), where m changes for each object
        end
    end
end
This specific method of Stack should be called thousands of times to be used in another script for some other purpose. Since it takes time to use this method serially, a parallel solution will save a lot of time for bigger calculations.
I attempted to use cellfun by passing the indices of Stack objects inside a function, hoping that I could run it on my GPU, but cellfun doesn't support GPU computing. I also tried arrayfun, but it doesn't support the UniformOutput parameter on GPUs. I can't store outputs of different sizes inside a cell array if I want to use my GPU.
My question is, is parallel computing on GPU possible for a class method/function that returns different sizes of outputs each time? If not, what could be a possible workaround for this problem?
Update: A short summary of my problem: I would like to run the same function in parallel for each Stack object with its own inputs, knowing that the output sizes don't match, and collect the outputs in a cell array at the end.
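For clarity, a hedged sketch of the collection pattern I'm after, on the CPU with parfor (placeholder names: stacks is an array of Stack objects, and scalar1, vec1, etc. stand in for the real properties passed to some_method); unlike gpuArray arrayfun, parfor can gather variable-size results into a cell array:
outputs = cell(numel(stacks), 1);
parfor k = 1:numel(stacks)
    s = stacks(k);
    % Each call returns a 4-by-m matrix, with m differing per object.
    outputs{k} = some_method(s.scalar1, s.scalar2, s.vec1, s.vec2);
end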

Clean methodology for running a function for a large set of input parameters (in Matlab)

I have a differential equation that's a function of around 30 constants. The differential equation is a system of (N^2+1) equations (where N is typically 4). Solving this system produces N^2+1 functions.
Often I want to see how the solution of the differential equation functionally depends on constants. For example, I might want to plot the maximum value of one of the output functions and see how that maximum changes for each solution of the differential equation as I linearly increase one of the input constants.
Is there a particularly clean method of doing this?
Right now I turn my differential-equation-solving script into a large function that returns an array of output functions. (Some of the inputs are vectors & matrices). For example:
for i = 1:N
[OutputArray1(i, :), OutputArray2(i, :), OutputArray3(i, :), OutputArray4(i, :), OutputArray5(i, :)] = DE_Simulation(Parameter1Array(i));
end
Here I loop over the parameter values. The function solves the differential equation and returns the set of solution functions for that input parameter; each is then appended as a row to a matrix.
There are a few issues I have with my method:
If I want to see how the solution of the differential equation depends on a different parameter, I have to redefine the function so that that parameter, rather than one of the thirty others, is the input. For the sake of code readability, I cannot see myself explicitly writing all of the input parameters as individual inputs. (I've read that structures might be helpful here, but I'm not sure how that would be implemented.)
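For example, a hedged sketch of the struct idea, with hypothetical field names standing in for the thirty constants (CouplingArray here plays the role of Parameter1Array); DE_Simulation is assumed to be refactored to read its constants from the struct rather than taking them as separate inputs:
% Hypothetical parameter struct; field names are placeholders.
params.N        = 4;
params.coupling = 0.1;
params.damping  = 2.5;
% ... the remaining constants ...
% Sweep whichever field is of interest without changing the function signature.
for i = 1:numel(CouplingArray)
    params.coupling = CouplingArray(i);
    [OutputArray1(i,:), OutputArray2(i,:)] = DE_Simulation(params);
end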
I typically get lost in parameter space and often have to update the same parameter across multiple scripts. I have a script that runs the differential-equation-solving function, and I have a second script that plots the set of simulated data. (And I will save the local variables to a file so that I can load them explicitly for plotting, but I often get lost figuring out which file is associated with what set of parameters). The remaining parameters that are not in the input of the function are inside the function itself. I've tried making the parameters global, but doing so drastically slows down the speed of my code. Additionally, some of the inputs are arrays I would like to plot and see before running the solver. (Some of the inputs are time-dependent boundary conditions, and I often want to see what they look like first.)
I'm trying to figure out a good method for me to keep track of everything. I'm trying to come up with a smart method of saving generated figures with a file tag that displays all the parameters associated with that figure. I can save such a file as a notepad file with a generic tagging-number that's listed in the title of the figure, but I feel like this is an awkward system. It's particularly awkward because it's not easy to see what's different about a long list of 30+ parameters.
Overall, I feel as though what I'm doing is fairly simple, yet I don't have a good coding methodology and consequently end up wasting a lot of time saving almost-identical functions and scripts to solve fairly simple tasks.
It seems like what you really want here is something that deals with N-D arrays instead of splitting up the outputs.
If all of the OutputArray_ variables have the same number of rows, then the loop
for i = 1:N
[OutputArray1(i, :), OutputArray2(i, :), OutputArray3(i, :), OutputArray4(i, :), OutputArray5(i, :)] = DE_Simulation(Parameter1Array(i));
end
seems to suggest that what you really want your function to return is an M x K array (where in this case, K = 5), and you want to pack that output into an M x K x N array. That is, it seems like you'd want to refactor your DE_Simulation to give you something like
for i = 1:N
OutputArray(:,:,i) = DE_Simulation(Parameter1Array(i));
end
If they aren't the same size, then a struct or a table is probably the best way to go, as you could assign to one element of the struct array per loop iteration or one row of the table per loop iteration (the table approach would assume that the size of the variables doesn't change from iteration to iteration).
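If the outputs do differ in size, a minimal sketch of the struct-array version, keeping the five outputs from the question (the field names below are just illustrative):
for i = 1:N
    % One struct element per parameter value; field sizes may differ freely.
    [results(i).out1, results(i).out2, results(i).out3, ...
        results(i).out4, results(i).out5] = DE_Simulation(Parameter1Array(i));
end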
If, for some reason, you really need to have these as separate outputs (and perhaps later as separate inputs), then what you probably want is a cell array. In that case you'd be able to deal with the variable number of inputs doing something like
for i = 1:N
[OutputArray{i, 1:K}] = DE_Simulation(Parameter1Array(i));
end
I hesitate to even write that, though, because this almost certainly seems like the wrong data structure for what you're trying to do.

Access/Index Array In Simulink

I have a 2D matrix/array in my model, as shown in the image. I need to be able to index/access it randomly and pass it as a signal. How do I do this?
I can't use a From File block, because the storage is forced to be double, which is too large for my embedded design.
It doesn't appear I can use a From Workspace block either, because this array is defined in my model as SoundArray.
This seems like it should be so simple, but I just can't figure it out. The only way I can think of doing it is in custom C code, which I don't want to do.
Thanks
A MATLAB Function block (formerly the EML block) can pick up model workspace data if it is in "Parameter" scope and you define a parameter input in the Function block. You can then use other inputs to control the random access and return the desired element as a signal output from the MATLAB Function block.
function y = fcn(i,j,soundArray)
y = soundArray(i,j);
(Where soundArray is defined as a Parameter, and i and j are Inputs)
Edit:
Or define a Data Store Memory (add the definition block). Then add a Data Store Read block for that memory, and route it to a Selector block with 2 dimensions and "Starting index (port)" for both of those dimensions.
I believe you can use Model Workspace data to initialize the Data Store Memory, but I don't think that Model Workspace data is "live" during simulation.

data transferred in calling a user defined function

I need to analyze a big source code base. The code contains several function calls.
Depending upon the computation and communication between function calls, I will need to figure out the best configuration scheme for the overall execution of the source code.
As I understand it:
data communicated in calling a function (if it is on a different machine, server, etc.) = input data size + output data size
To get the input data size and output data size, I think I should rewrite all functions to have a variable number of inputs and outputs:
function varargout = samplefunction(varargin)
inputdata = 0;
for i = 1:nargin
    arg = varargin{i};                 % i-th input argument
    info = whos('arg');                % memory occupied by this input
    inputdata = inputdata + info.bytes;
end
% Do stuff here
end
Isn't there a way to calculate the size of a cell array (varargin/varargout) directly in MATLAB?
Or can you suggest another approach to measure the data communicated between function calls?
A call to cellfun() with string inputs would be very fast:
sizes = [cellfun('size',varargin,1); cellfun('size',varargin,2)];
or
lengths = cellfun('length',varargin);
or
numels = cellfun('prodofsize',varargin);
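If a single per-call total is all that's needed, the element counts can be summed directly; and if bytes are the more relevant measure of communicated data, whos on the whole varargin cell is one option:
% Total element count across all inputs:
totalInputElements = sum(cellfun('prodofsize', varargin));
% Total bytes of the varargin cell (contents included):
info = whos('varargin');
totalInputBytes = info.bytes;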

matlab local static variable

In order to test an algorithm in different scenarios, I need to iteratively call a MATLAB function, alg.m.
The bottleneck in alg.m is something like:
load large5Dmatrix.mat
small2Dmatrix = large5Dmatrix(:,:,i,j,k); % i, j and k change at every call of alg.m
clear large5Dmatrix
In order to speed up my tests, I would like to have large5Dmatrix loaded only at the first call of alg.m and kept available for future calls, possibly only within the scope of alg.m.
Is there a way to achieve this in MATLAB other than making large5Dmatrix global?
Can you think of a better way to work with this large matrix of constant values within alg.m?
You can use persistent for static local variables:
function myfun(myargs)
persistent large5Dmatrix
if isempty(large5Dmatrix)
    % Load once, on the first call, and assign the persistent variable explicitly.
    S = load('large5Dmatrix.mat');
    large5Dmatrix = S.large5Dmatrix;
end
small2Dmatrix = large5Dmatrix(:,:,i,j,k); % i, j and k change at every call of alg.m
% ...
end
but since you're not changing large5Dmatrix, @High Performance Mark's answer is better suited and has no computational implications, unless you really, really don't want large5Dmatrix in the scope of the caller.
When you pass an array as an argument to a MATLAB function, the array is only copied if the function updates it; if the function only reads the array, then no copy is made. So any performance penalty the function pays, in time and space, should only kick in if the function updates the large array.
I've never tested this with a recursive function but I don't immediately see why it should start copying the large array if it is only read from.
So your strategy would be to load the array outside the function, then pass it into the function as an argument.
This note may clarify.
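A minimal sketch of that strategy, assuming alg.m is refactored to accept the preloaded array (and the changing indices) as arguments; the loop and variable names here are placeholders:
% Load once, outside the function; the MAT-file is assumed to contain a
% variable named large5Dmatrix.
S = load('large5Dmatrix.mat');
large5Dmatrix = S.large5Dmatrix;
% idx is a hypothetical numTests-by-3 list of the i, j, k values per test.
results = cell(numTests, 1);
for t = 1:numTests
    % alg only reads the array, so passing it as an argument incurs no copy.
    results{t} = alg(large5Dmatrix, idx(t,1), idx(t,2), idx(t,3));
end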