Generating Data Set in Matlab

I wanted to ask how to generate a data set in Matlab. I need it to test Feature Selection Algorithms on high dimensional data... The data set should be synthetic, multivariate and contain INTERACTING features.
Synthetic data sets like the MONK's problems are available at http://archive.ics.uci.edu/ml/datasets/MONK%27s+Problems. Unfortunately, I have no clue how to visualize, generate and modify the data according to my needs. The goal is to run an algorithm which detects interacting features.
I will be very thankful for a kind reply.

I'm not sure this is what you are looking for, but if I needed to do this, I would start by generating anonymous functions and generic variable names that I could apply randomly within a dataset.
For example, you could generate a dataset:
dataset = rand(100,6);
and create a few functions which include interdependencies
interact = @(x) x.*x; % element-wise, so it works on whole rows
interact2 = @(x) x.*(x-1);
then create a random logical distribution
y = round(rand(100,1)); %(100 rows of random 0's or 1's)
go through the dataset and use the interact function on only rows where y is true
dataset(y == 1,:) = interact(dataset(y==1,:));
repeat the above with the other interaction functions you define, if you desire. It is probably useful to write each result to its own dataset so that you can avoid row dependencies (see below), i.e.
dataset2 = dataset; % start from a full-size copy so every row exists
dataset2(y==1,:) = interact2(dataset(y==1,:));
A similar approach might be taken with variables (in the example set it shows some categorical variables).
myVariable = repmat('data', 100, 1);
listofvariables = genvarname(cellstr(myVariable));
y = round(rand(100,1)); % logical index for the data
randomly select a generic variable to repeat
applyvar = randi(100); % random index in 1..100 (round(rand*100) can return 0)
selectedVariable = listofvariables(applyvar);
replace indices of the variable list with your repeated variable
listofvariables(y == 1) = selectedVariable;
put together the dataset(s) in some order of your choosing
[cellstr(num2str(dataset(:,1))) listofvariables cellstr(num2str(dataset(:,2))) cellstr(num2str(dataset2(:,2)))]
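If the goal is specifically to test detection of interacting features, it may help to plant a known interaction in the labels. The following is only a minimal sketch under my own assumptions: the sizes, the 0.5 threshold and the choice of features 1 and 2 are arbitrary, and the XOR rule is just one common way to make two features informative only in combination.

```matlab
rng(1);                          % reproducible random numbers
nSamples  = 200;                 % hypothetical sizes, adjust as needed
nFeatures = 20;
X = rand(nSamples, nFeatures);   % independent uniform features

% Features 1 and 2 interact: the label is the XOR of their thresholded
% values, so neither feature predicts the label on its own.
a = X(:,1) > 0.5;
b = X(:,2) > 0.5;
label = double(xor(a, b));

% All remaining features (3..nFeatures) are irrelevant noise.
dataSet = [X label];             % last column holds the class label
```

A feature selection algorithm that scores features one at a time should rate features 1 and 2 about as low as the noise features, while one that handles interactions should pick out the pair.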

Related

Applying (with as few loops as possible) a function to given elements/voxels (x,y,z) taken from subfields of multiple structs (nifti's) in MATLAB?

I have a dataset of n nifti (.nii) images. Ideally, I'd like to be able to get the value of the same voxel/element from each image, and apply a function to the n data points. I'd like to do this for each voxel/element across the whole image, so that I can reconvert the result back into .nii format.
I've used the Tools for NIfTI and ANALYZE image toolbox to load my images:
data(1)=load_nii('C:\file1.nii');
data(2)=load_nii('C:\file2.nii');
...
data(n)=load_nii('C:\filen.nii');
From which I obtain a struct object with each sub-field containing one loaded nifti. Each of these has a subfield 'img' corresponding to the image data I want to work on. The problem comes from trying to select a given xyz within each img field of data(1) to data(n). As I discovered, it isn't possible to select in this way:
data(:).img(x,y,z)
or
data(1:n).img(x,y,z)
because matlab doesn't support it. The contents of the first brackets have to be scalar for the call to work. The solution from googling around seems to be a loop that creates a temporary variable:
for z = 1:nz
for x = 1:nx
for y = 1:ny
for i=1:n;
points(i)=data(i).img(x,y,z);
end
[p1(x,y,z,:),~,p2(x,y,z)] = fit_data(a,points,b);
end
end
end
which works, but takes too long (several days) for a single set of images given the size of nx, ny, nz (several hundred each).
I've been looking for a solution to speed up the code, which I believe depends on removing those loops by vectorisation, preselecting the img fields (via getfield?) and concatenating them, and applying something like arrayfun/cellfun/structfun, but I'm frankly a bit lost on how to do it. I can only think of ways to pre-select which themselves require loops, which seems to defeat the purpose of the exercise (though a solution with fewer loops, or fewer nested loops at least, might do it), or which run into the same problem that calls like data(:).img(x,y,z) don't work. Googling around again is throwing up ways to select and concatenate fields within a struct, or a given field across multiple structs. But I can't find anything for my problem: select an element from a non-scalar sub-field in a sub-struct of a struct object (with the minimum of loops). Finally, I need the output to be in the form of a matrix that the toolbox above can turn back into a nifti.
Any and all suggestions, clues, hints and help greatly appreciated!
You can concatenate images as a 4D array and use linear indexes to speed up calculations:
img = cat(4,data.img);
p1 = zeros(nx,ny,nz,n);
p2 = zeros(nx,ny,nz);
sz = ny*nx*nz;
for k = 1 : sz
points = img(k:sz:end);
[p1(k:sz:end),~,p2(k)] = fit_data(a,points,b);
end
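A variation on the same idea, as a sketch only: reshape the 4D array so that each row holds the n values of one voxel. The random volumes and the use of mean as the per-voxel function are stand-ins of mine (fit_data is not shown in the question); if the real function cannot be applied to all rows at once, a single loop over the rows of series is still needed.

```matlab
% Stand-in data: n small random volumes in a struct array with an img field
nx = 4; ny = 3; nz = 2; n = 5;
for i = n:-1:1                   % counting down preallocates the struct array
    data(i).img = rand(nx, ny, nz);
end

img4d  = cat(4, data.img);       % nx-by-ny-by-nz-by-n
series = reshape(img4d, [], n);  % one row per voxel, one column per image

% Hypothetical per-voxel function applied to every row at once
voxelMean = mean(series, 2);

% Back to image shape, ready to be stored in an img field again
result = reshape(voxelMean, nx, ny, nz);
```

Because reshape keeps the linear order of the first three dimensions, row k of series corresponds to linear voxel index k, the same ordering as img(k:sz:end) above.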

Splitting non-continuous sized matrix in vectors

I'm writing a piece of software within Matlab. Here, the user can define a dimension, say 3.
This dimension is subsequently the number of iterations of a for loop. Within this loop, I construct a matrix to store the results which are generated during every iteration. So, the data of every iteration is stored in a row of a matrix.
Therefore, the size of the matrix depends on the size of the loop and thus the user input.
Now, I want to separate each row of this matrix (cl_matrix) and create separate vectors for every row automatically. How would one go about this? I am stuck here...
So far I have:
Angle = [1 7 15];
for i = 1:length(Angle)
%% do some calculations here %%
cl_matrix(i,:) = A.data(:,7);
end
I want to automate this based on the length of Angle:
length(Angle)
cl_1 = cl_matrix(1,:);
cl_7 = cl_matrix(2,:);
cl_15= cl_matrix(3,:);
Thanks!
The usual way to dynamically generate workspace variables whose names are built by combining strings and numeric values (as in your question) is to use the eval function.
Nevertheless, eval is only one character away from "evil": it is as dangerous as it is seductive.
A possible compromise between working directly with cl_matrix and generating the set of arrays cl_1, cl_7 and cl_15 could be to create a structure whose fields are dynamically generated.
You can generate a struct whose fields are cl_1, cl_7 and cl_15 this way:
cl_struct.(['cl_' num2str(Angle(i))])=cl_matrix(i,:)
(you might notice the field name, e.g. cl_1, is built in the same way you would build it when using eval).
This approach offers a remarkable advantage over generating the arrays with eval: you can access the fields of the struct (that is, their content) even without knowing their names.
In the following you can find a modified version of your script in which this approach has been implemented.
The script generate two structs:
the first one, cl_struct_same_length is used to store the rows of the cl_matrix
the second one, cl_struct_different_length is used to store arrays of different length
In the script there are examples of how to access the fields (that is, the arrays) to perform some calculations (in the example, to evaluate the mean of each of them).
You can access the struct fields by using the functions:
getfield to get the values stored in it
fieldnames to get the names (dynamically generated) of the field
Updated script
Angle = [1 7 15];
for i = 1:length(Angle)
% do some calculations here %%
% % % cl_matrix(i,:) = A.data(:,7);
% Populate cl_matrix
cl_matrix(i,:) = randi(10,1,10)*Angle(i);
% Create a struct with dynamic field names
cl_struct_same_length.(['cl_' num2str(Angle(i))])=cl_matrix(i,:)
cl_struct_different_length.(['cl_' num2str(Angle(i))])=randi(10,1,Angle(i))
end
% Use "fieldnames" to get the names of the dynamically generated struct's fields
cl_fields=fieldnames(cl_struct_same_length)
% Loop through the struct's fields to perform some calculation on the
% stored values
for i=1:length(cl_fields)
cl_means(i)=mean(cl_struct_same_length.(cl_fields{i}))
end
% Assign the value stored in a struct's field to a variable
row_2_of_cl_matrix=getfield(cl_struct_different_length,(['cl_' num2str(Angle(2))]))
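When every field holds a numeric array and the calculation returns a scalar, structfun can replace the loop over fieldnames. A minimal sketch, with made-up row values standing in for the real cl_matrix rows:

```matlab
Angle = [1 7 15];
for i = 1:length(Angle)
    % Dynamic field names cl_1, cl_7, cl_15; the values are placeholders
    cl_struct.(['cl_' num2str(Angle(i))]) = (1:10) * Angle(i);
end

% One mean per field, returned as a column vector in field order
cl_means = structfun(@mean, cl_struct);
```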
Hope this helps.

tfest :: too many parameters for chosen data size

I am trying to find transfer function for some input data and output data using the code
Temperature = [zeros(1,153) 300*ones(1,47)];
out_temp = [zeros(1,147) ScopeData4.signals(1).values'];
N = 1;
tfdata_tem = iddata(out_temp,Temperature,0.001);
sys = tfest(tfdata_tem,N);
but in the end I get the following error, despite the fact that I have increased the number of samples and reduced the order to 1
There are too many parameters to estimate for chosen estimation data size. Reduce model order or use a larger data set.
The most likely problem is that your data set doesn't contain a rich enough set of frequencies for the underlying algorithm to estimate a model (of any order).
The iddata1 sample data set gives an example of what typical data should look like.
In particular, note that the input signal is comprised of many steps, occurring at non-regular intervals, unlike your data that has just one step.
load iddata1 z1;
plot(z1);
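For illustration only, an input with several steps at irregular intervals could be built along these lines. The number of steps and the two levels (0 and 300) are my assumptions, taken loosely from the question; the output would of course have to be measured again with this input, and whether tfest then succeeds depends on that data.

```matlab
rng(0);                          % reproducible random numbers
nSamples = 200;                  % same length as in the question
nSteps   = 20;                   % hypothetical number of constant segments

% Random, irregular switching instants between sample 2 and nSamples
switchIdx = sort(randperm(nSamples - 1, nSteps - 1)) + 1;
edges     = [1 switchIdx nSamples + 1];

% Each segment is held at either 0 or 300
levels = 300 * (rand(1, nSteps) > 0.5);

u = zeros(nSamples, 1);          % column vector, one sample per row
for s = 1:nSteps
    u(edges(s):edges(s+1) - 1) = levels(s);
end
```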
As the figure in Phil Goddard's answer shows, iddata expects the output and input signals as column vectors (one sample per row). In your code they are row vectors, so you need to transpose them:
Temperature = [zeros(1,153) 300*ones(1,47)]';
out_temp = [zeros(1,147) ScopeData4.signals(1).values']';
N = 1;
tfdata_tem = iddata(out_temp,Temperature,0.001);
sys = tfest(tfdata_tem,N);

Matlab: How can I call object properties using a string?

I am currently working on a data analysis program that contains two objects: Experiment and RunSummary. The Experiment object contains multiple instances of the RunSummary object. Each RunSummary object contains multiple properties (row matrices) each containing different data points for a given run.
For example: Experiment.RunSummary(5).Tmean is a row matrix containing all of the average torque values for run 5 in my experiment.
I am currently trying to find a way to combine selected common properties from specific runs into a single matrix that can be used for further analysis. The current way I have had to do this is:
X(:,1) = [Drilling.Runs(1).Tmean,...
Drilling.Runs(2).Tmean,...
Drilling.Runs(3).Tmean,...
Drilling.Runs(5).Tmean]';
X(:,2) = [Drilling.Runs(1).Fmean,...
Drilling.Runs(2).Fmean,...
Drilling.Runs(3).Fmean,...
Drilling.Runs(5).Fmean]';
This code takes the average torque (Tmean) and average force (Fmean) from runs 1, 2, 3, and 5 and combines them in a single matrix, X, with Tmean for all runs in the first column and Fmean in the second. Although this method works, I have over 20 different properties and 15 different runs making this coding very tedious.
I have tried using code such as get(Experiment.RunSummary(i),'Tmean') to try and retrieve these property matrices, but was met with the error:
Conversion to double from RunSummary is not possible.
Is there a way to easily combine all of these different properties
into a single matrix using strings to determine which properties are used?
Thanks,
metro
Edit: Drilling is the name of the Experiment object. Runs is the name of the RunSummary object.
You can use dynamic fields. The documentation is for structs, but the same principle works for classes (at least on my R2012a install).
You can also use the comma-separated list produced by indexing an object array to compress the code.
Example:
I = [1,2,3,5] ;
props = {'Tmean','Fmean'} ;
Nprops = length(props) ;
X = zeros(length(I),Nprops);
for k = 1:Nprops
X(:,k) = [Drilling.Runs(I).(props{k})]';
end

Matlab: Query complicated structures

I am using structures in Matlab to organize my results in an intuitive way. My analysis is quite complex and hierarchical, so this works well---logically. For example:
resultObj.multivariate.individual.distributed.raw.alpha10(1).classification(1). Each level of the structure has several fields. Each alpha field is a structured array, indexed for each dataset, and classification is also a structured array, one for each cross validation run on the data.
To simplify, consider the classification field:
>> classification
ans =
1x8 struct array with fields:
bestLambda
bestBetas
scores
statObj
fitObj
In which statObj has fields (for example):
dprime: 6.5811
hit: 20
miss: 0
falseAlarms: 0
correctRejections: 30
Of course, the fields have different values for each subject and cross validation run. Given this structure, is there a good way to find the mean of dprime over cross validation runs (i.e. the elements of classification) without needing to construct a for loop to extract, store, and finally compute on?
I was hoping that reshape(struct2array(classification.statObj),5,8) would work, so I could construct a matrix with stats as rows and cross validations runs as columns, but this won't work. I put these items in their own structure specifically because the fields of classification hold elements of various types (matrices, structures, integers).
I am not opposed to restructuring my output entirely, but I'd like it to be done in such a way that the organization is fairly self-commenting, so that I could, say, return to this structure a year from now and remember what and where everything is.
I came up with the following, although I'm not sure if it is what you are looking for:
%# create a structure hierarchy similar to yours
%# (I ignore everything before alpha10, and only create a part of it)
alpha10 = struct();
for a=1:5
alpha10(a).classification = struct();
for c=1:8
alpha10(a).classification(c).statObj = struct('dprime',rand());
end
end
%# matrix of 'dprime' for each alpha across each cross-validation run
st = [alpha10.classification];
st = [st.statObj];
dp = reshape([st.dprime], 8, 5)' %# result is 5-by-8 matrix
Next you can compute mean across the second dimension of this matrix dp
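For example, with a small stand-in for dp (the values here are placeholders, not real d' scores):

```matlab
% Stand-in for the 5-by-8 matrix dp: rows are alpha entries,
% columns are cross-validation runs
dp = reshape(1:40, 8, 5)';

% Average over cross-validation runs (dimension 2): one value per row
meanPerAlpha = mean(dp, 2);
```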
For anyone who happens across this post, and is wrestling with something similar, it is worth asking yourself if such a nested structure-of-structures is really your best option. It may be easier to flatten the hierarchy and include descriptive fields as labels. For instance
resultObj.multivariate.individual.distributed.raw.alpha10(1).classification(1)
might instead be
resultObj(1).
AnalysisType = 'multivariate'
GroupSolution = false
SignalType = 'distributed'
Processing = 'raw'
alpha = 10
crossvalidation = 1
dprime = 6.5811
bestLambda = []
bestBetas = []
scores = []
fitObj = []
That's not valid Matlab syntax there, but it gets the point across. Rather than building a hierarchy out of nested structures, create a 1xN structure with labels and data. It is a more general solution that is easier to query and work with.
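In valid syntax, a flat layout like this might be built and queried roughly as follows. This is a sketch with a reduced set of fields and made-up dprime values; the two alpha values and the field spelling are assumptions for the example.

```matlab
% Empty struct array with the label and data fields declared up front
resultObj = struct('AnalysisType', {}, 'SignalType', {}, 'Processing', {}, ...
                   'alpha', {}, 'crossvalidation', {}, 'dprime', {});

k = 0;
for alphaVal = [10 20]                   % hypothetical alpha settings
    for cv = 1:8                         % cross-validation runs
        k = k + 1;
        resultObj(k).AnalysisType    = 'multivariate';
        resultObj(k).SignalType      = 'distributed';
        resultObj(k).Processing      = 'raw';
        resultObj(k).alpha           = alphaVal;
        resultObj(k).crossvalidation = cv;
        resultObj(k).dprime          = alphaVal/10 + cv;   % made-up value
    end
end

% Query: mean dprime over cross-validation runs for alpha = 10, raw data
sel = [resultObj.alpha] == 10 & strcmp({resultObj.Processing}, 'raw');
meanDprime = mean([resultObj(sel).dprime]);
```

The query is a logical mask over the labels, so adding another condition is just one more term in sel.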