I have written an algorithm that stores the data points in columns. The results contain a mean and a standard deviation. However, if I store the data points in rows instead, the results seem closer to the true value. Is there a proper explanation for this? In my opinion, the results should be independent of the storage layout and thus remain the same.
function [C]=test(n,m)
rng default
B=randn(m,n)
rng default
A=randn(n,m)
C=A'-B
end
I am using this generator for my calculations. My results turn out to be different depending on whether I use randn(m,n) or randn(n,m). (I take the m-th column or the m-th row and sum it cumulatively.)
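One likely explanation is that randn fills its output in MATLAB's column-major order, so with the same seed randn(n,m) and randn(m,n) contain the same numbers in the same linear order but are not transposes of each other. A minimal sketch to check this, with small sizes picked only for illustration:

```matlab
n = 3; m = 4;                       % illustrative sizes, not from the question

rng default
A = randn(n, m);                    % n-by-m, filled column by column
rng default
B = randn(m, n);                    % m-by-n, same random stream

sameStream  = isequal(A(:), B(:));          % same numbers in linear order
sameReshape = isequal(reshape(A, m, n), B); % B is A reshaped ...
sameLayout  = isequal(A', B);               % ... but not A transposed
```

Under that assumption, the m-th column of one matrix and the m-th row of the other hold different samples, so cumulative sums over them differ by ordinary sampling noise rather than because one layout is more accurate.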
Related
I want to import several arrays of data into Simulink so that I can cycle through each of those arrays, operating on one column at a time, and choosing a different array at random intervals. (So let's say I start cycling through the columns of array 1 for 1 second, then I'll move over to array 2, then array 3 and back to array 1).
I can't use From File blocks because each column then has a specific timestamp associated with it, so I can neither cycle through the columns nor start the simulation with a different array each time.
Is there a solution to this problem in Simulink?
Use a MATLAB Function Block. Have your array input to it as a Parameter, which means it'll pick the whole array up from the MATLAB Workspace during model initialization.
Depending on how you want to index into the matrix (you haven't given enough information to determine this), you could either:
Have 2 signals input to the MATLAB Function block that represent a row index and a column index. You'd then have logic in the model that specifies these index values.
Have 2 persistent variables within the MATLAB Function block that define the row and column indices. Have logic in the block that specifies how these variables change each time step.
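The persistent-variable option could look roughly like the sketch below. The function name cycleColumns and the parameter name data are made up for illustration, and wrapping back to the first column is an assumption about the desired behaviour. Switching between several arrays at random intervals would need additional logic.

```matlab
function y = cycleColumns(data)
% Body of a MATLAB Function block; 'data' is assumed to be set up as a
% parameter so the whole matrix is read from the workspace.
persistent col                      % column index kept between time steps
if isempty(col)
    col = 1;                        % first call: start at column 1
end
y = data(:, col);                   % output the current column
col = mod(col, size(data, 2)) + 1;  % advance, wrapping after the last column
end
```

Each call (one per time step) returns the next column and starts again from the first column after the last one.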
When I try to load data from MATLAB into Simulink, I get this error:
Error using TSFPnew (line 191)
Invalid matrix-format variable specified as workspace input in 'modelTSFP/From Workspace5'. The matrix
must have two dimensions and at least two columns. Complex signals of any data type and non-double
real signals must be in structure format. The first column must contain time values and the remaining
columns the data values. Matrix values cannot be Inf or NaN.
I have a very simple model (I know it would be easier to do this computation in MATLAB, but this is only a fragment of my model):
All of the data have the same dimensions, 1x144:
Why can't I just load it into Simulink?
The error message is pretty self-explanatory: the data in the From Workspace block represents a time-dependent variable, so if you are using an array, the first column must contain the time values and the remaining column(s) the corresponding data points. Check the documentation for more details. Your data appear to be only vectors; where is the corresponding time data for your values?
If you want a parameter (that doesn't vary with time), then don't use a From Workspace block, use a Constant block instead.
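If the 1x144 vectors are meant to be time-dependent signals, the matrix format described in the error message can be built as in this sketch. The sample time Ts, the variable name simin, and the random stand-in data are assumptions for illustration only.

```matlab
data = rand(1, 144);              % stand-in for one of the 1x144 vectors
Ts   = 0.1;                       % assumed sample time in seconds

t     = (0:numel(data) - 1)' * Ts; % 144x1 column of time values
simin = [t, data(:)];              % 144x2: time first, then the data
```

The From Workspace block would then reference simin instead of the raw row vector.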
Suppose that we have this structure:
for i=1:x1
Out = randperm(40);
Out_Final = reshape(Out, 4, 10); % divide 'Out' into 10 parts (one per column) and select these parts for some purposes
for j=1:x2
%% Process on `Out_Final`
end
end
I'm using the outer loop (for i=1:x1) to repeat the main process (the for j=1:x2 loop) and average the outputs to get more robust results. I don't want randperm to return equal (or nearly equal) outputs. I want the output of this function to be as different as possible on every call in the for i=1:x1 loop.
How can I do that in MATLAB R2014a?
The randomness algorithms used by randperm are very good. So, don't worry about that.
However, if you draw 10 random numbers from 1 to 10, you are likely to see some more frequently than others.
If you REALLY don't want this, you should probably not focus on randomly selecting the numbers, but on selecting the numbers in a way that spreads them out nicely throughout their possible range. (This is quite a different problem to solve.)
To address your comment:
The rng function allows you to create reproducible results, make sure to check doc rng for examples.
In your case it seems like you actually don't want to reset the rng each time, as that would lead to correlated random numbers.
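A small sketch of the difference, using an arbitrary seed of 42 that is not from the question: seeding once gives a fresh permutation on every call, while re-seeding before each call repeats the same permutation.

```matlab
rng(42);                 % seed once, before the loop (42 is arbitrary)
a1 = randperm(40);
a2 = randperm(40);       % continues the stream: a different permutation

rng(42);                 % re-seeding before every call ...
b1 = randperm(40);
rng(42);
b2 = randperm(40);       % ... reproduces the same permutation

repeatsWhenReset = isequal(b1, b2);   % true
differsWhenNot   = ~isequal(a1, a2);  % true
```

So for the outer loop, calling rng at most once before for i=1:x1 is enough.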
I'm interested in understanding the variety of zeros that a given function produces, with the ultimate goal of identifying which frequencies are passed by high-pass and low-pass filters. My idea is that finding the lowest-valued zero of a filter will identify the passband, specifically for an LPF. I'm attempting to use the [hz,hp,ht] = zplane(z,p) function to do so.
The description for that function reads "returns vectors of handles to the zero lines, hz". Could someone help me with what a vector of handles is and what I do with one to find the various zeros?
For example, a simple 5-point running average filter:
runavh = (1/5) * ones(1,5);
Using zplane(runavh) gives an acceptable pole/zero plot, but running the [hz,hp,ht] = zplane(z,p) function results in hz=175.1075. I don't know what this number represents or how to use it.
Many thanks.
Using the get command, you can find out things about the data.
For example, type G=get(hz) to get a list of properties of the zero lines. Then the XData is given by G.XData, i.e. X=G.XData.
Alternatively, you can pull out only the data you want:
X=get(hz,'XData')
Hope that helps.
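If the zero locations themselves are the goal, rather than the plot handles, they can also be computed without zplane. A minimal sketch for the 5-point running average, assuming an FIR filter so that the zeros are simply the roots of the coefficient vector; zeroFreq and firstNull are names introduced here:

```matlab
runavh = (1/5) * ones(1, 5);        % 5-point running average from the question

z = roots(runavh);                  % zeros of the FIR transfer function
zeroFreq  = sort(abs(angle(z)));    % zero angles in rad/sample, ascending
firstNull = zeroFreq(1);            % lowest frequency with a zero
```

For this filter the zeros lie on the unit circle and the lowest one is at 2*pi/5 rad/sample, which marks the first null of the response rather than an exact passband edge.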
I am trying to put my dataset into the MATLAB [ranked,weights] = relieff(X,Ylogical,10, 'categoricalx', 'on') function to rank the importance of my predictor features. The dataset<double n*m> has n observations and m discrete (i.e. categorical) features. It happens that each observation (row) in my dataset has at least one NaN value. These NaNs represent unobserved, i.e. missing or null, predictor values in the dataset. (There is no corruption in the dataset, it is just incomplete.)
relieff() uses this function below to remove any rows that contain a NaN:
function [X,Y] = removeNaNs(X,Y)
% Remove observations with missing data
NaNidx = bsxfun(@or,isnan(Y),any(isnan(X),2));
X(NaNidx,:) = [];
Y(NaNidx,:) = [];
This is not ideal, especially for my case, since it leaves me with X=[] and Y=[] (i.e. no observations!)
In this case:
1) Would replacing all NaN's with a random value, e.g. 99999, help? By doing this, I am introducing a new feature state for all the predictor features so I guess it is not ideal.
2) or is replacing NaNs with the mode of the corresponding feature column vector (as below) statistically more sound? (I am not vectorising for clarity's sake)
function [matrixdata] = replaceNaNswithModes(matrixdata)
for i=1: size(matrixdata,2)
cv= matrixdata(:,i);
modevalue= mode(cv);
cv(isnan(cv)) = modevalue;
matrixdata(:,i) = cv;
end
3) Or any other sensible way that would make sense for "categorical" data?
P.S: This link gives possible ways to handle missing data.
I suggest using a table instead of a matrix.
Then you have functions such as ismissing (for the entire table), and isundefined to deal with missing values for categorical variables.
T = array2table(matrix);
T = standardizeMissing(T); % NaN is standard for double but this
% can be useful for other data type
var1 = categorical(T.var1);
missing = isundefined(var1);
T = T(~missing,:); % removes rows with missing values
matrix = table2array(T);
For a start, neither solution (1) nor (2) helps you handle your data more properly, since NaN is in fact a label that MATLAB handles appropriately; warnings will be issued. What you should do is:
Handle the NaNs per case
Use try catch blocks
NaN is like a number, and there is nothing bad about it. Even if you divide by NaN, MATLAB will treat it properly and give you a NaN.
If you still want to replace them, then you will need an assumption that holds. For example, if your data are engine speeds in a time series that have been entered by the engine operator, but some time instances have not been specified, then there is more than one way to handle the NaNs that will appear in the matrix:
Replace with 0s
Replace with the previous value
Replace with the next value
Replace with the average of the previous and the next value
and many more.
As you can see your problem is ill-posed, and depends on the predictor and the data source.
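The four replacement strategies listed above can be sketched with fillmissing, which assumes MATLAB R2016b or later. The engine-speed values are made up, and linear interpolation stands in for "average of the previous and the next value", which it matches only when a single sample is missing.

```matlab
speed = [10 NaN 14 NaN NaN 20];                  % made-up engine speeds

zeroFilled = fillmissing(speed, 'constant', 0);  % replace with 0s
prevFilled = fillmissing(speed, 'previous');     % replace with previous value
nextFilled = fillmissing(speed, 'next');         % replace with next value
linFilled  = fillmissing(speed, 'linear');       % interpolate between neighbours
```

Which of these is appropriate depends on what the missing entries mean in the data source.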
In the case of categorical data, e.g. three categories {0,1,2}, and supposing NaNs occur in Y:
for k=1:size(Y,2)
id=isnan(Y(:,k));
m(k)=median(Y(~id,k));
Y(id,k)=round(m(k));
end
I feel really bad that I had to write a for-loop, but I cannot see any other way. As you can see, I made a number of assumptions by using median and round. You may want to use a threshold depending on your knowledge of the data.
I think the answer to this has been given by gd047 in dimension-reduction-in-categorical-data-with-missing-values:
I am going to look into this. If anyone has any other suggestions or particular MATLAB implementations, it would be great to hear them.
You can take a look at this page: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html. For the first dataset, a1a, it says the categorical features are transformed into binary ones. That could possibly work.
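Transforming a categorical feature into binary columns is commonly called one-hot encoding. A minimal sketch for a single feature column with categories {0,1,2}; the sample values are made up, and leaving a NaN row as all zeros is just one possible convention, not something the linked page prescribes:

```matlab
x = [0; 2; 1; NaN; 2];                 % made-up categorical column with a NaN

cats   = unique(x(~isnan(x)));         % observed categories: [0; 1; 2]
onehot = double(bsxfun(@eq, x, cats')); % one binary column per category
```

Because NaN never compares equal to anything, the row for a missing value has no active column, so the missing state stays visible without inventing a new category value.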