Random number - Choose seed - matlab

For a project I need to generate normally distributed pseudorandom numbers.
At the moment I generally write this:
nn_u = complex((normrnd(0,1.0,size(H_u))),(normrnd(0,1.0,size(H_u))));
nn_v = complex((normrnd(0,1.0,size(H_u))),(normrnd(0,1.0,size(H_u))));
nn_w = complex((normrnd(0,1.0,size(H_u))),(normrnd(0,1.0,size(H_u))));
Here size(H_u) is [4096,1].
This way I have no real access to the seed. I expect that the form above uses six seeds, that is, a different seed for each of the six calls to normrnd.
What I'd like to do instead is generate six independent realizations, as above, but from a single seed that I can pick from the range [1,999].
To achieve this I was thinking of proceeding this way:
n = 4096;
nn_tmp = normrnd(0,1,[n*6,1]);
nn_u = complex(nn_tmp(1:n,1),nn_tmp(n+1:2*n,1));
nn_v = complex(nn_tmp(2*n+1:3*n,1),nn_tmp(3*n+1:4*n,1));
nn_w = complex(nn_tmp(4*n+1:5*n,1),nn_tmp(5*n+1:6*n,1));
But this way I still don't have direct access to the seed, and I don't know whether this kind of operation has any sound theoretical justification.
Any help would be welcome.

I think you can use rng to set the seed and then use randn instead of normrnd for your problem.
So something like:
SEED = 120; %for example
rng(SEED, 'twister');
nn_u = complex(randn(size(H_u)),randn(size(H_u)));
nn_v = complex(randn(size(H_u)),randn(size(H_u)));
nn_w = complex(randn(size(H_u)),randn(size(H_u)));
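To check that a single seed really controls all six parts, here is a minimal sketch. It assumes H_u is a 4096-by-1 column, as stated in the question, and uses an arbitrary seed of 120; the names tmp, tmp2 and same are made up for illustration. It draws the six parts in one randn call and confirms that reseeding reproduces them:

```matlab
n = 4096;                 % length of H_u, taken from the question
seed = 120;               % arbitrary choice, e.g. any integer picked from [1,999]

rng(seed, 'twister');     % seed the global stream once
tmp = randn(n, 6);        % six columns drawn from the same seeded stream
nn_u = complex(tmp(:,1), tmp(:,2));
nn_v = complex(tmp(:,3), tmp(:,4));
nn_w = complex(tmp(:,5), tmp(:,6));

rng(seed, 'twister');     % reseeding with the same value restarts the sequence
tmp2 = randn(n, 6);
same = isequal(tmp, tmp2);   % true when the draws are reproducible
```

Changing seed gives a different but equally reproducible set of nn_u, nn_v and nn_w.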

matlab global stream: Any correlation between generated sets of numbers?

I'm just looking for some clarification in creating sets of random numbers in matlab and how this relates to the 'global stream.'
I know that I can set the global stream for reproducibility of my results should I run the code again:
s = RandStream('mt19937ar','Seed',7);
RandStream.setGlobalStream(s);
A = rand(1,10);
Every time I run this, A is the same. For example,
s = RandStream('mt19937ar','Seed',7);
RandStream.setGlobalStream(s);
B = rand(1,10);
I should find that isequal(A,B) is true.
Now my question pertains to the following,
s = RandStream('mt19937ar','Seed',7);
RandStream.setGlobalStream(s);
A = rand(1,10);
B = rand(1,10);
If I run this, then A and B are different sets of numbers. Can I take them to be independent sets, or is there some correlation between them? If I wanted to ensure stronger independence between A and B, should I create a new and different global stream after creating A but before creating B? For example,
sA = RandStream('mt19937ar','Seed',7);
RandStream.setGlobalStream(sA);
A = rand(1,10);
sB = RandStream('mt19937ar','Seed',3);
RandStream.setGlobalStream(sB);
B = rand(1,10);
Matlab generate random number from a "KNOWN" but complex function,
All pseudorandom number generators are based on deterministic algorithms, and all will fail a sufficiently specific statistical test for randomness.
When you change the seed (which you can also do with rng(your_desired_seed_number)), the generator simply uses another part of the same function, which is not unrelated to the previous random number sequence (at least I think so; this is really a mathematical question).
I suggest using different generators to get random numbers that are as independent as possible:
rng(5,'twister'); % you could also use randstream instead of rng
A=rand(1,10);
rng(3,'combRecursive');
B=rand(1,10);
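Another option, offered as a sketch rather than a definitive recommendation, is to create several streams from one generator type that supports multiple streams, such as 'mrg32k3a', and draw A and B from separate streams. The variable names sA2, sB2, A2 and B2 are made up for the reproducibility check:

```matlab
% Two streams created together from one seed; 'mrg32k3a' supports multiple streams.
[sA, sB] = RandStream.create('mrg32k3a', 'NumStreams', 2, 'Seed', 7);
A = rand(sA, 1, 10);   % draw from the first stream
B = rand(sB, 1, 10);   % draw from the second stream

% Recreating the streams with the same seed reproduces both sets.
[sA2, sB2] = RandStream.create('mrg32k3a', 'NumStreams', 2, 'Seed', 7);
A2 = rand(sA2, 1, 10);
B2 = rand(sB2, 1, 10);
```

This leaves the global stream untouched, so other code that calls rand without a stream argument is not affected.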

K-Means centroids getting marginalized to having no data points [Matlab]

So I have a sort of strange problem. I have a dataset with 240 points and I'm trying to use k-means to cluster it into 100 clusters. I'm using Matlab but I don't have access to the statistics toolbox, so I had to write my own k-means function. It's pretty simple, so that shouldn't be too hard, right? Well, it seems something is wrong with my code:
function result=Kmeans(X,c)
[N,n]=size(X);
index=randperm(N);
ctrs = X(index(1:c),:);
old_label = zeros(1,N);
label = ones(1,N);
iter = 0;
while ~isequal(old_label, label)
    old_label = label;
    label = assign_labels(X, ctrs);
    for i = 1:c
        ctrs(i,:) = mean(X(label == i,:));
        if sum(isnan(ctrs(i,:))) ~= 0
            ctrs(i,:) = zeros(1,n);
        end
    end
    iter = iter + 1;
end
result = ctrs;
function label = assign_labels(X, ctrs)
[N,~]=size(X);
[c,~]=size(ctrs);
dist = zeros(N,c);
for i = 1:c
    dist(:,i) = sum((X - repmat(ctrs(i,:),[N,1])).^2,2);
end
[~,label] = min(dist,[],2);
It seems what happens is that when I go to recompute the centroids, some centroids have no datapoints assigned to them, so I'm not really sure what to do with that. After doing some research on this, I found that this can happen if you supply arbitrary initial centroids, but in this case the initial centroids are taken from the datapoints themselves, so this doesn't really make sense. I've tried re-assigning these centroids to random datapoints, but that causes the code to not converge (or at least after letting it run all night, the code never converged). Basically they get re-assigned, but that causes other centroids to get marginalized, and repeat. I'm not really sure what's wrong with my code, but I ran this same dataset through R's k-means function for k=100 for 1000 iterations and it managed to converge. Does anyone know what I'm messing up here? Thank you.
Let's step through your code one piece at a time and discuss what you're doing with respect to what I know about the k-means algorithm.
function result=Kmeans(X,c)
[N,n]=size(X);
index=randperm(N);
ctrs = X(index(1:c),:);
old_label = zeros(1,N);
label = ones(1,N);
This looks like a function that takes in a data matrix of size N x n, where N is the number of points in your dataset and n is the dimension of each point. The function also takes in c, the desired number of output clusters. index holds a random permutation of 1 to N, and the first c entries of this permutation select the data points used to initialize your cluster centres.
iter = 0;
while ~isequal(old_label, label)
    old_label = label;
    label = assign_labels(X, ctrs);
    for i = 1:c
        ctrs(i,:) = mean(X(label == i,:));
        if sum(isnan(ctrs(i,:))) ~= 0
            ctrs(i,:) = zeros(1,n);
        end
    end
    iter = iter + 1;
end
result = ctrs;
For k-means, we basically keep iterating until the cluster membership of each point from the previous iteration matches with the current iteration, which is what you have going with your while loop. Now, label determines the cluster membership of each point in your dataset. Now, for each cluster that exists, you determine what the mean data point is, then assign this mean data point as the new cluster centre for each cluster. For some reason, should you experience any NaN for any dimension of your cluster centre, you set your new cluster centre to all zeroes instead. This looks very abnormal to me, and I'll provide a suggestion later. Edit: Now I understand why you did this. This is because should you have any clusters that are empty, you would simply make this cluster centre all zeroes as you wouldn't be able to find the mean of empty clusters. This can be solved with my suggestion for duplicate initial clusters towards the end of this post.
function label = assign_labels(X, ctrs)
[N,~]=size(X);
[c,~]=size(ctrs);
dist = zeros(N,c);
for i = 1:c
    dist(:,i) = sum((X - repmat(ctrs(i,:),[N,1])).^2,2);
end
[~,label] = min(dist,[],2);
This function takes in a dataset X and the current cluster centres for this iteration, and it should return a label list of where each point belongs to each cluster. This also looks correct because for each column of dist, you are calculating the distance between each point to each cluster, where those distances are in the ith column for the ith cluster. One optimization trick that I would use is to avoid using repmat here and use bsxfun which handles the replication internally. Therefore, do this instead:
function label = assign_labels(X, ctrs)
[N,~]=size(X);
[c,~]=size(ctrs);
dist = zeros(N,c);
for i = 1:c
    dist(:,i) = sum(bsxfun(@minus, X, ctrs(i,:)).^2, 2);
end
[~,label] = min(dist,[],2);
Now, this all looks correct. I also ran some tests myself and it all seems to work out, provided that the initial cluster centres are unique. One small problem with k-means is that we implicitly assume that all cluster centres are unique. Should they not be unique, then you'll run into a problem where two clusters (or more) have the exact same initial cluster centres.... so which cluster should the data point be assigned to? When you're doing the min in your assign_labels function, should you have two identical cluster centres, the cluster label that the point gets assigned to will be the minimum of these two numbers. This is why you will have a cluster with no points in it, as all of the points that should have been assigned to this cluster get assigned to the other.
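The tie-breaking described above can be seen on a toy example. This is only a sketch with made-up data (two points, three centres, two of which are identical); emptyClusters is a hypothetical name for the list of clusters that received no points:

```matlab
X    = [0 0; 1 1];            % two data points (toy values)
ctrs = [0 0; 0 0; 1 1];       % centres 1 and 2 are identical
dist = zeros(size(X,1), size(ctrs,1));
for i = 1:size(ctrs,1)
    dist(:,i) = sum(bsxfun(@minus, X, ctrs(i,:)).^2, 2);
end
[~, label] = min(dist, [], 2);   % on ties, min returns the lowest index
emptyClusters = setdiff(1:size(ctrs,1), label');  % clusters with no points
```

Here the first point is equally close to centres 1 and 2, min picks centre 1, and cluster 2 ends up empty.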
As such, you may have two (or more) initial cluster centres that are the same upon randomization. Even though the permutation of the indices to select are unique, the actual data points themselves may not be unique upon selection. One thing that I can impose is to loop over the permutation until you get a unique set of initial clusters without repeats. As such, try doing this at the beginning of your code instead.
[N,n]=size(X);
index=randperm(N);
ctrs = X(index(1:c),:);
while size(unique(ctrs, 'rows'), 1) ~= c
    index=randperm(N);
    ctrs = X(index(1:c),:);
end
old_label = zeros(1,N);
label = ones(1,N);
iter = 0;
%// While loop appears here
This will ensure that you have a unique set of initial clusters before you continue on in your code. Now, going back to your NaN stuff inside the for loop. I honestly don't see how any dimension could result in NaN after you compute the mean if your data doesn't have any NaN to begin with. I would suggest you get rid of this in your code as (to me) it doesn't look very useful. Edit: You can now remove the NaN check as the initial cluster centres should now be unique.
This should hopefully fix your problems you're experiencing. Good luck!
"Losing" a cluster is not half as special as one may think, due to the nature of k-means.
Consider duplicates. Let's assume that all your first k points are identical: what would happen in your code? There is a reason you need to carefully handle this case. The simplest solution would be to leave the centroid as it was before, and live with degenerate clusters.
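A minimal sketch of that "leave the centroid as it was" update, using made-up toy data in which cluster 2 is empty. The variable members is a hypothetical name. Note that mean is called with an explicit dimension: for a cluster with a single member, mean of a 1-by-n row would otherwise average across the coordinates instead of across points.

```matlab
X     = [0 0; 0 1; 10 10];     % toy data, 3 points in 2-D
ctrs  = [0 0; 5 5; 10 10];     % current centres (toy values)
label = [1; 1; 3];             % cluster 2 received no points

for i = 1:size(ctrs,1)
    members = X(label == i, :);
    if ~isempty(members)
        % mean(...,1) averages over rows even when there is a single member
        ctrs(i,:) = mean(members, 1);
    end
    % empty cluster: ctrs(i,:) is left as it was
end
```

With this update no NaN check is needed, because the mean is never taken over an empty set.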
Given that you only have 240 points but want to use k=100, don't expect very good results. Most objects will be on their own... Choosing a k that is much too large is probably one reason why you see this degeneration effect a lot. Let's assume that out of these 240, fewer than 100 are unique... Then you cannot have 100 non-empty clusters... Plus, I would consider this kind of result "overfitting" anyway.
If you don't have the toolboxes you need in Matlab, maybe you should move on to free software. Octave, R, Weka, ELKI, ... there is plenty of software, some of which is much more powerful when it comes to clustering than pure Matlab (in particular, if you don't have the toolboxes).
Also benchmark. You will be surprised by the performance differences.

Using the events locator in Matlab when solving an ODE with multiple output arguments

I'm having a problem with the events locator in Matlab. I'm looking at a coupled ODE which represents two interfaces going unstable. I have no problems solving it, all I'm trying to do is find the radius (R_2v) when a certain amplitude (10^2) is reached for each interface. Av is a matrix of two column vectors, which are the radial (or time) evolutions of the amplitudes of the two interfaces.
The problem being reported is 'Nonscalar arrays of function handles are not allowed; use cell arrays instead.'
n.b. 2, mu_2_0, mu_2_f, R_gel, and V are all parameters for funsys
options = odeset('Events',[@eventsA,@eventsB]);
[R_2v,Av,R_2Ae,Ae,Aie,R_2Be,Be,Bie] = ode15s(@funsys_back_Vg1,[1 R_max],[1;1],options,2,mu_2_0,mu_2_f,R_gel,V);
function [valueA,isterminalA,directionA] = eventsA(R_2v,Av)
valueA = Av(1) - 10^2;
isterminalA = 0;
directionA = 0;
function [valueB,isterminalB,directionB] = eventsB(R_2v,Av)
valueB = Av(2) - 10^2;
isterminalB = 0;
directionB = 0;
The error is pretty clear: arrays of function handles aren't allowed in Matlab. You may be able to use cell arrays in some cases. See this question. I'm not sure that the ODE suite functions support the evaluation of separate events functions in a cell array (R2013b gives me an error).
However, I'm not sure you need to be doing what you're doing. Event functions can be rather complex and can be written to do many things including detecting multiple events. See this answer of mine for example. Or this one. You should be able to write just one function:
function [value,isterminal,direction] = events(R_2v,Av)
value = Av - 10^2;
isterminal = 0;
direction = 0;
And then your Aie (see below) will tell you the index, 1 or 2, of Av that triggered a particular event at R_2Ae and Ae.
options = odeset('Events',@events);
fun = @(R_2v,Av)funsys_back_Vg1(R_2v,Av,2,mu_2_0,mu_2_f,R_gel,V);
[R_2v,Av,R_2Ae,Ae,Aie] = ode15s(fun,[1 R_max],[1;1],options);
You had too many output arguments, so I removed the extra ones. I also "fixed" your call to ode15s to pass parameters via the anonymous function, which has been the preferred and most efficient way to do things for many years now. I may have messed things up relative to your actual code, though, because you didn't provide a working example.
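Since the original funsys_back_Vg1 isn't available, here is a self-contained sketch of the same single-events-function idea on a toy system. The system y1' = y1, y2' = 2*y2, the solver ode45, the tolerances and the name amplitudeEvents are all assumptions made for illustration. It is written as a script with a local function at the end of the file, which needs R2016b or later:

```matlab
% Toy system (an assumption, not the asker's funsys): y1' = y1, y2' = 2*y2
fun = @(t, y) [y(1); 2*y(2)];

options = odeset('Events', @amplitudeEvents, 'RelTol', 1e-8, 'AbsTol', 1e-10);
[t, y, te, ye, ie] = ode45(fun, [0 5], [1; 1], options);
% te: times at which a component reached 10^2
% ie: which component (1 or 2) triggered each event in te

function [value, isterminal, direction] = amplitudeEvents(~, y)
% One event function for both components; value is a 2-by-1 vector.
value      = y - 10^2;   % an event fires when a component crosses 10^2
isterminal = [0; 0];     % do not stop the integration at an event
direction  = [0; 0];     % detect crossings in either direction
end
```

In this toy case the second component grows faster, so it should appear first in ie, followed by the first component.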

Generating Data Set in Matlab

I wanted to ask how to generate a data set in Matlab. I need it to test Feature Selection Algorithms on high dimensional data... The data set should be synthetic, multivariate and contain INTERACTING features.
Synthetic data sets like the MONK's problems are available at http://archive.ics.uci.edu/ml/datasets/MONK%27s+Problems .... Unfortunately, I have no clue how to visualize/generate and modify the data according to my needs. The goal is to run an algorithm which detects interacting features.
I will be very thankful for a kind reply.
I'm not sure this is what you are looking for, but if I needed to do this, I would start by generating anonymous functions and generic variable names that I could apply randomly within a dataset.
For example, you could generate a dataset:
dataset = rand(100,6);
and create a few functions which include interdependencies:
interact = @(x) x.*x;
interact2 = @(x) x.*(x-1);
then create a random logical distribution:
y = round(rand(100,1)); %(100 rows of random 0's or 1's)
Go through the dataset and use the interact function only on rows where y is true:
dataset(y == 1,:) = interact(dataset(y==1,:));
Repeat the above with the other interaction functions you define if you desire. It would probably be useful to do this so that you can avoid row dependencies (see below), so generating a few datasets could be in order, i.e.
dataset2 = dataset; % start from a full-size copy so every row is defined
dataset2(y==1,:) = interact2(dataset(y==1,:));
A similar approach might be taken with variables (in the example set it shows some categorical variables).
myVariable = repmat('data', 100, 1);
listofvariables = genvarname(cellstr(myVariable));
y = round(rand(100,1)); % logical index for the data
Randomly select a generic variable to repeat:
applyvar = randi(100); % random index from 1 to 100 (round(rand*100) could return 0)
selectedVariable = listofvariables(applyvar);
Replace indices of the variable list with your repeated variable:
listofvariables(y == 1) = selectedVariable;
Put together the dataset(s) in some order of your choosing:
[cellstr(num2str(dataset(:,1))) listofvariables cellstr(num2str(dataset(:,2))) cellstr(num2str(dataset2(:,2)))]
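If the goal is a data set where features only matter in combination, one common construction is an XOR-style label. This is a sketch under my own assumptions (200 samples, six uniform features, thresholds at 0.5, and the names X, label and synthetic), not something taken from the MONK's data:

```matlab
rng(1);                          % fixed seed so the data set is reproducible
n = 200;                         % number of samples (arbitrary)
X = rand(n, 6);                  % six features, uniform on [0,1]

% The label depends on features 1 and 2 jointly (XOR of their thresholds);
% features 3 to 6 do not enter the label at all.
label = xor(X(:,1) > 0.5, X(:,2) > 0.5);

synthetic = [X, double(label)];  % last column holds the class label
```

A feature selection method that scores features one at a time should find little signal in feature 1 or 2 alone, which makes this a simple test for detecting interactions.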

loop over all possible combinations

I would like to include a loop in my script which finds the correlation of every possible combination of the data. This can be done manually by the following code:
clear all
%generate fake data
LName={'Name1','Name2','Name3'};
Data={rand(12,1),rand(12,1),rand(12,1)};
%place in a structure
d = [LName;Data];
Data = struct(d{:});
%find the correlation
[R,P] = corrcoef(Data.Name1,Data.Name2);
[R2,P2] = corrcoef(Data.Name1,Data.Name3);
[R3,P3] = corrcoef(Data.Name2,Data.Name3);
However, I would like to do this in a loop, I have started but have failed at the first hurdle. My attempted loop, which doesn't work is shown below:
SNames=fieldnames(Data);
for i=1:numel(SNames);
[R{i},P{i}] = corrcoef(Data.(SNames{i}),Data.(SNames{i+1}));
end
I'm struggling on knowing how to tell matlab to loop over a different combination of values with every iteration.
Any help provided would be much appreciated.
Try something like this:
pairs = combnk (1:3,2) % all combinations of 2 elements taken out of the vector [1,2,3]
for i = 1 : size (pairs,1)
[R{i},P{i}] = corrcoef(Data.(SNames{pairs(i,1)}),Data.(SNames{pairs(i,2)}));
end
@ItamarKatz's answer is a good one. However, if you don't have the Statistics Toolbox, you cannot use the combnk command.
In that case, you can download combinations generator from here.
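Another option that needs no toolbox is nchoosek, which is part of base MATLAB and also returns all combinations when given a vector. A minimal sketch using the fake data from the question (nPairs is a made-up name):

```matlab
% Fake data, as in the question
Data = struct('Name1', rand(12,1), 'Name2', rand(12,1), 'Name3', rand(12,1));
SNames = fieldnames(Data);

pairs = nchoosek(1:numel(SNames), 2);   % every pair of field indices
nPairs = size(pairs, 1);
R = cell(nPairs, 1);                    % correlation matrices, one per pair
P = cell(nPairs, 1);                    % corresponding p-values
for i = 1:nPairs
    [R{i}, P{i}] = corrcoef(Data.(SNames{pairs(i,1)}), Data.(SNames{pairs(i,2)}));
end
```

Because the pairs are built from numel(SNames), the loop keeps working if more fields are added to Data.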