I want to generate 100 vectors, each of size 1x7. I currently have the following code, but when I plot the result it seems too linearly spaced. Is there a way to achieve a similar result, only rougher?
P = randi([7 12],100,7)'/10.* repmat(randn(1,7),100,1)';
You may use a different distribution for the randomizing part; randi uses a uniform distribution. You can also use the rng function to control random number generation. There are different generators, such as:
'twister' Mersenne Twister
'combRecursive' Combined Multiple Recursive
'multFibonacci' Multiplicative Lagged Fibonacci
As an example:
rng('shuffle')   % seed the generator from the current time
rng(1);          % fixed seed 1
A = rand(2,2);
rng(2);          % fixed seed 2
B = rand(2,2);
A and B differ because their seeds differ, and rng('shuffle') reseeds from the current time, so it produces different numbers on every run.
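To actually select one of the generators listed above, rng also accepts a generator name along with the seed, for example:
rng(1,'twister');        % Mersenne Twister, seed 1
A = rand(2,2);
rng(1,'combRecursive');  % Combined Multiple Recursive, same seed
B = rand(2,2);
Here A and B differ because the underlying generators differ, even though the seed is the same.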
See the documentation of rng for more info.
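Coming back to the original question about roughness: a minimal sketch, assuming the goal is simply a less regular spread, is to draw the scaling factors from a continuous uniform distribution over [0.7, 1.2] instead of the grid-valued randi(...)/10:
% continuous scaling factors in [0.7, 1.2] instead of {0.7, 0.8, ..., 1.2}
P = (0.7 + 0.5*rand(100,7))' .* repmat(randn(1,7),100,1)';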
I have a support (supp_epsilon) and a probability mass function (pr_mass_epsilon) in Matlab, constructed as follows.
supp_epsilon=[0.005 0.01 0.015 0.02];
suppsize_epsilon=size(supp_epsilon,2);
pr_mass_epsilon=zeros(suppsize_epsilon,1);
alpha=1;
beta=4;
for j=1:suppsize_epsilon
    pr_mass_epsilon(j) = betacdf(supp_epsilon(j),alpha,beta)/sum(betacdf(supp_epsilon,alpha,beta));
end
Note that the components of pr_mass_epsilon sum up to 1. Now, I want to draw n random numbers from pr_mass_epsilon. How can I do this? I would like a code that works for any suppsize_epsilon.
In other words: I want to randomly draw elements from supp_epsilon, each element with a probability given by pr_mass_epsilon.
Using the Statistics Toolbox
The randsample function can do that directly:
result = randsample(supp_epsilon, n, true, pr_mass_epsilon);
Without using toolboxes
Manual approach:
Generate n samples of a uniform random variable in the interval (0,1).
Compare each sample with the distribution function (cumulative sum of mass function).
See in which interval of the distribution function each uniform sample lies.
Index into the array of possible values.
result = supp_epsilon(sum(rand(1,n)>cumsum(pr_mass_epsilon(:)), 1)+1);
For your example, with n=1e6 either of the two approaches gives a histogram similar to this:
histogram(result, 'normalization', 'probability')
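If the one-liner above is hard to parse, here is an equivalent step-by-step sketch using the variables defined in the question (same logic, just unpacked):
n = 1e6;                              % number of draws
u = rand(1, n);                       % uniform samples in (0,1)
edges = cumsum(pr_mass_epsilon(:));   % CDF values; the last entry is 1
idx = sum(u > edges, 1) + 1;          % which CDF interval each sample falls in
result = supp_epsilon(idx);           % map interval indices to support values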
I have a question about how to calculate weighted correlations for matrices. Based on Wikipedia, I have created the three following pieces of code:
1. Weighted mean
function y = weighted_mean(x,w)
% Weighted mean of vector x with weights w (assumed to have the same length as x).
n = length(x);
total = 0.0;
total_weight = 0.0;
for i = 1:n
    total = total + x(i)*w(i);
    total_weight = total_weight + w(i);
end
y = total/total_weight;
end
2. Weighted covariance
function result = cov_weighted(x,y,w)
% Weighted covariance of vectors x and y with weights w.
n = length(x);
mx = weighted_mean(x,w);   % compute the weighted means once, outside the loop
my = weighted_mean(y,w);
sum_covar = 0.0;
sum_weight = 0.0;
for i = 1:n
    sum_covar = sum_covar + w(i)*(x(i)-mx)*(y(i)-my);
    sum_weight = sum_weight + w(i);
end
result = sum_covar/sum_weight;
end
and finally:
3. Weighted correlation
function corr_weight = weighted_correlation(x,y,w)
% Weighted correlation coefficient of vectors x and y with weights w.
corr_weight = cov_weighted(x,y,w)/sqrt(cov_weighted(x,x,w)*cov_weighted(y,y,w));
end
Now I want to apply the weighted correlation method to matrices, related to this link:
http://www.mathworks.com/matlabcentral/fileexchange/20846-weighted-correlation-matrix/content/weightedcorrs.m
I did not understand how to apply it, which is why I wrote these functions myself, but I need them for the case where the inputs are matrices. Thanks very much.
@dato-datuashvili Maybe I am providing too much information...
1) I would like to stress that the evaluation of weighted correlation matrices is very uncommon. This is because you have to provide the weights beforehand, and unless you have a clear reason to choose particular weights, there is no obvious way to provide them.
How can you tell that a measurement of your sample is more or less important than another measurement?
Having said that, the weights are up to you! You have to choose them!
So people usually consider just the correlation matrix (no weights, or all weights the same, e.g. w_i = 1).
If you do have a clear way to choose good weights, just ignore this part.
2) I understand that you want to test your code. In order to do that, you need correlated random variables. How do you generate them?
Multivariate normal distributions are the simplest case. See the Wikipedia page about them: Multivariate Normal Distribution (in particular the item "Drawing values from the distribution", which shows how to generate random numbers from this distribution using the Cholesky decomposition). The bivariate case is much simpler; see for instance Generate Correlated Normal Random Variables.
The good news is that if you are using Matlab there is a function for this. See Matlab: Random numbers from the multivariate normal distribution.
In order to use this function, you have to provide the desired means and covariances. [Note that you are playing the role of nature here: you are generating the data! In real life you will apply your function to real data, so this step is only useful for tests. Furthermore, pay attention to the fact that in the Matlab function you provide the covariance matrix and then evaluate the correlations (covariances normalized by the standard deviations). In the 2-dimensional case (which is the case of your function) it is possible to provide the correlation directly; see the Math.StackExchange page I linked above.]
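For completeness, a minimal sketch of the Cholesky approach mentioned above, without any toolbox (the means and covariance below are just example values):
mu    = [2 3];                    % example mean vector
Sigma = [1 1.5; 1.5 3];           % example covariance matrix
n     = 100;
L     = chol(Sigma, 'lower');     % Cholesky factor, so Sigma = L*L'
Z     = randn(n, 2);              % independent standard normal samples
XY    = Z*L' + repmat(mu, n, 1);  % rows are correlated normal samples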
3) Finally, you can apply them to your function. Generate X and Y from a multivariate normal distribution, provide the vector of weights w to your function weighted_correlation, and you are done!
I hope this provides what you need!
Daniel
Update:
% From the Matlab documentation page
mu = [2 3];
SIGMA = [1 1.5; 1.5 3];
n = 100;
XY = mvnrnd(mu,SIGMA,n);   % n-by-2 matrix of correlated normal samples
x = XY(:,1);
y = XY(:,2);
% Using your code
w = ones(n,1);
corr_weight = weighted_correlation(x,y,w);
% Remember that SIGMA is a covariance matrix while corr_weight is a correlation.
% To compare against SIGMA directly, use cov_weighted(x,y,w) instead.
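For matrix inputs, as the question asks, one possible column-wise extension is sketched below; this is only a sketch (the function name weighted_corr_matrix is made up here), not the code from the linked weightedcorrs file:
function R = weighted_corr_matrix(X, w)
% X: n-by-p data matrix (columns are variables), w: n-by-1 weight vector.
% Returns the p-by-p weighted correlation matrix.
w  = w(:)/sum(w);                            % normalize the weights
mu = w'*X;                                   % 1-by-p weighted column means
Xc = X - repmat(mu, size(X,1), 1);           % center each column
C  = Xc' * (Xc .* repmat(w, 1, size(X,2)));  % weighted covariance matrix
d  = sqrt(diag(C));
R  = C ./ (d*d');                            % normalize to correlations
end
With X = [x y] and w = ones(n,1), the off-diagonal entry of R should match the corr_weight computed above.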
How can I randomly and fairly select some data from a dataset in Matlab?
When we use the randperm function to select data, is the selection random and fair?
As you already suggested, selecting k uniformly random rows out of n can be done with randperm, assuming you don't want duplicates.
Example:
dataSet = rand(1000,4);
idx = randperm(size(dataSet,1),10)
dataSet(idx,:)
If you have the Statistics Toolbox, you can use randsample:
sample = randsample(data,k);
This takes k values sampled uniformly at random, without replacement, from the values in the vector data. See the linked documentation for other options.
Equivalent code with randperm:
ind = randperm(numel(data));
sample = data(ind(1:k));
Yes, either of these approaches gives random samples, and yes, they are fair. I assume that by "fair" you mean "uniform": each entry of data is picked with the same probability.
Anything that uses a uniform distribution is "fair", because the output is distributed uniformly at random over a specific range; for example, the rand function in Matlab.
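As a quick empirical sanity check of that uniformity (a sketch with made-up sizes), you can count how often each row index is selected over many repetitions:
nRows = 10; k = 3; nTrials = 1e5;
counts = zeros(nRows, 1);
for t = 1:nTrials
    idx = randperm(nRows, k);        % pick k distinct row indices
    counts(idx) = counts(idx) + 1;
end
counts/nTrials                       % each entry should be close to k/nRows = 0.3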
I was wondering if it is possible to generate a random distribution that is a function of a certain parameter. In other words, in MATLAB, if I type rand(1,5) I get 5 uniformly distributed random numbers between 0 and 1. Is it possible to have this result as a function of a certain parameter? Do you know any algorithm for that? I just need this over an interval; I don't need a 2D representation.
I think you want to do this:
http://en.wikipedia.org/wiki/Inverse_transform_sampling
In MATLAB it's quite straightforward: you simply specify the transforming function.
n = 10000; % number of random draws
r = rand(n, 1); % generate uniform random numbers
f = @norminv; % specify transforming function
tr = f(r); % transformed numbers, now normally distributed
hist(tr, 30) % plot histogram
This example is a bit contrived, since we could simply have used randn. But the method holds generally.
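For a less contrived case, here is a sketch for the exponential distribution, whose inverse CDF has a simple closed form, so no toolbox function is needed (mu is an assumed example mean):
n  = 10000;
mu = 2;                  % example mean of the exponential distribution
r  = rand(n, 1);         % uniform random numbers in (0,1)
tr = -mu*log(1 - r);     % inverse CDF of the exponential: F^{-1}(u) = -mu*log(1-u)
hist(tr, 30)             % histogram should show an exponential shape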
If you have the Statistics Toolbox and you want to sample from one of the common distributions, take a look at the random number generators available there (see the linked documentation).
Is there a statistical difference between generating a series of paths for a Monte Carlo simulation using the following two methods (note that by "path" I mean a vector of 350 points, normally distributed):
A)
for path = 1:300000
    Zn(path, :) = randn(1, 350);
end
or the far more efficient B)
Zn = randn(300000, 350);
I just want to be sure there is no funny added correlation or dependence between the rows in method B that isn't present in method A. Like maybe method B distributes normally over 2 dimensions where A is over 1 dimension, so maybe that makes the two statistically different?
If there is a difference, then I need to know the same for uniform distributions (i.e. rand instead of randn).
Just to add to the answer of @natan (+1), run the following code:
% Store the seed
Rng1 = rng;
% Get a matrix of random numbers
X = rand(3, 3);
% Restore the seed
rng(Rng1);
% Get a matrix of random numbers one vector at a time
Y = nan(3, 3);
for n = 1:3
    Y(:, n) = rand(3, 1);
end
% Test for differences
if any(any(X - Y ~= 0)); disp('Error'); end;
You'll note that there is no difference between X and Y. That is, there is no difference between building a matrix in one step, and building a matrix from a sequence of vectors.
However, there is a difference between my code and yours. Note that I am populating the matrix by columns, not rows, since when rand is used to construct a matrix in one step, it populates by column. By the way, I'm not sure if you realize it, but as a general rule you should always try to perform vector operations on the columns of matrices, not the rows. I explained why in a response to another question on SO the other day; see here for more...
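As a rough, machine-dependent illustration of that column rule (purely a sketch):
A = randn(5000, 5000);
tic; for k = 1:size(A,2), s = sum(A(:,k)); end; tCols = toc;   % column access
tic; for k = 1:size(A,1), s = sum(A(k,:)); end; tRows = toc;   % row access
fprintf('columns: %.3f s, rows: %.3f s\n', tCols, tRows);
% Column access is typically faster because MATLAB stores arrays column-major.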
Regarding the question of independence/dependence, one needs to be careful with the language one uses. The sequence of numbers generated by rand is perfectly deterministic, hence dependent. For the vast majority of statistical tests it will appear to be independent; nonetheless, in theory one could construct a statistical test that would demonstrate the dependence within a sequence of numbers generated by rand.
Final thought, if you have a copy of Greene's "Econometric Analysis", he gives a neat discussion of random number generation in section 17.2.
As far as base R's random number generator is concerned, there also doesn't appear to be any difference between generating a sequence of random numbers all at once and generating them one by one. Thus, @Colin T Bowers' (+1) suggested behavior above also holds in R. Below is an R version of Colin's code:
#set seed
set.seed(1234)
# generate a sequence of 10,000 random numbers at once
X<-rnorm(10000)
# reset the seed
set.seed(1234)
# create a vector of 10,000 zeros
Y<-rep(0,times=10000)
# generate a sequence of 10,000 random numbers, one at a time
for (i in 1:10000){
Y[i]<-rnorm(1)
}
# Test for differences
if(any(X-Y!=0)){print("Error")}