So I was trying to spread one matrix elements, which were generated with poissrnd, to another with using some bigger (wider?) probability function (for example 100 different possibilities with different weights) to plot both of them and see if the fluctuations after spread went down. After seeing it doesn't work right (fluctuations got bigger) I tried to identify what I did wrong on a really simple example. After testing it for a really long time I still can't understand what's wrong. The example goes like this:
I generate vector with poissrnd and vector for spreading (filled with zeros at the start)
Each element from the poiss vector tells me how many numbers (0.1 of the element value) to generate from the possible options which are: [1,2,3] with corresponding weights [0.2,0.5,0.2]
I spread what I got to my another vector on 3 elements: the corresponding (k-th one), one bofore the corresponding one and one after the corresponding one (so for example if k=3, the elements should be spread like this: most should go into 3rd element of another vector, and rest should go to 2nd and 1st element)
Plot both 0.1*poiss vector and vector after spreading to compare if fluctuations went down
The way I generate weighted numbers is from this thread: Weighted random numbers in MATLAB
and this is the code I'm using:
clear all
clc
eta=0.1;
N=200;
fot=10000000;
ix=linspace(-100,100,N);
mn =poissrnd(fot/N, 1, N);
dataw=zeros(1,N);
a=1:3;
w=[.25,.5,.25];
for k=1:N
[~,R] = histc(rand(1,eta*mn(1,k)),cumsum([0;w(:)./sum(w)]));
R = a(R);
przydz=histc(R,a);
if (k>1) && (k<N)
dataw(1,k)=dataw(1,k)+przydz(1,2);
dataw(1,k-1)=dataw(1,k-1)+przydz(1,1);
dataw(1,k+1)=dataw(1,k+1)+przydz(1,3);
elseif k==1
dataw(1,k)=dataw(1,k)+przydz(1,2);
dataw(1,N)=dataw(1,N)+przydz(1,1);
dataw(1,k+1)=dataw(1,k+1)+przydz(1,3);
else
dataw(1,k)=dataw(1,k)+przydz(1,2);
dataw(1,k-1)=dataw(1,k-1)+przydz(1,1);
dataw(1,1)=dataw(1,1)+przydz(1,3);
end
end
plot(ix,eta*mn,'g',ix,dataw,'r')
The fluctuations are still bigger, and I can't identify what's wrong... Is the method for generating weighted numbers wrong in this situation? Cause it doesn't seem so. The way I'm accumulating data from the first vector seems fine too. Is there another way I could do it (so I could then optimize it for using 'bigger' probability functions)?
Sorry for my terrible English.
[EDIT]:
Here is simple pic to show what I meant (I hope it's understandable)
How about trying negative binomial distribution? It is often used as a hyper-dispersed analogue of Poisson distribution. Additional links can be found in this paper, as well as some apparatus in supplement.
Related
This question of mine is not tightly related to Matlab, but is relevant to it:
I'm looking how to fill in the matrix [[a,b,c],[d,e,f]] in a few nontrivial ways so that as many places as possible in
corrcoef([a,b,c],[d,e,f])
are zero. My attempts yield NaN result in most cases.
Given the current comments, you are trying to understand how two series of random draws from two distributions can have zero correlation. Specifically, exercise 4.6.9 to which you refer mentions draws from two normal distributions.
An issue with your approach is that you are hoping to derive a link between a theoretical property and experimentation, in this case using Matlab. And, as you seem to have noticed, unless you are looking at specific degenerate cases, your experimentation will fail. That is because although the true correlation parameter rho in the exercise might be zero, a sample of random draws will always have some level of correlation. Here is an illustration, and as you'll notice if you run it the actual correlations span the whole spectrum between -1 and 1 despite their average being zero (as it should be since both generators are pseudo-uncorrelated):
n=1e4;
experiment = nan(n,1);
for i=1:n
r = corrcoef(rand(4,1),rand(4,1));
experiment(i)=r(2);
end
hist(experiment);
title(sprintf('Average correlation: %.4f%%',mean(experiment)));
If you look at the definition of Pearson correlation in wikipedia, you will see that the only way this can be zero is when the numerator is zero, i.e. E[(X-Xbar)(Y-Ybar)]=0. Though this might be the case asymptotically, you will be hard-pressed to find a non-degenerate case where this will happen in a small sample. Nevertheless, to show you you can derive some such degenerate cases, let's dig a bit further. If you want the expectation of this product to be zero, you could make either the left or the right hand part zero when the other is non-zero. For one side to be zero, the draw must be exactly equal to the average of draws. Therefore we can imagine creating such a pair of variables using this technique:
we create two vectors of 4 variables, and alternate which draw will be equal to the average.
let's say we want X to average 1, and Y to average 2, and we make even-indexed draws equal to the average for X and odd-indexed draws equal to the average for Y.
one such generation would be: X=[0,1,2,1], Y=[2,0,2,4], and you can check that corrcoef([0,1,2,1],[2,0,2,4]) does in fact produce an identity matrix. This is because, every time a component of X is different than its average of 1, the component in Y is equal to its average of 2.
another example, where the average of X is 3 and that of Y is 4 is: X=[3,-5,3,11], Y=[1008,4,-1000,4]. etc.
If you wanted to know how to create samples from non-correlated distributions altogether, that would be and entirely different question, though (perhaps) more interesting in terms of understanding statistics. If this is your case, and given the exercise you mention discusses normal distributions, I would suggest you take a look at generating antithetic variables using the Box-Muller transform.
Happy randomizing!
I have a non-negative function f defined on a unit square S = [0,1] x [0,1] such that
My question is, how can I use MATLAB to generate a 2D random vector from S according to the probability density function f?
Rejection Sampling
The suggestion Luis Mendo made is very good because it applies to nearly all distribution functions. Based on this answer I wrote code for m.
An important point when using rejection sampling this way is that you must know the maximum of your pdf within the range. If you over-estimate the maximum your code will only run slower. If you under-estimate it it will create wrong numbers!
The idea is that you sample many uniform distributed points and accept depending on the probability density for the points.
pdf=#(x).5.*x(:,1)+3./2.*x(:,2);
maximum=2; %Right maximum for THIS EXAMPLE.
%If you are unable to determine the maximum of your
%function within the [0,1]x[0,1] range, please give an example.
result=[];
n=10;
while (size(result,1)<n)
%1. sample random point:
val=rand(1,2);
%2. Accept with probability pdf(val)/maximum
if rand<pdf(val)/maximum
%append to solution
result(end+1,:)=val;
end
end
I know that this solution is not a fast implementation, but I wanted to start with an implementation as simple as possible to make sure that the concept of rejection sampling becomes clear.
ICDF
Besides rejection sampling there is a different approach to address this issue on a more mathematical level, but you need to sit down and do some math first to end up with a better solution. For 1 dimensional distributions you typically sample using the ICDF (inverted cumulative density function) function simply using ICDF(rand(n,1)) to get random samples.
If you manage to do the math, you could instead for your PDF function define two functions ICDF1 (ICDF for the first dimension) and ICDF2 (ICDF for the second dimension) in matlab.
The first ICDF1 would map unifrom random distributed samples to sample values for the first dimension of your random distribution.
The second ICDF2 would map the output if ICDF1 and uniform distributed samples to your intended solution.
Here is some matlab code assuming you already defined ICDF1 and ICDF2
samples=ICDF1(rand(n,1));
samples(:,2)=ICDF2(samples,rand(n,1));
The great advantage of this solution is, that it does not reject any samples, being potentially much faster.
Okay, so I'm working on a problem related to quantum chaos and one of the things I need to do is to map the unit cube in n-dimensions to a parallelepiped in n-dimensions and find all integer points in the interior of this parallelepiped. I have been trying to do this using the following scheme:
Given the linear map B and the dimension of the cube n, we find the coordinates of the corners of the unit hypercube by converting numbers j from 0 to (2^n -1) into their binary representation and turning them into vectors that describe the vertices of the cube.
The next step was to apply the map B to each of these vectors, which gives me a set of 2^n vectors describing the coordinates of the vertices of the parallelepiped in n dimensions
Now, we take the maximum and minimum value attained by any of these vertices in each coordinate direction, i.e the first element of my vectors might have a maximum value of 4 across all of the vertices and a minimum value of -3 etc. This gives me an n-dimensional rectangular prism that contains my parallelepiped and some extra unwanted space.
I now find all points with integer coordinates in this bounding rectangular prism described as vectors in n dimensions
Finally, I apply the inverse of the map B to each of the points and throw away any points that have any coefficients greater than 1 as they must originally have lain outside my unit hypercube.
My issue arises in step 4, I'm struggling to come up with a way of generating all vectors with integer coordinates in my rectangular hyper-prism such that I can change the number of dimensions n on the fly. Ideally, i'd like to be able to increase n at will until it becomes too computationally heavy to do so, but every method of finding all integer points in the prism i've tried so far has relied on n for loops to permute each element and thus I need to rewrite the code every time.
So I guess my question is this, is there any way to code this up so that I can change n on the fly? Also, any thoughts on the idea of the algorithm itself would be appreciated :) It wouldn't surprise me if i've overcomplicated things massively...
EDIT:
Of course as soon as I post the question I see a lovely little link in the side-bar where a clever method has been given already for how to do this: Generate a matrix containing all combinations of elements taken from n vectors
I'll leave this up for the moment just in case anyone has any comments on the method in general, but otherwise (since I can't upvote yet I'll just say it here) Luis Mendo, you are a hero!
Say, I have a cube of dimensions 1x1x1 spanning between coordinates (0,0,0) and (1,1,1). I want to generate a random set of points (assume 10 points) within this cube which are somewhat uniformly distributed (i.e. within certain minimum and maximum distance from each other and also not too close to the boundaries). How do I go about this without using loops? If this is not possible using vector/matrix operations then the solution with loops will also do.
Let me provide some more background details about my problem (This will help in terms of what I exactly need and why). I want to integrate a function, F(x,y,z), inside a polyhedron. I want to do it numerically as follows:
$F(x,y,z) = \sum_{i} F(x_i,y_i,z_i) \times V_i(x_i,y_i,z_i)$
Here, $F(x_i,y_i,z_i)$ is the value of function at point $(x_i,y_i,z_i)$ and $V_i$ is the weight. So to calculate the integral accurately, I need to identify set of random points which are not too close to each other or not too far from each other (Sorry but I myself don't know what this range is. I will be able to figure this out using parametric study only after I have a working code). Also, I need to do this for a 3D mesh which has multiple polyhedrons, hence I want to avoid loops to speed things out.
Check out this nice random vectors generator with fixed sum FEX file.
The code "generates m random n-element column vectors of values, [x1;x2;...;xn], each with a fixed sum, s, and subject to a restriction a<=xi<=b. The vectors are randomly and uniformly distributed in the n-1 dimensional space of solutions. This is accomplished by decomposing that space into a number of different types of simplexes (the many-dimensional generalizations of line segments, triangles, and tetrahedra.) The 'rand' function is used to distribute vectors within each simplex uniformly, and further calls on 'rand' serve to select different types of simplexes with probabilities proportional to their respective n-1 dimensional volumes. This algorithm does not perform any rejection of solutions - all are generated so as to already fit within the prescribed hypercube."
Use i=rand(3,10) where each column corresponds to one point, and each row corresponds to the coordinate in one axis (x,y,z)
As i've explained in a previous question: I have a dataset consisting of a large semi-random collection of points in three dimensional euclidian space. In this collection of points, i am trying to find the point that is closest to the area with the highest density of points.
As high performance mark answered;
the most straightforward thing to do would be to divide your subset of
Euclidean space into lots of little unit volumes (voxels) and count
how many points there are in each one. The voxel with the most points
is where the density of points is at its highest. Perhaps initially
dividing your space into 2 x 2 x 2 voxels, then choosing the voxel
with most points and sub-dividing that in turn until your criteria are
satisfied.
Mark suggested i use triplequad for this, but this is not a function i am familiar with, or understand very well. Does anyone have any pointers on how i could go about using this function in Matlab for what i am trying to do?
For example, say i have a random normally distributed matrix A = randn([300,300,300]), how could i use triplequad to find the point i am looking for? Because as i understand currently, i also have to provide triplequad with a function fun when using it. Which function should that be for this problem?
Here's an answer which doesn't use triplequad.
For the purposes of exposition I define an array of data like this:
A = rand([30,3])*10;
which gives me 30 points uniformly distributed in the box (0:10,0:10,0:10). Note that in this explanation a point in 3D space is represented by each row in A. Now define a 3D array for the counts of points in each voxel:
counts = zeros(10,10,10)
Here I've chosen to have a 10x10x10 array of voxels, but this is just for convenience, it would be only a little more difficult to have chosen some other number of voxels in each dimension, and there don't have to be the same number of voxels along each axis. Then the code
for ix = 1:size(A,1)
counts(ceil(A(ix,1)),ceil(A(ix,2)),ceil(A(ix,3))) = counts(ceil(A(ix,1)),ceil(A(ix,2)),ceil(A(ix,3)))+1
end
will count up the number of points in each of the voxels in counts.
EDIT
Unfortunately I have to do some work this afternoon and won't be able to get back to wrestling with the triplequad solution until later. Hope this is OK in the meantime.