Controlled random number/dataset generation in MATLAB

Say, I have a cube of dimensions 1x1x1 spanning between coordinates (0,0,0) and (1,1,1). I want to generate a random set of points (assume 10 points) within this cube which are somewhat uniformly distributed (i.e. within certain minimum and maximum distance from each other and also not too close to the boundaries). How do I go about this without using loops? If this is not possible using vector/matrix operations then the solution with loops will also do.
Let me provide some more background details about my problem (This will help in terms of what I exactly need and why). I want to integrate a function, F(x,y,z), inside a polyhedron. I want to do it numerically as follows:
$\int F(x,y,z)\,dV \approx \sum_{i} F(x_i,y_i,z_i) \times V_i(x_i,y_i,z_i)$
Here, $F(x_i,y_i,z_i)$ is the value of the function at point $(x_i,y_i,z_i)$ and $V_i$ is the weight. So to calculate the integral accurately, I need to identify a set of random points which are neither too close to nor too far from each other (sorry, but I don't know myself what this range is; I will be able to figure it out with a parametric study only once I have working code). Also, I need to do this for a 3D mesh which has multiple polyhedra, hence I want to avoid loops to speed things up.

Check out this nice "random vectors generator with fixed sum" file on the MATLAB File Exchange (FEX).
The code "generates m random n-element column vectors of values, [x1;x2;...;xn], each with a fixed sum, s, and subject to a restriction a<=xi<=b. The vectors are randomly and uniformly distributed in the n-1 dimensional space of solutions. This is accomplished by decomposing that space into a number of different types of simplexes (the many-dimensional generalizations of line segments, triangles, and tetrahedra.) The 'rand' function is used to distribute vectors within each simplex uniformly, and further calls on 'rand' serve to select different types of simplexes with probabilities proportional to their respective n-1 dimensional volumes. This algorithm does not perform any rejection of solutions - all are generated so as to already fit within the prescribed hypercube."

Use i=rand(3,10) where each column corresponds to one point, and each row corresponds to the coordinate in one axis (x,y,z)
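Note that rand(3,10) on its own does not enforce the spacing or boundary constraints from the question. A minimal sketch of one way to add them, assuming a boundary margin `margin` and a minimum spacing `dmin` (both are illustrative placeholders, not values from the post): draw candidates inside a slightly shrunken cube and keep only those far enough from the points already accepted. This is plain rejection sampling with a loop, which the question allows as a fallback; a maximum spacing is not enforced here.

```matlab
% Sketch: rejection sampling of points in the unit cube.
% margin and dmin are illustrative values, not from the original post.
nPts   = 10;      % number of points wanted
margin = 0.05;    % minimum distance from each face of the cube
dmin   = 0.2;     % minimum distance between any two points
maxTry = 1e5;     % guard against an infeasible dmin

P = zeros(3, 0);  % accepted points, one per column
tries = 0;
while size(P, 2) < nPts && tries < maxTry
    tries = tries + 1;
    c = margin + (1 - 2*margin) * rand(3, 1);        % candidate inside the shrunken cube
    d2 = sum((P - repmat(c, 1, size(P, 2))).^2, 1);  % squared distances to accepted points
    if all(d2 >= dmin^2)
        P(:, end+1) = c;                             % accept the candidate
    end
end
```

If the loop stops because `maxTry` was reached, `P` holds fewer than `nPts` columns, which signals that `dmin` is too large for the requested number of points.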

Related

Pearson correlation coefficient

This question of mine is not tightly related to Matlab, but is relevant to it:
I'm looking for ways to fill in the matrix [[a,b,c],[d,e,f]] in a few nontrivial ways so that as many places as possible in
corrcoef([a,b,c],[d,e,f])
are zero. My attempts yield NaN result in most cases.
Given the current comments, you are trying to understand how two series of random draws from two distributions can have zero correlation. Specifically, exercise 4.6.9 to which you refer mentions draws from two normal distributions.
An issue with your approach is that you are hoping to derive a link between a theoretical property and experimentation, in this case using Matlab. And, as you seem to have noticed, unless you are looking at specific degenerate cases, your experimentation will fail. That is because although the true correlation parameter rho in the exercise might be zero, a sample of random draws will always have some level of correlation. Here is an illustration, and as you'll notice if you run it the actual correlations span the whole spectrum between -1 and 1 despite their average being zero (as it should be since both generators are pseudo-uncorrelated):
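n = 1e4;
experiment = nan(n,1);
for i = 1:n
    % correlation matrix of two independent 4-element uniform samples
    r = corrcoef(rand(4,1), rand(4,1));
    experiment(i) = r(1,2);   % off-diagonal entry is the sample correlation
end
hist(experiment);
title(sprintf('Average correlation: %.4f', mean(experiment)));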
If you look at the definition of Pearson correlation in wikipedia, you will see that the only way this can be zero is when the numerator is zero, i.e. E[(X-Xbar)(Y-Ybar)]=0. Though this might be the case asymptotically, you will be hard-pressed to find a non-degenerate case where this will happen in a small sample. Nevertheless, to show you you can derive some such degenerate cases, let's dig a bit further. If you want the expectation of this product to be zero, you could make either the left or the right hand part zero when the other is non-zero. For one side to be zero, the draw must be exactly equal to the average of draws. Therefore we can imagine creating such a pair of variables using this technique:
we create two vectors of 4 variables, and alternate which draw will be equal to the average.
let's say we want X to average 1, and Y to average 2, and we make even-indexed draws equal to the average for X and odd-indexed draws equal to the average for Y.
one such generation would be: X=[0,1,2,1], Y=[2,0,2,4], and you can check that corrcoef([0,1,2,1],[2,0,2,4]) does in fact produce an identity matrix. This is because, every time a component of X is different than its average of 1, the component in Y is equal to its average of 2.
another example, where the average of X is 3 and that of Y is 4 is: X=[3,-5,3,11], Y=[1008,4,-1000,4]. etc.
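Both hand-built examples can be checked directly. A short sketch (the variable names are illustrative):

```matlab
% Check that the constructed samples have zero sample correlation.
X1 = [0 1 2 1];      Y1 = [2 0 2 4];
X2 = [3 -5 3 11];    Y2 = [1008 4 -1000 4];

r1 = corrcoef(X1, Y1);   % 2x2 correlation matrix
r2 = corrcoef(X2, Y2);

% The numerator of Pearson's r is the sum of products of deviations;
% by construction each product has at least one zero factor.
num1 = sum((X1 - mean(X1)) .* (Y1 - mean(Y1)));
num2 = sum((X2 - mean(X2)) .* (Y2 - mean(Y2)));
```

The off-diagonal entries `r1(1,2)` and `r2(1,2)` are the sample correlations, and `num1` and `num2` are the numerators that the construction forces to zero.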
If you wanted to know how to create samples from non-correlated distributions altogether, that would be an entirely different question, though (perhaps) more interesting in terms of understanding statistics. If this is your case, and given the exercise you mention discusses normal distributions, I would suggest you take a look at generating antithetic variables using the Box-Muller transform.
Happy randomizing!

Matlab: zero groups of non-zero elements in a matrix based on group size

Essentially I have binary, 3D image masks with the "1"'s in them in groups of various shapes and sizes spread throughout the mask. Working in matlab, I've got tools that allow me to convert this into a matrix, and what I'm looking to do is go through the matrix and zero blobs of 1's (i.e. adjacent sets of non-zero numbers which are surrounded by 0's) if the total size of that group is less than a given number of elements (say 30). Is there a pre-existing function that will do this, or am I going to need to get involved with kernels and the like?
As an aside, I'm still fairly new to Matlab so would really appreciate any answers given being written in a "for dummies" kind of style! Many thanks in advance for any help.
Fortunately, Matlab has a function for that: bwareaopen
maskWithOnlyBigObjects = bwareaopen(mask, 30);
This will eliminate all connected groups of 1's that are smaller than 30 1's. Note that by default, bwareaopen uses 26-connectedness, i.e. two 1's belong together if one is among the 26 possible neighbors in 3D, even if only corners are touching. If two 1's can only belong together if the faces of the voxels touch, use
maskWithOnlyBigObjects = bwareaopen(mask, 30, 6);

How can I generate a set of n dimensional vectors that contains all integer points in an n-dimensional rectangular prism

Okay, so I'm working on a problem related to quantum chaos and one of the things I need to do is to map the unit cube in n-dimensions to a parallelepiped in n-dimensions and find all integer points in the interior of this parallelepiped. I have been trying to do this using the following scheme:
1. Given the linear map B and the dimension of the cube n, we find the coordinates of the corners of the unit hypercube by converting numbers j from 0 to (2^n - 1) into their binary representation and turning them into vectors that describe the vertices of the cube.
2. The next step is to apply the map B to each of these vectors, which gives me a set of 2^n vectors describing the coordinates of the vertices of the parallelepiped in n dimensions.
3. Now, we take the maximum and minimum value attained by any of these vertices in each coordinate direction, i.e. the first element of my vectors might have a maximum value of 4 across all of the vertices and a minimum value of -3, etc. This gives me an n-dimensional rectangular prism that contains my parallelepiped and some extra unwanted space.
4. I now find all points with integer coordinates in this bounding rectangular prism, described as vectors in n dimensions.
5. Finally, I apply the inverse of the map B to each of the points and throw away any point that has a coefficient less than 0 or greater than 1, as it must originally have lain outside my unit hypercube.
My issue arises in step 4: I'm struggling to come up with a way of generating all vectors with integer coordinates in my rectangular hyper-prism such that I can change the number of dimensions n on the fly. Ideally, I'd like to be able to increase n at will until it becomes too computationally heavy to do so, but every method of finding all integer points in the prism I've tried so far has relied on n for loops to permute each element, and thus I need to rewrite the code every time.
So I guess my question is this: is there any way to code this up so that I can change n on the fly? Also, any thoughts on the idea of the algorithm itself would be appreciated :) It wouldn't surprise me if I've overcomplicated things massively...
EDIT:
Of course as soon as I post the question I see a lovely little link in the side-bar where a clever method has been given already for how to do this: Generate a matrix containing all combinations of elements taken from n vectors
I'll leave this up for the moment just in case anyone has any comments on the method in general, but otherwise (since I can't upvote yet I'll just say it here) Luis Mendo, you are a hero!
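For readers who want the whole scheme in one place, here is a minimal sketch of the five steps for arbitrary n. It uses ndgrid with cell-array comma-separated lists for step 4, in the spirit of the linked answer. The matrix `B` and the round-off tolerance `tol` are assumptions for illustration, and points on the boundary of the parallelepiped are kept; tighten the final comparison if only strictly interior points are wanted.

```matlab
% Sketch: all integer points inside the image of the unit cube under B,
% for arbitrary dimension n. B is an example matrix, not from the post.
B = [2 1; -1 3];           % example invertible linear map (n = 2)
n = size(B, 1);

% Steps 1-2: vertices of the unit hypercube (columns) and their images.
V  = double(dec2bin(0:2^n-1, n) - '0').';   % n x 2^n matrix of 0/1 corners
BV = B * V;

% Step 3: integer bounding box of the parallelepiped.
lo = ceil(min(BV, [], 2));
hi = floor(max(BV, [], 2));

% Step 4: all integer points in the box, without hard-coding n loops.
ranges = arrayfun(@(a, b) a:b, lo, hi, 'UniformOutput', false);
grids  = cell(1, n);
[grids{:}] = ndgrid(ranges{:});
pts = cell2mat(cellfun(@(g) g(:).', grids(:), 'UniformOutput', false));  % n x N

% Step 5: map back and keep points whose cube coordinates lie in [0, 1].
u = B \ pts;
tol = 1e-9;                                  % tolerance for round-off
inside = all(u >= -tol & u <= 1 + tol, 1);
latticePts = pts(:, inside);                 % one integer point per column
```

The bounding box holds prod(hi - lo + 1) candidate points, so memory grows quickly with n; that is the practical limit on how far n can be pushed with this approach.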

Using triplequad to calculate density (in Matlab)

As I explained in a previous question, I have a dataset consisting of a large semi-random collection of points in three-dimensional Euclidean space. In this collection of points, I am trying to find the point that is closest to the area with the highest density of points.
As High Performance Mark answered:
the most straightforward thing to do would be to divide your subset of
Euclidean space into lots of little unit volumes (voxels) and count
how many points there are in each one. The voxel with the most points
is where the density of points is at its highest. Perhaps initially
dividing your space into 2 x 2 x 2 voxels, then choosing the voxel
with most points and sub-dividing that in turn until your criteria are
satisfied.
Mark suggested I use triplequad for this, but this is not a function I am familiar with or understand very well. Does anyone have any pointers on how I could go about using this function in Matlab for what I am trying to do?
For example, say I have a random normally distributed matrix A = randn([300,300,300]); how could I use triplequad to find the point I am looking for? As I currently understand it, I also have to provide triplequad with a function fun when using it. Which function should that be for this problem?
Here's an answer which doesn't use triplequad.
For the purposes of exposition I define an array of data like this:
A = rand([30,3])*10;
which gives me 30 points uniformly distributed in the box (0:10,0:10,0:10). Note that in this explanation a point in 3D space is represented by each row in A. Now define a 3D array for the counts of points in each voxel:
counts = zeros(10,10,10);
Here I've chosen to have a 10x10x10 array of voxels, but this is just for convenience, it would be only a little more difficult to have chosen some other number of voxels in each dimension, and there don't have to be the same number of voxels along each axis. Then the code
for ix = 1:size(A,1)
    v = ceil(A(ix,:));   % voxel indices of point ix along each axis
    counts(v(1),v(2),v(3)) = counts(v(1),v(2),v(3)) + 1;
end
will count up the number of points in each of the voxels in counts.
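The same counting can be done without a loop. A minimal sketch using accumarray, which also picks out the densest voxel; the names `nVox`, `subs` and `voxelCentre` are illustrative, and ties between voxels are resolved by taking the first maximum:

```matlab
% Sketch: vectorized voxel counting with accumarray.
A = rand(30, 3) * 10;                 % 30 points in the box [0,10]^3, one per row
nVox = [10 10 10];                    % number of unit voxels along each axis

subs = max(ceil(A), 1);               % voxel subscripts; guards the edge case A == 0
counts = accumarray(subs, 1, nVox);   % counts(i,j,k) = number of points in voxel (i,j,k)

[maxCount, linIdx] = max(counts(:));  % densest voxel (first one if there are ties)
[ix, iy, iz] = ind2sub(nVox, linIdx);
voxelCentre = [ix iy iz] - 0.5;       % centre of that unit voxel
```

From here, the data point closest to `voxelCentre` (or the subdivision step suggested in the quoted answer) can be computed on the points that fall in that voxel.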
EDIT
Unfortunately I have to do some work this afternoon and won't be able to get back to wrestling with the triplequad solution until later. Hope this is OK in the meantime.

Area of Union of n circles (MATLAB code)

I am trying to calculate the area of the union of n circles in a plane, where all circles have equal radii and the centers of all n circles are known. I was trying to follow the set-theory approach (the inclusion-exclusion principle), where we know the formula for the union of n sets. I was using an operator Ar() which gives the area, i.e. Ar(A) gives me the area of A. I first tried to find out which circles intersect which other circle(s) by testing D<2R (D = distance between the centers of the two circles); then I tried to calculate the pairwise areas of intersection and from them the area of the union. But I am getting stuck for n>4. Can anyone provide a solution to this? (A solution by the set-theory approach is necessary.) Thanks in advance.
If your problem were just for pairs of circles, you could use the known result about circle-circle intersection areas. The formula for the pairwise area between any two circles, based on a standard parameterization of all circles involved, is given there. But as n gets large, the formulas for these areas are not commonly known. There might be a clever way to use recursion to compute the formulas for the intersection of two circles (n=2), the intersection of two asymmetric lens shapes (n=3), the intersection of two instances of whatever shape is the intersection of two asymmetric lens shapes (n=4), and so on. If you can derive formulas for those areas, you can always use inclusion-exclusion to figure out the area of the union. The key insight is that the intersection of n instances of the previous shape is really the intersection of n-1 instances of intersections-of-previous-shape. But as the commenter above has said, that question really belongs on Math Overflow.
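For the equal-radius case in the question, the pairwise result is compact: two circles of radius R whose centers are a distance d < 2R apart overlap in a lens of area 2R^2 acos(d/(2R)) - (d/2) sqrt(4R^2 - d^2). A minimal sketch for n = 2, with example values for the radius and centers (these numbers are assumptions, not from the post):

```matlab
% Sketch: area of the intersection (lens) and union of two circles of equal radius.
R  = 1.0;         % common radius (example value)
c1 = [0 0];       % centre of circle 1 (example value)
c2 = [1 0];       % centre of circle 2 (example value)

d = norm(c1 - c2);                       % distance between centres
if d >= 2*R
    lensArea = 0;                        % circles do not overlap
else
    lensArea = 2*R^2*acos(d/(2*R)) - (d/2)*sqrt(4*R^2 - d^2);
end
unionArea = 2*pi*R^2 - lensArea;         % inclusion-exclusion for n = 2
```

For three or more mutually overlapping circles, inclusion-exclusion also needs the triple and higher-order intersection areas, which this sketch does not attempt.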
Practical Aside
For anyone reading who is interested in a practical way to solve this problem, Monte Carlo integration is an excellent choice. All you need to do is compute a large rectangle that bounds all of the circles, and then draw points uniformly in that rectangle. For each circle, check if the point is inside or outside. If it is ever inside, then increment a counter and break out of doing any more checks. At the end, the proportion of that counter to the total points drawn, multiplied by the area of the rectangle, will give the area.
If we assume that for each n-wise intersection area, we need to do n different O(1) steps (assuming we get an analytical formula that we can just plug the radii and center data straight into, which might be optimistic), then this analytical method is still O(n^2).
Monte Carlo is worse, O(Mn) where M is the number of points we draw, if we make the pessimistic assumption that we have to check against all n circles for every point. For moderate n, while M won't need to be intractably large, it certainly won't be close to n. However, we get the added benefit that our function automatically generalizes to the case of mixed radii (for which the general solution is much harder). From a practitioner's point of view, the analytical solution here is not very useful unless the circles barely overlap and the bounding rectangle contains a large amount of empty space.
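The Monte Carlo procedure from the practical aside can be sketched as follows. The centers, radii and sample count `M` are example values chosen for illustration. Instead of breaking out of a per-point loop, the sketch loops over circles and tests all points at once, which does the same counting in vectorized form.

```matlab
% Sketch: Monte Carlo estimate of the area of a union of circles.
C = [0 0; 1 0; 0.5 0.8];   % circle centres, one per row (example values)
R = [1; 1; 1];             % radii; may differ per circle
M = 2e5;                   % number of random points

% Bounding rectangle of all circles.
xmin = min(C(:,1) - R);  xmax = max(C(:,1) + R);
ymin = min(C(:,2) - R);  ymax = max(C(:,2) + R);
rectArea = (xmax - xmin) * (ymax - ymin);

% Uniform points in the rectangle.
px = xmin + (xmax - xmin) * rand(M, 1);
py = ymin + (ymax - ymin) * rand(M, 1);

% A point counts once if it lies inside at least one circle.
hit = false(M, 1);
for k = 1:size(C, 1)
    hit = hit | ((px - C(k,1)).^2 + (py - C(k,2)).^2 <= R(k)^2);
end
unionArea = rectArea * mean(hit);   % fraction of hits times rectangle area
```

The statistical error of the estimate shrinks roughly like 1/sqrt(M), so halving the error costs about four times as many points.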