I have a large file of 3D points, one point per line in the form
x(1) y(1) z(1) g(1)
...
...
x(n) y(n) z(n) g(n)
Now, due to CPU power limitations, I cannot display all of the 3D points and would like to select only a subset of them, say one fifth of the points.
If I do the following
while () {
    if (x(i) % 5 == 0) keep the 3D point
}
the result shows a zebra pattern, so it does not look nice. What algorithm do you suggest to select the best candidates for a subset of points that is most similar to the original dense point cloud?
Thank you
The language does not matter (MATLAB, Java, C, etc.); what matters is how we make a sparser version of the original.
As an alternative to the random sub-sampling mentioned in the first answer, you could try this:
Compute the bounding box of the point cloud (axis-aligned or oriented bounding box),
Choose a cell size (the bounding box now contains W x H x D cells of this size),
Hash each point of the point cloud to its respective grid cell and keep at most N points per cell (N >= 1), or, simpler, just drop or keep every Nth point within each cell. A sketch of keeping one point per cell follows below.
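A minimal MATLAB sketch of the grid approach, assuming the cloud is stored as an n-by-4 matrix P of [x y z g] rows; the cell size and the keep-one-point-per-cell policy are illustrative choices, not part of the answer above:
cellSize = 0.5;                                  % illustrative cell size
lo = min(P(:,1:3), [], 1);                       % bounding box corner
ijk = floor((P(:,1:3) - lo) / cellSize);         % integer cell coordinates per point
[~, firstIdx] = unique(ijk, 'rows');             % index of one representative per occupied cell
Psub = P(firstIdx, :);                           % the subsampled cloud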
Anything wrong with random subsampling?
f = fractionOfDataToKeep(); // between 0 (0%) and 1 (100%)
while () {
    r = rng(1.0); // pseudorandom number between 0 and 1
    if (r < f) keepThe3DPoint();
}
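In MATLAB, for example, random subsampling is a one-liner (a sketch assuming the cloud is stored as an n-by-4 matrix P of [x y z g] rows):
f = 0.2;                                  % keep roughly one fifth of the points
Psub = P(rand(size(P,1),1) < f, :);
% or, for exactly round(f*n) points, sample without replacement:
n = size(P,1);
Psub = P(randperm(n, round(f*n)), :);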
Could anyone shed some light on how this for loop can be replaced by a single command in MATLAB?
for i = 1 : size(w,3)
    x = w(:,:,i);
    w1(i,:) = x(B(i),:);
end
clear x
Here, w is a 3D (x-by-y-by-z) matrix and B (1-by-z) is a vector containing the row to pick from each layer of w. This for loop takes about 150 seconds to execute when w is 500000 layers deep. I tried using:
Q = w(B,:,:);
Q = reshape(Q(1,:),[500000,2])';
The first line creates a matrix Q of size 500000 x 2 x 500000, and MATLAB threw an out-of-memory error. Any help would be appreciated!
You are creating intermediate variables (such as x) and using a for loop. The core idea of the following approach is to first pre-populate the indices used and then use linear indexing to access all the elements at once. Then, we can reshape to get the desired result.
ind = [B(1)*ones(size(w,2),1) (1:size(w,2)).' 1*ones(size(w,2),1)];          % [row col layer] triples for layer 1
ind = [ind; [B(2)*ones(size(w,2),1) (1:size(w,2)).' 2*ones(size(w,2),1)]];   % layer 2
ind = [ind; [B(3)*ones(size(w,2),1) (1:size(w,2)).' 3*ones(size(w,2),1)]];   % layer 3
lin_ind = sub2ind(size(w), ind(:,1), ind(:,2), ind(:,3));   % subscripts -> linear indices
w1 = reshape(w(lin_ind), size(w,2), size(w,3)).';
On my system, this matches the w1 computed with the loop in your question. Note that you may need a for loop to pre-populate the indices; I wrote three explicit expressions because I was experimenting with small matrices. Actually, the first three lines can be written so that no loop is needed at all and the code works for any size. I will leave that up to you.
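For reference, one possible loop-free way to build those indices for any size (my sketch, not part of the answer above):
nc = size(w,2);                       % columns per layer
nz = size(w,3);                       % number of layers
rows = repmat(B(:).', nc, 1);         % nc-by-nz: the picked row, repeated per layer
cols = repmat((1:nc).', 1, nz);       % nc-by-nz: every column index
lays = repmat(1:nz, nc, 1);           % nc-by-nz: the layer index
lin_ind = sub2ind(size(w), rows(:), cols(:), lays(:));
w1 = reshape(w(lin_ind), nc, nz).';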
In MATLAB, I have a vector that is a 1x204 double. It represents a biological signal over a certain period of time; over that time the signal varies, sometimes peaking and sometimes remaining relatively small, close to the baseline value of 0. I need to plot the reciprocal of this data (on the x-axis) against another set of data (on the y-axis) in order to do some statistical analysis.
The problem is those points close to 0. For example, the smallest point I have is -0.00497, and 1/-0.00497 produces a value of about -201, which becomes an "outlier" while the rest of the data is very different and the values are not nearly as large. So I am trying to remove the very small values close to 0 from the data set so that they do not affect 1/value.
I know that I can use cftool to remove those points from the plot, but how do I get the vector with those points removed? Is there a way of actually removing the points? Using cftool to remove those points on the original plot, I was able to generate code and find out exactly which points they are, but I don't know how to create a vector with those points removed.
Can anyone help?
I did try using the following for loop, with total_BOLD_time_course being my signal and 1./total_BOLD_time_course what I want to plot. The problem is that the if statement sets total_BOLD_time_course(i) = 1, which is not what I want: the points still exist in the vector, just with the value 1, whereas I want them gone from the vector entirely.
for i = 1:204
    if total_BOLD_time_course(i) < 0 && total_BOLD_time_course(i) < -0.01
        total_BOLD_time_course(i) = 1;
    elseif total_BOLD_time_course(i) > 0 && total_BOLD_time_course(i) < 0.01
        total_BOLD_time_course(i) = 1;
    end
end
To remove points from an array, use the syntax
total_BOLD_time_course( abs(total_BOLD_time_course) < 0.01 ) = nan;
This makes them "blank" on the graph and ignored by further calculations, without destroying the temporal sequence of the data points.
If actually destroying timepoints is not a concern then do
total_BOLD_time_course( abs(total_BOLD_time_course) < 0.01 ) = [];
Then there'll be fewer data points, and they won't map onto any other time course you have. But the advantage is that it will "close up" the gaps in the graph.
--
PS
Note that in your code, the condition
x < 0 && x < -0.01
is redundant, because any number less than -0.01 is automatically less than 0. I believe the second comparison should be x > -0.01 (so that the branch catches small negative values), and then your code is fine.
As VHarisop suggests, you can set a threshold for outliers and exclude them. But, depending on your plot, it might be important to ensure that the remaining data are not shunted horizontally to fill the gaps. To plot 1./y as a function of x, you could either just plot(x, 1./y) and then set the y limits with ylim to exclude the outliers from view, or use NaNs:
e = 0.01;
y( abs(y) < e ) = nan;
plot( x, 1./y )
For quantitative (non-visual) statistical analysis, either remove the values entirely from y as suggested (bearing in mind that this leaves you with a shorter vector), or use statistics functions that know how to treat NaNs as missing data (nanmean, nanstd, etc.).
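For instance, assuming the Statistics Toolbox is available (in newer MATLAB versions you can use mean(x,'omitnan') instead):
m = nanmean(1./y);   % mean of the reciprocals, ignoring NaN entries
s = nanstd(1./y);    % standard deviation, likewise ignoring NaNs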
Yeah, you can. You might want to define a threshold, like e = 0.01, and cut off all vector elements whose absolute value is below e.
Example:
% assuming v is your initial vector
e = 0.01;
new_vector = v(abs(v) > e);
Alternatively, you could use the excludedata tool from the Curve Fitting Toolbox, since you know the indices of the vector elements you want to exclude.
I am attempting to create a model whereby there is a line - represented as a 1D matrix populated with 1's - and points on the line are generated at random. Every time a point is chosen (A), it creates a 'zone of exclusion' (based on an exponential function) such that choosing another point nearby has a much lower probability of occurring.
Two main questions:
(1) What is the best way to generate an exponential such that I can multiply the numbers surrounding the chosen point to create the zone of exclusion? I know of exppdf, however I'm not sure whether it allows me to create an exponential that terminates at 1, as I need the zone of exclusion to end and the probability to return to 1 eventually.
(2) How can I modify matrix values plus/minus a specific index (including that index)? I got as far as:
x(1:100) = 1; % Creates a 1D-matrix populated with 1's
p = randi([1 100],1,1);
x(p) =
But am not sure how to go about using the randomly generated number to alter values in the matrix.
Any help would be much appreciated,
Anna
Don't worry about exppdf; pick the width you want (how far away from the selected point does the probability return to 1?) and define some simple function that makes a small vector with zero in the middle and 1 at the edges. So here I'm just modifying a section of length 11 centred on p and doing nothing to the rest of x:
x(1:100) = 1;
p = randi([1 100],1,1);
% a simple quadratic dip: 0 at the chosen point, scaled to 1 at the edges
somedist = (abs(-5:5).^2)/25;
% note - this will fail if p is at the edges of the data, but see below
x(p-5:p+5) = x(p-5:p+5).*somedist;
Then, instead of using randi to pick points, you can use datasample, which allows you to give weights. In this case your "data" is just the numbers 1:100. However, to make the edges easier I'd suggest initialising with a "weight" vector that has zero padding: these sections of x will never be sampled, but they save you from having to make edge checks.
x = zeros([1 110]);
x(6:105) = 1;
somedist = (abs(-5:5).^2)/25;
nsamples = 10;
for n = 1:nsamples
    p = datasample(1:110,1,'Weights',x);
    % if required, store the chosen p somewhere
    x(p-5:p+5) = x(p-5:p+5).*somedist;
end
For an exponential exclusion zone you could do something like:
somedist = exp(abs(-5:5))/exp(5) - exp(0)/exp(5);   % 0 at the centre, about 0.993 at the edges
It doesn't quite return to 1, but fairly close. [The original answer showed a plot of the central region of x (ignoring the padding) after two separate runs.]
I've been struggling with a problem in MATLAB for a while. :)
I have an image (A.tif) in which I would like to find maxima (above a defined threshold) and, more specifically, the coordinates of those maxima. My goal is to create short profiles on the image crossing these maxima (let's say +/- 20 pixels on each side of the maximum).
I tried this:
[r c]=find(A==max(max(A)));
I suppose that r and c are the coordinates of the maximum (only the first one, or every maximum?).
How can I feed these coordinates into, for example, the improfile function?
I think it should be done using nested loops?
Thanks for every suggestion
Your code works, but it finds only the global maximum's coordinates. I would like to find multiple maxima (above a defined threshold) and properly address their coordinates to create multiple profiles crossing every maximum found. I have a little problem with the improfile function:
improfile(IMAGE, [starting point], [ending point])
Let's say I get a [rows, columns] matrix with the coordinates of each maximum, and I'm trying to create a one-direction profile that starts in the same row as the maximum (about 20 pixels before it) and of course ends in the same row (about 20 pixels after it).
Is this expression correct: improfile(IMAGE, [rows columns-20], [rows columns+20])? It plots something, but it seems to just join the maxima rather than making intensity profiles.
You're not giving enough information, so I had to guess a few things. Apply max() to the vectorized image and store the index:
[~,idx] = max(I(:))
Then transform the linear index into coordinates; note that ind2sub returns the row index (the y coordinate) first:
[iy,ix] = ind2sub(size(I),idx)
These are the x and y coordinates of the image's maximum. It really depends on what profile section you want. Something like this works:
I = imread('peppers.png');
Ir = I(:,:,1);
[~,idx] = max(Ir(:));
[iy,ix] = ind2sub(size(Ir),idx);   % iy = row (y), ix = column (x)
improfile(Ir,[1 ix],[iy iy])       % horizontal profile along row iy, ending at the maximum
EDIT:
If you instead want the k largest values, not just the single maximum, you can do a simple sort:
[~,idx] = sort(I(:),'descend');
idxk = idx(1:k);                    % linear indices of the k largest values
[iy,ix] = ind2sub(size(I),idxk);    % again: rows (y) first, then columns (x)
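To tie this to the follow-up (multiple maxima above a threshold, each with a +/-20-pixel horizontal profile), a rough sketch could look like the following; imregionalmax requires the Image Processing Toolbox, and the threshold value is only an example:
I = imread('peppers.png');
Ir = I(:,:,1);
thresh = 200;                                 % example threshold
bw = imregionalmax(Ir) & (Ir > thresh);       % local maxima above the threshold
[rows, cols] = find(bw);
figure; hold on
for k = 1:numel(rows)
    x1 = max(cols(k)-20, 1);                  % clamp the profile ends to the image
    x2 = min(cols(k)+20, size(Ir,2));
    c = improfile(Ir, [x1 x2], [rows(k) rows(k)]);  % intensities along the row
    plot(c)
end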
I'm having trouble creating a random vector V in MATLAB subject to the following set of constraints (given parameters N, D, L, and theta):
The vector V must be N units long
The elements must have an average of theta
No 2 successive elements may differ by more than +/-10
D == sum(L*cosd(V-theta))
I'm having the most problems with the last one. Any ideas?
Edit
Solutions in other languages or equation form are equally acceptable. Matlab is just a convenient prototyping tool for me, but the final algorithm will be in java.
Edit
From the comments and initial answers I want to add some clarifications and initial thoughts.
I am not seeking a "truly random" solution drawn from any standard distribution. I want a pseudo-randomly generated sequence of values that satisfies the constraints, given a parameter set.
The system I'm trying to approximate is a chain of N links of link length L where the end of the chain is D away from the other end in the direction of theta.
My initial insight here is that theta can be removed from consideration until the end, since (2) in essence adds theta to every element of a 0 mean vector V (shifting the mean to theta) and (4) simply removes that mean again. So, if you can find a solution for theta=0, the problem is solved for all theta.
As requested, here is a reasonable range of parameters (not hard constraints, but typical values):
5<N<200
3<D<150
L==1
0 < theta < 360
I would start by creating a "valid" vector. That should be possible; say, calculate it so that every entry has the same value.
Once you have that vector, apply some transformations to "shuffle" it. "Rejection sampling" is the keyword: if a shuffle would violate one of your rules, you simply don't perform it.
As transformations I come up with:
switch two entries
modify the value of one entry, and modify a second one to keep the 4th condition (theoretically you could just shuffle two entries until the condition is fulfilled, but the chance of that happening is quite low)
But maybe you can find some more.
Do this reasonably often and you get a "valid" random vector. Theoretically you should be able to reach all valid vectors; practically, you could construct several "start" vectors so it won't take that long. A sketch of the swap move follows below.
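A minimal MATLAB sketch of the swap move (the isValid helper and the tolerance are my illustrative additions; it assumes you already have a valid starting vector V). Note that a swap leaves both the mean and the cosd sum unchanged, so only the successive-difference rule can be violated:
function V = shuffleVector(V, D, L, theta, nIter)
% Rejection-sampling shuffle: propose a swap, keep it only if still valid.
for it = 1:nIter
    W = V;
    ij = randperm(numel(W), 2);
    W(ij) = W(fliplr(ij));            % move 1: switch two entries
    if isValid(W, D, L, theta)
        V = W;                        % accept; otherwise silently reject
    end
end
end

function ok = isValid(V, D, L, theta)
tol = 1e-6;                           % illustrative tolerance
ok = abs(mean(V) - theta) < tol ...
  && all(abs(diff(V)) <= 10) ...
  && abs(D - sum(L*cosd(V - theta))) < tol;
end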
Here's a way of doing it. It is clear that not all combinations of theta, N, L and D are valid. It is also clear that you're trying to simulate random objects that are quite complex. You will probably have a hard time showing anything useful with respect to these vectors.
The series you're trying to simulate seems similar to a Wiener process, so I started with that; you can start with anything that is random yet reasonable. I then use it as the starting point for an optimization that tries to satisfy constraints 2, 3 and 4. The closer your initial value is to a valid vector (satisfying all your conditions), the better the convergence.
function series = generate_series(D, L, N, theta)
% Wiener-like initial guess
s(1) = theta;
for i = 2:N
    s(i) = s(i-1) + randn(1,1);
end
f = @(x) objective(x, D, L, N, theta);
q = optimset('Display','iter','TolFun',1e-10,'MaxFunEvals',Inf,'MaxIter',Inf);
[sf, val] = fminunc(f, s, q);
val
series = sf;

function value = objective(s, D, L, N, theta)
a = abs(mean(s) - theta);                 % constraint 2: mean equals theta
b = abs(D - sum(L*cosd(s - theta)));      % constraint 4 (cosd, as in the question)
c = 0;
for i = 2:N                               % constraint 3: penalise jumps over 10
    u = abs(s(i) - s(i-1));
    if u > 10
        c = c + u;
    end
end
value = a^2 + b^2 + c^2;
It seems like you're trying to simulate something very complex/strange (a path of a given curvature?); see the questions from other commenters. Still, you will have to use your domain knowledge to connect D and L with a reasonable mu and sigma for the Wiener process to act as an initialization.
So, based on your new requirements, it seems like what you're actually looking for is an ordered list of random angles, with a maximum change of 10 degrees between successive angles (which I first convert to radians), such that the start-to-end distance and direction, the link length, and the number of links are all as specified?
Simulate an initial guess. It will not satisfy the D and theta constraints (i.e., the specified D and theta):
angles = zeros(N, 1);
for link = 2:N
    % random change of up to +/-10 degrees per link (in radians)
    angles(link) = angles(link - 1) + (2*rand() - 1)*(10*pi/180);
end
Use a genetic algorithm (or another optimization method) to adjust the angles based on the following cost function:
dx = sum(L*cos(angles));
dy = sum(L*sin(angles));
D = sqrt(dx^2 + dy^2);
theta = atan2(dy, dx);
The cost is then just the difference between the end-to-end vector given by my D and theta above and the vector given by the specified D and theta (i.e., the inputs).
You will still have to enforce the maximum-change-of-10-degrees rule; perhaps a violation should simply make the cost function enormous? There may be a cleaner way to specify sequence constraints in optimization algorithms (I don't know of one).
I feel like if you can find the right optimization with the right parameters this should be able to simulate your problem.
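As a concrete illustration, here is one possible shape for that cost function (a sketch only; the target names, the penalty weight, and the convention of degrees for theta versus radians for the link angles are my assumptions):
function cost = chainCost(angles, L, D_target, theta_target)
% End-to-end vector of the chain (angles in radians)
dx = sum(L*cos(angles));
dy = sum(L*sin(angles));
% Error against the specified end-to-end vector (theta_target in degrees)
ex = D_target*cosd(theta_target) - dx;
ey = D_target*sind(theta_target) - dy;
cost = ex^2 + ey^2;
% Enormous penalty when successive angles change by more than 10 degrees
viol = max(abs(diff(angles)) - 10*pi/180, 0);
cost = cost + 1e6*sum(viol.^2);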
You don't give us a lot of detail to work with, so I'll assume the following:
random numbers are to be drawn from [-127+theta +127-theta]
all random numbers will be drawn from a uniform distribution
all random numbers will be of type int8
Then, for the first 3 requirements, you can use this:
N = 1e4;
theta = 40;
diffVal = 10;
g = @() randi([intmin('int8')+theta intmax('int8')-theta], 'int8') + theta;
V = [g(); zeros(N-1,1, 'int8')];
for ii = 2:N
    V(ii) = g();
    while abs(V(ii)-V(ii-1)) >= diffVal
        V(ii) = g();
    end
end
Inline the anonymous function for more speed.
Now, the last requirement,
D == sum(L*cosd(V-theta))
is a bit of a strange one... cosd(V-theta) is a specific way to rescale the data to the [-1 +1] interval, which the multiplication with L then scales to [-L +L]. On first sight, you'd expect the sum to average out to 0.
However, the expected value of cosd(x) when x is a random variable drawn uniformly from [-90 +90] degrees (roughly the range of V-theta under the assumptions above) is 2/pi (see here for example); over a full period it would indeed be 0. Ignoring for the moment the fact that our limits are not exactly [-90 +90], the expected value of sum(L*cosd(V-theta)) would simply reduce to the constant value of 2*N*L/pi.
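A quick numerical sanity check of that expectation (a throwaway sketch):
x = 180*rand(1e6,1) - 90;   % uniform over [-90 +90] degrees
mean(cosd(x))               % close to 2/pi = 0.6366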
How you can force this to equal some other constant D is beyond me...can you perhaps elaborate on that a bit more?