Generate 3-D random points with a minimum distance between each of them - MATLAB

Hi there.
I want to generate 10^6 random points in MATLAB with these particular characteristics:
the points should be inside a sphere of radius 25; they are 3-D, so we have x, y, z or r, theta, phi.
there is a minimum distance between each pair of points.
First, I decided to generate points and then check the distances, omitting points that do not satisfy the condition. But this may discard many of the points.
Another way is to use RSA (Random Sequential Addition), which means generating points one by one, each at least the minimum distance from all points placed so far. For example, generate the first point, then generate the second randomly at more than the minimum distance from point 1, and go on until reaching 10^6 points.
But this takes a lot of time and I cannot reach 10^6 points, since the search for an appropriate position for each new point takes longer and longer.
Right now I am using this program:
Nmax = 10000;
R = 25;
P = rand(1,3); % first point
k = 1;         % number of accepted points
i = 1;         % number of attempts (this was used without being initialized)
while k < Nmax
    theta = 2*pi*rand(1);
    phi = pi*rand(1);
    r = R*sqrt(rand(1));
    % convert to cartesian
    x = r.*sin(theta).*cos(phi);
    y = r.*sin(theta).*sin(phi);
    z = r.*cos(theta);
    P1 = [x y z];
    r = sqrt(x^2 + y^2 + z^2); % distance from the center
    D = pdist2(P1, P, 'euclidean'); % euclidean distances to all accepted points
    if D > 0.146*r^(2/3) % true only when ALL distances exceed the threshold
        P = [P; P1];
        k = k+1;
    end
    i = i+1;
end
x=P(:,1);y=P(:,2);z=P(:,3); plot3(x,y,z,'.');
How can I efficiently generate points under these conditions?
Thank you.

I took a closer look at your algorithm, and concluded there is NO WAY it will ever work - at least not if you really want to get a million points in that sphere. There is a simple picture that explains why not - this is a plot of the number of points that you need to test (using your technique of RSA) to get one additional "good" point. As you can see, this goes asymptotic at just a few thousand points (I ran a slightly faster algorithm against 200k points to produce this):
I don't know if you ever tried to compute the theoretical number of points you could fit in your sphere when you have them perfectly arranged, but I'm beginning to suspect the number is a good deal smaller than 1E6.
The complete code I used to investigate this, plus the output it generated, can be found here. I never got as far as the technique I described in my earlier answer... there was just too much else going on in the setup you described.
EDIT:
I started to think it might not be possible, even with a "perfect" arrangement, to get to 1M points. I made a simple model for myself as follows:
Imagine you start on the "outer shell" (r = 25), and try to fit points at equal distances. If you divide the area of the shell by the area of one "exclusion disk" (of radius r_crit), you get a (high) estimate of the number of points at that distance:
numpoints = 4*pi*r^2 / (pi*(0.146 * r^(2/3))^2) ~ 188 * r^(2/3)
(at r = 25 that comes to about 1604 points - the first line of the output below). The next shell in should be at a radius that is 0.146*r^(2/3) smaller - but if you think of the points as being very carefully arranged, you might be able to get a tiny bit closer. Again, let's be generous and say the shells can be just 1/sqrt(3) closer than the criterion. You can then start at the outer shell and work your way in, using a simple Python script:
import numpy as np

r = 25
npts = 0

def rc(r):
    return 0.146*np.power(r, 2./3.)

while r > rc(r):
    morePts = np.floor(4/(0.146*0.146)*np.power(r, 2./3.))
    npts = npts + morePts
    print(morePts, 'more points at r =', r)
    r = r - rc(r)/np.sqrt(3)

print('total number of points fitted in sphere:', npts)
The output of this is:
1604.0 more points at r = 25
1573.0 more points at r = 24.2793037966
1542.0 more points at r = 23.5725257555
1512.0 more points at r = 22.8795314897
1482.0 more points at r = 22.2001865995
1452.0 more points at r = 21.5343566722
1422.0 more points at r = 20.8819072818
1393.0 more points at r = 20.2427039885
1364.0 more points at r = 19.6166123391
1336.0 more points at r = 19.0034978659
1308.0 more points at r = 18.4032260869
1280.0 more points at r = 17.8156625053
1252.0 more points at r = 17.2406726094
1224.0 more points at r = 16.6781218719
1197.0 more points at r = 16.1278757499
1171.0 more points at r = 15.5897996844
1144.0 more points at r = 15.0637590998
1118.0 more points at r = 14.549619404
1092.0 more points at r = 14.0472459873
1066.0 more points at r = 13.5565042228
1041.0 more points at r = 13.0772594652
1016.0 more points at r = 12.6093770509
991.0 more points at r = 12.1527222975
967.0 more points at r = 11.707160503
943.0 more points at r = 11.2725569457
919.0 more points at r = 10.8487768835
896.0 more points at r = 10.4356855535
872.0 more points at r = 10.0331481711
850.0 more points at r = 9.64102993012
827.0 more points at r = 9.25919600154
805.0 more points at r = 8.88751153329
783.0 more points at r = 8.52584164948
761.0 more points at r = 8.17405144976
740.0 more points at r = 7.83200600865
718.0 more points at r = 7.49957037478
698.0 more points at r = 7.17660957023
677.0 more points at r = 6.86298858965
657.0 more points at r = 6.55857239952
637.0 more points at r = 6.26322593726
618.0 more points at r = 5.97681411037
598.0 more points at r = 5.69920179546
579.0 more points at r = 5.43025383729
561.0 more points at r = 5.16983504778
542.0 more points at r = 4.91781020487
524.0 more points at r = 4.67404405146
506.0 more points at r = 4.43840129415
489.0 more points at r = 4.21074660206
472.0 more points at r = 3.9909446055
455.0 more points at r = 3.77885989456
438.0 more points at r = 3.57435701766
422.0 more points at r = 3.37730048004
406.0 more points at r = 3.1875547421
390.0 more points at r = 3.00498421767
375.0 more points at r = 2.82945327223
360.0 more points at r = 2.66082622092
345.0 more points at r = 2.49896732654
331.0 more points at r = 2.34374079733
316.0 more points at r = 2.19501078464
303.0 more points at r = 2.05264138052
289.0 more points at r = 1.91649661498
276.0 more points at r = 1.78644045325
263.0 more points at r = 1.66233679273
250.0 more points at r = 1.54404945973
238.0 more points at r = 1.43144220603
226.0 more points at r = 1.32437870508
214.0 more points at r = 1.22272254805
203.0 more points at r = 1.1263372394
192.0 more points at r = 1.03508619218
181.0 more points at r = 0.94883272297
170.0 more points at r = 0.867440046252
160.0 more points at r = 0.790771268402
150.0 more points at r = 0.718689381062
140.0 more points at r = 0.65105725389
131.0 more points at r = 0.587737626612
122.0 more points at r = 0.528593100237
113.0 more points at r = 0.473486127367
105.0 more points at r = 0.422279001431
97.0 more points at r = 0.374833844693
89.0 more points at r = 0.331012594847
82.0 more points at r = 0.290676989951
75.0 more points at r = 0.253688551418
68.0 more points at r = 0.219908564725
61.0 more points at r = 0.189198057381
55.0 more points at r = 0.161417773651
49.0 more points at r = 0.136428145311
44.0 more points at r = 0.114089257597
38.0 more points at r = 0.0942608092113
33.0 more points at r = 0.0768020649149
29.0 more points at r = 0.0615717987589
24.0 more points at r = 0.0484282253244
20.0 more points at r = 0.0372289153633
17.0 more points at r = 0.0278306908104
13.0 more points at r = 0.0200894920319
10.0 more points at r = 0.013860207063
8.0 more points at r = 0.00899644813842
5.0 more points at r = 0.00535025545232
total number of points fitted in sphere: 55600.0
This seems to confirm that you really can't get to a million, no matter how you try...

There are many things you could do to improve your program - both the algorithm and the code.
On the code side, one of the things that is REALLY slowing you down is the fact that not only do you use a loop (which is slow), but in the line
P = [P;P1];
you append elements to an array. Every time that happens, Matlab needs to find a new place to put the data, copying all the points in the process. This quickly becomes very slow. Preallocating the array with
P = zeros(1000000, 3);
keeping track of the number N of points you have found so far, and changing your calculation of distance to
D = pdist2(P1, P(1:N, :), 'euclidean');
would at least address that...
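Put together, the accept step of your loop would look something like this (a sketch reusing your variable names, with Nmax set to 1e6):
P = zeros(1000000, 3); % preallocate once
P(1,:) = rand(1,3);    % first point, as in your code
N = 1;                 % number of points found so far
while N < Nmax
    % ... generate a candidate point P1 and its radius r as before ...
    D = pdist2(P1, P(1:N,:), 'euclidean');
    if all(D > 0.146*r^(2/3))
        N = N + 1;
        P(N,:) = P1;   % write in place instead of growing the array
    end
end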
The other issue is that you check new points against all previously found points - so when you have 100 points, you check about 100x100 pairs; for 1000 it is 1000x1000. You can see that this algorithm is at least O(N^3)... not counting the fact that you will get more "misses" as the density goes up. An O(N^3) algorithm with N=1E6 takes at least 1E18 cycles; if you had a 4 GHz machine doing one comparison per clock cycle, you would need 2.5E8 seconds - approximately 8 years. You can try parallel processing, but that's just brute force - who wants that?
I recommend that you think about breaking the problem into smaller pieces (quite literally): for example, if you divide your sphere into little boxes that are about the size of your largest exclusion distance, and for each box you keep track of which points are in it, then you only need to check against the points in "this" box and its immediate neighbors - 27 boxes in all. If your boxes are 0.5 units across, you would have 100x100x100 = 1M boxes. That seems like a lot, but now your computation time will be reduced drastically, as you will have (by the end of the algorithm) only about 1 point per box on average... Of course with the distance criterion you are using, you will have more points near the center, but that's a detail.
The data structure you would need would be a cell array of 100x100x100, where each cell contains the indices of the good points found so far "in that cell". The problem with a cell array is that it doesn't lend itself to vectorization. If instead you have the memory, you could allocate it as a 4-D array of 10x100x100x100, assuming you will have no more than 10 points per cell (if you exceed that, you will have to handle it separately; work with me here...). Use an index of -1 for slots not yet filled.
Then your check would be something like this:
% initializing:
bigList = zeros(10,102,102,102)-1; % -1 marks empty slots; one box of padding per side, so we avoid hitting the edge
NP = zeros(102, 102, 102); % track # valid points in each box
bottomcorner = [-25.5, -25.5, -25.5]; % boxes span from -25.5 to +25.5
cellSize = 0.5; % note: for the 27-box check to suffice, this should be at least the largest exclusion distance
.
% in your loop:
P1 = [x, y, z];
cellCoords = ceil((P1 - bottomcorner)/cellSize); % box subscripts, measured from the bottom corner
pointsSoFar = bigList(:, cellCoords(1)+(-1:1), cellCoords(2)+(-1:1), cellCoords(3)+(-1:1));
pointsToCheck = pointsSoFar(pointsSoFar > 0); % indices of points in the 27 nearby boxes - this is where the big gains come...
r = sqrt(sum(P1.^2)); % distance from the center
D = pdist2(P1, P(pointsToCheck, :), 'euclidean'); % euclidean distance
if all(D > 0.146*r^(2/3))
    P(k,:) = P1;
    cci = sub2ind([102 102 102], cellCoords(1), cellCoords(2), cellCoords(3)); % linear index of this box
    NP(cci) = NP(cci) + 1; % increase the number of points in this box
    % you want to handle the case where this > 10!!!
    bigList(NP(cci), cci) = k;
    k = k + 1;
end
....
....
I don't know if you can take it from here; if you can't, say so in the comments and I may have some time this weekend to code this up in more detail. There are ways to speed it up further with some vectorization, but it quickly becomes hard to manage.
I think that putting a larger number of points randomly in space, and then using the above for a giant vectorized culling, may be the way to go. But I recommend taking small steps first... if you can get the above to work at all well, you can then optimize further (array size, etc).

I found the reference - "Simulated Brain Tumor Growth Dynamics Using a Three-Dimensional Cellular Automaton", Kansal et al. (2000).
I agree it is puzzling - until you realize one important thing. They report their results in mm, but your code was written in cm. While that may seem insignificant, the formula for the "critical radius", rc = 0.146*r^(2/3), includes a constant, 0.146, that is dimensional - its units are mm^(1/3), not cm^(1/3).
When I make that change in my python code to evaluate the number of possible lattice sites, it jumps by a factor of 10 (working in mm shrinks the critical radius by a factor of 10^(1/3) ~ 2.15, so each shell holds about 4.6x more points and there are about 2.15x more shells). Now, they claimed they were using a "jamming limit" of 0.38 - the density at which you really cannot find any more sites. If you include that limit, I predict no more than 200k points could be found - still short of their 1.5M, but not quite so crazy.
You might consider contacting the authors to discuss this with them? If you want to include me in the conversation, you can email me at: SO (just two letters) at my handle name dot united states. Same domain as where I posted links above...

Related

Finding the longest linear section of non-linear plot in MATLAB

Apologies for the long post, but this takes a bit to explain. I'm trying to make a script that finds the longest linear portion of a plot. The sample data is in a csv file here; it is stress and strain data for calculating the shear modulus of 3D-printed samples. The code I have so far is the following:
x_data = [];
y_data = [];
x_data = Data(:,1);
y_data = Data(:,2);
plot(x_data,y_data);
grid on;
answer1 = questdlg('Would you like to load last attempt''s numbers?');
switch answer1
    case 'Yes'
        [sim_slopes,reg_data] = regr_and_longest_part(new_x_data,new_y_data,str2num(answer2{3}),str2num(answer2{2}),K);
    case 'No'
        disp('Take a look at the plot, find a range estimate, and press any button to continue');
        pause;
        prompt = {'Eliminate values ABOVE this x-value:','Eliminate values BELOW this x-value:','Size of divisions on x-axis:','Factor for similarity of slopes:'};
        dlg_title = 'Point elimination';
        num_lines = 1;
        defaultans = {'0','0','0','0.1'};
        if ~isempty(answer2)
            defaultans = {answer2{1},answer2{2},answer2{3},answer2{4}};
        end
        answer2 = inputdlg(prompt,dlg_title,num_lines,defaultans);
        uv_of_x_range = str2num(answer2{1});
        lv_of_x_range = str2num(answer2{2});
        x_div_size = str2num(answer2{3});
        K = str2num(answer2{4});
        close all;
        iB = find(x_data > str2num(answer2{1}),1,'first');
        iS = find(x_data > str2num(answer2{2}),1,'first');
        new_x_data = x_data(iS:iB);
        new_y_data = y_data(iS:iB);
        [sim_slopes, reg_data] = regr_and_longest_part(new_x_data,new_y_data,str2num(answer2{3}),str2num(answer2{2}),K);
end
[longest_section0, Midx] = max(sim_slopes(:,4)-sim_slopes(:,3));
longest_section = 1+longest_section0;
long_sec_x_data_start = x_div_size*(sim_slopes(Midx,3)-1)+lv_of_x_range;
long_sec_x_data_end = x_div_size*(sim_slopes(Midx,4)-1)+lv_of_x_range;
long_sec_x_data_start_idx = find(new_x_data >= long_sec_x_data_start,1,'first');
long_sec_x_data_end_idx = find(new_x_data >= long_sec_x_data_end,1,'first');
long_sec_x_data = new_x_data(long_sec_x_data_start_idx:long_sec_x_data_end_idx);
long_sec_y_data = new_y_data(long_sec_x_data_start_idx:long_sec_x_data_end_idx);
[b_long_sec, longes_section_reg_data] = robustfit(long_sec_x_data,long_sec_y_data);
plot(long_sec_x_data,b_long_sec(1)+b_long_sec(2)*long_sec_x_data,'LineWidth',3,'LineStyle',':','Color','k');
function [sim_slopes,reg_data] = regr_and_longest_part(x_points,y_points,x_div,lv,K)
    reg_data = cell(1,3);
    scatter(x_points,y_points,'.');
    grid on;
    hold on;
    uv = lv+x_div;
    ii = 0;
    while lv <= x_points(end)
        if uv > x_points(end)
            uv = x_points(end);
        end
        ii = ii+1;
        indices = find(x_points>lv & x_points<uv);
        temp_x_points = x_points(indices);
        temp_y_points = y_points(indices);
        if length(temp_x_points) <= 2
            break;
        end
        [b,stats] = robustfit(temp_x_points,temp_y_points);
        reg_data{ii,1} = b(1);
        reg_data{ii,2} = b(2);
        reg_data{ii,3} = length(indices);
        plot(temp_x_points,b(1)+b(2)*temp_x_points,'LineWidth',2);
        lv = lv+x_div;
        uv = lv+x_div;
    end
    sim_slopes = NaN(length(reg_data),4);
    sim_slopes(1,:) = [reg_data{1,1},0,1,1];
    idx = 1;
    for ii = 2:length(reg_data)
        coff = sim_slopes(idx,1);
        if abs(reg_data{ii,1}-coff) <= K*coff
            C = zeros(ii-sim_slopes(idx,3)+1,1);
            for kk = sim_slopes(idx,3):ii
                C(kk) = reg_data{kk,1};
            end
            sim_slopes(idx,1) = mean(C);
            sim_slopes(idx,2) = std(C);
            sim_slopes(idx,4) = ii;
        else
            idx = idx + 1;
            sim_slopes(idx,1) = reg_data{ii,1};
            sim_slopes(idx,2) = 0;
            sim_slopes(idx,3) = ii;
            sim_slopes(idx,4) = ii;
        end
    end
end
Apologies for the code not being well optimized; I'm still relatively new to MATLAB. I did not use derivatives because my data is relatively noisy and differentiation might have made it worse.
I've managed to get the code to find the longest straight part of the plot by splitting the data up into sections of width x_div_size, then performing a robustfit on each section, the results of which are written into reg_data. The code then runs through reg_data and finds which lines have the most similar slopes, determined by the K factor, by calculating the average of the slopes in a section of the plot, and makes a note of it in sim_slopes. It then finds the longest interval with max(sim_slopes(:,4)-sim_slopes(:,3)) and performs a regression on it to give the final answer.
The problem is that it will only consider the first straight portion that it comes across. When the data is plotted, it has a few parts where it seems straightest:
As an example, when I run the script with answer2 = {'0.2','0','0.0038','0.3'} I get the following, where the black line is the straightest part found by the code:
I have the following questions:
1. It's clear that from about x = 0.04 to x = 0.2 there is a long straight part, and I'm not sure why the script is not finding it. Playing around with different values, the script always seems to pick the first long straight part, ignoring subsequent ones.
2. MATLAB complains "Warning: Iteration limit reached." because there are more than 50 regressions to perform. Is there a way to bypass this limit in robustfit?
3. When generating sim_slopes, there may be a section of the plot whose slope is too different from the average of the previous slopes, so it gets marked as the end of a long section. But that section is sometimes sandwiched between several other sections that do have similar slopes. How could I tell the script to ignore one wayward section and continue as if it falls within the tolerance allowed by the K value?
Take a look at the Douglas-Peucker algorithm. If you think of your (x,y) values as the vertices of an (open) polygon, this algorithm will simplify it for you, such that the largest distance from the simplified polygon to the original is smaller than some threshold you can choose. The simplified polygon will be a set of straight line segments; find the two consecutive vertices that are furthest apart, and you're done.
MATLAB has an implementation in the Mapping Toolbox called reducem. You might also find an implementation on the File Exchange (but be careful, there is also really bad code on there). Or, you can roll your own, it's quite a simple algorithm.
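For example, with the Mapping Toolbox something along these lines should work (a sketch, untested against your data; the tolerance is a placeholder you would need to tune, and x_data/y_data are the vectors from the question):
tol = 0.01; % max allowed deviation of the simplified polygon from the data
[y_red, x_red] = reducem(y_data(:), x_data(:), tol); % reducem takes (lat,lon)-style arguments
[~, i] = max(hypot(diff(x_red), diff(y_red))); % longest surviving segment
fprintf('longest linear section: x = %g to x = %g\n', x_red(i), x_red(i+1));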
You can also try using the ischange function to detect changes in the intercept and slope of the data, and then extract the longest portion from that.
Using the sample data you provided, here is what I see from a basic attempt:
>> T = readtable('Data.csv');
>> T = rmmissing(T); % Remove rows with NaN
>> T = groupsummary(T,'Var1','mean'); % Average duplicate timestamps
>> [tf,slopes,intercepts] = ischange(T.mean_Var2, 'linear', 'SamplePoints', T.Var1); % find changes
>> plot(T.Var1, T.mean_Var2, T.Var1, slopes.*T.Var1 + intercepts)
which generates the plot
You should be able to extract the longest segment based on the indices given by find(tf).
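For example, something like this should pull out the longest run between change points (a sketch, assuming T and tf from above):
edges = [1; find(tf); height(T)+1]; % boundaries between the fitted linear segments
[~, i] = max(diff(edges));          % longest run between change points
seg = edges(i) : edges(i+1)-1;      % row indices of that segment
plot(T.Var1(seg), T.mean_Var2(seg), 'k', 'LineWidth', 2);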
You can also tune the parameters of ischange to get fewer or more segments. Adding the name-value pair 'MaxNumChanges' with a value of 4 or 5 produces more linear segments with a tighter fit to the curve, for example, which effectively removes the kink in the plot that you see.

Approximation of cosh and sinh functions that give large values in MATLAB

My calculation involves cosh(x) and sinh(x), where x gets to be around 700-1000; this overflows double precision (cosh(710) is already Inf), so the result is NaN. The problem in the code is that the argument 2*k_B*T/elastic_restor_coeff blows up when radius is small (below 5e-9 in the code), since elastic_restor_coeff scales as radius^3. My goal is to do another integral over a radius distribution from 1e-9 to 100e-9, which is still a work in progress because I am stuck at this problem.
My workaround right now is to approximate the real part of chi_para with a step function once threshold2 reaches a value of about 300. The number 300 was obtained by taking the lowest possible radius and reading the cut-off value from the plot. I don't think this approach is good enough for the actual calculation, since that value changes with radius, so I am looking for a better approximation method. Also, the imaginary part of chi_para is difficult to approximate, since it looks like a pulse instead of a step.
Here is my code without an integration over a radius distribution.
k_B = 1.38e-23;
T = 296;
radius = [5e-9, 10e-9, 20e-9, 30e-9, 100e-9];
fric_coeff = 8*pi*1e-3.*radius.^3;
elastic_restor_coeff = 8*pi*1.*radius.^3;
time_const = fric_coeff./elastic_restor_coeff;
omega_ar = logspace(-6,6,60);
chi_para = zeros(1,length(omega_ar));
chi_perpen = zeros(1,length(omega_ar));
threshold = zeros(1,length(omega_ar));
threshold2 = zeros(1,length(omega_ar));
for i = 1:length(radius)
    for k = 1:length(omega_ar)
        omega = omega_ar(k);
        fric_coeff = 8*pi*1e-3.*radius(i).^3;
        elastic_restor_coeff = 8*pi*1.*radius(i).^3;
        time_const = fric_coeff./elastic_restor_coeff;
        G_para_func = @(t) ((cosh(2*k_B*T./elastic_restor_coeff.*exp(-t./time_const))-1).*exp(1i.*omega.*t))./(cosh(2*k_B*T./elastic_restor_coeff)-1);
        G_perpen_func = @(t) ((sinh(2*k_B*T./elastic_restor_coeff.*exp(-t./time_const))).*exp(1i.*omega.*t))./(sinh(2*k_B*T./elastic_restor_coeff));
        chi_para(k) = (1 + 1i*omega*integral(G_para_func, 0, inf));
        chi_perpen(k) = (1 + 1i*omega*integral(G_perpen_func, 0, inf));
        threshold(k) = 2*k_B*T./elastic_restor_coeff*omega;
        threshold2(k) = 2*k_B*T./elastic_restor_coeff*(omega*time_const - 1);
    end
    figure(1);
    semilogx(omega_ar,real(chi_para),omega_ar,imag(chi_para));
    hold on;
    figure(2);
    semilogx(omega_ar,real(chi_perpen),omega_ar,imag(chi_perpen));
    hold on;
end
Here is the simplified function that I would like to approximate (shown as an image in the original post), where x is iterated in a loop and the maximum value of x is about 700.
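As a note on the overflow itself: since y = 2*k_B*T/elastic_restor_coeff*exp(-t/time_const) never exceeds a = 2*k_B*T/elastic_restor_coeff, each ratio can be rescaled by multiplying its numerator and denominator by 2*exp(-a), after which every exponential has a non-positive argument. A minimal sketch with the variable names from the code above:
a = 2*k_B*T./elastic_restor_coeff; % the large argument (~700-1000 for small radii)
% (cosh(y)-1)/(cosh(a)-1), rescaled by 2*exp(-a) in numerator and denominator:
G_para_func = @(t) (exp(a.*exp(-t./time_const) - a) + exp(-a.*exp(-t./time_const) - a) - 2*exp(-a)) ./ (1 + exp(-2*a) - 2*exp(-a)) .* exp(1i.*omega.*t);
% sinh(y)/sinh(a), rescaled the same way:
G_perpen_func = @(t) (exp(a.*exp(-t./time_const) - a) - exp(-a.*exp(-t./time_const) - a)) ./ (1 - exp(-2*a)) .* exp(1i.*omega.*t);
These are algebraically identical to the originals but never evaluate cosh or sinh of a large argument, so the NaNs disappear.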

Summing Values based on Area in Matlab

I'm trying to write code in MATLAB to calculate an area-of-influence type quantity. This is an excerpt from my data (weighting, x-coord, y-coord):
M =
15072.00 486.00 -292
13269.00 486.00 -292
12843.00 414.00 -267
10969.00 496.00 -287
9907.00 411.00 -274
9718.00 440.00 -265
9233.00 446.00 -253
9138.00 462.00 -275
8830.00 496.00 -257
8632.00 432.00 -253
R =
-13891.00 452.00 -398
-13471.00 461.00 -356
-12035.00 492.00 -329
-11309.00 413.00 -353
-11079.00 467.00 -375
-10659.00 493.00 -333
-10643.00 495.00 -338
-10121.00 455.00 -346
-9795.00 456.00 -367
-8927.00 485.00 -361
-8765.00 467.00 -351
I want to make a function that calculates the sum of the weightings at any given position, based on a circle of influence of radius 30 around each coordinate.
I have thought of using a for loop to calculate each point independently and summing the results, but that seems unnecessarily complicated and inefficient.
I also thought of assigning a colour intensity to each circle and overlaying them, but I don't know how to change colour intensity based on value. Here is my attempt so far (I would like to have a visual of the result):
function [] = Influence()
    M = xlsread('MR.xlsx','A4:C310');
    R = xlsread('MR.xlsx','E4:G368');
    % these are my values, around 300 coordinates
    % M are negative values and R positive; I want to see which are dominant in their regions
    hold on
    scatter(M(:,2),M(:,3),3000,'b','filled')
    scatter(R(:,2),R(:,3),3000,'y','filled')
    % had to use a marker size of 3000 for some reason, as it isn't correlated to the graph size
    axis([350 650 -450 -200])
    hold off
end
I'd appreciate any ideas/solutions. Thank you!
This is the same plot, but with ca. 2000 data points:
How about this:
r_influence = 30; % radius of influence
r = @(p,A) sqrt((p(1)-A(:,2)).^2 + (p(2)-A(:,3)).^2); % distances from point p to all rows of A
wsum = @(p,A) sum(A(r(p,A) <= r_influence, 1)); % sum of weightings where distance is within the radius
% compute the sum on a grid
xrange = linspace(350,550,201);
yrange = linspace(-200,-450,201);
[XX,YY] = meshgrid(xrange,yrange);
map_M = arrayfun(@(p1,p2) wsum([p1,p2],M), XX, YY);
map_R = arrayfun(@(p1,p2) wsum([p1,p2],R), XX, YY);
figure(1);
clf;
imagesc(xrange,yrange,map_M + map_R);
colorbar;
Gives a picture like this:
Is that what you are looking for?
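To query a single location rather than the whole grid, the same helper can be used directly; for example, with the first coordinate from M above:
total = wsum([486, -292], M) % summed weighting within a radius of 30 of (486, -292)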

Reverse-calculating original data from a known moving average

I'm trying to estimate the (unknown) original datapoints that went into calculating a (known) moving average. However, I do know some of the original datapoints, and I'm not sure how to use that information.
I am using the method given in the answers here: https://stats.stackexchange.com/questions/67907/extract-data-points-from-moving-average, but in MATLAB (my code below). This method works quite well for large numbers of data points (>1000), but less well with fewer data points, as you'd expect.
window = 3;
datapoints = 150;
data = 3*rand(1,datapoints)+50;
moving_averages = [];
for i = window:size(data,2)
    moving_averages(i) = mean(data(i+1-window:i));
end
len = size(moving_averages,2)+(window-1); % renamed from "length", which shadows the built-in function
a = (tril(ones(len,len),window-1) - tril(ones(len,len),-1))/window;
a = a(1:len-(window-1),:);
ai = pinv(a);
daily = mtimes(ai,moving_averages');
x = 1:size(data,2);
figure(1)
hold on
plot(x,data,'Color','b');
plot(x(window:end),moving_averages(window:end),'Linewidth',2,'Color','r');
plot(x,daily(window:end),'Color','g');
hold off
axis([0 size(x,2) min(daily(window:end))-1 max(daily(window:end))+1])
legend('original data','moving average','back-calculated')
Now, say I know a smattering of the original data points. I'm having trouble figuring out how I might use that information to more accurately calculate the rest. Thank you for any assistance.
You should be able to calculate the original data exactly if, at any point, you can determine n-1 consecutive samples within a window of length n - the moving average supplies the nth equation. (In your case) if you know A, B and (A+B+C)/3, you can solve for C. Then when you have (B+C+D)/3 (your next moving average), you can solve exactly for D. Rinse and repeat. The same logic works going backwards, too.
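In the notation of the question's code, that recursion is just a few lines (a sketch assuming the known points happen to be the first window-1 samples):
recovered = zeros(1, datapoints);
recovered(1:window-1) = data(1:window-1); % the known original points
for i = window:datapoints
    % moving_averages(i) = mean(data(i+1-window:i)), so the newest sample is:
    recovered(i) = window*moving_averages(i) - sum(recovered(i+1-window:i-1));
end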
Here is an example with the same idea:
% the actual vector of values
a = cumsum(rand(150,1) - 0.5);
% compute moving average
win = 3; % sliding window length
idx = hankel(1:win, win:numel(a));
m = mean(a(idx));
% coefficient matrix: m(i) = sum(a(i:i+win-1))/win
A = repmat([ones(1,win) zeros(1,numel(a)-win)], numel(a)-win+1, 1);
for i = 2:size(A,1)
    A(i,:) = circshift(A(i-1,:), [0 1]);
end
A = A / win;
% solve linear system
%x = A \ m(:);
x = pinv(A) * m(:);
% plot and compare
subplot(211), plot(1:numel(a),a, 1:numel(m),m)
legend({'original','moving average'})
title(sprintf('length = %d, window = %d',numel(a),win))
subplot(212), plot(1:numel(a),a, 1:numel(a),x)
legend({'original','reconstructed'})
title(sprintf('error = %f',norm(x(:)-a(:))))
You can see the reconstruction error is very small, even using the data sizes in your example (150 samples with a 3-samples moving average).

Matlab Code to distribute points on plot

I have edited some code that I found online that helps me draw points distributed on a graph with a minimum distance between them.
This is the code that I have so far:
x(1) = rand(1)*1000; % random coordinates of the first point
y(1) = rand(1)*1000;
minAllowableDistance = 30; % IF THIS IS TOO BIG, THE LOOP DOES NOT END
numberOfPoints = 300; % number of points, equivalent to the number of sites
keeperX = x(1); % initialize first point
keeperY = y(1);
counter = 2;
for k = 2 : numberOfPoints % drop another point, and check if it can be positioned
    done = 0;
    trial_counter = 1;
    while (done ~= 1)
        x(k) = rand(1)*1000;
        y(k) = rand(1)*1000;
        thisX = x(k); % get a trial point
        thisY = y(k);
        % see how far it is away from existing keeper points
        distances = sqrt((thisX-keeperX).^2 + (thisY-keeperY).^2);
        minDistance = min(distances);
        if minDistance >= minAllowableDistance
            keeperX(k) = thisX;
            keeperY(k) = thisY;
            done = 1;
            trial_counter = trial_counter+1;
            counter = counter + 1;
        end
        if (trial_counter > 2)
            done = 1;
        end
    end
end
So this code is working fine, but sometimes MATLAB freezes if there are more than about 600 points. The space is full and no more points can be added, so MATLAB keeps retrying the same work over and over. So I need a way, when trial_counter is larger than 2, for the point to find a space that is empty and settle there.
The trial_counter is used to drop a point if it doesn't fit by the third attempt.
Thank you.
Since trial_counter=trial_counter+1; is only called inside if minDistance >= minAllowableDistance, you will easily enter an infinite loop if minDistance < minAllowableDistance (e.g. if your existing points are quite closely packed).
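One small change that at least prevents the hang is to count every attempt, not just the accepted ones (a sketch of the modified loop):
while (done ~= 1)
    trial_counter = trial_counter + 1; % count every trial, successful or not
    x(k) = rand(1)*1000;
    y(k) = rand(1)*1000;
    % ... rest of the loop body as before, with the trial_counter
    % increment removed from the if-block ...
end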
How you do this depends on what your limitations are, but if you're looking at integer points in a set range, one possibility is to keep the points as a binary image, use bwdist to compute the distance transform, and then pick an acceptable point. Each iteration would then be (where BW is your stored "image", a 2-D binary matrix in which 1 marks the selected points):
D = bwdist(BW);
maybe_points = find(D>minAllowableDistance); % list of possible locations
n = randi(length(maybe_points)); % pick one location
BW(maybe_points(n))=1; % add it to your matrix
(then add some checking such that if you can't find any allowable points the loop quits)
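Putting it together, a minimal sketch (assuming integer coordinates on a 1000x1000 grid, and the Image Processing Toolbox for bwdist):
BW = false(1000,1000); % the "image" of selected points
minAllowableDistance = 30;
numberOfPoints = 300;
for k = 1:numberOfPoints
    D = bwdist(BW); % distance from every pixel to the nearest selected point (Inf while BW is all zero)
    maybe_points = find(D > minAllowableDistance); % all admissible locations
    if isempty(maybe_points) % the space is full - quit instead of looping forever
        break;
    end
    BW(maybe_points(randi(numel(maybe_points)))) = true;
end
[y, x] = find(BW); % recover the coordinates
plot(x, y, '.');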