Octave: how can these FOR loops be vectorized?

I am writing an Octave script to calculate the price of a European option.
The first part uses Monte Carlo to simulate the underlying asset price over n time periods. This is repeated nIter times.
Octave makes it very easy to set up the initial matrices, but I haven't found a way to complete the task in a vectorized fashion, avoiding FOR loops:
%% Octave simplifies creation of 'e', 'dlns', and 'Prices'
e = norminv(rand(nIter,n));
dlns = cat(2, ones(nIter,1), exp((adj_r+0.5*sigma^2)*dt+sigma*e.*sqrt(dt)));
Prices = zeros(nIter, n+1);
for i = 1:nIter % IS THERE A WAY TO VECTORIZE THESE FOR LOOPS?
    for j = 1:n+1
        if j == 1
            Prices(i,j) = S0;
        else
            Prices(i,j) = Prices(i,j-1)*dlns(i,j);
        end
    endfor
endfor
Note that the price at step n equals the price at step n-1 times a factor, hence the following does not work...
Prices(i,:) = S0 * dlns(i,:)
...since it multiplies S0 by each factor separately, yielding different results than the expected random walk.

Because each new column depends on the previous column, it seems you would need at least one loop there; but if you do all the operations within a column in a vectorized fashion, that might speed it up for you. The vectorized replacement for the two nested loops would look something like this -
Prices(:,1) = S0;
for j = 2:n+1
    Prices(:,j) = Prices(:,j-1).*dlns(:,j);
endfor
It just occurred to me that the dependency can be taken care of with cumprod, which computes exactly the cumulative product being built here and thus leads to a no-loop solution! Here's the implementation -
Prices = [repmat(S0,nIter,1) cumprod(dlns(:,2:end),2)*S0]
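As a quick sanity check (a sketch, assuming Prices still holds the looped result from the question), the cumprod version can be compared against it -
Prices_cp = [repmat(S0,nIter,1) cumprod(dlns(:,2:end),2)*S0];
max(abs(Prices(:) - Prices_cp(:))) %// expected: 0, up to floating-point round-off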
Benchmarking on MATLAB
Benchmarking Code -
%// Parameters as told by OP and then create the inputs
nIter= 100000;
n = 100;
adj_r = 0.03;
sigma = 0.2;
dt = 1/n;
S0 = 60;
e = norminv(rand(nIter,n));
dlns = cat(2, ones(nIter,1), exp((adj_r+0.5*sigma^2)*dt+sigma*e.*sqrt(dt)));
disp('-------------------------------------- With Original Approach')
tic
Prices = zeros(nIter, n+1);
for i = 1:nIter
    for j = 1:n+1
        if j == 1
            Prices(i,j) = S0;
        else
            Prices(i,j) = Prices(i,j-1)*dlns(i,j);
        end
    end
end
toc, clear Prices
disp('-------------------------------------- With Proposed Approach - I')
tic
Prices2(nIter, n+1)=0; %// faster pre-allocation scheme
Prices2(:,1)=S0;
for j = 2:n+1
    Prices2(:,j) = Prices2(:,j-1).*dlns(:,j);
end
toc, clear Prices2
disp('-------------------------------------- With Proposed Approach - II')
tic
Prices3 = [repmat(S0,nIter,1) cumprod(dlns(:,2:end),2)*S0];
toc, clear Prices3
Runtime results -
-------------------------------------- With Original Approach
Elapsed time is 0.259054 seconds.
-------------------------------------- With Proposed Approach - I
Elapsed time is 0.020566 seconds.
-------------------------------------- With Proposed Approach - II
Elapsed time is 0.067292 seconds.
Now, the runtimes do suggest that the first proposed approach might be a better fit here!

Related

Fastest approach to copying/indexing variable parts of 3D matrix

I have large sets of 3D data consisting of 1D signals acquired in 2D space.
The first step in processing this data is thresholding all signals to find the arrival of a high-amplitude pulse. This pulse is present in all signals and arrives at different times.
After thresholding, the 3D data set should be reordered so that every signal starts at the arrival of the pulse and what came before is thrown away (the end of the signals is of no importance; as of now I append zeros to the end of all signals so the data remains the same size).
Now, I have implemented this in the following manner:
First, I start by calculating the sample number of the first sample exceeding the threshold in all signals:
M = randn(1000,500,500); % example matrix of realistic size
threshold = 0.25*max(M(:,1,1)); % 25% of the maximum in the first signal as threshold
[~,index] = max(M>threshold); % indices of first sample exceeding threshold in all signals
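%// note: max returns the position of the FIRST maximal element, so on a logical array this picks out the first sample above the threshold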
Next, I want all signals to be shifted so that they all start with the pulse. For now, I have implemented it this way:
outM = zeros(size(M)); % preallocation for speed
for i = 1:size(M,2)
for j = 1:size(M,3)
outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
end
end
This works fine, and I know for-loops are not that slow anymore, but this easily takes a few seconds for my datasets on this machine. A single iteration of the for-loop takes about 0.05-0.1 s, which seems slow to me for just copying a vector of 500-2000 double values.
Therefore, I have looked into the best way to tackle this, but for now I haven't found anything better.
I have tried several things: 3D masks, linear indexing, and parallel loops (parfor).
For the 3D masks, I checked whether any improvements are possible. I first construct a logical mask, and then compare the speed of the logical-mask indexing/copying to the double nested for-loop.
%% set up for logical mask copying
AA = true(500,1); % only copy the first 500 values after the threshold value
Mask = false(size(M));
Jepla = zeros(500,size(M,2),size(M,3));
for i = 1:size(M,2)
    for j = 1:size(M,3)
        Mask(index(1,i,j):index(1,i,j)+499,i,j) = AA;
    end
end
%% speed comparison
tic
Jepla = M(Mask);
toc
tic
for i = 1:size(M,2)
    for j = 1:size(M,3)
        outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
    end
end
toc
The for-loop is faster every time, even though it copies more data.
Next, linear indexing.
%% setup for linear index copying
%put all indices in 1 long column
LongIndex = reshape(index,numel(index),1);
% convert to linear indices and store in new variable
linearIndices = sub2ind(size(M),LongIndex,repmat(1:size(M,2),1,size(M,3))',repelem(1:size(M,3),size(M,2))');
% extend linear indices with those of all values to copy
k = zeros(numel(M),1);
count = 1;
for i = 1:numel(LongIndex)
    values = linearIndices(i):size(M,1)*i;
    k(count:count+length(values)-1) = values;
    count = count + length(values);
end
k = k(1:count-1);
% get linear indices of locations in new matrix
l = zeros(length(k),1);
count = 1;
for i = 1:numel(LongIndex)
    values = repelem(LongIndex(i)-1,size(M,1)-LongIndex(i)+1);
    l(count:count+length(values)-1) = values;
    count = count + length(values);
end
l = k-l;
% create new matrix
outM = zeros(size(M));
%% speed comparison
tic
outM(l) = M(k);
toc
tic
for i = 1:size(M,2)
    for j = 1:size(M,3)
        outM(1:size(M,1)+1-index(1,i,j),i,j) = M(index(1,i,j):end,i,j);
    end
end
toc
Again, the alternative approach, linear indexing, is (a lot) slower.
After this failed, I learned about parallelisation and thought this would surely speed up my code.
By reading some of the documentation around parfor and trying it out a bit, I changed my code to the following:
gcp;
outM = zeros(size(M));
inM = mat2cell(M,size(M,1),ones(size(M,2),1),size(M,3));
tic
parfor i = 1:500
    for j = 1:500
        outM(:,i,j) = [inM{i}(index(1,i,j):end,1,j);zeros(index(1,i,j)-1,1)];
    end
end
toc
I changed it so that outM and inM would both be sliced variables, as I read this is best. Still, this is very slow, a lot slower than the original for-loop.
So now the question: should I give up on trying to improve the speed of this operation? Or is there another way to do it? I have searched a lot and so far do not see how to speed this up.
Sorry for the long question, but I wanted to show what I tried.
Thank you in advance!
Not sure if it's an option in your situation, but it looks like cell arrays are actually faster here:
outM2 = cell(size(M,2),size(M,3));
tic;
for i = 1:size(M,2)
    for j = 1:size(M,3)
        outM2{i,j} = M(index(1,i,j):end,i,j);
    end
end
toc
And a second idea which also came out faster: batch all the data which have to be shifted by the same amount:
tic;
for i = unique(index).'
    outM(1:size(M,1)+1-i,index==i) = M(i:end,index==i);
end
toc
It totally depends on your data if this approach is actually faster.
And yes, integer-valued and logical indexing can be mixed.
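For instance (a minimal sketch on toy data):
A = magic(4);
cols = logical([1 0 1 0]); % logical mask over columns
A(2:3, cols) % integer range for the rows, logical mask for the columns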

Initialize vector with function in MATLAB

I just started out with MATLAB and am having some trouble finding the solution for the following:
I am trying to initialize a vector of 1000 different values, with a function that doesn't take any arguments as input. I can do this with a for loop, but haven't found out how to do it without one.
What I expected that would work:
z = zeros(1,1000)
result = arrayfun(*functionname*,z)
This, however, gives an error saying that the first input must be a function handle.
My function is a simple implementation of a monte carlo method to calculate pi:
function Result = mcm()
clear
N=1000;
M=0;
for j=1:N
    p=[2*rand-1; 2*rand-1];
    if p'*p<1
        M=M+1;
    end
end
Result=4*M/N
One way to actually vectorize your given function mcm would be -
N = 1000; %// Number of data points
P = [2*rand(1,N)-1; 2*rand(1,N)-1]; %// OR 2*rand(2,N)-1
out = 4*sum(sum(P.^2,1)<1)/N
Runtime tests
Code -
N = 1000000; %// Number of data points
disp('---------------- With Original Approach')
tic
M=0;
for j=1:N
    P=[2*rand-1; 2*rand-1];
    if P'*P<1
        M=M+1;
    end
end
Result=4*M/N;
toc
disp('---------------- With Proposed Approach')
tic
P = 2*rand(2,N)-1;
out = 4*sum(sum(P.^2,1)<1)/N;
toc
Timings & Outputs -
---------------- With Original Approach
Elapsed time is 3.952998 seconds.
---------------- With Proposed Approach
Elapsed time is 0.089590 seconds.
>> Result
Result =
3.1422
>> out
out =
3.1428
Since your function takes no arguments, you can't use arrayfun; arrayfun applies the function to each element in the input array.
Instead use this:
z = ones(1,1000) * mcm;
A side benefit is that mcm will only run once, so it will be faster than calling that function 1000 times (note, though, that all 1000 entries will then hold the same value).

Time series aggregation efficiency

I commonly need to summarize a time series with irregular timing with a given aggregation function (i.e., sum, average, etc.). However, the current solution that I have seems inefficient and slow.
Take the aggregation function:
function aggArray = aggregate(array, groupIndex, collapseFn)
groups = unique(groupIndex, 'rows');
aggArray = nan(size(groups, 1), size(array, 2));
for iGr = 1:size(groups,1)
    grIdx = all(groupIndex == repmat(groups(iGr,:), [size(groupIndex,1), 1]), 2);
    for iSer = 1:size(array, 2)
        aggArray(iGr,iSer) = collapseFn(array(grIdx,iSer));
    end
end
end
Note that both array and groupIndex can be 2D. Every column in array is an independent series to be aggregated, but the columns of groupIndex should be taken together (as a row) to specify a period.
Then when we bring an irregular time series to it (note the second period is one base period longer), the timing results are poor:
a = rand(20006,10);
b = transpose([ones(1,5) 2*ones(1,6) sort(repmat((3:4001), [1 5]))]);
tic; aggregate(a, b, @sum); toc
Elapsed time is 1.370001 seconds.
Using the profiler, we can find that the grIdx line takes about 1/4 of the execution time (0.28 s) and the iSer loop takes about 3/4 (1.17 s) of the total (1.48 s).
Compare this with the period-indifferent case:
tic; cumsum(a); toc
Elapsed time is 0.000930 seconds.
Is there a more efficient way to aggregate this data?
Timing Results
Taking each response and putting it in a separate function, here are the timing results I get with timeit with Matlab 2015b on Windows 7 with an Intel i7:
original | 1.32451
felix1 | 0.35446
felix2 | 0.16432
divakar1 | 0.41905
divakar2 | 0.30509
divakar3 | 0.16738
matthewGunn1 | 0.02678
matthewGunn2 | 0.01977
Clarification on groupIndex
An example of a 2D groupIndex would be where both the year number and week number are specified for a set of daily data covering 1980-2015:
a2 = rand(36*52*5, 10);
b2 = [sort(repmat(1980:2015, [1 52*5]))' repmat(1:52, [1 36*5])'];
Thus a "year-week" period is uniquely identified by a row of groupIndex. This is effectively handled by calling unique(groupIndex, 'rows') and taking the third output, so feel free to disregard this portion of the question.
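For reference, a minimal sketch of that mapping on the sample above:
[groups, ~, idx] = unique(b2, 'rows'); % idx(k) is the group number of row k
% all rows of b2 with idx == g belong to the g-th year-week period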
Method #1
You can create the mask corresponding to grIdx across all groups in one go with bsxfun(@eq,..). Now, for collapseFn as @sum, you can bring in matrix multiplication and thus have a completely vectorized approach, like so -
M = squeeze(all(bsxfun(@eq,groupIndex,permute(groups,[3 2 1])),2))
aggArray = M.'*array
For collapseFn as @mean, you need to do a bit more work, as shown here -
M = squeeze(all(bsxfun(@eq,groupIndex,permute(groups,[3 2 1])),2))
aggArray = bsxfun(@rdivide,M,sum(M,1)).'*array
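Put together for the sample data from the question (a sketch; groups comes from unique, as in the original function, and M can get memory-heavy for many groups) -
groups = unique(b, 'rows');
M = squeeze(all(bsxfun(@eq, b, permute(groups, [3 2 1])), 2));
aggArray = M.'*a; %// same result as aggregate(a, b, @sum)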
Method #2
In case you are working with a generic collapseFn, you can use the 2D mask M created with the previous method to index into the rows of array, thus changing the complexity from O(n^2) to O(n). Some quick tests suggest this gives an appreciable speedup over the original loopy code. Here's the implementation -
n = size(groups,1);
M = squeeze(all(bsxfun(@eq,groupIndex,permute(groups,[3 2 1])),2));
out = zeros(n,size(array,2));
for iGr = 1:n
    out(iGr,:) = collapseFn(array(M(:,iGr),:),1);
end
Please note that the 1 in collapseFn(array(M(:,iGr),:),1) denotes the dimension along which collapseFn is applied, so that 1 is essential there.
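For example, @median qualifies directly, while a one-argument anonymous wrapper would not:
collapseFn = @median;        %// works: the loop calls it as median(X,1)
collapseFn = @(x) median(x); %// would fail in the loop: the wrapper cannot accept the dim argument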
Bonus
By its name, groupIndex seems like it would hold integer values, which can be exploited for a more efficient creation of M: consider each row of groupIndex as an indexing tuple, convert each row into a scalar key, and thus get a 1D array version of groupIndex. This should be more efficient, as the data size is now O(n). This M can be fed to all the approaches listed in this post. So, we would have M like so -
dims = max(groupIndex,[],1);
agg_dims = cumprod([1 dims(end:-1:2)]);
[~,~,idx] = unique(groupIndex*agg_dims(end:-1:1).');
m = size(groupIndex,1);
M = false(m,max(idx));
M((idx-1)*m + (1:m)') = true;
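A quick way to convince yourself that this M matches the bsxfun-based one (a sketch, assuming positive integer labels so the scalar keys preserve the row order of unique) -
M_ref = squeeze(all(bsxfun(@eq,groupIndex,permute(unique(groupIndex,'rows'),[3 2 1])),2));
isequal(M, M_ref) %// expected: true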
Mex Function 1
HAMMER TIME: Mex function to crush it:
The base-case test with the original code from the question took 1.334139 seconds on my machine. IMHO, the second-fastest answer, from @Divakar, is:
groups2 = unique(groupIndex);
aggArray2 = squeeze(all(bsxfun(@eq,groupIndex,permute(groups2,[3 2 1])),2)).'*array;
Elapsed time is 0.589330 seconds.
Then my MEX function:
[groups3, aggArray3] = mg_aggregate(array, groupIndex, @(x) sum(x, 1));
Elapsed time is 0.079725 seconds.
Testing that we get the same answer: norm(groups2-groups3) returns 0 and norm(aggArray2 - aggArray3) returns 2.3959e-15. Results also match original code.
Code to generate the test conditions:
array = rand(20006,10);
groupIndex = transpose([ones(1,5) 2*ones(1,6) sort(repmat((3:4001), [1 5]))]);
For pure speed, go MEX. If the thought of compiling C++ code / the complexity is too much of a pain, go with Divakar's answer. Another disclaimer: I haven't subjected my function to robust testing.
Mex Approach 2
Somewhat surprisingly to me, this code appears even faster than the full MEX version in some cases (e.g., this test took about 0.05 seconds). It uses a MEX function mg_getRowsWithKey to figure out the indices of groups. I think it may be because my array copying in the full MEX function isn't as fast as it could be, and/or because of the overhead of calling feval. It's basically the same algorithmic complexity as the other version.
[unique_groups, map] = mg_getRowsWithKey(groupIndex);
results = zeros(length(unique_groups), size(array,2));
for iGr = 1:length(unique_groups)
    array_subset = array(map{iGr},:);
    %// do your collapse function on array_subset, e.g.
    results(iGr,:) = sum(array_subset, 1);
end
When you do array(groups(1)==groupIndex,:) to pull out the array entries associated with a full group, you're searching through the ENTIRE length of groupIndex. If you have millions of row entries, this will totally suck. array(map{1},:) is far more efficient.
There's still unnecessary copying of memory and other overhead associated with calling feval on the collapse function. If you implement the aggregator function efficiently in C++ in such a way as to avoid copying memory, probably another 2x speedup can be achieved.
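For readers without the MEX file, a pure-MATLAB stand-in for mg_getRowsWithKey can be sketched with unique and accumarray (an assumption about its outputs, inferred from how they are used above; this of course forfeits the MEX speed advantage):
function [unique_groups, map] = getRowsWithKey_matlab(groupIndex)
%// map{g} lists the row numbers of groupIndex belonging to the g-th unique group
[unique_groups, ~, idx] = unique(groupIndex, 'rows');
map = accumarray(idx, (1:numel(idx)).', [], @(r) {sort(r)});
end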
A little late to the party, but a single loop using accumarray makes a huge difference:
function aggArray = aggregate_gnovice(array, groupIndex, collapseFn)
[groups, ~, index] = unique(groupIndex, 'rows');
numCols = size(array, 2);
aggArray = nan(size(groups, 1), numCols);
for col = 1:numCols
    aggArray(:, col) = accumarray(index, array(:, col), [], collapseFn);
end
end
Timing this using timeit in MATLAB R2016b for the sample data in the question gives the following:
original | 1.127141
gnovice | 0.002205
Over a 500x speedup!
Doing away with the inner loop, i.e.
function aggArray = aggregate(array, groupIndex, collapseFn)
groups = unique(groupIndex, 'rows');
aggArray = nan(size(groups, 1), size(array, 2));
for iGr = 1:size(groups,1)
    grIdx = all(groupIndex == repmat(groups(iGr,:), [size(groupIndex,1), 1]), 2);
    aggArray(iGr,:) = collapseFn(array(grIdx,:));
end
end
and calling the collapse function with a dimension parameter
res = aggregate(a, b, @(x)sum(x,1));
gives some speedup already (3x on my machine) and avoids the errors that e.g. sum or mean produce when they encounter a single row of data without a dimension parameter and then collapse across columns rather than labels.
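The pitfall is easy to reproduce:
sum(rand(1,5))    % single row, no dim argument: collapses across columns into a scalar
sum(rand(1,5), 1) % with dim = 1: returns the 1x5 row unchanged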
If you had just one group label vector, i.e. the same group labels for all columns of data, you could speed up further:
function aggArray = aggregate(array, groupIndex, collapseFn)
ng = max(groupIndex);
aggArray = nan(ng, size(array, 2));
for iGr = 1:ng
    aggArray(iGr,:) = collapseFn(array(groupIndex==iGr,:));
end
end
The latter function gives identical results for your example, with a 6x speedup, but cannot handle different group labels per data column.
Assuming a 2D test case for the group index (provided here as well, with 10 different columns for groupIndex):
a = rand(20006,10);
B = []; % make random length periods for each of the 10 signals
for i = 1:size(a,2)
    n0 = randi(10);
    b = transpose([ones(1,n0) 2*ones(1,11-n0) sort(repmat((3:4001), [1 5]))]);
    B = [B b];
end
tic; erg0=aggregate(a, B, @sum); toc % original method
tic; erg1=aggregate2(a, B, @(x)sum(x,1)); toc % just remove the inner loop
tic; erg2=aggregate3(a, B, @(x)sum(x,1)); toc % use function below
Elapsed time is 2.646297 seconds.
Elapsed time is 1.214365 seconds.
Elapsed time is 0.039678 seconds (!!!!).
function aggArray = aggregate3(array, groupIndex, collapseFn)
[groups,ix1] = unique(groupIndex, 'rows', 'first');
[~,ix2] = unique(groupIndex, 'rows', 'last');
ng = size(groups,1);
aggArray = nan(ng, size(array, 2));
for iGr = 1:ng
    aggArray(iGr,:) = collapseFn(array(ix1(iGr):ix2(iGr),:));
end
end
I think this is as fast as it gets without using MEX. Thanks to the suggestion of Matthew Gunn!
Profiling shows that 'unique' is really cheap here, and getting out just the first and last indices of the repeating rows in groupIndex speeds things up considerably. I get an 88x speedup with this iteration of the aggregation.
Well, I have a solution that is almost as quick as the MEX one, but only using MATLAB.
The logic is the same as most of the above, creating a dummy 2D matrix, but instead of using @eq I initialize a logical array from the start.
Elapsed time for mine is 0.172975 seconds.
Elapsed time for Divakar's is 0.289122 seconds.
function aggArray = aggregate(array, group, collapseFn)
m = size(array,1);
n = max(group);
D = false(m,n);
row = (1:m)';
idx = m*(group(:) - 1) + row;
D(idx) = true;
aggArray = zeros(n,size(array,2));
for ii = 1:n
    aggArray(ii,:) = collapseFn(array(D(:,ii),:),1);
end
end

MATLAB Bootstrap without for loop

Yesterday I implemented my first bootstrap in MATLAB (and yes, I know, for loops are evil):
% data is an mxn matrix where the data should be sampled per column, but there
% can be NaN elements
% from the array (a column of data), n values are sampled nReps times
function result = bootstrap_std(data, n, nReps, quantil)
result = zeros(1,size(data,2));
for i = 1:size(data,2)
    bootstrap_data = zeros(n,nReps);
    values = find(~isnan(data(:,i)));
    if isempty(values)
        bootstrap_data(:,:) = NaN;
    else
        for k = 1:nReps
            bootstrap_data(:,k) = datasample(data(values,i),n);
        end
    end
    stat = zeros(1,nReps);
    for k = 1:nReps
        stat(k) = nanstd(bootstrap_data(:,k));
    end
    sort(stat);
    result(i) = quantile(stat,quantil);
end
end
As one can see, this version works columnwise. The algorithm does what it should, but it is really slow when the data size increases. My question now is: is it possible to implement this logic without using for loops? My problem is that I could not find a version of datasample which does the sampling columnwise. Or is there a better function to use?
I would be happy for any hint or idea on how to speed up this implementation.
Thanks and best regards!
stephan
The bottlenecks in your implementation are:
The function spends a lot of time inside nanstd, which is unnecessary since you exclude NaN values from your sample anyway.
There are a lot of functions that operate column-wise, but you spend time looping over the columns and calling them many times.
You make many calls to datasample, which is a relatively slow function. It's much faster to create a random matrix of indices using randi and use that instead.
Here's how I would write the function (actually, I probably wouldn't put in this many comments, and I wouldn't use so many temp variables, but I'm doing it now so you can see all the steps of the computation).
function result = bootstrap_std_new(data, n, nRep, quantil)
result = zeros(1, size(data,2));
for i = 1:size(data,2)
    isbad = isnan(data(:,i)); %// Mask of NaN values
    if all(isbad)
        result(i) = NaN;
    else
        data0 = data(~isbad, i); %// Temp copy of this column for indexing
        index = randi(size(data0,1), n, nRep); %// Create the n-by-nRep indexing matrix
        bootstrapdata = data0(index); %// Sample the data
        stdevs = std(bootstrapdata); %// Stdev of each column of sampled data
        result(i) = quantile(stdevs, quantil); %// Find the correct quantile
    end
end
end
Here are some timings
>> data = randn(100,10);
>> data(randi(1000, 50, 1)) = NaN;
>> tic, bootstrap_std(data, 50, 1000, 0.5); toc
Elapsed time is 1.359529 seconds.
>> tic, bootstrap_std_new(data, 50, 1000, 0.5); toc
Elapsed time is 0.038558 seconds.
So this gives you about a 35x speedup.
Your main issue seems to be that you may have varying numbers/positions of NaN in each column, so you can't work on the full matrix unless you're okay with also sampling NaNs. However, some of the inner loops can be simplified.
for k = 1:nReps
    bootstrap_data(:,k) = datasample(data(values,i),n);
end
Since you're sampling with replacement, you should be able to just do:
bootstrap_data = datasample(data(values,i), n*nReps);
bootstrap_data = reshape(bootstrap_data, [n nReps]);
Also, nanstd can work on a full matrix, so there is no need to loop:
stat = nanstd(bootstrap_data); % or nanstd(x,0,2) to change dimension
It would also be worth looking over your code with the profiler to see where the bottlenecks are.

Optimizing a function that compares 1.7 million entries against itself

I have to calculate the distance of all UK postcodes against each other, then sum the population of all postcodes within 1 mile. The postcode & population list is stored in a text file. I'm most familiar with MATLAB, but I also have Stata & PSPP available. The program is currently on track to take about 2 weeks. Is there anything I can do to accelerate this process? Here is my code. MATLAB generated the script to import the text data. The distance function is from the Mapping Toolbox and implements the great-circle formula.
Any help is greatly appreciated.
function pcdistance(postcode, pop, lat, lon)
%Finds total population for UK postcode within 1 mile radius
fid = fopen('PPC.txt','a');
n = length(postcode);
%Calculates distance of 1 postcode at a time, against all others
%All data that doesn't meet rules is deleted
for i = 1:n
    dist = [];
    dist(:,1) = pop;
    for j = 1:n
        dist(j,2) = distance(lat(i),lon(i),lat(j),lon(j),3963.17);
        good = dist(1:j,2) <= 1;
    end
    dist = dist(good,:);
    tot = sum(dist(:,1));
    fprintf(1,'%s,%d;',postcode{i},tot)
end
%Find sum of population within 1 mile
fclose(fid);
end
Here's a small sample of the inputs, from the txt file. The columns are "postcode, pop, lat, long" respectively.
"BD7 1DB",749,53.79,-1.76
"M15 6AA",748,53.46,-2.24
"WR2 6AJ",748,52.19,-2.24
"M15 6PF",745,53.46,-2.23
"IP7 7RA",741,52.12,0.96
"CF62 4WA",740,51.41,-3.41
"M1 2AR",738,53.47,-2.22
"NG1 4BR",737,52.95,-1.14
"ST16 3AW",735,52.81,-2.11
"AB25 1LE",733,57.15,-2.10
"WF2 9AG",730,53.68,-1.50
"DT11 8RH",730,50.86,-2.12
"CW1 5NP",729,53.09,-2.41
"TR12 7RH",724,50.08,-5.25
"ST5 5DY",723,53.00,-2.27
"HA1 3HP",723,51.57,-0.33
"DL10 7NP",722,54.37,-1.62
"M1 7HR",719,53.47,-2.23
"B18 4AS",719,52.49,-1.93
"OX13 6JB",716,51.68,-1.30
Here's the corrected code.
function pcdistance4(postcode, pop, lat, lon)
%Finds total population for UK postcode within 1 mile radius
fid = fopen('PPC.txt','A');
n = length(postcode);
% Pre-allocation
dist = zeros(n,2);
tot = zeros(n,1);
tic
for i = 1:n
    dist(:,1) = pop;
    dist(:,2) = distance(lat(i),lon(i),lat(:),lon(:),3963.17);
    good = dist(:,2) <= 1 & dist(:,2) ~= 0;
    tot(i) = sum(dist(good,1));
    tot(i) = tot(i) + pop(i);
end
toc
tic
for j = 900001:n
    fprintf(fid,'%s,%d;\n',postcode{j},tot(j));
end
toc
fclose(fid);
end
Just a few general tips to get you started:
For your personal education: run your code using the profiler to see where the majority of the computational time is spent. This will be the first clue for where to begin your optimization. (link to doc)
You shouldn't be writing to the hard drive on every step of the loop, since I/O is very costly in computation time. Instead, you should accumulate a bunch of strings in memory and write these "chunks" out every once in a while. link1 link2
You can try using parfor instead of for (link to doc), or possibly even CUDA if available to you. (link to doc)
Consider using the Geodetic Toolbox. It could be easier to convert the lat/long coordinates to UTM (i.e. Cartesian) and then use some standard functions to find distances.
Also:
My suggestion: don't use i and j as indices for your loops - this is often considered bad practice (due to possible confusion with imaginary numbers).
You should consider memory complexity as well. Consider pre-allocating the variable dist outside the for-loops and overwriting it element by element.
For example (see comments in the modified function):
function pcdistance(postcode, pop, lat, lon)
fid = fopen('PPC.txt','a');
n = length(postcode);
% Pre-allocation (two columns: population and distance)
dist = zeros(n,2);
for i = 1:n
    % Avoid "deleting" the variable; you can overwrite it, as the number of
    % elements is always the same
    % dist = [];
    dist(:,1) = pop;
    for j = 1:n
        % Unfortunately I do not have the mentioned toolbox, but there is a
        % high chance that you can avoid this for-loop. Probably something
        % like:
        % dist(:, 2) = distance(lat(i), lon(i), lat, lon, ...)
        % Try to vectorize it.
        dist(j,2) = distance(lat(i),lon(i),lat(j),lon(j),3963.17);
        % There is no need for this operation; it is highly redundant and
        % computationally expensive:
        % - in the first loop you will check 1 element
        % - in the second loop you will check two elements (1 redundant)
        % - in the jth loop you will check j elements (j-1 redundant)
        % The total redundant operations are 1+2+3+...+n-1.
        % good = dist(1:j,2) <= 1;
    end
    % better do this
    good = dist(:, 2) <= 1;
    % this is also memory expensive:
    % dist = dist(good,:);
    % Better do the indexing directly
    tot = sum(dist(good, 1));
end
% Write outside the loop, as recommended by Dev-iL
% Find sum of population within 1 mile
fclose(fid);
end
I cannot believe that distance() can only compare two points at a time, when there should be no problem vectorizing some sine and cosine functions. So here is a trimmed vectorized version which I wrote for my own purposes some time ago; maybe because I did not have that toolbox, or did not know about it. Frankly, it does not give the exact same result as distance(), which I just tested. If you need the exact result of distance(), you'd better not use this vectorized version.
function dist = distance_on_earth(lat0, lon0, lats, lons, radius)
degree2radians = pi/180;
% phi = 90 - latitude
phi0 = (90-lat0)*degree2radians;
phis = (90-lats)*degree2radians;
% theta = longitude
theta0 = lon0*degree2radians;
thetas = lons*degree2radians;
% spherical distance (elementwise .* so that lats/lons can be vectors):
cosine = sin(phi0)*sin(phis).*cos(theta0-thetas)+cos(phi0)*cos(phis);
arc = acos(cosine);
dist = arc*radius;
end
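Usage then mirrors the corrected loop above, one source point against all points at once (a sketch, assuming lat/lon are column vectors in degrees and the radius is in miles):
d = distance_on_earth(lat(i), lon(i), lat, lon, 3963.17);
tot(i) = sum(pop(d <= 1 & d ~= 0)) + pop(i);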
Also, in addition to what Dev-iL suggested, you can at least take the following line out of the inner loop:
good = dist(1:j,2)<= 1;
Good luck!
Nras
The UK is so small that you can still get reasonable results without worrying about the curvature of the Earth. You can simply estimate the distance by using the difference in latitude and longitude.
This example is a bit oversimplified, but would suggest that once the data is read in, you could do the actual calculations in under an hour.
x=rand(1.7e6,1); %Fake x data
y=x; %Fake y data
tic
for t=1:1.7e3 % One thousandth part of the work to be done
    (x-0.5).^2+(y-0.2).^2>0.01; %Simple distance calculation from the point (0.5,0.2), then comparing to a threshold
end
toc %Runs for about 2 seconds
Using the real distance may take a bit longer, but it should still not take more than 1 or 2 hours to complete.
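Under that flat-Earth assumption, the full population sum could be sketched as follows (hypothetical constants, not from the question: roughly 69 miles per degree of latitude, and 69*cosd(lat) miles per degree of longitude at UK latitudes):
milesPerDegLat = 69; % approximate conversion factor (assumption)
n = length(lat);
tot = zeros(n,1);
for i = 1:n
    dy = (lat - lat(i)) * milesPerDegLat;
    dx = (lon - lon(i)) * milesPerDegLat * cosd(lat(i));
    tot(i) = sum(pop(dx.^2 + dy.^2 <= 1)); % within 1 mile, including postcode i itself
end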