Add random numbers to a cell array of matrices - MATLAB

I have a cell array A of size 1 by 625, i.e.
A = { A1, A2, ..., A625 },
where each of the 625 elements of A is a 3D matrix of the same size, 42 by 42 by 3.
Problem 1
The entries of my matrices represent concentrations of red blood cells, which are very small values, so I cannot simply work with randn.
For each matrix, I try a command like:
A1 = A1 .*(1 + randn(42,42,3)/100)
I divide by 100 to reduce the chance of producing a negative entry (the concentrations are tiny, e.g. 1.234e-6), but I cannot eliminate this possibility.
Also, is there any quick way to add a different randn(42,42,3) to each of the 625 matrices? A + randn(42,42,3) won't work because it adds the same set of random numbers to every matrix.
Problem 2
I want to make 30 copies of the cell array A, adding random numbers to the entries of the 625 matrices in each copy. That is, I want to obtain a cell array Patients of size 1 by 30, where each element is itself a 1-by-625 cell array of matrices.
Patients = A % Initialization. I have 30 patients.
for i = 1 : 30;
Patients = { Patients, Patients + `method from problem 1`};
end
I have tried to make my problems clear. Thank you very much for your help.

Problem 1:
% optional: seed the random number generator from the current time
rng('shuffle');
A = cell(1,625); % note: cell(625) would create a 625-by-625 cell array
A = cellfun( @(x) rand(42,42,3), A, 'UniformOutput', false);
Note the differences between 'rand', 'randn' and 'rng'.
from the MATLAB documentation:
rng('shuffle') seeds the random number generator based on the current time. Thus, rand, randi, and randn produce a different sequence of numbers after each time you call rng.
rand = uniformly distributed random numbers in the open interval (0,1) -> no negative values
randn = Normally distributed random numbers
If you want to generate numbers in a special interval (from the MATLAB documentation):
In general, you can generate N random numbers in the interval [a,b] with the formula r = a + (b-a).*rand(N,1).
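For instance, a hedged sketch of the perturbation from Problem 1 using that formula, with multiplicative factors drawn uniformly from [0.99, 1.01] so a positive concentration can never turn negative (the interval bounds are illustrative):
a = 0.99; b = 1.01;
% each entry is scaled by its own factor in [a,b], so it stays positive
A = cellfun(@(x) x .* (a + (b-a).*rand(size(x))), A, 'UniformOutput', false);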
Problem 2:
I highly recommend building a struct with your 30 cells. This will make the indexing/looping later significantly easier! You can name the struct fields after your patients, so you will have less trouble keeping track of them.
You can also build a cell array; this is the easiest way for indexing.
In both cases: preallocate the memory! (MATLAB stores variables contiguously in memory. If a variable grows, MATLAB might have to relocate it...)
% for preallocation + initialization in one step:
A = cell(1,625);
Patients.Peter = cellfun( @(x) rand(42,42,3), A, 'UniformOutput', false);
Patients.Tom = cellfun( @(x) rand(42,42,3), A, 'UniformOutput', false);
% loop through the struct like this
FldNms = fieldnames( Patients );
for i = 1:length(FldNms)
% do whatever you like with Patients.(FldNms{i}), e.g.:
% Patients.(FldNms{i}) = someOperation(Patients.(FldNms{i}));
end
If you prefer an array:
% preallocation:
arry = cell(1,30);
for i = 1:length(arry)
arry{i} = cellfun( @(x) rand(42,42,3), A, 'UniformOutput', false);
end
This involves a lot of nested indexing, and it is easy to run into trouble mixing (), {} and [] indexing.
Think again about whether you need a cell array in the first place. A plain numeric array holding the 625 matrices might suit you just as well (see the sketch below). The data structure can significantly affect performance, readability and your coding time!
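A hedged sketch of that alternative: since all 625 matrices share the size 42-by-42-by-3, you can stack them into one 4-D numeric array and perturb every entry in a single operation (the variable name Astack is illustrative):
% stack the 625 matrices into one 42x42x3x625 numeric array
Astack = cat(4, A{:});
% every entry gets its own random perturbation in one shot
Astack = Astack .* (1 + randn(size(Astack))/100);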

For your first question, what about
cellfun(@(x)(x + randn(42,42,3)/100), A, 'uni', 0)
your second problem should (A) be a second question and (B) be trivial to solve once your first is done. Just drop the Patients + and your code should work correctly; a sketch follows below.
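For completeness, a minimal sketch of Problem 2 under that approach (assuming a preallocated 1-by-30 cell array; nPatients is an illustrative name):
nPatients = 30;
Patients = cell(1, nPatients); % preallocate
for k = 1:nPatients
% each patient gets an independent perturbation of every matrix in A
Patients{k} = cellfun(@(x) x .* (1 + randn(size(x))/100), A, 'uni', 0);
end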


Draw non-full matrix of random numbers

I am doing a Monte-Carlo simulation, where each repetition requires the sum or product of a random number of random variables. My problem is how to do this efficiently as the entire simulation should be as vectorized as possible.
For example, say we want to take the sum of 5, 10 and 3 random numbers, represented by the vector len = [5;10;3]. Then what I am currently doing is drawing a full matrix of random numbers:
A = randn(length(len),max(len));
Creating a mask of the non-needed numbers:
lenlen = repmat(len,1,max(len));
idx = repmat(1:max(len),length(len),1);
mask = idx>lenlen;
and then I can "pad" the matrix. Since I am interested in the sum, the padding has to be zero (for the product it would have to be 1):
A(mask)=0;
To obtain:
A =
1.7708 -1.4609 -1.5637 -0.0340 0.9796 0 0 0 0 0
1.8034 -1.5467 0.3938 0.8777 0.6813 1.0594 -0.3469 1.7472 -0.4697 -0.3635
1.5937 -0.1170 1.5629 0 0 0 0 0 0 0
Whereafter I can sum them together
B = sum(A,2);
However, I find it rather superfluous that I have to draw too many random numbers and then throw them away. In the real case, I need on the order of hundreds of thousands of repetitions, and the vector len might vary a lot, i.e. it can easily happen that I have to draw two or three times as many random numbers as are actually needed.
You can generate exactly the number of random numbers required, create a grouping variable with repelem, and compute the sum of each group using accumarray:
len = [5; 10; 3];
B = accumarray(repelem(1:numel(len), len).', randn(sum(len),1));
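Since the question also mentions products, note that accumarray accepts a custom accumulation function as its fourth argument, so a hedged variant for the product case would be:
len = [5; 10; 3];
g = repelem(1:numel(len), len).'; % grouping variable, one group per result
B = accumarray(g, randn(sum(len),1), [], @prod); % product per group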
You could just use arrayfun or a loop. You say "efficient" and "vectorized" in the same breath, but they are not necessarily the same thing - since the new(ish) JIT compiler, loops are pretty fast in MATLAB. arrayfun is basically a loop in disguise, but means you could create B like so:
len = [5;10;3];
B = arrayfun( @(x) sum( randn(x,1) ), len );
For each element in len, this creates a vector of length len(i) and takes the sum. The output is an array with one value for each value in len.
This will certainly be a lot more memory-friendly for large values and widely varying values within len. It may therefore be quicker; your mileage may vary, but it cuts out a lot of the operations you're doing.
You mention wanting to take the product sometimes, in which case use prod in place of sum.
Edit: rough and ready benchmark to compare arrayfun and a loop...
len = randi([1e3, 1e7], 100, 1);
tic;
B = arrayfun( @(x) sum( randn(x,1) ), len );
toc % ~8.77 seconds
tic;
out=zeros(size(len));
for ii = 1:numel(len)
out(ii) = sum(randn(len(ii),1));
end
toc % ~8.80 seconds
The "advantage" of the loop over arrayfun is you can pre-generate all of the random numbers in one go, then index. This isn't necesarryily quicker because you're addressing much bigger chunks of memory, and the call to randn is the main bottleneck anyway!
tic;
out = zeros(size(len));
rnd = randn(sum(len),1);
idx = [0; cumsum(len)]; % note: cumsum is very quick (~0.001sec here) so negligible
for ii = 1:numel(len)
out(ii) = sum(rnd(idx(ii)+1:idx(ii+1)),1);
end
toc % ~10.2 sec! Slower because of massive call to randn and the indexing into large array.
As stated at the top, arrayfun and looping are basically the same under the hood, so no reason to expect a big time difference.
The sum of multiple random numbers drawn from a specific distribution is also a random number with a (different) specific distribution. Therefore you can just cut the middleman and draw directly from the latter distribution.
In your case you are summing 5, 10 and 3 numbers drawn from a N(0,1) distribution. As explained here, the resulting distributions therefore are N(0,5), N(0,10) and N(0,3). This page explains how you can draw from non-standard normal distributions in MATLAB. As such, we can in this case generate those numbers with randn(3,1).*sqrt([5; 10; 3]).
In case you would want 1000 triples, you could then use
randn(3,1000).*sqrt([5; 10; 3])
or pre Matlab2016b
bsxfun(@times, randn(3,1000), sqrt([5; 10; 3]))
which is of course very fast.
Different distributions have different summation rules, but as long as you are not summing up numbers drawn from different distributions the rules are usually quite simple and found quickly with google.
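For the record, a hedged generalization of the above for an arbitrary len vector (still assuming standard normal summands):
len = [5; 10; 3];
% the sum of k iid N(0,1) variables is distributed as N(0,k),
% so one scaled draw per requested sum suffices
B = randn(numel(len),1) .* sqrt(len(:));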
You can do this using a combination of cumsum and diff. The plan is:
Create all the random numbers in a single call to randn up front
Then, use cumsum to produce a vector of cumulative summations
Use cumsum on the list of number-of-samples-per-result to work out where to read out the results
We also need diff to correct for the prior summations.
Note that this method might lose accuracy if you weren't using randn for the random samples, as cumsum would then build up arithmetic rounding errors.
% We want 100 sums of random numbers
numSamples = 100;
% Here's where we define how many random samples contribute to each sum
numRandsPerSample = randi(5, 1, numSamples);
% Let's make all the random numbers in one call
allRands = randn(1, sum(numRandsPerSample));
% Use CUMSUM to build up a cumulative sum of the whole of allRands. We also
% need a leading 0 for the first sum.
allRandsCS = [0, cumsum(allRands)];
% Use CUMSUM again to pick out the places we need to pick from
% allRandsCS
endIdxs = 1 + [0, cumsum(numRandsPerSample)];
% Use DIFF to subtract the prior sums from the result.
result = diff(allRandsCS(endIdxs))
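When adapting this pattern, a quick sanity check against a plain loop over the same draws can be reassuring (a hedged sketch):
% recompute the sums directly and compare
check = zeros(1, numSamples);
pos = 1;
for k = 1:numSamples
check(k) = sum(allRands(pos : pos+numRandsPerSample(k)-1));
pos = pos + numRandsPerSample(k);
end
max(abs(result - check)) % should be on the order of 1e-15 (rounding only)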

Iterating in a matrix avoiding loop in MATLAB

I am posing an interesting and useful question that needs to be carried out in MATLAB. It is about programming efficiently by avoiding loops.
Assume a matrix URm whose columns are products and whose rows are people. The matrix entries are people's ratings of these products, and this matrix is sparse, as each person normally rates only a few products.
URm [n_u, n_i]
Another matrix of interest is F, which contains attribute for each of the products and the attribute is of fixed length:
F [n_f,n_i]
We divide URm into two sub-matrices randomly: URmTrain and URmTest, where the former is used for training the system and the latter for testing. These two matrices have the same rows (users) but could have different numbers of columns (products).
We can find the similarity between items very quickly using pdist() or a matrix product (since F is [n_f, n_i], the item-by-item similarity is):
S = F' * F ;
For each row (user) in URmTest:
URmTestp = zeros(size(URmTest));
u = 1; % example: user 1
for i = 1 : size(URmTest,2)
indTrain = find(URmTrain(u,:)); % for each user, find the items in URmTrain that the user has rated (i.e. that have a rating greater than zero)
for j = 1 : length(indTrain)
URmTestp(u,i) = URmTestp(u,i) + S(i,indTrain(j))*URmTrain(u,indTrain(j));
end
end
where URmTestp is the predicted version of URmTest, and we can compute an error to measure how good our prediction has been.
Example
Let's make a simple example. Assume user 1 has rated items 3, 5 and 17:
indTrain = [3 5 17]
For each item j in URmTest, I want to predict the rating using the following formula:
URmTestp(u,j) = S(j,3)*URmTrain(u,3) + S(j,5)*URmTrain(u,5) + S(j,17)*URmTrain(u,17)
Once completed this process needs to be repeated for all users.
As URm is typically very big, I prefer options that use the least number of loops. We may be able to take advantage of bsxfun, but I am not sure if we can.
Please suggest ideas that can help accelerate this process as much as possible. Thank you.
I'm still not sure I completely understand your problem. But it seems to me that if you pre-compute s_ij as
s_ij = F.' * F; % [n_i x n_i] matrix
then what you're after is simply
URmTestp(u,indTest) = URmTrain(u,indTrain) * s_ij(indTrain,indTest);
% or
%URmTestp(u,:) = URmTrain(u,indTrain) * s_ij(indTrain,:);
or if you only compute a smaller s_ij block only for the necessary arrays,
s_ij = F(:,indTrain).' * F(:,indTest);
then
URmTestp(u,indTest) = URmTrain(u,indTrain) * s_ij;
Alternatively, you can always compute the necessary subblock of s_ij on the fly:
URmTestp(u,indTest) = URmTrain(u,indTrain) * F(:,indTrain).'*F(:,indTest);
If I understand correctly that indTest and indTrain are functions of u, such as
URmTestp = zeros(n_u,n_i); %// pre-allocate here!
for u=1:n_u
indTest = testCell{u};
indTrain = trainCell{u};
URmTestp(u,indTest) = URmTrain(u,indTrain) * F(:,indTrain).'*F(:,indTest);
...
end
then probably not much can be vectorized on this loop, unless there's a very tricky indexing scheme that allows you to use linear indices. I'd stick with this setup.
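One further hedged observation: if indTrain really is just the set of nonzero entries of URmTrain(u,:), as in the original question, then the zero (unrated) entries contribute nothing to the sum, and the whole per-user loop collapses into a single matrix product (assuming URmTrain and URmTestp index items the same way):
% unrated items are zero in URmTrain and drop out of the sum automatically
URmTestp = URmTrain * (F.' * F); % [n_u x n_i] predictions in one shot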

Matlab: creating vector from 2 input vectors [duplicate]

I'm trying to insert multiple values into an array using a 'values' array and a 'counter' array. For example, if:
a=[1,3,2,5]
b=[2,2,1,3]
I want the output of some function
c=somefunction(a,b)
to be
c=[1,1,3,3,2,5,5,5]
where a(1) recurs b(1) times, a(2) recurs b(2) times, etc.
Is there a built-in function in MATLAB that does this? I'd like to avoid using a for loop if possible. I've tried variations of 'repmat()' and 'kron()' to no avail.
This is basically Run-length encoding.
Problem Statement
We have an array of values, vals and runlengths, runlens:
vals = [1,3,2,5]
runlens = [2,2,1,3]
We need to repeat each element in vals by the corresponding element in runlens. Thus, the final output would be:
output = [1,1,3,3,2,5,5,5]
Prospective Approach
One of the fastest tools in MATLAB is cumsum, which is very useful when vectorizing problems that work on irregular patterns. In the stated problem, the irregularity comes from the different elements in runlens.
Now, to exploit cumsum, we need to do two things here: initialize an array of zeros, and place "appropriate" values at "key" positions over the zeros array, such that after cumsum is applied we end up with each vals element repeated runlens times.
Steps: Let's number the above mentioned steps to give the prospective approach an easier perspective:
1) Initialize zeros array: What must be the length? Since we are repeating runlens times, the length of the zeros array must be the summation of all runlens.
2) Find key positions/indices: Now these key positions are places along the zeros array where each element from vals start to repeat.
Thus, for runlens = [2,2,1,3], the key positions mapped onto the zeros array would be:
[X 0 X 0 X X 0 0] % where X's are those key positions.
3) Find appropriate values: The final nail to be hammered before using cumsum is to put "appropriate" values into those key positions. Since we will be applying cumsum soon after, if you think closely, you need a differentiated version of vals obtained with diff, so that cumsum on those brings back our values. Since these differentiated values are placed on the zeros array at positions separated by the runlens distances, after using cumsum we end up with each vals element repeated runlens times as the final output.
Solution Code
Here's the implementation stitching up all the above mentioned steps -
% Calculate cumsumed values of runlens.
% We need this to initialize the zeros array and to find key positions later on.
clens = cumsum(runlens);
% Initialize zeros array
array = zeros(1,clens(end));
% Find key positions/indices
key_pos = [1 clens(1:end-1)+1];
% Find appropriate values
app_vals = diff([0 vals]);
% Map app_vals at key_pos on array
array(key_pos) = app_vals;
% cumsum array for final output
output = cumsum(array);
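As a walk-through, the intermediate vectors for the sample input come out as follows (values shown as comments):
vals = [1,3,2,5];
runlens = [2,2,1,3];
% clens = [2 4 5 8]
% key_pos = [1 3 5 6]
% app_vals = diff([0 1 3 2 5]) = [1 2 -1 3]
% array = [1 0 2 0 -1 3 0 0]
% output = cumsum(array) = [1 1 3 3 2 5 5 5]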
Pre-allocation Hack
As could be seen that the above listed code uses pre-allocation with zeros. Now, according to this UNDOCUMENTED MATLAB blog on faster pre-allocation, one can achieve much faster pre-allocation with -
array(clens(end)) = 0; % instead of array = zeros(1,(clens(end)))
Wrapping up: Function Code
To wrap up everything, we would have a compact function code to achieve this run-length decoding like so -
function out = rld_cumsum_diff(vals,runlens)
clens = cumsum(runlens);
idx(clens(end))=0;
idx([1 clens(1:end-1)+1]) = diff([0 vals]);
out = cumsum(idx);
return;
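A quick usage check (expected output shown as a comment, matching the example at the top of this post):
out = rld_cumsum_diff([1,3,2,5],[2,2,1,3])
% out = [1 1 3 3 2 5 5 5]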
Benchmarking
Benchmarking Code
Listed next is the benchmarking code to compare runtimes and speedups of the stated cumsum+diff approach against the cumsum-only approach, on MATLAB 2014B:
datasizes = [reshape(linspace(10,70,4).'*10.^(0:4),1,[]) 10^6 2*10^6];
fcns = {'rld_cumsum','rld_cumsum_diff'}; % approaches to be benchmarked
for k1 = 1:numel(datasizes)
n = datasizes(k1); % Create random inputs
vals = randi(200,1,n);
runs = [5000 randi(200,1,n-1)]; % 5000 acts as an aberration
for k2 = 1:numel(fcns) % Time approaches
tsec(k2,k1) = timeit(@() feval(fcns{k2}, vals,runs), 1);
end
end
figure, % Plot runtimes
loglog(datasizes,tsec(1,:),'-bo'), hold on
loglog(datasizes,tsec(2,:),'-k+')
set(gca,'xgrid','on'),set(gca,'ygrid','on'),
xlabel('Datasize ->'), ylabel('Runtimes (s)')
legend(upper(strrep(fcns,'_',' '))),title('Runtime Plot')
figure, % Plot speedups
semilogx(datasizes,tsec(1,:)./tsec(2,:),'-rx')
set(gca,'ygrid','on'), xlabel('Datasize ->')
legend('Speedup(x) with cumsum+diff over cumsum-only'),title('Speedup Plot')
Associated function code for rld_cumsum.m:
function out = rld_cumsum(vals,runlens)
index = zeros(1,sum(runlens));
index([1 cumsum(runlens(1:end-1))+1]) = 1;
out = vals(cumsum(index));
return;
Runtime and Speedup Plots (not reproduced here)
Conclusions
The proposed approach seems to be giving us a noticeable speedup over the cumsum-only approach, which is about 3x!
Why is this new cumsum+diff based approach better than the previous cumsum-only approach?
Well, the essence of the reason lies in the final step of the cumsum-only approach, which needs to map the "cumsumed" values into vals. In the new cumsum+diff based approach, we do diff(vals) instead, for which MATLAB processes only n elements (where n is the number of run-lengths), compared with mapping sum(runlens) elements in the cumsum-only approach; that number is typically many times larger than n, hence the noticeable speedup with the new approach!
Benchmarks
Updated for R2015b: repelem now fastest for all data sizes.
Tested functions:
MATLAB's built-in repelem function that was added in R2015a
gnovice's cumsum solution (rld_cumsum)
Divakar's cumsum+diff solution (rld_cumsum_diff)
knedlsepp's accumarray solution (knedlsepp5cumsumaccumarray) from this post
Naive loop-based implementation (naive_jit_test.m) to test the just-in-time compiler
Results of test_rld.m on R2015b (timing plot not reproduced here; the original post also included an older R2015a plot).
Findings:
On R2015b, repelem is always the fastest, by roughly a factor of 2.
rld_cumsum_diff is consistently faster than rld_cumsum.
The older R2015a findings, for comparison:
repelem was fastest for small data sizes (less than about 300-500 elements)
rld_cumsum_diff became significantly faster than repelem around 5,000 elements
repelem became slower than rld_cumsum somewhere between 30,000 and 300,000 elements
rld_cumsum has roughly the same performance as knedlsepp5cumsumaccumarray
naive_jit_test.m has nearly constant speed, on par with rld_cumsum and knedlsepp5cumsumaccumarray for smaller sizes, and a little faster for large sizes
(The older R2015a speedup plot is likewise not reproduced here.)
Conclusion
On R2015a, use repelem below about 5,000 elements and the cumsum+diff solution above that; on R2015b and later, repelem is fastest across all data sizes.
There's no built-in function I know of, but here's one solution:
index = zeros(1,sum(b));
index([1 cumsum(b(1:end-1))+1]) = 1;
c = a(cumsum(index));
Explanation:
A vector of zeroes is first created of the same length as the output array (i.e. the sum of all the replications in b). Ones are then placed in the first element and each subsequent element representing where the start of a new sequence of values will be in the output. The cumulative sum of the vector index can then be used to index into a, replicating each value the desired number of times.
For the sake of clarity, this is what the various vectors look like for the values of a and b given in the question:
index = [1 0 1 0 1 1 0 0]
cumsum(index) = [1 1 2 2 3 4 4 4]
c = [1 1 3 3 2 5 5 5]
EDIT: For the sake of completeness, there is another alternative using ARRAYFUN, but this seems to take anywhere from 20-100 times longer to run than the above solution with vectors up to 10,000 elements long:
c = arrayfun(@(x,y) x.*ones(1,y),a,b,'UniformOutput',false);
c = [c{:}];
There is finally (as of R2015a) a built-in and documented function to do this, repelem. The following syntax, where the second argument is a vector, is relevant here:
W = repelem(V,N), with vector V and vector N, creates a vector W where element V(i) is repeated N(i) times.
Or put another way, "Each element of N specifies the number of times to repeat the corresponding element of V."
Example:
>> a=[1,3,2,5]
a =
1 3 2 5
>> b=[2,2,1,3]
b =
2 2 1 3
>> repelem(a,b)
ans =
1 1 3 3 2 5 5 5
The performance problems in MATLAB's built-in repelem have been fixed as of R2015b. I have run the test_rld.m program from chappjc's post on R2015b, and repelem is now faster than the other algorithms by about a factor of 2. (Plot not reproduced here.)

Best way to join different length column vectors into a matrix in MATLAB

Assuming I have a series of column vectors with different lengths, what would be the best way, in terms of computation time, to join all of them into one matrix, where the height of the matrix is determined by the longest column and the shorter columns are padded with NaNs?
Edit: Please note that I am trying to avoid cell arrays, since they are expensive in terms of memory and run time.
For example:
A = [1;2;3;4];
B = [5;6];
C = magicFunction(A,B);
Result:
C =
1 5
2 6
3 NaN
4 NaN
The following code avoids use of cell arrays except for the estimation of the number of elements in each vector, which keeps the code a bit cleaner. The price for using cell arrays for that tiny bit of work shouldn't be too expensive. Also, varargin gets you the inputs as a cell array anyway. Now, you can avoid cell arrays there too, but it would most probably involve for-loops, and you might have to use variable names for each of the inputs, which isn't too elegant when creating a function with an unknown number of inputs. Otherwise, the code uses numeric arrays, logical indexing and my favourite bsxfun, which must be cheap in the market of runtimes.
Function Code
function out = magicFunction(varargin)
lens = cellfun(@(x) numel(x),varargin);
out = NaN(max(lens),numel(lens));
out(bsxfun(@le,(1:max(lens))',lens)) = vertcat(varargin{:});
return;
Example
Script -
A1 = [9;2;7;8];
A2 = [1;5];
A3 = [2;6;3];
out = magicFunction(A1,A2,A3)
Output -
out =
9 1 2
2 5 6
7 NaN 3
8 NaN NaN
Benchmarking
As part of the benchmarking, we compare our solution to @gnovice's solution, which was mostly based on cell arrays. Our intention is to see what speedups, if any, we get by avoiding cell arrays. Here's the benchmarking code with 20 vectors:
%// Let's create row vectors A1,A2,A3,... to be used with @gnovice's solution
num_vectors = 20;
max_vector_length = 1500000;
vector_lengths = randi(max_vector_length,num_vectors,1);
vs = arrayfun(@(x) randi(9,1,vector_lengths(x)),1:numel(vector_lengths),'uni',0);
[A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,A16,A17,A18,A19,A20] = vs{:};
%// Maximally cell-array based approach used in @gnovice's linked solution
disp('--------------------- With @gnovice''s approach')
tic
tcell = {A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,A16,A17,A18,A19,A20};
maxSize = max(cellfun(@numel,tcell)); %# Get the maximum vector size
fcn = @(x) [x nan(1,maxSize-numel(x))]; %# Create an anonymous function
rmat = cellfun(fcn,tcell,'UniformOutput',false); %# Pad each cell with NaNs
rmat = vertcat(rmat{:});
toc, clear tcell maxSize fcn rmat
%// Transpose each of the input vectors to get column vectors as needed
%// for our problem
vs = cellfun(@(x) x',vs,'uni',0);
[A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,A16,A17,A18,A19,A20] = vs{:};
%// Our solution
disp('--------------------- With our new approach')
tic
out = magicFunction(A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,...
A11,A12,A13,A14,A15,A16,A17,A18,A19,A20);
toc
Results -
--------------------- With @gnovice's approach
Elapsed time is 1.511669 seconds.
--------------------- With our new approach
Elapsed time is 0.671604 seconds.
Conclusions -
With 20 vectors and a maximum length of 1500000, the speedups are between 2-3x, and they were seen to increase as the number of vectors increased. Those results are not shown here to save space, as we have already used quite a lot of it.
If you use a cell array you won't need it to be filled with NaNs; just write each vector into one cell and the unused elements stay empty (that would be the space-efficient way). You could either use:
cell_result{1} = A;
cell_result{2} = B;
This would result in a cell array of size 2 that contains all elements of A and B. Or, if you want the values stored one array per row:
cell_result(1,1:numel(A)) = num2cell(A);
cell_result(2,1:numel(B)) = num2cell(B);
If you need them to be filled with NaNs for future coding, the easiest way is to find the maximum length and create a matrix of size (max_length x number of arrays).
So let's say you have n = 5 arrays: A, B, C, D and E.
h=zeros(1,n);
h(1)=numel(A);
h(2)=numel(B);
h(3)=numel(C);
h(4)=numel(D);
h(5)=numel(E);
max_No_Entries=max(h);
result = NaN(max_No_Entries,n);
result(1:numel(A),1)=A;
result(1:numel(B),2)=B;
result(1:numel(C),3)=C;
result(1:numel(D),4)=D;
result(1:numel(E),5)=E;

Calculating all two element sums of [a1 a2 a3 ... an] in Matlab

Context: I'm working on Project Euler Problem 23 using Matlab in order to practice my barely existing programming skills.
My Problem:
Now I have a vector with roughly 6500 numbers (ranging from 12 to 28122) as elements, and I want to calculate all the two-element sums. That is, I only need one instance of every sum, so having calculated a1 + an it's not necessary to calculate an + a1.
Edit for clarification: This includes the sums a1+a1, a2+a2,..., an+an.
The problem is that this is much too slow.
Problem specific constraints:
It's a given that sums of 28123 or over aren't necessary to calculate, since those can't be used to solve the problem further.
My approach:
AbundentNumberSumsRaw = [];
for i = 1:3490
AbundentNumberSumsRaw = [AbundentNumberSumsRaw AbundentNumbers(i)+AbundentNumbers(i:end)];
end
This works terribly :p
My Comments:
I'm pretty sure that incrementally growing the vector AbundentNumberSumsRaw is bad coding, since it means memory usage will spike unnecessarily. I haven't preallocated, since (a) I don't know what size vector to preallocate and (b) I couldn't come up with a way to inject the sums into AbundentNumberSumsRaw in an orderly manner without some ugly-looking nested loops.
The loop bound 3490 is lower than the number of elements simply because I checked and saw that all the resulting sums for numbers whose index is above 3490 would be too large for me to use anyway.
I'm pretty sure my main issue is that the program needs to do a lot of incremental growing of the vector AbundentNumberSumsRaw.
Any and all help and suggestions would be much appreciated :)
Cheers
Rasmus
Suppose
a = 28110*rand(6500,1)+12;
then
sums = [
a(1) + a(1:end)
a(2) + a(2:end)
...
];
is the calculation you're after.
You also state that sums whose value goes over 28123 should be discarded.
This can be generalized like so:
% Compute all 2-element sums without repetitions
C = arrayfun(@(x) a(x)+a(x:end), 1:numel(a), 'uniformoutput', false);
C = cat(1, C{:});
% discard sums exceeding threshold
C(C>28123) = [];
or using a loop
% Compute all 2-element sums without repetitions
E = cell(numel(a),1);
for ii = 1:numel(a)
E{ii} = a(ii)+a(ii:end);
end
E = cat(1, E{:});
% discard sums exceeding threshold
E(E>28123) = [];
Simple testing shows that arrayfun is somewhat faster than the loop, so I'd go for the arrayfun option.
As your primary problem is to find out, which integers in a given set can be written as the sum of two integers of a different set, I'd choose a different approach:
AbundantNumbers = 1:6500; % replace with the list you generated somewhere else
maxInteger = 28122;
AbundantNumberSum(1:maxInteger) = true; % logical array
for i = 1:length(AbundantNumbers)
sumIndices = AbundantNumbers(i) + AbundantNumbers;
AbundantNumberSum(sumIndices(sumIndices <= maxInteger)) = false;
end
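To finish the idea for Problem 23 (a hedged sketch; it assumes AbundantNumberSum(k) ends up true exactly when k cannot be written as such a sum):
% Project Euler 23 asks for the sum of all positive integers that cannot
% be written as the sum of two abundant numbers
answer = sum(find(AbundantNumberSum));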
Unfortunately, this is not an answer to your question but to your problem ;-) For the MATLAB way to solve your original question, see the elegant answer of Rody Oldenhuis.
My approach would be the following:
v = 1:3490; % your vector here
s = length(v);
result = zeros(s); % preallocate memory
for m = 1:s
result(m,m:end) = v(m)+v(m:end);
end
You will get a matrix of 3490 x 3490 elements and more than half of them 0.
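To then extract just the usable sums (a hedged addition; it relies on all inputs being positive, so the untouched zeros in the lower triangle can be filtered out together with the over-threshold values):
sums = result(result > 0 & result < 28123); % keep only the relevant sums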