Iterating over all integer vectors summing up to a certain value in MATLAB? - matlab

I would like to find a clean way so that I can iterate over all the vectors of positive integers of length, say n (called x), such that sum(x) == 100 in MATLAB.
I know it is an exponentially complex task. If the length is sufficiently small, say 2-3 I can do it by a for loop (I know it is very inefficient) but how about longer vectors?
Thanks in advance,

Here is a quick and dirty method that uses recursion. The idea is that to generate all vectors of length k that sum to n, you first generate vectors of length k-1 that sum to n-i for each i=1..n, and then add an extra i to the end of each of these.
You could speed this up by pre-allocating x in each loop.
Note that the size of the output is (n + k - 1 choose n) rows and k columns.
function x = genperms(n, k)
if k == 1
x = n;
elseif n == 0
x = zeros(1,k);
else
x = zeros(0, k);
for i = 0:n
y = genperms(n-i,k-1);
y(:,end+1) = i;
x = [x; y];
end
end
Edit
As alluded to in the comments, this will run into memory issues for large n and k. A streaming solution is preferable, which generates the outputs one at a time. In a non-strict language like Haskell this is very simple -
genperms n k
| k == 1 = return [n]
| n == 0 = return (replicate k 0)
| otherwise = [i:y | i <- [0..n], y <- genperms (n-i) (k-1)]
viz.
>> mapM_ print $ take 10 $ genperms 100 30
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,99]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,98]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,97]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,96]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,95]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,94]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,93]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,92]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,91]
which runs virtually instantaneously - no memory issues to worry about.
In Python you could achieve something nearly as simple using generators and the yield keyword. In Matlab it is certainly possible, but I leave the translation up to you!

This is one possible method to generate all vectors at once (will give memory problems for moderately large n):
s = 10; %// desired sum
n = 3; %// number of digits
vectors = cell(1,n);
[vectors{:}] = ndgrid(0:s); %// I assume by "integer" you mean non-negative int
vectors = cell2mat(cellfun(#(c) reshape(c,1,[]), vectors, 'uni', 0).');
vectors = vectors(:,sum(vectors)==s); %// each column is a vector
Now you can iterate over those vectors:
for vector = vectors %// take one column at each iteration
%// do stuff with the vector
end
To avoid memory problems it is better to generate each vector as needed, instead of generating all of them initially. The following approach iterates over all possible n-vectors in one for loop (regardless of n), rejecting those vectors whose sum is not the desired value:
s = 10; %// desired sum
n = 3;; %// number of digits
for number = 0: s^n-1
vector = dec2base(number,s).'-'0'; %// column vector of n rows
if sum(vector) ~= s
continue %// reject that vector
end
%// do stuff with the vector
end

Related

How to programmatically compute this summation

I want to compute the above summation, for a given 'x'. The summation is to be carried out over a block of lengths specified by an array , for example block_length = [5 4 3]. The summation is carried as follows: from -5 to 5 across one dimension, -4 to 4 in the second dimension and -3 to 3 in the last dimension.
The pseudo code will be something like this:
sum = 0;
for i = -5:5
for j = -4:4
for k = -3:3
vec = [i j k];
tv = vec * vec';
sum = sum + 1/(1+tv)*cos(2*pi*x*vec'));
end
end
end
The problem is that I want to find the sum when the number of dimensions are not known ahead of time, using some kind of variable nested loops hopefully. Matlab uses combvec, but it returns all possible combinations of vectors, which is not required as we only compute the sum. When there are many dimensions, combvec returning all combinations is not feasible memory wise.
Appreciate any ideas towards solutions.
PS: I want to do this at high number of dimensions, for example 650, as in machine learning.
Based on https://www.mathworks.com/matlabcentral/answers/345551-function-with-varying-number-of-for-loops I came up with the following code (I haven't tested it for very large number of indices!):
function sum = fun(x, block_length)
sum = 0;
n = numel(block_length); % Number of loops
vec = -ones(1, n) .* block_length; % Index vector
ready = false;
while ~ready
tv = vec * vec';
sum = sum + 1/(1+tv)*cos(2*pi*x*vec');
% Update the index vector:
ready = true; % Assume that the WHILE loop is ready
for k = 1:n
vec(k) = vec(k) + 1;
if vec(k) <= block_length(k)
ready = false;
break; % v(k) increased successfully, leave "for k" loop
end
vec(k) = -1 * block_length(k); % v(k) reached the limit, reset it
end
end
end
where x and block_length should be both 1-x-n vectors.
The idea is that, instead of using explicitly nested loops, we use a vector of indices.
How good/efficient is this when tackling the suggested use case where block_length can have 650 elements? Not much! Here's a "quick" test using merely 16 dimensions and a [-1, 1] range for the indices:
N = 16; tic; in = 0.1 * ones(1, N); sum = fun(in, ones(size(in))), toc;
which yields an elapsed time of 12.7 seconds on my laptop.

Construct a Matlab array through nested loops

Suppose I have three matrices A_1, A_2, A_3 each of dimension mxn, where m and n are large. These matrices contain strictly positive numbers.
I want to construct a matrix check of dimension mx1 such that, for each i=1,...,m:
check(i)=1 if there exists j,k such that
A_1(i,1)+A_2(j,1)+A_3(k,1)<=quantile(A_1(i,2:end)+A_2(j,2:end)+A_3(k,3:end), 0.95)
In my case m is large (m=10^5) and n=500. Therefore, I would like your help to find an efficient way to do this.
Below I reproduce an example. I impose m smaller than in reality and report my incomplete and probably inefficient attempt to construct check.
clear
rng default
m=4;
n=500;
A_1=betarnd(1,2,m,n);
A_2=betarnd(1,2,m,n);
A_3=betarnd(1,2,m,n);
check=zeros(m,1);
for i=1:m
for j=1:m
for k=1:m
if A_1(i,1)+A_2(j,1)+A_3(k,1)<=quantile(A_1(i,2:end)+A_2(j,2:end)+A_3(k,2:end), 0.95)
check(i)=1;
STOP TO LOOP OVER j AND k, MOVE TO THE NEXT i (INCOMPLETE!)
else
KEEP SEARCHING FOR j,k SUCH THAT THE CONDITION IS SATISFIED (INCOMPLETE!)
end
end
end
end
Given a scalar x and a vector v the expression x <=quantile (v, .95) can be written as sum( x > v) < Q where Q = .95 * numel(v) *.
Also A_1 can be splitted before the loop to avoid extra indexing.
Moreover the most inner loop can be removed in favor of vectorization.
Af_1 = A_1(:,1);
Af_2 = A_2(:,1);
Af_3 = A_3(:,1);
As_1 = A_1(:,2:end);
As_2 = A_2(:,2:end);
As_3 = A_3(:,2:end);
Q = .95 * (n -1);
for i=1:m
for j=1:m
if any (sum (Af_1(i) + Af_2(j) + Af_3 > As_1(i,:) + As_2(j,:) + As_3, 2) < Q)
check(i) = 1;
break;
end
end
end
More optimization can be achieved by rearranging the expressions involved in the inequality and pre-computation:
lhs = A_3(:,1) - A_3(:,2:end);
lhsi = A_1(:,1) - A_1(:,2:end);
rhsj = A_2(:,2:end) - A_2(:,1);
Q = .95 * (n - 1);
for i=1:m
LHS = lhs + lhsi(i,:);
for j=1:m
if any (sum (LHS > rhsj(j,:), 2) < Q)
check(i) = 1;
break;
end
end
end
Note that because of the method that is used in the computation of quantile you may get a slightly different result.
Option 1:
Because all numbers are positive, you can do some optimizations. 95 percentile will be only higher if you add A1 to the mix - if you find the j and k of greatest 95 percentile of A2+A3 on the right side compared to the sum of the first 2 elements, you can simply take that for every i.
maxDif = -inf;
for j = 1 : m
for k = 1 : m
newDif = quantile(A_2..., 0.95) - A_2(j,1)-A_3(k,1);
maxDif = max(newDif, maxDif);
end
end
If even that is too slow, you can first get maxDifA2 and maxDifA3, then estimate that maxDif will be for those particular j and k values and calculate it.
Now, for some numbers you will get that maxDif > A_1, then the check is 1. For some numbers you will get that maxDif + quantile(A1, 0.95) < A_1, here check is 0 (if you estimated maxDif by separate calculation of A2 and A3 this isn't true!). For some (most?) you will unfortunately get values in between and this won't be helpful at all. Then what remains is option 2 (it is also more straightforward):
Option 2:
You could save some time if you can save summation A_2+A_3 on the right side, as that calculation repeats for every different i, but that requires A LOT of memory. But quantile is the more expensive operation anyway, so you aren't saving a lot of time. Something along the lines of
for j = 1 : m
for k = 1 : m
A23R(j,k,:) = A2(j,:)+A3(k,:); % Unlikely to fit in memory.
end
end
Then you can perform your loops, using A23R and avoiding to repeat that sum for every i.

Create a variable number of terms in an anonymous function that outputs a vector

I'd like to create an anonymous function that does something like this:
n = 5;
x = linspace(-4,4,1000);
f = #(x,a,b,n) a(1)*exp(b(1)^2*x.^2) + a(2)*exp(b(2)^2*x.^2) + ... a(n)*exp(b(n)^2*x.^2);
I can do this as such, without passing explicit parameter n:
f1 = #(x,a,b) a(1)*exp(-b(1)^2*x.^2);
for j = 2:n
f1 = #(x,a,b) f1(x,a,b) + a(j)*exp(b(j)^2*x.^2);
end
but it seems, well, kind of hacky. Does someone have a better solution for this? I'd like to know how someone else would treat this.
Your hacky solution is definitely not the best, as recursive function calls in MATLAB are not very efficient, and you can quickly run into the maximum recursion depth (500 by default).
You can introduce a new dimension along which you can sum up your arrays a and b. Assuming that x, a and b are row vectors:
f = #(x,a,b,n) a(1:n)*exp((b(1:n).^2).'*x.^2)
This will use the first dimension as summing dimension: (b(1:n).^2).' is a column vector, which produces a matrix when multiplied by x (this is a dyadic product, to be precise). The resulting n * length(x) matrix can be multiplied by a(1:n), since the latter is a matrix of size [1,n]. This vector-matrix product will also perform the summation for us.
Mini-proof:
n = 5;
x = linspace(-4,4,1000);
a = rand(1,10);
b = rand(1,10);
y = 0;
for k=1:n
y = y + a(k)*exp(b(k)^2*x.^2);
end
y2 = a(1:n)*exp((b(1:n).^2).'*x.^2); %'
all(abs(y-y2))<1e-10
The last command returns 1, so the two are essentially identical.

average number of different values in a column

I had a question in Matlab. It is so, I try to take average of the different number of values ​​in a column. For example, if we have the column below,
X = [1 1 2 3 4 3 8 2 1 3 5 6 7 7 5]
first I want to start by taking the average of 5 values ​​and plot them. In the case above, I should receive three averages that I could plot. Then take 10 values ​​at a time and so on.
I wonder if you have to write custom code to fix it.
The fastest way is probably to rearrange your initial vector X into some matrix, with each column storing the required values to average:
A = reshape(X, N, []);
where N is the desired number of rows in the new matrix, and the empty brackets ([]) tell MATLAB to calculate the number of columns automatically. Then you can average each column using mean:
X_avg = mean(A);
Vector X_avg stores the result. This can be done in one line like so:
X_avg = mean(reshape(X, N, []));
Note that the number of elements in X has to be divisible by N, otherwise you'll have to either pad it first (e.g with zeroes), or handle the "leftover" tail elements separately:
tail = mod(numel(X), N);
X_avg = mean(reshape(X(1:numel(X) - tail), N, [])); %// Compute average values
X_avg(end + 1) = mean(X(end - tail + 1:end)); %// Handle leftover elements
Later on you can put this code in a loop, computing and plotting the average values for a different value of N in each iteration.
Example #1
X = [1 1 2 3 4 3 8 2 1 3 5 6 7 7 5];
N = 5;
tail = mod(numel(X), N);
X_avg = mean(reshape(X(1:numel(X) - tail), N, []))
X_avg(end + 1) = mean(X(end - tail + 1:end))
The result is:
X_avg =
2.2000 3.4000 6.0000
Example #2
Here's another example (this time the length of X is not divisible by N):
X = [1 1 2 3 4 3 8 2 1 3 5 6 7 7 5];
N = 10;
tail = mod(numel(X), N);
X_avg = mean(reshape(X(1:numel(X) - tail), N, []))
X_avg(end + 1) = mean(X(end - tail + 1:end))
The result is:
X_avg =
2.8000 6.0000
This should do the trick:
For a selected N (the number of values you want to take the average of):
N = 5;
mean_vals = arrayfun(#(n) mean(X(n-1+(1:N))),1:N:length(X))
Note: This does not check if Index exceeds matrix dimensions.
If you want to skip the last numbers, this should work:
mean_vals = arrayfun(#(n) mean(X(n-1+(1:N))),1:N:(length(X)-mod(length(X),N)));
To add the remaining values:
if mod(length(X),N) ~= 0
mean_vals(end+1) = mean(X(numel(X)+1-mod(length(X),N):end))
end
UPDATE: This is a modification of Eitan's first answer (before it was edited). It uses nanmean(), which takes the mean of all values that are not NaN. So, instead of filling the remaining rows with zeros, fill them with NaN, and just take the mean.
X = [X(:); NaN(mod(N - numel(X), N), 1)];
X_avg = nanmean(reshape(X, N, []));
It would be helpful if you posted some code and point out exactly what is not working.
As a first pointer. If
X = [1 1 2 3 4 3 8 2 1 3 5 6 7 7 5]
the three means in blocks of 5 you are interested in are
mean(X(1:5))
mean(X(6:10))
mean(X(11:15))
You will have to come up with a for loop or maybe some other way to iterate through the indices.
I think you want something like this (I didn't use Matlab in a while, I hope the syntax is right):
X = [1 1 2 3 4 3 8 2 1 3 5 6 7 7 5],
currentAmount=5,
block=0,
while(numel(X)<=currentAmount)
while(numel(X)<=currentAmount+block*currentAmount)
mean(X(block*currentAmount+1:block*currentAmount+currentAmount));
block =block+1;
end;
currentAmount = currentAmount+5;
block=0;
end
This code will first loop through all elements calculating means of 5 elements at a time. Then, it will expand to 10 elements. Then to 15, and so on, until the number of elements from which you want to make the mean is bigger than the number of elements in the column.
If you are looking to average K random samples in your N-dimensional vector, then you could use:
N = length(X);
K = 20; % or 10, or 30, or any integer less than or equal to N
indices = randperm(N, K); % gives you K random indices from the range 1:N
result = mean(X(indices)); % averages the values of X at the K random
% indices from above
A slightly more compact form would be:
K = 20;
result = mean(X(randperm(length(X), K)));
If you are just looking to take every K consecutive samples from the list and average them then I am sure one of the previous answers will give you what you want.
If you need to do this operation a lot, it might be worth writing your own function for it. I would recommend using #EitanT's basic idea: pad the data, reshape, take mean of each column. However, rather than including the zero-padded numbers at the end, I recommend taking the average of the "straggling" data points separately:
function m = meanOfN(x, N)
% function m = meanOfN(x, N)
% create groups of N elements of vector x
% and return their mean
% if numel(x) is not a multiple of N, the last value returned
% will be for fewer than N elements
Nf = N * floor( numel( x ) / N ); % largest multiple of N <= length of x
xr = reshape( x( 1:Nf ), N, []);
m = mean(xr);
if Nf < N
m = [m mean( x( Nf + 1:end ) )];
end
This function will return exactly what you were asking for: in the case of a 15 element vector with N=5, it returns 3 values. When the size of the input vector is not a multiple of N, the last value returned will be the "mean of what is left".
Often when you need to take the mean of a set of numbers, it is the "running average" that is of interest. So rather than getting [mean(x(1:5)) mean(x(6:10)) mean(11:15))], you might want
m(1) = mean(x(1:N));
m(2) = mean(x(2:N+1));
m(3) = mean(x(3:N+2));
...etc
That could be achieved using a simple convolution of your data with a vector of ones; for completeness, here is a possible way of coding that:
function m = meansOfN(x, n)
% function m = meansOfN(x, n)
% taking the running mean of the values in x
% over n samples. Returns a row vector of size (sizeof(x) - n + 1)
% if numel(x) < n, this returns an empty matrix
mv = ones(N,1) / N; % vector of ones, normalized
m = convn(x(:), mv, 'valid'); % perform 1D convolution
With these two functions in your path (save them in a file called meanOfN.m and meansOfN.m respectively), you can do anything you want. In any program you will be able to write
myMeans = meanOfN(1:30, 5);
myMeans2 = meansOfN(1:30, 6);
etc. Matlab will find the function, perform the calculation, return the result. Writing your custom functions for specific operations like this can be very helpful - not only does it keep your code clean, but you only have to test the function once...

Looping with two variables from a vector

I have a 30-vector, x where each element of x follows a standardised normal distribution.
So in Matlab,
I have:
for i=1:30;
x(i)=randn;
end;
Now I want to create 30*30=900 elements from vector, x to make a 900-vector, C defined as follows:
I am unable to do the loop for two variables (k and l) properly. I have:
for k=1:30,l=1:30;
C(k,l)=(1/30)*symsum((x(i))*(x(i-abs(k-l))),1,30+abs(k-l));
end
It says '??? Undefined function or method 'symsum' for input arguments of type
'double'.'
I hope to gain from this a 900-vector, C which I will then rewrite as a matrix. The reason I have using two indices k and l instead of one is because I eventually want these indices to denote the (k,l)-entry of such a matrix so it is important that that my 900-vector will be in the form of C = [ row 1 row 2 row 3 ... row 30 ] so I can use the reshape tool i.e.
C'=reshape(C,30,30)
Could anyone help me with the code for the summation and getting such a 900 vector.
Let's try to make this a bit efficient.
n = 30;
x = randn(n,1);
%# preassign C for speed
C = zeros(n);
%# fill only one half of C, since it's symmetric
for k = 2:n
for l = 1:k-1
%# shift the x-vector by |k-l| and sum it up
delta = k-l; %# k is always larger than l
C(k,l) = sum( x(1:end-delta).*x(1+delta:end) );
end
end
%# fill in the other half of C
C = C + C';
%# add the diagonal (where delta is 0, and thus each
%# element of x is multiplied with itself
C(1:n+1:end) = sum(x.^2);
It seems to me that you want a matrix C of 30x30 elements.
Given the formula that you provided I would do
x = randn(1,30)
C = zeros(30,30)
for k=1:30
for l=1:30
v = abs(k-l);
for i =1:30-v
C(k,l) = C(k,l) + x(i)*x(i+v);
end
end
end
if you actually need the vector you can obtain it from the matrix.