Multiple sampling with different sizes in MATLAB

I am trying to implement this code so it works as quickly as possible.
Say I have a population of 100 different values; to keep things simple, you can think of it as pop = 1:100 or pop = randn(1,100). I have a vector n which gives the sizes of the samples I want to draw. Say, for example, that n = [1 3 10 6 2]. What I want to do is take 5 (which in reality is length(n)) different samples from pop, each consisting of n(i) elements drawn without replacement. That is, for my first sample I want 1 element out of pop, for the second sample I want 3, for the third I want 10, and so on.
To be honest, I am not really interested in which elements are sampled. What I want is the sum of the elements present in the i-th sample. This would be trivial with a loop, but I am trying to avoid loops to keep my code as fast as possible, since I have to do this for many different populations and with length(n) being very large.
If I were to do it with a loop, it would look like this:
pop = randn(1,100);
n = [1 3 10 6 2];
sum_sample = zeros(length(n),1);
for i = 1:length(n)
    sum_sample(i,1) = sum(randsample(pop,n(i)));
end
Is there a way to do this without a loop?

The only way to figure out what is fastest for you is to do a comparison of the different methods.
In fact the loop appears to be very fast in this case!
pop = randn(1,100);
n = [1 3 10 6 2];
tic
sr = @(n) sum(randsample(pop,n));
sum_sample = arrayfun(sr,n);
toc %% Returns about 0.004
clear su
tic
for t=numel(n):-1:1
    su(t)=sum(randsample(pop,n(t)));
end
toc %% Returns about 0.003
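For completeness, here is a sketch of a fully loop-free variant (it trades memory for vectorization by shuffling pop once per sample, and assumes every sample is drawn without replacement from the same pop):
pop = randn(1,100);
n = [1 3 10 6 2];
[~, idx] = sort(rand(numel(n), numel(pop)), 2); % each row of idx is a random permutation of 1:numel(pop)
vals = pop(idx);                                % row i holds pop in a random order
mask = bsxfun(@le, 1:numel(pop), n(:));         % keep only the first n(i) entries of row i
sum_sample = sum(vals .* mask, 2);              % one without-replacement sum per n(i)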

You can create a function handle which chooses the random samples and sums them up. Then you can use arrayfun to apply this function to all values of n:
pop = randn(1,100);
n = [1 3 10 6 2];
sr = @(n) sum(randsample(pop,n));
sum_sample = arrayfun(sr,n);

You can do something like this:
pop = randn(1,100);
n = [1 3 10 6 2];
sampled_data_index = randi(length(pop),1,sum(n));
sampled_data = pop(sampled_data_index);
The randi function randomly selects integer values in a specified range, which makes it suitable for generating indices. Once you have the indices, you can use them all at once to sample the data from the pop vector. Note that randi draws with replacement, so repeated indices are possible.
If you want unique indices, you can replace randi with randperm:
sampled_data_index = randperm(length(pop),sum(n));
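Note that this draws all sum(n) values into one flat vector. To get the per-sample sums the question asks for, a minimal sketch (assuming repelem, available since R2015a) labels each drawn value with its sample index and accumulates:
sampled_data = pop(sampled_data_index);
group = repelem(1:length(n), n);                    % group(k) = which sample the k-th draw belongs to
sum_sample = accumarray(group(:), sampled_data(:)); % one sum per sample, no loop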
Finally:
You can have all the sampled values as a cell variable using the following code:
pop = randn(1,100);
n = [1 3 10 6 2];
fun = @(m) pop(randperm(length(pop),m));
C = arrayfun(fun,n,'UniformOutput',0)
And to get the sum of the sampled data:
funs = @(m) sum(pop(randperm(length(pop),m)));
sumC = arrayfun(funs,n)

Related

Calculate the set of autocorrelation functions and then sum them

Good evening! I have a 3D array whose first dimension is 1. For clarity, I set it up exactly as it is used in my program. c is essentially the number of experiments; in this case there are three, so I compute the correlation function three times and then add them up.
In fact, the number of experiments is 100, so I have to compute 100 correlation functions and add them.
Could you tell me how to do this automatically, and if possible without loops? Thank you.
Also, at the beginning I build the 3D array using a loop. Is it possible to build it without a loop as well? This is certainly not my main question, but I would also like to know the answer to it.
d = [1 2 3];
c = [4 2 6];
for i = 1:length(c)
    D(1,:,i) = d.*c(i);
end
D
X1 = xcorr(D(:,:,1));
X2 = xcorr(D(:,:,2));
X3 = xcorr(D(:,:,3));
X = X1+X2+X3;
With the help of a loop, my solution looks like this:
d = [1 2 3];
c = [4 2 6];
for i = 1:length(c)
    D(1,:,i) = d.*c(i);
    x(:,:,i) = xcorr(D(:,:,i));
end
X = sum(x,3)
It seems to be correct. Is it possible to do this without a loop?
You can easily build your array D without any loop, even though I don't know why you want to keep the leading singleton dimension:
D(1, :, :) = d'.*c; % implicit expansion (R2016b+): D(1,j,i) = d(j)*c(i)
As for the sum of the autocorrelations, I'm not sure you can do it without a loop. The only thing that you can perhaps do is to not use an array to store the correlation for each index (if memory consumption is a problem for you) and just update the sum:
X = zeros(1, 2*length(d)-1); % initialize the sum array
for i = 1:length(c)
    X = X + xcorr(D(:, :, i)); % update the sum
end
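One loop-free shortcut does apply to this particular construction: each slice is just d scaled by c(i), and xcorr is quadratic in its input, so the whole sum collapses to a single call. A minimal sketch:
d = [1 2 3];
c = [4 2 6];
% xcorr(c(i)*d) equals c(i)^2 * xcorr(d), so summing over all scaled copies
% reduces to scaling one autocorrelation by sum(c.^2)
X = sum(c.^2) * xcorr(d);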

Implementing this equation in MATLAB with a for loop?

I am looking to implement the following equation in MATLAB, since I have a very large matrix.
How would I be able to do this? It is not really about the 261, and for the sake of simplicity we can assume d = 0.94; there is no need to worry about the squared term or the mean term, as I will be able to figure those out once I get the loop concept down. So, for instance, I will just try to calculate an average of all the past values in the rows, with specific weights attached to them.
To clarify, we can essentially think of i as indexing the rows of a matrix, so the calculation runs down an entire column, an example of which I provide below. Ignoring the infinity, we can just sum up to period t; the idea is that a certain weight is placed on all the previous rows, with the most recent row carrying the greatest weight.
I was thinking of using something like this:
R = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10];
d = 0.94;
r = zeros(10,1);
for t = 2:10
    r(t,1) = R(t,1);
    for i = 1:10
        W(i,1) = (1-d)*(d.^i)*r(t,1);
    end
end
Or even indexing t = 1:10.
None of these works. In essence, I want to calculate a mean in which greater weight is placed on the most recent values. So, for example, at row t = 4, the value I would get would be:
(1-0.94)*(0.94^3)*1 + (1-0.94)*(0.94^2)*2 + (1-0.94)*(0.94)*3.
Right, if I understand you correctly, I think the following should work:
R = [1 2 3 4 5 6 7 8 9 10];
d = 0.94;
W = zeros(size(R));
% at t = 1, sigma will be 0
for t = 2:length(R)
    meanR = mean(R(1:t-1));
    for i = 1:t-1
        W(t) = W(t) + 261*(1-d)*(d.^(t-i))*(R(i) - meanR)^2;
    end
end
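If it helps, the inner loop over i can also be folded into a single vectorized sum per t; a sketch that computes the same W as above:
R = [1 2 3 4 5 6 7 8 9 10];
d = 0.94;
W = zeros(size(R));
for t = 2:length(R)
    i = 1:t-1;                                          % all past indices
    meanR = mean(R(i));
    W(t) = sum(261*(1-d)*d.^(t-i).*(R(i) - meanR).^2);  % one weighted sum per t
end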

Generate numbers randomly from a set?

In MATLAB, I have a set of P numbers. I would like to generate a random array of size N from this set.
For the sake of example, let's say I have the set {1, 4}, and let's say I would like to generate an array of size 5 (e.g., [1 1 4 1 4]).
What I did is this: I generated the following array using randi.
N = 5;
v = randi([1 4],[1 N]);
The problem is that I got a random array which contains values in 1:4 and not in {1, 4}.
I can simply do this but I need a better way.
for i = 1:length(v)
    if v(i) ~= 1 && v(i) ~= 4
        v(i) = 1; % or v(i) = 4
    end
end
I think I am missing a simple hint here.
You should use datasample:
y = datasample(data,k) returns k observations sampled uniformly at random, with replacement, from the data in data.
a = [1,4];
datasample(a,5)
Depending on the usage, you might consider using:
datasample(unique(a),5)
If you don't have the Statistics Toolbox (which contains the datasample function), you can use randi:
N = 5; %// desired number of samples
data = [1 4]; %// data values
sample = data(randi(numel(data),1,N));
And if you use a very old version of MATLAB that doesn't have randi, you can employ rand:
sample = data(ceil(numel(data)*rand(1,N)));

Efficient aggregation of high dimensional arrays

I have a 3 dimensional (or higher) array that I want to aggregate by another vector. The specific application is to take daily observations of spatial data and average them to get monthly values. So, I have an array with dimensions <Lat, Lon, Day> and I want to create an array with dimensions <Lat, Lon, Month>.
Here is a mock example of what I want. Currently, I can get the correct output using a loop, but in practice, my data is very large, so I was hoping for a more efficient solution than the second loop:
% Make the mock data
A = [1 2 3; 4 5 6];
X = zeros(2, 3, 9);
for j = 1:9
    X(:, :, j) = A;
    A = A + 1;
end
% Aggregate the X values in groups of 3 -- This is the part I would like help on
T = [1 1 1 2 2 2 3 3 3];
X_agg = zeros(2, 3, 3);
for i = 1:3
    X_agg(:,:,i) = mean(X(:,:,T==i),3);
end
In 2 dimensions, I would use accumarray, but that does not accept higher dimension inputs.
Before getting to the answer, let's first rewrite your code in a more general way:
ag = 3; % or agg_size
X_agg = zeros(size(X)./[1 1 ag]);
for i = 1:size(X_agg,3)
    X_agg(:,:,i) = mean(X(:,:,(i-1)*ag+1:i*ag), 3);
end
To avoid the for loop, one idea is to reshape your X array into something that you can apply the mean function to directly.
splited_X = reshape(X, [size(X,1), size(X,2), ag, size(X_agg,3)]);
So now splited_X(:,:,:,i) is the i-th group, i.e. it contains all the slices that should be aggregated together, namely X(:,:,(i-1)*ag+1:i*ag) (as above).
Now you just need to find the mean in the 3rd dimension of splited_X:
temp = mean(splited_X, 3);
However, this results in a 4D array (whose 3rd dimension has size 1). You can turn it back into a 3D array using the reshape function:
X_agg = reshape(temp, size(X_agg))
I have not tried it to see how much more efficient it is, but it should do better for large matrices since it doesn't use for loops.
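The question also mentions accumarray. It does not accept N-D inputs directly, but a related loop-free sketch (assuming T holds group labels 1..G for each slice, not necessarily contiguous) folds the spatial dimensions into rows and does the grouping with one sparse matrix multiply:
G = max(T);
counts = accumarray(T(:), 1);                 % number of slices in each group
S = sparse(1:numel(T), T, 1);                 % Day-by-G group indicator matrix
sums = full(reshape(X, [], numel(T)) * S);    % (Lat*Lon)-by-G group sums
means = bsxfun(@rdivide, sums, counts.');     % divide each group column by its count
X_agg = reshape(means, size(X,1), size(X,2), G);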

How do I compare elements of one row with every other row in the same matrix?

I have the matrix:
a = [1 2 3 4;
     2 4 5 6;
     4 6 8 9]
and I want to compare every row with each of the other rows, one pair at a time. If two rows share a value (a key), the result should indicate that they have a common key.
Using @gnovice's idea of getting all combinations with nchoosek, I propose yet another two solutions:
one using ismember (as noted by @loren)
the other using bsxfun with the eq function handle
The only difference is that intersect sorts and keeps only the unique common keys.
a = randi(30, [100 20]);
% a = sort(a,2);
comparisons = nchoosek(1:size(a,1),2);
N = size(comparisons,1);
keys1 = cell(N,1);
keys2 = cell(N,1);
keys3 = cell(N,1);
tic
for i=1:N
    keys1{i} = intersect(a(comparisons(i,1),:),a(comparisons(i,2),:));
end
toc
tic
for i=1:N
    query = a(comparisons(i,1),:);
    set = a(comparisons(i,2),:);
    keys2{i} = query( ismember(query, set) ); % unique(...)
end
toc
tic
for i=1:N
    query = a(comparisons(i,1),:);
    set = a(comparisons(i,2),:)';
    keys3{i} = query( any(bsxfun(@eq, query, set), 1) ); % unique(...)
end
toc
... with the following time comparisons:
Elapsed time is 0.713333 seconds.
Elapsed time is 0.289812 seconds.
Elapsed time is 0.135602 seconds.
Note that even by sorting a beforehand and adding a call to unique inside the loops (commented parts), these two methods are still faster than intersect.
Here's one solution (which is generalizable to larger matrices than the sample in the question):
comparisons = nchoosek(1:size(a,1),2);
N = size(comparisons,1);
keys = cell(N,1);
for i = 1:N
    keys{i} = intersect(a(comparisons(i,1),:),a(comparisons(i,2),:));
end
The function NCHOOSEK is used to generate all of the unique combinations of row comparisons. For the matrix a in your question, you will get comparisons = [1 2; 1 3; 2 3], meaning that we will need to compare rows 1 and 2, then 1 and 3, and finally 2 and 3. keys is a cell array that stores the results of each comparison. For each comparison, the function INTERSECT is used to find the common values (i.e. keys). For the matrix a given in the question, you will get keys = {[2 4], 4, [4 6]}.