Generating all ordered samples with replacement - matlab

I would like to generate an array which contains all ordered samples of length k taken from a set of n elements {a_1,...,a_n}, that is all the k-tuples (x_1,...,x_k) where each x_j can be any of the a_i (repetition of elements is allowed), and whose total number is n^k.
Is there a built-in function in Matlab to obtain it?
I have tried to write code that iteratively uses the datasample function, but I haven't managed to get the desired result so far.

An alternative way to get all the tuples is based on the base-n integer representation.
If you take the k-digit base-n representation of all integers from 0 to n^k - 1, it gives you all possible sets of k indices, keeping in mind that these indices start at 0.
Implementing this idea is quite straightforward. You can use dec2base if n is at most 10:
X = A(dec2base(0:(n^k-1), n, k)-'0'+1);
For n between 11 and 36, you can still use dec2base, but you must take care of the letters, as there is a gap in character codes between '9' and 'A':
idx = dec2base(0:(n^k-1), n, k)-'0'+1;
idx(idx>=17) = idx(idx>=17)-7;
X = A(idx);
Above 36, you must use custom code to retrieve the representation of each integer, like this one. But IMO you may not need this, as n^k becomes huge very quickly.
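As a small worked example of the base-n idea (a sketch only, assuming n is at most 10 so that every digit is one of '0' to '9'; the values in A are made up for illustration):

```matlab
A = [10 20 30];                         % hypothetical set of n = 3 elements
k = 2;                                  % tuple length
n = numel(A);
digitChars = dec2base(0:n^k-1, n, k);   % n^k-by-k char matrix of base-n digits
idx = digitChars - '0' + 1;             % characters '0'..'9' become indices 1..n
X = A(idx);                             % n^k-by-k matrix, one tuple per row
```

With this ordering the last column of X varies fastest, so the first rows are [10 10], [10 20], [10 30].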

What you are looking for is ndgrid: it generates the grid elements in any dimension.
If k is fixed at the time of coding, get the indices of all elements this way:
[X_1, ..., X_k] = ndgrid(1:n);
Then build the matrix X from the vector A (assumed to be a column vector, so that the pieces concatenate side by side):
X = [A(X_1(:)), ..., A(X_k(:))];
If k is a parameter, my advice would be to look at the code of ndgrid and adapt it into a new function whose output is a matrix of values instead of storing them in varargout.
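When k is only known at run time, one way to avoid rewriting ndgrid is to collect its outputs in a cell array. This is a sketch under that assumption; A and the helper variable names are illustrative and not part of the answer above:

```matlab
A = [10 20 30];                 % hypothetical set of n = 3 elements
k = 2;                          % tuple length, known only at run time
n = numel(A);
grids = cell(1, k);
[grids{:}] = ndgrid(1:n);       % request k outputs from ndgrid
cols = cellfun(@(g) g(:), grids, 'UniformOutput', false);
idx = [cols{:}];                % n^k-by-k matrix of indices into A
X = reshape(A(idx), [], k);     % n^k-by-k matrix, one tuple per row
```

Here the first column varies fastest, so the rows start with [10 10], [20 10], [30 10].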

What about this solution? I don't know if it's as fast as yours, but do you think it is correct?
function Y = ordsampwithrep(X,K)
%ordsampwithrep Ordered samples with replacement
% Generates an array Y containing in its rows all ordered samples with
% replacement of length K with elements of vector X
X = X(:);
nX = length(X);
Y = zeros(nX^K,K);
Y(1,:) = datasample(X,K)';
k = 2;
while k < nX^K + 1
    % draw a random sample and keep it only if it has not been seen yet
    temprow = datasample(X,K)';
    %checknew = find (temprow == Y(1:k-1,:));
    if not(ismember(temprow,Y(1:k-1,:),'rows'))
        Y(k,:) = temprow;
        k = k+1;
    end
end
end

Related

how to define an array with fractional index number

Suppose that I need to create a function named pressure, denoted by p (a 2-D matrix), which depends on two variables r and z.
u, v, w are 1-D arrays which also depend on r and z.
r and z are 1-D arrays defined below, with i = {1,2,3,4,5,6,7,8,9,10}:
r(i)=i/10
z(i)=i/10
u(i) = 2*r(i) + 3*z(i)
v(i) = 8*r(i) + 4*z(i)
w(i) = 3*r(i) + 2*z(i)
p = p(r,z) % which is given as
p(r(i),z(j)) = 2*v(i) - 4*u(i) + w(j)
Now suppose the value of p at a given location (r,z), say (0.4,0.8), is needed. I want to be able to give the input p(0.4,0.8) and get the result.
In your case the easiest way is to convert the fractional numbers to integers by multiplying by 10.
This way the location (r,z) = (0.4, 0.8) becomes (4,8).
If you don't want to remember to multiply the locations by 10 every time, just create a function that does it for you, so you can call the function with the fractional location.
If your coordinates are evenly spaced, you will always find a multiplying factor that gets rid of the fractional coordinates.
Not entirely sure what you mean here, but if your matrix is only defined at the indices you give (i.e. you only want to draw values from the fixed set of indices you defined), then this should do it:
% the query coordinates
r_i = 0.4;
z_i = 0.8;
value = p(round(r_i*10), round(z_i*10)); % round guards against floating-point error
If you want to look at values between the ones you defined, you need to look at interpolation:
% the query coordinates
r_i = 0.46;
z_i = 0.84;
value = interp2(z, r, p, z_i, r_i);
(interp2 expects the column coordinates first; since p is indexed as p(r,z), with r along the rows, z goes first. If your p is stored the other way round, swap them.)
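The wrapper function suggested above could look roughly like this. It is a sketch that builds p from the definitions in the question; the handle name p_at is hypothetical, and the factor 10 assumes the grid spacing of 0.1 used there:

```matlab
r = (1:10)/10;
z = (1:10)/10;
u = 2*r + 3*z;
v = 8*r + 4*z;
w = 3*r + 2*z;
% p(i,j) = 2*v(i) - 4*u(i) + w(j), built as a 10-by-10 matrix
p = bsxfun(@plus, (2*v - 4*u).', w);
% hypothetical helper: maps fractional coordinates to integer indices
p_at = @(rq, zq) p(round(rq*10), round(zq*10));
value = p_at(0.4, 0.8);
```

The call p_at(0.4, 0.8) reads p(4,8), so the caller never has to scale the coordinates by hand.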

Assign labels based on given examples for a large dataset effectively

I have matrix X (100000 X 10) and vector Y (100000 X 1). X rows are categorical and assume values 1 to 5, and labels are categorical too (11 to 20);
The rows of X are repetitive and there are only ~25% of unique rows, I want Y to have statistical mode of all the labels for a particular unique row.
And then there comes another dataset P (90000 X 10), I want to predict labels Q based on the previous exercise.
What I tried is finding unique rows of X using unique in MATLAB, and then assign statistical mode of each of these labels for the unique rows. For P, I can use ismember and carry out the same.
The issue is the size of the dataset: the process takes 1.5-2 hours to complete. Is a vectorized version possible in MATLAB?
Here is my code:
[X_unique,~,ic] = unique(X,'rows','stable');
labels = zeros(size(X_unique,1),1);
for i = 1:size(X_unique,1)
    labels(i) = mode(Y(ic==i));
end
Q = zeros(size(P,1),1);
for j = 1:size(X_unique,1)
    Q(all(repmat(X_unique(j,:),size(P,1),1)==P,2)) = labels(j);
end
You will be able to accelerate your first loop a great deal if you replace it entirely with:
labels = accumarray(ic, Y, [], @(y) mode(y));
The second loop can be accelerated by using all(bsxfun(@eq, X_unique(j,:), P), 2) inside Q(...). This is a good vectorized approach, assuming your arrays are not extremely large relative to the available memory on your machine. In addition, to save more time, you could apply the unique trick you used on X to P as well, and run all the comparisons on a much smaller array:
[P_unique, ~, IC_P] = unique(P, 'rows', 'stable');
EDIT:
Compute Q_unique in the following way:
Q_unique = zeros(size(P_unique,1),1);
for i = 1:size(X_unique,1)
    Q_unique(all(bsxfun(@eq, X_unique(i,:), P_unique), 2)) = labels(i);
end
and convert back to Q_full to match the original P input:
Q_full = Q_unique(IC_P);
END EDIT
Finally, if memory is an issue, in addition to everything above, you might want to use a semi-vectorized approach inside your second loop:
for i = 1:size(X_unique,1)
    idx = true(size(P,1), 1);
    for j = 1:size(X_unique,2)
        idx = idx & (X_unique(i,j) == P(:,j));
    end
    Q(idx) = labels(i);
    % Q(all(bsxfun(@eq, X_unique(i,:), P), 2)) = labels(i);
end
This takes about 3 times longer than the bsxfun version, but if memory is limited then you have to pay with speed.
ANOTHER EDIT
Depending on your version of Matlab, you could also use containers.Map to your advantage by mapping textual representations of the numeric sequences to the calculated labels. See example below.
% find unique members of X to work with a smaller array
[X_unique, ~, IC_X] = unique(X, 'rows', 'stable');
% compute labels
labels = accumarray(IC_X, Y, [], @(y) mode(y));
% convert X to cellstr -- textual representation of the number sequence
X_cellstr = cellstr(char(X_unique+48)); % 48 is ASCII for 0
% map each X to its label
X_map = containers.Map(X_cellstr, labels);
% find unique members of P to work with a smaller array
[P_unique, ~, IC_P] = unique(P, 'rows', 'stable');
% convert P to cellstr -- textual representation of the number sequence
P_cellstr = cellstr(char(P_unique+48)); % 48 is ASCII for 0
% --- EDIT --- avoiding error on missing keys in X_map --------------------
% find which P's exist in map
isInMapP = X_map.isKey(P_cellstr);
% pre-allocate Q_unique to the size of P_unique (can be any value you want)
Q_unique = nan(size(P_cellstr)); % NaN is safe to use since not a label
% find the labels for each P_unique that exists in X_map
Q_unique(isInMapP) = cell2mat(X_map.values(P_cellstr(isInMapP)));
% --- END EDIT ------------------------------------------------------------
% convert back to full Q array to match original P
Q_full = Q_unique(IC_P);
This takes about 15 seconds to run on my laptop, most of which is spent computing the mode.
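Since the question already mentions ismember, the lookup for P can also be sketched without any loop by using its 'rows' option together with accumarray. This is a toy illustration with made-up data, not a benchmark of the approaches above; rows of P that never occur in X are left as NaN here by assumption:

```matlab
X = [1 2; 3 4; 1 2; 1 2; 3 4];   % toy training rows
Y = [11; 15; 12; 12; 15];        % toy labels
[X_unique, ~, ic] = unique(X, 'rows', 'stable');
labels = accumarray(ic(:), Y, [], @mode);   % mode of Y per unique row
P = [3 4; 1 2; 5 5];             % toy query rows
[found, loc] = ismember(P, X_unique, 'rows');
Q = nan(size(P,1), 1);           % NaN marks rows of P not present in X
Q(found) = labels(loc(found));
```

For the unique row [1 2] the labels are 11, 12, 12, so its mode is 12, and the query row [5 5] stays NaN.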

Can someone help vectorise this matlab loop?

I am trying to learn how to vectorise Matlab loops, so I'm just doing a few small examples.
Here is the standard loop I am trying to vectorise:
function output = moving_avg(input, N)
  output = [];
  for n = N:length(input) % iterate over y vector
    summation = 0;
    for ii = n-(N-1):n % iterate over x vector N times
      summation += input(ii);
    endfor
    output(n) = summation/N;
  endfor
endfunction
I have been able to vectorise one loop, but can't work out what to do with the second loop. Here is where I have got to so far:
function output = moving_avg(input, N)
  output = [];
  for n = N:length(input) % iterate over y vector
    output(n) = mean(input(n-(N-1):n));
  endfor
endfunction
Can someone help me simplify it further?
EDIT:
The input is just a one-dimensional vector with probably at most 100 data points. N is a single integer, less than the size of the input (typically around 5).
I don't actually intend to use it for any particular application; it was just a simple nested loop that I thought would be good for learning about vectorisation.
It seems you are performing a convolution there. So just use conv:
output = zeros(size(input1));
output(N:end) = conv(input1,ones(1,N),'valid')./N;
Please note that I have replaced the variable name input with input1, as input is already used as the name of a built-in function in MATLAB, so it is good practice to avoid such conflicts.
Generic case: For the general case, you can use bsxfun to create such groups and then choose the operation you intend to perform at the final stage. Here's how such code would look for a sliding/moving average:
% Create groups of indices for each sliding interval of length N
idx = bsxfun(@plus,(1:N)',0:numel(input1)-N);
% Index into input1 with those indices to get grouped elements from it along columns
input1_indexed = input1(idx);
% Finally, choose the operation you intend to perform and apply along the
% columns. In this case, you are doing average, so use mean(...,1).
output = mean(input1_indexed,1);
% Also prepend zeros if intended to match up with the expected output
Matlab as a language does this type of operation poorly: you will always need an outer O(N) loop/operation involving at least O(K) copies, and vectorizing further will not be worth it in performance because Matlab is a heavyweight language. Instead, consider using the filter function, where these things are typically implemented in C, which makes that type of operation nearly free.
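A minimal sketch of the filter suggestion, assuming the same convention as the question (the first N-1 outputs are left at zero); the input values are made up:

```matlab
x = [4 8 6 2 10 4];              % example input
N = 3;                           % window length
y = filter(ones(1,N)/N, 1, x);   % FIR filter with N equal taps of 1/N
y(1:N-1) = 0;                    % discard the partial windows at the start
```

From index N onwards, y matches conv(x, ones(1,N), 'valid')/N.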
For a sliding average, you can use cumsum to minimize the number of operations:
x = randi(10,1,10); % example input
N = 3; % window length
y = cumsum(x); % compute cumulative sum of x
z = zeros(size(x)); % initialize result to zeros
z(N:end) = (y(N:end)-[0 y(1:end-N)])/N; % compute order N difference of cumulative sum

How to vectorize this Matlab loop

I need some help to vectorize the following operation since I'm a little confused.
So, I have an m-by-2 matrix A and an n-by-1 vector b. I want to create an n-by-1 vector c whose entries are the values of the second column of A, taken from the row of A into which the corresponding value of b falls (using the first column of A as bin edges).
Not sure if I was clear enough. Anyway, the code below computes c correctly, so you can see what my desired output is. However, I want to vectorize it, since my real n and m are on the order of many thousands.
Note that the values of b are non-integer and not necessarily equal to any of those in the first column of A (these could be non-integers too!).
m = 5; n = 10;
A = [(0:m-1)*1.1;rand(1,m)]'
b = (m-1)*rand(n,1)
[bincounts, ind] = histc(b,A(:,1))
for i = 1:n
    c(i) = A(ind(i),2);
end
All you need is:
c = A(ind,2);
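To see why plain indexing is enough, here is a small deterministic check (a sketch with made-up numbers in place of rand, using the same histc call as the question):

```matlab
A = [0 1.1 2.2 3.3 4.4; 0.5 0.1 0.9 0.3 0.7]';   % toy m-by-2 matrix
b = [0.2; 3.9; 1.1; 2.0];                        % toy query values
[~, ind] = histc(b, A(:,1));                     % bin index of each b
c = A(ind, 2);                                   % vectorized lookup
```

Each entry of ind is the row of A whose first-column value is the largest one not exceeding the corresponding b, so c picks the second-column values from rows 1, 4, 2 and 2.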

Permuting n elements by swapping each element by no more than k positions

What I have is a vector (n = 4 in the example):
x = '0123';
What I want is a vector y of the same size as x, with the same elements as x in a different order:
y = ['0123'; '0132'; '0213'; '0231'; '0312'; '0321'; '1023'; '1032'; '1203'; '1302'; '2013'; '2031'; '2103'; '2301'];
y(ceil(rand * numel(y(:, 1))), :)
i.e. a permutation such that each element in y is allowed to move no more than k positions from its original position in x (k = 2 in the example). The probability distribution must be uniform (i.e. each allowed permutation must be equally likely to occur).
An obvious but inefficient way to do it is of course to find a random unconstrained permutation and check ex post whether or not this happens to respect the constraint. For small vectors you can find all the permutations, delete those that are not allowed and randomly pick among the remaining ones.
Any idea about how to do the same more efficiently, for example by actually swapping the elements?
Generating all the permutations can be done easily using constraint programming. Here is a short model using MiniZinc for the above example (note that we assume that x will contain n different values here):
include "globals.mzn";
int: k = 2;
int: n = 4;
array[1..n] of int: x = [0, 1, 2, 3];
array[1..n] of var int: y;
constraint forall(i in 1..n) (
    y[i] in {x[i + offset] | offset in -min(k, i-1)..min(k, n-i)}
);
constraint all_different(y);
solve :: int_search(y, input_order, indomain_min, complete)
    satisfy;
output [show(y)];
In most cases, constraint programming systems have the possibility to use a random search. However, this would not give you a uniform distribution of the results. Using CP will however generate all valid permutations more efficiently than the naive method (generate and test for validity).
If you need to generate a random permutation of your kind efficiently, I think that it would be possible to modify the standard Fisher-Yates shuffle to handle it directly. The standard algorithm uses the rest of the array to choose the next value from, and chooses the value with a probability distribution that is uniform. It should be possible to keep a list of only the currently valid choices, and to change the probability distribution of the values to match the desired output.
I don't see any approach other than the rejection method that you mention. However, instead of listing all allowed permutations and then picking one, it's more efficient to avoid that listing. Thus, you can randomly generate a permutation, check if it's valid, and repeat if it's not:
x = '0123';
k = 2;
n = numel(x);
done = false;
while ~done
    perm = randperm(n);
    done = all( abs(perm-(1:n)) <= k ); % check condition
end
y = x(perm);
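For small n, the exhaustive variant mentioned in the question (list all permutations, drop the invalid ones, pick one uniformly) can be sketched as follows. It is only practical for short vectors, since perms produces n! rows; the variable names are illustrative:

```matlab
x = '0123';
k = 2;
n = numel(x);
P = perms(1:n);                                     % all n! permutations, one per row
valid = all(abs(bsxfun(@minus, P, 1:n)) <= k, 2);   % displacement check per row
P = P(valid, :);
y_all = x(P);                                       % char matrix of allowed permutations
y = y_all(randi(size(P,1)), :);                     % uniform pick among allowed rows
```

For this example 14 of the 24 permutations survive the filter, matching the list given in the question.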