How to modify a dataset to make a unique dataset in matlab - matlab

I have an m-by-n matrix M in MATLAB. I want to add very small noise to the repeated rows so that every row is unique, i.e., size(M,1) == size(unique(M,'rows'),1).
EDIT:
I have tried this, but it is not deterministic:
while size(unique(allDataUnnormalized,'rows'),1)~=size(allDataUnnormalized,1)
[~, tmpDist] = knnsearch (allDataUnnormalized,allDataUnnormalized,'k',2);
importantIdx = find(tmpDist(:,2)==0);
allDataUnnormalized(importantIdx,:)=allDataUnnormalized(importantIdx,:)+rand(numel(importantIdx),NDims)*epsilon^4;
end

Adding noise is cheap, so why not just try something like:
allDataUnnormalized = allDataUnnormalized + eps*rand(size(allDataUnnormalized));
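If the result has to be reproducible and only the repeated rows should change, one alternative to random noise is a fixed offset per repeat. The following is a minimal sketch, not a tested general solution. The example matrix and the step size delta are assumptions for illustration. delta must be larger than the floating-point spacing of the values it is added to (see eps(x)), and an offset row could still coincide with another row already in the data, so the final check is worth keeping.

```matlab
M = [1 2; 3 4; 1 2; 1 2; 3 4];   % example data with repeated rows (assumed)
delta = 1e-9;                    % assumed perturbation step

% grp(i) is the group number of row i; identical rows share a group
[~, ~, grp] = unique(M, 'rows', 'stable');

% occ(i) counts how many identical rows precede row i (0 for the first one)
occ = zeros(size(grp));
for g = 1:max(grp)
    idx = find(grp == g);
    occ(idx) = 0:numel(idx)-1;
end

% Shift the first column of each repeat by a different multiple of delta
M(:,1) = M(:,1) + occ * delta;

% Check that all rows are now distinct
isUnique = size(unique(M, 'rows'), 1) == size(M, 1);
```

The first occurrence of each row is left untouched, and running the code twice on the same input gives the same output.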

Related

Efficient generation of permutation matrices in MATLAB

I'm trying to generate a 100-by-5 matrix where every line is a permutation of 1..100 (that is, every line is 5 random numbers from [1..100] without repetitions).
So far I've only been able to do it iteratively with a for-loop. Is there a way to do it more efficiently (using fewer lines of code), without loops?
N = 100;
T = zeros(N, 5);
for i = 1:N
T(i, :) = randperm(100, 5);
end
Let
N = 100; % desired number of rows
K = 5; % desired number of columns
M = 100; % size of population to sample from
Here's an approach that's probably fast but memory-expensive, as it generates an intermediate N×M matrix and then discards M-K columns:
[~, result] = sort(rand(N, M), 2);
result = result(:, 1:K);
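A quick way to convince yourself that this produces valid samples is to check each row for repeats. This is only a sanity-check sketch; the variable names sortedRows, noRepeats and inRange are introduced here for illustration.

```matlab
N = 100;  % number of rows
K = 5;    % number of columns
M = 100;  % size of population to sample from

% Sorting random numbers along each row yields a random permutation per row
[~, result] = sort(rand(N, M), 2);
result = result(:, 1:K);

% After sorting each row, strictly positive differences mean no repeats
sortedRows = sort(result, 2);
noRepeats = all(all(diff(sortedRows, 1, 2) > 0));

% Every entry must lie in 1..M
inRange = all(result(:) >= 1 & result(:) <= M);
```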
There is very little downside to using a loop here, at least in this minimal example. Indeed, it may well be the best-performing solution for MATLAB's execution engine. But perhaps you don't like assigning the temporary variable i or there are other advantages to vectorization in your non-minimal implementation. Consider this carefully before blindly implementing a solution.
You need to call randperm N times, but each call has no dependency on its position in the output. Without a loop index you will need something else to regulate the number of calls, but this can be just N empty cells cell(N,1). You can use this cell array to evaluate a function that calls randperm but ignores the contents (or, rather, lack of contents) of the cells, and then reassemble the function outputs into one matrix with cell2mat:
T = cell2mat(cellfun(@(~) {randperm(100,5)}, cell(N,1)));

Can someone help vectorise this matlab loop?

I am trying to learn how to vectorise MATLAB loops, so I'm just doing a few small examples.
Here is the standard loop I am trying to vectorise:
function output = moving_avg(input, N)
output = [];
for n = N:length(input) % iterate over y vector
summation = 0;
for ii = n-(N-1):n % iterate over x vector N times
summation += input(ii);
endfor
output(n) = summation/N;
endfor
endfunction
I have been able to vectorise one loop, but can't work out what to do with the second loop. Here is where I have got to so far:
function output = moving_avg(input, N)
output = [];
for n = N:length(input) % iterate over y vector
output(n) = mean(input(n-(N-1):n));
endfor
endfunction
Can someone help me simplify it further?
EDIT:
The input is just a one-dimensional vector with probably at most 100 data points. N is a single integer, less than the size of the input (typically around 5).
I don't actually intend to use it for any particular application; it was just a simple nested loop that I thought would be good for learning about vectorisation.
It seems you are performing a convolution there, so just use conv:
output = zeros(size(input1))
output(N:end) = conv(input1,ones(1,N),'valid')./N
Please note that I have replaced the variable name input with input1, as input is already used as the name of a built-in function in MATLAB, so it's a good practice to avoid such conflicts.
Generic case: For the general case, you can use bsxfun to create such groups of indices and then choose the operation you want to perform at the final stage. Here's what such code would look like for a sliding/moving average:
%// Create groups of indices for each sliding interval of length N
idx = bsxfun(@plus,[1:N]',[0:numel(input1)-N]) %//'
%// Index into input1 with those indices to get grouped elements from it along columns
input1_indexed = input1(idx)
%// Finally, choose the operation you intend to perform and apply along the
%// columns. In this case, you are doing average, so use mean(...,1).
output = mean(input1_indexed,1)
%// Also pre-append with zeros if intended to match up with the expected output
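To make the index grouping concrete, here is a small worked sketch with a five-element input (the example values and the names windows and padded are assumptions chosen for illustration):

```matlab
input1 = [1 2 3 4 5];   % small example input (assumed)
N = 3;                  % window length

% Column j of idx holds the indices j, j+1, ..., j+N-1 of the j-th window
idx = bsxfun(@plus, (1:N)', 0:numel(input1)-N);
% idx is [1 2 3; 2 3 4; 3 4 5]

% Indexing a vector with a matrix returns a matrix of the same shape as idx
windows = input1(idx);

% Average each window (each column)
output = mean(windows, 1);

% Prepend N-1 zeros so the result lines up with the loop version
padded = [zeros(1, N-1), output];
```

Here output is [2 3 4] and padded is [0 0 2 3 4], which matches what the original nested loop produces for this input.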
MATLAB as a language handles this type of operation poorly: you will always need an outer O(N) loop or operation involving at least O(K) copies, so vectorizing further is not worth it for performance, because MATLAB is a heavyweight language. Instead, consider using the filter function, where this kind of operation is typically implemented in C, which makes it nearly free.
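As a rough sketch of the filter approach for the moving average in the question (the example input is an assumption), a length-N averaging window can be passed as the numerator coefficients:

```matlab
x = [1 2 3 4 5 6];   % example input (assumed)
N = 3;               % window length

% FIR filter with N equal taps of 1/N computes the mean of the last N samples
y = filter(ones(1, N) / N, 1, x);

% The first N-1 outputs are based on fewer than N samples; zero them
% so the result matches the loop version, which leaves them at 0
y(1:N-1) = 0;
```

For this input y comes out as [0 0 2 3 4 5], up to floating-point rounding.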
For a sliding average, you can use cumsum to minimize the number of operations:
x = randi(10,1,10); %// example input
N = 3; %// window length
y = cumsum(x); %// compute cumulative sum of x
z = zeros(size(x)); %// initialize result to zeros
z(N:end) = (y(N:end)-[0 y(1:end-N)])/N; %// compute order N difference of cumulative sum

MATLAB: I want to threshold a matrix, based on thresholds in a vector, without a for loop. Possible?

Let us say I have the following:
M = randn(10,20);
T = randn(1,20);
I would like to threshold each column of M by the corresponding entry of T. For example, find the indices of all elements of M(:,1) that are greater than T(1), the indices of all elements of M(:,2) that are greater than T(2), and so on.
Of course, I would like to do this without a for-loop. Is this possible?
You can use bsxfun like this:
I = bsxfun(@gt, M, T);
Then I will be a logical matrix the same size as M, with ones where M(:,i) > T(i).
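Since the question asks for indices, the logical matrix can be passed to find. A small sketch with made-up values (the numbers and the names rowIdx and colIdx are assumptions for illustration):

```matlab
M = [1 5; 4 2; 7 9];   % example data (assumed)
T = [3 4];             % one threshold per column

% Compare every column of M against its own threshold
I = bsxfun(@gt, M, T);

% Row and column subscripts of the elements above their threshold,
% listed column by column
[rowIdx, colIdx] = find(I);
```

For this data, column 1 exceeds its threshold in rows 2 and 3, and column 2 in rows 1 and 3.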
You can use bsxfun to do things like this, but it may not be faster than a for loop (more below on this).
result = bsxfun(@gt,M,T)
This will do an element wise comparison and return you a logical matrix indicating the relationship governed by the first argument. I have posted code below to show the direct comparison, indicating that it does return what you are looking for.
%var declaration
M = randn(10,20);
T = randn(1,20);
% quick method
fastres = bsxfun(@gt,M,T);
% looping method
res = false(size(M));
for i = 1:length(T)
res(:,i) = M(:,i) > T(i);
end
% check to see if the two matrices are identical
isMatch = all(all(fastres == res))
This function is very powerful and can be used to help speed up processes, but keep in mind that it will only speed things up if there is a lot of data. There is a bit of background work that bsxfun must do, which can actually cause it to be slower.
I would only recommend using it if you have several thousand data points. Otherwise, the traditional for-loop will actually be faster. Try it out for yourself by changing the size of the M and T variables.
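You can replicate the threshold vector and use matrix comparison:
s=size(M);
T2=repmat(T, s(1), 1); % one copy of T per row of M
% Compare directly, so M is left unmodified and zero-valued entries are not lost
Indexes=find(M>T2);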

How should I progressively add results to a matrix?

I want to initialise a matrix in MATLAB and add things to it with a loop. I am unsure of how big it should be to start off with, but I want to be able to add as many sub-matrices to it as is required.
You can define it empty:
matrix = [];
and then append rows, columns, or submatrices:
matrix = [matrix; newSubMatrix];
matrix = [matrix, newSubMatrix];
However, enlarging the matrix this way causes Matlab to reallocate memory. If this happens at each loop iteration your code will be slow.
A better approach is to initialize to an approximate size:
matrix = zeros(M,N);
and then fill elements in:
matrix(m,n) = exampleEntry;
matrix(m,:) = exampleRow;
matrix(:,n) = exampleCol;
This way, MATLAB needs to enlarge the matrix only if m exceeds M or n exceeds N.
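A common pattern that follows from this is to preallocate generously, keep a counter of how many rows are actually filled, and trim once at the end. The sketch below assumes an upper bound maxRows and uses a made-up row [k, k^2, k^3] in place of real results; both are illustrative only.

```matlab
maxRows = 1000;                   % assumed upper bound on the number of rows
nCols = 3;
matrix = zeros(maxRows, nCols);   % allocate once, up front
count = 0;                        % number of rows filled so far

for k = 1:10
    newRow = [k, k^2, k^3];       % stand-in for a computed result
    count = count + 1;
    matrix(count, :) = newRow;
end

% Drop the unused rows
matrix = matrix(1:count, :);
```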
I would suggest initialising a larger matrix:
x=nan(n,m)
After adding your data, cut it:
[a,b]=ind2sub(size(x),find(~isnan(x),1,'last'))
x=x(1:a,1:b)
This assumes you do not use nan in your data.

Create matrix of matrix by loading data "MATLAB"

I want to make a vector of matrices by loading data from a text file.
I am using cat.
n: number of matrices.
p: number of columns in each matrix.
Every matrix has 4 rows.
For example, I have 1200 numbers in one text file and p is 3, so n = 100.
How can I do this?
This is what I tried to do:
X = cat(n,[1...p; ; ; ],...,[ ; ; ; ]);
The description is a bit vague, but here is what I would recommend:
1. Read all the data into MATLAB (it seems you know how to do this)
2. Put everything in one big matrix or vector
3. Only after putting everything together, use the reshape command
In your case you may want to do something like this for step 3:
raw = rand(1200,1); %Assuming your data looks something like this
X = reshape(raw,[],4,3);
For 1200 values this will give you a 100x4x3 array. Just make sure the number of values is a multiple of 4*3 = 12 if you apply reshape like this.
Update
Apparently this was the variation the asker was looking for, a 4x3x100 matrix:
X = reshape(raw,4,3,100)
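One detail worth checking is the order in which the numbers sit in the file, because reshape fills arrays column by column. The sketch below uses 24 made-up values in place of the 1200; the names Xcol and Xrow are illustrative, and the second variant assumes each matrix was written to the file row by row.

```matlab
raw = 1:24;   % stand-in for the loaded data: two 4x3 matrices

% Case 1: the file lists each matrix column by column
Xcol = reshape(raw, 4, 3, 2);
% Xcol(:,:,1) is [1 5 9; 2 6 10; 3 7 11; 4 8 12]

% Case 2 (assumed): the file lists each matrix row by row.
% Reshape to 3x4 pages first, then swap the first two dimensions.
Xrow = permute(reshape(raw, 3, 4, 2), [2 1 3]);
% Xrow(:,:,1) is [1 2 3; 4 5 6; 7 8 9; 10 11 12]
```

In both cases the k-th matrix is then available as the page X(:,:,k).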
If you know the number of matrices (n), you can store them in a cell array like this:
myCellArray = cell(n,1);
for it = 1:n
myCellArray{it} = (...) % load the matrix however you do it (load, fread, ...)
end
or grow the array dynamically, although this is not very efficient:
myArray = [];
myArray = [myArray newLoadArray];
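If the matrices all have the same size, the cell array can afterwards be stacked into a single 3-D array with cat, which is close to what the question attempted. A minimal sketch, where it * ones(4,3) is only a placeholder for whatever the real loading step returns:

```matlab
n = 3;                       % number of matrices (small, for illustration)
myCellArray = cell(n, 1);
for it = 1:n
    % Placeholder for the real loading step: a 4x3 matrix filled with "it"
    myCellArray{it} = it * ones(4, 3);
end

% Expand the cell array into a comma-separated list and stack along dim 3
X = cat(3, myCellArray{:});
```

X is then 4x3xn, and X(:,:,k) is the k-th loaded matrix.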