I have an input 2D histogram that I want to do 2-fold cross-validation with. The problem is I don't know how to extract two mutually exclusive random samples of the data from a histogram. If it was a couple of lists of the positional information of each data point, that would be easy - shuffle the data in the lists in the same way, and split the lists equally.
So for a list I would do this:
list1 = [1,2,3,3,5,6,1];
list2 = [1,3,6,6,5,2,1];
idx = randperm(length(list1)); % ie. idx = [4 3 1 5 6 2 7]
shlist1 = list1(idx); % shlist1 = [3,3,1,5,6,2,1]
shlist2 = list2(idx); % shlist2 = [6,6,1,5,2,3,1]
slist1 = shlist1(1:3); % slist1 = [3,3,1]
elist1 = shlist1(4:6); % elist1 = [5,6,2,1]
slist2 = shlist2(1:3); % slist2 = [6,6,1]
elist2 = shlist2(4:6); % elist2 = [5,2,3,1]
But if this same data was presented to me as a histogram
hist = [2 0 0 0 0 0]
[0 0 0 0 0 1]
[0 1 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 1 0]
[0 0 2 0 0 0]
I want the result to be something like this
hist1 = [0 0 0 0 0 0]
[0 0 0 0 0 1]
[0 1 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 1 0 0 0]
hist2 = [2 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 1 0]
[0 0 1 0 0 0]
so that different halves of the data are randomly, and equally assigned to two new histograms.
Would this be equivalent to taking a random integer height of each bin hist(i,j), and adding that to the equivalent bin in hist1(i,j), and the difference to hist2(i,j)?
% hist as shown above
hist1 = zeros(6);
hist2 = zeros(6);
for i = 1:length(hist(:,1))*length(hist(1,:))
randNum = rand;
hist1(i) = round(hist(i)*randNum);
hist2(i) = hist(i) - hist1(i);
end
And if that is equivalent, is there a better way/built-in way of doing it?
My actual histogram is 300x300 bins, and contains about 6,000,000 data points, and it needs to be fast.
Thanks for any help :)
EDIT:
The suggested bit of code I made is not equivalent to taking a random sample of positional points from a list, as it does not maintain the overall probability density function of the data.
Halving the histograms should be fine for my 6,000,000 points, but I was hoping for a method that would still work for few points.
You can use rand or randi to generate two histograms. The first method is more efficient however the second is more random.
h = [[2 0 0 0 0 0]
[0 0 0 0 0 1]
[0 1 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 1 0]
[0 0 2 0 0 0]];
%using rand
h1 = round(rand(size(h)).*h);
h2 = h - h1;
%using randi
h1 = zeros(size(h));
for k = 1:numel(h)
h1(k) = randi([0 h(k)]);
end
h2 = h - h1;
Suppose H is your 2D histogram. The following code extracts a single random index with a probability proportional to the count at that index - which I think is what you want.
cc = cumsum(H(:));
if cc(1) ~= 0
cc = [0; cc];
end
m = cc(end);
ix = find(cc > m*rand, 1);
To extract multiple samples, you need to write your own find function (preferably a binary search for efficiency) that extracts some n number of samples in one call. This will give you a vector of indices (call it ix_vec) chosen with probability proportional to the Histogram count at each index.
Then if we denote by X the numerical values corresponding to each location in the Histogram, your random sample is:
R1 = X(ix_vec);
Repeat for the second random sample set.
Related
I would like to do an operation on a 2-D matrix which somehow looks like the outer product of a vector. I already have written some codes for this task, but it is pretty slow, so I would like to know if there is anything I can do to accelerate it.
I would like to show the code I wrote first, followed by an example to illustrate the task I wanted to do.
My code, version row-by-row
function B = outer2D(A)
B = zeros(size(A,1),size(A,2),size(A,2)); %Pre-allocate the output array
for J = 1 : size(A,1)
B(J,:,:) = transpose(A(J,:))*A(J,:); %Perform outer product on each row of A and assign to the J-th layer of B
end
end
Using the matrix A = randn(30000,20) as the input for testing, it spends 0.317 sec.
My code, version page-by-page
function B = outer2D(A)
B = zeros(size(A,1),size(A,2),size(A,2)); %Pre-allocate the output array
for J = 1 : size(A,2)
B(:,:,J) = repmat(A(:,J),1,size(A,2)).*A; %Evaluate B page-by-page
end
end
Using the matrix A = randn(30000,20) as the input for testing, it spends 0.146 sec.
Example 1
A = [3 0; 1 1; 1 0; -1 1; 0 -2]; %A is the input matrix.
B = outer2D(A);
disp(B)
Then I would expect
(:,:,1) =
9 0
1 1
1 0
1 -1
0 0
(:,:,2) =
0 0
1 1
0 0
-1 1
0 4
The first row of B, [9 0; 0 0], is the outer product of [3 0],
i.e. [3; 0]*[3 0] = [9 0; 0 0].
The second row of B, [1 1; 1 1], is the outer product of [1 1],
i.e. [1; 1]*[1 1] = [1 1; 1 1].
The third row of B, [1 0; 0 0], is the outer product of [1 0],
i.e. [1; 0]*[1 0] = [1 0; 0 0].
And the same for the remaining rows.
Example 2
A =
0 -1 -2
0 1 0
-3 0 2
0 0 0
1 0 0
B = outer2D(A)
disp(B)
Then, similar to the example 1, the expected output is
(:,:,1) =
0 0 0
0 0 0
9 0 -6
0 0 0
1 0 0
(:,:,2) =
0 1 2
0 1 0
0 0 0
0 0 0
0 0 0
(:,:,3) =
0 2 4
0 0 0
-6 0 4
0 0 0
0 0 0
Because the real input in my project is like in the size of 30000 × 2000 and this task is to be performed for many times. So the acceleration of this task is quite essential for me.
I am thinking of eliminating the for-loop in the function. May I have some opinions on this problem?
With auto expansion:
function B = outer2D(A)
B=permute(permute(A,[3 1 2]).*A',[2 3 1]);
end
Without auto expansion:
function B = outer2Dold(A)
B=permute(bsxfun(#times,permute(A,[3 1 2]),A'),[2 3 1]);
end
Outer products are not possible in the matlab language.
I have a cell array (11000x500) with three different type of elements.
1) Non-zero doubles
2) zero
3) Empty cell
I would like to find all occurances of a non-zero number between two zeros.
E.g. A = {123 13232 132 0 56 0 12 0 0 [] [] []};
I need the following output
out = logical([0 0 0 0 1 0 1 0 0 0 0 0]);
I used cellfun and isequal like this
out = cellfun(#(c)(~isequal(c,0)), A);
and got the follwoing output
out = logical([1 1 1 0 1 0 1 0 0 1 1 1]);
I need help to perform the next step where i can ignore the consecutive 1's and only take the '1's' between two 0's
Could someone please help me with this?
Thanks!
Here is a quick way to do it (and other manipulations binary data) using your out:
out = logical([1 1 1 0 1 0 1 0 0 1 1 1]);
d = diff([out(1) out]); % find all switches between 1 to 0 or 0 to 1
len = 1:length(out); % make a list of all indices in 'out'
idx = [len(d~=0)-1 length(out)]; % the index of the end each group
counts = [idx(1) diff(idx)]; % the number of elements in the group
elements = out(idx); % the type of element (0 or 1)
singles = idx(counts==1 & elements==1)
and you will get:
singles =
5 7
from here you can continue and create the output as you need it:
out = false(size(out)); % create an output vector
out(singles) = true % fill with '1' by singles
and you get:
out =
0 0 0 0 1 0 1 0 0 0 0 0
You can use conv to find the elements with 0 neighbors (notice that the ~ has been removed from isequal):
out = cellfun(#(c)(isequal(c,0)), A); % find 0 elements
out = double(out); % cast to double for conv
% elements that have more than one 0 neighbor
between0 = conv(out, [1 -1 1], 'same') > 1;
between0 =
0 0 0 0 1 0 1 0 0 0 0 0
(Convolution kernel corrected to fix bug found by #TasosPapastylianou where 3 consecutive zeros would result in True.)
That's if you want a logical vector. If you want the indices, just add find:
between0 = find(conv(out, [1 -1 1], 'same') > 1);
between0 =
5 7
Another solution, this completely avoids your initial logical matrix though, I don't think you need it.
A = {123 13232 132 0 56 0 12 0 0 [] [] []};
N = length(A);
B = A; % helper array
for I = 1 : N
if isempty (B{I}), B{I} = nan; end; % convert empty cells to nans
end
B = [nan, B{:}, nan]; % pad, and collect into array
C = zeros (1, N); % preallocate your answer array
for I = 1 : N;
if ~any (isnan (B(I:I+2))) && isequal (logical (B(I:I+2)), logical ([0,1,0]))
C(I) = 1;
end
end
C = logical(C)
C =
0 0 0 0 1 0 1 0 0 0 0 0
In each iteration I want to add 1 randomly to binary vector,
Let say
iteration = 1,
k = [0 0 0 0 0 0 0 0 0 0]
iteration = 2,
k = [0 0 0 0 1 0 0 0 0 0]
iteration = 3,
k = [0 0 1 0 0 0 0 1 0 0]
, that goes up to length(find(k)) = 5;
Am thinking of for loop but I don't have an idea how to start.
If it's important to have the intermediate vectors (those with 1, 2, ... 4 ones) as well as the final one, you can generate a random permutation and, in your example, use the first 5 indices one at a time:
n = 9; %// number of elements in vector
m = 5; %// max number of 1's in vector
k = zeros(1, n);
disp(k); %// output vector of all 0's
idx = randperm(n);
for p = 1:m
k(idx(p)) = 1;
disp(k);
end
Here's a sample run:
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 1
1 0 0 1 0 0 0 0 1
1 0 0 1 1 0 0 0 1
1 1 0 1 1 0 0 0 1
I wouldn't even use a loop. I would generate a random permutation of indices that sample from a vector going from 1 up to the length of k without replacement then just set these locations to 1. randperm suits the task well:
N = 10; %// Length 10 vector
num_vals = 5; %// 5 values to populate
ind = randperm(N, num_vals); %// Generate a vector from 1 to N and sample num_vals values from this vector
k = zeros(1, N); %// Initialize output vector k to zero
k(ind) = 1; %// Set the right values to 1
Here are some sample runs when I run this code a few times:
k =
0 0 1 1 0 1 1 0 0 1
k =
1 0 0 0 1 0 1 1 0 1
k =
1 0 0 0 1 0 1 1 0 1
k =
0 1 1 1 0 0 1 0 0 1
However, if you insist on using a loop, you can generate a vector from 1 up to the desired length, randomly choose an index in this vector then remove this value from the vector. You'd then use this index to set the location of the output:
N = 10; %// Length 10 vector
num_vals = 5; %// 5 values to populate
vec = 1 : N; %// Generate vector from 1 up to N
k = zeros(1, N); %// Initialize output k
%// Repeat the following for as many times as num_vals
for idx = 1 : num_vals
%// Obtain a value from the vector
ind = vec(randi(numel(vec), 1));
%// Remove from the vector
vec(ind) = [];
%// Set location in output to 1
k(ind) = 1;
end
The above code should still give you the desired effect, but I would argue that it's less efficient.
How would a list of points be converted to a binary matrix? Hopefully the operation will perform suitably for 640x640 images. Here is an example:
% the points
p = [2 2;2 3;3 3]
% the images is 4x4
img=zeros(4,4)
% set img to 1 for all points in p
??? this is the question?
% resulting binary image
img =
0 0 0 0
0 1 0 0
0 1 1 0
0 0 0 0
What about this:
linearInd = sub2ind(size(img), p(:,2), p(:,1));
img(linearInd) = 1;
You can simply do:
img(sub2ind(size(img), p(:,2), p(:,1))) = 1;
For example:
p = [2 2;2 3;3 3];
img = zeros(4,4);
img(sub2ind(size(img), p(:,2), p(:,1))) = 1
This would give you:
img =
0 0 0 0
0 1 0 0
0 1 1 0
0 0 0 0
Good day,
In Matlab I have got a matrix which is very sparse. Now I would like to plot the 'density' of the matrix. Let's say I have a matrix A:
A = [3 0 0
0 2 0
0 0 1];
Now the plot should look something like:
x
x
x
So there should be a dot (or something else) at each location (row, column) in which matrix A has got a nonzero value.
Any ideas?
spy is what you need:
% taken from MatLab documentation
B = bucky;
spy(B)
Consider something like this:
subs = zeros(0,2);
for ind = [find(A)']
[r,c] = ind2sub(size(A), ind);
subs = [subs; [r,c]];
end
scatter(subs(:,2), subs(:,1));
set(gca,'YDir','Reverse')
xlim([1 size(A,2)])
ylim([1 size(A,1)])
Which, for the matrix A:
0 1 0 1 1
0 0 0 0 0
0 1 0 0 0
0 1 0 1 1
0 0 1 1 0
Gives you the following scatter plot:
What about this :
A=[3 0 0; 0 2 0; 0 0 1];
imagesc(A)