I was trying to resample (with replacement) my database using 'bootstrap' in Matlab as follows:
D = load('Data.txt');
lead = D(:,1);
depth = D(:,2);
X = D(:,3);
Y = D(:,4);
%Bootstraping to resample 100 times
[resampling100,bootsam] = bootstrp(100,'corr',lead,depth);
%plottig the bootstraping result as histogram
hist(resampling100,10);
... ... ...
... ... ...
Though the script written above is correct, I wonder how I would be able to see/load the resampled 100 datasets created through bootstrap? 'bootsam(:)' display the indices of the data/values selected for the bootstrap samples, but not the new sample values!! Isn't it funny that I'm creating fake data from my original data and I can't even see what is created behind the scene?!?
My second question: is it possible to resample the whole matrix (in this case, D) altogether without using any function? However, I know how to create random values from a vector data using 'unidrnd'.
Thanks in advance for your help.
The answer to question 1 is that bootsam provides the indices of the resampled data. Specifically, the nth column of bootsam provides the indices of the nth resampled dataset. In your case, to obtain the nth resampled dataset you would use:
lead_resample_n = lead(bootsam(:, n));
depth_resample_n = depth(bootsam(:, n));
Regarding the second question, I'm guessing what you mean is, how would you just get a re-sampled dataset without worrying about applying a function to the resampled data. Personally, I would use randi, but in this situation, it is irrelevant whether you use randi or unidrnd. An example follows that assumes 4 columns of some data matrix D (as in your question):
%# Build an example dataset
T = 10;
D = randn(T, 4);
%# Obtain a set of random indices, ie indices of draws with replacement
Ind = randi(T, T, 1);
%# Obtain the resampled data
DResampled = D(Ind, :);
To create multiple re-sampled data, you can simply loop over the creation of random indices. Or you could do it in one step by creating a matrix of random indices and using that to index D. With careful use of reshape and permute you can turn this into a T*4*M array, where indexing m = 1, ..., M along the third dimension yields the mth resampled dataset. Example code follows:
%# Build an example dataset
T = 10;
M = 3;
D = randn(T, 4);
%# Obtain a set of random indices, ie indices of draws with replacement
Ind = randi(T, T, M);
%# Obtain the resampled data
DResampled = permute(reshape(D(Ind, :)', 4, T, []), [2 1 3]);
Related
I would like your help to vectorise (or, more generally, make more efficient) a Matlab code where:
Step 1: I construct a matrix C of size sgxR, where each row contains a sequence of ones and zeros, according to whether certain logical conditions are satisfied.
Step 2: I identify the indices of the unique rows of C.
I now describe the code in more details.
Step 1: Creation of the matrix C. I divide this step in 3 sub-steps.
Step 1.a: Create the 1x3 cell U_grid. For j=1,2,3, U_grid{j} is a sg x K matrix of numbers.
clear
rng default
n_U_sets=3; %This parameter will not be changed
sg=4; %sg is usually quite large, for instance 10^6
K=5; %This parameter can range between 3 and 8
Ugrid=cell(n_U_sets,1);
for j=1:n_U_sets
Ugrid{j}=randn(sg,K);
end
Step 1.b: For each g=1,...,sg
Take the 3 rows Ugrid{1}(g,:), Ugrid{2}(g,:), Ugrid{3}(g,:).
Take all possible 1x3 rows that can be formed such that the first element is from Ugrid{1}(g,:), the second element is from
Ugrid{2}(g,:), and the third element is from Ugrid{3}(g,:). There are K^3 such rows.
Create the matrix D{g} storing row-wise all possible pairs of such 1x3 rows. D{g} will have size (K^3*(K^3-1)/2)x6
This is coded as:
%Row indices of all possible pairs of rows
[y, x] = find(tril(logical(ones(K^(n_U_sets))), -1));
indices_pairs = [x, y]; %K^3*(K^3-1)/2
%Create D{g}
for g=1:sg
vectors = cellfun(#(x) {x(g,:)}, Ugrid); %1x3
T_temp = cell(1,n_U_sets);
[T_temp{:}] = ndgrid(vectors{:});
T_temp = cat(n_U_sets+1, T_temp{:});
T = reshape(T_temp,[],n_U_sets);
D{g}=[T(indices_pairs(:,1),:) T(indices_pairs(:,2),:)]; %(K^3*(K^3-1)/2) x (6)
end
Step 1.c: From D create C. Let R=(K^3*(K^3-1)/2). R is the size of any D{g}. C is a sg x R matrix constructed as follows: for g=1,...,sg and for r=1,...,R
if D{g}(r,1)>=D{g}(r,5)+D{g}(r,6) or D{g}(r,4)<=D{g}(r,2)+D{g}(r,3)
then C(g,r)=1
otherwise C(g,r)=0
This is coded as:
R=(K^(n_U_sets)*(K^(n_U_sets)-1)/2);
C=zeros(sg,R);
for g=1:sg
for r=1:R
if D{g}(r,1)>=D{g}(r,5)+D{g}(r,6) || D{g}(r,4)<=D{g}(r,2)+D{g}(r,3)
C(g,r)=1;
end
end
end
Step 2: Assign the same index to any two rows of C that are equal.
[~,~,idx] = unique(C,"rows");
Question: Steps 1.b and 1.c are the critical ones. With sg large and K between 3 and 8, they take a lot of time, due to the loop and reshape. Do you see any way to simplify them, for instance by vectorising?
I am implementing a neural network and am trying to one hot encode a matrix of column vectors based on the max value in each column. Previously, I had been iterating through the matrix vector by vector, but I've been told that this is unnecessary and that I can actually one hot encode every column vector in the matrix at the same time. Unfortunately, after perusing SO, GitHub, and MathWorks, nothing seems to be getting the job done. I've listed my previous code below. Please help! Thanks :)
UPDATE:
This is what I am trying to accomplish...except this only changed the max value in the entire matrix to 1. I want to change the max value in each COLUMN to 1.
one_hots = bsxfun(#eq, mini_batch_activations, max(mini_batch_activations(:)))
UPDATE 2:
This is what I am looking for, but it only works for rows. I need columns.
V = max(mini_batch_activations,[],2);
idx = mini_batch_activations == V;
Iterative code:
% This is the matrix I want to one hot encode
mini_batch_activations = activations{length(layers)};
%For each vector in the mini_batch:
for m = 1:size(mini_batch_activations, 2)
% Isolate column vector for mini_batch
vector = mini_batch_activations(:,m);
% One hot encode vector to compare to target vector
one_hot = zeros(size(mini_batch_activations, 1),1);
[max_val,ind] = max(vector);
one_hot(ind) = 1;
% Isolate corresponding column vector in targets
mini_batch = mini_batch_y{k};
target_vector = mini_batch(:,m);
% Compare one_hot to target vector , and increment result if they match
if isequal(one_hot, target_vector)
num_correct = num_correct + 1;
endif
...
endfor
You’ve got the maxima for each column:
V = max(mini_batch_activations,[],1); % note 1, not 2!
Now all you need to do is equality comparison, the output is a logical array that readily converts to 0s and 1s. Note that MATLAB and Octave do implicit singleton expansion:
one_hot = mini_batch_activations==V;
I have a vector of numbers (temperatures), and I am using the MATLAB function mink to extract the 5 smallest numbers from the vector to form a new variable. However, the numbers extracted using mink are automatically ordered from lowest to largest (of those 5 numbers). Ideally, I would like to retain the sequence of the numbers as they are arranged in the original vector. I hope my problem is easy to understand. I appreciate any advice.
The function mink that you use was introduced in MATLAB 2017b. It has (as Andras Deak mentioned) two output arguments:
[B,I] = mink(A,k);
The second output argument are the indices, such that B == A(I).
To obtain the set B but sorted as they appear in A, simply sort the vector of indices I:
B = A(sort(I));
For example:
>> A = [5,7,3,1,9,4,6];
>> [~,I] = mink(A,3);
>> A(sort(I))
ans =
3 1 4
For older versions of MATLAB, it is possible to reproduce mink using sort:
function [B,I] = mink(A,k)
[B,I] = sort(A);
B = B(1:k);
I = I(1:k);
Note that, in the above, you don't need the B output, your ordered_mink can be written as follows
function B = ordered_mink(A,k)
[~,I] = sort(A);
B = A(sort(I(1:k)));
Note: This solution assumes A is a vector. For matrix A, see Andras' answer, which he wrote up at the same time as this one.
First you'll need the corresponding indices for the extracted values from mink using its two-output form:
[vals, inds] = mink(array);
Then you only need to order the items in val according to increasing indices in inds. There are multiple ways to do this, but they all revolve around sorting inds and using the corresponding order on vals. The simplest way is to put these vectors into a matrix and sort the rows:
sorted_rows = sortrows([inds, vals]); % sort on indices
and then just extract the corresponding column
reordered_vals = sorted_rows(:,2); % items now ordered as they appear in "array"
A less straightforward possibility for doing the sorting after the above call to mink is to take the sorting order of inds and use its inverse to reverse-sort vals:
reverse_inds = inds; % just allocation, really
reverse_inds(inds) = 1:numel(inds); % contruct reverse permutation
reordered_vals = vals(reverse_inds); % should be the same as previously
I have matrix X (100000 X 10) and vector Y (100000 X 1). X rows are categorical and assume values 1 to 5, and labels are categorical too (11 to 20);
The rows of X are repetitive and there are only ~25% of unique rows, I want Y to have statistical mode of all the labels for a particular unique row.
And then there comes another dataset P (90000 X 10), I want to predict labels Q based on the previous exercise.
What I tried is finding unique rows of X using unique in MATLAB, and then assign statistical mode of each of these labels for the unique rows. For P, I can use ismember and carry out the same.
The issue is in the size of the dataset and it takes an 1.5-2 hours to complete the process. Is there a vectorize version possible in MATLAB?
Here is my code:
[X_unique,~,ic] = unique(X,'rows','stable');
labels=zeros(length(X_unique),1);
for i=1:length(X_unique)
labels(i)=mode(Y(ic==i));
end
Q=zeros(length(P),1);
for j=1:length(X_unique)
Q(all(repmat(X_unique(j,:),length(P),1)==P,2))=label(j);
end
You will be able to accelerate your first loop a great deal if you replace it entirely with:
labels = accumarray(ic, Y, [], #(y) mode(y));
The second loop can be accelerated by using all(bsxfun(#eq, X_unique(i,:), P), 2) inside Q(...). This is a good vectorized approach assuming your arrays are not extremely large w.r.t. the available memory on your machine. In addition, to save more time, you could use the unique trick you did with X on P, run all the comparisons on a much smaller array:
[P_unique, ~, IC_P] = unique(P, 'rows', 'stable');
EDIT:
to compute Q_unique in the following way: and then convert it back to the full array using:
Q_unique = zeros(length(P_unique),1);
for i = 1:length(X_unique)
Q_unique(all(bsxfun(#eq, X_unique(i,:), P_unique), 2)) = labels(i)
end
and convert back to Q_full to match the original P input:
Q_full = Q_unique(IC_P);
END EDIT
Finally, if memory is an issue, in addition to everything above, you might want you use a semi-vectorized approach inside your second loop:
for i = 1:length(X_unique)
idx = true(length(P), 1);
for j = 1:size(X_unique,2)
idx = idx & (X_unique(i,j) == P(:,j));
end
Q(idx) = labels(i);
% Q(all(bsxfun(#eq, X_unique(i,:), P), 2)) = labels(i);
end
This would take about x3 longer compared with bsxfun but if memory is limited then you gotta pay with speed.
ANOTHER EDIT
Depending on your version of Matlab, you could also use containers.Map to your advantage by mapping textual representations of the numeric sequences to the calculated labels. See example below.
% find unique members of X to work with a smaller array
[X_unique, ~, IC_X] = unique(X, 'rows', 'stable');
% compute labels
labels = accumarray(IC_X, Y, [], #(y) mode(y));
% convert X to cellstr -- textual representation of the number sequence
X_cellstr = cellstr(char(X_unique+48)); % 48 is ASCII for 0
% map each X to its label
X_map = containers.Map(X_cellstr, labels);
% find unique members of P to work with a smaller array
[P_unique, ~, IC_P] = unique(P, 'rows', 'stable');
% convert P to cellstr -- textual representation of the number sequence
P_cellstr = cellstr(char(P_unique+48)); % 48 is ASCII for 0
% --- EDIT --- avoiding error on missing keys in X_map --------------------
% find which P's exist in map
isInMapP = X_map.isKey(P_cellstr);
% pre-allocate Q_unique to the size of P_unique (can be any value you want)
Q_unique = nan(size(P_cellstr)); % NaN is safe to use since not a label
% find the labels for each P_unique that exists in X_map
Q_unique(isInMapP) = cell2mat(X_map.values(P_cellstr(isInMapP)));
% --- END EDIT ------------------------------------------------------------
% convert back to full Q array to match original P
Q_full = Q_unique(IC_P);
This takes about 15 seconds to run on my laptop. Most of which is consumed by computation of mode.
I've got 2 different files, one of them is an input matrix (X) which has 3823*63 elements (3823 input and 63 features), the other one is a class vector (R) which has 3823*1 elements; those elements have values from 0 to 9 (there are 10 classes).
I have to compute covariance matrices for every classes. So far, i could only compute mean vectors for every classes with so many nested loops. However, it leads me to brain dead.
Is there any other easy way?
There is the code for my purpose (thanks to Sam Roberts):
xTra = importdata('optdigits.tra');
xTra = xTra(:,2:64); % first column's inputs are all zero
rTra = importdata('optdigits.tra');
rTra = rTra(:,65); % classes of the data
c = numel(unique(rTra));
for i = 1:c
rTrai = (rTra==i-1); % Get indices of the elements from the ith class
meanvect{i} = mean(xTra(rTrai,:)); % Calculate their mean
covmat{i} = cov(xTra(rTrai,:)); % Calculate their covariance
end
Does this do what you need?
X = rand(3263,63);
R = randi(10,3263,1)-1;
numClasses = numel(unique(R));
for i = 1:numClasses
Ri = (R==i); % Get indices of the elements from the ith class
meanvect{i} = mean(X(Ri,:)); % Calculate their mean
covmat{i} = cov(X(Ri,:)); % Calculate their covariance
end
This code loops through each of the classes, selects the rows of R that correspond to observations from that class, and then gets the same rows from X and calculates their mean and covariance. It stores them in a cell array, so you can access the results like this:
% Display the mean vector of class 1
meanvect{1}
% Display the covariance matrix of class 2
covmat{2}
Hope that helps!
Don't use mean and sum as a variable names because they are names of useful Matlab built-in functions. (Type doc mean or doc sum for usage help)
Also cov will calculate the covariance matrix for you.
You can use logical indexing to pull out the examples.
covarianceMatrices = cell(m,1);
for k=0:m-1
covarianceMatrices{k} = cov(xTra(rTra==k,:));
end
One-liner
covarianceMatrices = arrayfun(#(k) cov(xTra(rTra==k,:)), 0:m-1, 'UniformOutput', false);
First construct the data matrix for each class.
Second compute the covariance for each data matrix.
The code below does this.
% assume allData contains all the data you've read in, each row is one data point
% assume classVector contains the class of each data point
numClasses = 10;
data = cell(10,1); %use cells to store each of the data matrices
covariance = cell(10,1); %store each of the class covariance matrices
[numData dummy] = size(allData);
%get the data out of allData and into each class' data matrix
%there is probably a nice matrix way to do this, but this is hopefully clearer
for i = 1:numData
currentClass = classVector(i) + 1; %Matlab indexes from 1
currentData = allData(i,:);
data{currentClass} = [data{currentClass}; currentData];
end
%calculate the covariance matrix for each class
for i = 1:numClasses
covariance{i} = cov(data{i});
end