I have a large dataset like the one below, and I want to randomly sample it by 'id'. Since the data has 5 unique ids, I would like to draw 5 ids with replacement and build a new dataset containing all observations of the sampled ids, in the order they were drawn.
id value var1 var2 …
1 1
1 2
1 3
1 4
2 5
2 6
2 7
3 8
3 9
3 10
4 11
4 12
4 13
5 14
5 15
5 16
Suppose I randomly draw 5 values from 1 to 5 (because there are 5 unique ids) and get (2 4 3 2 1). Then I would like to have this data:
id value var1 var2 …
2 5
2 6
2 7
4 11
4 12
4 13
3 8
3 9
3 10
2 5
2 6
2 7
1 1
1 2
1 3
1 4
Here is sample code for ids 1 through 5.
% data = [1 1; 1 2; 1 3; 1 4; 2 5; 2 6; 2 7; 3 8; 3 9; 3 10; 4 11; 4 12; 4 13;...
%         5 14; 5 15; 5 16];
data = rand(10000000,10);
data(:,1) = randi([1,5], size(data,1), 1);
% Get the row indices of each id from the 1st column
indxCell = cell(5,1);
for i = 1:5
    indxCell{i} = find(data(:,1) == i);
end
% Draw 5 ids with replacement (randperm(5) would only reorder the ids)
randIndx = randi(5, 5, 1);
randIndxCell = indxCell(randIndx, 1);
% Build one vector of row indices by concatenating the drawn groups;
% its length depends on which ids were drawn
numDataPts = sum(cellfun(@numel, randIndxCell));
newIndices = zeros(numDataPts,1);
endIndx = 1;
for i = 1:5
    startIndx = endIndx;
    endIndx = startIndx + numel(randIndxCell{i});
    newIndices(startIndx:endIndx-1, 1) = randIndxCell{i};
end
newData = data(newIndices,:);
For more unique ids, replace the hard-coded 5 with the number of unique ids.
Edits: Modified the data size and also rewrote the 2nd for-loop.
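For very large data, the per-id `find` loop scans the whole id column once per id. A minimal sketch of an alternative, assuming the ids are positive integers 1..K: sort the id column once, split the sorted row indices into one group per id with `accumarray` and `mat2cell`, then index the groups with the drawn ids. The draw is fixed to (2 4 3 2 1) so the result can be checked against the question; the commented line is the random version.

```matlab
% Small example from the question: column 1 is the id, column 2 the value
data = [1 1; 1 2; 1 3; 1 4; 2 5; 2 6; 2 7; 3 8; 3 9; 3 10; ...
        4 11; 4 12; 4 13; 5 14; 5 15; 5 16];

% Assumption: ids are positive integers 1..K
[~, order] = sort(data(:,1));          % row indices ordered by id (sort is stable)
counts = accumarray(data(:,1), 1);     % number of rows for each id
groups = mat2cell(order, counts, 1);   % groups{k} = rows whose id is k
K = numel(counts);

% draw = randi(K, K, 1);               % K ids drawn with replacement
draw = [2; 4; 3; 2; 1];                % fixed draw from the question

% Concatenate the row groups in draw order and index the data once
newData = data(vertcat(groups{draw}), :);
```

Because `sort` keeps equal ids in their original order, the rows of each id stay in their original relative order in `newData`.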
Similarly to "How to combine vectors of different length in a cell array into matrix in MATLAB", I would like to combine matrices of different dimensions, stored in a cell array, into a single matrix with zeros in place of the empty spaces. Specifically, I have a 1x3 cell array holding 3 matrices of size 3x3, 4x3 and 4x3:
A={[1 2 3; 4 5 6; 7 8 9] [1 2 3; 4 5 6; 7 8 9; 9 9 9] [1 2 3; 4 5 6; 7 8 9; 4 4 4]}
and I would like to obtain something like:
B =
1 2 3 1 2 3 1 2 3
4 5 6 4 5 6 4 5 6
7 8 9 7 8 9 7 8 9
0 0 0 9 9 9 4 4 4
I tried using cellfun and cell2mat, but I cannot figure out how to do this. Thanks.
Other answers are good, but I'd like to submit mine, which uses cellfun.
l = max(cellfun(@(x) size(x,1), A));
B = cell2mat(cellfun(@(x) [x; zeros(l-size(x,1), size(x,2))], A, 'UniformOutput', 0));
Using bsxfun's masking capability -
%// Convert A to 1D array
A1d = cellfun(@(x) x(:).',A,'Uni',0) %//'
%// Get dimensions of A cells
nrows = cellfun('size', A, 1)
ncols = cellfun('size', A, 2)
%// Create a mask of valid positions in output numeric array, where each of
%// those numeric values from A would be put
max_nrows = max(nrows)
mask = bsxfun(@le,[1:max_nrows]',repelem(nrows,ncols)) %//'
%// Setup output array and put A values into its masked positions
B = zeros(max_nrows,sum(ncols))
B(mask) = [A1d{:}]
Sample run
Input -
A={[1 2 3 5 6; 7 8 9 3 8] [1 2 3; 4 5 6; 7 8 9; 9 9 9] [1 2 3; 4 5 6; 7 8 9; 4 4 4]}
Output -
B =
1 2 3 5 6 1 2 3 1 2 3
7 8 9 3 8 4 5 6 4 5 6
0 0 0 0 0 7 8 9 7 8 9
0 0 0 0 0 9 9 9 4 4 4
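On MATLAB R2016b or later (and in Octave), implicit expansion can do the job of `bsxfun` in this mask approach. A sketch on the same sample input; it flattens each matrix column-major with `x(:)` and fills the masked positions in that order:

```matlab
A = {[1 2 3 5 6; 7 8 9 3 8], [1 2 3; 4 5 6; 7 8 9; 9 9 9], [1 2 3; 4 5 6; 7 8 9; 4 4 4]};

nrows = cellfun('size', A, 1);          % rows of each matrix
ncols = cellfun('size', A, 2);          % columns of each matrix
max_nrows = max(nrows);

% Column vector (max_nrows-by-1) compared with a row vector holding, for every
% output column, the row count of the matrix that column comes from
mask = (1:max_nrows)' <= repelem(nrows, ncols);

% Flatten each matrix column-major and write the values into the true positions
vals = cellfun(@(x) x(:), A, 'UniformOutput', false);
B = zeros(max_nrows, sum(ncols));
B(mask) = vertcat(vals{:});
```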
I would be surprised if this is possible in one or a few lines. You will probably have to do some looping yourself. The following achieves what you want in the specific case of incompatible first dimension lengths:
A={[1 2 3; 4 5 6; 7 8 9] [1 2 3; 4 5 6; 7 8 9; 9 9 9] [1 2 3; 4 5 6; 7 8 9; 4 4 4]}
maxsize = max(cellfun(@(x) size(x, 1), A));
B = A;
for k = 1:numel(B)
    if size(B{k}, 1) < maxsize
        tmp = B{k};
        B{k} = zeros(maxsize, size(tmp,2));
        B{k}(1:size(tmp,1),1:size(tmp,2)) = tmp;
    end
end
B = cat(2, B{:});
Now B is:
B =
1 2 3 1 2 3 1 2 3
4 5 6 4 5 6 4 5 6
7 8 9 7 8 9 7 8 9
0 0 0 9 9 9 4 4 4
I would use a good old for loop, which I find quite intuitive.
Here is the commented code:
clc;clear var
A={[1 2 3; 4 5 6; 7 8 9] [1 2 3; 4 5 6; 7 8 9; 9 9 9] [1 2 3; 4 5 6; 7 8 9; 4 4 4]};
%// Find the maximum rows and column # to initialize the output array.
MaxRow = max(cell2mat(cellfun(@(x) size(x,1),A,'Uni',0)));
SumCol = sum(cell2mat(cellfun(@(x) size(x,2),A,'Uni',0)));
B = zeros(MaxRow,SumCol);
%// Create a counter to keep track of the current columns to fill
ColumnCounter = 1;
for k = 1:numel(A)
%// Get the # of rows and columns for each cell from A
NumRows = size(A{k},1);
NumCols = size(A{k},2);
%// Fill the array
B(1:NumRows,ColumnCounter:ColumnCounter+NumCols-1) = A{k};
%// Update the counter
ColumnCounter = ColumnCounter+NumCols;
end
disp(B)
Output:
B =
1 2 3 1 2 3 1 2 3
4 5 6 4 5 6 4 5 6
7 8 9 7 8 9 7 8 9
0 0 0 9 9 9 4 4 4
max_row = max(cellfun(@(x) size(x,1), A)); % row count of the tallest matrix
for k = 1:numel(A)
    A{k}(end+1:max_row, :) = 0; % append zero rows; no-op if already max_row tall
end
B = [A{:}];
If stacking the matrices vertically is acceptable (for the A above this gives an 11-by-3 array, not the zero-padded B), this is all you need:
B=cat(1,A{:});
or, something I often just try for 2-D cells, which gives the same vertical stack here:
B=cell2mat(A');
If you don't care along which dimension the arrays are concatenated (and want the laziest option), put the same call into a try/catch block and loop over the dimensions:
function A=cat_any(A)
for dims=1:10 % who needs more than 10 dims? Otherwise replace 10 with: max(cellfun(@ndims,A),[],'all')
    try, A=cat(dims,A{:}); end
    if ~iscell(A), return; end % concatenation succeeded
end
disp('Couldn''t cat!') % if we can't cat, tell the user
end
Beware: this can lead to unexpected results, but in most cases it has simply worked for me.
I have a matrix and want to add padding around it but the padded values have to be mirrored.
I have tried using A = padarray(B,[1 1],'symmetric','both');
but the 'symmetric' option mirrors B including its border, so the edge values of B are repeated.
Meaning if
B = [1 2 3;
4 5 6;
7 8 9];
the result will be
A = [1 1 2 3 3;
1 1 2 3 3;
4 4 5 6 6;
7 7 8 9 9;
7 7 8 9 9]
But I need A to look like this:
A = [5 4 5 6 5;
2 1 2 3 2;
5 4 5 6 5;
8 7 8 9 8;
5 4 5 6 5]
Is there some function like padarray I can use for that or do I have to do it manually?
You could pad with 'symmetric' and [2 2], then remove the duplicated edge rows and columns:
B = [1 2 3; 4 5 6; 7 8 9];
c = padarray(B,[2 2],'symmetric','both');
c(end-1,:) = [];
c(:,end-1) = [];
c(:,2) = [];
c(2,:) = [];
gives,
c =
5 4 5 6 5
2 1 2 3 2
5 4 5 6 5
8 7 8 9 8
5 4 5 6 5
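Padding that mirrors about the edge without repeating it can also be done with plain indexing, without padarray. A sketch assuming the pad width `p` (a name introduced here) is smaller than both dimensions of B:

```matlab
B = [1 2 3; 4 5 6; 7 8 9];
p = 1;                          % pad width; assumed p < size(B,1) and p < size(B,2)

% Index vectors that mirror about the first and last row/column
% without repeating them
[nr, nc] = size(B);
ri = [p+1:-1:2, 1:nr, nr-1:-1:nr-p];
ci = [p+1:-1:2, 1:nc, nc-1:-1:nc-p];

A = B(ri, ci)
```

For p = 1 both index vectors are [2 1 2 3 2], which selects exactly the rows and columns of the requested result.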
I have a large matrix (time x frequency) that I want to reduce partially: I want to sum every 1000 rows (time samples) together while keeping the frequency information, a kind of segmentation.
Is there any way to do it without a loop in MATLAB?
A smaller example:
M=[1 2 3; 2 3 4; 5 8 7; 5 6 7; 1 2 3; 1 2 4];
and I want to sum every 2 rows together so, that I get:
[3 5 7; 10 14 14; 2 4 7]
Suppose you have a matrix with N rows and M columns and you want to sum every R rows together (where N is divisible by R),
>> mat = [1 2 3; 2 3 4; 5 8 7; 5 6 7; 1 2 3; 1 2 4]
mat =
1 2 3
2 3 4
5 8 7
5 6 7
1 2 3
1 2 4
>> [N, M] = size(mat); %=> [6, 3]
>> R = 2;
The following will allow you to sum groups of R rows:
>> res = reshape(mat, R, [])
res =
1 5 1 2 8 2 3 7 3
2 5 1 3 6 2 4 7 4
>> res = sum(res)
res =
3 10 2 5 14 4 7 14 7
>> res = reshape(res, [], M)
res =
3 5 7
10 14 14
2 4 7
You can also do everything in one line:
>> reshape(sum(reshape(mat, R, [])), [], M)
ans =
3 5 7
10 14 14
2 4 7
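The reshape approach needs N to be divisible by R. One way to lift that restriction (a different technique from the answer, shown as a sketch) is a sparse 0/1 matrix that maps each row to its group, so a single matrix product sums each block of R rows and the last block may be shorter:

```matlab
mat = [1 2 3; 2 3 4; 5 8 7; 5 6 7; 1 2 3; 1 2 4; 9 9 9];  % 7 rows, not divisible by 2
R = 2;
N = size(mat, 1);

% Group label for each row: rows 1..R -> 1, rows R+1..2R -> 2, ...
g = ceil((1:N) / R);

% S(g(k), k) = 1, so row j of S*mat is the sum of the rows in group j
S = sparse(g, 1:N, 1);
res = full(S * mat)
```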
Suppose I have the inputs data = [1 2 3 4 5 6 7 8 9 10]
and num = 4. I want to use these to generate the following:
i = [1 2 3 4 5 6; 2 3 4 5 6 7; 3 4 5 6 7 8; 4 5 6 7 8 9]
o = [5 6 7 8 9 10]
which is based on the following logic:
length of data = 10
num = 4
10 - 4 = 6
i = [first 6; second 6;... num times]
o = [last 6]
What is the best way to automate this in MATLAB?
Here's one option using the function HANKEL:
>> data = 1:10;
>> num = 4;
>> i = hankel(data(1:num),data(num:end-1))
i =
1 2 3 4 5 6
2 3 4 5 6 7
3 4 5 6 7 8
4 5 6 7 8 9
>> o = data(num+1:end)
o =
5 6 7 8 9 10
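The same windows can also be built by plain indexing, which makes the layout explicit and works for any values in `data`. A sketch using arbitrary sample values chosen for illustration (`L` and `idx` are names introduced here):

```matlab
data = [3 1 4 1 5 9 2 6 5 3];    % arbitrary (non-consecutive) sample values
num  = 4;
L    = numel(data) - num;        % window length, 10 - 4 = 6 here

% idx(r, c) = r + c - 1, so row r holds the positions r .. r+L-1
idx = bsxfun(@plus, (1:num)', 0:L-1);

i = data(idx)                    % num-by-L matrix of sliding windows
o = data(num+1:end)              % the last L values
```

Indexing a row vector with a matrix returns a result shaped like the index matrix, which is why `data(idx)` comes out num-by-L.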