labelling chunks of data after finding indices - matlab

I would like to find certain indices of a variable called "heading": the indices where the value of "heading" lies between 0 and 20, 20 and 40, 40 and 60, etc. I then want to extract other variables, such as speed, using these indices. I have written the code below, but I suspect it is inefficient, so I would like to put it into some kind of loop or indexing algorithm.
heading_index1 = find(heading>0 & heading<=20);
heading_index2 = find(heading>20 & heading<=40);
heading_index3 = find(heading>40 & heading<=60);
etc
speed1 = speed(heading_index1);
speed2 = speed(heading_index2);
speed3 = speed(heading_index3);
etc

It is more efficient to use logical indexing directly (with no find call in between). If you want to automate the whole thing, you can use a struct or a cell array:
% random vectors to create a minimum working example
heading = rand(100,1)*100;
speed = rand(size(heading));
% limit vector
lim = [0 20 40 60];
% declare storage structures
S = struct();
C = cell(1,length(lim)-1);
% loop through limits
for i = 1:length(lim)-1
% logical vector
lg = heading > lim(i) & heading <= lim(i+1);
% index (it's faster to use logical vectors for indexing than integers)
chunk = speed(lg);
% assign to struct
fld = num2str(i,'F%d');
S.(fld) = chunk;
% assign to cell
C{i} = chunk;
end
You can choose if you like the structure-way or the cell-way. It shouldn't make a difference in terms of memory space.
Personally, I prefer the struct as I can define names, which I can interpret but it is also a bit tedious to create the field name rather than just index a cell
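As an alternative to the explicit limit loop above, the binning can be sketched with discretize, which returns a bin number for every element. This is only a sketch under assumptions: the sample values are made up, and discretize requires R2015a or later. With 'IncludedEdge','right' each bin includes its right edge, but the first bin also includes its left edge, so a heading of exactly 0 lands in bin 1 (unlike heading > 0 in the loop).

```matlab
% Hypothetical sample data (not from the question)
heading = [5; 25; 45; 15; 35; 70];
speed   = [1; 2; 3; 4; 5; 6];
edges   = 0:20:60;                 % bin edges: (0,20], (20,40], (40,60]
nb      = numel(edges) - 1;

% Bin number per element; values outside the edges give NaN
bin = discretize(heading, edges, 'IncludedEdge', 'right');

% One cell per bin, holding the speeds whose heading falls in that bin
C = arrayfun(@(k) speed(bin == k), 1:nb, 'UniformOutput', false);
```

Here C{2} holds the speeds for headings in (20,40]. Keeping the vector bin around is often enough on its own, since speed(bin == k) can be evaluated on demand.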

Related

MATLAB find on multiple columns to multiple column result

Setup
I have an array of captured data. The data may be captured on just 1 device or up to a dozen devices, with each device being a column in the array. I have a prior statement which I execute on the array to then turn it into a logical array to find particular points of interest in the data. Due to the nature of the data, there are many 0's and only a few 1's. I need to return an array with the indices of the 1's so I can go back and capture the data between those points (see update below).
find is an obvious choice for a function - however, the result I need, needs to have 1 column for each device. Normally find will do a linear index regardless of the dimensions of the array.
The devices follow a pattern - but aren't exactly the same. So, complicating this is the fact that the number of 1's in each column is close to, but not guaranteed to be exactly the same depending on the exact timing the data capture is stopped (they are most often different from each other by 1 element, but could be different by more).
MATLAB CODE ATTEMPTS
Because of that difference, I can't use the following simple code:
for p = 1:np
indices( :, p ) = find( devices.data.cross( :, p ) );
end
Notes:
np is the number of columns in the data = number of devices captured.
devices is a class representing the collection of devices
data is a TimeTable containing captured data on all the devices
cross is a column in the data TimeTable which contains the logical array
Even this simple code is inefficient and generates the Code Analyzer warning:
The variable 'indices' appears to change size on every loop
iteration (within a script). Consider preallocating for speed.
As expected, it doesn't work as I get an error similar to the following:
Unable to perform assignment because the size of the left side is
448-by-1 and the size of the right side is 449-by-1.
I know why I get this error - each column in an array in MATLAB has to have the same number of rows, so I can't make the assignment if the row size doesn't match. I need to pad the "short" columns somehow. In this case, repeating the last index will work for my later operations without causing an error.
I can't figure out a "good" way to do this. I can't pre-populate the array rows because I don't know how many rows there will be until I've done the find operation.
I can change the code as follows:
indices = [];
for p = 1:np
tempindices = find( devices.data.cross(:, p) );
sizediff = size( tempindices, 1 ) - size( indices, 1 );
if p > 1
if sizediff > 0
padding = repmat(indices(end, 1:(p - 1)), sizediff, 1);
indices = [indices; padding];
elseif sizediff < 0
padding = repmat(tempindices(end), abs(sizediff), 1);
tempindices = [tempindices; padding];
end
end
indices(:,p) = tempindices;
end
Note: padarray would have been useful here, but I don't have the Image Processing Toolbox so I cannot use it.
This code works, but it is very inefficient, it creates multiple otherwise unneeded variables in the workspace and generates multiple "appears to change size on every loop iteration" warnings in Code Analyzer. Is there a more efficient way to do this?
Update / Additional Information:
Some more context is needed for my issue. Given that devices.data.cross is a logical array, to just "pick" the data I want from other columns in my table (as I originally described my problem) I could select a column from devices.data.cross and pass that logical column as a subscript to get that data. I do that where it works. However, for some of the columns I need to select "chunks" of the data between the indices and that's where (I think) I need the indices. Or, at least I don't know of another way to do it.
Here is example code of how I use the indices:
for p = 1:np
for i = 2:num_indices
these_indices = indices(i-1, p):( indices(i, p) - 1 );
rmsvoltage = sqrt( mean( devices.data.voltage(these_indices).^2 ) );
end
end
This is just one routine I do on the "chunks" of data. I also have a couple of functions where these chunks of data are passed for processing.
If I understood your problem correctly, the code below should work. I'm using the approach that Cris Luengo suggested in a comment under your question.
Key element is [rowIdcs, colIdcs] = find( cros ); which gives you the subscripts of positions in cros having a value of one. Please find further comments inline.
% Create some data for testing
volt = randn(10,10);
cros = randi(10,10,10) > 9;
% Get rowIdcs and colIdcs, which have both a size of Nx1,
% with N denoting the number of ones in the mask.
% rowIdcs and colIdcs are the subscripts of the ones in the mask.
[rowIdcs, colIdcs] = find( cros );
% The number of chunks is equal to number N of ones found in the mask;
numChunks = numel( rowIdcs );
% Initialize a vector for the rms
rms = zeros( numChunks, 1 );
% Loop over the chunks
for k = 1 : numChunks
curRow = rowIdcs(k);
curCol = colIdcs(k);
% Get indices of range over neighbouring rows
chunkRowIdcs = curRow + [-1 0 1]; %i.e. these_indices in your example
% Remove indices that are out of range
chunkRowIdcs( chunkRowIdcs < 1 | chunkRowIdcs > size(volt,1) ) = [];
% Get voltages covered by chunk
chunkVoltages = volt( chunkRowIdcs, curCol );
% Get RMS over voltages
rms(k) = sqrt( mean( chunkVoltages(:).^2 ));
end
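For the chunk-between-indices use case described in the question's update, one possible sketch stores the indices of each column in a cell array, so columns with different numbers of ones need no padding. The names crossMask, voltage, idx and rmsv and the sample data are assumptions for illustration, and the sketch assumes the voltage data has one column per device.

```matlab
% Hypothetical mask and data: 5 samples, 2 devices
crossMask = logical([1 0; 0 1; 1 0; 0 0; 1 1]);
voltage   = (1:5)' * [1 10];

np   = size(crossMask, 2);
idx  = cell(1, np);      % per-device row indices of the ones
rmsv = cell(1, np);      % per-device RMS of each chunk

for p = 1:np
    idx{p}  = find(crossMask(:, p));
    nChunks = max(numel(idx{p}) - 1, 0);
    rmsv{p} = zeros(nChunks, 1);
    for k = 1:nChunks
        % rows from one marked index up to (not including) the next
        rows = idx{p}(k) : idx{p}(k+1) - 1;
        rmsv{p}(k) = sqrt(mean(voltage(rows, p).^2));
    end
end
```

Because every cell is sized from its own find result, nothing grows inside the loop and the Code Analyzer preallocation warning should not appear.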

Matlab generate variable names when subdividing large data [duplicate]

I have a large data set (a vector) that I want to split into n smaller sets to look at later with other scripts. I.e. if n = 10, I want to turn one 1x80000000 double into ten 1x8000000 doubles. My idea is to turn the original into an n-by-m matrix and then save each row of the matrix into its own vector, as follows.
%data-n-splitter
n = 10 %number of sections
L = length(data);
Ls = L/n;
Ls = floor(Ls);
Counter = 1;
%converting vector to matrix
datamatrix = zeros(n,Ls);
for k = 1:n
datamatrix(k,:) = data(Counter:Counter+ Ls - 1);
Counter = Counter + Ls;
end
How do I make matlab loop this part of the code n times:
%save each row of matrix as separate vector
P1 = datamatrix(1,:);
P2 = datamatrix(2,:);
P3 = datamatrix(3,:);
P4 = datamatrix(4,:);
P5 = datamatrix(5,:);
P6 = datamatrix(6,:);
P7 = datamatrix(7,:);
P8 = datamatrix(8,:);
P9 = datamatrix(9,:);
P10 = datamatrix(10,:);
Example answer that I'm hoping for:
for k = 1:n
P('n') = datamatrix(n,:);
end
I've seen some articles about using cell arrays but the scripts I'm passing the variables to aren't set up for this so I'd rather not go down that route if possible.
There are several options:
use a struct, which comes closest to what you are hoping for,
use a cell, more convenient looping but no access over meaningful names,
use a higher-dimension matrix (in your case it is only 2D, but the same applies for 3D or higher). This is the most memory-efficient option.
To round this off, you could also use a table, which is a hybrid of a struct and a cell as you can use both notations to access it. There is no other benefit.
Now, how to do this? The simplest (and best) solution first: create a 2D matrix with reshape
Ary = 1:10; % I shrank your 1x80000000 array to 1x10 but you'll get the idea
%% create new structure
Mat = reshape(Ary,5,2);
%% access new structure (looping over columns)
for i = 1:size(Mat,2)
% access columns through slicing
ary_sct = Mat(:,i);
% do something
end
Pro: memory efficient (requires the same amount of memory as the initial array); easy looping
Con: only works if you can slice the initial array evenly
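If the length is not an exact multiple of the number of sections, one way around the limitation is to drop the remainder before reshaping, which is what the floor in the question effectively does. A minimal sketch, with made-up sizes and the hypothetical names data, trimmed and Mat:

```matlab
data = 1:23;                      % hypothetical vector, not evenly divisible
n    = 5;                         % number of sections
Ls   = floor(numel(data) / n);    % section length (4 here)

trimmed = data(1:n*Ls);           % discard the trailing remainder (21:23)
Mat     = reshape(trimmed, Ls, n);  % column k holds section k
```

Mat(:,k) then gives section k without any extra copies in named variables. The trade-off is that the last numel(data) - n*Ls samples are lost.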
Next: create a cell
Ary = 1:10;
n = 2; % number of sections
L = floor(length(Ary)/n);
% allocate memory
C = cell(1,n);
%% create new structure
for i = 1:n
% access the content of a cell with {}
C{i} = Ary((i-1)*L+1:i*L);
end
%% access new structure (looping over entries)
for i = 1:length(C)
% access the content of a cell with {}
ary_sct = C{i};
% do something
end
Pro: You can store anything in a cell. Every data type and -- what is often more important -- of any dimension
Con: Accessing the content (through {}) versus accessing the element (through ()) is a bit annoying if you are a beginner; each element requires a memory overhead of about 60 bytes, because cells hold pointers that need to store information about where and what they point to.
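Since cells can hold pieces of different lengths, the split can also keep the remainder instead of discarding it. A short sketch using mat2cell, where the last section simply absorbs the leftover samples; the sizes and the name partLens are assumptions for illustration:

```matlab
data = 1:23;                      % hypothetical vector
n    = 5;                         % number of sections
Ls   = floor(numel(data) / n);    % nominal section length (4 here)

% Lengths of the sections; the last one takes whatever is left over
partLens = [repmat(Ls, 1, n-1), numel(data) - Ls*(n-1)];   % [4 4 4 4 7]

% Split the 1-by-23 row vector along its columns
C = mat2cell(data, 1, partLens);
```

C{k} is then section k, and numel(C{end}) may be larger than the others.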
Next: use a struct
Ary = 1:10;
n = 2; % number of sections
L = floor(length(Ary)/n);
% create empty struct
S = struct();
%% create new structure
for i = 1:n
% create fieldname (must start with a letter!)
fld = num2str(i,'F%d');
% write to field (note the parentheses)
S.(fld) = Ary((i-1)*L+1:i*L);
end
%% access new structure (looping over fieldnames)
% get all field names
FldNms = fieldnames(S);
for i = 1:length(FldNms)
% access field names (this is a cell!)
fld = FldNms{i};
% access struct
ary_sct = S.(fld);
% do something
end
Pro: Field names are convenient to keep the overview of your data
Con: accessing field names in a loop is a bit tedious; each element requires a memory overhead of about 60 bytes, because fields hold pointers that need to store information about where and what they point to.

Saving values of variable in MATLAB

Hi for my code I would like to know how to best save my variable column. column is 733x1. Ideally I would like to have
column1(y)=column, but I obtain the error:
Conversion to cell from logical is not possible.
in the inner loop. I find it difficult to access these stored values in overlap.
for i = 1:7
for y = 1:ydim % ydim = 436
%execute code %code produces different 'column' on each iteration
column1{y} = column; %'column' size 733x1 %altogether 436 sets of 'column'
end
overlap{i} = column1; %iterates 7 times.
end
Ideally I want overlap to store 7 variables saved that are (733x436).
Thanks.
I'm assuming column is calculated using a procedure where each column depends on the previous one. If not, then there are very likely improvements that can be made to this:
column = zeros(733, 1); % Might not need this. Depends on your code.
all_columns = zeros(xdim, ydim); % Pre-allocate memory (always do this)
% Note that the first dimension is usually called x,
% and the second called y in MATLAB
overlap = cell(7, 1);
overlap(:) = {zeros(xdim, ydim)}; % Pre-allocate memory
for ii = 1:numel(overlap) % numel is better than length
for jj = 1:ydim % ii and jj are better than i and j
% several_lines_of_code_to_calculate_column
column = something;
all_columns(:, jj) = column;
end
overlap{ii} = all_columns;
end
You can access the variables in overlap like this: overlap{1}(1,1);. This will get the first element in the first cell. overlap{2} will get the entire matrix in the second cell.
You specified that you wanted 7 variables. Your code implies that you know that cells are better than assigning it to different variables (var1, var2 ...). Good! The solution with different variables is bad bad bad.
Instead of using a cell array, you could instead use a 3D-array. This might make processing later on faster, if you can vectorize stuff for instance.
This will be:
column = zeros(733, 1); % Might not need this. Depends on your code.
overlap = zeros(xdim, ydim, 7); % Pre-allocate memory for 3D-matrix
for ii = 1:7
for jj = 1:ydim
% several_lines_of_code_to_calculate_column
column = something;
overlap(:, jj, ii) = column; % store column jj of set ii
end
end
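If the results already live in a cell array of equally sized matrices, they can be stacked into a 3D array afterwards, and operations across the sets become one-liners. A small sketch with made-up sizes and contents; stack and avg are hypothetical names:

```matlab
xdim = 4; ydim = 3; nSets = 7;    % small hypothetical sizes

% Stand-in for the cell array built in the loop above
overlap = cell(nSets, 1);
for ii = 1:nSets
    overlap{ii} = ii * ones(xdim, ydim);
end

% Stack the matrices along the third dimension: xdim-by-ydim-by-nSets
stack = cat(3, overlap{:});

% Example of a vectorized operation: element-wise mean over the sets
avg = mean(stack, 3);
```

stack(:,:,ii) is the same matrix as overlap{ii}, so either representation can be converted to the other as needed.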

MATLAB: vectors of different length

I want to create a MATLAB function that imports data from files in another directory and fits them to a given model. The data need to be filtered first, because there is "trash" data in different places in the files (e.g. measurements of nothing before the analyzed motion starts).
So the vectors that contain the data used to fit end up having different lengths and so I can't return them in a matrix (eg. x in my function below). How can I solve this?
I have a lot of data files, so I don't want to use a "manual" method. My function is below. Any and all suggestions are welcome.
datafit.m
function [p, x, y_c, y_func] = datafit(pattern, xcol, ycol, xfilter, calib, p_calib, func, p_0, nhl)
datafiles = dir(pattern);
path = fileparts(pattern);
p = NaN(length(datafiles), numel(p_0));
y_func = [];
for i = 1:length(datafiles)
exist(strcat(path, '/', datafiles(i).name));
filename = datafiles(i).name;
data = importdata(strcat(path, '/', datafiles(i).name), '\t', nhl);
filedata = data.data/1e3;
xdata = filedata(:,xcol);
ydata = filedata(:,ycol);
filter = filedata(:,xcol) > xfilter(i);
x(i,:) = xdata(filter);
y(i,:) = ydata(filter);
y_c(i,:) = calib(y(i,:), p_calib);
error = @(par) sum(power(y_c(i,:) - func(x(i,:), par),2));
p(i,:) = fminsearch(error, p_0);
y_func = [y_func; func(x(i,:), p(i,:))];
end
end
sample data: http://hastebin.com/mokocixeda.md
There are two strategies I can think of:
I would return the data in a cell array instead, where the individual cells store vectors of different lengths. You can access the data the same way as with arrays, but using curly braces: say c{1}=[1 2 3], c{2}=[1 2 10 8 5], c{3} = [ ].
You can also filter the trash data upon reading a line, if that makes your vectors have the same length.
If memory is not a major issue, try filling up the vectors with distinct values, such as NaN or Inf, or anything else that cannot occur in your measurements given their physical context. You might need to identify the longest data set before you allocate memory for your matrices (*). This way, you can use equally sized matrices and easily ignore the "empty data" later on.
(*) Idea ... allocate memory based on the size of the largest file first. Fill it up with e.g. NaN's
matrix = NaN(length(datafiles), longest_file_line_number);
Then run your function. Determine the length of the longest consecutive set of data.
% old_max must be initialized to 0 before the loop over the files
new_max = length(xdata(filter));
if new_max > old_max
old_max = new_max;
end
matrix(i, 1:new_max) = xdata(filter);
Crop your matrix accordingly, before the function returns it ...
matrix = matrix(:, 1:old_max);
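The two strategies can be combined: collect the filtered vectors in a cell array first, then build the NaN-padded matrix once all lengths are known. A minimal sketch with made-up vectors; the names lens, M and rowMeans are assumptions, and the 'omitnan' flag requires R2015a or later:

```matlab
% Hypothetical filtered data sets of different lengths
c = {[1 2 3], [1 2 10 8 5], [4 6]};

lens = cellfun(@numel, c);          % length of each data set
M    = NaN(numel(c), max(lens));    % padded matrix, one row per data set

for i = 1:numel(c)
    M(i, 1:lens(i)) = c{i};         % the rest of the row stays NaN
end

% The padding can be ignored in later statistics
rowMeans = mean(M, 2, 'omitnan');
```

This avoids having to know the longest file in advance, at the cost of holding the data twice for a moment.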

Matlab: Random sample with replacement

What is the best way to draw a random sample with replacement from a dataset? My dataset is 316 x 34. I want to segment the data into three buckets, but with replacement. Should I use randperm? I need to keep the row index intact, because that index is handy for identifying the label data. I am new to MATLAB. I saw a couple of random-sampling methods, but they didn't look like they do what I am looking for. It's strange to think that something like this doesn't exist in MATLAB, so I did the following (code below).
My issue is that when I do row_idx = round(rand(1)*316), I sometimes get zero. That leads to two questions:
What should I do to avoid zero?
What is the best way to do the random sample with replacement?
shuffle_X = X(randperm(size(X,1)),:);
lengthOf_shuffle_X = length(shuffle_X)
number_of_rows_per_bucket = round(lengthOf_shuffle_X / 3)
bucket_cell = cell(3,1)
bag_matrix = []
for k = 1:length(bucket_cell)
for i = 1:number_of_rows_per_bucket
row_idx = round(rand(1)*316)
bag_matrix(i,:) = shuffle_X(row_idx,:)
end
bucket_cell{k} = bag_matrix
end
I could do the following:
if row_idx == 0
row_idx = round(rand(1)*316)
end
assuming the random number generator never gives a zero in two consecutive draws.
randi is a good way to get integer indices for sampling with replacement. Assuming you want to fill three buckets with an equal number of samples, then you can write
data = rand(316,34); %# create some dummy data
number_of_data = size(data,1);
number_of_rows_per_bucket = 50;
bucket_cell = cell(1,3);
idx = randi([1,number_of_data],[number_of_rows_per_bucket,3]);
for iBucket = 1:3
bucket_cell{iBucket} = data(idx(:,iBucket),:);
end
To answer the question: randperm gives you a draw order without replacement, since each item can be drawn only once.
randi draws with replacement, that is, the same item may be drawn many times.
If you want to "segment" a dataset, that usually means splitting it into three disjoint sets. For that you draw without replacement (you don't put the items back; use randperm). If you did it with replacement (using randi) and kept drawing until every item had been picked, it would be very slow, since after some time the chance of drawing an item you have not drawn before is very low.
(See the coupon collector's problem for details.)
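A disjoint three-way split without replacement could look roughly like this. It is a sketch, not the only way to do it: the dummy data, the fixed seed and the names perm, edges, idxParts and buckets are assumptions for illustration. The row indices are kept so that labels can be looked up later.

```matlab
rng(0);                               % fixed seed, only for repeatability
data  = rand(316, 34);                % dummy data with the question's size
nRows = size(data, 1);

perm  = randperm(nRows);              % every row index exactly once
edges = round(linspace(0, nRows, 4)); % split points: [0 105 211 316]

idxParts = cell(1, 3);                % original row indices per bucket
buckets  = cell(1, 3);                % the data rows per bucket
for k = 1:3
    idxParts{k} = perm(edges(k)+1 : edges(k+1));
    buckets{k}  = data(idxParts{k}, :);
end
```

idxParts{k} tells you which original rows ended up in bucket k, so a label vector y can be split the same way with y(idxParts{k}).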
If you need a segmentation that is a split, you can just go over the elements and independently decide where to put it. (That is you choose a bucket for each item with replacement -- that is you put any chosen bucket back into the game.)
For that:
% if your data items are vectors say data = [1 1; 2 2; 3 3; 4 4]
num_data = size(data,1); % number of rows (length would return the largest dimension)
bucket_labels = randi(3,[1,num_data]); % draw a bucket label for each item, independently.
for i=1:3
bucket{i} = data(bucket_labels==i,:);
end
%if your data items are scalars say data = [1 2 3 4 5]
num_data = length(data);
bucket_labels = randi(3,[1,num_data]);
for i=1:3
bucket{i} = data(bucket_labels==i);
end
there we go.