Count unique consecutive values in Matlab - matlab

I have a data column which includes string and integer values, as well as blank cells (see below). What I would like to do is count the number of occurrences of unique string and integer values, but only for those values which are preceded by a different/blank value. Is this possible in Matlab? Many thanks.
Example:
Red, Red, Red, [blank], 2, 2, 1, 1, 1, [blank], 1, 1, [blank], Red, Red, 1
Desired output:
Red = 2,
2 = 1,
1 = 3

First, find the indices of the values you want to count, i.e. the first one, plus those that are non-blank and differ from the preceding value. Then count the unique values in the resulting subset. A small caveat is that you cannot simply use unique, since the data types are mixed. One way around this is to convert the numbers to strings (which assumes that none of your strings coincide with a number, e.g. '2'). You can then use a combination of unique and accumarray to find the unique values and their frequencies:
data = {'Red', 'Red', 'Red', [], 2, 2, 1, 1, 1, [], 1, 1, [], 'Red', 'Red', 1};
idx = [true, ~cellfun(@isequal, data(2:end), data(1:end-1)) & ~cellfun(@isempty, data(2:end))];
data_match = data(idx);
data_str = cellfun(@num2str, data_match, 'uni', false);
[~, ia, ic] = unique(data_str, 'stable');
values = data_match(ia)'
counts = accumarray(ic, 1)
values =
'Red'
[ 2]
[ 1]
counts =
2
1
3
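The counting rule can also be spelled out as an explicit loop, which may make it easier to see which elements are tallied. This is only a sketch: the names keys, counts, prev and sameValue are made up for illustration, and unlike the vectorized version it never counts a blank first element. It compares classes as well as values, so a string such as '2' would not be merged with the number 2.

```matlab
data = {'Red', 'Red', 'Red', [], 2, 2, 1, 1, 1, [], 1, 1, [], 'Red', 'Red', 1};

keys   = {};   % distinct values, in order of first appearance (illustrative name)
counts = [];   % number of runs started by each value (illustrative name)
prev   = [];   % previous element; [] stands for "blank"

% Two values match only if both their class and their contents agree
sameValue = @(a, b) strcmp(class(a), class(b)) && isequal(a, b);

for k = 1:numel(data)
    cur = data{k};
    % Count a non-blank value only when it starts a new run
    if ~isempty(cur) && (k == 1 || ~sameValue(cur, prev))
        j = find(cellfun(@(v) sameValue(v, cur), keys), 1);
        if isempty(j)
            keys{end+1}   = cur; %#ok<SAGROW>
            counts(end+1) = 1;   %#ok<SAGROW>
        else
            counts(j) = counts(j) + 1;
        end
    end
    prev = cur;
end
```

For the example data this leaves keys as {'Red', 2, 1} and counts as [2 1 3], matching the desired output.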

Related

MATLAB: Subsetting one array by values of a second

I basically have two columns (arrays): column A represents a continuous stream of data across points in time (e.g. blood pressure rising and falling), while column B represents onset of an event (e.g. a shock or a deep breath). Column A has values for every cell, while column B only has values at a time point where an event occurred, which represent codes for the onset of different kinds of events (e.g. 1, 2, 3, 4, 5 for 5 kinds).
What code can use the values in column B to subset data in column A (say collect all data from any time points between an event 1 and 2, and event 1 and 3, or event 1 and 4)? Basically, I'm trying to pull out the values for only certain time period segments, and store them in a cell array.
Example:
What I have:
Array A: 10, 12, 13, 20, 15, 16, 14, 9, 8, 11, 12, 15, 14
Array B: 1, 0, 2, 0, 0, 0, 1, 0, 2, 0, 1, 0, 2
*(where in Array B, 1 and 2 are events--say a showing a cue and a subject
responding to that cue--and I want the data between a 1 and a 2)*
What I want:
(New) Cell Array C: [12, 13] , [9, 8] , [15,14]
*That is, it's grabbing the data from Array A, based on what falls between
1s and 2s in Array B, and storing them into cells of Array C*
Many thanks!
Here's a way:
Find the indices of starts and ends. This can be done using strfind, which also works with numeric arrays.
Use those indices to build the result with a loop.
ind_start = strfind(B, [1 0]); % Or: ind_start = strfind(char(B+48), '10');
ind_end = strfind(B, 2); % Or: ind_end = strfind(char(B+48), '2')
result = cell(size(ind_start));
for k = 1:numel(ind_start)
result{k} = A(ind_start(k)+1:ind_end(k));
end
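If you want to change the event codes (say, 1 to 3 instead of 1 to 2), the same idea can be sketched with find and two parameters. The names startCode and endCode are made up for illustration, and the sketch assumes that start and end events strictly alternate and that every start has a matching end.

```matlab
A = [10, 12, 13, 20, 15, 16, 14, 9, 8, 11, 12, 15, 14];
B = [1, 0, 2, 0, 0, 0, 1, 0, 2, 0, 1, 0, 2];

startCode = 1;   % illustrative parameter: code that opens a segment
endCode   = 2;   % illustrative parameter: code that closes a segment

ind_start = find(B == startCode);
ind_end   = find(B == endCode);

% Assumes starts and ends alternate, so the k-th start pairs with the k-th end
result = cell(size(ind_start));
for k = 1:numel(ind_start)
    % Skip the sample at the start event, keep the one at the end event
    result{k} = A(ind_start(k)+1 : ind_end(k));
end
```

With the example arrays this gives {[12 13], [9 8], [15 14]}.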

Create random values in vector Matlab

I have this vector:
Population = [3, 5, 0, 2, 0, 5, 10, 50, 0, 1];
And I need to fill this vector with a random value between 1 and 4, but only where the vector has a 0 value.
How can I do it?
Edit: is there a way to do it using the randperm function?
First, find zero elements, then generate random values, then replace those elements:
Population = [3, 5, 0, 2, 0, 5, 10, 50, 0, 1];
idx = find(Population==0);
Population(idx) = 3 * rand(size(idx)) + 1;
If you need integers (you didn't specify), you can round the generated random numbers in the last statement, like this: round(3*rand(size(idx))+1), although rounding makes 1 and 4 half as likely as 2 and 3. For uniformly distributed integers, use randi (as suggested in the answer by @OmG): randi([1,4], size(idx)).
You can use the following code:
ind = find(~Population); % find zero places
Population(ind) = randi(4,1,length(ind)); % replace them with a random integer
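Regarding the randperm question in the edit: a sketch along the following lines should work, but with a different meaning. randperm(4, k) returns k distinct integers from 1 to 4, so the filled-in values never repeat, and it only works when the vector has at most 4 zeros. The names mask and k are made up for illustration.

```matlab
Population = [3, 5, 0, 2, 0, 5, 10, 50, 0, 1];

mask = (Population == 0);   % logical index of the zero entries
k    = nnz(mask);           % number of zeros to fill (must be <= 4 here)

% k distinct random integers drawn from 1..4, without repetition
Population(mask) = randperm(4, k);
```

If repeated values are acceptable, randi remains the simpler choice.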

De-nest elements of cell-matrix into a matrix

EDIT: It turns out this problem is not solved, as it fails to handle empty cells in the source data. i.e. k = {1 2 [] 4 5}; cat( 2, k{:} ) gives 1 2 4 5 not 1 2 NaN 4 5. So the subsequent reshape is now misaligned. Can anyone outline a strategy for handling this?
I have data of the form:
X = { ...
{ '014-03-01' [1.1] [1.2] [1.3] }; ...
{ '014-03-02' [2.1] [2.2] [2.3] }; ... %etc
}
I wish to replace [1.1] with 1.1 etc, and also replace the date with an integer using datenum
So I may as well use a standard 2D matrix to hold the result (as every element can be expressed as a Double).
But how to go about this repacking?
I was hoping to switch the dates in-place using X{:,1} = datenum( X{:,1} ) but this command fails.
I can do:
A = cat( 1, X{:} )
dateNums = datenum( cat( 1, A{:,1} ) )
values = reshape( cat( 1, A{:,2:end} ), size(X,1), [] )
final = [dateNums values]
Well, that works, but I don't feel at all comfortable with it.
>> u = A{:,1}
u =
014-03-01
>> cat(1,u)
ans =
014-03-01
This suggests only one value is output. But:
>> cat(1,A{:,1})
ans =
014-03-01
014-03-02
So A{:,1} must be emitting a sequential stream of values, and cat must be accepting varargs.
So now if I do A{:,2:end}, it is now spitting out that 2D subgrid again as a sequential stream of values...? And the only way to get at that grid is to cat -> reshape it. Is this a correct understanding?
I'm finding MATLAB's console output infuriatingly inconsistent.
The "sequential stream of values" is known as a comma-separated list. Doing A{:,1} in MATLAB in the console is equivalent to the following syntax:
>> A{1,1}, A{2,1}, A{3,1}, ..., A{end,1}
This is why you see a stream of values because it is literally typing out each row of the cell for the first column, separated by commas and showing that in the command prompt. This is probably the source of your infuriation as you're getting a verbose dump of all of the contents in the cell when you are unpacking them into a comma-separated list. In any case, this is why you use cat because doing cat(1, A{:,1}) is equivalent to doing:
cat(1, A{1,1}, A{2,1}, A{3,1}, ... A{end,1})
The end result is that it takes all elements in the first column of the 2D cell array and concatenates them into a new result. Similarly, doing A{:, 2:end} is equivalent to (note the column-major order):
>> A{1, 2}, A{2, 2}, A{3, 2}, ..., A{end, 2}, A{1, 3}, A{2, 3}, A{3, 3}..., A{end, 3}, ..., A{end, end}
This is why you need to perform a reshape because if you did cat with this by itself, it will only give you a single vector as a result. You probably want a 2D matrix, so the reshape is necessary to convert the vector into matrix form.
Comma-separated lists are very similar to Python's splat operator if you're familiar with Python. The splat operator is used for unpacking input arguments that are placed in a single list or iterator type... so for example, if you had the following in Python:
l = [1, 2, 3, 4]
func(*l)
This is equivalent to doing:
func(1, 2, 3, 4)
The above isn't really necessary to understand comma-separated lists, but I just wanted to show you that they're used in many programming languages, including Python.
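As a small illustration of the column-major unpacking described above, here is a sketch with a made-up 2-by-2 cell array. The square brackets receive the comma-separated list and concatenate it into a row vector, and reshape restores the 2D layout.

```matlab
A = {1, 3; 2, 4};          % made-up 2x2 cell array of scalars

% [A{:}] is the same as [A{1,1}, A{2,1}, A{1,2}, A{2,2}]
v = [A{:}];                % 1 2 3 4 (column-major order)

% Going down the columns is exactly what reshape expects
M = reshape(v, size(A));   % [1 3; 2 4]
```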
There is a problem with empty cells: cat will skip them. Which means that a subsequent reshape will throw a 'dimension mismatch' error.
The following code simply removes rows containing empty cells (which is what I require) as a preprocessing step.
(It would only take a minor alteration to replace empty cells with NaNs).
A = cat( 1, X{:} );
% Remove any row containing empty cells
emptiesInRow = sum( cellfun('isempty',A), 2 );
A( emptiesInRow > 0, : ) = [];
% Date is first col
dateNums = datenum( cat( 1, A{:,1} ) );
% Get other cols
values = reshape( cat( 1, A{:,2:end} ), size(A,1), [] );
% Recombine into (double) array
grid = [dateNums values]; %#ok<NASGU>
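The NaN variant mentioned above could look roughly like this. It is a sketch for the numeric columns only, with made-up sample data containing one empty cell. It assumes every non-empty cell in those columns holds a scalar double, and it uses cell2mat instead of cat plus reshape.

```matlab
% Made-up sample in the same layout as X, with one empty cell
X = { ...
    { '014-03-01', 1.1, [],  1.3 }; ...
    { '014-03-02', 2.1, 2.2, 2.3 }  ...
};

A    = cat(1, X{:});                     % 2x4 cell array
vals = A(:, 2:end);                      % numeric columns, still as cells

% Put NaN into the empty cells so that no position is skipped
vals(cellfun(@isempty, vals)) = {NaN};

values = cell2mat(vals);                 % 2x3 double, NaN where data was missing
```

Because every cell now holds exactly one number, the rows and columns stay aligned and no reshape is needed.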

Selecting rows of matrix by value of first column

Let's say I have a matrix A, whose first column are IDs, and a vector B, containing certain IDs in a random order (and some of them might be missing etc).
How do I select the rows of A with matching IDs in the order given by B?
Example:
Using the matrices
A = [2, 0.4, 0.3;
9, 0.2, 0.8;
3, 0.3, 0.4;
5, 0.1, 0.5];
B = [9; 2; 5];
I would like to get the matrix
C = [9, 0.2, 0.8;
2, 0.4, 0.3;
5, 0.1, 0.5];
As per the revised problem statement, the first column of A holds the IDs, B also contains certain IDs, and we need to match A's IDs against B's and select the matching rows from A. Under that assumption, you can try a few approaches here.
Approach #1 [ With ismember]
[~,idx] = ismember(B,A(:,1))
C = A(idx,:)
Approach #2 [ With bsxfun]
[idx,~] = find(bsxfun(@eq,A(:,1),B'))
C = A(idx,:)
Approach #3 [ With intersect]
[~,~,idx] = intersect(B,A(:,1),'stable')
C = A(idx,:)
If your IDs are unique, positive integers, you could do the following:
Approach #4 [ With sparse and indexing]
Construct a sparse vector that corresponds to the mapping: ID -> rowIndex and evaluate this vector:
indexOfID = sparse(A(:,1), 1, 1:size(A,1));
C = A(indexOfID(B),:);
This could be beneficial, when you want to query your IDs more than once, as you only have to build indexOfID once.
(Also I do like the syntax of the "function evaluation" indexOfID(B))
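One caveat for the ismember approach: if B contains an ID that does not appear in A, ismember returns 0 for it, and A(idx,:) then fails because 0 is not a valid index. A minimal sketch of one way to skip such IDs, using the first output of ismember (the ID 7 is a made-up missing ID, and found is an illustrative name):

```matlab
A = [2, 0.4, 0.3;
     9, 0.2, 0.8;
     3, 0.3, 0.4;
     5, 0.1, 0.5];
B = [9; 7; 2; 5];                  % 7 is a made-up ID that is not in A

[found, idx] = ismember(B, A(:,1));

% Keep only the IDs that were found; the order still follows B
C = A(idx(found), :);
```

Here C contains the rows for IDs 9, 2 and 5, in that order.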

Aggregate 3rd dimension of a 3d array for the subscripts of the first dimension

I have a 3 Dimensional array Val 4xmx2 dimension. (m can be variable)
Val{1} = [1, 280; 2, 281; 3, 282; 4, 283; 5, 285];
Val{2} = [2, 179; 3, 180; 4, 181; 5, 182];
Val{3} = [2, 315; 4, 322; 5, 325];
Val{4} = [1, 95; 3, 97; 4, 99; 5, 101];
I have a subscript vector:
subs = {1,3,4};
What I want to get as output is the average of column 2 in the above 2D arrays (only 1, 3 and 4) such that the 1st column's value is >=2 and <=4.
The output will be:
{282, 318.5, 98}
This can probably be done by using a few loops, but just wondering if there is a more efficient way?
Here's a one-liner:
output = cellfun(@(x)mean(x(x(:,1)>=2 & x(:,1)<=4,2)),Val(cat(1,subs{:})),'UniformOutput',false);
If subs is a numerical array (not a cell array) instead, i.e. subs=[1,3,4], and if output doesn't have to be a cell array, but can be a numerical array instead, i.e. output = [282,318.5,98], then the above simplifies to
output = cellfun(@(x)mean(x(x(:,1)>=2 & x(:,1)<=4,2)),Val(subs));
cellfun applies a function to each element of a cell array, and the indexing makes sure only the good rows are being averaged.
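For comparison, the same computation written as an explicit loop might look like this. It is only a sketch with subs as a numeric array, and keep is an illustrative name for the row mask.

```matlab
Val = cell(1, 4);
Val{1} = [1, 280; 2, 281; 3, 282; 4, 283; 5, 285];
Val{2} = [2, 179; 3, 180; 4, 181; 5, 182];
Val{3} = [2, 315; 4, 322; 5, 325];
Val{4} = [1, 95; 3, 97; 4, 99; 5, 101];
subs = [1, 3, 4];

output = zeros(size(subs));
for k = 1:numel(subs)
    x = Val{subs(k)};
    keep = x(:,1) >= 2 & x(:,1) <= 4;   % rows whose first column lies in [2, 4]
    output(k) = mean(x(keep, 2));       % average column 2 over those rows
end
```

For the example data, output is [282, 318.5, 98].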