Series of consecutive numbers (different lengths) - matlab

I would appreciate if someone showed me an easy way to do this. Let's say I have a vector in MATLAB like
d = [3 2 4 2 2 2 3 5 1 1 2 1 2 2 2 2 2 9 2]
I want to find the runs of consecutive 2s and the lengths of those runs.
The 2s themselves can easily be found with x=find(d==2). What I want, though, is a vector containing the lengths of all runs of consecutive 2s, which in this case would be:
[1 3 1 5 1].
Anyone who could help me?

This seems to work:
q = diff([0 d 0] == 2);
v = find(q == -1) - find(q == 1);
gives
v =
1 3 1 5 1
for me
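To see why this works, here is a sketch of the same idea with the intermediate values spelled out. The names isTwo, runStart, runStop and runLength are illustrative and not part of the answer above. Padding the mask with a zero on each side ensures that runs touching either end of d are still detected.

```matlab
d = [3 2 4 2 2 2 3 5 1 1 2 1 2 2 2 2 2 9 2];
isTwo = (d == 2);                 % logical mask of the 2s
q = diff([0 isTwo 0]);            % +1 where a run starts, -1 just after it ends
runStart = find(q == 1);          % index in d of the first element of each run
runStop = find(q == -1) - 1;      % index in d of the last element of each run
runLength = runStop - runStart + 1;
```

For the vector above, runStart is [2 4 11 13 19] and runLength is [1 3 1 5 1].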

This is called run-length encoding. There is a good m-file available for it at http://www.mathworks.com/matlabcentral/fileexchange/4955-rle-deencoding . In the single timing run below, this method was slightly faster than the previously posted diff/find approach.
tic
d_rle = rle(d==2);
d_rle{2}(d_rle{1}==1);
toc
Elapsed time is 0.002632 seconds.
tic
q = [0 diff([0 d 0] == 2)];
find(q == -1) - find(q == 1);
toc
Elapsed time is 0.003061 seconds.

What if we want the indices of the original matrix where the consecutive values are located? Further, what if we want a matrix of the same size as the original matrix, where the number of consecutive values is stored at the indices of those consecutive values? For example:
original_matrix = [1 1 1;2 2 3; 1 2 3];
output_matrix = [3 3 3;2 2 0;0 0 0];
This problem is relevant for meteorological data quality control. For example, I have a matrix of temperature data from a number of sensors, and I want to know which days had constant consecutive values, and how many days were constant, so that I can flag the data as possibly faulty.
The temperature matrix is number of days x number of stations, and I want an output matrix that is also number of days x number of stations, where the consecutive values are flagged as described above.
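One possible sketch for the matrix case follows. It assumes runs are counted along each row, as in the small example above (transpose the input to scan down the days of each station instead), and that isolated values get 0. The names runId, runLen and len are illustrative.

```matlab
original_matrix = [1 1 1; 2 2 3; 1 2 3];
output_matrix = zeros(size(original_matrix));
for r = 1:size(original_matrix, 1)
    row = original_matrix(r, :);
    starts = [true, diff(row) ~= 0];      % true where a new run begins
    runId = cumsum(starts);               % run label for every element
    runLen = accumarray(runId(:), 1).';   % length of each run
    len = runLen(runId);                  % run length spread over its elements
    len(len == 1) = 0;                    % isolated values are not flagged
    output_matrix(r, :) = len;
end
```

The nonzero entries of output_matrix also give the locations asked about: find(output_matrix) returns their linear indices.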

Related

How do I make a matrix with uniformly increasing elements without loop

0 0 1 1
1 1 2 2
2 2 3 3
3 3 4 4
4 4 5 5
I want to make matrix like above without for loops.
I only know how to do it with a loop.
This is my code
x = [0 0 1 1];
for i = 1:4
x= [x;x(1,:)+i]
end
Is there a vectorized way to do this, like the ':' operator, or some other way?
I want to know how to add an increasing value to each row of a matrix without a loop.
You could use bsxfun:
result = bsxfun(@plus,x,(0:4).')
In Matlab 2016b or newer you can also directly expand singleton dimensions:
result = x + (0:4).'
You can also use cumsum to cumulatively sum down the columns. So create your starting vector, with a matrix of ones underneath for the other rows.
cumsum([0 0 1 1; ones(4,4)]) % ones(n-1, 4) for result with n rows, input 4 columns
This has the advantage of being able to do other step sizes easily
cumsum([0 0 1 1; 2*ones(4,4)]) % steps of 2
Furthermore, it can handle different intervals in each column if we employ repmat
% First row is [0 0 1 1]; the repmat argument [1 2 3 4] is the interval per column
cumsum([0 0 1 1; repmat([1 2 3 4], 4, 1)]); % Again, use n-1 in place of 4
If you vertically concatenate the row vectors you want and then take the transpose you will get the required result (ie x=[0:4;0:4;1:5;1:5]' in this example).
You can use kron together with one of the methods suggested here.
kron(hankel(0:4,4:5),[1 1])

How to obtain transition probability matrix in MATLAB?

Suppose I have a sequence x= 1,3,3,1,2,1,4,2,3,1,4,2,4,4,4,3,1,2,5,1 and it has five states 1 3 2 4 5. I have to obtain transition probability matrix in MATLAB by this equation, probability= (Number of observation pairs x(t) & x(t+1), with x(t) in state i and x(t+1) in state j)/(Number of observation pairs x(t) & x(t+1), with x(t) in state i and x(t+1) in any one of the states 1......s).
I tried the following code, but it gives an error:
x=[1 3 3 1 2 1 4 2 3 1 4 2 4 4 4 3 1 2 5 1]
n = length(x)-1
p = zeros(5,5)
for t = 1:n
if x(t)=x(t+1);
a(t)=count (x(t)=x(t+1)) % Here i am trying to count how many number of times pair of that states occur in sequence.
q(t)=sum(x==x(t)) % (Here i am trying to count Number of observation pairs x(t) & x(t+1), with x(t) in state i and x(t+1) in any one of the states 1......s)
end
for i=1:5
p(i, :) = a(t)/q(t)
end
I calculated the transition probability matrix manually as follows (rows are the current state, columns are the next state):
      1     3     2     4     5
1     0    1/5   2/5   2/5    0
3    3/4   1/4    0     0     0
2    1/4   1/4    0    1/4   1/4
4     0    1/5   2/5   2/5    0
5     1     0     0     0     0
Since it has been a while, I think it is safe to provide an answer to this now. There is no toolbox required for either approach below. The approach assumes basic knowledge of a transition probability matrix of a Discrete Time Markov Chain (DTMC).
Both approaches use the unique() function to find the statespace. Note that the order is different, e.g. your [1 3 2 4 5] vs. my [1 2 3 4 5] but that isn't a limiting issue. I've separated getting the transition counts from the transition probabilities to illustrate some techniques.
Approach 1: Vectorized Approach
This approach uses the unique() and accumarray() functions.
% MATLAB 2018b
X =[1 3 3 1 2 1 4 2 3 1 4 2 4 4 4 3 1 2 5 1];
[u,~,n] = unique(X);
NumU = length(u); % Number of Unique Observations
Counts = accumarray([n(1:end-1),n(2:end)],1,[NumU,NumU]);
P = Counts./sum(Counts,2); % Probability transition matrix
Verification: You can verify that sum(sum(Counts)) == length(X)-1 and the rows of P sum to one (sum(P,2)).
Notice that the counts matrix uses a 1-step offset to count the transitions. The output is a NumU x NumU array of the number of transitions in terms of indices as given in the n-output from unique().
Approach 2: Single for loop
This is a direct approach that can use any ordering of the statespace (see below).
States = unique(X); % <--- can be anything
% e.g. try: States = [1 3 2 4 5];
Counts = zeros(length(States));
for k = 2:length(X)
    i = find(X(k-1) == States); % row index: state at step k-1
    j = find(X(k) == States);   % column index: state at step k
    Counts(i,j) = Counts(i,j) + 1;
end
P = Counts./sum(Counts,2); % Probability transition matrix
Using your statespace ordering: If you use Approach 2 with States = [1 3 2 4 5];, the resulting probability transition matrix, P, matches the one you manually calculated.
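The vectorized approach can also follow a custom state ordering. The sketch below swaps unique() for ismember() to map each observation to its position in States. It assumes every observation appears in States and every state has at least one outgoing transition (otherwise that row of P becomes NaN). The names idx and NumS are illustrative.

```matlab
X = [1 3 3 1 2 1 4 2 3 1 4 2 4 4 4 3 1 2 5 1];
States = [1 3 2 4 5];                   % desired row/column order
[~, idx] = ismember(X, States);         % position of each observation in States
idx = idx(:);                           % force a column vector
NumS = numel(States);
Counts = accumarray([idx(1:end-1), idx(2:end)], 1, [NumS, NumS]);
P = Counts ./ sum(Counts, 2);           % divide each row by its total
```

With this ordering, the first row of P is [0 1/5 2/5 2/5 0], as in the manually calculated matrix.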

MATLAB: sample from population randomly many times?

I am aware of MATLAB's datasample which allows to select k times from a certain population. Suppose population=[1,2,3,4] and I want to uniformly sample, with replacement, k=5 times from it. Then:
datasample(population,k)
ans =
1 3 2 4 1
Now, I want to repeat the above experiment N=10000 times without using a for loop. I tried doing:
datasample(repmat(population,N,1),5,2)
But the output I get is (just a short excerpt below):
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
Every row (result of an experiment) is the same! But obviously they should be different... It's as though some random seed is not updating between rows. How can I fix this? Or some other method I could use that avoids a for loop? Thanks!
You seem to be misunderstanding how datasample works. According to the documentation, when you pass a matrix, datasample samples whole slices of it along the chosen dimension: rows by default, or columns when the dimension argument is 2, as in your call. It therefore draws 5 column indices once and extracts those entire columns, so every row of the output contains the same sequence of values, which is why you are getting that "error".
As such, I wouldn't use datasample here if it is your intention to avoid looping. You can use datasample, but you'd have to loop over each call and you explicitly said that this is not what you want.
What I would recommend is to first create your population vector with whatever you desire in it, then generate a random index matrix where each value is between 1 and the number of elements in population. The number of rows in this matrix is the number of trials and the number of columns is the number of samples per trial. Once you have this matrix, simply use it to index into your vector to obtain the desired sampling matrix. To generate the random index matrix, randi is a fine choice.
Something like this comes to mind:
N = 10000; %// Number of trials
M = 5; %// Number of samples per trial
population = 1:4; %// Population vector
%// Generate random indices
ind = randi(numel(population), N, M);
%// Get the stuff
out = population(ind);
Here's the first 10 rows of the output:
>> out(1:10,:)
ans =
4 3 1 4 2
4 4 1 3 4
3 2 2 2 3
1 4 2 2 2
1 2 3 4 2
2 2 3 2 1
4 1 3 2 4
1 4 1 3 1
1 1 2 4 4
1 2 4 2 1
I think the above does what you want. Also keep in mind that the above code generalizes to any population vector you want. You simply have to change the vector and it will work as advertised.
datasample interprets each column of your data as one element of your population, sampling among all columns.
To fix this you could call datasample N times in a loop. Instead, I would use randi:
population(randi(numel(population),N,5))
Assuming your population is always 1:p, you could simplify to:
randi(p,N,5)
Both of the current answers say not to use datasample and to use randi instead. However, I have a solution for you with datasample and arrayfun.
>> population = [1 2 3 4];
>> k = 5; % Number of samples
>> n = 1000; % Number of times to execute datasample(population, k)
>> s = arrayfun(@(m) datasample(population, m), k*ones(n, 1), 'UniformOutput', false);
>> s = cell2mat(s);
s =
1 4 1 4 4
4 1 2 2 4
2 4 1 2 1
1 4 3 3 1
4 3 2 3 2
We need to use 'UniformOutput', false with arrayfun because each call returns a vector rather than a scalar. The cell2mat call is needed because the result of arrayfun is then a cell array. Only the first few rows of s are shown above.

cumsum of values for same timeunit

I have the following vectors:
A=[1 0 1 0 0 1 0 1 0 0];
B=[1 2 3 4 5 6 7 8 9 10];
In this case A represents a time vector, where the 1s signal the beginning of one time unit.
Now I want to add up all the values in B that correspond to a time unit with a length of exactly 3 steps.
In this example that means the 3rd, 4th and 5th values and the 8th, 9th and 10th values of B should be summed, because these are in time units of length 3.
B_result=[12 27];
I know cumsum() is the command for this, but I don't know how to specify that only these particular values, depending on the time indices in A, should be summed.
Can you help me?
Thanks a lot.
You can use cumsum alongside accumarray and hist:
csa = cumsum(A); %// from beginning-of-unit markers to unit indices
n = hist(csa, 1:max(csa)); %// count num of steps in each unit
B_result = accumarray( csa', B' ); %// accumulate B into different time units
B_result(n~=3) = []; %// discard all time units that do not have 3 steps
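As a variation on the same idea, the sketch below counts the steps per unit with accumarray instead of hist and returns a row vector. It assumes A(1) is 1, so that every step belongs to a unit; the names unitId, unitLen and unitSum are illustrative.

```matlab
A = [1 0 1 0 0 1 0 1 0 0];
B = [1 2 3 4 5 6 7 8 9 10];
unitId = cumsum(A);                          % time-unit label for every step
unitLen = accumarray(unitId(:), 1);          % number of steps in each unit
unitSum = accumarray(unitId(:), B(:));       % sum of B within each unit
B_result = unitSum(unitLen == 3).';          % keep only units of length 3
```

Here unitId is [1 1 2 2 2 3 3 4 4 4], so units 2 and 4 have length 3 and B_result is [12 27].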
For simpler pattern matching, you can use strfind:
loc = strfind([A,1],[1 0 0 1]); %// add the 1 at the end of A and the pattern to avoid longer intervals
idx = bsxfun(@plus,loc,(0:2)'); %'// get the indices that need to be summed
result = sum(B(idx),1); %// obtain the result
N = 3; %// We want to detect a one followed by exactly N-1 zeros. Call that
%// sequence an "interesting part"
ind = find([A 1]); %// find ones. Append a last one to detect a possible
%// interesting part at the end.
ind = ind(diff(ind)==N); %// index of beginning of interesting parts
cs = [0 cumsum(B)]; %// accumulate values; the leading 0 handles a part starting at index 1
B_result = cs(ind+N)-cs(ind); %// use index to build result
A more generic application of Jonas' Idea:
A = [1 0 1 0 0 1 0 1 0 0 0 0 1];
B = [1 2 3 4 5 6 7 8 9 10 11 12];
n = 3;
result = arrayfun(@(x) sum( B(x:x+n-1) ), strfind([A,1],num2str(10^n+1)-48))
Or use cumsum instead of sum; I was not sure what you actually want:
result = arrayfun(@(x) cumsum( B(x:x+n-1) ), ...
strfind( [A,1],num2str(10^n+1)-48 ) ,'uni',0)
%optional:
result = cell2mat(result')

Checking values of two vectors against each other and then using the column location of equal entries to extract columns from a matrix in MATLAB

I'm doing a curve fitting problem in Matlab and so far I've set up some orthonormal polynomials along a specified range of x-values with x = (0:0.0001:40);
The polynomials themselves are each a manipulation of that x vector and are stored as rows in a matrix. I also have some data entries in the form of two vectors: one for the data x-coords and one for the actual values. I need a way to use the x-coords of my data points to find the same values in my continuous x-vector, and then take the corresponding columns from my polynomial matrix and add them to a new matrix.
EDIT: To be more clear. I have, for example:
x = [0 1 2 3 4 5]
Polynomial =
1 1 1 1 1 1
0 1 2 3 4 5
0 1 4 9 16 25
% Data values:
xcoord = [1 3 4]
values = [5 3 8]
I want to check the x-coord values against 'x' to find the corresponding columns and then pull out those columns from the polynomial matrix to get:
Polynomial =
1 1 1
1 3 4
1 9 16
If x and xcoord were the same length, you could use logical indexing, which is elegant; something along the lines of Polynomial(:, x==xcoord). But since this doesn't seem to be the case, a less fancy solution is a for-loop with find(xcoord(i)==x).
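A loop-free alternative is ismember, whose second output gives the position in x of each entry of xcoord. This is a minimal sketch using the small example from the question; the names found, col and Selected are illustrative.

```matlab
x = [0 1 2 3 4 5];
Polynomial = [1 1 1 1 1 1; 0 1 2 3 4 5; 0 1 4 9 16 25];
xcoord = [1 3 4];
[found, col] = ismember(xcoord, x);      % col(i) is the column of x equal to xcoord(i)
Selected = Polynomial(:, col(found));    % keep only the matching columns
```

Note that ismember tests for exact equality. With a grid such as x = (0:0.0001:40), floating-point rounding may prevent some data coordinates from matching exactly; in that case ismembertol (available in newer MATLAB releases) may be a better fit, since it matches within a tolerance.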