How to obtain transition probability matrix in MATLAB?

Suppose I have a sequence x = 1,3,3,1,2,1,4,2,3,1,4,2,4,4,4,3,1,2,5,1 with five states 1 3 2 4 5. I have to obtain the transition probability matrix in MATLAB using this equation: probability = (number of observation pairs x(t), x(t+1) with x(t) in state i and x(t+1) in state j) / (number of observation pairs x(t), x(t+1) with x(t) in state i and x(t+1) in any one of the states 1...s).
I tried this code, but it gives an error:
x=[1 3 3 1 2 1 4 2 3 1 4 2 4 4 4 3 1 2 5 1]
n = length(x)-1
p = zeros(5,5)
for t = 1:n
if x(t)=x(t+1);
a(t)=count (x(t)=x(t+1)) % Here i am trying to count how many number of times pair of that states occur in sequence.
q(t)=sum(x==x(t)) % (Here i am trying to count Number of observation pairs x(t) & x(t+1), with x(t) in state i and x(t+1) in any one of the states 1......s)
end
for i=1:5
p(i, :) = a(t)/q(t)
end
The transition probability matrix I calculated manually is as follows (rows: state at t, columns: state at t+1):
      1     3     2     4     5
1     0    1/5   2/5   2/5    0
3    3/4   1/4    0     0     0
2    1/4   1/4    0    1/4   1/4
4     0    1/5   2/5   2/5    0
5     1     0     0     0     0

Since it has been a while, I think it is safe to provide an answer to this now. No toolbox is required for either approach below. Both approaches assume basic knowledge of the transition probability matrix of a Discrete Time Markov Chain (DTMC).
Both approaches use the unique() function to find the statespace. Note that the order is different, e.g. your [1 3 2 4 5] vs. my [1 2 3 4 5] but that isn't a limiting issue. I've separated getting the transition counts from the transition probabilities to illustrate some techniques.
Approach 1: Vectorized Approach
This approach uses the unique() and accumarray() functions.
% MATLAB 2018b
X =[1 3 3 1 2 1 4 2 3 1 4 2 4 4 4 3 1 2 5 1];
[u,~,n] = unique(X);
NumU = length(u); % Number of Unique Observations
Counts = accumarray([n(1:end-1),n(2:end)],1,[NumU,NumU]);
P = Counts./sum(Counts,2); % Probability transition matrix
Verification: You can verify that sum(sum(Counts)) == length(X)-1 and the rows of P sum to one (sum(P,2)).
Notice that the counts matrix uses a 1-step offset to count the transitions. The output is a NumU x NumU array of the number of transitions in terms of indices as given in the n-output from unique().
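The verification above can be written out as a short script. This is only a sketch: it repeats Approach 1 so that it runs on its own, and the names totalOK and rowsOK are mine, not part of the original answer. The line n = n(:) is an added precaution so that the index vector is a column regardless of how unique() returns it.

```matlab
X = [1 3 3 1 2 1 4 2 3 1 4 2 4 4 4 3 1 2 5 1];
[u,~,n] = unique(X);
n = n(:);                        % force a column vector of state indices
NumU = numel(u);
Counts = accumarray([n(1:end-1), n(2:end)], 1, [NumU, NumU]);
P = Counts ./ sum(Counts, 2);    % implicit expansion (R2016b or later)

% Every one of the length(X)-1 transitions is counted exactly once.
totalOK = sum(Counts(:)) == numel(X) - 1;
% Each row of P sums to one, up to floating-point rounding.
rowsOK = all(abs(sum(P, 2) - 1) < 1e-12);
```

Note that a state which only appears as the very last observation would have an all-zero row in Counts, and the division would then give NaN for that row; this does not happen for the sequence in the question.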
Approach 2: Single for loop
This is a direct approach that can use any ordering of the statespace (see below).
States = unique(X); % <--- can be anything
% e.g. try: States = [1 3 2 4 5];
Counts = zeros(length(States));
for k = 2:length(X)
Counts(find(X(k-1) == States),find(X(k) == States)) = ...
Counts(find(X(k-1) == States),find(X(k) == States)) + 1;
end
P = Counts./sum(Counts,2); % Probability transition matrix
Using your statespace ordering: If you use Approach 2 with States = [1 3 2 4 5];, the resulting probability transition matrix, P, matches the one you manually calculated.
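If you prefer to keep the vectorized Approach 1, the result can also be reordered afterwards. The following is a minimal sketch, assuming the desired ordering [1 3 2 4 5] from the question; the names order, idx and Pq are illustrative only.

```matlab
X = [1 3 3 1 2 1 4 2 3 1 4 2 4 4 4 3 1 2 5 1];
[u,~,n] = unique(X);
n = n(:);                          % force a column vector of state indices
NumU = numel(u);
Counts = accumarray([n(1:end-1), n(2:end)], 1, [NumU, NumU]);
P = Counts ./ sum(Counts, 2);      % rows and columns ordered as in u

order = [1 3 2 4 5];               % desired state ordering (from the question)
[~, idx] = ismember(order, u);     % position of each desired state within u
Pq = P(idx, idx);                  % permute rows and columns together
```

Rows and columns must be permuted with the same index vector, otherwise the entry (i,j) no longer refers to the transition from state order(i) to state order(j).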

Related

MATLAB: sample from population randomly many times?

I am aware of MATLAB's datasample which allows to select k times from a certain population. Suppose population=[1,2,3,4] and I want to uniformly sample, with replacement, k=5 times from it. Then:
datasample(population,k)
ans =
1 3 2 4 1
Now, I want to repeat the above experiment N=10000 times without using a for loop. I tried doing:
datasample(repmat(population,N,1),5,2)
But the output I get is (just a short excerpt below):
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
Every row (result of an experiment) is the same! But obviously they should be different... It's as though some random seed is not updating between rows. How can I fix this? Or some other method I could use that avoids a for loop? Thanks!

You seem to be misunderstanding how datasample works. According to the documentation, when you pass a matrix, datasample samples whole rows of it, or whole columns when you set the third argument (the dimension) to 2, as you did. Because repmat makes every row of the matrix identical, each column is constant. Your call therefore draws 5 columns once, and the same 5 draws appear in every row, which is why you are getting that "error".
As such, I wouldn't use datasample here if it is your intention to avoid looping. You can use datasample, but you'd have to loop over each call and you explicitly said that this is not what you want.
What I would recommend is to first create your population vector with whatever you want in it, then generate a random index matrix in which each value lies between 1 and the number of elements in population. The matrix has one row per trial and one column per sample. Once you have it, simply use it to index into your vector to obtain the desired sampling matrix. randi is a fine choice for generating the index matrix.
Something like this comes to mind:
N = 10000; %// Number of trials
M = 5; %// Number of samples per trial
population = 1:4; %// Population vector
%// Generate random indices
ind = randi(numel(population), N, M);
%// Get the stuff
out = population(ind);
Here's the first 10 rows of the output:
>> out(1:10,:)
ans =
4 3 1 4 2
4 4 1 3 4
3 2 2 2 3
1 4 2 2 2
1 2 3 4 2
2 2 3 2 1
4 1 3 2 4
1 4 1 3 1
1 1 2 4 4
1 2 4 2 1
I think the above does what you want. Also keep in mind that the above code generalizes to any population vector you want. You simply have to change the vector and it will work as advertised.
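As a quick sanity check, one can confirm that the rows really differ and that each population value shows up about equally often. This is only a sketch: the fixed seed is there purely to make the check repeatable, and the names rowsDiffer and freq are mine.

```matlab
rng(0);                                  % fixed seed, only for repeatability
N = 10000;                               % number of trials
M = 5;                                   % number of samples per trial
population = 1:4;
ind = randi(numel(population), N, M);    % N-by-M matrix of random indices
out = population(ind);                   % N-by-M matrix of samples

% More than one distinct row means the trials are not all identical.
rowsDiffer = size(unique(out, 'rows'), 1) > 1;
% Empirical frequency of each value; each should be close to 0.25.
freq = histcounts(out(:), 0.5:1:4.5) / numel(out);
```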

datasample interprets each column of your data as one element of your population, sampling among all columns.
To fix this you could call datasample N times in a loop. Instead, I would use randi:
population(randi(numel(population),N,5))
Assuming your population is always 1:p, you could simplify to:
randi(p,N,5)

OK, so both of the current answers say not to use datasample and to use randi instead. However, I have a solution for you with datasample and arrayfun.
>> population = [1 2 3 4];
>> k = 5; % Number of samples
>> n = 1000; % Number of times to execute datasample(population, k)
>> s = arrayfun(@(k) datasample(population, k), k*ones(n, 1), 'UniformOutput', false);
>> s = cell2mat(s);
s =
1 4 1 4 4
4 1 2 2 4
2 4 1 2 1
1 4 3 3 1
4 3 2 3 2
We need to use 'UniformOutput', false with arrayfun because each call returns a vector rather than a scalar. The cell2mat call is needed because the result of arrayfun is then a cell array.

Identify adjacent superpixels iteratively

Let A be:
1 1 1 1 1 1
1 2 2 3 3 3
4 4 2 2 3 4
4 4 4 4 4 4
4 4 5 5 6 6
5 5 5 5 5 6
I need to identify a particular superpixel's adjacent pixels,
e.g.
The 1st adjacency of 2 is 1, 3, 4
The 2nd adjacency of 2 is 5, 6
The 3rd adjacency of 2 is ...
What is the FASTEST way to do it?

Assume you have a function adj(value), that has the code from your previous question.
Side note: you would probably like that adj() function not to return the value of the pixel you are analyzing. You can arrange that easily.
You could do:
img=[your stuff];
imgaux=img;
ii=1;
val=2; %whatever value you want
while numel(unique(imgaux))>1 % Stop if the whole image is a single superpixel
adjacent{ii}=adj(val);
% expand the superpixel to the ii order of adjacency
for jj=1:size(adjacent{ii},1)
imgaux(imgaux==adjacent{ii}(jj))=val;
end
ii=ii+1;
end
Now size(adjacent,2) will be the number of adjacency levels for that superpixel.
I am guessing this code can be optimized; I welcome any attempt at it!

Following Dan's suggestion in the comments, here is a possible implementation:
% Parameters
pixVal = 2;
adj = {};
prevMask = A == pixVal;
for ii = 1:length(A)
currMask = imdilate(prevMask, ones(2 * ii + 1));
adj{ii} = setdiff(unique(A(currMask & ~prevMask))', [adj{:}]);
if isempty(adj{ii})
break
end
prevMask = currMask;
end
Where pixVal is the pixel you want to look at.
Result:
>> adj{:}
ans =
1 3 4
ans =
5 6
ans =
Empty matrix: 1-by-0

Here's another approach reusing the code from your previous question:
%// create adjacency matrix
%// Includes code from @LuisMendo's answer
% // Data:
A = [ 1 1 1 1 1 1
1 2 2 3 3 3
4 4 2 2 3 4
4 4 4 4 4 4
4 4 5 5 6 6
5 5 5 5 5 6 ];
adj = [0 1 0; 1 0 1; 0 1 0]; %// define adjacency. [1 1 1;1 0 1;1 1 1] to include diagonals
nodes=unique(A);
J=zeros(numel(nodes));
for value=nodes.'
mask = conv2(double(A==value), adj, 'same')>0; %// from Luis' code
result = unique(A(mask)); %// from Luis' code
J(value,result)=1;
J(value,value)=0;
end
J is now the adjacency matrix for your matrix A and this becomes a graph problem. From here you would use the appropriate algorithm to find the shortest path. Path length of 1 is your "1st adjacency", path length of 2 is "2nd adjacency" and so on.
Dijkstra to find shortest path from a single node
Floyd-Warshall to find shortest paths from all the nodes
Breadth-first search for a single node, plus you can generate a handy tree
Update
I decided to play around with a custom Breadth-First Traversal to use in this case, and it's a good thing I did. It exposed some glaring errors in my pseudocode, which has been corrected above with working Matlab code.
Using your sample data, the code above generates the following adjacency matrix:
J =
0 1 1 1 0 0
1 0 1 1 0 0
1 1 0 1 0 0
1 1 1 0 1 1
0 0 0 1 0 1
0 0 0 1 1 0
We can then perform a breadth-first traversal of the graph, putting each level of the breadth-first tree in a row of a cell array so that D{1} lists the nodes that have a distance of 1, D{2} has a distance of 2, etc.
function D = BFD(A, s)
%// BFD - Breadth-First Depth
%// Find the depth of all nodes connected to node s
%// in graph A (represented by an adjacency matrix)
A=logical(A); %// all distances are 1
r=A(s,:); %// newly visited nodes at the current depth
v=r; %// previously visited nodes
v(s)=1; %// we've visited the start node
D={}; %// returned Depth list
while any(r)
D{end+1,1}=find(r); %// store this level as a new cell
r=any(A(r,:),1)&~v; %// any(...,1) also works when r selects a single row
v=r|v;
end
end
For a start node of 2, the output is:
>> D=BFD(J,2)
D =
{
[1,1] =
1 3 4
[2,1] =
5 6
}
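For comparison, here is a minimal sketch that gets the same levels from MATLAB's built-in graph and distances functions (available from R2015b, as far as I know) instead of a custom traversal. It assumes J is the symmetric adjacency matrix shown above; the names G, d, maxLevel and levels are mine.

```matlab
J = [0 1 1 1 0 0
     1 0 1 1 0 0
     1 1 0 1 0 0
     1 1 1 0 1 1
     0 0 0 1 0 1
     0 0 0 1 1 0];
G = graph(J);                      % undirected graph from the adjacency matrix
d = distances(G, 2);               % hop count from node 2 to every node
maxLevel = max(d(isfinite(d)));    % ignore unreachable nodes (distance Inf)
% levels{k} lists the nodes whose shortest path from node 2 has length k.
levels = arrayfun(@(k) find(d == k), 1:maxLevel, 'UniformOutput', false);
```

Here levels{1} corresponds to the "1st adjacency" and levels{2} to the "2nd adjacency" of superpixel 2.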

put random position set of number with conditional repeat

I have this set of numbers a=[1 2 3]. I want to place these numbers randomly into a 7 x 1 matrix, where the number 1 must appear 2 times, the number 2 must appear 3 times and the number 3 must appear 2 times.
The order does not matter. The answer would look like:
b=[1 2 2 2 1 3 3]'

Try randperm:
a=[1 2 3];
samps = [1 1 2 2 2 3 3]; % specify your desired repeats
samps = samps(randperm(numel(samps))); % shuffle them
b = a(samps)
Or, instead of specifying samps explicitly, you can specify the number of repetitions for each element of a and use arrayfun to compute samps:
reps = [2 3 2];
sampC = arrayfun(@(x,y)x*ones(1,y),1:numel(a),reps,'uni',0); % indices into a, so that a(samps) works for any a
samps = [sampC{:}];
samps = samps(randperm(numel(samps))); % shuffle them
b = a(samps)
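On releases that have repelem (R2015a or later, as far as I know), the repeated values can be built in one call. A minimal sketch, with pool as an illustrative name:

```matlab
a = [1 2 3];
reps = [2 3 2];                        % how many times each element of a occurs
pool = repelem(a, reps);               % [1 1 2 2 2 3 3]
b = pool(randperm(numel(pool))).';     % shuffle, then transpose to 7-by-1
```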

%how often each value should occur
quantity=[2,3,2]
%values
a=[1,2,3]
l=[]
%get list of all values
for idx=1:numel(a)
l=[l,ones(1,quantity(idx))*a(idx)]
end
%shuffle l
l=l(randperm(numel(l)))

check the neighbours of a cell in a vector Matlab

I have two vectors
K=[1 1 1 2 1 2 1 4 2 10 4 5 1]
and
L=[2 0 1 2 1 2 1 3 2 0 1 2 1]
I want to compare the value of the 7th element in each vector with the neighbours of this value, where the neighbours are 5 elements next to this element in each side. So for K, the 7th element is 1 and the neighbours are 1 1 1 2 1 2 (left neighbours) and 4 2 10 4 5 1 (right neighbours).
For L, the 7th element is 1 and the neighbours are 2 0 1 2 1 2 (left neighbours) and 3 2 0 1 2 1 (right neighbours). If the difference between the 7th value and each of its neighbours is above a certain threshold then I'll do something e.g X=1, if not then I'll do another thing e.g X=2.
So in my example I'll set the threshold to 3. For K the 7th element is 1, and the difference between it and two of its neighbours (10 and 5) is more than the threshold value 3, so X will be 1. For L the 7th element is 1, and the difference between it and all of its neighbours is less than the threshold value 3, so X will be 2. Could anyone help me implement this condition? I'm not sure if it can be done without loops to save time.

You can check this condition using any and or:
N = 7; % reference index (the 7th element, as in the question)
T = 3; % threshold
V = L; % vector to test
% V = K;
% check whether any element before or after index N
% differs from V(N) by more than the threshold T;
% abs() makes the check independent of the sign
% of the difference
% it does not matter on which side of index N the
% threshold is exceeded, so the two checks are
% combined with a logical OR, expressed as ||
if any(abs(V(1:N-1)-V(N))>T) || any(abs(V(N+1:end)-V(N))>T)
X = 1;
else
X = 2;
end
Note
V(N) is a scalar, so V(1:N-1)-V(N) relies only on scalar expansion and the dimensions always agree. If you prefer to spell the expansion out explicitly, use: V(1:N-1)-ones(size(V(1:N-1))).*V(N)
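If only a limited number of neighbours on each side should be compared, the index range can be clamped to a window. This is a sketch under assumptions: W is a hypothetical half-width that I set to 6 so that it covers the neighbours listed in the question, and the names idx, XK and XL are mine.

```matlab
K = [1 1 1 2 1 2 1 4 2 10 4 5 1];
L = [2 0 1 2 1 2 1 3 2 0 1 2 1];
N = 7;      % reference index
W = 6;      % neighbours to check on each side (assumed value)
T = 3;      % threshold

% Neighbour indices, clamped so they never leave the vector.
idx = [max(1, N-W):N-1, N+1:min(numel(K), N+W)];

% X is 1 if any neighbour differs by more than T, and 2 otherwise.
XK = 2 - any(abs(K(idx) - K(N)) > T);
XL = 2 - any(abs(L(idx) - L(N)) > T);
```

For the vectors in the question this gives XK = 1 (because of the neighbours 10 and 5) and XL = 2.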

Series of consecutive numbers (different lengths)

I would appreciate if someone showed me an easy way to do this. Let's say I have a vector in MATLAB like
d = [3 2 4 2 2 2 3 5 1 1 2 1 2 2 2 2 2 9 2]
I want to find the series of consecutive number "twos" and the lengths of those series.
Number twos can easily be found by x=find(d==2). But what I want is to get a vector which contains the lengths of all series of consecutive number twos, which means that my result in this case would be a vector like this:
[1 3 1 5 1].
Anyone who could help me?

This seems to work:
q = diff([0 d 0] == 2);
v = find(q == -1) - find(q == 1);
gives
v =
1 3 1 5 1
for me
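The same trick also gives the positions of the runs, since q is +1 where a run of twos starts and -1 just after it ends. A minimal sketch, with starts and stops as illustrative names:

```matlab
d = [3 2 4 2 2 2 3 5 1 1 2 1 2 2 2 2 2 9 2];
q = diff([0 d 0] == 2);        % +1 at a run start, -1 one past a run end
starts = find(q == 1);         % index in d where each run of twos begins
stops  = find(q == -1) - 1;    % index in d where each run of twos ends
v = stops - starts + 1;        % run lengths, same as in the answer above
```

The zero padding on both sides ensures that runs touching the first or last element of d are still detected.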

This is called run length encoding. There is a good m-file available for it at http://www.mathworks.com/matlabcentral/fileexchange/4955-rle-deencoding . In the single timing run below, it was slightly faster than the previously posted diff/find approach.
tic
d_rle = rle(d==2);
d_rle{2}(d_rle{1}==1);
toc
Elapsed time is 0.002632 seconds.
tic
q = [0 diff([0 d 0] == 2)];
find(q == -1) - find(q == 1);
toc
Elapsed time is 0.003061 seconds.

What if we want the indices of the original matrix where the consecutive values are located? Further, what if we want a matrix of the same size as the original matrix, where the number of consecutive values are stored in the indices of the consecutive values? For example:
original_matrix = [1 1 1;2 2 3; 1 2 3];
output_matrix = [3 3 3;2 2 0;0 0 0];
This problem has relevance for meteorological data quality control. For example, if I have a matrix of temperature data from a number of sensors, and I want to know what days had constant consecutive values, and how many days were constant, so I can then flag the data as possibly faulty.
The temperature matrix is number of days x number of stations, and I want an output matrix that is also number of days x number of stations, where the consecutive values are flagged as described above.
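One possible way to build such a matrix is sketched below. It assumes that runs are counted along each row, as in the small example, and that values with no equal neighbour get 0; the names M, out, newRun, runId and runLen are mine.

```matlab
M = [1 1 1; 2 2 3; 1 2 3];
out = zeros(size(M));
for r = 1:size(M, 1)
    row = M(r, :);
    newRun = [true, diff(row) ~= 0];       % true where a new run starts
    runId = cumsum(newRun);                % run number of each element
    runLen = accumarray(runId(:), 1).';    % length of each run
    len = runLen(runId);                   % run length at each position
    len(len < 2) = 0;                      % values without a repeat get 0
    out(r, :) = len;
end
```

For a days x stations matrix where consecutive days run down each column, the same loop could be applied to the transposed matrix and the result transposed back. With real-valued sensor data, exact equality is only meaningful if repeated readings are stored as identical numbers.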
