Prediction of rainfall using nonhomogeneous Hidden Markov Model


I am new to HMMs, but I have gone through a fair amount of literature. I am working on a project in which I will predict rainfall from atmospheric parameters.
I have four observable characteristics of the atmosphere (humidity, temperature, wind, sea level height) for 10 years. I also have rainfall amount data.
As far as I understand, each day is assigned a weather state on the basis of the spatial rainfall. So here is the question. Let's suppose I have data for 100 days:
Rainfall = { 1,2,3,4... 100}. If I want to generate weather states, what should I do?
Let's suppose
temperature = { 30 to 45, some kind of distribution }
humidity = { 25 to 80 }
wind = { 60 to 100 }
sea level height = { 35 to 90 }
How do I find:
P(X_0), the initial state distribution,
P(X_t | X_{t-1}), the state transition matrix,
P(Y_t | X_t), the dependence of the observation on the state?
Do I need some kind of clustering to generate the states?
I am coding it in MATLAB.
You can answer with your own example or point me to any source that explains how to implement the procedure in a program.

An HMM has a discrete number of states, so your first step will be to define your states. Once you have well-defined states, come up with a numbering scheme for your states and write a function that can accept the data for a given time period, and output the state number that corresponds to that state.
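The following is a minimal sketch of such a mapping function, written in Python rather than MATLAB. The rainfall thresholds (0.1 mm and 10 mm) and the three-state dry/light/heavy scheme are hypothetical choices made for illustration. The states could equally come from clustering the spatial rainfall pattern.

```python
# Hypothetical state scheme: the thresholds are illustrative, not from the answer.
def get_state(rain_mm, thresholds=(0.1, 10.0)):
    """Map one day's rainfall amount (mm) to a state number.

    0 = dry (< 0.1 mm), 1 = light (0.1 to < 10 mm), 2 = heavy (>= 10 mm).
    States are 0-based here; add 1 if you use them as MATLAB indices.
    """
    state = 0
    for cut in sorted(thresholds):
        if rain_mm >= cut:
            state += 1
    return state


daily_rain = [0.0, 2.5, 12.0, 0.0, 7.1]
print([get_state(r) for r in daily_rain])  # [0, 1, 2, 0, 1]
```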
Once you have a function (let's call it get_state) that maps data to a state number, you can create your state transition matrix as follows:
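T = zeros(num_states);
for day = 2:num_days
    s1 = get_state(data(day-1)); % state on the previous day
    s2 = get_state(data(day));   % state on the current day
    T(s1,s2) = T(s1,s2) + 1;     % count the transition s1 -> s2
end
The (i,j)-th element of the matrix T now gives the number of transitions from state i to state j. You can turn these counts into transition probabilities by normalizing each row. Adding 1 to every count (add-one smoothing) keeps transitions that never occur in the data from getting zero probability:
M = bsxfun(@rdivide,T+1,sum(T+1,2)); % each row of M sums to 1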
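For reference, here is the same counting and add-one normalization sketched in Python. It assumes 0-based integer state labels, such as those returned by a get_state-style function. The question also asks for P(X_0). With a single long record, one simple option is the relative frequency of each state; this is an assumption, not part of the answer above.

```python
def transition_matrix(states, num_states, pseudocount=1.0):
    """Row-normalized transition probabilities with add-one (Laplace) smoothing.

    Mirrors M = bsxfun(@rdivide, T+1, sum(T+1, 2)) for 0-based state labels.
    """
    counts = [[0.0] * num_states for _ in range(num_states)]
    for s1, s2 in zip(states, states[1:]):   # pairs of consecutive days
        counts[s1][s2] += 1
    matrix = []
    for row in counts:
        total = sum(row) + pseudocount * num_states
        matrix.append([(c + pseudocount) / total for c in row])
    return matrix


def initial_distribution(states, num_states):
    """One simple choice for P(X_0): the relative frequency of each state."""
    return [states.count(s) / len(states) for s in range(num_states)]


states = [0, 0, 1, 2, 1, 0]
M = transition_matrix(states, 3)
print(M[0])  # [0.4, 0.4, 0.2]
print(initial_distribution(states, 3))
```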
The dependence of the observation on the state is harder. You will have to decide how to turn the observed data into a probability density function or a probability mass function. Instead of combining temperature, humidity, etc. into a single observation, you can also have multiple observed distributions per state.
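One possible choice for P(Y_t | X_t) is sketched below in Python. It fits a normal distribution to each observed variable within each state and treats the variables as independent given the state. Both the Gaussian form and the independence assumption are illustrative. Skewed variables such as wind speed may need a different distribution.

```python
from statistics import NormalDist, mean, pstdev


def fit_emissions(states, observations, num_states, min_sigma=1e-6):
    """Fit one Gaussian per (state, variable) pair.

    observations: one tuple per day, e.g. (temperature, humidity, wind, sea_level).
    Returns params[s][v], a NormalDist for state s and variable v.
    """
    num_vars = len(observations[0])
    params = []
    for s in range(num_states):
        rows = [obs for st, obs in zip(states, observations) if st == s]
        if not rows:
            raise ValueError(f"state {s} never occurs in the data")
        dists = []
        for v in range(num_vars):
            values = [row[v] for row in rows]
            sigma = max(pstdev(values), min_sigma)  # avoid a zero-width density
            dists.append(NormalDist(mean(values), sigma))
        params.append(dists)
    return params


def emission_likelihood(params, state, obs):
    """P(Y_t | X_t = state), assuming the variables are independent given the state."""
    likelihood = 1.0
    for dist, value in zip(params[state], obs):
        likelihood *= dist.pdf(value)
    return likelihood


states = [0, 0, 1, 1]
observations = [(1.0, 40.0), (3.0, 60.0), (10.0, 80.0), (14.0, 90.0)]
params = fit_emissions(states, observations, 2)
print(params[0][0].mean, params[0][0].stdev)  # 2.0 1.0
```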
This is obviously not a full implementation, but hopefully it is enough to give you a starting point.

Related

AnyLogic variable for cumulative sum in system dynamics

Good morning. In a System Dynamics model created in AnyLogic, I would like to compute the cumulative sum of a flow over the previous 7 days.
My goal is to calculate the reproduction ratio of a disease: the infectious population at time t divided by the weighted cumulative sum of the infectious population over a fixed time interval. The formula is the following:
$R(t) = \frac{I(t)}{\sum_{s=1}^{7} w(s)\, I(t-s)}$
where:
I(t) = infectious population at time t --> I(t) is a flow in the model that changes a stock
I(t-s) = infectious population at time t-s
w(s) = weight given by a gamma distribution
s = lag in days, covering the previous 7 days (s = 1, ..., 7)
I have all the data, but I do not know how to calculate the sum of I(t-s).
Thanks.

You have to do this manually. Create a variable mySum of type double. Then add a cyclic event that regularly adds to it from the stock (something like mySum += myStock).
You may need an additional variable that stores the stock value from the last time you added, so that you only add what is "new" since the last cycle.
In short: use a cyclic event to approximate your integral.
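An alternative to approximating the integral is to keep the last seven daily values in a fixed-length buffer and recompute the weighted sum once per day, for example from a daily cyclic event. The sketch below is in Python rather than AnyLogic's Java, and the class and method names are hypothetical. The weights w(1), ..., w(7) are passed in rather than computed from a gamma distribution.

```python
from collections import deque


class WeightedWindow:
    """Weighted sum over the last len(weights) daily values of a flow.

    weights[0] is w(1) (yesterday), weights[1] is w(2), and so on.
    """

    def __init__(self, weights):
        self.weights = list(weights)
        self.history = deque(maxlen=len(self.weights))  # most recent value first

    def weighted_sum(self):
        return sum(w * i for w, i in zip(self.weights, self.history))

    def step(self, current):
        """Call once per day: return I(t) / sum_s w(s) I(t-s), then store I(t).

        Returns None until a full window of past values is available.
        """
        ratio = None
        if len(self.history) == len(self.weights):
            denom = self.weighted_sum()
            ratio = current / denom if denom else None
        self.history.appendleft(current)  # drops the oldest value when full
        return ratio


window = WeightedWindow([1 / 7] * 7)
for infected in [10, 10, 10, 10, 10, 10, 10]:
    window.step(infected)  # warm-up days return None
print(window.step(20))     # approximately 2.0
```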

Separate y-values depending on whether the x-value is increasing or decreasing

I am trying to analyze my data using a mix of Python and MATLAB, but I am stuck and could not find any discussion that solves my problem.
My data consists of temperature and current measurements that are recorded at the same time but with two different devices. Afterwards, these measurements are matched using the time stamp of each measurement to get the "raw data plot". However, at the same temperature the current values differ depending on whether the sample was heated up or cooled down.
Now I would like to separate the heating and cooling values of the current measurements and calculate the mean and standard deviation of all currents at one temperature, separately for cooling and heating.
So far, I first look for all the values belonging to the same temperature, regardless of whether it is a cooling or a heating cycle. That results in quite large standard deviations.
The two figures show a simple example of what my data looks like.
The first figure plots the temperature values against the number of data points and marks all values that belong to this temperature:
The second figure shows the current data with the marked values that correspond to the temperature.
The temperature is always kept constant for 180 s and then increased or decreased by 10 °C. During the 180 s, several current measurements take place, which results in several data points per temperature per cycle. The cycle is repeated several times (not shown here). To simplify the example, I used simple numbers instead of real temperature and current values; a repeated number indicates several measurements at one temperature. In reality, the current values are not completely stable but fluctuate around a certain value (which I also ignored here).
The code which does that looks like this:
Sample data:
Test_T = [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,4,4,4,4,4,3,3,3,3,3,2,2,2,2,2,1,1,1,1,1] ;
Test_I = [5,5,5,5,5,6,6,6,6,6,7,7,7,7,7,8,8,8,8,8,9,9,9,9,9,7,7,7,7,7,6,6,6,6,6,5,5,5,5,5,4,4,4,4,4] ;
Code:
Test_T_sel =Test_T;
Test_I_sel = Test_I;
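ll = 0; % lower temperature limit (exclusive)
ul = 2; % upper temperature limit (exclusive)
ID = Test_T_sel < ul & Test_T_sel > ll; % logical mask of samples with ll < T < ul
Test_x_avg = mean(Test_T_sel(ID)); % mean temperature of the selected samples
Test_y_avg = mean(Test_I_sel(ID)); % mean current of the selected samples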
figure('Position', [100 100 700 500]);
plot(Test_T_sel);
hold on;
plot(find(ID), Test_T_sel(ID), '*r');
ylabel('Temperature [°C]')
figure('Position', [900 100 700 500]);
plot(Test_I_sel);
hold on;
plot(find(ID), Test_I_sel(ID), '.r');
ylabel('Current [µA]')
Test_T contains 45 values that increase stepwise from 1 to 5 (5 values per step) and then decrease back to 1, while Test_I contains the corresponding current values. As you can see, for T = 1 °C the current is either 5 or 4. Now I would like to get one vector that contains only the values of T, and the corresponding current values, while T increases, and a second vector with the current values while T decreases.
I thought of using an if/else statement, but I do not know how to implement it. Maybe something like this could work:
if abs(T2 - T1) <= 0.2, "take the corresponding I values" (this is true when the temperature is stable and only varies by 0.2 °C)
if abs(T2 - T1) > 0.2, "ignore the I values until abs(T2 - T1) <= 0.2 again" (this indicates either a stronger variation at one temperature or a temperature change; wait until T is constant again)
But I still need to distinguish whether the temperature is generally increasing or decreasing after 5 measurements:
if T2 > T1, T is increasing (Test_T_heat) and the corresponding I values should be written to a vector Test_I_heat
if T2 < T1, T is decreasing (Test_T_cool) and the corresponding I values should be written to a vector Test_I_cool
For the example above, the result should look like this:
Test_T_heat: [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5];
Test_I_heat: [5,5,5,5,5,6,6,6,6,6,7,7,7,7,7,8,8,8,8,8,9,9,9,9,9];
Test_T_cool: [4,4,4,4,4,3,3,3,3,3,2,2,2,2,2,1,1,1,1,1] ;
Test_I_cool: [7,7,7,7,7,6,6,6,6,6,5,5,5,5,5,4,4,4,4,4] ;
How do I have to change the code to get such vectors?
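One way to implement the logic described above is sketched here in Python, which the question already mentions using. Consecutive samples are grouped into plateaus where the temperature changes by at most 0.2 °C between neighbours. Each plateau is then labelled heating or cooling by comparing its mean temperature with that of the previous plateau. The following are assumptions rather than the asker's code: the function names, the min_len parameter for discarding ramp samples, and grouping by temperature rounded to whole degrees.

```python
from statistics import mean, stdev


def split_heating_cooling(temps, currents, tol=0.2, min_len=1):
    """Split (T, I) samples into heating and cooling groups.

    A plateau is a run of samples whose temperature changes by at most `tol`
    between neighbours. Plateaus shorter than `min_len` (e.g. points recorded
    while ramping) are dropped. A plateau is heating if its mean temperature is
    above the previous plateau's; the first plateau is compared with the next.
    """
    if len(temps) != len(currents):
        raise ValueError("temps and currents must have the same length")
    plateaus = []
    for i, t in enumerate(temps):
        if plateaus and abs(t - temps[i - 1]) <= tol:
            plateaus[-1].append(i)
        else:
            plateaus.append([i])
    plateaus = [p for p in plateaus if len(p) >= min_len]
    levels = [sum(temps[i] for i in p) / len(p) for p in plateaus]

    heat_T, heat_I, cool_T, cool_I = [], [], [], []
    for k, idx in enumerate(plateaus):
        if k > 0:
            heating = levels[k] > levels[k - 1]
        elif len(plateaus) > 1:
            heating = levels[1] > levels[0]
        else:
            heating = True  # a single plateau has no direction; arbitrary choice
        out_T, out_I = (heat_T, heat_I) if heating else (cool_T, cool_I)
        out_T.extend(temps[i] for i in idx)
        out_I.extend(currents[i] for i in idx)
    return heat_T, heat_I, cool_T, cool_I


def summarize(T, I):
    """Mean and sample standard deviation of the current per temperature set point."""
    groups = {}
    for t, i in zip(T, I):
        groups.setdefault(round(t), []).append(i)  # assumption: whole-degree set points
    return {t: (mean(v), stdev(v) if len(v) > 1 else 0.0) for t, v in groups.items()}


Test_T = [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,4,4,4,4,4,3,3,3,3,3,2,2,2,2,2,1,1,1,1,1]
Test_I = [5,5,5,5,5,6,6,6,6,6,7,7,7,7,7,8,8,8,8,8,9,9,9,9,9,7,7,7,7,7,6,6,6,6,6,5,5,5,5,5,4,4,4,4,4]
heat_T, heat_I, cool_T, cool_I = split_heating_cooling(Test_T, Test_I)
print(cool_T)
print(summarize(heat_T, heat_I))
```

With real data, set min_len to somewhat fewer than the number of samples recorded per 180 s plateau. This drops ramp points that the tolerance alone does not catch.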

Finding the start point when a signal becomes periodic

I am trying to find the mean of three cycles after the signal becomes periodic and reaches steady state. The signal is not periodic at the beginning, but after some time it becomes periodic. I want the mean of the next three cycles, where each cycle has five points.
Currently I open the plot, find the point where the signal becomes periodic, and enter that point into MATLAB to get the result. The program works fine, but I have a big problem: I have 500,000 data records, and it is impossible to open each one and find the point where the signal becomes periodic. Is there any way to find the starting point without opening the plot? Each case has a different starting point where the signal becomes periodic.
This is the code I use now:
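close all; clear variables;
clc;
% Load the record into the matrix "data" here (column 1 = signal,
% column 2 = time); the loading step is not part of the original code.
prompt = 'Enter starting point? ';
N = input(prompt);
% Average of the three 5-point cycles that start at sample N
Result = (mean(data(N:N+4,1)) + mean(data(N+5:N+9,1)) + mean(data(N+10:N+14,1))) / 3;
I attached a sample of the data: column one is the signal and column two is the time.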
https://www.dropbox.com/sh/27lebrp1lwnmm3l/AABIhN1tzUSJQjjED954Yvyka?dl=0
Thank you!

%inputs: time and y (the response), both vectors of the same length
ppc = 5; % points per cycle
A = zeros(ppc,1);
for i = 1:ppc
    A(i) = mean(y(i:ppc:length(y))); % mean of every ppc-th point, starting at phase i
end
[~,b] = min(A);                       % phase that is always at the bottom of the cycle
possidx = fliplr(b:ppc:length(y));    % indices of the lowest points, latest first
lowlist = y(possidx);                 % lowest points, in the same (latest-first) order
periodstart = time(possidx(end));     % default if the threshold is never exceeded
for i = 2:length(lowlist) % start from the end of the record
    se = std(lowlist(1:i))/sqrt(i); % standard error of the i latest lowest points
    if se > 0.05 % depending on your field, you might want to lower this value
        periodstart = time(possidx(i-1)); % lowest point of the first period
        break
    end
end
What it does: the first loop finds which group of points (which phase of the cycle) is always at the bottom, so set ppc to 10 if you have 10 points per cycle. The number of points per cycle does not have to be exactly the same for each cycle; if you have many cycles, the result should still be reasonably accurate.
Then, starting from the end of the record, we add these lowest points one by one and calculate their standard error. Once it exceeds 0.05, we have left the periodic part.
I chose the standard error because I know it and it makes sense in this situation. The threshold of 0.05 is only a common default: the standard error has the units of your signal, so adjust it to your data and its noise level.
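Below, the same approach is translated into Python as a sketch that can be looped over many records without opening any plots. It uses 0-based indices and assumes a fixed number of points per cycle. It also adds a hypothetical mean_of_cycles helper for the mean of the three cycles that the question asks about. With noisy data, the threshold has to be chosen relative to the noise level.

```python
import math
import statistics


def find_period_start(y, ppc=5, threshold=0.05):
    """0-based index where the periodic part of y begins (sketch of the answer's idea)."""
    n = len(y)
    if n < 2 * ppc:
        raise ValueError("need at least two cycles of data")
    # 1) The phase with the lowest mean value is taken as the cycle minimum.
    phase_means = [statistics.mean(y[p::ppc]) for p in range(ppc)]
    b = phase_means.index(min(phase_means))
    # 2) Indices of that phase, latest first, and their values.
    idx = list(range(b, n, ppc))[::-1]
    lows = [y[i] for i in idx]
    # 3) Grow the set of low points backwards in time until the standard error jumps.
    start = idx[-1]  # default: the whole record already looks periodic
    for i in range(2, len(lows) + 1):
        se = statistics.stdev(lows[:i]) / math.sqrt(i)
        if se > threshold:
            start = idx[i - 2]  # last low point that still belonged to the periodic part
            break
    return start


def mean_of_cycles(y, start, ppc=5, cycles=3):
    """Mean of `cycles` full cycles beginning at index `start`."""
    segment = y[start:start + cycles * ppc]
    if len(segment) < cycles * ppc:
        raise ValueError("not enough data after the start point")
    return statistics.mean(segment)


pattern = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [v + 5.0 for v in pattern] + [v + 2.0 for v in pattern] + pattern * 8
start = find_period_start(y)
print(start)                     # 10
print(mean_of_cycles(y, start))  # 2.0
```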

MATLAB Simple - Linear Predictive Coding and Energy Forecasting

I have a dataset with 274 samples (9 months) of the daily energy (watt-hours) used in a residential household. I'm not sure if I'm applying the lpc function correctly.
My code is the following:
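filename = '9-months.csv';
energy = csvread(filename);
C = zeros(274-31+1,1); % one error per evaluated sample (n = 274 down to 31)
counter = 0;
N = 3;                 % LPC order
for n = 274:-1:31
    w2 = energy(1:n-1,1);   % history before sample n
    a = lpc(w2,N);          % coefficients estimated from the history only
    energy_estimated = 0;
    for X = 1:N
        energy_estimated = energy_estimated + (-a(X+1)*energy(n-X)); % linear prediction
    end
    w_real = energy(n);
    error2 = abs(w_real-energy_estimated);
    counter = counter+1;
    C(counter,1) = error2;
end
mean_error = round(mean(C));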
mean_error=round(mean(C));
For each sample n under analysis, I use the energy values from 1 to n-1 to calculate the LPC coefficients (with N = 3).
I then apply the calculated coefficients in the inner for loop to calculate the estimated energy.
Finally, error2 holds the absolute error between the real energy and the estimated value.
The example in the documentation ( http://www.mathworks.com/help/signal/ref/lpc.html ) uses some filters. Do I need to apply any filter? Is my methodology correct?
Thank you very much in advance!

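The lpc function seems to be used correctly, but there are a few other things about your code. I am addressing the part starting at "for n":
estimates = zeros(274,1); % preallocate one estimate per sample
for n = 31:274 % it seems more logical to me to go forward in time
    w2 = energy(1:n-1,1);   % history before sample n
    a = lpc(w2,N);
    % filter over energy(1:n): because the first coefficient is 0, the last
    % output uses only energy(n-1), ..., energy(n-N), i.e. it predicts sample n
    energy_estimate = filter([0 -a(2:end)],1,energy(1:n,1));
    estimates(n) = energy_estimate(end);
end
err = energy(31:274,1)-estimates(31:274); % "err" avoids shadowing the built-in error()
meanerror = mean(abs(err)); % mean absolute error, as in your code (you don't really round mean errors)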
filter does exactly what you are doing with the X = 1:N loop, but it performs the calculation for the entire input vector. If you only want the last value, take (end) as well.
There is also no need to compute the error for every single value inside the loop and append it to a vector; you can compute all the errors at once after the loop, which is faster.
If you're trying to estimate future values with LPC, it could work like this, but you are assuming that every value depends only on the previous 3 values. Have you tried something like a polynomial approach? I would think that this is closer to reality.
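To show what lpc and the one-step prediction compute, here is a from-scratch Python sketch of the autocorrelation method (biased autocorrelation plus the Levinson-Durbin recursion). It uses the sign convention of the MATLAB code above, in which the prediction is minus the weighted sum of the previous samples. It is not MATLAB's implementation, and the function names are hypothetical. backtest_mae mirrors the forward loop from sample 31 onward and reports the mean absolute error.

```python
def lpc(x, order):
    """Coefficients [1, a1, ..., a_order] by the autocorrelation method.

    Convention: x_hat[n] = -(a1*x[n-1] + ... + a_order*x[n-order]).
    """
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    if r[0] == 0:
        raise ValueError("signal is all zeros")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k              # remaining prediction error power
        if err <= 0:
            break                       # perfectly predictable; stop early
    return a, err


def predict_next(history, a):
    """One-step prediction of the sample that follows `history`."""
    order = len(a) - 1
    return -sum(a[k] * history[-k] for k in range(1, order + 1))


def backtest_mae(energy, order=3, first=30):
    """Mean absolute one-step error for energy[first:], refitting before each sample."""
    errors = []
    for n in range(first, len(energy)):
        history = energy[:n]            # only data before sample n
        a, _ = lpc(history, order)
        errors.append(abs(energy[n] - predict_next(history, a)))
    return sum(errors) / len(errors)


a, _ = lpc([1.0, 2.0, 3.0], 2)
print(a)  # approximately [1.0, -0.6667, 0.1667]
```

With the CSV loaded into a list of floats (for example with the csv module), backtest_mae(energy, order=3, first=30) corresponds to the MATLAB loop over n = 31:274.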

Regarding the time scale issue in NetLogo

I am a new NetLogo user. I have a system of reactions (converted to ordinary differential equations) that can be solved in MATLAB. I want to develop the same model in NetLogo for comparison with the MATLAB results. I am confused about time and ticks, because NetLogo uses "ticks" as time increments, whereas MATLAB uses time in seconds. How do I convert my MATLAB seconds into a number of ticks? Can anyone help me write the code? The model is:
A + B ---> C (with rate constant k1 = 1e-6)
2A + C ---> D (with rate constant k2 = 3e-7)
A + E ---> F (with rate constant k3 = 2e-5)
Initial values are A = B = C = 500, D = E = F = 10
Initial time t=0 sec and final time t=6 sec

I have a general comment first: NetLogo is intended for agent-based modelling (ABM). An ABM has multiple entities with different characteristics that interact in some way, and ABM is not really an appropriate methodology for solving ODEs. If your goal is simply to build your model in something other than MATLAB for comparison, rather than specifically requiring NetLogo, I can recommend Vensim as more appropriate. Having said that, you can build the model you want in NetLogo; it is just very awkward.
NetLogo handles time discretely rather than continuously. You can have any number of ticks per second (I would suggest 10, so the final time of 6 s becomes 60 ticks). You will need to convert your equations into discrete form, so each rate constant becomes a per-tick value, e.g. k1_discrete = k1 / 10. You may have precision problems with very small numbers.
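Here is a sketch of the discrete-time version in Python, mirroring what a NetLogo go procedure would do once per tick at 10 ticks per second. It assumes mass-action kinetics, because the question gives rate constants but not the rate laws. It uses a forward-Euler update, in which multiplying each rate by the tick length dt = 0.1 s is the k/10 scaling mentioned above.

```python
def simulate(k1=1e-6, k2=3e-7, k3=2e-5, dt=0.1, t_final=6.0):
    """Forward-Euler integration with a fixed step dt; each step is one tick."""
    A = B = C = 500.0
    D = E = F = 10.0
    ticks = round(t_final / dt)      # 60 ticks for dt = 0.1 s
    for _ in range(ticks):
        # Mass-action rates (assumption: elementary reactions)
        r1 = k1 * A * B              # A + B  -> C
        r2 = k2 * A * A * C          # 2A + C -> D
        r3 = k3 * A * E              # A + E  -> F
        # All updates use the amounts from the start of the tick.
        A += (-r1 - 2.0 * r2 - r3) * dt
        B += -r1 * dt
        C += (r1 - r2) * dt
        D += r2 * dt
        E += -r3 * dt
        F += r3 * dt
    return ticks, {"A": A, "B": B, "C": C, "D": D, "E": E, "F": F}


ticks, final = simulate()
print(ticks)  # 60
print(final)
```

To see how much error the discretization introduces, compare final with the MATLAB ODE solution at t = 6 s, then rerun with a smaller dt (more ticks per second).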
