Defining the EM parameters - expectation-maximization

I have a list of observations where each data point is a pair of a time expression (e.g. night, morning) and an hour on a 12-hr clock (i.e. 1, 2, ..., 12): $Y = \{\langle e_i, h_i \rangle\}_{i=1}^{N}$. I would like to estimate the distribution of hours on a 24-hr clock given a time expression (or equivalently, classify each data point as AM or PM).
I have a feeling EM would be useful here given the hidden AM/PM variable, but I'm struggling to define the parameters. In all other examples I've used EM for, something is assumed about the distribution that generated the observations (e.g. that it is a normal distribution, or document classification based on bag-of-words). But I'm not sure how to define it here.
I'd appreciate any help!

I ended up solving it as an ILP problem:
I defined a binary variable for each combination of 12-hr value and time expression (true if PM, false if AM), plus start-time and end-time variables for each expression. The constraints encode the order of the time expressions, e.g. morning ends before noon starts. The objective maximizes the number of observations that fall within the start and end times of their expression.
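For comparison with the ILP approach, here is a minimal EM-style sketch under an assumed model (not the approach used above): each expression's 24-hour time follows a roughly Gaussian distribution on the 24-hour circle, and each 12-hour reading h is either h % 12 (AM) or h % 12 + 12 (PM), with equal prior probability. One caveat matters: shifting an expression's mean by 12 hours leaves the likelihood unchanged, so EM alone cannot tell "morning" from "evening". The sketch therefore seeds each mean with a guess (the hypothetical `init_mu`), which plays the role the ordering constraints play in the ILP. The M-step uses a weighted circular mean and spread, an approximation rather than the exact maximum-likelihood update for a wrapped distribution. All function and parameter names are illustrative.

```python
import math


def circ_dist(a, b, period=24.0):
    """Signed distance from b to a on a clock of the given period, in (-period/2, period/2]."""
    d = (a - b) % period
    return d - period if d > period / 2 else d


def em_am_pm(obs, init_mu, n_iter=50, init_sigma=3.0, min_sigma=0.5):
    """EM-style sketch for the hidden AM/PM label (assumed model, see lead-in).

    obs: list of (expression, hour_12) pairs, hour_12 in 1..12.
    init_mu: dict expression -> initial guess of its mean 24-hour time
             (must contain every expression that appears in obs).
    Returns (mu, sigma, p_pm); p_pm[i] is the posterior P(PM) of obs[i].
    """
    mu = dict(init_mu)
    sigma = {e: init_sigma for e in mu}
    p_pm = [0.5] * len(obs)
    for _ in range(n_iter):
        # E-step: hour_12 maps to h % 12 (AM; 12 -> 0) or h % 12 + 12 (PM).
        # Both candidates share sigma, so Gaussian normalizing constants cancel.
        for i, (e, h) in enumerate(obs):
            g_am = math.exp(-circ_dist(h % 12, mu[e]) ** 2 / (2 * sigma[e] ** 2))
            g_pm = math.exp(-circ_dist(h % 12 + 12, mu[e]) ** 2 / (2 * sigma[e] ** 2))
            p_pm[i] = g_pm / (g_am + g_pm)
        # M-step (approximate): weighted circular mean and spread per expression
        for e in mu:
            pts = [(t, w) for (e2, h), r in zip(obs, p_pm) if e2 == e
                   for t, w in ((h % 12, 1 - r), (h % 12 + 12, r))]
            if not pts:
                continue
            s = sum(w * math.sin(2 * math.pi * t / 24) for t, w in pts)
            c = sum(w * math.cos(2 * math.pi * t / 24) for t, w in pts)
            mu[e] = (math.atan2(s, c) * 24 / (2 * math.pi)) % 24
            var = sum(w * circ_dist(t, mu[e]) ** 2 for t, w in pts) / sum(w for _, w in pts)
            sigma[e] = max(math.sqrt(var), min_sigma)  # floor avoids zero-width collapse
    return mu, sigma, p_pm
```

With seeds such as `{"morning": 8.0, "evening": 19.0, "night": 0.0}`, readings of 7 to 10 for "morning" end up AM, readings of 6 to 9 for "evening" end up PM, and "night" readings of 11 and 12 resolve to 23:00 and 00:00.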


Cyclic transformation of dates

I would like to use the day of the year in a machine learning model. Because the day of the year is not continuous (day 365 of 2019 is followed by day 1 of 2020), I am thinking of applying a cyclic (sine or cosine) transformation, following this link.
However, the transformed variable does not take unique values within a year; for example, the value 0.5 occurs twice in the same year (see figures below).
I need to be able to use the day of the year both in model training and in prediction. A sine value of 0.5 can correspond to either 31.01.2019 or 31.05.2019, so using the value 0.5 on its own can confuse the model.
Is it possible to make the model differentiate between the two values of 0.5 within the same year?
I am modelling the distribution of a species using the Maxent software. The species data are recorded daily over 20 years. I need the model to capture the signal of the day or the season without using either of them explicitly as a categorical variable.
Thanks
EDIT1
Based on furcifer's comment below: I find the incremental modelling approach not useful for my application. It solves the issue of a consistent difference between subsequent days, e.g. 30.12.2018, 31.12.2018 and 01.01.2019, but it is no different from counting the number of days from a reference day (weight = 1). Having much higher values on the same date in 2019 than in 2014 does not make ecological sense; I hope interannual changes will be captured by the daily environmental conditions used as explanatory variables. The reason I need the day in the model is to capture the seasonal trend in the distribution of a migratory species without explicitly using month or season as a categorical variable. To predict suitable habitats for today, the prediction must depend not only on today's environmental conditions but also on the day of the year.
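For reference, a sketch of the sine/cosine encoding the question describes. The usual way to remove the ambiguity is to give the model both the sine and the cosine as two features: together they identify a unique point on the circle for each day, even though either component alone repeats within a year. The function name and the 365.25-day period are assumptions for illustration.

```python
import math
from datetime import date


def cyclic_day_features(d, period=365.25):
    """Map a date to a point (sin, cos) on the unit circle by its day of the year."""
    day_of_year = d.timetuple().tm_yday  # 1..365 (366 in leap years)
    angle = 2 * math.pi * day_of_year / period
    return math.sin(angle), math.cos(angle)


# The sine alone is about 0.5 on both dates, but the cosine has opposite signs
s_jan, c_jan = cyclic_day_features(date(2019, 1, 31))
s_may, c_may = cyclic_day_features(date(2019, 5, 31))
```

The same encoding also places 31.12.2019 and 01.01.2020 next to each other on the circle, which is the continuity the transformation is meant to provide.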
This is a common problem, but I'm not sure if there is a perfect solution. One thing I would note is that there are two things that you might want to model with your date variable:
Seasonal effects
Season-independent trends and autocorrelation
For seasonal effects, the cyclic transformation is sometimes used for linear models, but I don't see the point for ML models: with enough data, you would expect a nice connection at the edges, so what's the problem? I think the posts you link to are a distraction, or at least they do not properly explain why and when a cyclic transformation is useful. I would just use dYear (the day of the year) to model the seasonal effect.
However, the discontinuity might be a problem for modelling trends / autocorrelation / variation in the time series that is not seasonal, or common between years. For that reason, I would add an absolute date to the model, so use
y = dYear + dAbsolute + otherPredictors
A well-tuned ML model should be able to do the rest, with the usual caveats, and if you have enough data.
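A minimal sketch of computing the two date predictors in that formula, assuming dYear is the day of the year and dAbsolute is the number of days since a fixed reference date. The reference date and function name are hypothetical; any fixed date before the data starts would do.

```python
from datetime import date

# Hypothetical reference date for the absolute (trend) feature
REFERENCE = date(2000, 1, 1)


def season_and_trend_features(d, reference=REFERENCE):
    """Return (dYear, dAbsolute) for a date."""
    d_year = d.timetuple().tm_yday     # day of the year, 1..366 (seasonal signal)
    d_absolute = (d - reference).days  # days elapsed since the reference (trend)
    return d_year, d_absolute
```

Across New Year, dYear jumps from 365 back to 1 while dAbsolute increases by exactly 1, which is why the two are used together: one carries the season, the other carries the continuity.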
This may not be the right choice depending on your needs, but two options come to mind.
Incremental modeling
In this case, the dates are modeled linearly, so that 12 Dec 2018 < 12 Dec 2019.
For this you just need some form of transformation function that converts dates to numeric values.
Because many dates need to be converted to a numeric representation, the first thing to ensure is that the numeric values keep the same order as the dates, as Lukas mentioned. The easiest way to do this is to give each unit a weight (weight_year > weight_month > weight_day), where each weight must exceed the largest value that all smaller units together can contribute.
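def date2num(date_time):
    # expects a 'd-m-y' string, e.g. '31-12-2018'
    d, m, y = date_time.split('-')
    # each weight must exceed what the smaller units can add (day <= 31 < 100,
    # month * 100 <= 1200 < 10000); this yields a YYYYMMDD-style number
    num = int(y)*10000 + int(m)*100 + int(d)
    return num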
Now, it's important to normalize the numeric values.
import numpy as np
# df is assumed to be a pandas DataFrame with a 'date_time' column of 'd-m-y' strings
date_features = np.array([date2num(d) for d in df['date_time']])
# min-max scaling to [0, 1]
date_features_normalized = (date_features - date_features.min()) / (date_features.max() - date_features.min())
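The ordering requirement is stricter than "the weights are ordered": with weights 10, 100 and 1000, 31 January (310 + 100) already scores higher than 1 February (10 + 200). A small check that compares the encoded numbers for consecutive calendar days makes this visible; the helper names below are hypothetical.

```python
from datetime import date, timedelta


def weighted(d, w_day, w_month, w_year):
    """Encode a date as a weighted sum of its day, month and year."""
    return d.day * w_day + d.month * w_month + d.year * w_year


def preserves_order(w_day, w_month, w_year, start=date(2018, 1, 1), n_days=800):
    """True if the encoding strictly increases over n_days consecutive days."""
    days = [start + timedelta(days=i) for i in range(n_days)]
    nums = [weighted(d, w_day, w_month, w_year) for d in days]
    return all(a < b for a, b in zip(nums, nums[1:]))
```

`preserves_order(1, 100, 10000)` holds over the two-year span, while `preserves_order(10, 100, 1000)` fails at the first month boundary.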
Using the day, month and year as separate features
Instead of treating the date as a whole, split it into its components. The motivation is that the output may be related to a specific day, month, etc. For example, the output might increase suddenly in summer (specific months) or on weekends (specific days of the week).
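A minimal sketch of splitting a date string into separate features, assuming the same 'd-m-y' layout as `date2num` with a four-digit year. The day-of-week feature is an addition for the weekend example; the function name is hypothetical.

```python
from datetime import datetime


def split_date(date_time):
    """Split a 'd-m-Y' string (e.g. '31-12-2018') into separate calendar features."""
    d = datetime.strptime(date_time, "%d-%m-%Y")
    return {
        "day": d.day,
        "month": d.month,
        "year": d.year,
        "weekday": d.weekday(),  # Monday = 0 ... Sunday = 6; captures weekend effects
    }
```

For example, `split_date("31-12-2018")` gives day 31, month 12, year 2018 and weekday 0 (a Monday).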

How can I make a temperature sweep in COMSOL?

I built a structure in COMSOL and want to subject it to a temperature cycle: start at T = 25 °C, ramp up at 100 °C/min to T = 250 °C, hold there for 30 min, then ramp down at -100 °C/min back to T = 25 °C. How can I set up this temperature sweep?
You can define a function (e.g. foo) that follows exactly your desired temperature-versus-time profile. Then, wherever you specify your temperature (whether as a boundary condition or a domain condition), you insert foo(t), t being COMSOL's reserved variable name for time.
You can do the same for other variables too, space for instance. The easiest way to define foo is through the 1D interpolation option. Unfortunately, I do not currently have a COMSOL license to check it, but I think you can simply enter the time and temperature values in the 1D interpolation table, choose a name and the interpolant style, and then use it later in the model.
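A quick way to check the profile before entering it in COMSOL is to write the piecewise function out. The Python sketch below is not COMSOL syntax; `temperature_profile` and its parameters are illustrative names. It reproduces the ramp-hold-ramp cycle from the question, and its breakpoints (0 s, 135 s, 1935 s, 2070 s) are the (time, temperature) pairs one would presumably enter in a 1D interpolation table with a linear interpolant.

```python
def temperature_profile(t_s, t_start=25.0, t_hold=250.0,
                        rate_c_per_min=100.0, hold_min=30.0):
    """Temperature in deg C at time t_s (seconds) for a ramp-hold-ramp cycle."""
    rate_c_per_s = rate_c_per_min / 60.0
    ramp_s = (t_hold - t_start) / rate_c_per_min * 60.0  # 225 / 100 * 60 = 135 s
    hold_end_s = ramp_s + hold_min * 60.0                 # 135 + 1800 = 1935 s
    cycle_end_s = hold_end_s + ramp_s                     # 1935 + 135 = 2070 s
    if t_s <= 0.0:
        return t_start
    if t_s < ramp_s:
        return t_start + rate_c_per_s * t_s                # ramp up
    if t_s < hold_end_s:
        return t_hold                                      # hold at 250 deg C
    if t_s < cycle_end_s:
        return t_hold - rate_c_per_s * (t_s - hold_end_s)  # ramp down
    return t_start                                         # back at 25 deg C


# (time, temperature) pairs at the breakpoints of the cycle
breakpoints = [(t, temperature_profile(t)) for t in (0.0, 135.0, 1935.0, 2070.0)]
```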
I am simulating magnetic fields in the time domain with moving coils. A time-dependent solver is needed for the movement, and for temperature ramps as well. I think you can use something like T=T_start+rate_of_change*t. The t variable is available with the time-dependent solver, and you can simply write the equation I mentioned. However, I think you need to use the time-dependent solver three times: one for the ramp up, a second for the constant temperature and a third for the ramp down. Set the time ranges of the three solvers so that they produce the desired temperatures:
First: t = 0 s to 135 s (225 °C / (100 °C/min) = 2.25 min = 135 s)
Second: t = 135 s to 1935 s (135 s + 30 × 60 s)
Last: t = 1935 s to 2070 s (1935 s + 135 s)
You might also need compile-solution steps to combine these three solutions. I can try to do this tomorrow and check it.
Hope that this helped a bit

Matlab change x axis tick label

I am relatively inexperienced with matlab, as I only use it occasionally. I am trying to plot a large range of values against time and I am running into some problems.
The data come from a text file with about 55,000 entries in the following format:
year month day hour minute second value
The seconds column has 6 decimal places of precision, and there are about 24 hours of data.
What I want to do is plot the values against time, which works fine. However, with the code below, the x-axis tick labels are in serial date number format, which is not very useful when looking at the figure. I want to change the labels to something more useful, such as intervals of hours, but I am not sure how to go about it.
Here is the code:
A = dlmread('data.txt',' ');
time = datenum(A(:,1),A(:,2),A(:,3),A(:,4),A(:,5),A(:,6));
scatter(time,A(:,7),1)
axis([min(time) max(time) min(A(:,7)) max(A(:,7))])
I found a solution here: matlab ticks with certain labels; however, the process there is manual, and with this much data I don't want to do it by hand. How would I automate this process? Or is there a better way to achieve what I am trying to do?
EDIT: I also found this method: http://www.mathworks.com/help/matlab/ref/datetick.html#btpnuk4-1; however, I don't want to show the actual date. I would rather show intervals of time, i.e. an hour or 30 minutes.
EDIT 2: I have found a somewhat satisfactory solution. It could still be improved upon, so I don't know if I should submit this as an answer to my own question or not, but here it is:
A = dlmread('data.txt',' ');
time = datenum(A(:,1),A(:,2),A(:,3),A(:,4),A(:,5),A(:,6));
temp= time(1);
timediff = time - temp;
scatter(timediff,A(:,7),1)
axis([min(timediff) max(timediff) min(A(:,7)) max(A(:,7))])
datetick('x', 'HH')
This takes the original time vector in serial date format and subtracts the first time from all subsequent times to get the difference. Then it uses the datetick function to convert that to hours. It isn't ideal, because instead of reaching 24 hours the labels wrap back to 00, but it's the best I have tried so far.
With reference to the other article, you will have to follow the same method, but to automate the process you'll need to build the tick positions and tick labels (the XTick and XTickLabel axes properties) as you read in the data, and then set them after you've plotted the data.
It's not difficult to do what you're trying to do, but I will need more details on how you want the ticks organized before I can say exactly which steps you'd have to follow.
MATLAB serial date numbers simply count days from January 0, 0000 (serial number 1 is January 1, 0000), so your timediff variable is really elapsed days (and fractions thereof) since the start of your experiment. If you want your x ticks to be elapsed hours, multiply timediff by 24:
scatter(timediff * 24, A(:,7), 1)
This avoids the weirdness that can arise when using datetick as well.
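The same arithmetic (elapsed days × 24 = elapsed hours) can be sketched in Python for readers without MATLAB. The sketch also generates hourly or half-hourly tick positions automatically instead of by hand. The function names are illustrative, and each row is assumed to hold the six date/time columns from the file, with fractional seconds.

```python
from datetime import datetime, timedelta


def to_datetime(row):
    """Convert (year, month, day, hour, minute, second) with fractional seconds."""
    y, mo, d, h, mi, s = row
    return datetime(int(y), int(mo), int(d), int(h), int(mi)) + timedelta(seconds=s)


def elapsed_hours(rows):
    """Hours elapsed since the first row; unlike datetick 'HH', this does not wrap at 24."""
    times = [to_datetime(r) for r in rows]
    t0 = times[0]
    return [(t - t0).total_seconds() / 3600 for t in times]


def hour_ticks(max_hours, step=1.0):
    """Tick positions from 0 to max_hours every `step` hours (0.5 gives 30 minutes)."""
    n = int(max_hours // step)
    return [i * step for i in range(n + 1)]
```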

Find time stamps in a given time interval (start/end) in a 2 column matrix

MATLAB noob here.
I have a 2-column matrix of start/end times (in seconds).
I also have a single-column matrix of time stamps. How do I find the time stamps that occur in each interval?
Not going to put any code here, as you provided none.
There are several alternatives; this is the most straightforward (though it might not be the most efficient for the computer):
Use a for loop over each row of your start/end matrix, and inside it another for loop over each element of your time-stamp matrix; use an if statement to test whether each time stamp lies between the start and end times.
If you don't know how to use for and if, type help for and help if at the MATLAB prompt to find out, or search online.
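The nested-loop approach described above can be sketched in Python (the question is about MATLAB; this is only an illustration of the logic). Interval ends are assumed to be inclusive. The second function is an alternative not mentioned in the answer: it sorts the time stamps once and finds each interval's range by binary search, which scales better for many intervals.

```python
from bisect import bisect_left, bisect_right


def stamps_in_intervals_loop(intervals, stamps):
    """For each (start, end), list the stamps with start <= t <= end (nested loops)."""
    result = []
    for start, end in intervals:
        result.append([t for t in stamps if start <= t <= end])
    return result


def stamps_in_intervals_sorted(intervals, stamps):
    """Same result via one sort plus binary search per interval."""
    s = sorted(stamps)
    return [s[bisect_left(s, start):bisect_right(s, end)] for start, end in intervals]
```

For intervals (0, 10), (5, 15), (20, 30) and stamps 3, 7, 12, 25, 40, both functions return [[3, 7], [7, 12], [25]]; a stamp can belong to more than one interval when intervals overlap.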

How to determine value from previous time step during simulation in Modelica?

How can I determine value from previous time step during simulation in Modelica?
I have the equation Q=m*c*(Ts2-Ts1-Tr), and I need to obtain the values of Ts2 and Ts1 for it.
Ts2 is the value at time step 2.
Ts1 is the value from the previous time step.
Ts is an input signal that varies over time, with a different value at each step. In my case the time step is 1 s. The other values are fixed.
Can I use the variable time in the equation?
For example:
Ts2 (start=time);
Ts (start=time-1);
Or should it be an input to this model?
regards Tymofii
This was addressed in a similar question already.
The key point is that equations describing physical behavior cannot refer to time steps. This is because there is no "timestep" in nature or the laws of physics and so the response of a system cannot depend on it.
You don't really explain why you need to do what you are doing. Are you trying to extract simulation results? Are you trying to correlate to experimental data? Or, are you just trying to solve a differential equation?
It isn't clear what you want to do. Please elaborate and we can probably give you some guidance on how to proceed in Modelica.
Update
Using values from a "previous interval" is fine. For example, if you wanted to sample your solution at regular intervals, express a "z transform" or implement a Kalman filter in Modelica, you could do each of those very easily (for example, see the 'sample' operator here). In other words, it is possible to store as many previous values as you would like.
What you cannot do is use the timestep of the continuous solver in expressing how your system behaves. The intervals you reference must be independent from any intervals that the solver is using.
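To illustrate the distinction, here is a small Python sketch (not Modelica) in which Ts1 is the value held from the previous fixed sampling instant, so Q = m*c*(Ts2 - Ts1 - Tr) depends only on the chosen 1 s sampling period, never on the steps a solver happens to take. In Modelica the same idea would typically be written with a when-equation on sample(...). The function name and the linear test signal are hypothetical.

```python
def sampled_Q(Ts, t_end, m, c, Tr, period=1.0):
    """Evaluate Q = m*c*(Ts2 - Ts1 - Tr) at the instants k*period, k = 1, 2, ...

    Ts is a function of time; Ts1 is the value held from the previous sample,
    Ts2 the value at the current sample. Returns a list of (time, Q) pairs.
    """
    results = []
    ts_prev = Ts(0.0)  # value held from the sample at t = 0
    k = 1
    while k * period <= t_end + 1e-12:
        t = k * period
        ts_now = Ts(t)
        results.append((t, m * c * (ts_now - ts_prev - Tr)))
        ts_prev = ts_now  # store for the next sampling instant
        k += 1
    return results
```

With Ts(t) = 20 + 2t, m = c = 1 and Tr = 0.5, every sample over 3 s gives Q = 1.5, regardless of how finely Ts would be resolved between samples.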