I have two time series of data, one of water temperature and the other of air temperature (hourly measurements for one year). Both measurements are taken simultaneously, so the vectors are the same size. The corrcoef command shows that they have a correlation of about 0.9.
Now I'm trying a different approach to find the correlation, and I was thinking of spectral coherence. As far as I understand, in order to do this I should find the autospectral density of each time series (i.e. of water temperature and of air temperature) and then find the correlation between them?
As I am new to signal processing I was hoping for some advice on the best ways of doing this!
I would recommend consulting this site. It contains an excellent reference for your question. If you need help with the cohere function, let me know.
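If it helps, here is a minimal sketch of estimating the magnitude-squared coherence between the two series with mscohere (the current Signal Processing Toolbox replacement for the older cohere). The variable names, window length, and sampling rate here are illustrative assumptions:
fs = 1/3600;                              % hourly samples expressed in Hz (assumed)
win = hann(512);                          % segment length chosen for illustration
noverlap = 256;                           % 50% overlap between segments
[cxy, f] = mscohere(waterTemp, airTemp, win, noverlap, [], fs);
plot(f, cxy), xlabel('Frequency (Hz)'), ylabel('Magnitude-squared coherence')
Values of cxy near 1 at a given frequency indicate that the two series are strongly linearly related at that frequency.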
I'm studying AnyLogic and I'm curious about something.
Some people explain that the arrival rate follows an exponential distribution.
I want to know how I can prove that.
Any kind guidance from you would be very helpful and much appreciated.
Thank you so much.
The arrival rate doesn't follow an exponential distribution; the number of arrivals per unit time follows a Poisson distribution, so there's nothing to prove in that regard.
What follows an exponential distribution is the inter-arrival time between agents.
To prove that this actually follows a particular distribution, you can use one of the many distribution fitting techniques out there; my favorite is the Cullen and Frey graph. You can see an answer about it here:
https://stats.stackexchange.com/questions/333495/fitting-a-probability-distribution-and-understanding-the-cullen-and-frey-graph
You can also check the wikipedia page on distribution fitting:
https://en.wikipedia.org/wiki/Probability_distribution_fitting
Keep in mind that distribution fitting is something of an art: no technique gives you the correct distribution, only (maybe) a good enough approximation of one. But in this case it should be quite easy.
You can't really prove that a distribution fits the data, though. You can only estimate the error when you compare the fitted distribution function with the actual data, and you can put a confidence interval on that... I'm not sure if that's what you want.
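As a concrete illustration of that kind of check, here is a minimal MATLAB sketch, assuming the inter-arrival times have already been collected into a vector called interArrival (the variable name and the implied 5% significance level are assumptions):
pd = fitdist(interArrival, 'Exponential');   % maximum-likelihood exponential fit
[h, p] = kstest(interArrival, 'CDF', pd);    % Kolmogorov-Smirnov goodness-of-fit test
% h == 0 means the exponential hypothesis is not rejected at the 5% level; p is the p-value
% (strictly, p is optimistic because the parameters were estimated from the same data).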
I'm not exactly sure what you mean by "prove" that it is exponential... But anyway, it is not "some people" who explain that; it is actually stated in the AnyLogic help under the "Source" topic as follows:
Rate - agents are generated at the specified arrival rate (which is equivalent to exponentially distributed interarrival time with mean = 1/rate).
What you can do is collect the interval time between arrivals and plot that distribution to see that it actually looks like an exponential distribution.
To do that:
Create a typical DES process (e.g. source, queue, delay, sink)
Set the arrival type to rate and specify for example 1 per hour
Create a variable in main called "prevTime"
Create a histogram data element called "data"
In the "On exit" of the source write the following code:
data.add(time() - prevTime);   // record the inter-arrival time since the previous agent
prevTime = time();             // remember this arrival time for the next agent
Look at the plot of the histogram and its mean.
I am trying to extract common patterns that always appear whenever a certain event occurs.
For example, patients A, B, and C all had a heart attack. Using the readings from their pulse, I want to find the common patterns that occurred before the heart attack.
In the next stage I want to do this using multiple dimensions. For example, using the readings from the patients' pulse, temperature, and blood pressure, what are the common patterns that occurred across the three dimensions, taking into consideration the time and order between the dimensions?
What is the best way to solve this problem using Neural Networks and which type of network is best?
(Just need some pointing in the right direction)
And thank you all for reading.
The described problem looks like a time series prediction problem: a basic prediction problem for a continuous or discrete phenomenon generated by some existing process. As raw data we have a sequence of samples x(t), x(t+1), x(t+2), ..., where x() denotes the output of the considered process and t is some arbitrary timepoint.
For an artificial neural network solution we will treat this as time series prediction and reorganize the raw data into new sequences. As you probably know, X denotes the matrix of input vectors used in ANN learning. For time series prediction we construct this new collection according to the following scheme.
In the most basic form, the input vector for an arbitrary timepoint t is the sequence of samples (x(t-k), x(t-k+1), ..., x(t-1), x(t)), i.e. the sample at t together with its k predecessors from timepoints t-k, t-k+1, ..., t-1. You should generate one such example for every possible timepoint t; a sketch of this construction is given below.
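A minimal MATLAB sketch of this sliding-window construction (the window length k and the variable names are illustrative; here each row of X holds k consecutive samples and the target is simply the sample that follows):
k = 24;                       % number of past samples per input vector (assumed)
x = x(:);                     % raw series as a column vector
n = numel(x) - k;             % number of training examples
X = zeros(n, k);              % input matrix, one lagged window per row
Y = zeros(n, 1);              % targets: the sample following each window
for t = 1:n
    X(t, :) = x(t : t+k-1);
    Y(t)    = x(t+k);
end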
But the key is to preprocess data so that we get the best prediction results.
Assuming your data (phenomenon) is continuous, you should consider applying some sampling technique. You could start by experimenting with some naive sampling period Δt, but there are stronger methods. See for example the Nyquist–Shannon sampling theorem, whose key idea is that the continuous x(t) can be recovered from the discrete samples x(nΔt). This is reasonable when we consider that we probably expect our ANN to do something similar.
Assuming your data is discrete... you should still try (re)sampling, as this will speed up your computations and may provide better generalization. But the key advice is: do experiments! The best architecture depends on the data, which will also have to be preprocessed correctly; a small downsampling sketch follows.
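For instance, a two-line MATLAB sketch of downsampling with a built-in anti-aliasing filter (the decimation factor is an assumption to be tuned experimentally):
r = 4;                   % keep every 4th sample (illustrative factor)
xd = decimate(x, r);     % lowpass-filters first, then downsamples, limiting aliasing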
The next thing is the network output layer. From your question it appears this will be a binary class prediction. But maybe a wider prediction vector is worth considering: how about predicting the future of the considered samples, i.e. x(t+1), x(t+2), ..., and experimenting with different horizons (lengths of the future)?
Further reading:
Somebody mentioned Python here. Here is a good tutorial on time series prediction with Keras: Victor Schmidt, Keras recurrent tutorial, Deep Learning Tutorials
This paper is good if you need some real example: Fessant, Francoise, Samy Bengio, and Daniel Collobert. "On the prediction of solar activity using different neural network models." Annales Geophysicae. Vol. 14. No. 1. 1996.
OK, here is what I need to do:
I want to do some tracking using a Kalman filter (possibly adaptive). My measurements (when they are available) are very good, with very small error from the real values. In some cases, though, the measurements jump to a value completely off from the correct position I am looking for, and then after a few frames they come back to their correct position.
The problem is that if my (non-adaptive) filter has fixed values for the measurement noise covariance (R) and state error covariance (Q) matrices, the results are not very accurate, because even for this 1% of cases I have to compromise between R and Q.
So I decided to use an adaptive Kalman filter, as they do here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.367.1747&rep=rep1&type=pdf
They estimate the measurement noise covariance matrix based on the innovation sequences.
Basically, they use a moving window of previous samples and calculate the covariance of the error between the previous measurements and the prior estimates, e.g. the 5 past measurements and the 5 prior estimates. When a faulty measurement comes under the window, the covariance increases and thus R increases as well.
But in practice R increases (though not enough), so in the next step the estimate is still good but moves slightly towards the faulty measurement. In the step after that (because the previous estimate has now moved a bit towards the measurement) R becomes smaller, with the result that the new estimate moves even closer to the measurements, and so on and so forth.
In the end, after a few frames, the estimates follow the faulty measurements. Here is a plot to better understand what I mean.
https://www.dropbox.com/s/rkv0tjcm4s54kv3/untitled.tif
Maybe what I am trying to do is completely wrong and can't be done with an adaptive Kalman filter. Maybe someone who has worked extensively with Kalman filters and has faced this problem before can help.
Any idea is welcome!
Before answering, I want to be sure I understood your problem correctly.
You have measurements; some of them are good (low measurement noise), yet others are outliers.
The problem you're having is tuning the measurement noise covariance matrix.
Practically, you tune for the good measurements.
Outlier measurements are rejected using the error covariance.
If the innovation falls outside an ellipse you define using the error covariance matrix, the measurement is rejected.
Whenever a measurement is rejected you just apply the prediction step again and wait for another measurement.
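A minimal sketch of that gating step, assuming a linear Kalman filter with state estimate x, covariance P, measurement z, measurement matrix H, and measurement noise covariance R (the 99% gate probability is an illustrative choice):
innov = z - H*x;                      % innovation: measurement minus predicted measurement
S = H*P*H' + R;                       % innovation covariance
d2 = innov' / S * innov;              % squared Mahalanobis distance of the innovation
gate = chi2inv(0.99, numel(z));       % chi-square gate for dim(z) degrees of freedom
if d2 > gate
    % outlier: skip the update, keep the prediction, wait for the next measurement
else
    K = P*H' / S;                     % Kalman gain
    x = x + K*innov;                  % measurement update
    P = (eye(size(P,1)) - K*H) * P;   % covariance update
end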
Yes the problem is exactly this.
However, I managed to solve it without needing to define any ellipse. What I was doing was correct, except that it did not work if I had a lot of (let's say fifty) consecutive outliers.
This is normal if you think about the size of the window. If it is, for example, only 10 samples and you have 20 consecutive outliers, obviously it won't work; but for 5 consecutive outliers it works perfectly. Generally, I haven't used any threshold to reject measurements as you propose ("if the innovation falls outside an ellipse"). I keep the measurements, but at the same time, when I start to get outliers, the measurement error covariance becomes very large, so the estimate is based more on the previous estimate than on the current measurement.
If I used your method, which is indeed more logical (reject the current measurement if it is an outlier, based on a threshold), I have the problem that I have to define this threshold a priori, right? Maybe I am missing something...
I'm running a PIV analysis on two consecutive images taken during an experiment to get the vector field. But I would like to know: based on what criteria do I have to choose the percentage of overlap between the two images for the cross-correlation process? 50%, 75%...? The PIVlab_GUI tool designed for MATLAB chooses a 50% overlap by default, but it allows changing it.
I just want to know the criteria by which I can tell how much overlap is best. Do the vectors become less accurate, more dependent, etc., as we increase/decrease the overlap?
My book "Fluid Mechanics Measurements" does not explain how to choose the overlap amount in the cross-correlation process, and I could not find any helpful online reference.
Any help is appreciated.
I suggest you read up on spectral estimation - which is basically equivalent to cross correlation when you segment the data and average the correlation estimates calculated from each segment (the cross correlation is the inverse Fourier transform of the cross spectrum). There's a book chapter on this stuff here, but you may want to find a more complete resource if you are unclear on the basics.
A short answer: increasing the overlap will increase the frequency resolution of the spectral estimate, and give you more segments to average over; your estimate will have a lower variance. But there are diminishing statistical returns the more you increase your overlap past 50%, while the computational complexity continues to rise (more segments = more calculations). Hence most people just choose 50% and have done with it.
It's important to note that you don't get any more information by using overlapping frames, you are simply increasing the frequency resolution (or time lag resolution, for correlation) - similar to the effect of zero-padding a signal before taking its Fourier transform - and this has statistical effects due to the way estimation of this type works.
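For intuition, here is a minimal MATLAB sketch showing that, in Welch-style estimation, the overlap is just the noverlap argument and mainly changes how many segments get averaged (the signals x and y, the sampling rate fs, and the window length are assumptions):
nfft = 256;
win  = hann(nfft);
[pxy50, f] = cpsd(x, y, win, nfft/2,   nfft, fs);   % 50% overlap
[pxy75, ~] = cpsd(x, y, win, 3*nfft/4, nfft, fs);   % 75% overlap: more segments averaged, somewhat lower variance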
I have two data sets (t,y1) and (t,y2). These data sets look visually the same, but there is some time delay or magnitude shift between them. I want to find the similarity between the two curves (giving a similarity score of 1 for approximately similar curves and 0 for dissimilar curves). Some curves seem to be different because of oscillation in the data, so I am searching for a method to find the similarity between the curves. I already tried the gradient command in MATLAB to find the slope of each curve at every time step and compared them, but it is not giving me satisfactory results. Could anybody please suggest a method to find the similarity between the curves?
Thanks in Advance
This answer assumes your y1 and y2 are signals rather than curves. The latter I would try to parametrise with POLYFIT.
If they really look the same, but are shifted in time (and not wrapped around) then you can:
y1n = y1 / norm(y1);               % normalise each signal to unit energy
y2n = y2 / norm(y2);
normratio = norm(y1) / norm(y2);   % relative magnitude of the two signals
[c, lags] = xcorr(y1n, y2n);       % cross-correlation of the normalised signals over all lags
[val, ind] = max(c);
shift = lags(ind);                 % lag (in samples) at which the signals align best
shift will indicate the time shift and normratio the difference in magnitude.
Both can be used as features for your similarity metric. I assume, however, that your signals actually vary by more than just a time shift or magnitude, in which case some sort of signal parametrisation may be a better choice, with the metric then built on those parameters.
Without knowing anything about your data I would first try with AR (assuming things as typical as FFT or PRINCOMP won't work).
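If AR parametrisation is worth trying, here is a minimal sketch using aryule from the Signal Processing Toolbox (the model order and the distance used on the coefficients are illustrative choices):
order = 8;                  % illustrative AR model order
a1 = aryule(y1, order);     % AR coefficients of the first signal
a2 = aryule(y2, order);     % AR coefficients of the second signal
d  = norm(a1 - a2);         % simple distance between the two parameter vectors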
For time series data similarity measurement, one traditional solution is DTW (dynamic time warping).
Kolmogorov–Smirnov test (kstest2 function in MATLAB)
Chi-square test
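A minimal sketch using MATLAB's dtw function (Signal Processing Toolbox, R2016a or later); mapping the distance to a score in (0, 1] as done here is just one illustrative choice:
dist = dtw(y1, y2);              % DTW distance: small for similar curves, even if time-shifted
score = 1 / (1 + dist);          % crude similarity score in (0, 1]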
To measure similarity there is also a measure called MIC (maximal information coefficient). It quantifies the information shared between two data sets or curves.
The dv and dc distance in the following paper may solve your problem.
http://bioinformatics.oxfordjournals.org/content/27/22/3135.full