plot 95% confidence interval of 20 data points in Matlab - matlab

If the number of data points is small (e.g., 20 data points), do I need to check for normality before calculating a confidence interval?
Can anyone suggest a rigorous process to plot a 95% confidence interval?
Thanks!

With so few data points, and if you don't know whether the distribution is normal, I would look into using a bootstrap confidence interval. It's a non-parametric method, so you are not assuming normality. The MATLAB function bootci implements this method.
See the documentation for bootci.
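As a rough illustration of what the bootstrap does, here is a minimal sketch of a percentile bootstrap interval for the mean, written with base functions only. The sample x, the number of resamples nboot, and the choice of the mean as the statistic are all assumptions made for the example, not values from the question.

```matlab
% Percentile bootstrap CI for the mean of a small sample (sketch).
rng(0);                                   % reproducible resampling
x = [4.1 5.3 3.8 6.0 5.5 4.7 5.1 4.9 6.2 3.9 ...
     5.8 4.4 5.0 5.6 4.2 6.5 4.8 5.2 3.7 5.4];   % hypothetical sample, n = 20
n     = numel(x);
nboot = 5000;                             % number of bootstrap resamples (assumed)
alpha = 0.05;                             % 1 - alpha = 95% confidence level

idx       = randi(n, nboot, n);           % each row indexes one resample, drawn with replacement
bootMeans = sort(mean(x(idx), 2));        % mean of every resample, sorted ascending

lo = bootMeans(max(1, floor(nboot * alpha / 2)));   % approx. 2.5th percentile
hi = bootMeans(ceil(nboot * (1 - alpha / 2)));      % approx. 97.5th percentile
ci = [lo hi];
fprintf('mean = %.3f, 95%% CI = [%.3f, %.3f]\n', mean(x), ci(1), ci(2));

% With the Statistics and Machine Learning Toolbox, a comparable call is
%   ci = bootci(nboot, {@mean, x}, 'type', 'per');
% (bootci defaults to the BCa interval when 'type' is omitted).
```

To plot the result, something like errorbar(1, mean(x), mean(x) - ci(1), ci(2) - mean(x), 'o') should draw the mean with an asymmetric 95% bar.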

Related

95% confidence interval for AUROCC?

How do I calculate the 95% confidence interval of the area under the ROC curve (AUROCC) on BlueSky Statistics? I know how to create the multivariate logistic model and show the ROC curve and AUROCC. I tried using the Bootstrap Resampling but could not figure out how to get the 95% confidence interval.
If you are familiar with R, you may want to enter the corresponding R code to perform the calculation you are looking for (see chapter 4 of the user manual for V10 for further details on using R code with BlueSky).
For example, I was able to use some R code to directly connect to MySQL as well as to open a SQLite database file.

How to convert scales to frequencies in Wavelet Transform

I'm dealing with the CWT, and I have a big problem converting scales to frequencies. In the MATLAB Wavelet Tutorial they use this expression to convert scales to frequencies
But if I use the default function scal2frq, I obtain a different result.
I don't understand the role of the Morlet Fourier Factor.
Thanks in advance
It is a pretty complicated concept, which I only somewhat understand. I'll write some points here so that you can figure it out yourself more easily.
A simple fact is that:
Scale is inversely proportional to frequency.
For example, imagine we have a 1-100 Hz range of frequencies in some time series data, such as stock market data or earthquake data. Scale is "supposed to be" the inverse of that. For instance, if the scale were in the range of 1 to 100, then taking frequency = 100 / scale we would have:
Scale (1/Hz)    Frequency (Hz)
1               100
50              2
100             1
Therefore,
The frequency is not the real frequency of those time series data (e.g., stock market, earthquake) that we know of. They are only related, inversely.
We can safely say that what we are calculating here are "pseudo-frequencies", which MATLAB approximates. You can read about the approximation process in the pseudo-frequencies section of the documentation:
MATLAB calculates those pseudo-frequencies based on the following:
In wavelet analysis, the way to relate scales to frequencies is to determine the center frequency of the wavelet function:
which you can see in this image. Of course, it differs when we change the type of wavelet function used in the calculation, so the center frequency changes with each choice in the approximation process:
That "MorletFourierFactor" is a variable holding an approximation constant, so that 1/scale, divided by that factor, closely approximates those "pseudo-frequencies".
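One plausible reading of why the two results differ is that they follow different conventions. Here is a hedged sketch of both: the pseudo-frequency relation F_a = F_c / (a * dt) documented for scal2frq, and the Fourier-factor relation used with an analytic Morlet wavelet (w0 = 6). The sampling period dt, the scale vector a, and the centre frequency Fc = 0.8125 (the value centfrq('morl') is expected to return) are assumptions for illustration.

```matlab
% Two scale-to-frequency conventions compared (sketch, no toolbox needed).
dt = 0.01;                      % sampling period in seconds (assumed, 100 Hz)
a  = [1 2 4 8 16 32];           % dimensionless scales (hypothetical)

% Convention 1: pseudo-frequency from the wavelet centre frequency
Fc      = 0.8125;               % assumed centre frequency of the real 'morl' wavelet
fPseudo = Fc ./ (a * dt);       % F_a = F_c / (a * dt)

% Convention 2: Fourier factor of the analytic Morlet wavelet with w0 = 6
w0 = 6;
MorletFourierFactor = 4*pi / (w0 + sqrt(2 + w0^2));   % about 1.033
s        = a * dt;              % scales expressed in seconds
fFourier = 1 ./ (s * MorletFourierFactor);

% Show both mappings side by side; they differ by a constant ratio
disp([a(:) fPseudo(:) fFourier(:)]);
fprintf('ratio fPseudo/fFourier = %.4f\n', Fc * MorletFourierFactor);
```

Under these assumptions the two mappings are both proportional to 1/scale and differ only by a constant factor, which would be consistent with getting "different results" from the tutorial expression and from scal2frq.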
I thought this image about shifting (time axis) and scaling (frequency axis) might be a little helpful to look into as well:
The bottom line is: don't worry about pseudo-frequencies; you probably won't need them. If you want a frequency spectrum, you can apply a frequency-domain method (such as the Fast Fourier Transform) to whatever time series data you have.
If you really want to map scales to frequencies, you can also try to design a method to approximate the mapping yourself.
Source
Harvard Seismology

FFT in Matlab in order to find signal frequency and create a graph with peaks

I have data from an accelerometer and made a graph of acceleration (y-axis) against time (x-axis). The sampling rate of the sensor is around 100 samples per second, but the samples are not equally spaced in time (for example, the timestamps go 10.046, 10.047, 10.163, etc.), so the pace is not constant. I also have no analytical function for the signal. I need to find the frequency of the signal and make a graph of frequency (Hz, x-axis) against acceleration (y-axis), but I don't know which FFT code suits my case. (Sorry for my bad English.)
Any help would be greatly appreciated.
For an FFT to work you will need to reconstruct the signal at a regular sampling interval. There are two ways you can do this:
Interpolate the data you already have to estimate where the signal would be at regular intervals. However, the resulting FFT may contain significant inaccuracies.
OR
Adjust the device reading from the accelerometer to incorporate an accurate timer, so that results are always transmitted at regular intervals. This is what I would recommend.
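The interpolation route could look roughly like the sketch below. It uses a synthetic 5 Hz sine with jittered timestamps as a stand-in for the accelerometer data; the variable names, the 5 Hz test frequency, and the choice of linear interpolation are assumptions for illustration.

```matlab
% Resample irregular samples onto a uniform grid, then take the FFT (sketch).
rng(0);
fs  = 100;                                          % target uniform rate in Hz (about 100 samples/s)
f0  = 5;                                            % frequency of the synthetic test signal (assumed)
t   = cumsum(0.01 + 0.004 * (rand(1, 1000) - 0.5)); % irregular timestamps, roughly 10 ms apart
acc = sin(2*pi*f0*t);                               % stand-in for the measured acceleration

tu   = t(1) : 1/fs : t(end);                        % uniform time grid inside the measured span
accu = interp1(t, acc, tu, 'linear');               % interpolate onto the grid
accu = accu - mean(accu);                           % remove the mean so DC does not dominate

N   = numel(accu);
Y   = fft(accu);
amp = 2 * abs(Y(1:floor(N/2)+1)) / N;               % approximate single-sided amplitude
f   = (0:floor(N/2)) * fs / N;                      % frequency axis in Hz

[~, k] = max(amp);                                  % strongest spectral peak
fprintf('dominant frequency: %.2f Hz\n', f(k));
% plot(f, amp); xlabel('Frequency (Hz)'); ylabel('Acceleration amplitude');
```

Linear interpolation acts as a mild low-pass filter and large gaps in the timestamps distort the spectrum, which is the kind of inaccuracy mentioned above.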

Accurate frequency estimation with short time series data - maximum entropy methods or Yule Walker AR method?

I am using the Lomb-Scargle code to estimate some frequencies in a short time series, which is shown in the first image. The results of the Lomb-Scargle analysis are shown in the second, and I have zoomed in on a prominent peak at about 2 cycles per day. However, this peak is smeared, so it is proving difficult to resolve the real frequency of this component. Are there any other methods, or improvements to the method I am using, that would accurately resolve the important frequency components within this short time series?
There is some information on methods for short time series here, but it's not clear whether they need the data to be regularly sampled. Ideally I am looking for a method that works with irregularly sampled data. From some research it appears that maximum entropy methods are the answer, but I am not sure whether these have been implemented in MATLAB. From this link, it appears that there is an equivalent method: 'The Yule-Walker AR method produces the same results as a maximum entropy estimator.' However, again it's not clear whether the data need to be uniformly sampled.

how to find the similarity between two curves and the score of similarity?

I have two data sets, (t,y1) and (t,y2). They look the same visually, but there is some time delay or magnitude shift between them. I want to find the similarity between the two curves, with a similarity score of 1 for approximately similar curves and 0 for dissimilar curves. Some curves seem to be different because of oscillations in the data, so I am looking for a method to measure the similarity between the curves. I already tried the gradient command in MATLAB to find the slope of each curve at every time step and compared the slopes, but that does not give me satisfactory results. Can anybody suggest a method to find the similarity between the curves?
Thanks in advance
This answer assumes your y1 and y2 are signals rather than curves. The latter I would try to parametrise with POLYFIT.
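If they really look the same, but are shifted in time (and not wrapped around), then you can:
% Normalise both signals to unit energy so that only the shape is compared
y1n = y1(:) / norm(y1);
y2n = y2(:) / norm(y2);
normratio = norm(y1) / norm(y2);   % difference in magnitude
% Cross-correlation = convolution with one signal time-reversed
c = conv(y1n, flipud(y2n));
[val, ind] = max(c);
lag = ind - numel(y2n);            % shift in samples, positive when y1 lags y2
lag will indicate the time shift and normratio the difference in magnitude. val, the peak of the normalised cross-correlation, is at most 1 and approaches 1 when the shapes match.
All of these can be used as features for your similarity metric. I assume, however, that your signals actually vary by more than just a time shift or magnitude, in which case some sort of signal parametrisation may be a better choice, followed by building a metric on those parameters.
Without knowing anything about your data, I would first try AR (assuming things as typical as FFT or PRINCOMP won't work).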
For similarity measurement of time series data, one traditional solution is DTW (Dynamic Time Warping).
Kolmogorov-Smirnov test (kstest2 function in MATLAB)
Chi-square test
To measure similarity there is a measure called MIC, the maximal information coefficient. It quantifies the information shared between two data sets or curves.
The dv and dc distance in the following paper may solve your problem.
http://bioinformatics.oxfordjournals.org/content/27/22/3135.full
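
For the DTW suggestion above, a minimal dynamic-programming sketch is given below. The two test curves, the absolute-difference local cost, and especially the mapping from distance to a score in (0, 1] are assumptions made for illustration; the score mapping is ad hoc, not a standard definition.

```matlab
% Dynamic time warping distance between two 1-D curves (sketch).
y1 = sin(linspace(0, 2*pi, 50));             % hypothetical curve
y2 = sin(linspace(0, 2*pi, 60) - 0.3);       % similar shape, shifted and resampled (hypothetical)

n = numel(y1);
m = numel(y2);
D = inf(n + 1, m + 1);                       % cumulative cost matrix, padded with Inf
D(1, 1) = 0;
for i = 1:n
    for j = 1:m
        cost = abs(y1(i) - y2(j));           % local distance between two samples
        % extend the cheapest of the three admissible predecessor paths
        D(i+1, j+1) = cost + min([D(i, j+1), D(i+1, j), D(i, j)]);
    end
end
dtwDist = D(end, end);                       % 0 means identical up to warping
score   = 1 / (1 + dtwDist / max(n, m));     % ad hoc similarity in (0, 1]; an assumption
fprintf('DTW distance = %.3f, score = %.3f\n', dtwDist, score);
```

If the Signal Processing Toolbox is available, its dtw function should give a comparable distance without the explicit loops.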