Calculate confidence interval for 1-dimensional random data

I want to know how to calculate a given confidence interval for 1-dimensional random data (which should simplify the question).
The situation is this: suppose I have 50 random data points between 1 and 100 (not normally distributed), and I want to find the narrowest range that contains 90% of the data points.
This is quite similar to a confidence interval for a normal distribution.
Can anyone help me with this?
Thanks.

I believe this question would be better suited to the Mathematics Stack Exchange, but since it is here I'm gonna roll with it.
You don't specify which programming language you want to use, so I believe your problem lies elsewhere, in the theoretical realm of the subject. Thus the best I can do is point you to good material on your question.
After doing some research, this is what I found:
Wikipedia's page on Confidence Intervals: https://en.wikipedia.org/wiki/Confidence_interval
Khan Academy's confidence interval explained (I highly recommend it!): https://www.khanacademy.org/math/probability/statistics-inferential/confidence-intervals/v/confidence-interval-1
Stat Trek's summary on confidence intervals: http://stattrek.com/estimation/confidence-interval.aspx
With this in mind, I also took the liberty of checking other Stack Overflow questions, and I found something that I believe is exactly what you need:
Java calculate confidence interval
Hope it helps!
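If the goal is the concrete computation rather than the theory, here is a distribution-free sketch in Python (numpy assumed) that finds the narrowest interval containing 90% of the points, which is what the question seems to be after:

```python
import numpy as np

def shortest_interval(data, coverage=0.9):
    """Return the narrowest interval [lo, hi] containing at least
    `coverage` of the data points; purely empirical, no
    distributional assumption."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    k = int(np.ceil(coverage * n))          # points the window must hold
    widths = x[k - 1:] - x[:n - k + 1]      # width of every k-point window
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

rng = np.random.default_rng(0)
sample = rng.uniform(1, 100, size=50)       # 50 random points in [1, 100]
lo, hi = shortest_interval(sample, 0.9)
```

Since the data are sorted, every candidate interval covering k points is a window of k consecutive sorted values, so scanning those windows finds the minimum-width range directly.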

Related

How to prove arrival rate follows Exponential Distributions?

I'm studying AnyLogic and I'm curious about something.
Some people explain that the arrival rate follows an exponential distribution.
I want to know how to prove that.
Any guidance would be very helpful and much appreciated.
Thank you so much.
The arrival rate doesn't follow an exponential distribution; it follows a Poisson distribution, so there's nothing to prove in that regard.
What follows an exponential distribution is the inter-arrival time between agents.
To show that this actually follows a particular distribution, you can use one of the many distribution-fitting techniques out there; my favorite is the Cullen and Frey graph. You can see an answer about it here:
https://stats.stackexchange.com/questions/333495/fitting-a-probability-distribution-and-understanding-the-cullen-and-frey-graph
You can also check the wikipedia page on distribution fitting:
https://en.wikipedia.org/wiki/Probability_distribution_fitting
Keep in mind that distribution fitting is something of an art: no technique gives you the correct distribution, only a good enough approximation of one. In this case, though, it should be quite easy.
You can't really prove that a distribution fits the data; at best you can compute an error estimate by comparing the fitted distribution function with the actual data, and you can have a confidence interval for that... I'm not sure if that's what you want.
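As a concrete sketch of that fit-then-estimate-error idea in Python (scipy assumed; the inter-arrival data here are synthetic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic inter-arrival times from a rate-5 process (mean gap 0.2).
interarrivals = rng.exponential(scale=1 / 5, size=1000)

# Fit an exponential, fixing the location at 0 so only the scale
# (the mean inter-arrival time) is estimated.
loc, scale = stats.expon.fit(interarrivals, floc=0)

# Kolmogorov-Smirnov test against the fitted distribution: a large
# p-value means "no evidence against exponentiality" - the closest
# thing to a proof that data can give you.
ks = stats.kstest(interarrivals, 'expon', args=(loc, scale))
```

A small KS statistic (and a p-value that is not tiny) says the exponential is a good enough approximation; it never proves the data are exponential.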
Not exactly sure what you mean by "prove" that it is exponential... but anyway, it is not just "some people" who explain that; it is actually stated in the AnyLogic help under the "Source" topic as follows:
Rate - agents are generated at the specified arrival rate (which is equivalent to exponentially distributed interarrival time with mean = 1/rate).
What you can do is collect the interval time between arrivals and plot that distribution to see that it actually looks like an exponential distribution.
To do that:
Create a typical DES process (e.g. source, queue, delay, sink)
Set the arrival type to rate and specify for example 1 per hour
Create a variable in main called "prevTime"
Create a histogram data element called "data"
In the "On exit" of the source write the following code:
data.add(time() - prevTime); // record the gap since the previous arrival
prevTime = time();           // remember this arrival's time
Look at the plot of the histogram and its mean.
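Outside AnyLogic, the same sanity check can be sketched in Python with synthetic data: simulate Poisson arrivals at rate 1 and recover the gaps the way the snippet above does (current time minus previous arrival time):

```python
import numpy as np

rng = np.random.default_rng(2)
rate = 1.0  # one arrival per hour, as in the steps above

# Simulate Poisson arrivals by cumulating exponential gaps, then
# recover the gaps exactly the way the snippet above does
# (this arrival's time minus the previous arrival's time).
arrival_times = np.cumsum(rng.exponential(scale=1 / rate, size=10_000))
prev = np.concatenate(([0.0], arrival_times[:-1]))
gaps = arrival_times - prev

mean_gap = gaps.mean()  # should be close to 1/rate
```

A histogram of `gaps` will show the decaying exponential shape, and the mean will sit near 1/rate, matching the help text quoted above.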

Computing similarity matrix with mixed data

I have also asked this question on the Cross Validated forum, but with no answer so far, so I am trying here as well:
I would like to compute a similarity matrix (which I will later use for clustering) from my data (failure data from an automotive company). The data consist of these variables:
START DATE + TIME (dd/mm/yyyy hh/mm/ss), DURATION (in seconds), DAY OF THE WEEK (mon,tue,...), WORKING TEAM (1,2,3), LOCALIZATION (1,2,3,...,20), FAILURE TYPE
From this it is clear that there are both continuous and categorical variables. What method would you suggest to calculate similarities between failure types? I don't think I can use Euclidean distance or Gower's similarity. Thank you in advance.
No, you need an ad hoc function that represents your knowledge about what the data means in the real world. Presumably it will mainly apply a weight to a continuous difference, plus a simple 2D matrix for the discrete categorical variables. But don't rule out censoring of extreme values or fuzzification.
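A minimal sketch of such an ad hoc function in Python; every field name, scale factor, and weight below is invented for illustration and should be replaced by your domain knowledge:

```python
import numpy as np

def failure_distance(a, b, w_dur=1.0, w_team=1.0, w_loc=1.0, w_day=1.0):
    """Ad hoc distance mixing continuous and categorical fields.
    Continuous: absolute duration difference scaled (and censored) to [0, 1].
    Categorical: 0/1 mismatch; replace with a small lookup matrix if some
    categories should count as closer than others."""
    d_dur = min(abs(a['duration_s'] - b['duration_s']) / 3600.0, 1.0)
    d_team = 0.0 if a['team'] == b['team'] else 1.0
    d_loc = 0.0 if a['location'] == b['location'] else 1.0
    d_day = 0.0 if a['weekday'] == b['weekday'] else 1.0
    return w_dur * d_dur + w_team * d_team + w_loc * d_loc + w_day * d_day

records = [
    {'duration_s': 120,  'team': 1, 'location': 5,  'weekday': 'mon'},
    {'duration_s': 150,  'team': 1, 'location': 5,  'weekday': 'mon'},
    {'duration_s': 4000, 'team': 3, 'location': 12, 'weekday': 'fri'},
]
n = len(records)
dist = np.array([[failure_distance(records[i], records[j])
                  for j in range(n)] for i in range(n)])
```

The resulting matrix is symmetric with zeros on the diagonal, so it can be fed directly to clustering routines that accept precomputed distances.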

Selecting part of a sample by frequency

I'm wondering if there is a way to select part of a sample at a given frequency. The only way I can think of to index the sample by frequency is using an FFT, but doing that seems to mess up the sample so that it's no longer actually playable. I was wondering how else one might select the part of a sample at a given frequency whilst keeping the sound intelligible?
Edit: The exact instructions were "synthesize an example of each vowel of pitch 150 Hz and duration 5 seconds".
Edit: I completely misunderstood what I needed to do originally. The new question is here: Synthesizing vowel from existing audio sample in Matlab
The exact phrasing suggests you are being asked to synthesize, i.e. create a new signal, not filter or modify an existing one. Moreover, it asks for a fundamental frequency of 150 Hz (it uses the word pitch rather than frequency; I'm assuming fundamental frequency is good enough and/or what they meant).
So, let me try rewording the question for you:
Do the following for each vowel sound (A, E, I, O, U, etc):
Create a 5 second sound with a fundamental frequency of 150 Hz.
I can think of two ways to solve this problem:
1. Sum up some sine waves (all multiples of 150 Hz) at different intensities. Knowing the intensities is the trick here.
2. Start with a 150 Hz pulse train and filter it. Knowing the exact filter to use is the trick here, although using the right pulse will probably have some impact as well.
Either way, you don't need or want an FFT in the generation stage. If you can't or don't want to look up the unknowns above, you could use an FFT to analyze a real person saying those sounds and use the results of the analysis to fill in the gaps. It wouldn't be too hard to do, but it's probably covered in an advanced textbook on phonetics and/or acoustics.
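A minimal sketch of approach 1 in Python (the harmonic amplitudes below are invented for illustration, not measured vowel formants; real values would come from analysing a recorded speaker for each vowel):

```python
import numpy as np

fs = 16_000   # sample rate in Hz (assumed)
f0 = 150      # required fundamental / pitch in Hz
dur = 5.0     # required duration in seconds
t = np.arange(int(fs * dur)) / fs

# Sum harmonics of 150 Hz at different intensities. These amplitudes
# are purely illustrative, NOT measured formant data.
amps = {1: 1.0, 2: 0.6, 3: 0.9, 4: 0.3, 5: 0.5, 6: 0.2}
signal = sum(a * np.sin(2 * np.pi * k * f0 * t) for k, a in amps.items())
signal /= np.abs(signal).max()  # normalise into [-1, 1] for playback
```

Because every component is a multiple of 150 Hz, the result is periodic at the required pitch; only the per-harmonic intensities distinguish one vowel from another.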
If you need a more detailed answer, perhaps you should create a new question and link it here for help answering that. I suggest the following tags, if they exist:
Speech synthesis
Filtering
audio
phonetics
You should define "at a given frequency" more precisely, but it seems that what you want is a filter with a narrow pass-band tuned at the desired frequency.
However, the narrow frequency requirement is opposed to intelligibility. In the limit, a single frequency would just give you a sinusoid, and intelligibility would be completely lost.
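A sketch of that narrow pass-band idea in Python (scipy assumed; the sample rate, centre frequency, and bandwidth are made-up values):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 8000        # sample rate in Hz (assumed)
f_center = 440   # frequency of interest in Hz (assumed)
bw = 40          # pass-band width: narrower = purer but less intelligible

# Narrow band-pass Butterworth around f_center, in second-order sections.
sos = butter(4, [f_center - bw / 2, f_center + bw / 2],
             btype='bandpass', fs=fs, output='sos')

# Demo input: the target tone plus an out-of-band tone.
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * f_center * t) + np.sin(2 * np.pi * 1200 * t)
y = sosfiltfilt(sos, x)  # zero-phase filtering keeps the tone aligned
```

Widening `bw` trades frequency selectivity for intelligibility, which is exactly the tension described above.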

How to find the similarity between two curves and a score of similarity?

I have two data sets (t, y1) and (t, y2). These data sets look visually the same, but there is some time delay or magnitude shift between them. I want to quantify the similarity between the two curves (a score of 1 for approximately similar curves and 0 for dissimilar ones). Some curves appear different because of oscillation in the data, so I am searching for a method to find the similarity between the curves. I already tried the gradient command in Matlab to find the slope of the curve at each time step and compared the slopes, but it is not giving me satisfactory results. Can anybody suggest a method to find the similarity between the curves?
Thanks in advance
This answer assumes your y1 and y2 are signals rather than curves; the latter I would try to parametrise with POLYFIT.
If they really look the same but are shifted in time (and not wrapped around), then you can:
y1n = y1/norm(y1);                 % normalise both signals
y2n = y2/norm(y2);
normratio = norm(y1)/norm(y2);     % captures the magnitude difference
c = conv(y2n, y1n(end:-1:1));      % cross-correlation via convolution with a flipped signal
[val, ind] = max(c);
shift = ind - length(y1n);         % estimated time shift in samples
shift will indicate the time shift and normratio the difference in magnitude.
Both can be used as features for your similarity metric. I assume, however, that your signals actually vary by more than just a time shift or magnitude, in which case some sort of signal parametrisation may be a better choice, followed by building a metric on those parameters.
Without knowing anything about your data, I would first try an AR model (assuming things as typical as FFT or PRINCOMP won't work).
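The same cross-correlation idea can be sketched in Python/NumPy, on toy data that differ only by a shift and a scale factor:

```python
import numpy as np

rng = np.random.default_rng(3)
base = np.convolve(rng.standard_normal(500), np.ones(5) / 5, mode='same')
y1 = base
y2 = 2.0 * np.roll(base, 30)       # shifted and scaled copy (toy data)

y1n = y1 / np.linalg.norm(y1)      # normalise both signals
y2n = y2 / np.linalg.norm(y2)
norm_ratio = np.linalg.norm(y1) / np.linalg.norm(y2)  # magnitude difference

# Full cross-correlation; the argmax position gives the time shift.
c = np.correlate(y2n, y1n, mode='full')
shift = int(np.argmax(c)) - (len(y1n) - 1)
```

Here `shift` recovers the 30-sample delay and `norm_ratio` the factor-of-2 scale, which is all the information this method can extract; anything beyond shift and scale needs one of the approaches below.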
For time-series similarity measurement, one traditional solution is DTW (Dynamic Time Warping)
Kolmogorov-Smirnov test (kstest2 function in Matlab)
Chi-square test
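DTW is simple enough to sketch directly (a minimal O(n·m) Python implementation, no library assumed); a bounded similarity score such as 1/(1+d) can then be derived from the distance d:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference
    local cost. Small values mean similar shapes even when one curve
    is locally stretched or delayed relative to the other."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 100)
y1 = np.sin(t)
y2 = np.sin(t - 0.3)          # time-delayed version of y1
y3 = np.cos(3 * t)            # genuinely different curve

# A delayed copy should score much closer than an unrelated curve.
d_similar = dtw_distance(y1, y2)
d_different = dtw_distance(y1, y3)
```

Because the warping path can move vertically and horizontally as well as diagonally, the delay in `y2` is absorbed almost for free, which is exactly why DTW suits curves with time shifts.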
To measure similarity there is a measure called MIC (maximal information coefficient). It quantifies the information shared between two data sets or curves.
The dv and dc distance in the following paper may solve your problem.
http://bioinformatics.oxfordjournals.org/content/27/22/3135.full

Spectral coherence of time series

I have two time series of data, one which is water temperature and the other is air temperature (hourly measurements for one year). Both measurements are taken simultaneously and the vectors are therefore the same size. The command corrcoef illustrates that they have a correlation equal to ~0.9.
Now I'm trying a different approach to finding the correlation, and I was thinking of spectral coherence. As far as I understand, in order to do this I should find the autospectral density of each time series (i.e. of water temperature and air temperature) and then find the coherence between them?
As I am new to signal processing I was hoping for some advice on the best ways of doing this!
I would recommend consulting this site. It contains an excellent reference for your question. If you need help with the cohere function, let me know.
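In newer MATLAB releases the equivalent function is mscohere; the same computation in Python (scipy assumed, with synthetic stand-in data) looks like this sketch:

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(4)
n = 24 * 365                        # one year of hourly samples
t = np.arange(n)

# Synthetic stand-ins for the two series: a shared daily cycle plus
# independent noise (your real temperature vectors would go here).
daily = np.sin(2 * np.pi * t / 24)
air = daily + 0.5 * rng.standard_normal(n)
water = 0.8 * daily + 0.5 * rng.standard_normal(n)

# Magnitude-squared coherence: Welch estimates of the cross-spectral
# density normalised by the two auto-spectral densities, giving a
# 0..1 "correlation per frequency" instead of one overall number.
f, Cxy = coherence(air, water, fs=1.0, nperseg=1024)  # fs in samples/hour
```

For data like these, the coherence is near 1 around the shared daily frequency (1/24 cycles per hour) and near 0 elsewhere, which is the frequency-resolved analogue of the ~0.9 corrcoef result.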