MATLAB random numbers below a threshold - matlab

I use normrnd and lognrnd to sample numbers from these two distributions. However, because I am using quite large standard deviations, many of the sampled numbers are above a threshold I set in my code. My question is: is there a way to sample numbers from these distributions but below a certain threshold, without using an if_bigger --> sample_again function?
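One possible approach, sketched here under assumptions: the Statistics and Machine Learning Toolbox is available, and the parameter values and thresholds below are made up for illustration. Either truncate a distribution object with truncate, or push uniform numbers through the inverse CDF restricted to the allowed range. Neither needs a resampling loop.

```matlab
rng(0);                                   % fixed seed so the sketch is repeatable
mu = 0; sigma = 5; upper = 3; n = 1e4;    % assumed parameters and threshold

% Option 1: truncate a distribution object, then sample from it
pd  = makedist('Normal', 'mu', mu, 'sigma', sigma);
pdT = truncate(pd, -Inf, upper);          % keep only values <= upper
x1  = random(pdT, n, 1);

% Option 2: inverse-CDF sampling restricted to [0, F(upper)]
u  = rand(n, 1) * normcdf(upper, mu, sigma);
x2 = norminv(u, mu, sigma);

% The same idea for a lognormal with an assumed upper threshold of 20
pdL = truncate(makedist('Lognormal', 'mu', 0, 'sigma', 1.5), 0, 20);
y   = random(pdL, n, 1);
```

Note that the samples follow the truncated distribution, so their mean and standard deviation differ from the mu and sigma passed in.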

Related

Appropriate method for clustering ordinal variables

I was reading through all (or most) previously asked questions, but couldn't find an answer to my problem...
I have 13 variables measured on an ordinal scale (they represent knowledge transfer channels), which I want to cluster (HCA) for a subsequent binary logistic regression analysis (including all 13 variables is not possible due to the sample size of N=208). A factor analysis seems inappropriate due to the scale level. I am using SPSS (but tried R as well).
Questions:
1. Am I right in using the chi-squared measure for count data instead of the (squared) Euclidean distance?
2. How can I justify a choice of method? I tried single, complete, Ward and average, but all give different results and I can't find a source to base my decision on.
Thanks a lot in advance!
Answer 1: Since the variables are on an ordinal scale, the chi-square test is an appropriate test, because "A Chi-square test is designed to analyze categorical data. That means that the data has been counted and divided into categories. It will not work with parametric or continuous data (such as height in inches)." Reference.
Again, since ordinal-scale data is essentially count or frequency data, you can use regular parametric statistics (mean, standard deviation, etc.) and parametric tests like ANOVA, or non-parametric tests like the Mann-Whitney U test to compare two groups or the Kruskal-Wallis H test to compare three or more groups.
Answer 2: In a clustering problem, the choice of distance measure depends solely on the type of variables. I recommend reading these detailed posts: 1, 2, 3.

Correct way to generate random numbers

On page 3 of "Lecture 8, White Noise and Power Spectral Density" it is mentioned that rand and randn create pseudo-random numbers. Please correct me if I am wrong: a sequence of random numbers is one for which two sequences generated from the same seed are never exactly the same.
Pseudo-random numbers, on the other hand, are deterministic, i.e., two sequences are the same if they are generated from the same seed.
How can I create random numbers rather than pseudo-random numbers? I was under the impression that MATLAB's rand and randn functions generate independent and identically distributed random numbers, but the slides mention that they create pseudo-random numbers. Googling for how to create random numbers returns the rand and randn functions.
The reason for distinguishing random numbers from pseudo-random numbers is that I need to compare the performance of a cryptographic scheme using (A) a random signal with white-noise characteristics and (B) a pseudo-random signal with white-noise characteristics. So, (A) must be different from (B). I would be grateful for any code and the correct way to generate random numbers and pseudo-random numbers.
Generation of "true" random numbers is a tricky exercise. You can check Wikipedia on RNGs and the tests of randomness (http://en.wikipedia.org/wiki/Random_number_generation). This link offers an RNG based on atmospheric noise (http://www.random.org/).
As mentioned above, it is really difficult (probably impossible) to create real random numbers with computer software alone. There are numerous projects on the internet that provide real random numbers generated by physical processes (for example, the one Kostya mentioned). A particularly interesting one is this from HU Berlin.
That being said, for experiments like the one you want to perform, MATLAB's pseudo-RNGs are more than fine. MATLAB's algorithms include the Mersenne Twister, which is one of the best-known pseudo-RNGs (I would suggest you google the Mersenne Twister's properties). See the MATLAB rng documentation here.
Since you did not mention which type of system you want to simulate, one simple approach to solve your issue would be to use a good RNG (Mersenne Twister) for process A and a not-so-good one for process B.
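As a rough sketch of that suggestion: the first part shows why rand is pseudo-random (the same seed reproduces the same sequence), and the second part creates two independent streams with different generators. The seeds, the sequence lengths, and the choice of 'mcg16807' as the "not-so-good" generator are assumptions made for illustration only.

```matlab
% Same seed, same sequence: this is what "pseudo-random" means in practice
rng(42); a = rand(1, 5);
rng(42); b = rand(1, 5);
sameSequence = isequal(a, b);             % true

% Two separate streams: Mersenne Twister for process A and an older
% multiplicative congruential generator for process B (assumed choice)
sA = RandStream('mt19937ar', 'Seed', 1);
sB = RandStream('mcg16807',  'Seed', 1);
xA = randn(sA, 1, 1000);                  % white-noise-like signal for A
xB = randn(sB, 1, 1000);                  % white-noise-like signal for B
```

Both streams are still pseudo-random. For a truly random signal A, the numbers would have to come from a physical source such as the services linked above.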

how to generate a dataset of correlated variables with different distributions?

For teaching purposes, I need to generate random datasets of correlated random variables with different distributions. I have tried corr2data in Stata but it will not allow me to specify max and min values of the variables to be generated, just means, sd's and the covariance matrix. Therefore, I need to do messy adjustments after generation of the data. Various other details annoy me with corr2data. Is there a simpler way of doing this with MATLAB? I am not as familiar with this software as I am with Stata.
If you have access to Statistics Toolbox as well as MATLAB, you can use the copula functionality to do this fairly easily. Using a copula, you can specify the marginal distributions of each variable, and a correlation structure between the variables.
You can then generate random numbers from the copula, fit it to data etc. as well.
See in the MATLAB documentation:
Copulas: Generate Correlated Samples
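A minimal sketch of the copula approach, assuming the Statistics and Machine Learning Toolbox is available. The correlation value, sample size, and the two marginals (a normal and a bounded uniform) are arbitrary choices for illustration.

```matlab
rng(1);                                   % fixed seed so the sketch is repeatable
n   = 1000;                               % number of observations (assumed)
Rho = [1 0.7; 0.7 1];                     % copula correlation matrix (assumed)

% Draw dependent uniforms: each column of U is uniform on (0,1)
U = copularnd('Gaussian', Rho, n);

% Map each column to the marginal you want via its inverse CDF
x1 = norminv(U(:, 1), 50, 10);            % normal, mean 50, sd 10
x2 = unifinv(U(:, 2), 0, 100);            % uniform, bounded to [0, 100]

% Rank correlation survives the monotone transforms
rhoS = corr(x1, x2, 'Type', 'Spearman');
```

Because x2 comes from a bounded marginal, its minimum and maximum are set directly by the distribution parameters, so no adjustment after generation is needed.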

matlab - pdf for multivariate uniform distribution

The pdf for the multivariate normal distribution in MATLAB is mvnpdf(...). What about the case where multiple variables are uniformly distributed: Is there a function to describe their joint distribution analogous to the multivariate normal distribution? If there is no such function, is there a trick to handle this case?
The simplest way for several variables to be uniformly distributed is for them to be mutually independent; in that case you simply have a uniform distribution over the hypercube in the space spanned by the variables. To get samples from this distribution, you just generate samples for each of the variables separately.
The point where a "trick" might be necessary is if you have dependencies between the variables even though the marginal distribution for each of them is still uniform. In this case you have to describe the dependency structure, and I'm not aware of any standard way to do this (the way dependencies between normally distributed variables are described by a correlation matrix).
Of course such distributions exist: For two dimensions, one possibility would be to have a joint distribution that looks like a solution to the "eight rooks" problem:
Another one actually derives from the introductory Matlab example, the magic square:
Both of these examples are discrete distributions, but can be produced at arbitrary granularity, or simply interpreted as piecewise constant continuous distributions.
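Both constructions can be checked numerically. The sketch below uses a random permutation matrix as a stand-in for a rook placement and the 4-by-4 magic square; the sizes are arbitrary. In each case the row sums and column sums of the joint probability table are constant, so both marginals are uniform even though the variables are dependent.

```matlab
% "Eight rooks": one probability mass of 1/n in every row and column
n = 8;
P = eye(n);
P = P(randperm(n), :) / n;                % joint pmf on an n-by-n grid

% Magic square: all rows and columns have the same sum
M = magic(4);
M = M / sum(M(:));                        % normalise to a joint pmf

% Marginals are obtained by summing out the other variable
rookMarginals  = [sum(P, 1); sum(P, 2)'];   % every entry is 1/8
magicMarginals = [sum(M, 1); sum(M, 2)'];   % every entry is 1/4
```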
As you can see, there are many possibilities for a multivariate distribution each of whose marginal distributions is uniform. The question you have to answer for yourself is what kind of dependencies, if any, you are interested in.
If I'm understanding the question properly, we want to calculate the pdf of a multivariate uniform distribution. By definition, the pdf is constant for all values in the support of the distribution. Thus, to calculate the pdf, all that is required is to calculate the normalizing constant, which is given by the inverse of the volume of the support (the integral of 1 over the support). That is to say, the pdf is given by
f(x) = 1 / integral(A)
where A is the support set, and x is an element in A. If an analytic solution to integral(A) is not available, then a numerical integrator can be employed.
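For the common case where the support A is an axis-aligned box, the integral is just the product of the side lengths. A small sketch, with made-up bounds and a hypothetical helper name f, assuming MATLAB R2016b or later for implicit expansion:

```matlab
a = [0 -1 2];                             % lower bounds per dimension (assumed)
b = [1  1 5];                             % upper bounds per dimension (assumed)
vol = prod(b - a);                        % volume of the box: 1 * 2 * 3 = 6

% pdf: 1/vol inside the box, 0 outside; each row of x is one point
f = @(x) all(x >= a & x <= b, 2) / vol;

inside  = f([0.5 0 3]);                   % 1/6
outside = f([2 0 3]);                     % 0
```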

Select data based on a distribution in matlab

I have a set of data in a vector. If I were to plot a histogram of the data I could see (by clever inspection) that the data is distributed as the sum of three distributions;
One normal distribution centered around x_1 with variance s_1;
One normal distribution centered around x_2 with variance s_2;
One lognormal distribution.
My data is obviously a subset of the 'real' data.
What I would like to do is to take a random subset of my data, ensuring that the resulting subset is a reasonably representative sample of the original data.
I would like to do this as easily as possible in matlab but am new to both statistics and matlab and am unsure where to start.
Thank you for any help :)
If you can identify each of the 3 distributions (in the sense that you can estimate their parameters), one approach could be to select a random subset of your data, then estimate the parameters for each distribution and see whether they are close enough (according to your own definition of "close") to the parameters of the original distributions. You should repeat this process several times and look at the average difference for a given random subset size.
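A rough sketch of the repeat-and-compare loop. Note the swap: instead of re-estimating the mixture parameters, it compares each random subset with the full data using a two-sample Kolmogorov-Smirnov test (kstest2, Statistics and Machine Learning Toolbox), which avoids fitting the three components. The stand-in data, subset size, and repetition count are all assumptions.

```matlab
rng(3);                                   % fixed seed so the sketch is repeatable

% Stand-in data: two normals and one lognormal mixed together (assumed)
data = [normrnd(2, 0.5, 300, 1); normrnd(6, 1, 300, 1); lognrnd(0, 0.5, 300, 1)];

k    = 200;                               % subset size (assumed)
nRep = 50;                                % number of random subsets to try
rejected = false(nRep, 1);

for r = 1:nRep
    idx    = randperm(numel(data), k);    % k distinct indices, uniformly at random
    subset = data(idx);
    % h = 1 means the subset looks different from the full data at the 5% level
    rejected(r) = kstest2(subset, data);
end

fprintf('Subsets flagged as unrepresentative: %.0f%%\n', 100 * mean(rejected));
```

Since each subset is drawn from the data it is compared against, this is only a rough sanity check rather than a formal test.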