Using randi in MATLAB to get random values: values are not distributed uniformly - matlab

I am generating a random population of strings made of 0s and 1s. I am using randi(2)-1 to get a randomly generated single value 0 or 1. I expect to get 1s almost as frequently as 0s. Instead, when I view all the individuals in the population, they mostly consist of 1s. Below is the code - what is wrong?
for iInd=1:individualsCount
individual(attrCount) = 0;
for i=1:attrCount
individual(i) = randi(2)-1;
end
population{iInd} = individual;
end

Firstly, you don't need a loop to generate a random string of 0's and 1's. Try this instead:
individual = randi([0 1],[attrCount,1]);
Secondly, again, you don't need a loop to construct your population cell. Try this instead:
population=arrayfun(#(x)randi([0 1],[attrCount,1]),1:individualsCount,'UniformOutput',false)
You might have to change the order of rows and columns depending on how you want to set it up.
Now, coming to your question, you ought to understand that these distributions are stochastic and approach a truly uniform distribution of 50% 1s and 50% 0s only as your sample size approaches infinity. If your attrCount is small enough, do not be surprised if you don't find numbers close to 50% for each. That doesn't mean it is wrong. It is what it is.
Here's how the distribution of 1s looks like for a random binary vector of different sample sizes. You can see that for small sample sizes, there is high variability (and by no means is this exact... it will be different each time), whereas as you start approaching large sample sizes of 1000 and above, your distribution of 1s gets closer and closer to 50%, eventually being exactly 50% at infinity.

Related

How to specify min and max values in a skewnorm distribution?

I want to use the skewnorm function to generate random numbers. I want the numbers to come close to zero, but never less than or equal to it. Is there a way to specify a minium value?
By definition, skew normal distribution allows any number to be generated just as the normal distribution does (its density is positive on the entire real line). So if your numbers are never less than or equal to 0, they don't come from skew normal distribution.
To ensure you never get a non-positive number you can e.g. throw out any such number you get and sample again in a loop; you just won't get a skew normal distribution this way.

How does random number generation ensure reproducibility?

While reading about Transfer Learning with MATLab I came across a piece of code which says...
rng(2016) % For reproducibility
convnet = trainNetwork(trainDigitData,layers,options);
...before training the network so that the results can be reproduced exactly as given in the example by anyone who tries that code. I would like to know how generating a pseudo-random number using rng(seed_value) function can help with reproduciblity of the entire range of results?
Not random number generation, the random number generator seed.
There is no such things as random numbers, just pseudo-random numbers, numbers that behave almost as random, generally arising from some complex mathematical function, function that usually requires an initial value. Often, computers get this initial value from the time register in the microchip in your PC, thus "ensuring" randomness.
However, if you have an algorithm that is based in random numbers (e.g. a NN), reproducibility may be a problem when you want to share your results. Someone that re-runs your code will be ensured to get different results, as randomness is part of the algorithm. But, you can tell the random number generator to instead of starting from a seed taken randomly, to start from a fixed seed. That will ensure that while the numbers generated are random between themseves, they are the same each time (e.g. [3 84 12 21 43 6] could be the random output, but ti will always be the same).
By setting a seed for your NN, you ensure that for the same data, it will output the same result, thus you can make your code "reproducible", i.e. someone else can run your code and get EXACTLY the same results.
As a test I suggest you try the following:
rand(1,10)
rand(1,10)
and then try
rng(42)
rand(1,10)
rng(42)
rand(1,10)
Wikipedia for Pseudo-random number generator
Because some times is good to use the same random numbers, this is what matlab says about that
Set the seed and generator type together when you want to:
Ensure that the behavior of code you write today returns the same results when you run that code in a future MATLABĀ® release.
Ensure that the behavior of code you wrote in a previous MATLAB release returns the same results using the current release.
Repeat random numbers in your code after running someone else's random number code
this is te point of repating the seed, and generate the same random numbers. matlab points it out in two good articles one for repeating numbers and one for different numbers
You dont want to start with weights all equal zeros, so in the initializing stage you give the weights some random value. There maybe other random values involved in searching for minimum later in the learning process, or in the way you feed your data.
So the real input to all neural network learning process is your data and the random number generator.
If they are the same, than all going to be the same.
And 'rng' command put the random number generator in predefined state so it will generate same sequence of number.
anquegi's answer, pretty much answers your question, so this post is just to elaborate a bit more.
Whenever you ask for a random number, what MATLAB really does, is that it generates a pseudo random number, which has distribution U(0,1) (that is the uniform on [0,1]) This is done via some deterministic formula, typically something like, see Linear congruential generator:
X_{n+1} = (a X_{n} + b) mod M
then a uniform number is obtained by U = X_{n+1}/M.
There is, however, a problem, If you want X_{1}, then you need X_{0}. You need to initialise the generator, this is the seed. This also means that once X_{0} is specified you will draw the same random numbers, every time. Try open a new MATLAB instance, run randn, close MATLAB, open it again and run randn again. It will be the same number. That is because MATLAB always uses the same seed whenever it is opened.
So what you do with rng(2016) is that you "reset" the generator, and put X_{0} = 2016, such that you now know all numbers that you ask for, and thus reproduce the results.

Random number generation with Poisson distribution in Matlab

I am trying to simulate an arrival process of vehicles to an intersection in Matlab. The vehicles are randomly generated with Poisson distribution.
LetĀ“s say that in one diraction there is the intensity of the traffic flow 600 vehicles per hour. From what I understood from theory, the lambda of the Poisson distribution should be 600/3600 (3600 sec in 1 hour).
Then I run this cycle:
for i = 1:3600
vehicle(i) = poissrnd(600/3600);
end
There is one problem: when I count the "ones" in the array vehicle there are never 600 ones, it is always some number around, like 567, 595 and so on.
The question is, am I doing it wrong, i.e. should lambda be different? Or is it normal, that the numbers will never be equal?
If you generate a random number, you can have an expectation of the output.
If you actually knew the output it would not be random anymore.
As such you are not doing anything wrong.
You could make your code a bit more elegant though.
Consider this vectorized approach:
vehicle = poissrnd(600/3600,3600,1)
If you always want the numbers to be the same (for example to reproduce results) try setting the state of your random generator.
If you have a modern version (without old code) you could do it like so:
rng(983722)

Generating a random number with low precision MATLAB

I want to generate a large number of random numbers (uniformly distributed on the interval [0,1]). Currently the generation of these random numbers is causing my program to run quite slowly, however the program only needs them to be calculated to around 5 decimal places.
I'm not entirely sure of how MATLAB generates random numbers, but if there is a way of only calculating them to 5 decimal places then it will (hopefully greatly) speed up my program.
Is there a way of doing such a thing?
Thanks very much.
To answer your question, yes, you can generate single precision random numbers, like this:
r = rand(..., 'single'); %Reference: http://www.mathworks.com/help/matlab/ref/rand.html
Single precision numbers have 7 (ish) significant figures when printed as decimal.
To echo some comments above, I don't think this will buy you much performance. The first thing to do if rand is really your slow operation is to batch the calls. That is, instead of:
for ix 1:1000
y = rand(1,1,'single);
end
use:
yVector = rand(1000,1,'single');
As already mentioned, you can instruct RAND to generate numbers directly as single precision, and it's definitely best to generate the numbers in a decent sized chunk. If you still need more performance, and you have Parallel Computing Toolbox and a supported NVIDIA GPU, the gpuArray.rand function can be even faster, especially if you select the philox generator like so:
parallel.gpu.RandStream('Philox4x32-10')
Assuming you actually have a proper code layout where you generate a lot of numbers in an array, this can be a solution for low precision. Note that I have not tested but it is mentioned to be fast:
R = randi([0 100000],500,300)/100000
This will generate 150000 low precision random numbers between 0 and 1

Measuring Frequency of Square wave in MATLAB using USB 1024HLS

I'm trying to measure the frequency of a square wave which is read through a USB 1024 HLS Daq module through MATLAB. What I've done is create a loop which reads 100 values from the digitial input and that gives me vector of 0's and 1's. There is also a timer in this loop which measures the duration for which the loop runs.
After getting the vector, I then count the number of 1's and then use frequency = num_transitions/time to give me the frequency. However, this doesn't seem to work well :( I keep getting different frequencies for different number of iterations of the loop. Any suggestions?
I would suggest trying the following code:
vec = ...(the 100-element vector of digital values)...
dur = ...(the time required to collect the above vector)...
edges = find(diff(vec)); % Finds the indices of transitions between 0 and 1
period = 2*mean(diff(edges)); % Finds the mean period, in number of samples
frequency = 100/(dur*period);
First, the code finds the indices of the transitions from 0 to 1 or 1 to 0. Next, the differences between these indices are computed and averaged, giving the average duration (in number of samples) for the lengths of zeroes and ones. Multiplying this number by two then gives the average period (in number of samples) of the square wave. This number is then multiplied by dur/100 to get the period in whatever the time units of dur are (i.e. seconds, milliseconds, etc.). Taking the reciprocal then gives the average frequency.
One additional caveat: in order to get a good estimate of the frequency, you might have to make sure the 100 samples you collect contain at least a few repeated periods.
Functions of interest used above: DIFF, FIND, MEAN
First of all, you have to make sure that your 100 samples contain at least one full period of the signal, otherwise you'll get false results. You need a good compromise of sample rate (i.e. the more samples per period you have the better the measurement is) and and number of samples.
To be really precise, you should either have a timestamp associated with every measurement (as you usually can't be sure that you get equidistant time spacing in the for loop) or perhaps it's possible to switch your USB module in some "running" mode which doesn't only get one sample at a time but a complete waveform with fixed samplerate.
Concerning the calculation of the frequency, gnovice already pointed out the right way. If you have individual timestamps (in seconds), the following changes are necessary:
tst = ...(the timestamps associated with every sample)...
period = 2*mean(diff(tst(edges)));
frequency = 1/period;
I can't figure out the problem, but if the boolean vector were v then,
frequency = sum(v)/time_to_give_me_the_frequency
Based on your description, it doesn't sound like a problem with the software, UNLESS you are using the Windows system timer, which is notoriously inaccurate (it is only accurate to about 15 milliseconds).
There are high-resolution timers available in Windows, but I don't know how to use them in Matlab. If you have access to the .NET framework, the Stopwatch class has 1 microsecond accuracy (or better), as does the QueryPerformanceCounter API in Win32.
Other than that, you might have some jitter. There could be something in your signal chain that is causing false triggers, etc.
UPDATE: The following CodeProject article should solve the timing problem, if there is one. You should check the Matlab documentation of your version of Matlab to see if it has a native high-resolution timer. Otherwise, you can use this:
C++/Mex wrapper adds microsecond resolution timer to Matlab under WinXP
http://www.codeproject.com/KB/cpp/Matlab_Microsecond_Timer.aspx
mersenne31:
Thanks everyone for your responses. I have tried the solutions that gnovice and groovingandi mentioned and I'm sure they will work as soon as the timing issue is solved.
The code I've used is shown below:
for i=1:100 tic; value = getvalue(portCH); vector(i) = value(1); tst(i) = toc; % gets an individual time sample end
% to get the total time I put total_time = toc after the for loop
totaltime = sum(tst); edges = find(diff(vec)); % Finds the indices of transitions between 0 and 1 period = 2*mean(diff(edges)); % Finds the mean period, in number of samples frequency = 100/(totaltime*period);
The problem is that measuring the time for one sample doesn't really help because it is nearly the same for all samples. What is needed is, as groovingandi mentioned, some "running" mode which reads 100 samples for 3 seconds.
So something like for(3 seconds) and then we do the data capture. But I can't find anything like this. Is there any function in MATLAB that could do this?
This won't answer your question, but it's what I thought of after reading you question. square waves have infinite frequency. The FFT of a square wave it sin(x)/x, which goes from -inf to +inf.
Also try counting only the rising edges in matlab. You can quantize the signal to just +1 and 0, and then only increment the count when you see [0 1] slice of your vector.
OR
You can quantize, decimate, then just sum. This will only work if the each square pulse is the same length and your sampling frequency is constant. I think this one would be harder to do.