Random number generation with Poisson distribution in Matlab - matlab

I am trying to simulate an arrival process of vehicles at an intersection in Matlab. The vehicles are generated randomly according to a Poisson distribution.
Let's say that in one direction the traffic flow intensity is 600 vehicles per hour. From what I understood from the theory, the lambda of the Poisson distribution should be 600/3600 (3600 seconds in 1 hour).
Then I run this cycle:
for i = 1:3600
vehicle(i) = poissrnd(600/3600);
end
There is one problem: when I count the "ones" in the array vehicle, there are never exactly 600; it is always some number nearby, like 567, 595 and so on.
The question is: am I doing it wrong, i.e. should lambda be different? Or is it normal that the numbers will never be equal?

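When you generate a random number, you can only state an expectation for the output.
If you actually knew the output in advance, it would not be random anymore.
As such, you are not doing anything wrong: the total number of arrivals, sum(vehicle), is itself Poisson distributed with mean 600 and standard deviation sqrt(600), about 24.5, so totals like 567 or 595 are normal. Note that a single second can contain more than one arrival, so count with sum(vehicle) rather than by counting the ones.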
You could make your code a bit more elegant though.
Consider this vectorized approach:
vehicle = poissrnd(600/3600,3600,1);
If you always want the numbers to be the same (for example, to reproduce results), set the state of your random number generator.
In a recent version of MATLAB (one that has rng, and code that does not rely on the legacy rand('seed',...) syntax) you can do it like so:
rng(983722)
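Putting the pieces of this answer together, a minimal sketch might look like the following. It assumes the Statistics and Machine Learning Toolbox is available for poissrnd; the names lambda, nSeconds, totalArrivals, vehicleAgain and sameRun are illustrative and not from the question.

```matlab
rng(983722);                  % fix the seed so the run is repeatable
lambda   = 600/3600;          % expected arrivals per second
nSeconds = 3600;              % one simulated hour

% One Poisson draw per second, generated in a single vectorized call
vehicle = poissrnd(lambda, nSeconds, 1);

% A second may hold 0, 1, 2, ... vehicles, so sum the counts
totalArrivals = sum(vehicle);

% Re-seeding with the same value reproduces the same arrivals
rng(983722);
vehicleAgain = poissrnd(lambda, nSeconds, 1);
sameRun = isequal(vehicle, vehicleAgain);
```

The total still varies from seed to seed; only repeating the same seed repeats the same hour of traffic.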

Activation function to get day of week

I'm writing a program to predict when something will happen. I don't know which activation function to use to get the output as a day of the week (1-7).
I tried the sigmoid function, but then I need to input the predicted day and it outputs the probability of it; I don't want it to work this way.
I expect the activation function to return values from 0 to infinity. Is ReLU the best activation function for this task?
EDIT:
Also, what if I wanted to output more than 7 days, for example, x will happen on the 9th day from today, or the 15th day from today, etc.? I'm looking for a dynamic way to do this.
What you are trying to do is solve a classification problem with a regression approach. That's unconventional, at the very least.
You can use any activation function you want and define your output as you want, e.g. linear or ReLU with an output range from 1 to 7, or something between -1 (or 0) and 1 like tanh or sigmoid, and then map the output (-1 -> 1; -0.3 -> 2; ...).
The problem for you will be that you get a floating-point number as a result. So your model not only has to learn how to classify correctly but also how to predict the (almost) exact number you want in your output neuron. That makes the problem more complicated than it has to be. With a model like that it is also likely that for some outlier data points you will get unexpected return values like 0, -1 or 8. What do you do then?
To sum it up: listen to @venkata krishnan, use softmax and seven output neurons, and map this result to a number between 1 and 7 outside the neural network if you have to.
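As a rough illustration of that last step, here is how seven raw output values could be turned into probabilities and then into a day number outside the network. The logits vector is made up for the example, and the softmax is written out by hand rather than taken from any toolbox.

```matlab
% Hypothetical raw outputs of the seven output neurons (one per day)
logits = [0.2 1.5 -0.3 0.1 2.4 0.0 -1.1];

% Softmax: subtract the maximum first for numerical stability
p = exp(logits - max(logits));
p = p / sum(p);               % probabilities that sum to 1

% Map to a day number 1..7 by picking the most probable class
[~, day] = max(p);
```

Here day comes out as 5, because the fifth logit is the largest.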
EDIT
What comes to mind after reading the comments again is a mix of what you want and what you should do.
You could try making the second-to-last layer a 7-neuron softmax layer and mapping those outputs to a single neuron in the last layer.
I have never tried that, nor have I ever read about anything like it, so I can't tell you whether it's a good idea (likely not), but you might consider it worth a try.
I want to add to the point made by @venkata krishnan, who raises a valid point about your problem setting. You will find an answer to your original question further down, but I strongly suggest you read the following comments first.
Generally, you want to distinguish between categorical, ordinal and interval variables. I have given a relatively lengthy explanation in a different answer on Stack Overflow; it might help you understand this concept in more detail.
In your scenario, you mostly want an understanding of "how wrong" you are. Of course, it is perfectly reasonable to do what you are doing and interpret the day as an interval variable, and therefore assume an ordering (and a distance) between different values.
What is problematic, though, is that you are assuming a continuous space for a discrete variable. E.g., it does not make any sense to interpret an output of 4.3, since you can only tell between 4 (Friday, assuming you start numbering your days at 0) and 5 (Saturday). Any value in between would have to be rounded, which is perfectly fine, until you want to perform backpropagation on this loss.
It is problematic because you are essentially introducing a non-convex and non-continuous function, no matter how you "round" your values. To exemplify this, assume you round to the nearest number; then, at the value of 4.5, you would see a sudden jump in the loss, which is non-differentiable there and will therefore give your optimizer a hard time, potentially limiting the convergence of your system.
If, instead, you use several output neurons, as suggested by @venkata krishnan, you might lose the information about distance (how many days you are off) on paper, but you can of course still interpret your loss in any way you like. This would certainly be the better option for a discrete-valued variable.
To answer your original question: I personally would make sure that your loss function is bounded both above and below, as you could otherwise have undefined/inconsistent loss values that might lead to subpar optimization. One way to do this is to rescale a sigmoid function (the range of the sigmoid over R is (0,1)). You can then just multiply your output by 6 to get a value range of (0,6), which (after rounding) covers all the values you want.
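To make the rescaling concrete, here is a small sketch with made-up pre-activation values z. It only shows the arithmetic of scaling a sigmoid by 6 and rounding; it is not a trained model.

```matlab
% Hypothetical pre-activation values of the single output neuron
z = [-10 -1 0 1 10];

s = 1 ./ (1 + exp(-z));   % sigmoid, values strictly between 0 and 1
scaled = 6 * s;           % rescaled to the open interval (0, 6)
day = round(scaled);      % nearest day index, 0 through 6
```

For these inputs the rounded days are 0, 2, 3, 4 and 6. Keep in mind that the rounding step is exactly the non-differentiable part discussed above, so it belongs outside the loss.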
As far as I know, an activation function that yields values from 0 to infinity (which is what ReLU does) is not a good fit for this task. You can use 7 output nodes with a "Softmax" activation function, which will return the probability of each day. There is another solution which may work: you can use 3 output nodes with a "Binary" activation function, each of which returns either 0 or 1. That means you can have 8 different outputs with only 3 nodes, namely 000, 001, 010, 011, 100, 101, 110 and 111. You can use 7 of them.

How does random number generation ensure reproducibility?

While reading about Transfer Learning with MATLAB I came across a piece of code which says...
rng(2016) % For reproducibility
convnet = trainNetwork(trainDigitData,layers,options);
...before training the network, so that anyone who tries that code can reproduce the results exactly as given in the example. I would like to know how seeding the pseudo-random number generator with rng(seed_value) can help with the reproducibility of the entire range of results.
It is not the random number generation that does it, but the random number generator seed.
There is no such thing as random numbers on a computer, just pseudo-random numbers: numbers that behave almost as if random, generally arising from some complex mathematical function that usually requires an initial value. Often, computers get this initial value from the clock in your PC, thus "ensuring" randomness.
However, if you have an algorithm that is based on random numbers (e.g. a NN), reproducibility may be a problem when you want to share your results. Someone who re-runs your code is bound to get different results, as randomness is part of the algorithm. But you can tell the random number generator to start from a fixed seed instead of a seed taken at random. That will ensure that while the generated numbers are random among themselves, they are the same each time (e.g. [3 84 12 21 43 6] could be the random output, but it will always be the same).
By setting a seed for your NN, you ensure that for the same data, it will output the same result, thus you can make your code "reproducible", i.e. someone else can run your code and get EXACTLY the same results.
As a test I suggest you try the following:
rand(1,10)
rand(1,10)
and then try
rng(42)
rand(1,10)
rng(42)
rand(1,10)
Wikipedia for Pseudo-random number generator
Because sometimes it is good to use the same random numbers. This is what MATLAB says about that:
Set the seed and generator type together when you want to:
Ensure that the behavior of code you write today returns the same results when you run that code in a future MATLAB® release.
Ensure that the behavior of code you wrote in a previous MATLAB release returns the same results using the current release.
Repeat random numbers in your code after running someone else's random number code
This is the point of repeating the seed: to generate the same random numbers. MATLAB points this out in two good articles, one on repeating numbers and one on generating different numbers.
You don't want to start with all weights equal to zero, so in the initialization stage you give the weights some random values. There may be other random values involved later in the learning process, when searching for a minimum or in the way you feed your data.
So the real inputs to the whole neural network learning process are your data and the random number generator.
If they are the same, then everything is going to be the same.
And the 'rng' command puts the random number generator in a predefined state, so it will generate the same sequence of numbers.
anquegi's answer pretty much answers your question, so this post is just to elaborate a bit more.
Whenever you ask for a random number, what MATLAB really does is generate a pseudo-random number with distribution U(0,1) (that is, the uniform distribution on [0,1]). This is done via some deterministic formula, typically something like a linear congruential generator:
X_{n+1} = (a X_{n} + b) mod M
then a uniform number is obtained by U = X_{n+1}/M.
There is, however, a problem: if you want X_{1}, then you need X_{0}. You need to initialise the generator; this is the seed. This also means that once X_{0} is specified you will draw the same random numbers every time. Try opening a new MATLAB instance, running randn, closing MATLAB, opening it again and running randn again. It will be the same number. That is because MATLAB always uses the same seed whenever it is opened.
So what you do with rng(2016) is "reset" the generator and put X_{0} = 2016, such that you now know all the numbers you will ask for, and can thus reproduce the results.
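To see the recurrence in action, here is a toy linear congruential generator. It is only a sketch: the constants a, b and M are the well-known Numerical Recipes values, chosen here for illustration, and this is not the generator MATLAB actually uses by default (that is the Mersenne Twister).

```matlab
a = 1664525;          % multiplier (illustrative choice)
b = 1013904223;       % increment  (illustrative choice)
M = 2^32;             % modulus

x = 2016;             % the seed, X_0
u = zeros(1, 5);
for n = 1:5
    x = mod(a*x + b, M);   % X_{n+1} = (a X_n + b) mod M
    u(n) = x / M;          % uniform number in [0, 1)
end

% Starting again from the same seed gives the same sequence
x = 2016;
uAgain = zeros(1, 5);
for n = 1:5
    x = mod(a*x + b, M);
    uAgain(n) = x / M;
end
```

With these constants all intermediate products stay below 2^53, so the double-precision arithmetic is exact.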

One-time randomization

I have a matrix, ECGsig, with each row containing a 1-second-long ECG signal.
I will classify them later, but I want to randomly reorder the rows like this:
idx = randperm(size(ECGsig,1));
ECGsig = ECGsig(idx,:);
However, I want this to happen just once and not every time I run the program,
or in other words, to have the random numbers generated only once,
because if they change every time I would get different classification results.
Is there any way to do this besides doing it in a separate m-file and saving the result in a mat-file?
Thanks,
You can set the random number generator seed so that the same random results are generated every time you run the program. You can do this through rng. This way, even though you run the program multiple times, it will still generate the same random sequence. As such, try doing something like:
rng(1234);
The input to rng is the seed. However, as per Luis Mendo's comment, rng is only available in newer versions of MATLAB. Should rng not be available in your version of MATLAB, do this instead:
rand('seed', 1234);
You can also take a look at RandStream, but that's a bit too advanced, so let's not look at it right now. To reset the generator to the state it has when MATLAB starts up, choose a seed of 0. Therefore:
rng(0); % or
rand('seed', 0);
By calling this, any random results you generate from this point on will follow a predetermined order. The seed can be any non-negative integer you want, but use something that you'll remember. Place this at the very beginning of your code before you do anything else. The main reason we have control over how random numbers are generated is that it encourages reproducible results and research. This way, other people can regenerate the results you have created, should your code involve anything random.
Even though you said you only want to run this randomization once, this will save you the headache of saving your results to a different file before you run the program multiple times. By setting the seed, even though you're running the program multiple times, you're guaranteed to generate the same random sequence each time.
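Applied to the row shuffling in the question, a minimal sketch could look like this. The small numeric matrix stands in for the real ECGsig data, and shuffled and idxAgain are names made up for the example.

```matlab
rng(1234);                          % fixed seed: same permutation on every run

ECGsig = reshape(1:40, 8, 5);       % stand-in data: 8 "signals", 5 samples each
idx = randperm(size(ECGsig, 1));    % random row order
shuffled = ECGsig(idx, :);          % rows reordered, contents unchanged

% Re-seeding with the same value yields the same row order again
rng(1234);
idxAgain = randperm(size(ECGsig, 1));
```

Because idx and idxAgain are identical, the classification always sees the rows in the same order without a separate mat-file.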

Uniform Random Number blocks in my simulation model

I've used 2 Uniform Random Number blocks in my simulation model, but every time I run the program they generate exactly the same numbers as in the last run. I need to test the model with newly generated numbers. What should I do?
Thanks in advance for your help.
The fact that random number generators generate the same random numbers "from the start" is a feature, not a bug. It allows for reproducible testing. You need to initialize your random number generator with a "random seed" in order to give a different result each time - you could use the current time, for example. When you do, it is recommended that you store the seed used - it means you can go back and run exactly the same code again.
For initializing a random seed, you can use the methods given in this earlier answer.
In that answer, they are setting the seed to 0, which is the opposite of what you are trying to do. You will want to generate a non-random number that changes between runs (like the date and time) and use that. A very useful article can be found here. To quote:
If you look at the output from rand, randi, or randn in a new MATLAB
session, you'll notice that they return the same sequences of numbers
each time you restart MATLAB. It's often useful to be able to reset
the random number generator to that startup state, without actually
restarting MATLAB. For example, you might want to repeat a calculation
that involves random numbers, and get the same result.
They recommend the command
rng shuffle
to generate a new random seed. You can access the seed that was used with
s = rng;
s.Seed
and store that for future use. So if you do
rng shuffle
s = rng;
seedStore = s.Seed;
then the next time you want to reproduce results, you set
rng(seedStore);
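As a quick check of this workflow, the sketch below shuffles, stores the seed and then restores it. The names firstRun and secondRun are illustrative, and how you persist seedStore between sessions (for example in a mat-file) is left out.

```matlab
rng('shuffle');          % seed from the current time, so each run differs
s = rng;                 % read the generator settings right away
seedStore = s.Seed;      % the seed that 'shuffle' picked
firstRun = rand(1, 5);

rng(seedStore);          % later: go back to the stored seed
secondRun = rand(1, 5);  % same five numbers as firstRun
```

This assumes the generator type is not changed between the two rng calls.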

Measuring Frequency of Square wave in MATLAB using USB 1024HLS

I'm trying to measure the frequency of a square wave which is read through a USB 1024 HLS DAQ module in MATLAB. What I've done is create a loop which reads 100 values from the digital input, and that gives me a vector of 0's and 1's. There is also a timer in this loop which measures how long the loop runs.
After getting the vector, I count the number of 1's and then use frequency = num_transitions/time to get the frequency. However, this doesn't seem to work well :( I keep getting different frequencies for different numbers of loop iterations. Any suggestions?
I would suggest trying the following code:
vec = ...(the 100-element vector of digital values)...
dur = ...(the time required to collect the above vector)...
edges = find(diff(vec)); % Finds the indices of transitions between 0 and 1
period = 2*mean(diff(edges)); % Finds the mean period, in number of samples
frequency = 100/(dur*period);
First, the code finds the indices of the transitions from 0 to 1 or 1 to 0. Next, the differences between these indices are computed and averaged, giving the average duration (in number of samples) for the lengths of zeroes and ones. Multiplying this number by two then gives the average period (in number of samples) of the square wave. This number is then multiplied by dur/100 to get the period in whatever the time units of dur are (i.e. seconds, milliseconds, etc.). Taking the reciprocal then gives the average frequency.
One additional caveat: in order to get a good estimate of the frequency, you might have to make sure the 100 samples you collect contain at least a few repeated periods.
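To try the calculation without the hardware, you can feed it a synthetic signal. The sketch below assumes 100 samples collected in exactly 1 second and a square wave with 10 samples per cycle; both numbers are made up for the test.

```matlab
dur = 1;                          % hypothetical capture time in seconds
n   = 0:99;                       % sample index
vec = double(mod(n, 10) < 5);     % 5 samples high, 5 samples low, repeated

edges = find(diff(vec));          % indices of 0->1 and 1->0 transitions
period = 2*mean(diff(edges));     % mean period, in number of samples
frequency = 100/(dur*period);     % samples per second / samples per cycle
```

With these assumptions the result is 10 Hz, which matches 10 cycles in 1 second.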
Functions of interest used above: DIFF, FIND, MEAN
First of all, you have to make sure that your 100 samples contain at least one full period of the signal, otherwise you'll get false results. You need a good compromise between sample rate (i.e. the more samples per period you have, the better the measurement is) and number of samples.
To be really precise, you should either have a timestamp associated with every measurement (as you usually can't be sure that you get equidistant time spacing in the for loop), or perhaps it's possible to switch your USB module into some "running" mode which doesn't get only one sample at a time but a complete waveform with a fixed sample rate.
Concerning the calculation of the frequency, gnovice already pointed out the right way. If you have individual timestamps (in seconds), the following changes are necessary:
tst = ...(the timestamps associated with every sample)...
period = 2*mean(diff(tst(edges)));
frequency = 1/period;
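Here is the timestamp variant run on the same kind of synthetic data. The evenly spaced timestamps (10 ms apart) are an assumption for the example; with real hardware they would come from your own measurements.

```matlab
n   = 0:99;                         % sample index
tst = n * 0.01;                     % hypothetical timestamps in seconds
vec = double(mod(n, 10) < 5);       % synthetic square wave, 10 samples per cycle

edges = find(diff(vec));            % indices of transitions
period = 2*mean(diff(tst(edges)));  % mean period in seconds
frequency = 1/period;               % about 10 Hz for this signal
```

Because the period is computed from the timestamps themselves, uneven loop timing no longer distorts the result as long as each timestamp is accurate.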
I can't figure out the problem, but if the boolean vector were v then,
frequency = sum(v)/time_to_give_me_the_frequency
Based on your description, it doesn't sound like a problem with the software, UNLESS you are using the Windows system timer, which is notoriously inaccurate (it is only accurate to about 15 milliseconds).
There are high-resolution timers available in Windows, but I don't know how to use them in Matlab. If you have access to the .NET framework, the Stopwatch class has 1 microsecond accuracy (or better), as does the QueryPerformanceCounter API in Win32.
Other than that, you might have some jitter. There could be something in your signal chain that is causing false triggers, etc.
UPDATE: The following CodeProject article should solve the timing problem, if there is one. You should check the Matlab documentation of your version of Matlab to see if it has a native high-resolution timer. Otherwise, you can use this:
C++/Mex wrapper adds microsecond resolution timer to Matlab under WinXP
http://www.codeproject.com/KB/cpp/Matlab_Microsecond_Timer.aspx
mersenne31:
Thanks everyone for your responses. I have tried the solutions that gnovice and groovingandi mentioned and I'm sure they will work as soon as the timing issue is solved.
The code I've used is shown below:
for i=1:100
    tic;
    value = getvalue(portCH);
    vector(i) = value(1);
    tst(i) = toc; % gets an individual time sample
end
% to get the total time I put total_time = toc after the for loop
totaltime = sum(tst);
edges = find(diff(vector)); % Finds the indices of transitions between 0 and 1
period = 2*mean(diff(edges)); % Finds the mean period, in number of samples
frequency = 100/(totaltime*period);
The problem is that measuring the time for one sample doesn't really help because it is nearly the same for all samples. What is needed is, as groovingandi mentioned, some "running" mode which reads 100 samples for 3 seconds.
So something like for(3 seconds) and then we do the data capture. But I can't find anything like this. Is there any function in MATLAB that could do this?
This won't answer your question, but it's what I thought of after reading your question. Square waves have infinite bandwidth: the Fourier transform of a single rectangular pulse is sin(x)/x, which extends from -inf to +inf.
Also, try counting only the rising edges in Matlab. You can quantize the signal to just 1 and 0, and then only increment the count when you see a [0 1] slice of your vector.
OR
You can quantize, decimate, then just sum. This will only work if each square pulse is the same length and your sampling frequency is constant. I think this one would be harder to do.
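The rising-edge idea can be sketched in a couple of lines. The signal and the 1-second capture time are made up for the example, and the estimate can be off by up to one cycle depending on where in the waveform the capture starts and stops.

```matlab
dur = 1;                              % hypothetical capture time in seconds
n   = 0:99;                           % sample index
vec = double(mod(n, 10) >= 5);        % synthetic 0/1 signal, starts low

risingEdges = find(diff(vec) == 1);   % positions where a [0 1] slice occurs
numCycles = numel(risingEdges);       % one rising edge per cycle
frequency = numCycles / dur;          % cycles per second
```

For this signal there are 10 rising edges in 1 second, so the estimate is 10 Hz.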