Should the output of backpropagation converge to 1 given the output is (0,1) - neural-network

I am currently trying to understand the ANN that I created for an assignment that essentially takes grayscale (0-150) images (120x128) and determines whether the person is male or female. It works for the most part. I am treating this as a boolean problem where the output is (Male = 1, Female = 0). I am able to get the ANN to correctly identify male or female. However, the outputs I am getting for the males are 0.3-0.6 depending on the run. Should I be getting a value of ~1 out?
I am using a sigmoid unit, 1/(1+e^-y), and have tried taking the inverse. I have tried this using 5-60 hidden units on 1 layer, and tried 2 outputs with flip-flop results. I want to understand this so that I can apply it to a non-boolean problem, i.e. if I want a numerical output, how would I go about doing that, or am I using the wrong machine learning technique?

You can use a binary function at the output with some threshold. Assuming you have assigned 0 for female and 1 for male in training, while testing you will get values in between 0 and 1 (and, if your output unit were linear rather than sigmoid, sometimes below 0 or above 1). So to make a decision at the output, just add a threshold of 0.5 and check the output value: if it is less than 0.5, the estimated class is female, and if it is equal to or greater than 0.5, the estimated class is male.
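For illustration, a minimal Python sketch of that thresholding step (the output values here are made up):

import numpy as np

outputs = np.array([0.3, 0.6, 0.45, 0.8])  # hypothetical sigmoid outputs
labels = (outputs >= 0.5).astype(int)      # 1 = male, 0 = female
print(labels)                              # [0 1 0 1]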

Related

Simulating first come first serve agent distribution

I currently have agents of type patient seizing an exam room when it becomes available. Then, to mimic a first come, first served system where different healthcare practitioners see the patient, I have utilized a SelectOutputIn and 4 SelectOutputOut blocks corresponding to the 4 different practitioners that can see the patient. Each SelectOutputOut block has the same probability corresponding to its resource type. The problem arises when all practitioners are busy: it seems to send patients only to the Physio path and overload it. Is this because it is physically the last block in the order? How can I make the distribution of patients random even if all practitioners are busy?
That is correct: if all probabilities are zero, AnyLogic seems to pick the last option. To address your issue, we need to add a condition for when all practitioners are busy, so each probability field will have 3 possible outcomes, as follows:
Surgeons.idle() + Fellows.idle() + Residents.idle() + Physios.idle() == 0 ? 0.25 : Surgeons.idle() > 0 ? 0.25 : 0
The difference between the first and the second 0.25 is that in the first case all ports will have a 0.25 probability, whereas in the second case 3 of 4, 2 of 4, or even 1 of 4 ports will have a 0.25 probability, depending on how many resources are available. AnyLogic normalizes the probabilities, so if two ports have a 0.25 probability, it's like saying there's a 50/50 chance.
Finally, if the code seems too long, you can replace the first part with a function to keep the code cleaner and shorter.
The function body could be:
return Surgeons.idle() + Fellows.idle() + Residents.idle() + Physios.idle();
Assuming you named the function function, you can simplify the code in the probability field as follows:
function() == 0 ? 0.25 : Surgeons.idle() > 0 ? 0.25 : 0
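For illustration, here is a minimal Python sketch of the normalization idea (this is not AnyLogic's actual code, just the behavior described above):

import random

def pick_port(weights):
    # Weights of [0.25, 0, 0.25, 0] behave like a 50/50 split
    # between ports 0 and 2 after normalization.
    total = sum(weights)
    r = random.uniform(0, total)
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1  # all-zero weights fall through to the last port

Note that when every weight is zero the function falls through to the last port, which matches the behavior you observed with the Physio path.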

System Dynamics simulation - Translating Stella into AnyLogic syntax

I modelled the following logic in Stella:
(IF "cause" > 0 THEN MONTECARLO("probabilityofconsequence") ELSE 0
But I'm not getting the correct syntax in AnyLogic:
(cause > 0) ? (uniform() < probabilityofconsequence) ? 1 : 0 : 0
Any ideas?
Disclaimer:
What Stella's MONTECARLO function does is generate a series of zeros and ones from a Bernoulli distribution based on the probability provided. The probability is the percentage probability of an event happening per DT, divided by DT (it is similar to, but not the same as, the percent probability of an event per unit time). The probability value can be either a variable or a constant, but should evaluate to a number between 0 and 100/DT (numbers outside the range will be set to 0 or 100/DT). The expected value of the stream of numbers generated, summed over a unit of time, is equal to probability/100.
MONTECARLO is equivalent to the following logic:
IF UNIFORM(0, 100, <seed>) < probability*DT THEN 1 ELSE 0
The equivalent in AnyLogic should be:
(cause > 0 && uniform(0, 100) < probability*DT) ? 1 : 0
You need to create a variable called DT that is equal either to the fixed time step you have chosen in your model configuration, or to whatever value you consider adequate. Since AnyLogic, depending on how you run the model, does not treat the fixed time step as fixed, you need to define DT yourself.
No matter what, you will probably get results that are not exactly equal to Stella's, since the time steps are not necessarily the same... but maybe similar enough to satisfy you.
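A minimal Python sketch of that per-step logic (the DT and probability values are hypothetical):

import random

DT = 0.25            # hypothetical fixed time step
probability = 20.0   # percent probability per unit time, in [0, 100/DT]

def montecarlo_step(cause):
    # Emits 1 with probability (probability * DT) / 100 whenever cause > 0.
    if cause <= 0:
        return 0
    return 1 if random.uniform(0, 100) < probability * DT else 0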

Modeling an hrf time series in MATLAB

I'm attempting to model fMRI data so I can check the efficacy of an experimental design. I have been following a couple of tutorials and have a question.
I first need to model the BOLD response by convolving a stimulus input time series with a canonical haemodynamic response function (HRF). The first tutorial I checked said that one can make an HRF that is of any amplitude as long as the 'shape' of the HRF is correct, so they created the following HRF in MATLAB:
hrf = [ 0 0 1 5 8 9.2 9 7 4 2 0 -1 -1 -0.8 -0.7 -0.5 -0.3 -0.1 0 ]
And then convolved the HRF with the stimulus by just using 'conv' so:
hrf_convolved_with_stim_time_series = conv(input,hrf);
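(As a cross-check, the same step in Python/NumPy with a hypothetical boxcar stimulus; np.convolve plays the role of MATLAB's conv:)

import numpy as np

hrf = np.array([0, 0, 1, 5, 8, 9.2, 9, 7, 4, 2, 0, -1, -1,
                -0.8, -0.7, -0.5, -0.3, -0.1, 0])
stim = np.zeros(100)               # hypothetical stimulus time series
stim[10:22] = 1                    # a 12-sample stimulus block
bold = np.convolve(stim, hrf)      # predicted BOLD response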
This is very straightforward, but I want my model to eventually be as accurate as possible, so I checked a more advanced tutorial, and they did the following. First they created a vector of 20 timepoints, then used the 'gampdf' function to create the HRF:
t = 1:1:20; % MEASUREMENTS
h = gampdf(t,6) + -.5*gampdf(t,10); % HRF MODEL
h = h/max(h); % SCALE HRF TO HAVE MAX AMPLITUDE OF 1
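(For reference, scipy.stats.gamma.pdf(t, a) in Python computes the same density as MATLAB's gampdf(t, a), with shape parameter a and scale 1, so the model above can be reproduced as:)

import numpy as np
from scipy.stats import gamma

t = np.arange(1, 21)                           # 20 time points, TR = 1
h = gamma.pdf(t, 6) - 0.5 * gamma.pdf(t, 10)   # positive peak minus undershoot
h = h / h.max()                                # scale to max amplitude of 1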
Is there a benefit to doing it this way over the simpler one? I suppose I have 3 specific questions.
The 'gampdf' help page is super short and only says that the '6' and '10' in each function call represent 'A', which is a 'shape' parameter. What does this mean? It gives no other information. Why is it 6 in the first call and 10 in the second?
This question is directly related to the above one. This code is written for a situation where there is a TR = 1 and the stimulus is very short (like 1s). In my situation my TR = 2 and my stimulus is quite long (12s). I tried to adapt the above code to make a working HRF for my situation by doing the following:
t = 1:2:40; % 2s timestep with the 40 to try to equate total time to above
h = gampdf(t,6) + -.5*gampdf(t,10); % HRF MODEL
h = h/max(h); % SCALE HRF TO HAVE MAX AMPLITUDE OF 1
Because I have no idea what the 'gampdf' parameters mean (or what that line does, in all actuality), I'm not sure this gives me what I'm looking for. I essentially get out 20 values, where values 1-14 have SOME numeric value in them but 15-20 are all 0. I'm assuming there will be a response during the entire 12 s stimulus period (the first 6 TRs, so values 1-6) with the appropriate rectification, which could be the rest of the values, but I'm not sure.
Final question. The other code does not 'scale' the HRF to have an amplitude of 1. Will that matter, ultimately?
The canonical HRF you choose is dependent upon where in the brain the BOLD signal is coming from. It would be inappropriate to choose just any HRF. Your best source of a model is going to come from a lit review. I've linked a paper discussing the merits of multiple HRF models. The methods section brings up some salient points.

MSE in neuralnet results and ROC curve of the results

Hi, my question is a bit long; please bear with me and read it till the end.
I am working on a project with 30 participants. We have two types of data set (the first data set has 30 rows and 160 columns, and the second data set has the same 30 rows and 200 columns as outputs = y, and these outputs are independent). What I want to do is use the first data set to predict the outputs of the second data set. As the first data set was rectangular and had high dimension, I used factor analysis and now have 19 factors that cover up to 98% of the variance. Now I want to use these 19 factors for predicting the outputs of the second data set.
I am using neuralnet and backpropagation, and everything goes well; my results are really close to the outputs.
My questions :
1- As my inputs are the factors (they are between -1 and 1) and my outputs are integers on a scale between 4 and 10000, should I still scale them before running the neural network?
2- I scaled the data (both inputs and outputs) and then predicted with neuralnet, then I checked the MSE; it was very high, around 6000, while my prediction and the real output are very close to each other. But if I rescale the prediction and outputs and then check the MSE, it is near zero. Is it unbiased to rescale and then check the MSE?
3- I read that it is better not to scale the output from the beginning, but if I scale only the inputs, all my predictions are 1. Is it correct not to scale the outputs?
4- If I want to plot the ROC curve, how can I do it, given that my results are never exactly equal to the real outputs?
Thank you for reading my question
[edit#1]: There is a publication on how to produce ROC curves using neural network results
http://www.lcc.uma.es/~jja/recidiva/048.pdf
1) You can scale your values (using min-max, for example). But only fit the scaling on your training data set. Save the parameters used in the scaling process (in min-max scaling they would be the min and max values by which the data is scaled). Only then can you scale your test data set WITH the min and max values you got from the training data set. Remember, with the test data set you are trying to mimic the process of classifying unseen data. Unseen data is scaled with the scaling parameters from the training data set.
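A minimal Python sketch of that idea (the array shapes are hypothetical):

import numpy as np

X_train = np.random.rand(25, 19)   # hypothetical training inputs
X_test = np.random.rand(5, 19)     # hypothetical test inputs

lo = X_train.min(axis=0)           # scaling parameters come from
hi = X_train.max(axis=0)           # the training set only
X_train_scaled = (X_train - lo) / (hi - lo)
X_test_scaled = (X_test - lo) / (hi - lo)   # reuse the training min/max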
2) When talking about errors, do mention which data set the error was computed on. You can compute an error function (in fact, there are different error functions; one of them is the mean squared error, or MSE) on the training data set, and another for your test data set.
4) Think about this: let's say you train a network with the training data set, and it only has 1 neuron in the output layer. Then you present it with the test data set. Depending on which transfer function (activation function) you use in the output layer, you will get a value for each exemplar. Let's assume you use a sigmoid transfer function, whose max and min values are 1 and 0. That means the predictions will be limited to values between 0 and 1.
Let's also say that your target labels ("truth") contain only the discrete values 0 and 1 (indicating which class each exemplar belongs to).
targetLabels=[0 1 0 0 0 1 0 ];
NNprediction=[0.2 0.8 0.1 0.3 0.4 0.7 0.2];
How do you interpret this?
You can apply a hard-limiting function such that the NNprediction vector only contains the discrete values 0 and 1. Let's say you use a threshold of 0.5:
NNprediction_thresh_0.5 = [0 1 0 0 0 1 0];
vs.
targetLabels =[0 1 0 0 0 1 0];
With this information you can compute your False Positives, FN, TP, and TN (and a bunch of additional derived metrics such as True Positive Rate = TP/(TP+FN) ).
If you had a ROC curve showing the False Positive Rate vs. the True Positive Rate, this would be a single point in the plot. However, if you vary the threshold in the hard-limit function, you can get all the values you need for a complete curve.
Makes sense? See the dependencies of one process on the others?
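A short Python sketch of that threshold sweep, using the example vectors above:

import numpy as np

targetLabels = np.array([0, 1, 0, 0, 0, 1, 0])
NNprediction = np.array([0.2, 0.8, 0.1, 0.3, 0.4, 0.7, 0.2])

for thresh in np.linspace(0, 1, 11):
    hard = (NNprediction >= thresh).astype(int)    # hard-limit at this threshold
    TP = np.sum((hard == 1) & (targetLabels == 1))
    FP = np.sum((hard == 1) & (targetLabels == 0))
    FN = np.sum((hard == 0) & (targetLabels == 1))
    TN = np.sum((hard == 0) & (targetLabels == 0))
    print(thresh, TP / (TP + FN), FP / (FP + TN))  # threshold, TPR, FPR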

Output Value Of Neural Network Does Not Reach Desired Values

I made a neural network that also has back propagation. It has 5 nodes in the input layer, 6 nodes in the hidden layer, and 1 node in the output layer, with random initial weights, and I use sigmoid as the activation function.
I have two sets of data for input.
for example :
13.5 22.27 0 0 0 desired value=0.02
7 19 4 7 2 desired value=0.03
Now I train the network with 5000 iterations, or the iteration will stop if the error value (desired - calculated output value) is less than or equal to 0.001.
The output value of the first iteration for each input set is about 60, and it decreases in each iteration.
Now the problem is that the second set of inputs (which has a desired value of 0.03) causes the iteration to stop because of a calculated output value of 3.001, but the first set of inputs did not arrive at its desired value (which is 0.02); its output is about 0.03.
EDITED :
I used the LMS algorithm and changed the error threshold to 0.00001 to find the correct error value, but now the output value of the last iteration for both the 0.03 and 0.02 desired values is between 0.023 and 0.027, which is still incorrect.
For your error-value stopping threshold, you should take the error over one epoch (the sum of the errors across your whole dataset) and not only the error on one member of your dataset. With this you will have to increase the value of your error threshold, but it will force your neural network to do a good classification on all your examples and not only on some of them.
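A Python sketch of that stopping criterion (network.forward and network.backward are placeholders for your own training-step code):

def train(network, dataset, max_epochs=5000, threshold=0.001):
    # Stop on the error summed over the whole epoch, not on a single example.
    for epoch in range(max_epochs):
        epoch_error = 0.0
        for inputs, desired in dataset:
            output = network.forward(inputs)   # hypothetical API
            network.backward(desired)          # hypothetical API (weight update)
            epoch_error += (desired - output) ** 2
        if epoch_error <= threshold:
            break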