I made a system that detects and counts traffic violations, specifically vehicular obstructions of the pedestrian crossing lane. My inputs are videos. To test the program, I compare the violation count from my manual observation of the video (ground truth) against the violation count from my program.
Example:
Video 1
Ground Truth: 10 violations
Program Count: 8 violations (false accepts: 2, false rejects: 4)
FAR: 2/8 = 25%
FRR: 4/8 = 50%
Overall accuracy: (8 violations - 2 false accepts) / 10 total violations = 60%
Are my computations correct, especially the overall accuracy? Also, what is the formula for the equal error rate (EER)?
FAR and FRR should be computed relative to the number of observations, not the expected number of positive observations.
EDIT
As an example, imagine there have been 100 observations and your program split them into 8 violations (including 2 false accepts) and 92 non-violations (including 4 false rejects), when it should have been 10 violations and 90 non-violations. Then:
FAR = 2/100 = 2%
FRR = 4/100 = 4%
I think the accuracy is correct, as the program has indeed detected 60% of the violations.
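Put as a tiny sketch (mine, just restating the counts from the example above in code):

observations = 100          # total decisions made by the program
false_accepts = 2           # non-violations flagged as violations
false_rejects = 4           # real violations the program missed
true_violations = 10        # ground-truth violations in the video

far = false_accepts / observations          # 2 / 100 = 2%
frr = false_rejects / observations          # 4 / 100 = 4%
detected = true_violations - false_rejects  # 6 violations correctly detected
accuracy = detected / true_violations       # 6 / 10 = 60%

print(far, frr, accuracy)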
I used fast_SIR to simulate 5400 epidemics, 30 on each of my 180 networks of 1000 nodes. I did not specify a node to start as initially infectious, so from my understanding it chooses a node at random as the initial infectious node. I calculated the expected proportion of epidemics of size 1 by considering the cases in which the initial random infectious node does not transmit the infection to any of its neighbours. From my understanding, fast_SIR samples the times to transmission and to recovery from exponential distributions with rates tau and gamma respectively.
P(initial random node does not transmit to any neighbour) = sum_{N=0}^{10} P(initial node has N neighbours) * P(initial node does not transmit to a neighbour)^N
Where P(initial node does not transmit to a neighbour) is the stationary probability for the initial node, (transmission rate)/(transmission rate + recovery rate) = tau/(tau + gamma) = 0.06/(0.06 + 0.076).
I accounted for the probability that the initial random node is an isolated node since not all my networks are fully connected.
The actual proportion of epidemics of size 1 was 0.26, while the expected proportion of epidemics of size 1 using the above formula was 0.095.
I don't know what I am not accounting for in the formula and what other scenarios or cases would lead to epidemics of size 1... Any advice would be appreciated.
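In case it helps to cross-check the numbers, here is a rough sketch (mine, not from the question; it assumes the EoN package and uses a placeholder random graph rather than one of the 180 networks described) that compares the simulated proportion of size-1 epidemics against the degree-distribution formula above:

import networkx as nx
import EoN

tau, gamma = 0.06, 0.076
G = nx.erdos_renyi_graph(1000, 3 / 999)   # placeholder network, not one of the 180

# Simulated proportion of size-1 epidemics; with no initial_infecteds given,
# fast_SIR picks a random initially infectious node.
runs = 500
size_one = 0
for _ in range(runs):
    t, S, I, R = EoN.fast_SIR(G, tau, gamma)
    if R[-1] == 1:                        # only the seed was ever infected
        size_one += 1
print("simulated proportion of size-1 epidemics:", size_one / runs)

# Formula from the question: sum over degrees N of P(N neighbours) * q^N,
# with q the per-neighbour "no transmission" probability used above.
q = tau / (tau + gamma)
degrees = [G.degree(n) for n in G.nodes()]
expected = sum(q ** k for k in degrees) / len(degrees)
print("expected proportion from the formula:", expected)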
I have been reading up on DBI on Wikipedia, which references this research paper: http://www.cs.columbia.edu/~cs4823/handouts/stan-burleson-tvlsi-95.pdf
The paper says:
While the maximum number of transitions is reduced by half the decrease in the average number of transitions is not as good. For an 8-bit bus for example the average number of transitions per time-slot by using the Bus-invert coding becomes 3.27 (instead of 4), or 0.41 (instead of 0.5) transitions per bus-line per time-slot.
However, this would suggest it reduces the entropy of the 8-bit message, no?
So the entropy of a random 8-bit message is 8 bits (duh). Adding a DBI bit shifts the probability distribution to the left, but it (I thought) wouldn't reduce the area under the curve. You should still be left with a minimum of 8 bits of entropy, just spread over 9 bits. But they claim the average is now 0.41 instead of 0.5, which suggests the entropy is now -log2(0.59^9) ≈ 6.85. I would have assumed the average would (at best) become 0.46 (-log2(0.54^9) ≈ 8).
Am I misunderstanding something?
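For what it's worth, the paper's 3.27 figure can be reproduced with a short simulation. The sketch below is mine, under the assumed counting convention that the invert line is a ninth bus line whose toggles are included, and that inversion is chosen to minimise the total number of transitions:

import random

N_BITS = 8
STEPS = 200000

bus = [0] * N_BITS          # currently driven data lines
inv = 0                     # currently driven invert line
total_transitions = 0

for _ in range(STEPS):
    data = [random.randint(0, 1) for _ in range(N_BITS)]
    h = sum(b != d for b, d in zip(bus, data))      # data-line toggles if sent as-is
    cost_plain = h + (inv != 0)                     # data toggles + invert-line toggle
    cost_inverted = (N_BITS - h) + (inv != 1)
    if cost_inverted < cost_plain:
        bus, inv = [1 - d for d in data], 1
        total_transitions += cost_inverted
    else:
        bus, inv = data, 0
        total_transitions += cost_plain

print("average transitions per time-slot:", total_transitions / STEPS)   # ~3.27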
So I'm modelling a production line (simple, with 5 processes which I modelled as Services). I'm simulating one month, and during this month my line stops approximately 50 times (due to machine breakdowns). Each stop lasts between 3 and 60 min, with an average of 12 min (following a triangular distribution). How could I implement this in the model? I'm trying to create an event but can't figure out what type of trigger I should use.
Have your Services require a resource. If they are already seizing a resource like labor, that is OK; they can require more than one. On the ResourcePool, there is an area called "Shifts, breaks, failures, maintenance...". Check "Failures/repairs:" and enter your downtime distribution there.
If you want to use a triangular distribution, you need min/MODE/max, not min/AVERAGE/max. If you really wanted an average of 12 minutes with a minimum of 3 and a maximum of 60, then this is not a triangular distribution: there is no mode that would give you an average of 12.
Average of a triangular distribution, where X is the mode:
(3 + X + 60) / 3 = 12
This means X would have to be -27 - not possible, since the mode of a delay time cannot be negative (or lie outside the 3 to 60 range).
Look at using a different distribution. The exponential is often used for time between failures (or the Poisson for failures per hour).
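A two-line check of that arithmetic (a sketch of mine, not part of the answer):

tri_min, tri_max, target_mean = 3.0, 60.0, 12.0
mode = 3 * target_mean - tri_min - tri_max    # rearranged from (min + mode + max) / 3 = mean
print(mode)                                   # -27.0, outside the valid range [3, 60]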
I have 32 GB of RAM and am training on a large dataset using a Keras sequential neural network on a Windows 7 machine. Because of the size of the dataset, I have opted to use fit_generator, taking in around 5000 samples per batch, each with about 500 features. I have a gc.collect() in the generator to address a potential memory leak, which helped in previous iterations of this model.
For the first few steps of the first epoch, memory consumption is low. Then after around 15 steps, it starts to increase and decrease until eventually it caps off at 27.6 GB.
Can anyone explain why the memory usage increases over time? Also, it's been hundreds of steps for this first epoch, and the memory is still sitting at 27.6 GB. Does this have any significance?
The NN itself is 3 layers deep, with 50 neurons in each. I understand that there are some memory requirements for storing the weights, but would this increase over time?
import pandas as pd

def gen_data(max, rows, skip=0):    # skip starts at 0 unless given
    import gc
    while True:
        # read the next chunk of rows from the CSV, skipping what has already been served
        data = pd.read_csv(csv, skiprows=range(1, skip), nrows=rows, index_col=0)
        x, y = features(data)
        yield x, y
        skip += rows
        if max is not None and skip >= max:
            skip = 0
        gc.collect()                # manual collection that helped with earlier leaks
from keras.models import Sequential
from keras.layers import Dense, Dropout, LeakyReLU

# 3 hidden layers of 50 units with LeakyReLU and dropout, sigmoid output for the binary target
model = Sequential()
model.add(Dense(50, input_dim=train_shape, activation='linear'))
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(Dense(50, input_dim=train_shape, activation='linear'))
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(Dense(50, input_dim=train_shape, activation='linear'))
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

hist = model.fit_generator(gen_data(8000000, 5000), epochs=50,
                           steps_per_epoch=int(8000000 / 5000), verbose=1,
                           callbacks=callbacks_list, class_weight=class_weight,
                           validation_steps=10, validation_data=gen_data(800000, 80000))
-- edit --
When I remove validation_steps and validation_data, the process does not blow up in memory. This seems like odd behavior because I would not expect the validation data to be used until the end of the epoch. Any ideas?
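One way to probe that (my sketch, not part of the original post) is to materialise a single fixed validation chunk up front and pass plain arrays instead of keeping a second generator open during training:

x_val, y_val = next(gen_data(800000, 80000))      # take one validation chunk once

hist = model.fit_generator(gen_data(8000000, 5000), epochs=50,
                           steps_per_epoch=int(8000000 / 5000), verbose=1,
                           callbacks=callbacks_list, class_weight=class_weight,
                           validation_data=(x_val, y_val))   # arrays, no validation generator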
I've read a few ideas on the correct sample size for feed-forward neural networks: 5x, 10x, and 30x the number of weights. That part I'm not overly concerned about; what I am concerned about is whether I can reuse my training data (randomly).
My data is broken up like so:
5 independent vars and 1 dependent var per sample.
I was planning on feeding 6 samples in (6 x 5 = 30 input neurons) and confirming the 7th sample's dependent variable (1 output neuron).
I would train the neural network by running, say, 6 or 7 iterations before trying to predict the next iteration outside of my training data.
Say I have:
each sample = 5 independent variables & 1 dependent variable (6 vars total per sample)
output = just the 1 dependent variable
sample:sample:sample:sample:sample:sample -> output (dependent var)
Training sliding window 1:
Set 1: 1:2:3:4:5:6->7
Set 2: 2:3:4:5:6:7->8
Set 3: 3:4:5:6:7:8->9
Set 4: 4:5:6:7:8:9->10
Set 5: 5:6:7:8:9:10->11
Set 6: 6:7:8:9:10:11->12
Non training test:
7:8:9:10:11:12 -> 13
Training Sliding Window 2:
Set 1: 2:3:4:5:6:7->8
Set 2: 3:4:5:6:7:8->9
...
Set 6: 7:8:9:10:11:12->13
Non Training test: 8:9:10:11:12:13->14
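To make the scheme concrete, here is a small sketch of mine (not the poster's code) that builds one training window from a series of samples; build_window and its arguments are illustrative names:

import numpy as np

def build_window(samples, targets, start, n_sets=6, set_len=6):
    # samples: (T, 5) array of independent vars, targets: (T,) dependent var
    X, y = [], []
    for s in range(start, start + n_sets):
        X.append(samples[s:s + set_len].ravel())   # e.g. samples 1..6 -> 30 inputs
        y.append(targets[s + set_len])             # e.g. sample 7's dependent var
    return np.array(X), np.array(y)

# Window 1 uses sets starting at samples 1..6; window 2 shifts everything by one:
# X1, y1 = build_window(samples, targets, start=0)
# X2, y2 = build_window(samples, targets, start=1)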
I figured I would randomly run through my sets per training iteration, say 30 times the number of my weights. I believe my network has about 6 hidden neurons (i.e. sqrt(inputs * outputs)). So 36 + 6 + 1 + 2 bias = 45 weights, and 45 x 30 = 1350 runs?
So I would do a randomization of the 6 sets roughly 1350 times per training sliding window.
Due to the small amount of data, I figured I would do simulation runs (i.e. rerun the same problem with new weights), say 1000 of them, in each of which I do 1140 runs over the sliding window using randomization.
I have 113 samples, which results in 101 training sliding windows.
Another question I have: I'm trying to predict up or down movement (i.e. the dependent variable). Should I match to an actual number, or just whether I guessed the up/down movement correctly? I'm thinking I should shoot for an actual number, but as part of my analysis also do a percentage check on whether that number is guessed correctly as up/down.
If you have a small amount of data and a comparatively large number of training iterations, you run the risk of "overtraining": creating a function which works very well on your training data but does not generalize.
The best way to avoid this is to acquire more training data! But if you cannot, then there are two things you can do. One is to split the training data into training and verification sets, using say 85% to train and 15% to verify. Verification means computing the fitness of the learner on the verification data without adjusting the weights. When the verification fitness (which you are not training on) stops improving - in general it will be noisy - while your training fitness continues improving, stop training. If on the other hand you use a "sliding window", you may not have a good criterion for when to stop training: the fitness function will bounce around in unpredictable ways. (You might slowly reduce the effect each training iteration has on the parameters to force convergence; maybe not the best approach, but some training regimes do this.) The other thing you can do is normalize your nodes' weights via some metric to ensure some notion of 'smoothness': if you visualize overfitting for a second, you'll see that in the extreme case your fitness function curves sharply around the positives in your dataset.
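As a concrete illustration of that train/verify split (my sketch, using Keras since that is what the other code in this thread uses; x_train, y_train, the 15% split and the patience value are just placeholders):

from keras.callbacks import EarlyStopping

stop_early = EarlyStopping(monitor='val_loss',   # fitness on the held-out data
                           patience=10)          # tolerate a noisy plateau before stopping

model.fit(x_train, y_train,
          validation_split=0.15,                 # hold out 15% for verification
          epochs=1000,
          callbacks=[stop_early])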
As for the latter question - for the training to converge, your fitness function needs to be smooth. If you were to use binary, all-or-nothing fitness terms, most likely whatever algorithm you use to train (backprop, BFGS, etc.) would not converge. In practice, the classification criterion should be an activation that is above a threshold for a positive result, at or below it for a negative result, and that varies smoothly in your weight/parameter space. You can think of 0 as "I am certain that the answer is up" and 1 as "I am certain that the answer is down", and so build a fitness function that assigns a higher "cost" to incorrect guesses that were more certain. There are subtleties in how the function is shaped (for example, you might have different ideas about how acceptable a false negative and a false positive are), and you may also introduce an "uncertain" region where the result is closer to "zero weight" - but it should certainly be continuous/smooth.
You can re-use sliding windows.
It's basically the same concept as bootstrapping (your training set), which in itself reduces training time, but I don't know if it's really helpful in making the net more adaptive to anything other than the training data.
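A tiny illustration (mine, not the poster's code) of that bootstrap-style reuse: resample the six training sets of one window with replacement on each pass.

import random

sets = list(range(6))            # indices of the 6 sets in one window
n_passes = 1000                  # placeholder; the question scales this by the weight count
for _ in range(n_passes):
    resampled = [random.choice(sets) for _ in sets]   # sample with replacement
    # ...train on the sets in this resampled order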
Below is an example of a sliding window in pictorial format (using spreadsheet magic)
http://i.imgur.com/nxhtgaQ.png
https://github.com/thistleknot/FredAPI/blob/05f74faf85d15f6898aa05b9b08d5363fe27c473/FredAPI/Program.cs
Line 294 shows how the code is run using randomization; it resets the randomization at position 353 so the rest flows as normal.
I was also able to use a 1 (up) or 0 (down) as my target values and the network did converge.