Not getting the expected proportion of epidemics of size 1 using fast_SIR - networkx

I used fast_SIR to simulate 5400 epidemics, 30 on each of my 180 networks of 1000 nodes. I did not specify a node to start as initially infectious, so from my understanding it chooses a node at random as initially infectious. I calculated the expected proportion of epidemics of size 1 by considering the cases in which the initial random infectious node does not transmit the infection to any of its neighbours. From my understanding, fast_SIR samples transmission and recovery times from exponential distributions with rates tau and gamma respectively.
P(initial random node does not transmit to any of its N neighbours) = sum_{N=0}^{10} P(initial node has N neighbours) * P(initial node does not transmit to a given neighbour)^N
where P(initial node does not transmit to a given neighbour) is the stationary distribution for the initial node, (transmission rate)/(transmission rate + recovery rate) = tau/(gamma + tau) = 0.06/(0.06 + 0.076).
I accounted for the probability that the initial random node is an isolated node since not all my networks are fully connected.
The actual proportion of epidemics of size 1 was 0.26, whereas the expected proportion from the above formula was 0.095.
I don't know what I am not accounting for in the formula and what other scenarios or cases would lead to epidemics of size 1... Any advice would be appreciated.
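For reference, here is roughly how I run the simulations and evaluate the formula (a minimal sketch assuming the EoN package, which provides fast_SIR for networkx graphs; the G(n, p) graph below is only a placeholder for my actual 180 networks):
import networkx as nx
import EoN
from collections import Counter

tau, gamma = 0.06, 0.076                 # transmission and recovery rates from above
G = nx.gnp_random_graph(1000, 3 / 999)   # placeholder graph; each of my 180 networks goes here

# empirical proportion of epidemics of final size 1 (30 runs per network)
runs = 30
size_one = 0
for _ in range(runs):
    t, S, I, R = EoN.fast_SIR(G, tau, gamma)   # no initial_infecteds -> one random node is seeded
    if R[-1] == 1:                             # everyone eventually recovers, so R[-1] is the final size
        size_one += 1
print("simulated proportion of size-1 epidemics:", size_one / runs)

# expected proportion as in the formula above: sum over the degree distribution of
# P(k neighbours) * q**k, with q the per-neighbour probability of no transmission
q = tau / (tau + gamma)                  # the value used in my calculation above
degree_counts = Counter(d for _, d in G.degree())
expected = sum(cnt / G.number_of_nodes() * q ** k for k, cnt in degree_counts.items())
print("expected proportion from the formula:", expected)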

Related

Lottery Ticket Hypothesis - Iterative Pruning

I was reading about The Lottery Ticket Hypothesis and it was mentioned in the paper:
we focus on iterative pruning, which repeatedly trains, prunes, and resets the network over n rounds; each round prunes (p^(1/n))% of the weights that survive the previous round.
Can someone please explain this, say with numbers for each round, when n = 5 (rounds) and the desired final sparsity p = 70%?
In this example, the numbers I computed are as follows:
Round (p^(1/n))% of weights pruned
1 0.93114999
2 0.86704016
3 0.80734437
4 0.75175864
5 0.7
According to these calculations, it seems that the first round prunes approximately 93.11% of the weights, whereas the fifth round prunes 70% of the weights. It's as if, as the rounds progress, the percentage of weights being pruned decreases.
What am I doing wrong?
Thanks!
You are using p^(1/n). As you increase n after each iteration, your p^(1/n) term decreases!
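For reference, the values in the question's table can be reproduced with a couple of lines of Python (using p = 0.7 and n = 5 from the question); each row equals p^(i/n) for round i:
p, n = 0.7, 5
for i in range(1, n + 1):
    print(i, p ** (i / n))   # 0.9311..., 0.8670..., 0.8073..., 0.7517..., 0.7 -- matches the table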

What do you do if the sample size for an A/B test is larger than the population?

I have a list of 7337 customers (selected because they only had one booking from March-August 2018). We are going to contact them and are trying to test the impact of these activities on their sales. The idea is that contacting them will cause them to book more and increase the sales of this largely inactive group.
I have to set up an A/B test and am currently stuck on the sample size calculation.
Here's my sample data:
[sample data screenshot omitted]
The first column is their IDs and the second column is the total sales for this group for 2 weeks in January (I took 2 weeks because the customers in this group purchase very infrequently).
The metric I settled on was revenue per customer (RPC = total revenue / total customers), so I can take into account both the number of orders and the average order value of the group.
The RPC for this group is $149,482.70 / 7337 = $20.4.
I'd like to be able to detect at least a 5% increase in this metric at 80% power and 5% significance level. First I calculated the effect size.
Standard Deviation of the data set = 153.9
Effect Size = (1.05*20.4-20.4)/153.9 = 0.0066
I then used the pwr package in R to calculate the sample size.
pwr.t.test(d=0.0066, sig.level=.05, power = .80, type = 'two.sample')

     Two-sample t test power calculation

              n = 360371.048
              d = 0.0066
      sig.level = 0.05
          power = 0.8
    alternative = two.sided
The sample size I am getting, however, is 360,371. This is larger than the size of my population (7337).
Does this mean I cannot run my test at sufficient power? The only way I can see to lower the sample size without compromising on significance or power is to increase the effect size, i.e. aim to detect a minimum increase of 50%, which would give me n = 3582.
That sounds like a pretty high impact, and I'm not sure an impact that high is reasonable to expect.
Does this mean I can't run an A/B test here to measure impact?
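For reference, the same calculation can be reproduced, and inverted, in Python with statsmodels (a sketch using my d = 0.0066 and capping each arm at 7337/2 customers):
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# per-arm sample size needed to detect d = 0.0066 at alpha = 0.05, power = 0.80
n_needed = analysis.solve_power(effect_size=0.0066, alpha=0.05, power=0.80,
                                ratio=1.0, alternative='two-sided')
print(n_needed)          # ~360,371 per arm, matching the pwr.t.test output above

# conversely: the smallest standardized effect detectable with only 7337/2 customers per arm
d_min = analysis.solve_power(nobs1=7337 / 2, alpha=0.05, power=0.80,
                             ratio=1.0, alternative='two-sided')
print(d_min, d_min * 153.9)   # ~0.065 SDs, i.e. roughly a $10 lift on a $20.4 RPC (about 50%)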

Simulating 'RSSI for LTE network' in matlab

The definition of RSSI is 'total received wide-band power by the UE'.
I am confused about what is meant by wide-band here.
My understanding is as follows:
a. If the carrier bandwidth of the LTE channel is 10 MHz, the total bandwidth is 10 MHz and hence RSSI is calculated over all the resource blocks, i.e. 50 RB.
b. Each RB has 12 subcarriers. Hence for a 10 MHz channel, 50 RB are dedicated => 12 x 50 = 600 subcarriers.
c. Finally (assuming the same Pt for all subcarriers), Pr = Pt * (c / (4*pi*d*F))^2, where Pr is the received power per subcarrier at the UE. What F should I put here, and is this the right way to calculate Pr? In my opinion F = 15 kHz.
d. Finally, RSSI = Pr*12*50 [or, in dBm, 10*log10(Pr) + 10*log10(12*50)].
Or is RSSI calculated in a different way in simulation? How can I build a simulation model to calculate RSSI? Do I have to model a complete RB with all resource elements (RE)? If yes, do I also need to do the scheduling?
How can I calculate SINR with RSSI?
Thanks
Shan
RSSI is computed by integrating the power density over the bandwidth of interest. So for the same average power density, 5 MHz LTE versus 10 MHz LTE will show a 3 dB difference in RSSI.
RSSI will also vary with data/control traffic, since it considers all subcarriers. So yes, what the scheduler is doing matters.
RSSI alone is insufficient to compute SINR; you will need RSRP as well (or a derived quantity like RSRQ).
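As a rough numerical illustration of the "integrate over the bandwidth of interest" point (a sketch assuming free-space Friis path loss, equal power on every subcarrier and every RE occupied; these are simplifying assumptions with placeholder values, not a full LTE link-level model):
import math

c = 3e8                # speed of light, m/s
f_c = 2.1e9            # RF carrier frequency (Friis uses this, not the 15 kHz subcarrier spacing)
d = 500.0              # eNB-to-UE distance in metres (placeholder)
pt_sc_w = 20.0 / 600   # transmit power per subcarrier (placeholder: ~43 dBm total split over 600 subcarriers)

# free-space Friis per subcarrier: Pr = Pt * (c / (4*pi*d*f))^2
pr_sc_w = pt_sc_w * (c / (4 * math.pi * d * f_c)) ** 2

def rssi_dbm(n_rb, pr_w=pr_sc_w):
    # sum the received power over all 12 * n_rb subcarriers, then convert watts to dBm
    return 10 * math.log10(pr_w * 12 * n_rb * 1000)

print(rssi_dbm(25))   # 5 MHz  -> 25 RB
print(rssi_dbm(50))   # 10 MHz -> 50 RB, ~3 dB higher for the same power density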

Tensorflow Inception Multiple GPU Training Loss is not Summed?

I am trying to go through TensorFlow's Inception code for multiple GPUs (on one machine). I am confused because, as I understand it, we get multiple losses from the different towers (i.e. the GPUs), but the loss variable that gets evaluated seems to be only that of the last tower, not a sum of the losses from all towers:
for step in xrange(FLAGS.max_steps):
  start_time = time.time()
  _, loss_value = sess.run([train_op, loss])
  duration = time.time() - start_time
Where loss was last defined specifically for each tower:
for i in xrange(FLAGS.num_gpus):
  with tf.device('/gpu:%d' % i):
    with tf.name_scope('%s_%d' % (inception.TOWER_NAME, i)) as scope:
      # Force all Variables to reside on the CPU.
      with slim.arg_scope([slim.variables.variable], device='/cpu:0'):
        # Calculate the loss for one tower of the ImageNet model. This
        # function constructs the entire ImageNet model but shares the
        # variables across all towers.
        loss = _tower_loss(images_splits[i], labels_splits[i], num_classes,
                           scope)
Could someone explain where the step is that combines the losses from the different towers? Or are we simply using a single tower's loss as representative of the other towers' losses as well?
Here's the link to the code:
https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py#L336
For monitoring purposes, assuming all towers work as expected, a single tower's loss is as representative as the average of all towers' losses. This is because there is no relation between a batch and the tower it is assigned to.
But the train_op uses gradients from all towers (see lines 263 and 278), so technically training takes the batches from all towers into account, as it should.
Note that the average of the losses will have lower variance than a single tower's loss, but they have the same expectation.
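Schematically, the piece that does combine the towers is the gradient averaging around the lines referenced above; a condensed sketch of that usual multi-tower pattern (not the exact inception_train.py code) looks like this:
import tensorflow as tf

def average_gradients(tower_grads):
  # tower_grads: list with one entry per tower, each a list of (gradient, variable) pairs
  average_grads = []
  for grad_and_vars in zip(*tower_grads):          # group the same variable across all towers
    grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
    grad = tf.reduce_mean(tf.concat(grads, 0), 0)  # average that variable's gradient over towers
    average_grads.append((grad, grad_and_vars[0][1]))
  return average_grads

# grads = average_gradients(tower_grads)
# train_op = opt.apply_gradients(grads, global_step=global_step)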
Yes, according to this code, losses are not summed or averaged across GPUs. The loss per GPU is used inside each GPU (tower) for its gradient calculation; only the gradients are synchronized. So the isnan test is only done on the portion of data processed by the last GPU. This is not crucial, but it can be a limitation.
If you really need it, I think you can do the following to get the loss averaged across GPUs:
per_gpu_loss = []
for i in xrange(FLAGS.num_gpus):
  with tf.device('/gpu:%d' % i):
    with tf.name_scope('%s_%d' % (inception.TOWER_NAME, i)) as scope:
      ...
      per_gpu_loss.append(loss)

mean_loss = tf.reduce_mean(per_gpu_loss, name="mean_loss")
tf.summary.scalar('mean_loss', mean_loss)

and then replace loss in sess.run with mean_loss:

_, loss_value = sess.run([train_op, mean_loss])
loss_value is now the average of the losses computed by all the GPUs.

Re-Use Sliding Window data for Neural Network for Time Series?

I've read a few ideas about the correct sample size for feed-forward neural networks: 5x, 10x, or 30x the number of weights. That part I'm not overly concerned about; what I am concerned about is whether I can reuse my training data (randomly).
My data is broken up like so:
5 independent vars and 1 dependent var per sample.
I was planning on feeding 6 samples in (6 x 5 = 30 input neurons) and confirming the 7th sample's dependent variable (1 output neuron).
I would train the neural network by running, say, 6 or 7 iterations before trying to predict the next iteration outside of my training data.
Say I have
each sample = 5 independent variables & 1 dependent variable (6 vars total per sample)
output = just the 1 dependent variable
sample:sample:sample:sample:sample:sample->output(dependent var)
Training sliding window 1:
Set 1: 1:2:3:4:5:6->7
Set 2: 2:3:4:5:6:7->8
Set 3: 3:4:5:6:7:8->9
Set 4: 4:5:6:7:8:9->10
Set 5: 5:6:7:8:9:10->11
Set 6: 6:7:8:9:10:11->12
Non training test:
7:8:9:10:11:12 -> 13
Training Sliding Window 2:
Set 1: 2:3:4:5:6:7->8
Set 2: 3:4:5:6:7:8->9
...
Set 6: 7:8:9:10:11:12->13
Non Training test: 8:9:10:11:12:13->14
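To make the windowing concrete, here is a small sketch of how these sets could be generated (numpy; data_X and data_y are placeholders for the 113 samples mentioned further down):
import numpy as np

WINDOW = 6                          # samples fed as input (6 x 5 = 30 input values)
data_X = np.random.rand(113, 5)     # placeholder: 113 samples, 5 independent vars each
data_y = np.random.rand(113)        # placeholder: the dependent var for each sample

def make_window(start):
    # builds the 6 training sets plus the non-training test pair for one sliding window
    sets = []
    for s in range(start, start + WINDOW):
        x = data_X[s:s + WINDOW].reshape(-1)     # 30 inputs (6 samples x 5 vars)
        y = data_y[s + WINDOW]                   # the following sample's dependent var
        sets.append((x, y))
    test_x = data_X[start + WINDOW:start + 2 * WINDOW].reshape(-1)
    test_y = data_y[start + 2 * WINDOW]
    return sets, (test_x, test_y)

train_sets, test_pair = make_window(0)   # sliding window 1: sets 1..6, test 7..12 -> 13
# start = 0 .. 100 gives the 101 sliding windows mentioned below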
I figured I would randomly run through my sets per training iteration, say 30 times the number of my weights. I believe my network has about 6 hidden neurons (i.e. sqrt(inputs*outputs)). So 36 + 6 + 1 + 2 bias = 45 weights. So 44 x 30 = 1200 runs?
So I would do a randomization of the 6 sets 1200 times per training sliding window.
I figured that, due to the small amount of data, I was going to do simulation runs (i.e. rerun the same problem with new weights), so say 1000 times, in which I do 1140 runs over the sliding window using randomization.
I have 113 samples, which results in 101 training sliding windows.
Another question I have: if I'm trying to predict up or down movement (i.e. the dependent variable), should I match an actual number, or just whether I guessed the up/down movement correctly? I'm thinking I should shoot for an actual number, but as part of my analysis do a % check on whether the up/down direction is guessed correctly.
If you have a small amount of data and a comparatively large number of training iterations, you run the risk of "overtraining": creating a function which works very well on your training data but does not generalize.
The best way to avoid this is to acquire more training data! But if you cannot, then there are two things you can do.
One is to split the data into training and verification sets, using say 85% to train and 15% to verify. Verification means computing the fitness of the learner on the verification set without adjusting the weights, i.e. without training on it. When the verification fitness (which you are not training on) stops improving (in general it will be noisy) while your training fitness continues improving, stop training. If on the other hand you use a "sliding window", you may not have a good criterion for when to stop training: the fitness function will bounce around in unpredictable ways. (You might slowly reduce the effect of each training iteration on the parameters to get convergence; maybe not the best approach, but some training regimes do this.)
The other thing you can do is normalize your nodes' weights via some metric to ensure some notion of 'smoothness': if you visualize overfitting for a second, you'll find that in the extreme case your fitness function curves sharply around your dataset's positives...
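A minimal sketch of that train/verify split and stopping rule, with a plain linear model standing in for the network and synthetic data, just to show the mechanics:
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))                          # synthetic inputs (30 features, like 30 input neurons)
y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=100)     # synthetic target

split = int(0.85 * len(X))                              # ~85% train / 15% verification
X_tr, y_tr, X_ve, y_ve = X[:split], y[:split], X[split:], y[split:]

w = np.zeros(30)
best_err, best_w, patience = np.inf, w.copy(), 0
for epoch in range(10000):
    w -= 0.01 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)  # one training step, training data only
    verify_err = np.mean((X_ve @ w - y_ve) ** 2)        # fitness on the held-out verification part
    if verify_err < best_err:
        best_err, best_w, patience = verify_err, w.copy(), 0
    else:
        patience += 1
        if patience >= 20:                              # verification stopped improving: stop training
            break
w = best_w                                              # keep the weights from the best verification point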
As for the latter question: for the training to converge, your fitness function needs to be smooth. If you were to use binary, all-or-nothing fitness terms, most likely whatever algorithm you are using to train (backprop, BFGS, etc.) would not converge. In practice, the classification criterion should be an activation that is above a threshold for a positive result, less than or equal to it for a negative result, and varies smoothly in your weight/parameter space. You can think of 0 as "I am certain that the answer is up" and 1 as "I am certain that the answer is down", and thus build a fitness function that has a higher cost for incorrect guesses that were more certain. There are subtleties in how the function is shaped (for example, you might have different ideas about how acceptable a false negative and a false positive are), and you may also introduce regions of "uncertain" where the result is closer to zero weight, but it should certainly be continuous/smooth.
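To illustrate the smooth-versus-binary point with numbers (made-up outputs, encoding 1 = up and 0 = down):
import numpy as np

y_true = np.array([1, 0, 1, 1])              # actual direction: 1 = up, 0 = down
p_up = np.array([0.9, 0.4, 0.55, 0.1])       # network outputs in (0, 1), e.g. from a sigmoid

hard_error = np.mean((p_up > 0.5) != y_true)  # all-or-nothing score: flat almost everywhere, no gradient
smooth_cost = -np.mean(y_true * np.log(p_up) + (1 - y_true) * np.log(1 - p_up))
# the smooth cost penalizes the confident wrong answer (0.1 for a true 'up') far more than the
# borderline one (0.55), which is what gives the training algorithm a gradient to follow
print(hard_error, smooth_cost)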
You can re-use sliding windows.
It's basically the same concept as bootstrapping (your training set), which in itself reduces training time, but I don't know if it's really helpful in making the net more adaptive to anything other than the training data.
Below is an example of a sliding window in pictorial format (using spreadsheet magic):
http://i.imgur.com/nxhtgaQ.png
https://github.com/thistleknot/FredAPI/blob/05f74faf85d15f6898aa05b9b08d5363fe27c473/FredAPI/Program.cs
Line 294 shows how the code is run using randomization; it resets the randomization at position 353 so the rest flows as normal.
I was also able to use a 1 (up) or 0 (down) as my target values and the network did converge.