JAGS - hierarchical model comparison not jumping between models even with pseudopriors - winbugs

I'm using the hierarchical modelling framework described by Kruschke to set up a comparison between two models in JAGS. The idea in this framework is to run and compare multiple versions of a model, by specifying each version as one level of a categorical variable. The posterior distribution of this categorical variable then can be interpreted as the relative probability of the various models.
In the code below, I'm comparing two models. The models are identical in form. Each has a single parameter that needs to be estimated, mE. As can be seen, the models differ in their priors. Both priors are beta distributions with a mode of 0.5. However, the prior distribution for model 2 is much more concentrated. Note also that I've used pseudopriors that I had hoped would keep the chains from getting stuck on one of the models. But the model seems to get stuck anyway.
Here is the model:
model {
m ~ dcat( mPriorProb[] )
mPriorProb[1] <- .5
mPriorProb[2] <- .5
omegaM1[1] <- 0.5 #true prior
omegaM1[2] <- 0.5 #pseudo prior
kappaM1[1] <- 3 #true prior for Model 1
kappaM1[2] <- 5 #pseudo prior for Model 1
omegaM2[1] <- 0.5 #pseudo prior
omegaM2[2] <- 0.5 #true prior
kappaM2[1] <- 5 #pseudo prior for Model 2
kappaM2[2] <- 10 #true prior for Model 2
for ( s in 1:Nsubj ) {
mE1[s] ~ dbeta(omegaM1[m]*(kappaM1[m]-2)+1 , (1-omegaM1[m])*(kappaM1[m]-2)+1 )
mE2[s] ~ dbeta(omegaM2[m]*(kappaM2[m]-2)+1 , (1-omegaM2[m])*(kappaM2[m]-2)+1 )
mE[s] <- equals(m,1)*mE1[s] + equals(m,2)*mE2[s]
z[s] ~ dbin( mE[s] , N[s] )
}
}
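As a quick check on the two true priors, the dbeta arguments in the model can be recomputed from the mode (omega) and the concentration (kappa). The Python sketch below mirrors the expressions on the mE1/mE2 lines; the function names beta_shapes and beta_mode are only illustrative and are not part of the JAGS model.

```python
def beta_shapes(omega, kappa):
    """Shape parameters (a, b) of a beta distribution with mode omega
    and concentration kappa (kappa > 2), as written in the JAGS model."""
    a = omega * (kappa - 2) + 1
    b = (1 - omega) * (kappa - 2) + 1
    return a, b


def beta_mode(a, b):
    """Mode of Beta(a, b); only valid when a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)


# True priors from the model: kappa = 3 for model 1, kappa = 10 for model 2.
for label, kappa in [("model 1", 3), ("model 2", 10)]:
    a, b = beta_shapes(0.5, kappa)
    print(label, "a =", a, "b =", b, "mode =", beta_mode(a, b))
```

Both priors keep the mode at 0.5, but kappa = 10 gives Beta(5, 5), which is noticeably narrower than the Beta(1.5, 1.5) obtained with kappa = 3.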
Here is R code for the relevant data:
dataList = list(
z = c(81, 59, 36, 18, 28, 59, 63, 57, 42, 28, 47, 55, 38,
30, 22, 32, 31, 30, 32, 33, 32, 26, 13, 33, 30),
N = rep(96, 25),
Nsubj = 25
)
When I run this model, the MCMC spends every single iteration at m = 1, and never jumps over to m = 2. I've tried lots of different combinations of priors and pseudo priors, and can't seem to find a combination in which the MCMC will consider m = 2. I've even tried specifying identical priors and pseudo priors for models 1 and 2, but this was no help. In this situation, I would expect the MCMC to jump fairly frequently between models, spending about half the time considering one model, and half the time considering the other. However, JAGS still spent the whole time at m = 1. I've run chains as long as 6000 iterations, which should be more than long enough for a simple model like this.
I would really appreciate if anyone has any thoughts on how to resolve this issue.
Cheers,
Tim

I haven't been able to figure this out, but I thought that anybody else who works on this might appreciate the following code, which will reproduce the problem start-to-finish from R with rjags (must have JAGS installed).
Note that since there are only two competing models in this example, I changed m ~ dcat() to m ~ dbern(), and then replaced m with m+1 everywhere else in the code. I hoped this might ameliorate the behavior, but it did not. Note also that if we specify the initial value for m, it stays stuck at that value regardless of which value we pick, so m just fails to get updated properly (instead of getting weirdly attracted to one model or the other). A head-scratcher for me; could be worth posting for Martyn's eyes at http://sourceforge.net/p/mcmc-jags/discussion/
library(rjags)
load.module('glm')
dataList = list(
z = c(81, 59, 36, 18, 28, 59, 63, 57, 42, 28, 47, 55, 38,
30, 22, 32, 31, 30, 32, 33, 32, 26, 13, 33, 30),
N = rep(96, 25),
Nsubj = 25
)
sink("mymodel.txt")
cat("model {
m ~ dbern(.5)
omegaM1[1] <- 0.5 #true prior
omegaM1[2] <- 0.5 #pseudo prior
kappaM1[1] <- 3 #true prior for Model 1
kappaM1[2] <- 5 #pseudo prior for Model 1
omegaM2[1] <- 0.5 #pseudo prior
omegaM2[2] <- 0.5 #true prior
kappaM2[1] <- 5 #pseudo prior for Model 2
kappaM2[2] <- 10 #true prior for Model 2
for ( s in 1:Nsubj ) {
mE1[s] ~ dbeta(omegaM1[m+1]*(kappaM1[m+1]-2)+1 , (1-omegaM1[m+1])*(kappaM1[m+1]-2)+1 )
mE2[s] ~ dbeta(omegaM2[m+1]*(kappaM2[m+1]-2)+1 , (1-omegaM2[m+1])*(kappaM2[m+1]-2)+1 )
z[s] ~ dbin( (1-m)*mE1[s] + m*mE2[s] , N[s] )
}
}
", fill=TRUE)
sink()
inits <- function(){list(m=0)}
params <- c("m")
nc <- 1
n.adapt <-100
n.burn <- 200
n.iter <- 5000
thin <- 1
mymodel <- jags.model('mymodel.txt', data = dataList, inits=inits, n.chains=nc, n.adapt=n.adapt)
update(mymodel, n.burn)
mymodel_samples <- coda.samples(mymodel,params,n.iter=n.iter, thin=thin)
summary(mymodel_samples)

The trick is not to assign a fixed probability to the model, but rather to estimate it (phi below) from a uniform prior. You then want the posterior distribution for phi, as that tells you the probability of selecting model 2 (i.e., a "success" means m=1, which selects model 2 through the index m+1; Pr(model 1) = 1-phi).
sink("mymodel.txt")
cat("model {
m ~ dbern(phi)
phi ~ dunif(0,1)
omegaM1[1] <- 0.5 #true prior
omegaM1[2] <- 0.5 #pseudo prior
kappaM1[1] <- 3 #true prior for Model 1
kappaM1[2] <- 5 #pseudo prior for Model 1
omegaM2[1] <- 0.5 #pseudo prior
omegaM2[2] <- 0.5 #true prior
kappaM2[1] <- 5 #pseudo prior for Model 2
kappaM2[2] <- 10 #true prior for Model 2
for ( s in 1:Nsubj ) {
mE1[s] ~ dbeta(omegaM1[m+1]*(kappaM1[m+1]-2)+1 , (1-omegaM1[m+1])*(kappaM1[m+1]-2)+1 )
mE2[s] ~ dbeta(omegaM2[m+1]*(kappaM2[m+1]-2)+1 , (1-omegaM2[m+1])*(kappaM2[m+1]-2)+1 )
z[s] ~ dbin( (1-m)*mE1[s] + m*mE2[s] , N[s] )
}
}
", fill=TRUE)
sink()
inits <- function(){list(m=0)}
params <- c("phi")

See my comment above on Mark S's answer.
This answer is to show by example why we want inference on m and not on phi.
Imagine we have a model given by
data <- c(-1, 0, 1, .5, .1)
m~dbern(phi)
data[i] ~ m*dnorm(0, 1) + (1-m)*dnorm(100, 1)
Now, it is obvious that the true value of m is 1. But what do we know about the true value of phi? Obviously higher values of phi are more likely, but we don't actually have good evidence to rule out lower values of phi. For example, phi=0.1 still has a 10% chance of yielding m=1; and phi=0.5 still has a 50% chance of yielding m=1. So we don't have good evidence against fairly low values of phi, even though we have ironclad evidence that m=1. We want inference on m.
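To put a number on this, suppose phi has a Uniform(0, 1) prior and m = 1 is known with certainty. The posterior of phi is then Beta(2, 1), with density 2*phi and cumulative distribution phi^2. The short Python sketch below (the function name is illustrative) evaluates how much posterior mass remains on low values of phi.

```python
def phi_posterior_cdf(q):
    """P(phi <= q | m = 1) under a Uniform(0, 1) prior on phi.

    Prior density 1 times likelihood phi, normalised, gives density 2*phi,
    so the cumulative distribution is q**2.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    return q ** 2


for q in (0.1, 0.5, 0.9):
    print(f"P(phi <= {q} | m = 1) = {phi_posterior_cdf(q):.2f}")
```

Even with m = 1 certain, a quarter of the posterior mass on phi still lies below 0.5, which is why the posterior of phi is a poor summary of the evidence for the model itself.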

Using a grouped z-score over a rolling window

I would like to calculate a z-score over a bin based on the data of a rolling look-back period.
Example
Today's visitor count during [9:30-9:35) should be z-score normalized based on the (mean, std) of the last 3 days of visitors that visited during [9:30-9:35).
My current attempts both raise InvalidOperationError. Is there a way in polars to calculate this?
import pandas as pd
import polars as pl
def z_score(col: str, over: str, alias: str):
# calculate z-score normalized `col` over `over`
return (
(pl.col(col)-pl.col(col).over(over).mean()) / pl.col(col).over(over).std()
).alias(alias)
df = pl.from_dict(
{
"timestamp": pd.date_range("2019-12-02 9:30", "2019-12-02 12:30", freq="30s").union(
pd.date_range("2019-12-03 9:30", "2019-12-03 12:30", freq="30s")
),
"visitors": [(e % 2) + 1 for e in range(722)]
}
# 5 minute bins for grouping [9:30-9:35) -> 930
).with_column(
pl.col("timestamp").dt.truncate(every="5m").dt.strftime("%H%M").cast(pl.Int32).alias("five_minute_bin")
).with_column(
pl.col("timestamp").dt.truncate(every="3d").alias("daytrunc")
)
# normalize visitor amount for each 5 min bin over the rolling 3 day window using z-score.
# not rolling, but also won't work (InvalidOperationError: window expression not allowed in aggregation)
# df.with_column(
# z_score("visitors", "five_minute_bin", "normalized").over("daytrunc")
# )
# won't work either (InvalidOperationError: window expression not allowed in aggregation)
#df.groupby_rolling(index_column="daytrunc", period="3i").agg(z_score("visitors", "five_minute_bin", "normalized"))
For an example of 4 days of data with four data-points each lying in two time-bins ({0,0} - {0,1}), ({1,0} - {1,1})
Input:
Day 0: x_d0_{0,0}, x_d0_{0,1}, x_d0_{1,0}, x_d0_{1,1}
Day 1: x_d1_{0,0}, x_d1_{0,1}, x_d1_{1,0}, x_d1_{1,1}
Day 2: x_d2_{0,0}, x_d2_{0,1}, x_d2_{1,0}, x_d2_{1,1}
Day 3: x_d3_{0,0}, x_d3_{0,1}, x_d3_{1,0}, x_d3_{1,1}
Output:
Day 0: norm_x_d0_{0,0} = nan, norm_x_d0_{0,1} = nan, norm_x_d0_{1,0} = nan, norm_x_d0_{1,1} = nan
Day 1: norm_x_d1_{0,0} = nan, norm_x_d1_{0,1} = nan, norm_x_d1_{1,0} = nan, norm_x_d1_{1,1} = nan
Day 2: norm_x_d2_{0,0} = nan, norm_x_d2_{0,1} = nan, norm_x_d2_{1,0} = nan, norm_x_d2_{1,1} = nan
Day 3: norm_x_d3_{0,0} = (x_d3_{0,0} - np.mean([x_d0_{0,0}, x_d0_{0,1}, x_d1_{0,0}, ..., x_d3_{0,1}])) / np.std([x_d0_{0,0}, x_d0_{0,1}, x_d1_{0,0}, ..., x_d3_{0,1}]), ...
The key here is to use over to restrict your calculations to the five-minute bins and then use the rolling functions to get the rolling mean and standard deviation over days, restricted by those five-minute bin keys. five_minute_bin works as in your code, and I believe that a truncated day_bin is necessary so that, for example, 9:33 on one day will include both 9:31 and 9:34 on the same day as well as 9:31 from 2 days ago.
from datetime import datetime
import polars as pl
days = 5
pl.DataFrame(
{
"timestamp": pl.concat(
[
pl.date_range(
datetime(2019, 12, d, 9, 30), datetime(2019, 12, d, 12, 30), "30s"
)
for d in range(2, days + 2)
]
),
"visitors": [(e % 2) + 1 for e in range(days * 361)],
}
).with_columns(
five_minute_bin=pl.col("timestamp").dt.truncate(every="5m").dt.strftime("%H%M"),
day_bin=pl.col("timestamp").dt.truncate(every="1d"),
).with_columns(
standardized_visitors=(
(
pl.col("visitors")
- pl.col("visitors").rolling_mean("3d", by="day_bin", closed="right")
)
/ pl.col("visitors").rolling_std("3d", by="day_bin", closed="right")
).over("five_minute_bin")
)
Now, that said, when trying out the code for this, I found polars doesn't handle non-unique values in the by-column in the rolling function correctly, so that the same values in the same 5-minute bin don't end up as the same standardized values. Opened bug report here: https://github.com/pola-rs/polars/issues/6691. For large amounts of real world data, this shouldn't actually matter that much, unless your data systematically differs in distribution within the 5 minute bins.
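Given that caveat, a small reference computation can help to check results on a sample of the data. The sketch below is plain Python, not polars, and makes its own assumptions: rows are (day, bin_key, value) tuples with integer day numbers, the window covers the current day and the window_days - 1 days before it, and the sample standard deviation is used. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def rolling_bin_zscores(rows, window_days=3):
    """Z-score each value against all values that share its bin_key and
    fall within the last `window_days` days (current day included).

    Returns one entry per input row; None where the window holds fewer
    than two values or has zero spread.
    """
    by_bin = defaultdict(list)
    for day, bin_key, value in rows:
        by_bin[bin_key].append((day, value))

    scores = []
    for day, bin_key, value in rows:
        window = [v for d, v in by_bin[bin_key] if day - window_days < d <= day]
        if len(window) < 2:
            scores.append(None)
            continue
        spread = stdev(window)
        scores.append((value - mean(window)) / spread if spread > 0 else None)
    return scores


rows = [(0, "0930", 1.0), (0, "0930", 3.0), (1, "0930", 2.0), (1, "0935", 5.0)]
print(rolling_bin_zscores(rows))
```

Because every row in the same bin and day sees the same window, equal values in the same five-minute bin always receive equal standardized values here, which is the property the bug report above is about.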

How to check basic integer order relations in sympy?

I have a (possibly relatively large) set of assumptions about multiple integers like {x > -1, x < 5, x != 2, y > 0, x-2 < y} and I would like to check whether certain other propositions like {x > -5, x == 3, ...} are always true, always false, or could be either.
The docs say that explicit relationships like Q.is_true(x < 3) are not supported, so I tried using .positive property, but without any luck, e.g.
# x > -1 => x > -3 - ?
x = sympy.Symbol('x')
with sympy.assuming(sympy.Q.positive(x+1), sympy.Q.integer(x)):
print(sympy.ask(sympy.Q.positive(x+3)))
produces
None
Which means that the checker gave up on checking that.
Refine also does not seem to help much (probably uses assumptions anyway)
sympy.refine(x > 0, sympy.Q.is_true(x > -1))
If there's a different library that can check that, that also works!
I have found that the Python bindings for the z3 solver fit my problem best. One can just download the binary release from the GitHub page and add the included folder to $PYTHONPATH, e.g.
LD_LIBRARY_PATH=${Z3FOLDER}/bin PYTHONPATH=${Z3FOLDER}/bin/python python
then these relations could be checked as
from z3 import *
x = Int('x')
s = Solver()
s.add(x > 10)
s.add(x > 12)
print(s) # [x > 10, x > 12]
print(s.check()) # sat
print(s.model()) # [x = 13]
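The three-way question from the original post (always true, always false, or either) comes down to two checks: can the assumptions hold together with the proposition, and can they hold together with its negation? The sketch below illustrates that logic in plain Python by brute force over a bounded range; it does not use z3, the bound of 20 is an arbitrary assumption, and a bounded search is only a sanity check, not a proof for unbounded integers.

```python
from itertools import product


def classify(assumptions, proposition, bound=20):
    """Classify `proposition` over all integer pairs (x, y) in
    [-bound, bound] that satisfy every predicate in `assumptions`."""
    seen_true = seen_false = False
    for x, y in product(range(-bound, bound + 1), repeat=2):
        if all(a(x, y) for a in assumptions):
            if proposition(x, y):
                seen_true = True
            else:
                seen_false = True
    if seen_true and seen_false:
        return "could be either"
    if seen_true:
        return "always true"
    if seen_false:
        return "always false"
    return "assumptions unsatisfiable in range"


# Assumptions taken from the question.
assumptions = [
    lambda x, y: x > -1,
    lambda x, y: x < 5,
    lambda x, y: x != 2,
    lambda x, y: y > 0,
    lambda x, y: x - 2 < y,
]

print(classify(assumptions, lambda x, y: x > -5))
print(classify(assumptions, lambda x, y: x == 3))
print(classify(assumptions, lambda x, y: x == 2))
```

With a solver, the same two questions can be posed as two satisfiability checks, one with the proposition added to the assumptions and one with its negation added.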

Kafka Streams - Filter messages that appear frequently in a time window

I am trying to filter for any messages whose key appears more often than a threshold N in a given (hopping) time window of length T.
For example, in the following stream:
#time, key
0, A
1, B
2, A
3, C
4, D
5, A
6, B
7, C
8, C
9, D
10, A
11, D
12, D
13, D
14, D
15, D
and N=2 and T=3, the outcome should be
0, A
2, A
7, C
8, C
9, D
11, D
12, D
13, D
14, D
15, D
Alternatively, if the above is not possible, a simplification would be only to filter for the messages after the threshold has been met:
#time, key
2, A
8, C
11, D
12, D
13, D
14, D
15, D
Is this possible with Kafka Streams?
So far I have tried to create a windowed count (instance of KTable) of the stream and join it back to the original stream. I change the key of the windowed count back to the original key using KTable#toStream((k,v) -> k.key()) and performing a dummy aggregation back to an instance of KTable. This seems to introduce a delay which causes the leftJoin to miss messages which come very close after the threshold is exceeded.
final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();
KStream<String, Long> wcount = source.groupByKey()
.count(TimeWindows.of(TimeUnit.SECONDS.toMillis(5)),"Counts")
.toStream((k,v) -> k.key());
// perform dummy aggregation to get KTable
KTable<String, Long> wcountTable = wcount.groupByKey(stringSerde, longSerde)
.reduce((aggValue, newValue) -> newValue,
"dummy-aggregation-store");
// left join and filter with threshold N=1
source.leftJoin(wcountTable, (leftValue, rightValue) -> rightValue,stringSerde, stringSerde )
.filter((k,v) -> v!=null)
.filter((k,v) -> v>1)
.print("output");
I have also tried to perform a KStream-KStream join with an appropriate window (leaving out the dummy aggregation):
source.join(wcount, (leftValue, rightValue) -> rightValue, JoinWindows.of(TimeUnit.SECONDS.toMillis(5)),stringSerde, stringSerde, longSerde)
.filter((k,v) -> v!=null)
.filter((k,v) -> v>1)
.print("output");
This results in duplicate outputs since each UPSERT into wcount triggers an event.
This is certainly possible. You can apply a windowed aggregation that collects all raw data in a list (i.e., you manually materialize the window). Afterwards, you apply a flatMap that evaluates the window. If the threshold is not met yet, you emit nothing. If the threshold is met for the first time, you emit all buffered data. For all further calls of flatMap with a count larger than the threshold, you emit only the latest element in the list (you know that you already emitted all others in the previous calls to flatMap, so you emit only the newly added one).
Note: you need to disable the KTable cache, i.e., set the config parameter "cache.max.bytes.buffering" = 0. Otherwise, the algorithm won't work correctly.
Something like this:
KStream<Windowed<K>, List<V>> windows = stream.groupByKey()
.aggregate(
/*init with empty list*/,
/*add value to list in agg*/,
TimeWindows.of()...),
...)
.toStream();
KStream<K,V> thresholdMetStream = windows.flatMap(
/* if List#size < threshold
then return empty-list, ie, nothing
elseif List#size == threshold
then return whole list
else [List#size > threshold]
then return last element from list
*/);
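The three branches of the flatMap can be tried out independently of Kafka. The Python sketch below is only a simulation of that logic for a single key and window; the names evaluate_window and replay are hypothetical, and it assumes the function is invoked once per newly added record, which is what disabling the cache is meant to ensure.

```python
def evaluate_window(buffered, threshold):
    """Decide what to emit after one more record was appended to `buffered`."""
    if len(buffered) < threshold:
        return []                 # threshold not met yet: emit nothing
    if len(buffered) == threshold:
        return list(buffered)     # threshold met for the first time: emit all
    return [buffered[-1]]         # already above threshold: emit newest only


def replay(records, threshold):
    """Feed records one at a time into a growing buffer and collect output."""
    buffered, emitted = [], []
    for record in records:
        buffered.append(record)
        emitted.extend(evaluate_window(buffered, threshold))
    return emitted


print(replay(["a1", "a2", "a3", "a4"], threshold=2))
print(replay(["b1"], threshold=2))
```

Each record is emitted at most once: nothing comes out before the threshold is reached, and everything buffered so far comes out the moment it is.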
AFAIK this is the perfect fit for the Count-Min-Sketch algorithm. See for example the stream-lib implementation:
https://github.com/addthis/stream-lib
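For readers unfamiliar with the data structure, here is a minimal Count-Min Sketch in Python. It is an illustrative sketch only: the class, its width and depth defaults, and the SHA-256 based hashing are assumptions made for this example and do not reflect the stream-lib API. It also does no windowing by itself; one sketch per time window would be one way to combine it with the problem above.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter that never under-estimates a count."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One hash function per row, derived by salting the key with the row.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # Collisions can only inflate a cell, so the minimum is the best guess.
        return min(self.table[row][self._index(key, row)]
                   for row in range(self.depth))


sketch = CountMinSketch()
for key in "ABACDABCCDADDDDD":   # the keys of the example stream, in order
    sketch.add(key)
for key in "ABCD":
    print(key, sketch.estimate(key))
```

The estimates are upper bounds on the true counts, so a key flagged as below the threshold is certainly below it, while a key flagged as above may occasionally be a false positive.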

How can I sum up functions that are made of elements of the imported dataset?

See the code and error below. I have already tried Do, For, ... and none of them work.
CODE + Error from Mathematica:
Import of survival probabilities _{k}p_x and _{k}p_y (calculated in Excel)
px = Import["C:\Users\Eva\Desktop\kpx.xlsx"];
px = Flatten[Take[px, All], 1];
NOTE: The probability _{k}p_x can be found at position px[[k+2, x-16]]
i = 0.04;
v = 1/(1 + i);
JointLifeIndep[x_, y_, n_] = Sum[v^k*px[[k + 2, x - 16]]*py[[k + 2, y - 16]], {k , 0, n - 1}]
Part::pkspec1: The expression 2+k cannot be used as a part specification.
Part::pkspec1: The expression 2+k cannot be used as a part specification.
Part::pkspec1: The expression 2+k cannot be used as a part specification.
General::stop: Further output of Part::pkspec1 will be suppressed during this calculation.
Part of dataset (left corner of the dataset):
k\x 18 19 20
0 1 1 1
1 0.999478086278185 0.999363078716059 0.99927911905056
2 0.998841497412202 0.998642656911039 0.99858030519133
3 0.998121451605207 0.99794428814123 0.99788275311401
4 0.997423447323642 0.997247180349674 0.997174407432264
5 0.996726703362208 0.996539285828369 0.996437857252448
6 0.996019178300768 0.995803204773039 0.99563600297737
7 0.995283481416241 0.995001861216016 0.994823584922968
8 0.994482556091416 0.994189960607964 0.99405569519175
9 0.993671079225432 0.99342255996206 0.993339856748282
10 0.992904079096455 0.992707177451333 0.992611817294026
11 0.992189069953677 0.9919796017009 0.991832027835091
Without having the exact same data files to work with it is often easy for each of us to make mistakes that the other cannot reproduce or understand.
From your snapshot of your data set I used Export in Mathematica to try to reproduce your .xlsx file. Then I tried the following
px = Import["kpx.xlsx"];
px = Flatten[Take[px, All], 1];
py = px; (* fake some py data *)
i = 0.04;
v = 1/(1 + i);
JointLifeIndep[x_, y_, n_] := Sum[v^k*px[[k+2,x-16]]*py[[k+2,y-16]], {k,0,n-1}];
JointLifeIndep[17, 17, 12]
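and it displays 362.402. (With x = 17 and y = 17 the column index x-16 is 1, which is the k column of the snapshot rather than a probability column, so this call only confirms that the definition evaluates.)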
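Notice I used := instead of = in my definition of JointLifeIndep. The two do different things in Mathematica: = evaluates the right-hand side immediately, at definition time, when k, x, y and n are still symbols, whereas := delays evaluation until the function is called with actual numbers. This is probably the reason you are getting the Part::pkspec1 error, since 2+k with a symbolic k cannot be used as a part specification.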
You should also be careful with your subscript values and make sure that every subscript is between 1 and the number of rows (or columns) in your matrix.
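For comparison, the same sum can be written in Python, where a function body is only evaluated when the function is called, much like a := definition. The sketch below uses the twelve snapshot rows from the question; the name joint_life_indep and the explicit index checks are additions for illustration. The index mapping assumes the header row is dropped and indices are 0-based, so Mathematica's px[[k+2, x-16]] becomes px[k][x - 17].

```python
# Rows k = 0..11 of the snapshot; columns are k, then ages 18, 19, 20.
TABLE = [
    [0, 1.0, 1.0, 1.0],
    [1, 0.999478086278185, 0.999363078716059, 0.99927911905056],
    [2, 0.998841497412202, 0.998642656911039, 0.99858030519133],
    [3, 0.998121451605207, 0.99794428814123, 0.99788275311401],
    [4, 0.997423447323642, 0.997247180349674, 0.997174407432264],
    [5, 0.996726703362208, 0.996539285828369, 0.996437857252448],
    [6, 0.996019178300768, 0.995803204773039, 0.99563600297737],
    [7, 0.995283481416241, 0.995001861216016, 0.994823584922968],
    [8, 0.994482556091416, 0.994189960607964, 0.99405569519175],
    [9, 0.993671079225432, 0.99342255996206, 0.993339856748282],
    [10, 0.992904079096455, 0.992707177451333, 0.992611817294026],
    [11, 0.992189069953677, 0.9919796017009, 0.991832027835091],
]


def joint_life_indep(px, py, x, y, n, i=0.04):
    """Sum of v**k * px[k][x-17] * py[k][y-17] for k = 0..n-1, v = 1/(1+i)."""
    if n > len(px) or n > len(py):
        raise IndexError("n exceeds the number of rows available")
    for table, age in ((px, x), (py, y)):
        if not 0 <= age - 17 < len(table[0]):
            raise IndexError(f"age {age} is outside the table columns")
    v = 1 / (1 + i)
    return sum(v ** k * px[k][x - 17] * py[k][y - 17] for k in range(n))


# Ages 18 and 19 select genuine probability columns of the snapshot.
print(joint_life_indep(TABLE, TABLE, 18, 19, 12))
```

The explicit range checks play the role of the advice above about keeping every subscript within the bounds of the matrix.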
So see if you can try this example with an Excel sheet containing only the snapshot of data that you showed and see if you get the same result that I do.
Hopefully that will be enough for you to make progress.

Torch: back-propagation from loss computed over a subset of the output

I have a simple convolutional neural network whose output is a single-channel 4x4 feature map. During training, the (regression) loss needs to be computed only on a single value among the 16 outputs. The location of this value will be decided after the forward pass. How do I compute the loss from just this one output, while making sure all irrelevant gradients are zeroed out during back-prop?
Let's say I have the following simple model in torch:
require 'nn'
-- the input
local batch_sz = 2
local x = torch.Tensor(batch_sz, 3, 100, 100):uniform(-1,1)
-- the model
local net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 128, 9, 9, 9, 9, 1, 1))
net:add(nn.SpatialConvolution(128, 1, 3, 3, 3, 3, 1, 1))
net:add(nn.Squeeze(1, 3))
print(net)
-- the loss (don't know how to employ it yet)
local loss = nn.SmoothL1Criterion()
-- forward'ing x through the network would result in a 2x4x4 output
y = net:forward(x)
print(y)
I have looked at nn.SelectTable and it seems like if I convert the output into tabular form I would be able to implement what I want?
This is my current solution. It works by splitting the output into a table, and then using nn.SelectTable():backward() to get the full gradient:
require 'nn'
-- the input
local batch_sz = 2
local x = torch.Tensor(batch_sz, 3, 100, 100):uniform(-1,1)
-- the model
local net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 128, 9, 9, 9, 9, 1, 1))
net:add(nn.SpatialConvolution(128, 1, 3, 3, 3, 3, 1, 1))
net:add(nn.Squeeze(1, 3))
-- convert output into a table format
net:add(nn.View(1, -1)) -- vectorize
net:add(nn.SplitTable(1, 1)) -- split all outputs into table elements
print(net)
-- the loss
local loss = nn.SmoothL1Criterion()
-- forward'ing x through the network would result in a (2)x4x4 output
y = net:forward(x)
print(y)
-- returns the output table's index belonging to specific location
function get_sample_idx(feat_h, feat_w, smpl_idx, feat_r, feat_c)
local idx = (smpl_idx - 1) * feat_h * feat_w
return idx + feat_c + ((feat_r - 1) * feat_w)
end
-- I want to back-propagate the loss of this sample at this feature location
local smpl_idx = 2
local feat_r = 3
local feat_c = 4
-- get the actual index location in the output table (for a 4x4 output feature map)
local out_idx = get_sample_idx(4, 4, smpl_idx, feat_r, feat_c)
-- the (fake) ground-truth
local gt = torch.rand(1)
-- compute loss on the selected feature map location for the selected sample
local err = loss:forward(y[out_idx], gt)
-- compute loss gradient, as if there was only this one location
local dE_dy = loss:backward(y[out_idx], gt)
-- now convert into full loss gradient (zero'ing out irrelevant losses)
local full_dE_dy = nn.SelectTable(out_idx):backward(y, dE_dy)
-- do back-prop through the whole network
net:backward(x, full_dE_dy)
print("The full dE/dy")
print(table.unpack(full_dE_dy))
I would really appreciate it if somebody could point out a simpler OR more efficient method.
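One possibly simpler way to think about it is to skip the table conversion and build the full gradient directly: a structure of zeros shaped like the network output, with only the selected entry set to the loss derivative. The Python sketch below illustrates that idea on nested lists rather than Torch tensors; the function names are hypothetical, indices are 0-based (unlike Lua), and the smooth L1 formula is the standard definition applied to a single value.

```python
def smooth_l1(pred, target):
    """Smooth L1 loss and its derivative w.r.t. pred for one scalar pair."""
    diff = pred - target
    if abs(diff) < 1.0:
        return 0.5 * diff * diff, diff
    return abs(diff) - 0.5, (1.0 if diff > 0 else -1.0)


def masked_loss_and_grad(output, sample, row, col, target):
    """Loss at one location of a [batch][height][width] output, plus a
    gradient of the same shape that is zero everywhere else."""
    grad = [[[0.0] * len(r) for r in fmap] for fmap in output]
    loss, dloss = smooth_l1(output[sample][row][col], target)
    grad[sample][row][col] = dloss
    return loss, grad


# A fake 2x4x4 network output; only sample 1, row 2, col 3 is supervised.
output = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
output[1][2][3] = 0.75
loss, grad = masked_loss_and_grad(output, sample=1, row=2, col=3, target=0.25)
print(loss, grad[1][2][3])
```

In this form the gradient passed to the backward pass keeps the shape of the 2x4x4 output, so the extra View and SplitTable layers would not be needed; whether that is faster in practice is not something this sketch measures.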