How can I integrate a prior with data exactly? - winbugs

I was told to use Bayesian inference instead of working only analytically with polling data. However, I have a problem; I have a small dataset with guesses about prior distributions for the parties, and I have data from polls. How can I obtain marginals from Gibbs simulations?
prior <- a <- c(.30, .15, .15, .10, .10, .08, .12)
polls <- data.frame(rbind(
a <- c(.24, .23, .20, .11, .08, .08, .06, 3959, .02),
b <- c(.22, .22, .22, .11, .07, .08, .08, 1024, .03),
c <- c(.23, .25, .19, .11, .07, .08, .06, 2099, .02),
d <- c(.19, .27, .18, .10, .04, .08, .06, 1024, .03),
e <- c(.22, .30, .18, .09, .07, .08, .06, 1799, .02)
))
names(polls) <- c("Cons", "Lib", "Lab", "Ind", "Others", "Null", "Swingy",
"Sample.size", "Err")

You can build on a multinomial likelihood with Dirichlet priors on the alpha parameters. I didn't test it with your data, so my answer will only address the general approach.
# k = number of parties
# T = number of polls (rows a to e), I guess
model {
  for (t in 1:T) {
    y[t, 1:k] ~ dmulti(alpha[t, 1:k], N[t])
    # Dirichlet priors on the parameters
    alpha[t, 1:k] ~ ddirch(theta[1:k])
    # Sample size for dmulti based on the observed data
    N[t] <- sum(y[t, 1:k])
    # Inference probability: indicator that party 2 beats party 3 in poll t
    delta[t] <- step(alpha[t, 2] - alpha[t, 3])
  }
  # Gamma priors for the alpha concentration vector (the shape must be > 0)
  for (i in 1:k) {
    theta[i] ~ dgamma(0.01, 0.01)
  }
}
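In case it helps, here is a rough, untested sketch of how you might drive this model from R with rjags (or the equivalent WinBUGS interface), converting the poll shares into the integer counts that dmulti expects. The file name dirichlet_multinomial.txt is just a placeholder, and since N is supplied as data in this sketch, the N[t] <- sum(y[t, 1:k]) line should be dropped from the model.
library(rjags)

# Convert shares to approximate integer counts; N is recomputed from the
# rounded counts so each multinomial total matches its row of y exactly.
y <- round(as.matrix(polls[, 1:7]) * polls$Sample.size)
N <- rowSums(y)

dataList <- list(y = y, N = N, T = nrow(y), k = ncol(y))

fit <- jags.model("dirichlet_multinomial.txt",  # placeholder file holding the model
                  data = dataList, n.chains = 3)
update(fit, 1000)                               # burn-in
post <- coda.samples(fit, c("alpha", "delta"), n.iter = 5000)
summary(post)                                   # marginal summaries per poll and party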

Related

Why does data filtering not work correctly?

I am getting data from a server and I want to filter it, which I do with an if statement. But I ran into a problem: I get price data and compare it with my price data, and the check does not pass even though I use the <= operator. Could you tell me why the numbers are not compared correctly?
List speed = [
7.4,
11,
22,
];
List<double> price = [
0.20,
0.25,
0.30,
0.35,
0.40,
0.45,
0.50,
];
if ((w.power.toInt() == speed[filters.minSpeed]) &&
(double.parse(w.formattedPrice.substring(1)) <=
price[filters.maxPrice]) ||
(i.public == filters.ownershipStationPublic ||
filters.ownershipStationAll)) {
log(double.parse(w.formattedPrice.substring(1)).toString());
log('My price: ${price[filters.maxPrice].toString()}');
filteredStations.add(i);
}
Even if the price is higher, the if statement is still true when either i.public == filters.ownershipStationPublic or filters.ownershipStationAll is true, because && binds more tightly than ||. That is probably what is happening here. Maybe you wanted that first || to be &&.

Save ColorFiltered image to disk

In Flutter, the ColorFiltered widget takes a ColorFilter (for example a ColorFilter.matrix) and a child Image, and applies the filter to the image.
const ColorFilter sepia = ColorFilter.matrix(<double>[
0.393, 0.769, 0.189, 0, 0,
0.349, 0.686, 0.168, 0, 0,
0.272, 0.534, 0.131, 0, 0,
0, 0, 0, 1, 0,
]);
Container _buildFilterThumbnail(int index, Size size) {
  final Image image = Image.file(
    widget.imageFile,
    width: size.width,
    fit: BoxFit.cover,
  );
  return Container(
    padding: const EdgeInsets.all(4.0),
    decoration: BoxDecoration(
      border: Border.all(
        color: _selectedIndex == index ? Colors.blue : Theme.of(context).primaryColor,
        width: 4.0,
      ),
    ),
    child: ColorFiltered(
      colorFilter: ColorFilter.matrix(filters[index].matrixValues),
      child: Container(
        height: 80,
        width: 80,
        child: image,
      ),
    ),
  );
}
How can we get the underlying filtered image (as pixels/bytes) so that it can be saved to disk? I don't want to capture the rendered ColorFiltered area from the screen.
Currently I am forced to use the photofilters library from pub.dev, which does the pixel manipulation to apply the custom filters. However, it's not very efficient: it essentially applies the pixel-level manipulation for every thumbnail, making it very slow. On the other hand, the ColorFiltered widget is lightning fast!
Below is the internal working of the photofilters library:
int clampPixel(int x) => x.clamp(0, 255);

// ColorOverlay - add a slight color overlay.
void colorOverlay(Uint8List bytes, num red, num green, num blue, num scale) {
  for (int i = 0; i < bytes.length; i += 4) {
    bytes[i] = clampPixel((bytes[i] - (bytes[i] - red) * scale).round());
    bytes[i + 1] =
        clampPixel((bytes[i + 1] - (bytes[i + 1] - green) * scale).round());
    bytes[i + 2] =
        clampPixel((bytes[i + 2] - (bytes[i + 2] - blue) * scale).round());
  }
}

// RGB Scale
void rgbScale(Uint8List bytes, num red, num green, num blue) {
  for (int i = 0; i < bytes.length; i += 4) {
    bytes[i] = clampPixel((bytes[i] * red).round());
    bytes[i + 1] = clampPixel((bytes[i + 1] * green).round());
    bytes[i + 2] = clampPixel((bytes[i + 2] * blue).round());
  }
}
Any pointers appreciated.

How in Flink CEP can we detect a pattern that lasts for a period of time?

I want to detect a pattern with Flink CEP; here is my use case:
I should raise an event when the speed of my vehicle stays above a speedLimit for a given lapse of time.
Example 1: (speedLimit = 100, period = 60 seconds)
event1: speed = 50, eventtime=0
event1: speed = 100, eventtime=10
event1: speed = 120, eventtime=30
event1: speed = 150, eventtime=40
event1: speed = 120, eventtime=70
event1: speed = 50, eventtime=90
=> raise 1 event
Example 2: (speedLimit = 100, period = 60 seconds)
event1: speed = 50, eventtime=0
event1: speed = 100, eventtime=10
event1: speed = 120, eventtime=30
event1: speed = 150, eventtime=40
event1: speed = 60, eventtime=70
=> raise 0 event
I would appreciate your help.
I would approach this by looking for a sequence of 2 or more events where the speed is greater than or equal to 100 for all of them, and where the timestamp of the last one minus the timestamp of the first one is greater than or equal to 60.
By the way, you may find MATCH_RECOGNIZE is easier to work with, but either it or CEP should be fine for this use case.

Why do strange timing results appear in spite of binding a thread to a specific CPU core?

I'm doing some experiments with low-latency programming. I want to eliminate context switching, and be able to reliably measure latency without affecting performance too much.
To begin with, I wrote a program that requests the time in a loop 1M times and then prints statistics (code below), since I wanted to know how much time the call to the timer takes. Surprisingly, the output is the following (in microseconds):
Mean: 0.59, Min: 0.49, Max: 25.77
Mean: 0.59, Min: 0.49, Max: 11.73
Mean: 0.59, Min: 0.42, Max: 14.11
Mean: 0.59, Min: 0.42, Max: 13.34
Mean: 0.59, Min: 0.49, Max: 11.45
Mean: 0.59, Min: 0.42, Max: 14.25
Mean: 0.59, Min: 0.49, Max: 11.80
Mean: 0.59, Min: 0.42, Max: 12.08
Mean: 0.59, Min: 0.49, Max: 21.02
Mean: 0.59, Min: 0.42, Max: 12.15
As you can see, although the average time is less than one microsecond,
there are spikes of up to 20 microseconds. That's despite the fact that the code runs on a dedicated core (affinity set to a specific core, while the affinity of the init process is set to a group of other cores), and that hyper-threading is disabled on the machine. I tried it with multiple kernel versions, including preemptive and RT, and the results are essentially the same.
Can you explain the huge difference between mean and max? Is the problem in the calls to the timer, or with process isolation?
I also tried this with calls to other timers (CLOCK_THREAD_CPUTIME_ID, CLOCK_MONOTONIC, CLOCK_PROCESS_CPUTIME_ID) and the pattern observed was the same...
#define _GNU_SOURCE   /* needed for sched_setaffinity() and the CPU_* macros */
#include <time.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

uint64_t
time_now()
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + ts.tv_nsec;
}

void
set_affinity(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set))
    {
        perror("sched_setaffinity");
    }
}

#define NUM 1000000
#define LOOPS 10

int
main(int argc, char **argv)
{
    set_affinity(3);
    for (int loop = 0; loop < LOOPS; ++loop)
    {
        uint64_t t_0 = time_now();
        uint64_t sum_val = 0;
        uint64_t max_val = 0;
        uint64_t min_val = UINT64_MAX;   /* start at the largest possible value */
        for (int k = 0; k < NUM; ++k)
        {
            uint64_t t_1 = time_now();
            uint64_t t_diff = t_1 - t_0;
            sum_val += t_diff;
            min_val = min(t_diff, min_val);
            max_val = max(t_diff, max_val);
            t_0 = t_1;
        }
        printf("Mean: %.2f, Min: %.2f, Max: %.2f\n",
               (double)sum_val / NUM / 1000, (double)min_val / 1000,
               (double)max_val / 1000);
    }
    return 0;
}
Two sources of unpredictability are going to be device interrupts and timers. While you have set the affinity of userspace processes, interrupts from devices will still occur and will affect your process. The kernel also uses timers that fire at periodic ticks so it can keep track of time. I would say these are going to be your first two main sources of unpredictability. Inter-processor interrupts (IPIs), used by cores to signal each other, are going to be another, but probably not as significant as the first two.

JAGS - hierarchical model comparison not jumping between models even with pseudopriors

I'm using the hierarchical modelling framework described by Kruschke to set up a comparison between two models in JAGS. The idea in this framework is to run and compare multiple versions of a model, by specifying each version as one level of a categorical variable. The posterior distribution of this categorical variable then can be interpreted as the relative probability of the various models.
In the code below, I'm comparing two models. The models are identical in form. Each has a single parameter that needs to be estimated, mE. As can be seen, the models differ in their priors. Both priors are beta distributions with a mode of 0.5. However, the prior distribution for model 2 is much more concentrated. Note also that I've used pseudopriors that I had hoped would keep the chains from getting stuck on one of the models. But the model seems to get stuck anyway.
Here is the model:
model {
  m ~ dcat( mPriorProb[] )
  mPriorProb[1] <- .5
  mPriorProb[2] <- .5
  omegaM1[1] <- 0.5   # true prior
  omegaM1[2] <- 0.5   # pseudo prior
  kappaM1[1] <- 3     # true prior for Model 1
  kappaM1[2] <- 5     # pseudo prior for Model 1
  omegaM2[1] <- 0.5   # pseudo prior
  omegaM2[2] <- 0.5   # true prior
  kappaM2[1] <- 5     # pseudo prior for Model 2
  kappaM2[2] <- 10    # true prior for Model 2
  for ( s in 1:Nsubj ) {
    mE1[s] ~ dbeta( omegaM1[m]*(kappaM1[m]-2)+1 , (1-omegaM1[m])*(kappaM1[m]-2)+1 )
    mE2[s] ~ dbeta( omegaM2[m]*(kappaM2[m]-2)+1 , (1-omegaM2[m])*(kappaM2[m]-2)+1 )
    mE[s] <- equals(m,1)*mE1[s] + equals(m,2)*mE2[s]
    z[s] ~ dbin( mE[s] , N[s] )
  }
}
Here is R code for the relevant data:
dataList = list(
z = c(81, 59, 36, 18, 28, 59, 63, 57, 42, 28, 47, 55, 38,
30, 22, 32, 31, 30, 32, 33, 32, 26, 13, 33, 30),
N = rep(96, 25),
Nsubj = 25
)
When I run this model, the MCMC spends every single iteration at m = 1, and never jumps over to m = 2. I've tried lots of different combinations of priors and pseudo priors, and can't seem to find a combination in which the MCMC will consider m = 2. I've even tried specifying identical priors and pseudo priors for models 1 and 2, but this was no help. In this situation, I would expect the MCMC to jump fairly frequently between models, spending about half the time considering one model, and half the time considering the other. However, JAGS still spent the whole time at m = 1. I've run chains as long as 6000 iterations, which should be more than long enough for a simple model like this.
I would really appreciate if anyone has any thoughts on how to resolve this issue.
Cheers,
Tim
I haven't been able to figure this out, but I thought that anybody else who works on this might appreciate the following code, which will reproduce the problem start-to-finish from R with rjags (must have JAGS installed).
Note that since there are only two competing models in this example, I changed m ~ dcat() to m ~ dbern(), and then replaced m with m+1 everywhere else in the code. I hoped this might ameliorate the behavior, but it did not. Note also that if we specify the initial value for m, it stays stuck at that value regardless of which value we pick, so m just fails to get updated properly (instead of getting weirdly attracted to one model or the other). A head-scratcher for me; could be worth posting for Martyn's eyes at http://sourceforge.net/p/mcmc-jags/discussion/
library(rjags)
load.module('glm')
dataList = list(
z = c(81, 59, 36, 18, 28, 59, 63, 57, 42, 28, 47, 55, 38,
30, 22, 32, 31, 30, 32, 33, 32, 26, 13, 33, 30),
N = rep(96, 25),
Nsubj = 25
)
sink("mymodel.txt")
cat("model {
m ~ dbern(.5)
omegaM1[1] <- 0.5 #true prior
omegaM1[2] <- 0.5 #psuedo prior
kappaM1[1] <- 3 #true prior for Model 1
kappaM1[2] <- 5 #puedo prior for Model 1
omegaM2[1] <- 0.5 #psuedo prior
omegaM2[2] <- 0.5 #true prior
kappaM2[1] <- 5 #puedo prior for Model 2
kappaM2[2] <- 10 #true prior for Model 2
for ( s in 1:Nsubj ) {
mE1[s] ~ dbeta(omegaM1[m+1]*(kappaM1[m+1]-2)+1 , (1-omegaM1[m+1])*(kappaM1[m+1]-2)+1 )
mE2[s] ~ dbeta(omegaM2[m+1]*(kappaM2[m+1]-2)+1 , (1-omegaM2[m+1])*(kappaM2[m+1]-2)+1 )
z[s] ~ dbin( (1-m)*mE1[s] + m*mE2[s] , N[s] )
}
}
", fill=TRUE)
sink()
inits <- function(){list(m=0)}
params <- c("m")
nc <- 1
n.adapt <- 100
n.burn <- 200
n.iter <- 5000
thin <- 1
mymodel <- jags.model('mymodel.txt', data = dataList, inits=inits, n.chains=nc, n.adapt=n.adapt)
update(mymodel, n.burn)
mymodel_samples <- coda.samples(mymodel,params,n.iter=n.iter, thin=thin)
summary(mymodel_samples)
The trick is not to assign a fixed probability to the model indicator, but rather to estimate it (phi below) with a uniform prior. You then want the posterior distribution of phi, as it tells you the probability of selecting model 2 (i.e., a "success" means m = 1, so model 2 is chosen and Pr(model 1) = 1 - phi).
sink("mymodel.txt")
cat("model {
m ~ dbern(phi)
phi ~ dunif(0,1)
omegaM1[1] <- 0.5 #true prior
omegaM1[2] <- 0.5 #psuedo prior
kappaM1[1] <- 3 #true prior for Model 1
kappaM1[2] <- 5 #puedo prior for Model 1
omegaM2[1] <- 0.5 #psuedo prior
omegaM2[2] <- 0.5 #true prior
kappaM2[1] <- 5 #puedo prior for Model 2
kappaM2[2] <- 10 #true prior for Model 2
for ( s in 1:Nsubj ) {
mE1[s] ~ dbeta(omegaM1[m+1]*(kappaM1[m+1]-2)+1 , (1-omegaM1[m+1])*(kappaM1[m+1]-2)+1 )
mE2[s] ~ dbeta(omegaM2[m+1]*(kappaM2[m+1]-2)+1 , (1-omegaM2[m+1])*(kappaM2[m+1]-2)+1 )
z[s] ~ dbin( (1-m)*mE1[s] + m*mE2[s] , N[s] )
}
}
", fill=TRUE)
sink()
inits <- function(){list(m=0)}
params <- c("phi")
See my comment above on Mark S's answer.
This answer is to show by example why we want inference on m and not on phi.
Imagine we have a model given by
data <- c(-1, 0, 1, .5, .1)
m ~ dbern(phi)
data[i] ~ m*dnorm(0, 1) + (1-m)*dnorm(100, 1)
Now, it is obvious that the true value of m is 1. But what do we know about the true value of phi? Obviously higher values of phi are more likely, but we don't actually have good evidence to rule out lower values of phi. For example, phi=0.1 still has a 10% chance of yielding m=1; and phi=0.5 still has a 50% chance of yielding m=1. So we don't have good evidence against fairly low values of phi, even though we have ironclad evidence that m=1. We want inference on m.
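To make that concrete, here is a quick numerical sketch in R of the toy model above (assuming phi ~ dunif(0,1), which gives equal marginal prior weight to m = 0 and m = 1); nothing here goes beyond the model as written:
dat <- c(-1, 0, 1, .5, .1)

# Log-likelihood of the data under each value of m
loglik_m1 <- sum(dnorm(dat, mean = 0,   sd = 1, log = TRUE))  # component used when m = 1
loglik_m0 <- sum(dnorm(dat, mean = 100, sd = 1, log = TRUE))  # component used when m = 0

# Posterior probability of m = 1, with Pr(m = 1) = Pr(m = 0) = 0.5 a priori
post_m1 <- 1 / (1 + exp(loglik_m0 - loglik_m1))
post_m1   # essentially 1: ironclad evidence that m = 1

# Posterior for phi: p(phi | data) is proportional to L1*phi + L0*(1 - phi),
# which here is essentially proportional to phi, i.e. Beta(2, 1). That still
# puts a quarter of its mass below 0.5, so the evidence about phi stays weak
# even though the evidence about m is decisive.
curve(dbeta(x, 2, 1), from = 0, to = 1, xlab = "phi", ylab = "posterior density")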