Cannot bracket slice for node, WinBUGS - winbugs

So I have been working on this code for awhile running hazard models for a population and for one of my initial parameters it cannot figure out how to fix it. This is the area of code that is giving me problems. I am coding in Notebook++, running in R, and it opens in WinBUGS to run it. Hz.Scale is the parameter that I am struggling with and that the error comes up with.
# Define model - Model 02 = hazard
if(detect.fun=="hazard"){
sink(paste(top.dir,model.name,sep="/"))
cat("
model{
# Priors
psi ~ dunif(0,1) # Data augmentation parameter
#hz.scale ~ dgamma(0.1,0.1) # Scale parameter for hazard distribution (sigma) #ORIGINAL CODE
hz.scale ~ dgamma(0,10) # Scale parameter for hazard distribution (sigma) #change the parameters to test if it really runs
hz.shape ~ dunif(0.001,1000) # Shape parameter for hazard distribution (b in green & yellow book)
lambda ~ dunif(0,20) # Expectation for cluster size
# Likelihood
## Construct conditional detection probability (log(p[g])) and Pr(x) (pi[g]) for each bin (g)
for(g in 1:nD){
# Hazard model:
cloglog(p[g]) <- hz.scale*log(hz.shape) - hz.scale*log(midpt[g]) # Kery and Royle 2016 page 507
# Probability of x in each interval
pi[g] <- delta[g]/B
}#g
for(i in 1:(nclus+nz)){
z[i] ~ dbern(psi) # Real observation or augmented?
dclass[i] ~ dcat(pi[]) # population distribution of distance class
mu[i] <- z[i] * p[dclass[i]] # p depends on distance class
y[i] ~ dbern(mu[i]) # Observed or not?
clsz[i] ~ dpois(lambda) # Poisson process for cluster size
}#i
# Derived quantities
num.clusters <- sum(z[]) # Number of clusters
back.lambda <- lambda+1 # Back-transformed lambda (minimum is 1)
total.pop <- num.clusters*back.lambda
}#model
",fill=TRUE)#cat
sink()
# Inits function and parameters to save
inits <- function(){list(psi=runif(1),z=y,hz.shape=runif(1,40,200),hz.scale=rgamma(1,0.1,0.1),lambda=runif(1,0,10))}
params <- c("hz.scale","hz.shape","num.clusters","back.lambda","total.pop")
}# hazard
# hazard

Related

updating subset of parameters in dynet

Is there a way to update a subset of parameters in dynet? For instance in the following toy example, first update h1, then h2:
model = ParameterCollection()
h1 = model.add_parameters((hidden_units, dims))
h2 = model.add_parameters((hidden_units, dims))
...
for x in trainset:
...
loss.scalar_value()
loss.backward()
trainer.update(h1)
renew_cg()
for x in trainset:
...
loss.scalar_value()
loss.backward()
trainer.update(h2)
renew_cg()
I know that update_subset interface exists for this and works based on the given parameter indexes. But then it is not documented anywhere how we can get the parameter indexes in dynet Python.
A solution is to use the flag update = False when creating expressions for parameters (including lookup parameters):
import dynet as dy
import numpy as np
model = dy.Model()
pW = model.add_parameters((2, 4))
pb = model.add_parameters(2)
trainer = dy.SimpleSGDTrainer(model)
def step(update_b):
dy.renew_cg()
x = dy.inputTensor(np.ones(4))
W = pW.expr()
# update b?
b = pb.expr(update = update_b)
loss = dy.pickneglogsoftmax(W * x + b, 0)
loss.backward()
trainer.update()
# dy.renew_cg()
print(pb.as_array())
print(pW.as_array())
step(True)
print(pb.as_array()) # b updated
print(pW.as_array())
step(False)
print(pb.as_array()) # b not updated
print(pW.as_array())
For update_subset, I would guess that the indices are the integers suffixed at the end of parameter names (.name()).
In the doc, we are supposed to use a get_index function.
Another option is: dy.nobackprop() which prevents the gradient to propagate beyond a certain node in the graph.
And yet another option is to zero the gradient of the parameter that do not need to be updated (.scale_gradient(0)).
These methods are equivalent to zeroing the gradient before the update. So, the parameter will still be updated if the optimizer uses its momentum from previous training steps (MomentumSGDTrainer, AdamTrainer, ...).

Extending Stargazer to multiwaycov

I'm using stargazer to create regression outputs for my bachelor thesis. Due to the structure of my data I have to use clustered models (code below). I'm using the vcovclust command from the multiwaycov package, which works perfectly. However, stargazer does not support it. Do you know another way to create outputs as nice as stargazer does? Or do you know an other package/command to cluster the models, which is suppported by stargazer?
model1.1.2 <- lm(leaflet ~ partisan + as.factor(gender) + age + as.factor(education) + meaning + as.factor(polintrest), data = voxit)
summary(model1.1.2)
#clustering
vcov_clust1.1.2 <- cluster.vcov(model1.1.2, cbind(voxit$id, voxit$projetx))
coeftest(model1.1.2, vcov_clust1.1.2)
You can supply the adjusted p- and se-values to stargazer manually.
# model1 and model2 are both objects returned from coeftest()
# Capture them in an object and extract the ses (2nd column) and ps (4th column) in a list
ses <- list(model1[,2], model2[,2])
ps <- list(model1[,4], model2[,4])
# you can then run your normal stargazer command and supply
# the se- and p-values manually to the stargazer function
stargazer(model1, model2, type = "text", se = ses, p = ps, p.auto = F)
Hope this helps!

Where in the sequence of a Probabilistic Suffix Tree does "e" occur?

In my data there are only missing data (*) on the right side of the sequences. That means that no sequence starts with * and no sequence has any other markers after *. Despite this the PST (Probabilistic Suffix Tree) seems to predict a 90% chance of starting with a *. Here's my code:
# Load libraries
library(RCurl)
library(TraMineR)
library(PST)
# Get data
x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/c2539d06771317c5f4c8d3a2052a73fc485a09c6/challenge_level.csv")
data <- read.csv(text = x)
# Load and transform data
data <- read.table("thread_level.csv", sep = ",", header = F, stringsAsFactors = F)
# Create sequence object
data.seq <- seqdef(data[2:nrow(data),2:ncol(data)], missing = NA, right= NA, nr = "*")
# Make a tree
S1 <- pstree(data.seq, ymin = 0.05, L = 6, lik = TRUE, with.missing = TRUE)
# Look at first state
cmine(S1, pmin = 0, state = "N3", l = 1)
This generates:
[>] context: e
EX FA I1 I2 I3 N1 N2 N3 NR
S1 0.006821066 0.01107234 0.01218274 0.01208756 0.006821066 0.002569797 0.003299492 0.001554569 0.0161802
QU TR *
S1 0.01126269 0.006440355 0.9097081
How can the probability for * be 0.9097081 at the very beginning of the sequence, meaning after context e?
Does it mean that the context can appear anywhere inside a sequence, and that e denotes an arbitrary starting point somewhere inside a sequence?
A PST is a representation of a variable length Markov model (VLMC). As a classical Markov model a VLMC is assumed to be homogeneous (or stationary) meaning that the conditional probabilities of the outcome given the context are the same at each position in the sequence. In other words, the context can appear anywhere in the sequence. Actually, the search for contexts is done by exploring the tree that is supposed to apply anywhere in the sequences.
In your example, for l=1 (l is 1 + the length of the context), you look only for 0-length context, i.e., the only possible context is the empty sequence e. Your condition pmin=0, state=N3 (have a probability greater than 0 for N3) is equivalent to no condition at all. So you get the overall probability to observe each state. Because your sequences (with the missing states) are all of the same length, you would get the same results using TraMineR with
seqmeant(data.seq, with.missing=TRUE)/max(seqlength(data.seq))
To get the distribution at the first position, you can use TraMineR and look at the first column of the table of cross-sectional distributions at the successive positions returned by
seqstatd(data.seq, with.missing=TRUE)
Hope this helps.

A simple model in Winbugs but it says "This chain contains uninitialized variables"

I have some simple time to event data, no covariates. I was trying to fit a Weibull distribution to it. So I have the following code. Everything looks good until I load my initials. It says "this chain contains uninitialized variables". But I don't understand. I think Weibull dist only has 2 parameters, and I already specified them all. Could you please advise? Thanks!
model
{
for(i in 1 : N) {
t[i] ~ dweib(r, mu)I(t.cen[i],)
}
mu ~ dexp(0.001)
r ~ dexp(0.001)
}
# Data
list(
t.cen=c(0,3.91,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,21.95,23.98,33.08),
t=c(2.34,NA,5.16,5.63,6.17,6.8,7.03,8.05,8.13,8.36,8.83,10.16,
10.55,10.94,11.48,11.95,13.05,13.59,16.02,20.08,NA,NA,
NA),
N=23
)
# Initial values
list(
r=3,mu=3
)
The other uninitialised variables are the missing (NA) values in the vector of t. Remember that the BUGS language makes no distinction between data and parameters, and that supplying something as data with the value NA is equivalent to not supplying it as data.

Best way to create generic/method consistency for sort.data.frame?

I've finally decided to put the sort.data.frame method that's floating around the internet into an R package. It just gets requested too much to be left to an ad hoc method of distribution.
However, it's written with arguments that make it incompatible with the generic sort function:
sort(x,decreasing,...)
sort.data.frame(form,dat)
If I change sort.data.frame to take decreasing as an argument as in sort.data.frame(form,decreasing,dat) and discard decreasing, then it loses its simplicity because you'll always have to specify dat= and can't really use positional arguments. If I add it to the end as in sort.data.frame(form,dat,decreasing), then the order doesn't match with the generic function. If I hope that decreasing gets caught up in the dots `sort.data.frame(form,dat,...), then when using position-based matching I believe the generic function will assign the second position to decreasing and it will get discarded. What's the best way to harmonize these two functions?
The full function is:
# Sort a data frame
sort.data.frame <- function(form,dat){
# Author: Kevin Wright
# http://tolstoy.newcastle.edu.au/R/help/04/09/4300.html
# Some ideas from Andy Liaw
# http://tolstoy.newcastle.edu.au/R/help/04/07/1076.html
# Use + for ascending, - for decending.
# Sorting is left to right in the formula
# Useage is either of the following:
# sort.data.frame(~Block-Variety,Oats)
# sort.data.frame(Oats,~-Variety+Block)
# If dat is the formula, then switch form and dat
if(inherits(dat,"formula")){
f=dat
dat=form
form=f
}
if(form[[1]] != "~") {
stop("Formula must be one-sided.")
}
# Make the formula into character and remove spaces
formc <- as.character(form[2])
formc <- gsub(" ","",formc)
# If the first character is not + or -, add +
if(!is.element(substring(formc,1,1),c("+","-"))) {
formc <- paste("+",formc,sep="")
}
# Extract the variables from the formula
vars <- unlist(strsplit(formc, "[\\+\\-]"))
vars <- vars[vars!=""] # Remove spurious "" terms
# Build a list of arguments to pass to "order" function
calllist <- list()
pos=1 # Position of + or -
for(i in 1:length(vars)){
varsign <- substring(formc,pos,pos)
pos <- pos+1+nchar(vars[i])
if(is.factor(dat[,vars[i]])){
if(varsign=="-")
calllist[[i]] <- -rank(dat[,vars[i]])
else
calllist[[i]] <- rank(dat[,vars[i]])
}
else {
if(varsign=="-")
calllist[[i]] <- -dat[,vars[i]]
else
calllist[[i]] <- dat[,vars[i]]
}
}
dat[do.call("order",calllist),]
}
Example:
library(datasets)
sort.data.frame(~len+dose,ToothGrowth)
Use the arrange function in plyr. It allows you to individually pick which variables should be in ascending and descending order:
arrange(ToothGrowth, len, dose)
arrange(ToothGrowth, desc(len), dose)
arrange(ToothGrowth, len, desc(dose))
arrange(ToothGrowth, desc(len), desc(dose))
It also has an elegant implementation:
arrange <- function (df, ...) {
ord <- eval(substitute(order(...)), df, parent.frame())
unrowname(df[ord, ])
}
And desc is just an ordinary function:
desc <- function (x) -xtfrm(x)
Reading the help for xtfrm is highly recommended if you're writing this sort of function.
There are a few problems there. sort.data.frame needs to have the same arguments as the generic, so at a minimum it needs to be
sort.data.frame(x, decreasing = FALSE, ...) {
....
}
To have dispatch work, the first argument needs to be the object dispatched on. So I would start with:
sort.data.frame(x, decreasing = FALSE, formula = ~ ., ...) {
....
}
where x is your dat, formula is your form, and we provide a default for formula to include everything. (I haven't studied your code in detail to see exactly what form represents.)
Of course, you don't need to specify decreasing in the call, so:
sort(ToothGrowth, formula = ~ len + dose)
would be how to call the function using the above specifications.
Otherwise, if you don't want sort.data.frame to be an S3 generic, call it something else and then you are free to have whatever arguments you want.
I agree with #Gavin that x must come first. I'd put the decreasing parameter after the formula though - since it probably isn't used that much, and hardly ever as a positional argument.
The formula argument would be used much more and therefore should be the second argument. I also strongly agree with #Gavin that it should be called formula, and not form.
sort.data.frame(x, formula = ~ ., decreasing = FALSE, ...) {
...
}
You might want to extend the decreasing argument to allow a logical vector where each TRUE/FALSE value corresponds to one column in the formula:
d <- data.frame(A=1:10, B=10:1)
sort(d, ~ A+B, decreasing=c(A=TRUE, B=FALSE)) # sort by decreasing A, increasing B