Dealing with confounded effects in interactions, in R - linear-regression

I am doing a regression analysis of hunting and wind power turbine data.
My response variable is hunting harvest density, and my predictors are a set of wind turbine variables: height, number of turbines, distance to turbines and establishment phase (pre-construction, construction and operative).
examplemodel <- lme(harvest ~ WP_distance + WP_height + phase + phase:WP_distance + phase:WP_turbines, weight = ~ I(1/Area), data = mydata, random =~ 1 | randomvar, method="ML")
In the pre-construction phase, height, turbines and distance are set to zero (not built yet).
When fitting the linear mixed-effects model, I try to include the turbine variables and the interactions phase:distance, phase:height and phase:turbines. However, when I do this I get the error:
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
I figure it's because the variable phase and the WP variables get confounded in this way, but how could I possibly deal with it otherwise? I want to show that the effects of height, number of turbines and distance to the wind power parks depend on the construction phase.
Thank you!
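The confounding suspected above can be checked directly on the fixed-effects design matrix. The following is a minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not the poster's data). Because WP_distance is zero throughout the pre-construction phase, the main-effect column equals the sum of the two interaction columns, so the matrix is rank-deficient.

```r
# Toy data (hypothetical): the turbine variable is zero before construction
set.seed(1)
toy <- data.frame(
  phase = factor(rep(c("pre", "construction", "operative"), each = 4),
                 levels = c("pre", "construction", "operative")),
  WP_distance = c(rep(0, 4), runif(8, 1, 10))
)

# Fixed-effects design matrix for a main effect plus its interaction with phase
X <- model.matrix(~ WP_distance + phase + phase:WP_distance, data = toy)

n_cols <- ncol(X)      # 6 columns are requested by the formula
x_rank <- qr(X)$rank   # only 5 of them are linearly independent
print(c(columns = n_cols, rank = x_rank))
```

A rank smaller than the number of columns means at least one coefficient cannot be estimated separately, which is consistent with the singularity reported by lme.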

Related

Why do the results from the joint_tests function (emmeans package) not show one of the interactions of the model?

I ran a GLMMadaptive model (I am doing a resource selection function) and I am using the joint_tests function (emmeans package) to compute joint tests of the terms in the model. The problem is that one of the interactions does not appear in the results.
The model is:
mod.hinc <- mixed_model(fixed = Used ~ scale(ndvi) * season * vegfactor +
scale(ndvi^2) + scale(distance^2) + scale(distance) * season,
random = ~ 1 | id, data = hin.c,
family = binomial(link="logit"))
After running the model I run the joint_tests function:
install.packages("emmeans")
library(emmeans)
joint_tests(mod.hinc)
And this is the result:
joint_tests(mod.hinc)
 model term            df1 df2 F.ratio p.value
 ndvi                    1 Inf  36.465  <.0001
 season                  3 Inf  22.265  <.0001
 vegfactor               4 Inf   4.548  0.0011
 distance                1 Inf  33.939  <.0001
 ndvi:season             3 Inf  13.826  <.0001
 ndvi:vegfactor          4 Inf   8.500  <.0001
 season:vegfactor       12 Inf   6.544  <.0001
 ndvi:season:vegfactor  12 Inf   5.165  <.0001
I cannot find the reason why the interaction scale(distance)*season does not appear in the results.
Any help on that issue is welcome. I can provide more details about the model if required.
Thank you very much in advance.
Juan
The short answer is that distance:season is not shown because it came up with zero d.f. for the associated interaction contrasts. You could verify this by running joint_tests(mod.hinc, show0df = TRUE).
Why it has 0 d.f. is less clear. However, that is not the only problem here. You have to be extremely careful with numeric predictors when using joint_tests(); it does not do a model ANOVA; instead, as documented, it constructs a reference grid from the fitted model and performs joint tests of interaction contrasts related to the predictors. With numeric predictors, the results depend on the reference grid used.
In this particular instance, the model includes quadratic effects of ndvi and distance; however, the default reference grid is constructed using the range of the covariates -- only two distinct values. Thus, we can pick up the effects of the overall linear trends, but not the curvature effects implied by the quadratic terms. That's why only 1 d.f. of those factors' main effects are tested. There are really 2 d.f. in the effects of ndvi and distance. In order to capture all of those effects, we need to have at least three distinct values of these covariates in the reference grid. One way (not the only way) to accomplish that is to reduce the covariates to their means, plus or minus 1 SD -- which can be accomplished via this code:
meanpm1sd <- function(x)
c(mean(x) - sd(x), mean(x), mean(x) + sd(x))
joint_tests(mod.hinc, cov.reduce = meanpm1sd)
This will yield a different set of joint tests that likely will include 2-d.f. tests of ndvi and distance. But I don't know if you will still have some interactions missing due to zero-d.f. dimensionalities.
You can look directly at the estimates being tested in detail if you have any questions about what those effects are. For example, for season:distance,
### construct the needed reference grid once and for all
RG <- ref_grid(mod.hinc, cov.reduce = meanpm1sd)
EMM <- emmeans(RG, ~ season * distance)
CON <- contrast(EMM, interaction = "consec")
EMM ### see estimates
CON ### see interaction contrasts
test(CON, joint = TRUE)
I hope this helps shed some light on what is going on.

Dynamic force transmission simulation

I have been working on this all day but I haven't figured it out yet. So I thought I may as well ask on here and see if someone can help.
The problem is as follows:
----------
F(input)(t) --> | | --> F(output)(t)
----------
Given a sample with a known length, density, and spring constant (or Young's modulus), find the 'output' force against time when a known variable force is applied at the 'input'.
My current solution can already discretise the sample into finite elements. However, I am struggling to figure out how the force should propagate, given that the transmission speed in the material itself changes with the force (using the equation c = sqrt(force*area/density)).
If someone could point me to a solution or any other helpful resources, it would be highly appreciated.
A method for applying damping to the system would also be helpful but I should be able to figure out that part myself. (losses to the environment via sound or internal heating)
I will remodel the problem in the following way:
___ ___
F_input(t) --> |___|--/\/\/\/\/\/\/\/\--|___|
At time t=0 the system is in equilibrium, the distance between the two objects is L, the mass of the left one (object 1) is m1 and the mass of the right one (object 2) is m2.
___ ___
F_input(t) --> |<-x->|___|--/\/\/\/\/\/-|<-y->|___|
During the application of the force F_input(t), at time t > 0, denote by x the oriented distance of the position of object 1 from its original position at time t=0. Similarly, at time t > 0, denote by y the oriented distance of the position of object 2 from its original position at time t=0 (see the diagram above). Then the system is subject to the following system of ordinary differential equations:
x'' = -(k/m1) * x + (k/m1) * y + F_input(t)/m1
y'' = (k/m2) * x - (k/m2) * y
When you solve it, you get the change of x and y with time, i.e. you get two functions x = x(t), y = y(t). Then, the output force is
F_output(t) = m2 * y''(t)
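A minimal numerical sketch of this two-mass model, in base R. It applies Newton's second law to each object (m1 * x'' = F_input - k*(x - y) and m2 * y'' = k*(x - y)) and steps forward with a semi-implicit Euler scheme. The function name `simulate_two_mass`, the parameter values and the constant input force are all assumptions chosen for illustration.

```r
# Two masses joined by a spring; the input force acts on mass 1.
# All numerical values are illustrative assumptions.
simulate_two_mass <- function(F_input, m1 = 1, m2 = 1, k = 100,
                              t_end = 2, dt = 1e-4) {
  n <- round(t_end / dt)
  t <- seq(0, by = dt, length.out = n + 1)
  x <- y <- vx <- vy <- F_out <- numeric(n + 1)
  for (i in seq_len(n)) {
    spring <- k * (x[i] - y[i])           # force the spring exerts on mass 2
    ax <- (F_input(t[i]) - spring) / m1   # m1 * x'' = F_input - k * (x - y)
    ay <- spring / m2                     # m2 * y'' = k * (x - y)
    F_out[i] <- m2 * ay                   # force transmitted to mass 2
    # semi-implicit Euler step: update velocities first, then positions
    vx[i + 1] <- vx[i] + dt * ax
    vy[i + 1] <- vy[i] + dt * ay
    x[i + 1] <- x[i] + dt * vx[i + 1]
    y[i + 1] <- y[i] + dt * vy[i + 1]
  }
  F_out[n + 1] <- k * (x[n + 1] - y[n + 1])
  data.frame(t = t, x = x, y = y, F_out = F_out)
}

# Example: a constant unit force switched on at t = 0
res <- simulate_two_mass(function(t) 1)
```

Plotting `res$F_out` against `res$t` shows the output force oscillating around the share of the input force needed to accelerate object 2. A dedicated ODE solver would be the more usual choice for stiff or long simulations.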
The problem isn't well defined at all. For starters, for F_out to exist there must be some constraint that it must obey. Otherwise, the system will have more unknowns than equations.
The discretization will lead you to a system like
M*xpp = -K*x + F
with m=ρ*A*Δx and k=E*A/Δx
But to solve this system of n equations, you either need to know both F_in and F_out, or you need to prescribe the motion of one of the nodes, for example x_n = 0, which gives xpp_n = 0.
As for damping, the usual choice is proportional damping: a damping matrix proportional to the stiffness matrix, D = α*K, which multiplies the vector of velocities.
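The assembly described in this answer can be sketched in base R as follows. It assumes a uniform bar split into n equal elements, with the input force on node 1 and the far end clamped, so the clamped node is left out of the matrices. The function name `assemble_bar` and the material values are hypothetical, and the mass is lumped equally on every node for simplicity.

```r
# Assemble lumped-mass, stiffness and damping matrices for a bar of n elements
assemble_bar <- function(n, L, rho, E, A, alpha = 0) {
  dx <- L / n
  m <- rho * A * dx          # mass lumped at each node
  k <- E * A / dx            # stiffness of each element
  # Nodes 1..n are free; node n + 1 is clamped (x = 0) and therefore omitted
  K <- diag(2 * k, n)
  K[1, 1] <- k               # the loaded end node touches only one element
  if (n > 1) {
    idx <- cbind(1:(n - 1), 2:n)
    K[idx] <- -k                         # upper off-diagonal
    K[idx[, 2:1, drop = FALSE]] <- -k    # lower off-diagonal
  }
  list(M = diag(m, n), K = K, D = alpha * K, k = k)
}

# Static sanity check with illustrative values: K x = F with a unit force on node 1
bar <- assemble_bar(n = 5, L = 1, rho = 7800, E = 2e11, A = 1e-4)
x_static <- solve(bar$K, c(1, 0, 0, 0, 0))
F_clamp <- bar$k * x_static[5]   # force carried by the last element into the clamp
```

In the static limit the force reaching the clamp equals the applied force, which is a quick way to check the assembly before adding the time integration of M*xpp = -K*x - D*xp + F.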

Why is the confidence interval not consistent with the standard errors in this regression?

I am running a linear regression with fixed effect and standard errors clustered by a certain group.
areg ref1 ref1_l1 rf1 ew1 vol_ew1 sk_ew1, a(us_id) vce(cluster us_id)
The one line code is as above and the output is as follows:
Now, the t-stats and the P values look inconsistent. How can we have a t-stat > 5 and a p-value > 11%? Similarly, the 95% confidence intervals appear to be way wider than Coeff. +- 2 Std. Err.
What am I missing?
There is nothing inconsistent here. You have a small sample size and a less than parsimonious model and have all but run out of degrees of freedom. Notice how areg won't post an F statistic or a P-value for the model, a strong danger sign. Your t statistics are consistent with checks by hand:
. display 2 * ttail(1, 5.54)
.11368912
. display 2 * ttail(1, 113.1)
.00562868
In short, there is no bug here and no programming issue. It's just a matter of your model over-fitting your data and the side-effects of that.
Similarly, +/- 2 SE for a 95% confidence interval is way off as a rule of thumb here. Again, a hand calculation is instructive:
. display invt(1, 0.975)
12.706205
. display invt(60, 0.975)
2.0002978
. display invt(61, 0.975)
1.9996236
. display invnormal(0.975)
1.959964
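For readers without Stata, the same hand checks can be sketched in base R with pt(), qt() and qnorm(). The single residual degree of freedom is taken from the Stata calls above; the variable names are only for illustration.

```r
# Two-sided p-value for t = 5.54 with 1 residual degree of freedom
p_1df <- 2 * pt(-abs(5.54), df = 1)

# Multipliers for a 95% confidence interval at different degrees of freedom
crit <- c(df1    = qt(0.975, df = 1),
          df60   = qt(0.975, df = 60),
          normal = qnorm(0.975))

print(p_1df)
print(crit)
```

With one degree of freedom the multiplier is above 12 rather than about 2, which is why the reported intervals are so much wider than Coeff. +- 2 Std. Err.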

Assessing performance of a zero inflated negative binomial model

I am modelling the diffusion of movies through a contact network (based on telephone data) using a zero inflated negative binomial model (package: pscl)
m1 <- zeroinfl(LENGTH_OF_DIFF ~ ., data = trainData, dist = "negbin")
(variables described below.)
The next step is to evaluate the performance of the model.
My attempt has been to do multiple out-of-sample predictions and calculate the MSE.
Using
predict(m1, newdata = testData)
I received a prediction for the mean length of a diffusion chain for each datapoint, and using
predict(m1, newdata = testData, type = "prob")
I received a matrix containing the probability of each datapoint being a certain length.
Problem with the evaluation: Since I have a 0 (and 1) inflated dataset, the model would be correct most of the time if it predicted 0 for all the values. The predictions I receive are good for chains of length zero (according to the MSE), but the deviation between the predicted and the true value for chains of length 1 or larger is substantial.
My question is:
How can we assess how well our model predicts chains of non-zero length?
Is this approach the correct way to make predictions from a zero inflated negative binomial model?
If yes: how do I interpret these results?
If no: what alternative can I use?
My variables are:
Dependent variable:
length of the diffusion chain (count [0,36])
Independent variables:
movie characteristics (both dummies and continuous variables).
Thanks!
It is straightforward to evaluate RMSPE (root mean square predictive error), but it is probably best to transform your counts beforehand, to ensure that the really big counts do not dominate this sum.
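A minimal sketch of RMSPE on transformed counts. The choice of log1p as the transform is an assumption (it keeps zeros finite), and the function name `rmspe` and the example vectors are made up for illustration.

```r
# RMSPE computed after transforming observed and predicted counts
rmspe <- function(obs, pred, transform = log1p) {
  sqrt(mean((transform(obs) - transform(pred))^2))
}

obs_example  <- c(0, 0, 1, 3, 12)        # hypothetical observed chain lengths
pred_example <- c(0.2, 0.1, 1.5, 2.0, 6) # hypothetical predicted means

rmspe_log <- rmspe(obs_example, pred_example)                        # transformed scale
rmspe_raw <- rmspe(obs_example, pred_example, transform = identity)  # raw scale
```

On the raw scale the single large count dominates the error, while on the log1p scale each observation contributes more evenly.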
You may find false negative and false positive error rates (FNR and FPR) to be useful here. FNR is the chance that a chain of actual non-zero length is predicted to have zero length (i.e. absence, also known as negative). FPR is the chance that a chain of actual zero length is falsely predicted to have non-zero (i.e. positive) length. I suggest doing a Google on these terms to find a paper in your favourite quantitative journals or a chapter in a book that helps explain these simply. For ecologists I tend to go back to Fielding & Bell (1997, Environmental Conservation).
First, let's define a repeatable example, that anyone can use (not sure where your trainData comes from). This is from help on zeroinfl function in the pscl library:
# an example from help on zeroinfl function in pscl library
library(pscl)
fm_zinb2 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")
There are several packages in R that calculate these. But here's the by hand approach. First calculate observed and predicted values.
# store observed values, and determine how many are nonzero
obs <- bioChemists$art
obs.nonzero <- obs > 0
table(obs)
table(obs.nonzero)
# calculate predicted counts, and check their distribution
preds.count <- predict(fm_zinb2, type="response")
plot(density(preds.count))
# also the predicted probability that each item is nonzero
preds <- 1-predict(fm_zinb2, type = "prob")[,1]
preds.nonzero <- preds > 0.5
plot(density(preds))
table(preds.nonzero)
Then get the confusion matrix (basis of FNR, FPR)
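# the confusion matrix is obtained by tabulating the dichotomized predictions (rows) and observations (columns)
confusion.matrix <- table(preds.nonzero, obs.nonzero)
# FNR: chains that are actually non-zero (column "TRUE") but predicted as zero (row "FALSE")
FNR <- confusion.matrix["FALSE", "TRUE"] / sum(confusion.matrix[, "TRUE"])
FNR
# FPR: chains that are actually zero (column "FALSE") but predicted as non-zero (row "TRUE")
FPR <- confusion.matrix["TRUE", "FALSE"] / sum(confusion.matrix[, "FALSE"])
FPR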
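The same rates can be wrapped in a small helper that does not depend on pscl. This is a sketch: the function name `error_rates`, the 0.5 threshold default and the example vectors are assumptions. Fixing the factor levels keeps the table 2 x 2 even when one class is never predicted.

```r
# False negative and false positive rates for zero vs. non-zero predictions
error_rates <- function(obs, pred_prob_nonzero, threshold = 0.5) {
  obs_nz  <- factor(obs > 0, levels = c(FALSE, TRUE))
  pred_nz <- factor(pred_prob_nonzero > threshold, levels = c(FALSE, TRUE))
  cm <- table(predicted = pred_nz, observed = obs_nz)
  c(FNR = cm["FALSE", "TRUE"] / sum(cm[, "TRUE"]),    # non-zero predicted as zero
    FPR = cm["TRUE", "FALSE"] / sum(cm[, "FALSE"]))   # zero predicted as non-zero
}

# Hypothetical observed counts and predicted probabilities of being non-zero
rates <- error_rates(obs = c(0, 0, 3, 1, 0, 2),
                     pred_prob_nonzero = c(0.2, 0.7, 0.9, 0.4, 0.1, 0.8))
print(rates)
```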
Calibration can be assessed visually or with a calibration regression.
# let's look at how well the counts are being predicted
library(ggplot2)
output <- as.data.frame(list(preds.count=preds.count, obs=obs))
ggplot(aes(x=obs, y=preds.count), data=output) + geom_point(alpha=0.3) + geom_smooth(col="blue")
Transforming the counts to "see" what is going on:
output$log.obs <- log(output$obs)
output$log.preds.count <- log(output$preds.count)
ggplot(aes(x=log.obs, y=log.preds.count), data=output[is.finite(output$log.obs) & is.finite(output$log.preds.count),]) + geom_jitter(alpha=0.3, width=.15, size=2) + geom_smooth(col="blue") + labs(x="Observed count (non-zero, natural logarithm)", y="Predicted count (non-zero, natural logarithm)")
In your case you could also evaluate the correlations, between the predicted counts and the actual counts, either including or excluding the zeros.
So you could fit a regression as a kind of calibration to evaluate this!
However, since the predictions are not necessarily counts, we can't use a Poisson regression, so instead we can use a lognormal model, regressing the log prediction against the log observed and assuming a Normal response.
calibrate <- lm(log(preds.count) ~ log(obs), data=output[output$obs!=0 & output$preds.count!=0,])
summary(calibrate)
sigma <- summary(calibrate)$sigma
sigma
There are more fancy ways of assessing calibration I suppose, as in any modelling exercise ... but this is a start.
For a more advanced assessment of zero-inflated models, check out the ways in which the log likelihood can be used, in the references provided for the zeroinfl function. This requires a bit of finesse.

Dijkstra's algorithm with negative weights

Can we use Dijkstra's algorithm with negative weights?
STOP! Before you think "lol nub you can just endlessly hop between two points and get an infinitely cheap path", I'm more thinking of one-way paths.
An application for this would be a mountainous terrain with points on it. Obviously going from high to low doesn't take energy, in fact, it generates energy (thus a negative path weight)! But going back again just wouldn't work that way, unless you are Chuck Norris.
I was thinking of incrementing the weight of all points until they are non-negative, but I'm not sure whether that will work.
As long as the graph does not contain a negative cycle (a directed cycle whose edge weights have a negative sum), it will have a shortest path between any two points, but Dijkstra's algorithm is not designed to find them. The best-known algorithm for finding single-source shortest paths in a directed graph with negative edge weights is the Bellman-Ford algorithm. This comes at a cost, however: Bellman-Ford requires O(|V|·|E|) time, while Dijkstra's requires O(|E| + |V|log|V|) time, which is asymptotically faster for both sparse graphs (where E is O(|V|)) and dense graphs (where E is O(|V|^2)).
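For illustration, here is a compact base R sketch of Bellman-Ford. The function name `bellman_ford`, the edge-list layout (a data frame with columns from, to, weight and vertices numbered 1..n) and the small example graph are assumptions, not part of the answer above.

```r
# Single-source shortest paths that tolerate negative edge weights
bellman_ford <- function(n, edges, source) {
  dist <- rep(Inf, n)
  dist[source] <- 0
  for (pass in seq_len(n - 1)) {          # at most |V| - 1 rounds of relaxation
    changed <- FALSE
    for (e in seq_len(nrow(edges))) {
      u <- edges$from[e]; v <- edges$to[e]; w <- edges$weight[e]
      if (dist[u] + w < dist[v]) {
        dist[v] <- dist[u] + w
        changed <- TRUE
      }
    }
    if (!changed) break                   # early exit once nothing improves
  }
  # One extra pass: any further improvement means a reachable negative cycle
  for (e in seq_len(nrow(edges))) {
    if (dist[edges$from[e]] + edges$weight[e] < dist[edges$to[e]])
      stop("graph contains a negative cycle reachable from the source")
  }
  dist
}

# Hypothetical downhill/uphill graph: 1 -> 2 (2), 1 -> 3 (1), 3 -> 4 (1), 4 -> 2 (-2)
g <- data.frame(from = c(1, 1, 3, 4), to = c(2, 3, 4, 2), weight = c(2, 1, 1, -2))
d <- bellman_ford(4, g, source = 1)
print(d)
```

The two nested loops make the O(|V|·|E|) cost visible: every edge is relaxed in each of up to |V| - 1 passes.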
In your example of a mountainous terrain (necessarily a directed graph, since going up and down an incline have different weights) there is no possibility of a negative cycle, since this would imply leaving a point and then returning to it with a net energy gain - which could be used to create a perpetual motion machine.
Increasing all the weights by a constant value so that they are non-negative will not work. To see this, consider the graph where there are two paths from A to B, one traversing a single edge of length 2, and one traversing edges of length 1, 1, and -2. The second path is shorter, but if you increase all edge weights by 2, the first path now has length 4, and the second path has length 6, reversing the shortest paths. This tactic will only work if all possible paths between the two points use the same number of edges.
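The arithmetic of this counterexample can be checked in a few lines of R. The variable names are only for illustration.

```r
# Edge weights along the two A -> B paths from the example
direct <- c(2)
detour <- c(1, 1, -2)
shift  <- 2   # constant added to every edge so that none is negative

before <- c(direct = sum(direct), detour = sum(detour))
after  <- c(direct = sum(direct + shift), detour = sum(detour + shift))

print(before)  # direct 2, detour 0: the detour is shorter
print(after)   # direct 4, detour 6: the direct edge now looks shorter
```

The shift penalises each path in proportion to its number of edges, which is why the ranking flips.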
If you read the proof of optimality, one of the assumptions made is that all the weights are non-negative. So, no. As Bart recommends, use Bellman-Ford if there are no negative cycles in your graph.
You have to understand that a negative edge isn't just a negative number --- it implies a reduction in the cost of the path. If you add a negative edge to your path, you have reduced the cost of the path --- if you increment the weights so that this edge is now non-negative, it does not have that reducing property anymore and thus this is a different graph.
I encourage you to read the proof of optimality --- there you will see that the assumption that adding an edge to an existing path can only increase (or not affect) the cost of the path is critical.
You can use Dijkstra's on a graph with negative weights, but you first have to find the proper offset for each vertex. That is essentially what Johnson's algorithm does. But that would be overkill, since Johnson's uses Bellman-Ford to find the weight offset(s). Johnson's is designed to find all shortest paths between pairs of vertices.
http://en.wikipedia.org/wiki/Johnson%27s_algorithm
There is actually an algorithm which uses Dijkstra's algorithm on a graph with negative edge weights; it does so by first reweighting the graph so that no edge weight is negative. This algorithm is called 'Johnson's algorithm'.
The way it works is by adding a new node (let's say Q) which has 0 cost to traverse to every other node in the graph. It then runs Bellman-Ford on the graph from point Q, getting a cost for each node with respect to Q which we will call q[x], which will either be 0 or a negative number (as it used one of the negative paths).
E.g. a -> -3 -> b, therefore if we add a node Q which has 0 cost to all of these nodes, then q[a] = 0, q[b] = -3.
We then rebalance the edges using the formula: weight + q[source] - q[destination], so the new weight of a->b is -3 + 0 - (-3) = 0. We do this for all other edges in the graph, then remove Q and its outgoing edges and voila! We now have a rebalanced graph with no negative edges, on which we can run Dijkstra's!
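The reweighting step can be sketched in base R as follows. The function name `johnson_reweight`, the edge-list layout and the two extra edges in the example are assumptions, and the sketch presumes the graph has no negative cycle.

```r
# Compute the potentials q[] and the reweighted edges used by Johnson's algorithm
johnson_reweight <- function(n, edges) {
  # Starting every potential at 0 is equivalent to adding a node Q with
  # zero-cost edges to all vertices and running Bellman-Ford from Q
  q <- rep(0, n)
  for (pass in seq_len(n - 1)) {
    for (e in seq_len(nrow(edges))) {
      cand <- q[edges$from[e]] + edges$weight[e]
      if (cand < q[edges$to[e]]) q[edges$to[e]] <- cand
    }
  }
  # new weight = weight + q[source] - q[destination]
  edges$new_weight <- edges$weight + q[edges$from] - q[edges$to]
  list(q = q, edges = edges)
}

# a = 1, b = 2, c = 3: a -> b (-3) as in the example, plus b -> c (2) and a -> c (1)
g <- data.frame(from = c(1, 2, 1), to = c(2, 3, 3), weight = c(-3, 2, 1))
rw <- johnson_reweight(3, g)
print(rw$q)
print(rw$edges)
```

Every path from s to t changes by the same amount, q[s] - q[t], so shortest paths are preserved, and the original distances are recovered by subtracting that amount after running Dijkstra's on the new weights.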
The running time is O(nm) [bellman-ford] + n x O(m log n) [n Dijkstra's] + O(n^2) [weight computation] = O (nm log n) time
More info: http://joonki-jeong.blogspot.co.uk/2013/01/johnsons-algorithm.html
Actually I think it'll work to modify the edge weights. Not with an offset but with a factor. Assume instead of measuring the distance you are measuring the time required from point A to B.
weight = time = distance / velocity
You could even adapt velocity depending on the slope to use the physical one if your task is for real mountains and car/bike.
Yes, you could do that with adding one step at the end i.e.
If v ∈ Q, Then Decrease-Key(Q, v, v.d)
Else Insert(Q, v) and S = S \ {v}.
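One reading of the extra step above is a Dijkstra variant that puts a vertex back into the queue whenever its distance improves after it was settled. The sketch below follows that reading in base R. The function name `dijkstra_reinsert`, the logical-vector queue and the example graph are assumptions; it terminates only when there is no negative cycle and can be much slower than plain Dijkstra in the worst case.

```r
# Dijkstra with re-insertion of settled vertices (label-correcting variant)
dijkstra_reinsert <- function(n, edges, source) {
  dist <- rep(Inf, n)
  dist[source] <- 0
  in_queue <- rep(FALSE, n)
  in_queue[source] <- TRUE
  while (any(in_queue)) {
    cand <- which(in_queue)
    u <- cand[which.min(dist[cand])]   # extract the queued vertex with smallest distance
    in_queue[u] <- FALSE               # u moves to the settled set S
    out <- edges[edges$from == u, ]
    for (e in seq_len(nrow(out))) {
      v <- out$to[e]
      if (dist[u] + out$weight[e] < dist[v]) {
        dist[v] <- dist[u] + out$weight[e]  # acts as Decrease-Key if v is queued
        in_queue[v] <- TRUE                 # otherwise v is re-inserted (removed from S)
      }
    }
  }
  dist
}

# Hypothetical graph where vertex 2 is settled too early and must be revisited
g <- data.frame(from = c(1, 1, 3, 4, 2), to = c(2, 3, 4, 2, 5),
                weight = c(2, 1, 1, -2, 1))
d <- dijkstra_reinsert(5, g, source = 1)
print(d)
```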
An expression tree is a binary tree in which all leaves are operands (constants or variables), and the non-leaf nodes are binary operators (+, -, /, *, ^). Implement this tree to model polynomials with the basic methods of the tree including the following:
A function that calculates the first derivative of a polynomial.
Evaluate a polynomial for a given value of x.
[20] Use the following rules for the derivative:
Derivative(constant) = 0
Derivative(x) = 1
Derivative(P(x) + Q(y)) = Derivative(P(x)) + Derivative(Q(y))
Derivative(P(x) - Q(y)) = Derivative(P(x)) - Derivative(Q(y))
Derivative(P(x) * Q(y)) = P(x)*Derivative(Q(y)) + Q(x)*Derivative(P(x))
Derivative(P(x) / Q(y)) = P(x)*Derivative(Q(y)) - Q(x)*Derivative(P(x))
Derivative(P(x) ^ Q(y)) = Q(y) * (P(x) ^(Q(y) - 1)) * Derivative(Q(y))
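A minimal sketch of such an expression tree in base R. Leaves are plain numbers or the string "x", and internal nodes are lists built by a hypothetical helper `node()`; all function names are assumptions. The sum, difference and product cases follow the rules listed above. The quotient and power cases use the standard calculus rules (with a constant exponent), which is an assumption about the intended rules rather than a transcription of the list.

```r
# Internal nodes hold a binary operator and two subtrees; leaves are numbers or "x"
node <- function(op, left, right) list(op = op, left = left, right = right)
is_leaf <- function(e) !is.list(e)

# Evaluate the tree for a given value of x
evaluate <- function(e, x) {
  if (is_leaf(e)) return(if (identical(e, "x")) x else e)
  a <- evaluate(e$left, x)
  b <- evaluate(e$right, x)
  switch(e$op,
         "+" = a + b, "-" = a - b, "*" = a * b, "/" = a / b, "^" = a^b,
         stop("unknown operator: ", e$op))
}

# Build a new tree representing the first derivative with respect to x
derivative <- function(e) {
  if (is_leaf(e)) return(if (identical(e, "x")) 1 else 0)
  p <- e$left
  q <- e$right
  dp <- derivative(p)
  dq <- derivative(q)
  switch(e$op,
         "+" = node("+", dp, dq),
         "-" = node("-", dp, dq),
         "*" = node("+", node("*", p, dq), node("*", q, dp)),
         # standard quotient rule: (q * p' - p * q') / q^2
         "/" = node("/", node("-", node("*", q, dp), node("*", p, dq)),
                    node("^", q, 2)),
         # standard power rule, assuming the exponent q is a constant
         "^" = node("*", node("*", q, node("^", p, node("-", q, 1))), dp),
         stop("unknown operator: ", e$op))
}

# Example: P(x) = 3 * x^2 + 2 * x - 5, so P'(x) = 6 * x + 2
poly <- node("-",
             node("+", node("*", 3, node("^", "x", 2)), node("*", 2, "x")),
             5)
value_at_2 <- evaluate(poly, 2)               # 3*4 + 4 - 5
slope_at_2 <- evaluate(derivative(poly), 2)   # 6*2 + 2
```

The derivative tree is not simplified, so it contains terms such as x^2 * 0; a simplification pass could be added but is not needed for evaluation.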