partial Distance Based RDA - Centroids vanished from Plot - vegan

I am trying to fit a partial db-RDA with field.ID as a conditioning variable to account for the repeated-measures structure of the samples. However, including Condition(field.ID) makes the centroids of the main factor of interest disappear from the plot (left plot below).
The design: 12 fields were sampled for species data in each of two consecutive years. In addition, 3 reference fields were sampled every year; these three fields had to be replaced in the second year because the original ones were no longer available.
Some environmental variables were also measured (Nitrogen, Soil moisture, Temperature). Every field has an identifier (field.ID).
Using field.ID as the Condition seems to erroneously remove the F1 factor, whereas using the sampling campaign (SC) as the Condition does not. Is the latter the right way to correct for repeated measurements in a partial db-RDA?
library(vegan) # provides capscale() and ordiplot()
set.seed(1234)
df.exp <- data.frame(field.ID = factor(c(1:12,13,14,15,1:12,16,17,18)),
SC = factor(rep(c(1,2), each=15)),
F1 = factor(rep(rep(c("A","B","C","D","E"),each=3),2)),
Nitrogen = rnorm(30,mean=0.16, sd=0.07),
Temp = rnorm(30,mean=13.5, sd=3.9),
Moist = rnorm(30,mean=19.4, sd=5.8))
df.rsp <- data.frame(Spec1 = rpois(30, 5),
Spec2 = rpois(30,1),
Spec3 = rpois(30,4.5),
Spec4 = rpois(30,3),
Spec5 = rpois(30,7),
Spec6 = rpois(30,7),
Spec7 = rpois(30,5))
data=cbind(df.exp, df.rsp)
dbRDA <- capscale(df.rsp ~ F1 + Nitrogen + Temp + Moist + Condition(SC), df.exp); ordiplot(dbRDA)
dbRDA <- capscale(df.rsp ~ F1 + Nitrogen + Temp + Moist + Condition(field.ID), df.exp); ordiplot(dbRDA)

You first partial out all variation due to field.ID and then try to explain the remaining variation with F1, a variable that is aliased to field.ID, so its effect has already been partialled out. The key line in the printed output is this:
Some constraints were aliased because they were collinear (redundant)
And indeed, when you ask for details, you get
> alias(dbRDA, names=TRUE)
[1] "F1B" "F1C" "F1D" "F1E"
The F1 dummy variables (F1B to F1E) are constant within each field.ID, which was already partialled out, so nothing is left for them to explain.
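The nesting behind the aliasing can be checked from the design alone. The following is a minimal, language-neutral sketch in Python (not vegan code); the lists reproduce field.ID, F1 and SC from the question, and the helper names levels_within and is_nested are made up for illustration.

```python
from collections import defaultdict

# Design from the question: 15 samples in each of two sampling campaigns.
field_id = list(range(1, 13)) + [13, 14, 15] + list(range(1, 13)) + [16, 17, 18]
f1 = [lvl for lvl in "ABCDE" for _ in range(3)] * 2
sc = [1] * 15 + [2] * 15

def levels_within(block, factor):
    """Map each block level to the set of factor levels observed in it."""
    seen = defaultdict(set)
    for b, f in zip(block, factor):
        seen[b].add(f)
    return seen

def is_nested(block, factor):
    """True if the factor is constant within every level of the block."""
    return all(len(v) == 1 for v in levels_within(block, factor).values())

# F1 never varies within a field, so conditioning on field.ID removes it.
f1_nested_in_field = is_nested(field_id, f1)
# F1 does vary within a sampling campaign, so it survives Condition(SC).
f1_nested_in_sc = is_nested(sc, f1)
print(f1_nested_in_field, f1_nested_in_sc)
```

A factor that is constant within every level of the conditioning variable has no variation left after partialling, which is why Condition(SC) keeps the F1 centroids and Condition(field.ID) does not.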

gtsummary::tbl_regression() - Obtain Random Effects from GLMM Zero-Inflated Model

When I try to create a table of the conditional random effects in R using gtsummary's tbl_regression() on a glmmTMB zero-inflated negative-binomial mixed model, I get duplicate random-effects rows.
Example (using Mollie Brooks' Zero-Inflated GLMMs on Salamanders Dataset):
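library(glmmTMB)   # provides glmmTMB() and the Salamanders data set
library(gtsummary) # provides tbl_regression(), style_ratio(), style_sigfig()
data(Salamanders)
head(Salamanders)
zinbm2 = glmmTMB(count~spp + mined +(1|site), zi=~spp + mined + (1|site), Salamanders, family=nbinom2)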
zinbm2_table_cond <- tbl_regression(
zinbm2,
tidy_fun = function(...) broom.mixed::tidy(..., component = "cond"),
exponentiate = TRUE,
estimate_fun = purrr::partial(style_ratio, digits = 3),
pvalue_fun = purrr::partial(style_sigfig, digits = 3))
zinbm2_table_cond
Output:
Random Effects Output (cond)
When extracting the random effects from the zero-inflated part of the model I get the same problem.
Example:
zinbm2_table_zi <- tbl_regression(
zinbm2,
tidy_fun = function(...) broom.mixed::tidy(..., component = "zi"),
exponentiate = TRUE,
estimate_fun = purrr::partial(style_ratio, digits = 3),
pvalue_fun = purrr::partial(style_sigfig, digits = 3))
zinbm2_table_zi
Output:
Random Effects Output (zi)
The problem persists if I specify the effects argument in broom.mixed.
tidy_fun = function(...) broom.mixed::tidy(..., effects = "ran_pars", component = "cond"),
Looking at confidence intervals in both outputs it seems that somehow it is extracting random effects from both parts of the model and changing the estimate of the zero-inflated random effects (in 1st image; opposite in the 2nd image) to match the conditional part estimate while keeping the CI.
I am not knowledgeable enough to understand why this is happening. Since both rows have the same label I am having difficulty removing the wrong one.
Any tips on how to avoid this problem or a workaround to remove the undesired rows?
If you need more info, let me know.
Thank you in advance.
PS: Output images were changed to link due to insufficient reputation.

Issues with ordiplot3d NMDS in the vegan3d package

I am looking for some help with this 3D NMDS code. I have three issues:
1. The layout of the plot moves significantly each time I execute the code.
2. The sites and species are sometimes far off the plot.
3. The species text often overlaps.
How can I fix these?
I am unsure how to change the plotting environment to ggplot, so that might be out of the question.
library(vegan)
library(vegan3d)
library(tidyverse)
data("dune")
SiteID <- 1:20
NMDS = metaMDS(dune,distance="bray", try=500, wascores = TRUE, k=3)
NMDS1 = NMDS$points[,1]
NMDS2 = NMDS$points[,2]
NMDS3 = NMDS$points[,3]
NMDS = data.frame(NMDS1 = NMDS1, NMDS2 = NMDS2, NMDS3 = NMDS3, SiteID=SiteID)
NMDS_input <- metaMDS(dune,distance="bray",try=500,k=3,wascores = T)
pl4 <- with(NMDS, ordiplot3d(NMDS_input, pch=16, angle=50, main="Fish ion level 3", cex.lab=1.7,cex.symbols=1.5, tick.marks=FALSE))
sp <- scores(NMDS_input, choices=1:3, display="species", scaling="symmetric")
si <- scores(NMDS_input, choices=1:3, display="sites", scaling="symmetric")
text(pl4$xyz.convert(sp), rownames(sp), cex=0.7, xpd=TRUE)
sii <- as.data.frame(cbind(NMDS$SiteID,si))
with(NMDS, orditorp(pl4, labels = sii$V1, air=1, cex = 1))
Labels must be character variables in orditorp. We always assumed so, but this was not checked in vegan::orditorp. The latest vegan version on GitHub takes care of this and will also work with numeric labels.
ordiplot3d returns projected coordinates (in 2D), and if you want to plot those, you can just use the pl4 object that you saved; you do not need pl4$xyz.convert. This object is also accepted by orditorp.
If you want to plot points that were not used in the original mock-3D plot, you must use pl4$xyz.convert for their 2D projection. This function returns the projected coordinates in a form that is directly accepted by the standard R functions text, points (and some others), but they will not be accepted by orditorp (and I won't change this). You must turn them into a two-column matrix-like object; data.frame() will work.
Your example code contains a lot of unneeded code. The following is an edit with only the necessary lines and the fixes that make this example work with the current vegan release.
library(vegan)
library(vegan3d)
data(dune)
SiteID <- as.character(1:20) # must be character
NMDS_input <- metaMDS(dune,distance="bray",try=500,k=3,wascores = T)
pl4 <- ordiplot3d(NMDS_input, pch=16, angle=50, main="Fish ion level 3", cex.lab=1.7,cex.symbols=1.5, tick.marks=FALSE) # no with(NMDS,...)
sp <- scores(NMDS_input, choices=1:3, display="species") # no arg scaling in scores.metaMDS
text(pl4$xyz.convert(sp), rownames(sp), cex=0.7, xpd=TRUE)
orditorp(pl4, labels = SiteID, air=1, cex = 1) # character labels w/points in the same location

Display a parameter on icon level from calculation prior to simulation

I want to have a parameter, let's say n = 1, that is not displayed in the UI, while another parameter n_add = n + 1 is displayed on the icon of the model.
parameter Integer n = 1 "not to be displayed";
parameter Integer n_add = n + 1 "Displayed on the model";
On the icon level I write the text " %n_add ", but what is shown is not the evaluated result of n + 1 ("2") but the expression itself (literally "n+1"). The value of n_add should be visible before simulation/initialization, while the model is being parametrized.
Is this even possible?
This seems very similar to the question "Displaying parameter in annotation in DYMOLA", but since this question is more compact to read, here is the code that should solve your problem:
model showN1
parameter Integer n = 1 "not to be displayed";
final parameter Integer n_add = n + 1 "Displayed on the model";
annotation (Icon(graphics={Text(
extent={{-100,-20},{100,20}},
lineColor={0,0,0},
textString="n_add = " + DynamicSelect("?", String(n_add)))}));
end showN1;
Displaying a value prior to simulation is possible for values that are known before the simulation (e.g. parameters). DynamicSelect can also show values that change during simulation and have to be computed first. These are then read from the result file, which is only available after the simulation has started.

How do I merge or combine error rates?

Let's say I have a dataset that has 9 continuous columns of data and 4 columns of categorical data. In Matlab, I separate the columns into two groups, do training/testing (naïve Bayes) on them separately, and determine that the continuous columns have an error rate of 0.45 and the categorical columns have an error rate of 0.33. My question is: how do I determine the combined error?
EDIT - Simple pseudocode overview added:
for x = 1:num_iterations
Mdl_NB1 = fitcnb(TrainingSet_Con,TrainingTargets,'Distribution','normal');
Mdl_NB2 = fitcnb(TrainingSet_Dis,TrainingTargets,'Distribution','mn');
[NB1_label,NB1_Posterior,NB1_Cost] = predict(Mdl_NB1,TestPoint_Con);
[NB2_label,NB2_Posterior,NB2_Cost] = predict(Mdl_NB2,TestPoint_Dis);
NB1_cumulLoss = NB1_cumulLoss + resubLoss(Mdl_NB1);
NB2_cumulLoss = NB2_cumulLoss + resubLoss(Mdl_NB2);
end
NB1_avg_score = NB1_cumulLoss/num_iterations
NB2_avg_score = NB2_cumulLoss/num_iterations
total_avg_score = ???
The three obvious choices, in principle, are:
(A+B) / 2
A * B
(A*(CountA/TotalCount)) + (B*(CountB/TotalCount))
But I am not sure whether any of these is right in this case.
Combining the two error rates does not make sense; you are effectively building two separate models. So either build one model with all columns (maybe with 'Distribution','mvmn') or combine both models into one with something like
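% Assumption: the posteriors are computed for the training rows and placed side by side as feature columns
Mdl_Ens = fitcnb([NB1_Posterior, NB2_Posterior],TrainingTargets,'Distribution','normal');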
NEns_cumulLoss = NEns_cumulLoss + resubLoss(Mdl_Ens);
to actually build a single model out of the outputs of the two models, each of which is based on a subset of the columns.
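To see why no formula over the two separate error rates can work, here is a small Python sketch. It uses a different combination rule from the stacking suggested above: the two posteriors are multiplied, which is the naive Bayes combination only under the assumptions of equal class priors and conditional independence of the two column groups. All numbers are toy values invented for illustration, and the names post_con and post_dis are hypothetical.

```python
# Toy values invented for illustration only; not results from the question.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical posteriors P(class = 1) from each sub-model for the same test points.
post_con = [0.2, 0.4, 0.6, 0.3, 0.7, 0.45, 0.9, 0.6]
post_dis = [0.3, 0.7, 0.2, 0.4, 0.6, 0.8, 0.4, 0.55]

def error_rate(pred, truth):
    """Fraction of test points whose predicted label differs from the truth."""
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)

def classify(post):
    """Predict class 1 when its posterior exceeds 0.5."""
    return [1 if p > 0.5 else 0 for p in post]

def combine(post_a, post_b):
    """Product rule (assumes equal priors and conditional independence)."""
    labels = []
    for a, b in zip(post_a, post_b):
        score1 = a * b              # unnormalised evidence for class 1
        score0 = (1 - a) * (1 - b)  # unnormalised evidence for class 0
        labels.append(1 if score1 > score0 else 0)
    return labels

err_con = error_rate(classify(post_con), truth)
err_dis = error_rate(classify(post_dis), truth)
err_joint = error_rate(combine(post_con, post_dis), truth)
print(err_con, err_dis, err_joint)
```

With these toy numbers each sub-model misclassifies 2 of 8 points, while the combined rule misclassifies 1 of 8, a value that none of the three candidate formulas produces. The combined error depends on which points each model gets wrong, so it has to be measured on the joint predictions.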

Consolidating a data table in Scala

I am working on a small data analysis tool, and practicing/learning Scala in the process. However I got stuck at a small problem.
Assume data of type:
X Gr1 x_11 ... x_1n
X Gr2 x_21 ... x_2n
..
X GrK x_k1 ... x_kn
Y Gr1 y_11 ... y_1n
Y Gr3 y_31 ... y_3n
..
Y Gr(K-1) ...
Here I have entries (X, Y, ...) that may or may not exist in up to K groups, with a series of values for each group. What I want to do is pretty simple (in theory): I would like to consolidate the rows that belong to the same "entity" in different groups. So instead of multiple lines that start with X, I want to have one row with all values from x_11 to x_kn in columns.
What makes things complicated, however, is that not all entities exist in all groups. So wherever there is "missing data" I would like to pad with, for instance, zeroes or some string that denotes a missing value. So if I have (X, Y, Z) in up to 3 groups, the type of table I want to have is as follows:
X x_11 x_12 x_21 x_22 x_31 x_32
Y y_11 y_12 N/A N/A y_31 y_32
Z N/A N/A z_21 z_22 N/A N/A
I have been stuck trying to figure this out. Is there a smart way to use List functions to solve this?
I wrote this simple loop:
for {
(id, hitlist) <- hits.groupBy(_.acc)
h <- hitlist
} println(id + "\t" + h.sampleId + "\t" + h.ratios.mkString("\t"))
to be able to generate tables that look like the input example above. Note that my original data is of a different format and layout, but that has little to do with the problem at hand, so I have skipped all steps regarding parsing. I should be able to use groupBy in a better way that actually solves this for me, but I can't seem to get there.
Then I modified my loop mapping the hits to ratios and appending them to one another:
for ((id, hitlist) <- hits.groupBy(_.acc)){
val l = hitlist.map(_.ratios).foldRight(List[Double]()){
(l1: List[Double], l2: List[Double]) => l1 ::: l2
}
println(id + "\t" + l.mkString("\t"))
//println(id + "\t" + h.sampleId + "\t" + h.ratios.mkString("\t"))
}
That gets me one step closer but still no cigar! Instead of a fully padded "matrix" I get a jagged table. Taking the example above:
X x_11 x_12 x_21 x_22 x_31 x_32
Y y_11 y_12 y_31 y_32
Z z_21 z_22
Any ideas as to how I can pad the table so that values from the respective groups are aligned with one another? I should be able to use _.sampleId, which holds the "group membership" for each "hit", but I am not sure how exactly. hits is a List of type Hit, which is practically a wrapper for each row that gives convenience methods for getting individual values, so essentially a tuple with "named indices" (such as .acc, .sampleId, ...).
(I would like to solve this problem without hardcoding the number of groups, as it might change from case to case)
Thanks!
This is a bit of a contrived example, but I think you can see where this is going:
case class Hit(acc:String, subAcc:String, value:Int)
val hits = List(Hit("X", "x_11", 1), Hit("X", "x_21", 2), Hit("X", "x_31", 3))
val kMax = 4
val nMax = 2
for {
(id, hitlist) <- hits.groupBy(_.acc)
k <- 1 to kMax
n <- 1 to nMax
} yield {
val subId = "x_%s%s".format(k, n)
val row = hitlist.find(h => h.subAcc == subId).getOrElse(Hit(id, subId, 0))
println(row)
}
//Prints
Hit(X,x_11,1)
Hit(X,x_12,0)
Hit(X,x_21,2)
Hit(X,x_22,0)
Hit(X,x_31,3)
Hit(X,x_32,0)
Hit(X,x_41,0)
Hit(X,x_42,0)
If you provide more information on your hits list, then we could probably come up with something a little more accurate.
I have managed to solve this problem with the following code. I am putting it here as an answer in case someone else runs into a similar problem and requires some help. The use of find() from Noah's answer was definitely very useful, so do give him a +1 in case this code snippet helps you out.
val samples = hits.groupBy(_.sampleId).keys.toList.sorted
for ((id, hitlist) <- hits.groupBy(_.acc)) {
val ratios =
for (sample <- samples)
yield hitlist.find(h => h.sampleId == sample).map(_.ratios)
.getOrElse(List.fill(6)(Double.NaN)) // pad a missing group with six NaN values
println(id + "\t" + ratios.flatten.mkString("\t"))
}
I figure it's not a very elegant or efficient solution, as I have two calls to groupBy and I would be interested to see better solutions to this problem.
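For comparison, the same padding idea can be sketched with a single grouping pass and a pad width taken from the data instead of a hardcoded six. This is a language-neutral illustration in Python, not a drop-in replacement for the Scala code: Hit is a hypothetical stand-in with the fields acc, sample_id and ratios, consolidate is a made-up helper name, and the sketch assumes every hit carries the same number of ratios.

```python
from collections import namedtuple

# Hypothetical stand-in for the Hit wrapper described in the question.
Hit = namedtuple("Hit", ["acc", "sample_id", "ratios"])

def consolidate(hits, missing="N/A"):
    """Return {acc: padded row}, with groups aligned in sorted sample order."""
    samples = sorted({h.sample_id for h in hits})
    # Pad width comes from the data (assumes all hits have equally many ratios).
    width = max(len(h.ratios) for h in hits)
    # Single grouping pass: acc -> {sample_id: ratios}.
    by_acc = {}
    for h in hits:
        by_acc.setdefault(h.acc, {})[h.sample_id] = list(h.ratios)
    rows = {}
    for acc, per_sample in by_acc.items():
        row = []
        for s in samples:
            # Use the group's values if present, otherwise a block of placeholders.
            row.extend(per_sample.get(s, [missing] * width))
        rows[acc] = row
    return rows

hits = [
    Hit("X", "Gr1", [1.0, 1.5]), Hit("X", "Gr2", [2.0, 2.5]), Hit("X", "Gr3", [3.0, 3.5]),
    Hit("Y", "Gr1", [4.0, 4.5]), Hit("Y", "Gr3", [5.0, 5.5]),
    Hit("Z", "Gr2", [6.0, 6.5]),
]
table = consolidate(hits)
for acc, row in table.items():
    print(acc + "\t" + "\t".join(str(v) for v in row))
```

Looking groups up in a per-entity dictionary replaces the repeated find() over the hit list, and the number of groups is never hardcoded.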