Since installing the newest version of R, all my gtsummary table values less than -1 have been outputting to 1.00. Does anyone have insight on how to fix this very weird issue?
Here is example code:
library(tidyverse)
library(gtsummary)
library(haven)
library(mice)
library(googlesheets4)
data <- read_sheet("https://docs.google.com/spreadsheets/d/1yyw-0xseZSLjD4jc8sw7IksN-S0M3vcKWHy4ksMPL4c/edit?usp=sharing")
datami <- mice(data, m = 23, seed=10)
datareg <- with(datami,
lm(SUD ~ NUM + MIND +
AGE + SEX + CRAVE) )
table <- tbl_regression(datareg,
estimate_fun = purrr::partial(style_ratio, digits = 2),
pvalue_fun = ~style_pvalue(.x, digits = 2),
add_estimate_to_reference_rows = TRUE
) %>% modify_header(label="**Predictor**",estimate="**Unstandardized Coefficient**") %>%
modify_footnote(update = c(p.value, ci, estimate) ~ "Reference group")%>%
modify_caption("Table: Multiple Imputation Predicting Variable")
table
I have reinstalled R and gtsummary multiple times, to no avail.
You're using style_ratio() to round/format the estimates: this is meant to round odds ratios, risk ratios, etc., which are all positive numbers. Update this to use style_number().
I should update the ratio function to have better behavior when negative values are passed.
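As a minimal sketch of that change: the model below is a hypothetical stand-in fitted to gtsummary's built-in trial data, not the multiply imputed model from the question, and the object names mod and tbl are made up for illustration. Only the estimate_fun argument differs from the original call.

```r
library(gtsummary)

# style_number() formats ordinary numbers of either sign,
# whereas style_ratio() is designed for positive ratios
style_number(c(-1.2345, 0.5), digits = 2)

# Stand-in linear model on the built-in trial data (assumption for illustration)
mod <- lm(age ~ marker + grade, data = trial)

# Same idea as the table in the question, with style_number() swapped in
tbl <- tbl_regression(
  mod,
  estimate_fun = function(x) style_number(x, digits = 2),
  pvalue_fun = function(x) style_pvalue(x, digits = 2)
)
tbl
```

The same estimate_fun should carry over unchanged to the mice-based tbl_regression() call in the question.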
Related
When trying to create a table with the conditional random effects in R using the gtsummary function tbl_regression() from a glmmTMB zero-inflated negative binomial mixed-effects model, I get duplicate random effects rows.
Example (using Mollie Brooks' Zero-Inflated GLMMs on Salamanders Dataset):
# Salamanders ships with glmmTMB, so load the package before calling data()
library(glmmTMB)
library(gtsummary)
data(Salamanders)
head(Salamanders)
zinbm2 = glmmTMB(count~spp + mined +(1|site), zi=~spp + mined + (1|site), Salamanders, family=nbinom2)
zinbm2_table_cond <- tbl_regression(
zinbm2,
tidy_fun = function(...) broom.mixed::tidy(..., component = "cond"),
exponentiate = TRUE,
estimate_fun = purrr::partial(style_ratio, digits = 3),
pvalue_fun = purrr::partial(style_sigfig, digits = 3))
zinbm2_table_cond
Output:
Random Effects Output (cond)
When extracting the random effects from the zero-inflated part of the model I get the same problem.
Example:
zinbm2_table_zi <- tbl_regression(
zinbm2,
tidy_fun = function(...) broom.mixed::tidy(..., component = "zi"),
exponentiate = TRUE,
estimate_fun = purrr::partial(style_ratio, digits = 3),
pvalue_fun = purrr::partial(style_sigfig, digits = 3))
zinbm2_table_zi
Output:
Random Effects Output (zi)
The problem persists if I specify the effects argument in broom.mixed.
tidy_fun = function(...) broom.mixed::tidy(..., effects = "ran_pars", component = "cond"),
Looking at confidence intervals in both outputs it seems that somehow it is extracting random effects from both parts of the model and changing the estimate of the zero-inflated random effects (in 1st image; opposite in the 2nd image) to match the conditional part estimate while keeping the CI.
I am not knowledgeable enough to understand why this is happening. Since both rows have the same label I am having difficulty removing the wrong one.
Any tips on how to avoid this problem or a workaround to remove the undesired rows?
If you need more info, let me know.
Thank you in advance.
PS: Output images were changed to link due to insufficient reputation.
I'm trying to figure out how to use the gtsummary package for my dataset.
I have three categorical variables, two of which are set as strata. I'm not interested in the frequency of each sample; I want the numeric values in the table.
Currently I'm using this simple code (x, y, and z are my categorical variables, whereas SOC holds the numeric values; y and z should go in the header as strata).
Data %>%
select(x, y, z , SOC) %>%
tbl_strata(strata=z,
.tbl_fun =
~ .x %>%
tbl_summary(by = y , missing = "no"),
statistic = list(all_continuous()~ "{mean} ({sd})" ))%>%
modify_caption("**Soil organic carbon [%]**")%>%
bold_labels()
Edit
Let's take the trial dataset as an example:
trial %>%
select(trt, grade, stage, age, marker) %>%
tbl_strata(strata=stage,
~tbl_summary(.x, by = grade , missing = "no"),
statistic = list(all_continuous()~ "{mean} ({sd})"
))%>%
bold_labels()
What I'm looking for is a table like this, but without the frequency showing of each treatment (Drug A, B). I only want the age and marker to show up in my table but organized by treatment. I'd like to have the first section showing only the age and marker for the group that received Drug A. Then a section showing the same for Drug B.
Edit 2
Your input is exactly what I am looking for. With the trial dataset it works perfectly fine. However, once I put in my data, the numeric values are all in one column instead of in rows. I also still get the frequencies and I can't figure out why. I use exactly the same code and the same number of variables, and my table looks somewhat like this:
I think nesting calls to tbl_strata() (one merging and the other stacking) will get you what you're after. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'
tbl <-
trial %>%
select(trt, grade, stage, age, marker) %>%
tbl_strata(
strata = trt,
function(data) {
data %>%
tbl_strata(
strata = stage,
~ tbl_summary(
.x,
by = grade,
statistic = all_continuous() ~ "{mean} ({sd})",
missing = "no"
) %>%
modify_header(all_stat_cols() ~ "**{level}**")
)
},
.combine_with = "tbl_stack",
.combine_args = list(group_header = c("Drug A", "Drug B"))
) %>%
bold_labels()
Created on 2022-03-04 by the reprex package (v2.0.1)
I have used the gtsummary package (great package btw) since last month on my reports.
Now I am building a cohort table that will show pre-test value, post-test value, difference (p.p) and a t-test p-value.
I'm trying to build the same table as I built with Arsenal, with pre-test in the first column, post-test in the second column, and so on, but the difference column shows a negative value when it isn't supposed to.
I used mutate() to swap the two columns, because without it the post-test shows up as the first column. I also tried reordering the dataset itself so that the post-test rows come first, as I read in some posts, but to no avail.
homesurvey %>%
select(period, CB2.Textbooks, CB2.Magazines, CB2.Newspapers, CB2.Religious_books, CB2.Coloring_books, CB2.Comics) %>%
mutate(period = forcats::fct_rev(period)) %>%
tbl_summary(by = period,
statistic = all_continuous() ~ "{n} ({sd})",
label = list (CB2.Textbooks ~ "Textbooks",
CB2.Magazines ~ "Magazines",
CB2.Newspapers ~ "Newspapers",
CB2.Religious_books ~ "Religious books",
CB2.Coloring_books ~ "Coloring books",
CB2.Comics ~ "Comics")
)%>%
add_difference() %>%
modify_column_hide(ci)
It shows a negative difference even though it isn't supposed to.
Output
I am looking at your example output (thanks for including it). The first row shows 82% in the pre-assessment and 96% in the post-assessment. 82% - 96% is roughly -15% (the plain subtraction gives -14, so the displayed percentages are presumably rounded), so the difference should indeed be negative.
You can, however, flip the estimate by multiplying it by -1. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.0'
tbl <-
trial %>%
select(response, death, trt) %>%
tbl_summary(by = trt, missing = "no") %>%
add_difference() %>%
modify_column_hide(ci) %>%
# you can flip the difference estimate by multiplying it by -1
modify_table_body(
~.x %>%
dplyr::mutate(estimate = -1 * estimate)
)
Created on 2021-11-10 by the reprex package (v2.0.1)
Follow up question to (Renaming Rows in gtsummary, tbl_regression/tbl_stack):
I am now trying to merge the renamed, stacked table (Table 1) with a tbl_summary table that includes the prevalence for each of the outcomes (Table 2). However, because each renamed line of Table 1 is, in reality, just the same variable repeated over and over again, it doesn't merge with Table 2, instead creating a (Table 3) that has duplicated outcome names stacked onto one another. Any way to merge these tables so that the lines of Table 1 match seamlessly with those from Table 2?
UPDATE:
As of gtsummary v 1.4.0, tbl_uvregression() now accepts survey objects.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.0'
# convert trial data frame to survey object
tbl <-
survey::svydesign(
data = trial[c("response", "death", "age", "marker")],
ids = ~1,
weights = ~1
) %>%
# build univariate regression models
tbl_uvregression(
x = age,
method = survey::svyglm,
method.args = list(family = binomial),
exponentiate = TRUE,
formula = "{y} ~ {x} + marker",
label = list(response = "Response", death = "Death"),
hide_n = TRUE,
include = -marker
) %>%
add_n() %>%
add_nevent() %>%
modify_header(
label = "**Outcome**",
estimate = "**Age OR**"
)
Created on 2021-04-14 by the reprex package (v2.0.0)
I am trying to fit a partial db-RDA with field.ID to correct for the repeated-measures nature of the samples. However, including Condition(field.ID) makes the centroids of the main factor of interest disappear from the plot (left plot below).
The Design: 12 fields have been sampled for species data in two consecutive years, repeatedly. Additionally every year 3 samples from reference fields have been sampled. These three fields have been changed in the second year, due to unavailability of the former fields.
Additionally some environmental variables have been sampled (Nitrogen, Soil moisture, Temperature). Every field has an identifier (field.ID).
Using field.ID as the Condition seems to erroneously remove the F1 factor, whereas using the sampling campaign (SC) as the Condition does not. Is the latter the right way to correct for repeated measurements in a partial db-RDA?
library(vegan)  # provides capscale(), ordiplot() and the alias() method used below
set.seed(1234)
df.exp <- data.frame(field.ID = factor(c(1:12,13,14,15,1:12,16,17,18)),
SC = factor(rep(c(1,2), each=15)),
F1 = factor(rep(rep(c("A","B","C","D","E"),each=3),2)),
Nitrogen = rnorm(30,mean=0.16, sd=0.07),
Temp = rnorm(30,mean=13.5, sd=3.9),
Moist = rnorm(30,mean=19.4, sd=5.8))
df.rsp <- data.frame(Spec1 = rpois(30, 5),
Spec2 = rpois(30,1),
Spec3 = rpois(30,4.5),
Spec4 = rpois(30,3),
Spec5 = rpois(30,7),
Spec6 = rpois(30,7),
Spec7 = rpois(30,5))
data=cbind(df.exp, df.rsp)
dbRDA <- capscale(df.rsp ~ F1 + Nitrogen + Temp + Moist + Condition(SC), df.exp); ordiplot(dbRDA)
dbRDA <- capscale(df.rsp ~ F1 + Nitrogen + Temp + Moist + Condition(field.ID), df.exp); ordiplot(dbRDA)
You partial out the variation due to field.ID and then try to explain what is left with a variable that is aliased to field.ID, but its effect has already been partialled out. The key line in the printed output is this:
Some constraints were aliased because they were collinear (redundant)
And indeed, when you ask for details, you get
> alias(dbRDA, names=TRUE)
[1] "F1B" "F1C" "F1D" "F1E"
The F1? variables are constant within each field.ID, which was already partialled out, so nothing is left for them to explain.
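The nesting can be checked directly with base R. This is a small sketch that rebuilds only the two design columns from the question; the names tab and levels_per_field are made up for illustration.

```r
# Rebuild only the two design columns from the question
df.exp <- data.frame(
  field.ID = factor(c(1:12, 13, 14, 15, 1:12, 16, 17, 18)),
  F1 = factor(rep(rep(c("A", "B", "C", "D", "E"), each = 3), 2))
)

# Cross-tabulate fields (rows) against F1 levels (columns)
tab <- table(df.exp$field.ID, df.exp$F1)

# Number of distinct F1 levels observed in each field
levels_per_field <- rowSums(tab > 0)
levels_per_field

# If every field has exactly one F1 level, F1 is nested within field.ID
all(levels_per_field == 1)
```

When every field carries a single F1 level, conditioning on field.ID absorbs all of the between-level variation in F1, which matches the aliasing message above.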