Is there a way to change the categorical variables in tbl_summary from displaying as {p}% to proportions?

For example, instead of showing 48%, it would show 0.48. I tried style_percent and style_number without any luck. I also tried datasummary, but it didn't work, and I couldn't display the standard deviation in parentheses under the mean.
tbl_summary(
  data = CPS,
  by = "Type",
  include = c(Female, Hispanic, age, DadGradCollege, MomGradCollege, ftotval_def),
  statistic = list(
    all_continuous() ~ "{mean} \n({sd})",
    all_categorical() ~ "{p}%"
  ),
  label = list(
    Female ~ "Female",
    Hispanic ~ "Hispanic",
    age ~ "Age",
    DadGradCollege ~ "Fathers with \n \t College",
    MomGradCollege ~ "Mothers with \n \t College",
    ftotval_def ~ "Total Family Income \n \t (1999 dollars)"
  ),
  missing = "no"
)

Yep, you can use the digits= argument to pass functions that will style/round/format the statistics. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.2'
trial |>
  tbl_summary(
    include = grade,
    statistic = all_categorical() ~ "{p}",
    digits = all_categorical() ~ function(x) style_number(x, digits = 2)
  ) |>
  as_kable() # convert to kable for SO
| Characteristic | N = 200 |
|:---------------|:-------:|
| Grade          |         |
| I              | 0.34    |
| II             | 0.34    |
| III            | 0.32    |
Created on 2022-11-04 with reprex v2.0.2
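To make the role of the formatting function concrete, here is a minimal base R sketch that does not use gtsummary at all. It assumes that the statistic arrives at the formatter as a raw proportion on the 0-1 scale, which is consistent with the 0.34 output above. The data and the helper names as_percent and as_proportion are hypothetical.

```r
# Hypothetical data: ten tumor grades (not the `trial` dataset)
grade <- factor(c("I", "I", "I", "I", "II", "II", "II", "III", "III", "III"))

# Raw proportions on the 0-1 scale, one per level
p <- setNames(as.vector(table(grade)) / length(grade), levels(grade))

# Two formatters applied to the same raw values (hypothetical helpers)
as_percent    <- function(x) paste0(formatC(100 * x, format = "f", digits = 0), "%")
as_proportion <- function(x) formatC(x, format = "f", digits = 2)

percent_labels    <- as_percent(p)
proportion_labels <- as_proportion(p)

# Side-by-side comparison of the two displays
print(data.frame(
  grade = names(p),
  percent = percent_labels,
  proportion = unname(proportion_labels)
))
```

The only difference between the two displays is the function applied to the raw value, which is why swapping the function passed to digits= is enough.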

Related

Effect size estimates for tbl_svysummary

I had a quick question. Is there any way to add effect size estimates (i.e., Cohen's D and/or Cramer's V) for tbl_svysummary when comparing demographic factors to one another? I am looking for pretty much the same answer that was provided for this post (How to add the Chi-square effect size Cramer's V in the summary table using R package “gtsummary”?)
library(gtsummary)
my_ES_test <- function(data, variable, by, ...) {
  rstatix::cohens_d(data, as.formula(glue::glue("{variable} ~ {by}")))$effsize
}
my_cramer_v <- function(data, variable, by, ...) {
  table(data[[variable]], data[[by]]) %>%
    rstatix::cramer_v()
}
gtTable <-
  mtcars %>%
  select(hp, vs, am) %>%
  tbl_summary(by = vs) %>%
  add_p() %>%
  add_stat(
    fns = list(
      all_continuous() ~ my_ES_test,
      all_categorical() ~ my_cramer_v
    )
  ) %>%
  modify_header(add_stat_1 ~ "**Effect size**")
However, when I tried the approach suggested there, it did not work for tbl_svysummary. Example below:
library(tidyverse)
library(rstatix)
library(gtsummary)
my_ES_test <- function(data, variable, by, ...) {
  rstatix::cohens_d(data, as.formula(glue::glue("{variable} ~ {by}")))$effsize
}
my_cramer_v <- function(data, variable, by, ...) {
  table(data[[variable]], data[[by]]) %>%
    rstatix::cramer_v()
}
tbl_svysummary_ex1 <-
  survey::svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) %>%
  tbl_svysummary(by = Survived, percent = "row", include = c(Class, Age)) %>%
  add_p(test = list(all_categorical() ~ "svy.chisq.test")) %>%
  add_stat(
    fns = list(
      all_continuous() ~ my_ES_test,
      all_categorical() ~ my_cramer_v
    )
  ) %>%
  modify_header(add_stat_1 ~ "**Effect size**")
Furthermore, on the gtsummary website, there do not seem to be any instructions for how to do this in tbl_svysummary either. Any guidance here would be much appreciated!
The example below shows the test statistic and the degrees of freedom. If you want an effect size that is not returned by default, then you'll need to write a custom method for add_difference() that includes the estimate.
library(gtsummary)
# create summary table
tbl_svysummary_ex1 <-
  survey::svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) %>%
  tbl_svysummary(by = Survived, percent = "row", include = c(Class, Age)) %>%
  add_p(test = list(all_categorical() ~ "svy.chisq.test"))
tbl_svysummary_ex1$table_body |> names()
#> [1] "variable" "test_name" "var_type" "var_label" "row_type"
#> [6] "label" "stat_1" "stat_2" "test_result" "statistic"
#> [11] "p.value" "ndf" "ddf"
# unhide the statistic and DF columns by assigning a header
tbl_svysummary_ex1 |>
  modify_header(
    statistic = "**Chi-square**",
    ndf = "**ndf**",
    ddf = "**ddf**"
  ) |>
  modify_fmt_fun(c(statistic, ndf, ddf) ~ style_sigfig) |>
  as_kable()
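| Characteristic | No, N = 1,490 | Yes, N = 711 | Chi-square | p-value | ndf | ddf |
|:---------------|:-------------:|:------------:|:----------:|:-------:|:---:|:---:|
| Class          |               |              | 0.41       | 0.7     | 2.6 | 81  |
| 1st            | 122 (38%)     | 203 (62%)    |            |         |     |     |
| 2nd            | 167 (59%)     | 118 (41%)    |            |         |     |     |
| 3rd            | 528 (75%)     | 178 (25%)    |            |         |     |     |
| Crew           | 673 (76%)     | 212 (24%)    |            |         |     |     |
| Age            |               |              | 0.63       | 0.4     | 1.0 | 31  |
| Child          | 52 (48%)      | 57 (52%)     |            |         |     |     |
| Adult          | 1,438 (69%)   | 654 (31%)    |            |         |     |     |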
Created on 2022-11-20 with reprex v2.0.2
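For orientation, here is a rough base R sketch of what a Cramér's V estimate looks like when computed by hand from the weighted Titanic counts. It is an illustration only: it treats the weighted counts as ordinary counts and uses the plain Pearson chi-square, so it ignores the survey design and is not the design-based statistic reported by svy.chisq.test. The variable names are hypothetical, and whether this simplification is acceptable depends on your data.

```r
# Weighted two-way table of Class by Survived from the built-in Titanic data
titanic_df <- as.data.frame(Titanic)
tab <- xtabs(Freq ~ Class + Survived, data = titanic_df)

# Plain Pearson chi-square on the weighted counts (assumption: design ignored)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n <- sum(tab)
k <- min(dim(tab)) - 1
cramers_v <- sqrt(chi2 / (n * k))

print(cramers_v)
```

A function with this logic could serve as the starting point for the custom method mentioned above, once it is adapted to take the survey design into account.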

Add frequency and % of missing values in gtsummary

df_nhpi %>%
  select(AGE, SEX, MAR_STAT, HEIGHT, WEIGHT, BMI, HTN, HTNMED, MI, Smoking, COPD, CANCER, DIABETES) %>%
  tbl_summary(
    by = SEX,
    label = list(
      MAR_STAT ~ 'Marital Status',
      HTN ~ 'Hypertension',
      HTNMED ~ 'Hypertension Medication',
      MI ~ 'Heart Attack',
      Smoking ~ 'Smoking Status',
      COPD ~ 'Chronic Obstructive Pulmonary Disease'
    ),
    type = list(c("HTN", "HTNMED", "MI", "COPD", "CANCER") ~ "categorical"),
    missing = "ifany",
    missing_text = "Unknown",
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = all_continuous() ~ 2,
    percent = "column"
  ) %>%
  add_stat_label() %>%
  add_p(
    test = all_continuous() ~ "t.test",
    pvalue_fun = function(x) style_pvalue(x, digits = 3)
  ) %>%
  bold_p() %>%
  modify_caption("**Table 1. Baseline Characteristics**") %>%
  bold_labels()
I'm trying to generate a Table 1. The issue is that I want the % of missing values shown across columns (specifically for categorical variables), but at the same time I don't want missing values to be included when calculating p-values. I'm trying to do this in a single chunk of code. Is there any way to do this, or should I go for the conventional method?
I've been searching the whole internet for the past three days, but I haven't found anything that works in my case.
PS: mutate and forcats don't work, as they skew my p-values.
I prepared two solutions that both report the proportion of missing data. Hopefully one of them works for you!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'
# add % missing in new column
tbl1 <-
  trial %>%
  tbl_summary(
    by = trt,
    include = response,
    type = all_dichotomous() ~ "categorical",
    missing = "no"
  ) %>%
  add_p() %>%
  add_n(statistic = "{n_miss} ({p_miss}%)") %>%
  modify_header(n = "**Missing**")
# prepare tbl_summary with rows for missing, then merge in p-values
tbl2 <-
  trial %>%
  dplyr::mutate(response = forcats::fct_explicit_na(factor(response))) %>%
  tbl_summary(
    by = trt,
    include = response,
    label = list(response = "Tumor Response")
  ) %>%
  list(tbl1 %>% modify_column_hide(c(n, all_stat_cols()))) %>%
  tbl_merge(tab_spanner = FALSE)
Created on 2022-03-22 by the reprex package (v2.0.1)
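The idea behind both solutions can be sketched in base R without gtsummary: report the missingness from the full data, but compute the p-value from the complete cases only. The data frame below is hypothetical, and Fisher's exact test is just one possible choice of test for this illustration.

```r
# Hypothetical data: a binary response with some missing values in each arm
df <- data.frame(
  trt = rep(c("A", "B"), each = 6),
  response = c(1, 0, NA, 1, 0, 1, 0, 0, NA, NA, 1, 0)
)

# Missingness is summarised on the full data, by group
n_miss <- tapply(is.na(df$response), df$trt, sum)
p_miss <- tapply(is.na(df$response), df$trt, mean)
missing_label <- sprintf("%d (%.0f%%)", n_miss, 100 * p_miss)
names(missing_label) <- names(n_miss)

# The test uses complete cases only, so missing values never form a category
complete <- df[!is.na(df$response), ]
p_value <- fisher.test(table(complete$trt, complete$response))$p.value

print(missing_label)
print(p_value)
```

Converting missing values into an explicit factor level before testing would add a third category to the table, which is why that approach changes the p-values.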

Is there a way to add percentage to tbl_regression add_nevent?

I've just discovered that add_nevent in gtsummary can have the option location = "level". I am rapt! But I would like it to have a percentage as well. I've tried adding statistic = "{n}({p}%)" but nothing changes.
Here is my code:
tbl_regression(glm(rellife ~ age + gender, data = df, family = "binomial"), exponentiate = TRUE) %>%
  add_nevent(location = "level", statistic = "{n}/{N}%") %>% # add number of events of the outcome
  add_n(location = "level")
And the table:
I would like to have 1601 (93.6%) in the column Event N for Age and so on.
Any help would be appreciated.
Thanks
After adding the N and N event, you can use the modify_table_body() function to calculate the event rate. Example below!
library(gtsummary)
#> #BlackLivesMatter
packageVersion("gtsummary")
#> [1] '1.5.2'
tbl <-
  glm(response ~ age + grade, trial, family = binomial) %>%
  tbl_regression(exponentiate = TRUE) %>%
  add_nevent(location = "level") %>%
  add_n(location = "level") %>%
  # adding event rate
  modify_table_body(
    ~ .x %>%
      dplyr::mutate(
        stat_nevent_rate =
          ifelse(
            !is.na(stat_nevent),
            paste0(style_sigfig(stat_nevent / stat_n, scale = 100), "%"),
            NA
          ),
        .after = stat_nevent
      )
  ) %>%
  # merge the columns into a single column
  modify_cols_merge(
    pattern = "{stat_nevent} / {stat_n} ({stat_nevent_rate})",
    rows = !is.na(stat_nevent)
  ) %>%
  # update header to event rate
  modify_header(stat_nevent = "**Event Rate**")
Created on 2022-03-21 by the reprex package (v2.0.1)
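The row-wise calculation inside modify_table_body() can be tried on its own with a small base R sketch. The data frame below is a hypothetical stand-in for the table body, with made-up counts, and sprintf() replaces style_sigfig() so that the example runs without gtsummary.

```r
# Hypothetical stand-in for the table body: header rows have no counts
body <- data.frame(
  label = c("Age", "Grade", "I", "II", "III"),
  stat_n = c(183L, NA, 65L, 58L, 60L),
  stat_nevent = c(58L, NA, 20L, 17L, 21L)
)

# Build "events / N (rate%)" only where an event count exists
body$event_rate <- ifelse(
  is.na(body$stat_nevent),
  NA_character_,
  sprintf(
    "%d / %d (%.0f%%)",
    body$stat_nevent, body$stat_n, 100 * body$stat_nevent / body$stat_n
  )
)

print(body)
```

Rows without an event count stay NA, which mirrors the rows = !is.na(stat_nevent) condition used in the merge step.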

How to change the formatting values of the results obtained by "add_difference" function in a gtsummary table?

I'm using the gtsummary package to generate a summary table of the mean difference and 95% CI among paired values.
However, the default output does not use the same format and rounding for the mean difference and the 95% CI (in my data: no digits after the decimal point for the mean difference, 1 digit for the lower limit, and no digits for the upper limit of the 95% CI).
I tried to change it using the estimate_fun= argument, but I only got an error message, probably due to bad syntax. Does anyone have a solution? :)
Below is an example using the example table for paired data from the gallery (http://www.danieldsjoberg.com/gtsummary/articles/gallery.html), where I try to round the difference and the 95% CI to 1 digit:
trial_paired <-
  trial %>%
  select(trt, marker) %>%
  group_by(trt) %>%
  mutate(id = row_number()) %>%
  ungroup()
trial_paired %>%
  filter(!is.na(marker)) %>%
  group_by(id) %>%
  filter(n() == 2) %>%
  ungroup() %>%
  tbl_summary(by = trt, include = -id, statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
  add_difference(
    test = list(all_continuous() ~ "paired.t.test"),
    group = id,
    estimate_fun = list(all_continuous() ~ style_sigfig(.x, digits=1))
  )
The only result is: Error: Error in estimate_fun= argument input. Select from ‘marker’
Many thanks if anybody has a solution, and sorry if the question is not so clear...
Hello and welcome to Stack Overflow!
There was a bug in the add_difference(estimate_fun=) that is now fixed in the dev version of the package on GitHub. Install the version from GitHub and use the code below.
# renv::install("ddsjoberg/gtsummary")
library(gtsummary)
#> #Uighur
packageVersion("gtsummary")
#> [1] '1.4.2.9001'
trial_paired <-
  trial %>%
  select(trt, marker) %>%
  dplyr::group_by(trt) %>%
  mutate(id = dplyr::row_number()) %>%
  dplyr::ungroup()
tbl <-
  trial_paired %>%
  dplyr::filter(!is.na(marker)) %>%
  dplyr::group_by(id) %>%
  dplyr::filter(dplyr::n() == 2) %>%
  dplyr::ungroup() %>%
  tbl_summary(by = trt, include = -id, statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
  add_difference(
    test = list(all_continuous() ~ "paired.t.test"),
    group = id,
    estimate_fun = marker ~ function(x) style_sigfig(x, digits = 1)
  )
Created on 2021-07-16 by the reprex package (v2.0.0)
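The key point in the fix is that estimate_fun= receives a function, which is then applied to the estimate and to both confidence limits alike. A base R sketch of that idea, using hypothetical paired measurements and a hypothetical formatter fmt1, might look like the following. Note that fmt1 rounds to one decimal place, whereas style_sigfig(digits = 1) rounds to one significant figure, so the two are not interchangeable.

```r
# Hypothetical paired measurements (not taken from the `trial` dataset)
before <- c(1.2, 0.8, 2.5, 1.9, 0.4, 1.1, 3.0, 0.7)
after  <- c(0.9, 1.4, 2.1, 2.6, 0.3, 1.8, 2.2, 1.0)

# Paired t-test: mean difference and its 95% confidence interval
tt <- t.test(before, after, paired = TRUE)

# A single formatter, passed around as a function (hypothetical helper)
fmt1 <- function(x) formatC(x, format = "f", digits = 1)

# Apply the same formatter to the estimate and to both limits
difference_label <- paste0(
  fmt1(unname(tt$estimate)),
  " (", fmt1(tt$conf.int[1]), ", ", fmt1(tt$conf.int[2]), ")"
)

print(difference_label)
```

Because one function formats all three numbers, the estimate and the interval always share the same rounding.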

glue cannot interpolate functions into strings. * object 'n' is a function

I'm loving gtsummary. I write a lot of reports and love my pretty tables!
I ran into a problem after updating.
I just switched to using all_stat_cols(TRUE) instead of stat_by.
I'm getting an error when I try to include {level} or {n} after adding the overall column.
It works if stat_0 isn't included, so I tried all_stat_cols(FALSE) with a separate stat_0 ~ "Total n = {n}", but I get the error "Error: glue cannot interpolate functions into strings. * object 'n' is a function."
This works:
nr %>%
  select(gender, year) %>%
  tbl_summary(by = gender, missing = "no") %>%
  bold_labels() %>%
  add_overall(last = TRUE) %>%
  modify_footnote(update = everything() ~ NA) %>%
  modify_header(
    update = list(
      label ~ "",
      all_stat_cols(FALSE) ~ "{level}\n n = {n}\n"
    )
  )
But I also want my overall column header to be changed to "Total n = 17", like this:
nr %>%
  select(gender, year) %>%
  tbl_summary(by = gender, missing = "no") %>%
  bold_labels() %>%
  add_overall(last = TRUE) %>%
  modify_footnote(update = everything() ~ NA) %>%
  modify_header(
    update = list(
      label ~ "",
      all_stat_cols(FALSE) ~ "{level}\n n = {n}\n",
      stat_0 ~ "Total\n = {n}"
    )
  )
But I get this error:
Error: glue cannot interpolate functions into strings. * object 'n' is a function.
I also want to remove the first row (Year level) if anyone knows how to do that too!
Any help or ideas would be very much appreciated.
Hello! The reason you're getting an error is that little n represents the number of observations in each by= group, and big N is the overall number of observations. When you use little n in the header for the overall column, you get the error because it's not defined there.
Change the little n to big N, and you should be all set! Example below!
library(gtsummary)
#> #BlackLivesMatter
packageVersion("gtsummary")
#> [1] '1.4.1'
tbl <-
  trial %>%
  select(trt, age, grade) %>%
  tbl_summary(by = trt, missing = "no") %>%
  add_overall() %>%
  modify_header(
    update = list(
      label ~ "",
      all_stat_cols(FALSE) ~ "{level}\n n = {n}\n",
      stat_0 ~ "Total\n = {N}"
    )
  )
Created on 2021-06-01 by the reprex package (v2.0.0)
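To see why a missing placeholder causes trouble, here is a toy base R sketch of header templating. It is not how gtsummary or glue are implemented. The function fill_template and the two value lists are hypothetical, and the counts are only illustrative. The by= column carries level, n and N, while the overall column carries only N, as described above.

```r
# Hypothetical helper: replace {name} placeholders with values from a list,
# and fail loudly when a placeholder has no value for this column
fill_template <- function(template, values) {
  keys <- regmatches(template, gregexpr("\\{[^}]+\\}", template))[[1]]
  for (key in unique(keys)) {
    name <- substr(key, 2, nchar(key) - 1)
    if (is.null(values[[name]])) {
      stop("placeholder '", name, "' is not defined for this column")
    }
    template <- gsub(key, as.character(values[[name]]), template, fixed = TRUE)
  }
  template
}

# Values available to a by= column versus the overall column (illustrative)
by_col      <- list(level = "Drug A", n = 98, N = 200)
overall_col <- list(N = 200)

by_header      <- fill_template("{level}\n n = {n}\n", by_col)
overall_header <- fill_template("Total\n = {N}", overall_col)

# Using {n} for the overall column fails because n is not defined there
overall_error <- tryCatch(
  fill_template("Total\n = {n}", overall_col),
  error = function(e) conditionMessage(e)
)

print(overall_error)
```

In the real error, the lookup for n does not fail outright but finds a function named n elsewhere, which is what the "object 'n' is a function" message reports.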