How can we sort by frequency in a tbl_stack with two or more group headers in gtsummary?

It's me again, with a question I couldn't find an answer to by searching.
My problem is similar to the one in "Sorting overlapping categorical variables in {gtsummary}". That question used a dataset generated in R; I could not figure out how to do the same with a dataset imported through read_excel().
The dataset looks like this, similar to the one in the linked question. And here is the resulting table.
Simply put, I want to sort the rows under each header by the "Overall" frequency, either low-to-high or high-to-low, when I use tbl_stack.
Here is my code in RMarkdown.
```{r knowledge and access, echo=FALSE, message=FALSE}
table2 <-
  mtbaseline %>%
  select(q2, `q7/1`, `q7/2`, `q7/3`, `q7/4`, `q7/5`, `q7/6`, `q7/7`, `q7/8`, `q7/888`, `q7/999`) %>%
  mutate(q2 = case_when(q2 == "1" ~ "1.Girl",
                        q2 == "2" ~ "2.Boy",
                        q2 == "999" ~ "Don't know")) %>%
  tbl_summary(by = q2,
              statistic = all_continuous() ~ "{n} ({sd})",
              label = list(`q7/1` ~ "Label 1",
                           `q7/2` ~ "Label 2",
                           `q7/3` ~ "Label 3",
                           `q7/4` ~ "Label 4",
                           `q7/5` ~ "Label 5",
                           `q7/6` ~ "Label 6",
                           `q7/7` ~ "Label 7",
                           `q7/8` ~ "Label 8",
                           `q7/888` ~ "Others",
                           `q7/999` ~ "Dont know ")) %>%
  add_overall()

table3 <-
  mtbaseline %>%
  select(q2, `q8/1`, `q8/2`, `q8/3`, `q8/4`, `q8/5`, `q8/6`, `q8/7`, `q8/8`, `q8/888`, `q8/999`) %>%
  mutate(q2 = case_when(q2 == "1" ~ "1.Girl",
                        q2 == "2" ~ "2.Boy",
                        q2 == "999" ~ "Don't know")) %>%
  tbl_summary(by = q2,
              statistic = all_continuous() ~ "{n} ({sd})",
              label = list(`q8/1` ~ "Label 1",
                           `q8/2` ~ "Label 2",
                           `q8/3` ~ "Label 3 ",
                           `q8/4` ~ "Label 4",
                           `q8/5` ~ "Label 5",
                           `q8/6` ~ "Label 6",
                           `q8/7` ~ "Label 7",
                           `q8/8` ~ "Label 8",
                           `q8/888` ~ "Others",
                           `q8/999` ~ "Dont know ")) %>%
  add_overall()

tbl_stack(list(table2, table3), group_header = c("Header 1", "Header 2")) %>%
  modify_caption("**Figure 2. Knowledge and Access**") %>%
  as_gt() %>%
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_row_groups(groups = everything()))
```
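One possible approach, offered as a sketch rather than a confirmed answer: because each `q7/…` and `q8/…` column is its own row in the table, the row order follows the column order of the data passed to tbl_summary(). Reordering the columns by their overall count before building each table should therefore sort the rows under each header. The data frame `toy` and the helpers `order_by_freq()` and `sorted_tbl()` below are hypothetical, and the sketch assumes the imported columns are coded 0/1.

```r
library(gtsummary)

# Hypothetical stand-in for the imported data: a grouping column plus
# 0/1 indicator columns (one per answer option of a multi-select question).
toy <- data.frame(
  q2     = rep(c("1.Girl", "2.Boy"), times = 30),
  `q7/1` = rep(c(1, 0, 0, 0, 0, 0), times = 10),  # 10 ones
  `q7/2` = rep(c(1, 1, 1, 1, 0, 0), times = 10),  # 40 ones
  `q7/3` = rep(c(1, 1, 0, 0, 0, 0), times = 10),  # 20 ones
  `q8/1` = rep(c(1, 1, 1, 0, 0, 0), times = 10),  # 30 ones
  `q8/2` = rep(c(1, 1, 1, 1, 1, 0), times = 10),  # 50 ones
  check.names = FALSE
)

# Hypothetical helper: names of `vars`, ordered by the overall count of 1s.
order_by_freq <- function(data, vars, decreasing = TRUE) {
  counts <- colSums(data[vars] == 1, na.rm = TRUE)
  names(sort(counts, decreasing = decreasing))
}

# Hypothetical helper: summary table whose rows follow the sorted column order.
sorted_tbl <- function(data, vars) {
  ordered <- order_by_freq(data, vars)
  data[c("q2", ordered)] |>
    tbl_summary(by = q2) |>
    add_overall()
}

q7_order <- order_by_freq(toy, c("q7/1", "q7/2", "q7/3"))
q8_order <- order_by_freq(toy, c("q8/1", "q8/2"))

# Each block is sorted on its own, then the two blocks are stacked.
stacked <- tbl_stack(
  list(
    sorted_tbl(toy, c("q7/1", "q7/2", "q7/3")),
    sorted_tbl(toy, c("q8/1", "q8/2"))
  ),
  group_header = c("Header 1", "Header 2")
)
```

Setting `decreasing = FALSE` in order_by_freq() gives low-to-high instead. Labels passed through `label =` are matched by variable name, so they should not be affected by the reordering.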

Related

Display only certain percentages

I have chosen the percentage calculation by rows. What do I have to do to get only the percentage displayed for certain columns?
Thanks in advance!
Table1234 %>%
  select(everything(), -c(screening_id, m07_mainsourceincome_c, m01_2_ageyears_q, r_sample)) %>%
  tbl_summary(by = "test_result",
              percent = "row",
              digits = NULL,
              label = list(age_group_10yrs ~ "Age Groups",
                           m01_1_sex_d ~ "Sex",
                           m05_qualificationmg_c ~ "Education",
                           m06_indivemployment_c ~ "Profession",
                           m06_indivemployment_currently_employed ~ "Employment Status")) %>%
  modify_column_hide(., columns = stat_1) %>%
  bold_labels() %>%
  add_p() %>%
  add_overall()

Is there a way to change the categorical variables in tbl_summary from displaying as {p}% to proportions?

Is there a way to change the categorical variables in tbl_summary from displaying as {p}% to proportions?
For example, instead of showing 48%, it would show 0.48. I tried style_percent and style_number without any luck. I also tried datasummary, but it didn't work, and I couldn't display the standard deviation in parentheses under the mean.
tbl_summary(
  data = CPS, by = "Type",
  include = c(Female, Hispanic,
              age,
              DadGradCollege,
              MomGradCollege,
              ftotval_def),
  statistic = list(all_continuous() ~ "{mean} \n({sd})",
                   all_categorical() ~ "{p}%"),
  label = list(
    Female ~ "Female",
    Hispanic ~ "Hispanic",
    age ~ "Age",
    DadGradCollege ~ "Fathers with \n \t College",
    MomGradCollege ~ "Mothers with \n \t College",
    ftotval_def ~ "Total Family Income \n \t (1999 dollars)"
  ),
  missing = "no"
)
Yep, you can use the digits= argument to pass functions that will style/round/format the statistics. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.2'

trial |>
  tbl_summary(
    include = grade,
    statistic = all_categorical() ~ "{p}",
    digits = all_categorical() ~ function(x) style_number(x, digits = 2)
  ) %>%
  as_kable() # convert to kable for SO
| Characteristic | N = 200 |
|:---------------|:-------:|
| Grade          |         |
| I              |  0.34   |
| II             |  0.34   |
| III            |  0.32   |
Created on 2022-11-04 with reprex v2.0.2
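To see why swapping the formatting function helps, here is a small sketch using gtsummary's own style helpers on a hypothetical vector `p`. As far as I understand, tbl_summary() holds `{p}` on the 0 to 1 scale and by default formats it with style_percent(), which rescales to 0 to 100; style_number() leaves the scale alone and only rounds.

```r
library(gtsummary)

# Hypothetical proportions, as tbl_summary() would hold them before formatting.
p <- c(0.34, 0.32)

# Percent-style formatting: rescales the values to the 0-100 range.
as_percent <- style_percent(p)

# Plain number formatting: keeps the 0-1 scale, rounded to two decimals.
as_proportion <- style_number(p, digits = 2)
```

Both helpers return character vectors, which is why a function like this can be passed to `digits =`.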

Add frequency and % of missing values in gtsummary

df_nhpi %>%
  select(AGE, SEX, MAR_STAT, HEIGHT, WEIGHT, BMI, HTN, HTNMED, MI, Smoking, COPD, CANCER, DIABETES) %>%
  tbl_summary(by = SEX,
              label = list(MAR_STAT ~ 'Marital Status',
                           HTN ~ 'Hypertension',
                           HTNMED ~ 'Hypertension Medication',
                           MI ~ 'Heart Attack',
                           Smoking ~ 'Smoking Status',
                           COPD ~ 'Chronic Obstructive Pulmonary Disease'),
              type = list(c("HTN", "HTNMED", "MI", "COPD", "CANCER") ~ "categorical"),
              missing = "ifany",
              missing_text = "Unknown",
              statistic = list(all_continuous() ~ "{mean} ({sd})",
                               all_categorical() ~ "{n} ({p}%)"),
              digits = all_continuous() ~ 2,
              percent = "column") %>%
  add_stat_label() %>%
  add_p(test = all_continuous() ~ "t.test",
        pvalue_fun = function(x) style_pvalue(x, digits = 3)) %>%
  bold_p() %>%
  modify_caption("**Table 1. Baseline Characteristics**") %>%
  bold_labels()
I'm trying to generate a Table 1. The issue is that I want the percentage of missing values shown across columns (specifically for categorical variables), but I don't want the missing values included when the p-values are calculated. I'm trying to do this in a single chunk of code. Is there any way to do this, or should I go for the conventional method?
I've been searching the whole internet for the past three days, but I haven't found anything that works in my case.
PS: mutate and forcats don't work, as they skew my p-values.
I prepared two solutions that both report the proportion of missing data. Hopefully one of them works for you!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'

# add % missing in new column
tbl1 <-
  trial %>%
  tbl_summary(
    by = trt,
    include = response,
    type = all_dichotomous() ~ "categorical",
    missing = "no"
  ) %>%
  add_p() %>%
  add_n(statistic = "{n_miss} ({p_miss}%)") %>%
  modify_header(n = "**Missing**")

# prepare tbl_summary with rows for missing, then merge in p-values
tbl2 <-
  trial %>%
  dplyr::mutate(response = forcats::fct_explicit_na(factor(response))) %>%
  tbl_summary(
    by = trt,
    include = response,
    label = list(response = "Tumor Response")
  ) %>%
  list(tbl1 %>% modify_column_hide(c(n, all_stat_cols()))) %>%
  tbl_merge(tab_spanner = FALSE)
Created on 2022-03-22 by the reprex package (v2.0.1)

glue cannot interpolate functions into strings. * object 'n' is a function

I'm loving gtsummary. I write a lot of reports and love my pretty tables!
I ran into a problem after updating.
I just switched to using all_stat_cols(TRUE) instead of stat_by.
I'm getting an error when I try to include {level} or {n} after adding an overall column.
It works if stat_0 isn't included, so I tried all_stat_cols(FALSE) with a separate stat_0 ~ "Total n = {n}", but I get the error "Error: glue cannot interpolate functions into strings. * object 'n' is a function."
This works:
nr %>%
  select(gender, year) %>%
  tbl_summary(by = gender, missing = "no") %>%
  bold_labels() %>%
  add_overall(last = TRUE) %>%
  modify_footnote(update = everything() ~ NA) %>%
  modify_header(update = list(label ~ "",
                              all_stat_cols(FALSE) ~ "{level}\n n = {n}\n"))
But I also want the header of my overall column changed to "Total n = 17", like this:
nr %>%
  select(gender, year) %>%
  tbl_summary(by = gender, missing = "no") %>%
  bold_labels() %>%
  add_overall(last = TRUE) %>%
  modify_footnote(update = everything() ~ NA) %>%
  modify_header(update = list(label ~ "",
                              all_stat_cols(FALSE) ~ "{level}\n n = {n}\n",
                              stat_0 ~ "Total\n = {n}"))
But get this error:
Error: glue cannot interpolate functions into strings. * object 'n' is a function.
I also want to remove the first row (Year level) if anyone knows how to do that too!
Any help or ideas would be very much appreciated.
Hello! You're getting the error because little n represents the number of observations in each by= group, and big N is the overall number of observations. When you use little n in the header of the overall column, you get the error because it's not defined there.
Change the little n to big N, and you should be all set! Example below!
library(gtsummary)
#> #BlackLivesMatter
packageVersion("gtsummary")
#> [1] '1.4.1'

tbl <-
  trial %>%
  select(trt, age, grade) %>%
  tbl_summary(by = trt, missing = "no") %>%
  add_overall() %>%
  modify_header(
    update = list(label ~ "",
                  all_stat_cols(FALSE) ~ "{level}\n n = {n}\n",
                  stat_0 ~ "Total\n = {N}")
  )
Created on 2021-06-01 by the reprex package (v2.0.0)

ggvis with tooltip not working with layer_smooths

This code works as expected:
all_values <- function(x) {
  if (is.null(x)) return(NULL)
  row <- mtc[mtc$id == x$id, ]
  paste0(names(row), ": ", format(row), collapse = "<br />")
}

mtc %>% ggvis(x = ~wt, y = ~mpg, key := ~id) %>%
  layer_points() %>%
  add_tooltip(all_values, "hover")
But when I add layer_smooths(stroke := "red", se = T), the code gives me an error:
mtc %>% ggvis(x = ~wt, y = ~mpg, key := ~id) %>%
  layer_points() %>%
  layer_smooths(stroke := "red", se = T) %>%
  add_tooltip(all_values, "hover")
Error in eval(expr, envir, enclos) : object 'id' not found
Why? How can I fix it?
Thanks!
If I hadn't recognized this as an example from one of the ggvis help pages, I wouldn't have known where mtc came from. The problem seems to be that you set the key property in the ggvis() statement, but layer_smooths() evidently doesn't support it, so you need to move it into layer_points(). I got the visualization to run with the following code:
library(ggvis)

mtc <- mtcars
mtc$id <- seq_len(nrow(mtc))

all_values <- function(x) {
  if (is.null(x)) return(NULL)
  row <- mtc[mtc$id == x$id, ]
  paste0(names(row), ": ", format(row), collapse = "<br />")
}

mtc %>% ggvis(x = ~wt, y = ~mpg) %>%
  layer_smooths(stroke := "red", se = T) %>%
  layer_points(key := ~id) %>%
  add_tooltip(all_values, "hover")
However, when you hover over the smooth or the confidence bands, all of the values associated with the variables are labeled 'character(0)' in the tooltip.