I need to create a table that summarizes several continuous variables by two categorical variables - gtsummary

In the first figure you can see what I've been trying, without success.
"ciclo" and "etnibee" are two categorical variables.
In the second figure you can see what I would like to get.
Expected Outcomes
Please help me, thanks in advance.

The table is possible to construct using the various building blocks of tables available in gtsummary. Admittedly, it's not the easiest, though. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.1'
library(tidyverse)
fun1 <- function(data, variable, by) {
  # extract variable label
  lbl <- attr(data[[variable]], "label") %||% variable
  # construct table
  data %>%
    nest(data = -all_of(by)) %>%
    arrange(.data[[by]]) %>%
    rowwise() %>%
    mutate(
      tbl =
        tbl_summary(
          data = data,
          include = variable,
          missing = "no",
          label = list(as.character(.data[[by]])) %>% setNames(.env$variable)
        ) %>%
        modify_header(stat_0 ~ paste0("**", lbl, "**")) %>%
        modify_table_body(~.x %>% mutate(variable = .env$by)) %>%
        list()
    ) %>%
    pull(tbl) %>%
    tbl_stack(quiet = TRUE)
}
# now stratify all these results by another variable
final_tbl <-
  tbl_strata(
    trial,
    strata = trt,
    ~c("age", "marker") %>%
      # now add multiple variable columns
      map(function(v) fun1(data = .x, variable = v, by = "grade")) %>%
      tbl_merge() %>%
      modify_spanning_header(everything() ~ NA),
    .combine_with = "tbl_stack"
  )
Created on 2021-07-10 by the reprex package (v2.0.0)

Related

How do we keep only the levels of interest in the output table in tbl_summary?

I was trying to keep only "Yes" in the output and keep "No" in the background, instead of having all levels printed.
I tried the code below:
table4 <- Age_group_socio %>%
  tbl_summary(
    by = `Age england`,
    missing = "no",
    value = list(c(
      `Cardiac diseases` ~ 'Yes', Hypertension ~ 'Yes', `Liver diseases` ~ 'Yes',
      `Renal diseases` ~ 'Yes', Diabetes ~ 'Yes', `Neurological diseases` ~ "Yes",
      Malignancy ~ 'Yes', Malaria ~ 'Yes', HIV ~ 'Yes',
      `Other immune deficiency diseases` ~ 'Yes', Tuberculosis ~ 'Yes',
      `Other chronic lung diseases` ~ 'Yes', Measles ~ "Yes"
    ))
  ) %>%
  bold_labels()
Error: 'value' argument must be a list of formulas or named list (see ?syntax). LHS of the formula is the variable specification, and the RHS is the value specification: list(stage ~ "T1")
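The error message hints at the likely cause: the formulas are wrapped in c() inside list(), so value receives a list holding a single vector rather than a list of formulas. Below is a minimal sketch of the pattern the message asks for, using a small made-up data frame (the column names and values are assumptions for illustration, not the asker's data). Column names that contain spaces are written in backticks.

```r
library(gtsummary)

# hypothetical stand-in for the asker's data
df <- data.frame(
  group = rep(c("A", "B"), each = 4),
  `Cardiac diseases` = c("Yes", "No", "No", "Yes", "No", "No", "Yes", "No"),
  Diabetes = c("No", "No", "Yes", "Yes", "No", "Yes", "No", "No"),
  check.names = FALSE
)

# one formula per variable, passed directly to list() (no c() wrapper)
tbl <- tbl_summary(
  df,
  by = group,
  missing = "no",
  value = list(`Cardiac diseases` ~ "Yes", Diabetes ~ "Yes")
)
tbl <- bold_labels(tbl)
tbl
```

With this specification each variable should take up a single row reporting only the "Yes" level.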

Using mgcv gam() and gtsummary tbl_uvregression()

How do we specify a smoothing spline fit for certain variables in tbl_uvregression() with method = gam?
data %>%
  select(outcome, predictors) %>%
  tbl_uvregression(
    method = gam,
    y = outcome,
    method.args = list(family = binomial),
    exponentiate = TRUE
  )
For example, if I want to use s(x1) in the gam model formula for variable x1, how do I add that to the code above?
You cannot wrap the variable in a function, like s(), in tbl_uvregression(). You will need to construct the individual tables with tbl_regression(), then stack them on top of one another. Code example below! But it is a bit strange, because the smoothed terms don't have a single odds ratio, so you're just getting a table of p-values.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.1'
library(tidyverse)
library(mgcv)
#> Loading required package: nlme
#>
#> Attaching package: 'nlme'
#> The following object is masked from 'package:dplyr':
#>
#> collapse
#> This is mgcv 1.8-35. For overview type 'help("mgcv-package")'.
tbl_uv <-
  tibble(variable = c("age", "marker")) %>%
  rowwise() %>%
  mutate(
    # build reg models
    tbl =
      glue::glue("response ~ s({variable})") %>%
      as.formula() %>%
      gam(data = trial, family = binomial) %>%
      tbl_regression() %>%
      list()
  ) %>%
  # stack the regression tables
  pull() %>%
  tbl_stack()
Created on 2021-05-20 by the reprex package (v2.0.0)

gtsummary inline statement in gt resulting in NA next to percentage

I use the inline_text() function from the gtsummary package in R Markdown. However, I get a strange result when I use it with a certain variable.
The problem happens when a variable and one of its levels have the same label. Here is the problem demonstrated with the trial data frame that comes with the package.
# var_label() comes from the labelled package
var_label(trial) <- list(trt = "Drug A")
tbl1 <- trial %>%
  select(trt) %>%
  tbl_summary()
inline_text(tbl1, variable = trt, level = "Drug A")
It results in:
[1] NA "98 (49%)"
Any idea why this is happening?
Here is my very minimalistic YAML:
title: "hello"
author: "ebay"
date: "3/5/2021"
and my setup chunk:
library(gtsummary)
knitr::opts_chunk$set(error = F, echo = F, warning = F, fig.width=6.3, fig.height=4.5, fig.align = "center")
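The label of a variable and the labels of its levels shouldn't be 100% identical. Here, both the variable's header row and the level row are labeled "Drug A", so inline_text() most likely matches both rows and returns NA for the header row, which has no statistic.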
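A minimal sketch of that fix with the same trial data: give the variable a label that differs from every level label. The label "Treatment" is an arbitrary choice for illustration.

```r
library(gtsummary)

# label the variable differently from its levels ("Drug A", "Drug B")
tbl1 <- tbl_summary(trial, include = trt, label = list(trt ~ "Treatment"))

# only the level row is labeled "Drug A" now, so one value comes back
res <- inline_text(tbl1, variable = trt, level = "Drug A")
res
```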

How to modify the default variable type selected by "all_categorical()" in "gtsummary" when the mean of an ordinal variable is wanted?

The variable "var2" is treated as a categorical variable by default, but sometimes the mean (SD) is needed instead. So I am interested in how to change this.
data_table_1 <-
  data %>%
  dplyr::select(group, var1, var2)
data_table_1 %>%
  tbl_summary(
    by = group,
    missing = "no",
    statistic = list(
      all_continuous() ~ "{mean} ± {sd}",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = list(all_continuous() ~ c(2, 2))
  ) %>%
  add_p(
    test = list(all_continuous() ~ "pttest2", all_categorical() ~ "pttest2"),
    pvalue_fun = function(x) sprintf(x, fmt = '%#.3f')
  )
The function tbl_summary() does its best to guess the type of summary that best suits the data, but this is not always how you'd like to summarize your data. To override the default summary type, use the type= argument. In this case you'd want to include type = list(var2 ~ "continuous") to summarize the data continuously.
Hope this helps!
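A minimal sketch of the type= override, using made-up data in which var2 is an integer score from 1 to 4 (the data frame and its values are assumptions for illustration):

```r
library(gtsummary)

set.seed(1)
# hypothetical data: var2 has few unique values, so tbl_summary()
# would summarize it as categorical by default
df <- data.frame(
  group = rep(c("a", "b"), each = 20),
  var2 = sample(1:4, 40, replace = TRUE)
)

tbl <- tbl_summary(
  df,
  by = group,
  missing = "no",
  type = list(var2 ~ "continuous"),                    # override the guess
  statistic = list(all_continuous() ~ "{mean} ({sd})"),
  digits = list(all_continuous() ~ c(2, 2))
)
tbl
```

With the override in place, var2 should take up a single row showing the mean and SD per group instead of one row per level.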

Shiny - postgres database query using dplyr and a reactive value

I'm using a postgres db for a shiny app, and I'm having trouble getting a dplyr query to work.
I have the following reactive. si.division is a dataframe, and input$si_division is a select input:
si_division_selected <- reactive({
  si.division %>%
    filter(division_name %in% input$si_division) %>%
    select(division_code) %>%
    unlist(use.names = FALSE)
})
I'm trying to pass this into a dplyr query using src_pool:
industry_division_code <- src_pool(pool) %>% tbl("si_alldata") %>%
  translate_sql(division_code %in% si_division_selected()) %>%
  select(industry_code)
I'm getting the following error:
Error in UseMethod: no applicable method for 'select_' applied to an
object of class "c('sql', 'character')
I have also tried:
industry_division_code <- src_pool(pool) %>% tbl("si_alldata") %>%
  filter(division_code %in% si_division_selected()) %>%
  select(industry_code)
Which returns:
Error in postgresqlExecStatement(conn, statement, ...) : RS-DBI
driver: (could not Retrieve the result : ERROR: syntax error at or
near "SI_DIVISION_SELECTED" LINE 5: WHERE ("division_code" IN
SI_DIVISION_SELECTED()))
If I load the file into R instead of using the database I have no issues:
industry_division_code <- si_alldata %>%
  filter(division_code %in% si_division_selected()) %>%
  select(industry_code)
I think that if you want to keep passing si_division_selected() directly to filter(), you should be able to use the !! operator from the rlang package to force evaluation of the function before the query is translated to SQL, so the line would look like this: filter(division_code %in% !! si_division_selected()). That said, your current solution of saving the results off to a variable first would be my preferred avenue.
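Both options can be sketched without a live database by using dbplyr's simulated PostgreSQL backend and inspecting the generated SQL. In this sketch, selected_codes() is a hypothetical plain function standing in for the reactive, and the empty data frame only supplies the column names; neither comes from the original app.

```r
library(dplyr)
library(dbplyr)

# hypothetical stand-in for the reactive si_division_selected()
selected_codes <- function() c("A", "B")

# lazy table backed by a simulated PostgreSQL connection (no real database)
si_alldata <- tbl_lazy(
  data.frame(division_code = character(), industry_code = character()),
  con = simulate_postgres()
)

# option 1: force local evaluation with !!
sql_bang <- si_alldata %>%
  filter(division_code %in% !!selected_codes()) %>%
  select(industry_code) %>%
  sql_render()

# option 2: save the result to a variable first
codes <- selected_codes()
sql_var <- si_alldata %>%
  filter(division_code %in% codes) %>%
  select(industry_code) %>%
  sql_render()

sql_bang
```

In both cases the rendered SQL should contain the literal codes rather than a call to a SQL function named after the R function, which is what triggered the syntax error in the question.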