How to remove the question mark symbol in reference group row of gtsummary regression table? - gtsummary

I used tbl_regression to visualize my main effects (see image and code below), and I'm wondering how to get rid of the question mark symbol in the confidence interval column for my reference group ("Neutral"). add_estimate_to_reference_rows only adds the null value for the OR. A horizontal line or a null 95% CI would look better than the question mark symbol in the CI column.
m.crude.cat %>% tbl_regression(exponentiate = TRUE, add_estimate_to_reference_rows = TRUE)
tbl_regression output

The question mark is an em-dash, and it looks like there is some kind of encoding issue on your machine. You can change the em-dash to anything else, however. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.0'
tbl <-
  glm(response ~ grade, trial, family = binomial) %>%
  tbl_regression(exponentiate = TRUE,
                 add_estimate_to_reference_rows = TRUE) %>%
  modify_table_styling(
    columns = c(estimate, ci),
    rows = reference_row %in% TRUE,
    missing_symbol = "Ref."
  )
Created on 2021-04-30 by the reprex package (v2.0.0)

Related

Is there a way of adding a footnote to a gtsummary table that will persist after merging with another table?

I have two tables 'tbl' that I have merged. I want to add a footnote to the final merged table, but it doesn't work for me.
I've tried both adding it to the individual tables like so:
library(gtsummary)
packageVersion("gtsummary")
tbl <-
  glm(response ~ grade, trial, family = binomial) %>%
  tbl_regression(exponentiate = TRUE) %>%
  add_glance_table(include = nobs) %>%
  add_n(location = c("label", "level")) %>%
  modify_footnote(all_stat_cols() ~ "models")
And it works for the individual tables, but as soon as I merge them the footnote disappears.
I've tried adding it to the merged table like so, but without success:
tbl_final <-
  tbl_merge(list(tbl, tbl), tab_spanner = c("**Men**", "**Women**")) %>%
  modify_footnote(all_stat_cols() ~ "Models")
How should I go about it?
Thanks!
You are using all_stat_cols(), which is intended for use with tbl_summary() objects, rather than tbl_regression() tables. Use the show_header_names() function to print the column names to be able to place the footnotes on the columns you need, e.g. modify_footnote(estimate ~ "Odds Ratio")
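As a minimal sketch of that advice applied to the merged table from the question: after tbl_merge(), the columns of each input table usually carry a numeric suffix, so the names estimate_1 and estimate_2 below are an assumption that should be confirmed with show_header_names(). The code is written against the gtsummary 1.x API used in this thread; newer releases may warn that modify_footnote() has been superseded.

```r
library(gtsummary)

# Same model as in the question, kept minimal
tbl <-
  glm(response ~ grade, trial, family = binomial) %>%
  tbl_regression(exponentiate = TRUE)

tbl_final <-
  tbl_merge(list(tbl, tbl), tab_spanner = c("**Men**", "**Women**"))

# Print the column names of the merged table to see which ones exist
show_header_names(tbl_final)

# Assumption: the estimate columns are named estimate_1 and estimate_2
tbl_final <-
  tbl_final %>%
  modify_footnote(c(estimate_1, estimate_2) ~ "Models")
```

Adding the footnote after merging, on the suffixed column names, avoids relying on footnotes from the individual tables being carried through the merge.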

Using select statement in pyspark changes values in column

I'm experiencing a very weird behavior in pyspark (databricks).
In my initial dataframe (df_original) I have multiple columns (id, text and some_others) and I add a new column 'detected_language'. The new column is added using a join with another dataframe df_detections (with columns id and detected_language). The ids in the two dataframes correspond to each other.
df_detections is created like this:
ids = [125, ...] # length x
detections = ['ko', ...] # length x
detections_with_id = list(zip(ids, detections))
df_detections = spark.createDataFrame(detections_with_id, ["id", "detected_language"])
df = df_original.join(df_detections, on='id', how='left')
Here is the weird part. Whenever I display the dataframe using a select statement I get the correct detected_language value. However, using only display I get a totally different value (e.g. 'fr' or any other language code) for the same entry (see the statements and their corresponding results below).
How is that possible? Can anybody think of a reason why this is? And how would I solve something like this?
Displaying correct value with select:
display(df.select(['id', 'text', 'detected_language']))
id | text | detected_language
125 | 내 한국어 텍스트 | ko
... | ... | ...
Displaying wrong value without select:
display(df)
id | text | other_columns... | detected_language
125 | 내 한국어 텍스트 | ... | fr
... | ... | ... | ...
I appreciate any hints or ideas! Thank you!

Getting an extra column for median in gtsummary table

I saw an article that I'd like to replicate. They had categorical predictions as the rows, and used N for column 1 and the median of another variable for column 2. I'd like to be able to create a function that gives me the median for column 2.
Sample image
I'm not sure how to get the median at each specific level. I tried group by, but that would only give me medians per higher level.
library(dplyr) # group_by() comes from dplyr
library(gtsummary)
trial2 <- trial %>% select(stage, grade, ttdeath) %>%
  group_by(stage, grade) %>%
  mutate(median_ttdeath = median(ttdeath))
The easiest way to report continuous summaries within a group is to use the tbl_continuous() function.
https://www.danieldsjoberg.com/gtsummary/reference/tbl_continuous.html
In your case, you'll need to make 2 summary tables, then merge the tables together. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.0'
tbl1 <-
  trial %>%
  select(trt, grade) %>%
  tbl_summary()
tbl2 <-
  trial %>%
  tbl_continuous(
    variable = age,
    include = c(trt, grade)
  ) %>%
  modify_header(all_stat_cols() ~ "**Age**")
tbl_final <-
  tbl_merge(list(tbl1, tbl2)) %>%
  modify_spanning_header(everything() ~ NA)
Created on 2021-11-15 by the reprex package (v2.0.1)

How to label columns and retain group sizes when splitting summary table by group?

When creating a summary table, split by group, the size of each group automatically shows up at the top of their respective columns. So the column headings look like this: Characteristic | 1, N = 100 | 2, N = 120. Code below:
library(dplyr)
library(gtsummary)
data %>%
  select(group, age, sex) %>%
  tbl_summary(by = group)
However, I would like to name my groups to something more meaningful than "1" and "2". For example, if my data consists of kids in a swim class, I would want to name the groups by the name of the swim class: ducks and turtles. So I do something like this:
library(dplyr)
library(gtsummary)
data %>%
  select(group, age, sex) %>%
  tbl_summary(by = group) %>%
  modify_header(
    update = list(
      stat_1 ~ "**Ducks**",
      stat_2 ~ "**Turtles**")) %>%
  modify_spanning_header(
    update = starts_with("stat_") ~ "Swim Class Name")
This works! However, the size of each group disappears from the top of their respective columns. My work-around is to add in the size of each group manually, as part of the names. I have to leave a little note for myself to check the N for each group before adding it in. Like this:
library(dplyr)
library(gtsummary)
data %>%
  select(group, age, sex) %>%
  tbl_summary(by = group) %>%
  modify_header(
    update = list(
      stat_1 ~ "**Ducks**, N = 100",
      stat_2 ~ "**Turtles**, N = 120")) %>% # to check the N for each group, remove this to see default appearance which shows the N
  modify_spanning_header(
    update = starts_with("stat_") ~ "Swim Class Name")
This works, but it's error-prone because I have to double-check the numbers and then add them in manually.
How do I label the columns, representing each group, AND retain the numbers showing group sizes when splitting the summary table by group?
There are two ways to get this done.
The first is to change the levels in the data frame before you pass it to tbl_summary(). Then the default column header will have your custom headers with the correct Ns by default.
The second is to take advantage of the dynamic statistics available within modify_header(). When you have a tbl_summary(by=) object split by a variable, you can access {n}, {N}, {p}, and they can be placed in the column header. Review the help file for details: http://www.danieldsjoberg.com/gtsummary/reference/modify.html (Note that you need gtsummary v1.3.6 for this code to work.)
Both methods lead to identical tables.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.3.6'
# Method 1: Change the underlying data before passing df to `tbl_summary()`
tbl1 <-
  trial %>%
  select(trt, age) %>%
  mutate(trt = factor(trt, labels = c("Duck", "Turtle"))) %>%
  tbl_summary(by = trt, missing = "no")
# Method 2: Use the dynamic stats available in `modify_header()`
tbl2 <-
  trial %>%
  select(trt, age) %>%
  tbl_summary(by = trt, missing = "no") %>%
  modify_header(list(
    stat_1 ~ "**Duck**, N = {n}",
    stat_2 ~ "**Turtle**, N = {n}"
  ))
Created on 2021-01-18 by the reprex package (v0.3.0)

How to generate t-value, F-value or Chi-square in the summary table using R package "gtsummary"?

I am creating summary tables using the excellent R package "gtsummary"; it has really helped me generate summary tables efficiently and accurately. But I wonder whether test statistics such as the t-value, F-value, and Chi-square could be generated automatically, just like the p-value?
library(gtsummary)
add_p_ex1 <-
trial[c("age", "grade", "response", "trt")] %>%
tbl_summary(by = trt) %>%
add_p()
Here is the summary table generated using "gtsummary"
UPDATED 2021-07-23
The test statistics are returned in a column called statistic. The column, however, is hidden by default from the output. You can add the test statistics to the table by assigning a column header (which will auto unhide the column). Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.2'
tbl <-
  trial %>%
  select(age, grade, response, trt) %>%
  tbl_summary(by = trt) %>%
  add_p(test = all_continuous() ~ "t.test") %>%
  # add a header to the statistic column, which is hidden by default
  # adding the header will also unhide the column
  modify_header(statistic ~ "**Test Statistic**") %>%
  modify_fmt_fun(statistic ~ style_sigfig)
Created on 2021-07-23 by the reprex package (v2.0.0)
modify_fmt_fun(statistic ~ style_sigfig)
This part of the code doesn't work. However, the rest gives the output with 4 d