Using function lm() with group_by in R

I'm trying to fit a separate lm() model for every level of a categorical variable in one data frame.
I used lm() with group_by, but it doesn't work: it creates only one model.
Of course, it is easy to split the data into separate datasets and call lm() on each of them, but I would like to know another way, using group_by, apply, etc.
make_model <- function(data){
lm(Sepal.Length~Sepal.Width,data)
}
models <- iris %>%
group_by(Species) %>%
make_model
predicted <- iris %>%
group_by(Species) %>%
mutate(prediction=predict(models,.))

I would check out the "Many models" chapter of R for Data Science:
https://r4ds.had.co.nz/many-models.html
library(tidyverse)
make_model <- function(data) {
  lm(Sepal.Length ~ Sepal.Width, data)
}
iris %>%
  group_by(Species) %>%
  nest() %>%
  mutate(lm = map(data, make_model)) %>%
  mutate(tidy = map(lm, broom::tidy)) %>%
  unnest(tidy)
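The question also asks for per-group predictions. The sketch below extends the same nest-and-map idea to that step; it is one possible approach rather than the only one, and the column names `model` and `prediction` are my own choices, not something from the original posts.

```r
library(dplyr)
library(tidyr)
library(purrr)

make_model <- function(data) {
  lm(Sepal.Length ~ Sepal.Width, data = data)
}

predicted <- iris %>%
  group_by(Species) %>%
  nest() %>%
  mutate(
    # fit one model per species
    model = map(data, make_model),
    # add each group's predictions as a new column of its own data
    data = map2(data, model, function(d, m) {
      mutate(d, prediction = unname(predict(m, newdata = d)))
    })
  ) %>%
  select(-model) %>%
  unnest(data) %>%
  ungroup()
```

Each row of `predicted` then carries the prediction from the model fitted to its own species, which is what the `mutate(prediction = predict(models, .))` attempt in the question was aiming for.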

Related

R gtsummary package: How to Hide Columns in Summary Table

I'm using gtsummary to prepare my tables, and I'm trying to hide one of the group columns: the third column, labeled "1, N = 61".
Below is the code I ran:
library(gtsummary)
trial <- trial
trial %>%
tbl_summary(by=response,
statistic = list(trt~"{n}/{N} ({p}%)")) %>%
add_overall() %>%
add_p() %>%
modify_column_hide(columns = "1")
I was expecting the third column, "1, N = 61", to be hidden.
The two columns for the response levels (0 and 1) are named "stat_1" and "stat_2", respectively.
So, to hide the one you asked about:
trial %>%
tbl_summary(by=response,
statistic = list(trt~"{n}/{N} ({p}%)")) %>%
modify_column_hide(columns = "stat_2") %>%
add_p() %>%
add_overall()
Now, if you want to hide parameters of statistical tests you run, you would do it as indicated here:
https://rdrr.io/cran/gtsummary/man/modify_column_hide.html
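If it is unclear what a column is called internally, the names can be looked up before hiding anything. A minimal sketch, assuming a gtsummary version that provides `show_header_names()`; the table is reduced to a single variable to keep the output short.

```r
library(gtsummary)

tbl <- trial %>%
  tbl_summary(by = response, include = trt)

# prints the internal column names next to the headers shown in the table
show_header_names(tbl)

# the same names are the columns of the underlying table_body data frame
names(tbl$table_body)
```

The names listed there (for example "stat_1" and "stat_2") are the ones to pass to `modify_column_hide()`.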

Trying to display both mean and median with gtsummary

What is the correct syntax to display both median and mean of a continuous variable using tbl_continuous? Also, is it possible to display on 2 lines as you can do with tbl_summary and the continuous2 argument?
The code below displays only medians.
comparison.data %>%
select(imaging, los.minutes, acuity) %>%
tbl_continuous(
by = imaging,
variable = los.minutes,
statistic =
los.minutes ~ c("{mean} ({sd})",
"{median} ({sd})")
) %>%
modify_spanning_header(all_stat_cols() ~ "**Imaging status**")
Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.1'
tbl <-
trial %>%
tbl_continuous(
variable = age,
by = trt,
include = grade,
statistic = ~"{mean} \n{median}"
) %>%
as_gt() %>%
gt::fmt_markdown(columns = everything())
Created on 2022-08-24 by the reprex package (v2.0.1)
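For comparison, this is the two-line display the question refers to, done with tbl_summary() and the "continuous2" type. It is a sketch of that related feature on the trial data, not a tbl_continuous() solution, and the choice of statistics is only illustrative.

```r
library(gtsummary)

tbl <- trial %>%
  tbl_summary(
    by = trt,
    include = age,
    # "continuous2" puts each statistic on its own row
    type = age ~ "continuous2",
    statistic = age ~ c("{mean} ({sd})", "{median} ({p25}, {p75})"),
    missing = "no"
  )
tbl
```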

Rstudio gtsummary table with three categorical variables

I'm trying to figure out how to use the gtsummary package for my dataset.
I have three categorical variables, two of which are set as strata. I'm not interested in the frequency of each sample; I want the numeric value in the table.
Currently I'm using this simple code (x, y and z are my categorical variables, whereas SOC holds the numerical values; y and z should go in the header as strata):
Data %>%
select(x, y, z , SOC) %>%
tbl_strata(strata=z,
.tbl_fun =
~ .x %>%
tbl_summary(by = y , missing = "no"),
statistic = list(all_continuous()~ "{mean} ({sd})" ))%>%
modify_caption("**Soil organic carbon [%]**")%>%
bold_labels()
Edit
Let's take the trial dataset as an example:
trial %>%
select(trt, grade, stage, age, marker) %>%
tbl_strata(strata=stage,
~tbl_summary(.x, by = grade , missing = "no"),
statistic = list(all_continuous()~ "{mean} ({sd})"
))%>%
bold_labels()
What I'm looking for is a table like this, but without the frequency showing of each treatment (Drug A, B). I only want the age and marker to show up in my table but organized by treatment. I'd like to have the first section showing only the age and marker for the group that received Drug A. Then a section showing the same for Drug B.
Edit 2
Your input is exactly what I am looking for. With the trial dataset it works perfectly fine. However, once I put in my data, the numeric values are all in one column instead of in rows. I also still get the frequencies and I can't figure out why. I use exactly the same code and the same number of variables.
I think nesting calls to tbl_strata() (one merging and the other stacking) will get you what you're after. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'
tbl <-
trial %>%
select(trt, grade, stage, age, marker) %>%
tbl_strata(
strata = trt,
function(data) {
data %>%
tbl_strata(
strata = stage,
~ tbl_summary(
.x,
by = grade,
statistic = all_continuous() ~ "{mean} ({sd})",
missing = "no"
) %>%
modify_header(all_stat_cols() ~ "**{level}**")
)
},
.combine_with = "tbl_stack",
.combine_args = list(group_header = c("Drug A", "Drug B"))
) %>%
bold_labels()
Created on 2022-03-04 by the reprex package (v2.0.1)

Change the default Statistical test performed by "add_p()" function in gtsummary summary tables

I am using gtsummary package to generate summary tables.
I would like to do the following:
Have the add_p() function perform a two-proportion z-test (using stats::prop.test) across the levels of the "by" variable, instead of the chi-square test of independence.
Display in the footnote that the statistical test performed is "2-sample test for equality of proportions with continuity correction".
How can I do that within this example code?
trial2 <- trial %>% select(trt, grade)
trial3 <- trial2[-which(trial2$grade == "III"),]
trial4 <- droplevels(trial3)
trial4 %>%
tbl_summary(
by = trt,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 2,
label = grade ~ "Tumor Grade"
) %>% add_p()
Thank you!
You have two options. First, build a custom p-value function that calculates the p-value with prop.test(). There is an example of this in the add_p.tbl_summary() help file.
The second option (and easier option) is to download the current development version of the package from GitHub. In this version, the prop.test() option is already built in. Example below!
remotes::install_github("ddsjoberg/gtsummary")
library(gtsummary)
packageVersion("gtsummary")
#> [1] ‘1.3.5.9017’
trial %>%
select(response, death, trt) %>%
tbl_summary(by = trt) %>%
add_p(test = everything() ~ "prop.test") %>%
modify_footnote(p.value ~ "2-sample test for equality of proportions with continuity correction")
You may also want to check out the new function add_difference() that also reports the prop.test() p-value along with differences between groups.
trial %>%
select(trt, response, death) %>%
tbl_summary(by = trt,
statistic = all_dichotomous() ~ "{p}%",
missing = "no") %>%
modify_footnote(all_stat_cols() ~ NA) %>%
add_n() %>%
add_difference(estimate_fun = ~paste0(style_sigfig(. * 100), "%"))
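For the first option mentioned above, a custom function could look roughly like the sketch below. The name `prop_test_custom` is hypothetical, and the return format (a one-row data frame with `p.value` and `method` columns) is an assumption that matches more recent gtsummary versions; older releases expected a different structure, so check the add_p help file for the installed version.

```r
library(gtsummary)

# hypothetical helper; add_p() calls it with the data, the variable name,
# and the name of the `by` column
prop_test_custom <- function(data, variable, by, ...) {
  # 2 x 2 table of counts: rows are the `by` groups, columns the outcome levels
  counts <- table(data[[by]], data[[variable]])
  result <- stats::prop.test(counts)
  data.frame(p.value = result$p.value, method = result$method)
}

tbl <- trial %>%
  tbl_summary(by = trt, include = c(response, death), missing = "no") %>%
  add_p(test = everything() ~ "prop_test_custom")
```

Since table() drops missing values by default, the test here is run on the observed responses only.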

table generation in rmarkdown using gtsummary and flextable

Most collaborators prefer tables in Word format. With the advent of rmarkdown, knitr, gtsummary and flextable this is finally coming of age, but I cannot wrap my head around how to generate the final table without manually setting the indentation. I think the first table leaves far too much air between the rows, but I cannot figure out how to tighten the row spacing programmatically (I tried autofit, height, height_all and hrule without obtaining the desired output). Instead, I used the compact style in Word to generate the second table. However, I'd then have to manually insert the indentation for the cyl categories. Does anyone know how this can be done programmatically?
---
title: "testing T´s"
output:
  word_document:
    reference_docx: temp.docx
  html_document:
    df_print: paged
editor_options:
  chunk_output_type: inline
---
Plain
====
```{r results='asis',echo=FALSE,message=FALSE}
library(gtsummary)
library(flextable)
set_gtsummary_theme(theme_gtsummary_jama())
a <- mtcars[1:20,c(1,2,9,4)]
b <- tbl_summary(a,
missing="ifany",
by=am,
type=list(cyl~"categorical"))%>%
bold_labels() %>%
add_p() %>% add_overall()
```
Flextable
====
```{r results='asis',echo=FALSE,message=FALSE}
fl <- gtsummary::as_flextable(b) %>% font(fontname = "Bodoni 72",part = "all") %>% fontsize(size=8,part="all") %>% autofit(add_h = -.5)
fl
```
At the moment, there is no simple way to do this. But I have included a code example that I think does solve your problem.
With {flextable}, the order in which the functions are called matters. Running as_flextable() and then appending additional calls doesn't seem to get you what you want.
The alternative is to save the calls, insert the new flextable function calls where needed, then evaluate the calls. That is what is done in the example below.
---
title: "Untitled"
output: word_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE)
```
```{r}
library(tidyverse)
library(gtsummary)
library(flextable)
set_gtsummary_theme(theme_gtsummary_jama())
tbl <-
mtcars[1:20, c(1, 2, 9, 4)] %>%
tbl_summary(
missing = "ifany",
by = am,
type = list(cyl ~ "categorical")
) %>%
bold_labels() %>%
add_p() %>%
add_overall()
```
### Default Flextable
```{r}
gtsummary::as_flextable(tbl)
```
### Compact Flextable
```{r}
# this function inserts additional flextable calls, then evaluates the calls
update_flextable_calls <- function(x, call_list, after) {
# saving calls that create the flextable
x_calls <- gtsummary::as_flextable(x, return_calls = TRUE)
# adding new calls at `after=`
after_n <- names(x_calls) %in% after %>% which()
x_calls <- c(
x_calls[1:after_n],
call_list,
x_calls[(after_n + 1):length(x_calls)]
)
# evaluating calls
x_calls %>%
unlist() %>%
purrr::compact() %>%
# concatenating expressions with %>% between each of them
purrr::reduce(function(x, y) rlang::expr(!!x %>% !!y)) %>%
# evaluating expressions
eval()
}
# list of calls that make a table compact
compact_calls <- list(
rlang::expr(font(fontname = "Bodoni 72", part = "all")),
rlang::expr(fontsize(size = 8, part = "all")),
rlang::expr(padding(padding.top = 0, part = "all")),
rlang::expr(padding(padding.bottom = 0, part = "all"))
)
# adding the compact calls, and evaluating them
update_flextable_calls(
x = tbl, # gtsummary table
call_list = compact_calls, # calls that make flextable compact
after = "footnote" # add calls after the "footnote" functions
)
```
This obviously isn't a great permanent solution. We have a theme called theme_gtsummary_compact() that makes the {gt} tables compact with smaller font and reduced padding. We can update the theme to also make flextables more compact! I'd love it if you created an issue on GitHub to update theme_gtsummary_compact() for flextables, and we can collaborate on a solution that works well for you.
https://github.com/ddsjoberg/gtsummary/issues/new/choose