Group_by returns just one row while aggregate returns the expected outcome

I am currently stuck at the post-processing of some EddyData. Following an example (https://github.com/bgctw/REddyProc/blob/master/vignettes/aggUncertainty.md) I came up with an unexpected outcome of group_by which is reproducible but I don't understand why.
Group_by returns just one row while aggregate gives the expected outcome.
Here is a minimal example:
library(tidyverse)
#create example data frame
date.time <- seq(from=as.POSIXct("2015-01-01 00:30:00"), to=as.POSIXct("2015-01-03 00:30:00"),by="30 mins")
nee <- runif(length(date.time),-200,200)
df <- data.frame(date.time, nee)
#calculate day of the year
df <- df %>% mutate(
date.time = df$date.time
, DoY = as.POSIXlt(date.time - 15*60)$yday # midnight belongs to the previous day
)
#trying to summarise nee for each day
aggDay <- df %>% group_by(DoY) %>% summarise(nee=sum(nee))
aggDay
nee
1 322.1195
aggDay just returns one row while aggregate would work in this case
aggregate(df$nee, by=list(df$DoY), sum)
Group.1 x
1 0 -25.15698
2 1 448.13960
3 2 -100.86310
Unfortunately, the original code involves some further calculations which is the reason why I'd like to stay with group_by.
#original code, not reproducible here
aggDay <- df %>% group_by(DoY) %>%
summarise(
DateTime = first(DateTime)
, nRec = sum( NEE_uStar_fqc == 0, na.rm = TRUE)
, nEff = computeEffectiveNumObs(
resid, effAcf = !!autoCorr, na.rm = TRUE)
, NEE = mean(NEE_uStar_f, na.rm = TRUE)
, sdNEE = if (nEff <= 1) NA_real_ else sqrt(
mean(NEE_uStar_fsd^2, na.rm = TRUE) / (nEff - 1))
, sdNEEuncorr = if (nRec == 0) NA_real_ else sqrt(
mean(NEE_uStar_fsd^2, na.rm = TRUE) / (nRec - 1))
)

I restarted RStudio and now it works. Don't ask me. There must have been a problem with another loaded package.
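The thread does not say which package was responsible. A commonly reported cause of this exact symptom is that plyr was attached after dplyr, so that plyr::summarise() masks dplyr::summarise() and ignores the grouping. The sketch below assumes that explanation (it is not confirmed by the post) and shows two ways to guard against it: checking which package the function comes from, and calling the dplyr functions with an explicit namespace. The tz = "UTC" argument and the seed are additions to make the example deterministic.

```r
library(dplyr)

# Rebuild the example data; tz = "UTC" is an addition for reproducibility
date.time <- seq(from = as.POSIXct("2015-01-01 00:30:00", tz = "UTC"),
                 to   = as.POSIXct("2015-01-03 00:30:00", tz = "UTC"),
                 by   = "30 mins")
set.seed(42)
df <- data.frame(date.time, nee = runif(length(date.time), -200, 200))

# Shift by 15 minutes so that midnight belongs to the previous day
df$DoY <- as.POSIXlt(df$date.time - 15 * 60)$yday

# Which package does summarise() currently resolve to?
environmentName(environment(summarise))

# Explicit namespaces are immune to masking by other attached packages
aggDay <- df %>%
  dplyr::group_by(DoY) %>%
  dplyr::summarise(nee = sum(nee))
aggDay
```

If the check reports a package other than dplyr, either load that package before dplyr or keep the dplyr:: prefix.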

Related

summary row with gtsummary

I am trying to create a table of events with gtsummary and I would like to obtain a final row counting the events of the previous rows. add_overall() and add_n() do add the total but in a column, counting the same event across groups but not the overall events.
I created this example.
library(gtsummary)
library(tidyverse) # provides as_factor() via forcats
x1 <- sample(c("No", "Yes"), 30, replace = TRUE, prob = c(0.85, 0.15))
x2 <- sample(c("No", "Yes"), 30, replace = TRUE, prob = c(0.9, 0.1))
x3 <- sample(c("No", "Yes"), 30, replace = TRUE, prob = c(0.75, 0.25))
y <- sample(c("A", "B"), 30, replace = TRUE, prob = c(0.5, 0.5))
df <- data.frame(as_factor(x1), as_factor(x2), as_factor(x3), as_factor(y))
colnames(df) <-c("event_1", "event_2", "event_3", "group")
tbl_summary(df, by=group, statistic = all_categorical() ~ "{n}")
I tried using summary_rows() function from gt package after converting the table to a gt object but there is an error when summarising because these variables are factors.
Any other ideas?
You can do this by adding a new variable to your data frame that is the row sum of each of the events. Then you can display that variable's sum in the summary table. Example below!
library(gtsummary)
#> #Uighur
library(tidyverse)
df <-
data.frame(
event_1 = sample(c(FALSE, TRUE), 30, replace = TRUE, prob = c(0.85, 0.15)),
event_2 = sample(c(FALSE, TRUE), 30, replace = TRUE, prob = c(0.9, 0.1)),
event_3 = sample(c(FALSE, TRUE), 30, replace = TRUE, prob = c(0.75, 0.25)),
group = sample(c("A", "B"), 30, replace = TRUE, prob = c(0.5, 0.5))
) |>
rowwise() |>
mutate(Total = sum(event_1, event_2, event_3))
tbl_summary(
df,
by = group,
type = Total ~ "continuous",
statistic =
list(all_categorical() ~ "{n}",
all_continuous() ~ "{sum}")
) |>
as_kable() # convert to kable to display on stack overflow
Characteristic   A, N = 16   B, N = 14
event_1          4           4
event_2          1           2
event_3          7           6
Total            12          12
Created on 2023-01-12 with reprex v2.0.2
Thank you so much (great package gtsummary). That works! I had some trouble summing over factors. If variables are factors the code
mutate(Total = sum(event_1=="Yes", event_2=="Yes", event_3=="Yes"))
does it.
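For completeness, here is a small sketch of that factor variant using only dplyr. The data are random and the seed is arbitrary; the comparison with "Yes" turns each factor into a logical before summing, which is why sum() no longer fails.

```r
library(dplyr)

set.seed(1)
yes_no <- c("No", "Yes")
df <- data.frame(
  event_1 = factor(sample(yes_no, 30, replace = TRUE), levels = yes_no),
  event_2 = factor(sample(yes_no, 30, replace = TRUE), levels = yes_no),
  event_3 = factor(sample(yes_no, 30, replace = TRUE), levels = yes_no)
)

# sum() is not defined for factors, so compare to "Yes" first
df <- df |>
  rowwise() |>
  mutate(Total = sum(event_1 == "Yes", event_2 == "Yes", event_3 == "Yes")) |>
  ungroup()

head(df)
```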

Is there a way to add percentage to tbl_regression add_nevent?

I've just discovered that add_nevent in gtsummary can have the option location = "level". I am rapt! But I would like it to have a percentage as well. I've tried adding statistic = "{n}({p}%)" but nothing changes.
Here is my code:
tbl_regression(glm(rellife ~ age + gender, data = df, family = "binomial"), exponentiate = TRUE) %>%
add_nevent(location = "level", statistic = "{n}/{N}%") %>% # add number of events of the outcome
add_n(location = "level")
And the table:
I would like to have 1601 (93.6%) in the column Event N for Age and so on.
Any help would be appreciated.
Thanks
After adding the N and N event, you can use the modify_table_body() function to calculate the event rate. Example below!
library(gtsummary)
#> #BlackLivesMatter
packageVersion("gtsummary")
#> [1] '1.5.2'
tbl <-
glm(response ~ age + grade, trial, family = binomial) %>%
tbl_regression(exponentiate = TRUE) %>%
add_nevent(location = "level") %>%
add_n(location = "level") %>%
# adding event rate
modify_table_body(
~ .x %>%
dplyr::mutate(
stat_nevent_rate =
ifelse(
!is.na(stat_nevent),
paste0(style_sigfig(stat_nevent / stat_n, scale = 100), "%"),
NA
),
.after = stat_nevent
)
) %>%
# merge the columns into a single column
modify_cols_merge(
pattern = "{stat_nevent} / {stat_n} ({stat_nevent_rate})",
rows = !is.na(stat_nevent)
) %>%
# update header to event rate
modify_header(stat_nevent = "**Event Rate**")
Created on 2022-03-21 by the reprex package (v2.0.1)
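The rate calculation inside modify_table_body() can be tried in isolation with base R. This is only a sketch of the arithmetic: the counts are hypothetical (the N of 1710 is invented so that 1601 events give the 93.6% mentioned in the question), and sprintf() stands in for style_sigfig().

```r
# Hypothetical table body: a header row without counts, then one level row
stat_nevent <- c(NA, 1601)
stat_n      <- c(NA, 1710)  # invented N, for illustration only

# Format the event rate only where a count exists, otherwise keep NA
stat_nevent_rate <- ifelse(
  !is.na(stat_nevent),
  sprintf("%.1f%%", 100 * stat_nevent / stat_n),
  NA
)
stat_nevent_rate
```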

Interaction terms in tbl_regression in R

How do I include the coefficient of the interaction between age and stage?
glm(response~age+grade, family=binomial(link=logit),
data=trial) %>%
tbl_regression(
exponentiate = TRUE,
pvalue_fun = ~style_pvalue(.x, digits = 2)
)
The tbl_regression() function provides a summary of the model results. To include an interaction in the summary table, the interaction must first be added to the model. Example below.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'
tbl <-
glm(response ~ age * grade, family = binomial, data=trial) %>%
tbl_regression(
exponentiate = TRUE,
pvalue_fun = ~style_pvalue(.x, digits = 2)
)
Created on 2022-02-03 by the reprex package (v2.0.1)
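To see what the * operator adds to the model, the formula can be expanded with base R alone. This sketch only inspects the formula terms and does not fit anything.

```r
# age * grade is shorthand for age + grade + age:grade
interaction_terms <- attr(terms(response ~ age * grade), "term.labels")
main_only_terms   <- attr(terms(response ~ age + grade), "term.labels")

interaction_terms
main_only_terms
```

The extra "age:grade" term is what produces the interaction rows in the regression table.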

glue cannot interpolate functions into strings. * object 'n' is a function

I'm loving gtsummary. I write a lot of reports and love my pretty tables!
I've run into a problem when I updated.
I just updated to using all_stat_cols(TRUE) instead of stat_by.
I'm getting an error when I try to include {level} or {n} when I add overall.
It works if stat_0 isn't included, so I tried with all_stat_cols(FALSE) and a separate stat_0 ~ "Total n = {n}", but I get the error "Error: glue cannot interpolate functions into strings. * object 'n' is a function."
This works:
nr %>%
select(gender, year) %>%
tbl_summary (by = gender, missing = "no") %>%
bold_labels() %>%
add_overall(last=TRUE) %>%
modify_footnote(update = everything() ~ NA) %>%
modify_header(update = list(label ~ "",
all_stat_cols(FALSE) ~ "{level}\n n = {n}\n"))
But I also want my overall to be changed "Total n = 17" like this:
nr %>%
select(gender, year) %>%
tbl_summary (by = gender, missing = "no") %>%
bold_labels() %>%
add_overall(last=TRUE) %>%
modify_footnote(update = everything() ~ NA) %>%
modify_header(update = list(label ~ "",
all_stat_cols(FALSE) ~ "{level}\n n = {n}\n",
stat_0 ~ "Total\n = {n}"))
But get this error:
Error: glue cannot interpolate functions into strings. * object 'n' is a function.
I also want to remove the first row (Year level) if anyone knows how to do that too!
Any help or ideas would be very much appreciated.
Hello! You're getting an error because little n is the number of observations in each by= group, while big N is the overall number of observations. When you use little n in the header for the overall column, you get the error because it's not defined there.
Change the little n to big N, and you should be all set! Example below!
library(gtsummary)
#> #BlackLivesMatter
packageVersion("gtsummary")
#> [1] '1.4.1'
tbl <-
trial %>%
select(trt, age, grade) %>%
tbl_summary(by = trt, missing = "no") %>%
add_overall() %>%
modify_header(
update = list(label ~ "",
all_stat_cols(FALSE) ~ "{level}\n n = {n}\n",
stat_0 ~ "Total\n = {N}")
)
Created on 2021-06-01 by the reprex package (v2.0.0)
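The wording of the error can be reproduced with glue on its own. The sketch below assumes that the header string is interpolated with glue against a set of values in which only N is defined for the overall column; the list header_values and the stand-in function n are hypothetical and only illustrate the mechanism. When glue cannot find n among the supplied values, it falls back to the calling environment, finds a function of that name (for example dplyr::n()), and refuses to interpolate it.

```r
library(glue)

# Stand-in for any function called n that is visible, such as dplyr::n()
n <- function() 1L

# Hypothetical values available for the overall column: only big N
header_values <- list(N = 17)

# Works: N is found in the supplied values
ok <- glue_data(header_values, "Total\n = {N}")

# Fails: n is not in the values, so glue finds the function instead
err <- tryCatch(
  glue_data(header_values, "Total\n = {n}"),
  error = function(e) conditionMessage(e)
)

ok
err
```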

How do you remove the row_labels text in an expss table?

I like to pipe my expss tables into kable to get access to some additional formatting options. That sometimes requires some tweaking, and I'm looking for a tweak here to get rid of the row_labels text in the first column of the header in the example below.
Simple reprex:
df <- data.frame(x=rbinom(100,1,0.5), y=rnorm(100,1,0.6),
z=rnorm(100,1,0.2), grp = rep(1:5,20))
var_lab(df$grp) = ""
df %>%
tab_cells(x,y,z) %>%
tab_cols(grp) %>%
tab_stat_mean (label = "") %>%
tab_pivot %>%
kable(caption= "Title",
digits = c(0,rep(3,5))) %>%
kable_styling(full_width=F, position="center",
bootstrap_options = c("striped"))%>%
add_header_above(c("", "Group" = 5))
Generates this:
Thanks!
It's better to use 'htmlTable' or 'huxtable' to output expss tables, because both support complex multilevel and multinested headers.
However, if you want to use 'kable' you can set first column name to empty string just after 'tab_pivot':
library(expss)
library(knitr)
library(kableExtra)
# function which removes the first column name
remove_first_name = function(x){
setNames(x, c("", names(x)[-1]))
}
df <- data.frame(x=rbinom(100,1,0.5), y=rnorm(100,1,0.6),
z=rnorm(100,1,0.2), grp = rep(1:5,20))
var_lab(df$grp) = ""
df %>%
tab_cells(x,y,z) %>%
tab_cols(grp) %>%
tab_stat_mean (label = "") %>%
tab_pivot %>%
remove_first_name %>% # remove 'row_labels'
kable(caption= "Title",
digits = c(0,rep(3,5))) %>%
kable_styling(full_width=F, position="center",
bootstrap_options = c("striped"))%>%
add_header_above(c("", "Group" = 5))
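The helper does not depend on expss, so its effect can be checked on an ordinary data frame. In this sketch, tab is a hypothetical stand-in for the result of tab_pivot, with made-up values and column names.

```r
# Same helper as above: blank out the first column name, keep the rest
remove_first_name <- function(x) {
  setNames(x, c("", names(x)[-1]))
}

# Hypothetical stand-in for a pivoted table
tab <- data.frame(
  row_labels = c("x", "y", "z"),
  g1 = c(0.5, 1.0, 1.0),
  g2 = c(0.4, 1.1, 0.9)
)

out <- remove_first_name(tab)
names(out)
```

Only the names change; the cell values and dimensions of the table are untouched, which is why kable() can still format the numeric columns.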