Why I obtain different sample size using tbl_summary and tbl_svysummary - gtsummary

I have a dataset of 2355 observations, when I obtained summary statistics using tbl_summary, the number of samples remained the same (2355). But when I wanted to account for survey design, I used tbl_svysummary, I got a very small sample size in the output.
Here are the codes and the output attached
Table summary when not accounting for survey design
tbl_summary(d_adjust, by = zero,
digits = all_categorical()~0,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical()~"{n} ({p}%)"),
include = c(area, windex5, hhsex, madolescent, welevel,
marital, ceb, tetanus,anc, delivery, csex,
fever, cough, treatfc, diarrhea, treatd, fradio, ftv)) %>%
add_p()%>%
add_overall()
Table summary when accounting for survey design
tbl_svysummary(madagascar, by = zero,
digits = all_categorical()~0,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical()~"{n} ({p}%)"),
include = c(area, windex5, hhsex, madolescent, welevel,
marital, ceb, tetanus,anc, delivery, csex,
fever, cough, treatfc, diarrhea, treatd, fradio, ftv)) %>%
add_p()%>%
add_overall()
In another instance, I used another dataset and obtained summary statistics using tbl_svysummary, the initial number of observation is 6084, but the output showed 13,456 number of samples (More than times 2 of the original sample size).
I really need to understand why it is so.
Thank you and I look forward to hearing your opinions.

Related

Gtsummary table values less than -1 outputting as 1.00?

Since installing the newest version of R, all my gtsummary table values less than -1 have been outputting to 1.00. Does anyone have insight on how to fix this very weird issue?
Here is example code:
library(tidyverse)
library(gtsummary)
library(haven)
library(mice)
library(googlesheets4)
data <- read_sheet("https://docs.google.com/spreadsheets/d/1yyw-0xseZSLjD4jc8sw7IksN-S0M3vcKWHy4ksMPL4c/edit?usp=sharing")
datami <- mice(data, m = 23, seed=10)
datareg <- with(datami,
lm(SUD ~ NUM + MIND +
AGE + SEX + CRAVE) )
table <- tbl_regression(datareg,
estimate_fun = purrr::partial(style_ratio, digits = 2),
pvalue_fun = ~style_pvalue(.x, digits = 2),
add_estimate_to_reference_rows = TRUE
) %>% modify_header(label="**Predictor**",estimate="**Unstandardized Coefficient**") %>%
modify_footnote(update = c(p.value, ci, estimate) ~ "Reference group")%>%
modify_caption("Table: Multiple Imputation Predicting Variable")
table
Have re-installed R & gtsummary multiple times to no avail.
You're using style_ratio() to round/format the estimates: this is meant to round odds ratios, risk ratios, etc., which are all positive numbers. Update this to use style_number().
I should update the ratio function to have better behavior when negative values are passed.

There is a way to add a column with the test statistics in gtsummary::tbl_regression()?

Is there a way to add the Statistic for each variable as a column in a tbl_regresion() with {gtsummary}? In Psychology is not uncommon for reviewers to expect this column in the tables.
When using {sjPlot}, just need to add the show.stat parameter, and the Statistic column will appear showing the t value from summary(model).
library(gtsummary)
library(sjPlot)
model <- lm(mpg ~ cyl + wt, data = mtcars)
tab_model(model, show.stat = TRUE)
With {gtsummary} I can't find anywhere how to add an equivalent column when using tbl_regression(). Would just want to show a column with the values already present in the model.
library(gtsummary)
model <- lm(mpg ~ cyl + wt, data = mtcars)
stats_to_include =c("r.squared", "adj.r.squared", "nobs")
tbl_regression(model, intercept = TRUE, show_single_row = everything()) %>%
bold_p() %>%
add_glance_table(include = all_of(stats_to_include))
Yep! The statistic column is already in the table, but it's hidden by default. You can unhide it (and other columns) with the modify_column_unhide() function. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'
model <- lm(mpg ~ cyl + wt, data = mtcars)
tbl <-
tbl_regression(model, intercept = TRUE, show_single_row = everything()) %>%
bold_p() %>%
add_glance_table(include = c(r.squared, adj.r.squared, nobs)) %>%
modify_column_unhide(columns = c(statistic, std.error))
Created on 2022-02-07 by the reprex package (v2.0.1)
FYI if you're interested, we also support journal themes in gtsummary. You can, for example, load the JAMA theme and the gtsummary results will be auto-formatted for publication in JAMA. We don't have Psychology theme, but if you file a GitHub Issue, we can collaborate on adding one. We can add things like showing the statistic column by default (and much more).
https://www.danieldsjoberg.com/gtsummary/reference/theme_gtsummary.html

Rstudio gtsummary table with three categorical variables

I'm trying to figure out how to use the gtsummary package for my dataset.
I have three categorical values and two of those set as strata. I'm not interested in the frequency of each sample but want the numeric value in the table.
Currently I'm using this simple code (x, y, z, are my categorical values, whereas SOC is the numerical values. Y and Z should go in the headline (strata).
Data %>%
select(x, y, z , SOC) %>%
tbl_strata(strata=z,
.tbl_fun =
~ .x %>%
tbl_summary(by = y , missing = "no"),
statistic = list(all_continuous()~ "{mean} ({sd})" ))%>%
modify_caption("**Soil organic carbon [%]**")%>%
bold_labels()
Edit
Let's take the trial dataset as an example:
trial %>%
select(trt, grade, stage, age, marker) %>%
tbl_strata(strata=stage,
~tbl_summary(.x, by = grade , missing = "no"),
statistic = list(all_continuous()~ "{mean} ({sd})"
))%>%
bold_labels()
What I'm looking for is a table like this, but without the frequency showing of each treatment (Drug A, B). I only want the age and marker to show up in my table but organized by treatment. I'd like to have the first section showing only the age and marker for the group that received Drug A. Then a section showing the same for Drug B.
Edit 2
Your input is exactly what I am looking for. With the trial dataset it works perfectly fine. However, ones I put in my data, the numeric values are all in one column instead of in rows. I also still get the frequencies and I can't figure out why. I use exactly the same code and the same amount of variables and my table looks somewhat like this:
I think nesting calls to tbl_strata() (one merging and the other stacking) will get you what you're after. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'
tbl <-
trial %>%
select(trt, grade, stage, age, marker) %>%
tbl_strata(
strata = trt,
function(data) {
data %>%
tbl_strata(
strata = stage,
~ tbl_summary(
.x,
by = grade,
statistic = all_continuous() ~ "{mean} ({sd})",
missing = "no"
) %>%
modify_header(all_stat_cols() ~ "**{level}**")
)
},
.combine_with = "tbl_stack",
.combine_args = list(group_header = c("Drug A", "Drug B"))
) %>%
bold_labels()
Created on 2022-03-04 by the reprex package (v2.0.1)

Using mutate to swap columns renders a negative Difference output

I have used the gtsummary package (great package btw) since last month on my reports.
Now I am building a cohort table that will show pre-test value, post-test value, difference (p.p) and a t-test p-value.
I'm trying to build the same table as I have built it under Arsenal with pre-test being the first column and post-test being in the second column and so on, but the difference column shows a negative output when it isn't supposed to be.
I used mutate() to swap both columns, as when I don't use it it shows the post-test as the first column. I also tried swapping the post-test columns at first rows in the dataset itself as what I read in some posts. But to no avail.
homesurvey %>%
select(period, CB2.Textbooks, CB2.Magazines, CB2.Newspapers, CB2.Religious_books, CB2.Coloring_books, CB2.Comics) %>%
mutate(period = forcats::fct_rev(period)) %>%
tbl_summary(by = period,
statistic = all_continuous() ~ "{n} ({sd})",
label = list (CB2.Textbooks ~ "Textbooks",
CB2.Magazines ~ "Magazines",
CB2.Newspapers ~ "Newspapers",
CB2.Religious_books ~ "Religious books",
CB2.Coloring_books ~ "Coloring books",
CB2.Comics ~ "Comics")
)%>%
add_difference() %>%
modify_column_hide(ci)
It shows a negative difference even if it isn't supposed to be.
Output
I am looking at your example output (thanks for including it). The first row is showing 82% in pre-assessment and 96% in the post-assessment. 82 - 96 = -15%, so the difference should indeed be negative.
You can, however, flip the estimate by multiplying it by -1. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.0'
tbl <-
trial %>%
select(response, death, trt) %>%
tbl_summary(by = trt, missing = "no") %>%
add_difference() %>%
modify_column_hide(ci) %>%
# you can flip the difference estimate by multiplying it by -1
modify_table_body(
~.x %>%
dplyr::mutate(estimate = -1 * estimate)
)
Created on 2021-11-10 by the reprex package (v2.0.1)

How to use caffe to classify text?

I'm using the Rotten Tomatoes dataset to train my net. It's divided in two groups, positive and negative examples. How can I configure my cnn in caffe to predict if a given text is a positive or a negative example?
I already formatted the data, each sentence has a size of 56 words. But using the following config does not give me even a satisfactory result.
n = caffe.NetSpec()
n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB,
source=db_path,
transform_param=dict(scale= 1 / mean),
ntop=2)
n.conv1 = L.Convolution(n.data, kernel_size=3, pad=1,
param=dict(lr_mult=1), num_output=10,
weight_filler=dict(type='xavier'))
n.pool1 = L.Pooling(n.conv1, kernel_size=n_classes,
stride=2, pool=P.Pooling.MAX)
n.ip1 = L.InnerProduct(n.pool1, num_output=100,
weight_filler=dict(type='xavier'))
n.relu1 = L.ReLU(n.ip1, in_place=True)
n.ip2 = L.InnerProduct(n.relu1, num_output=n_classes,
weight_filler=dict(type='xavier'))
n.loss = L.SoftmaxWithLoss(n.ip2, n.label)
My dataset is divided in two text files. One containing the positives examples and other containing negatives examples. Polarity dataset v1.1. To organize my data I get the length of the biggest sentence (59 words) so if a sentence is smaller than 59 words I add some text to it. I adapted from this code. For example, lets pretend that the biggest sentence has 3 words:
data = 'abc def ghijkl. mnopqrst uvwxyz. abcd.'
##
#In this data I have 3 sentences:
##
sentence_one = 'abc def ghijkl
sentence_two = 'mnopqrst uvwxyz'
sentence_three = 'abcd'
The sentence_one is the biggest (3 words), so to format the others two sentence I did the following:
sentence_two = 'mnopqrst uvwxyz <PAD>'
sentence_three = 'abcd <PAD> <PAD>'
Saved each positive and negative sentence to a caffe datum and saved in lmdb:
datum = caffe.proto.caffe_pb2.Datum()
datum.channels = 1
datum.height = 59 #biggest sentence
datum.width = 1
datum.label = label # 0 or 1
datum.data = sentence.tobytes()
Using my datum database and the above caffe's configuration I get a poor accuracy (less than 3 percent). What am I doing wrong?