How can I ignore missing values in multiple t-tests?

I have been using this code for a while to do t-tests on multiple (300+) variables simultaneously. It has always worked in the past but for one particular dataset I am getting an error saying that there are not enough observations. I think it is because there are a lot of NA values for some of the variables. Is there a way to adjust the code so that it just ignores these variables and continues to perform t-tests for the remaining variables?
This is the code I have been using:
library(tidyverse)
library(rstatix)
# Put all variables in the same column except the grouping variable
Bio.test.long <- Bio.test %>%
  pivot_longer(-Bacterial.v.viral, names_to = "variables", values_to = "value")
Bio.test.long %>% sample_n(6)
# Run one t-test per variable, comparing the two groups
t.test.bio <- Bio.test.long %>%
  group_by(variables) %>%
  t_test(value ~ Bacterial.v.viral) %>%
  adjust_pvalue(method = "none") %>%
  add_significance()
t.test.bio
Error in mutate_cols():
! Problem with mutate() column data.
i data = map(.data$data, .f, ...).
x not enough 'y' observations
Caused by error in t.test.default():
! not enough 'y' observations
Run rlang::last_error() to see where the error occurred
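One possible approach, sketched under assumptions rather than offered as a definitive fix: the "not enough 'y' observations" error comes from t.test() when a group has fewer than two non-missing values, so variables that cannot be tested can be dropped before calling t_test(). The toy Bio.test data below and the helper name testable are hypothetical; only the column names come from the question.

```r
library(dplyr)
library(tidyr)
library(rstatix)

# Hypothetical stand-in for Bio.test: var_b has only one non-missing
# value in the "viral" group, so it cannot be t-tested
set.seed(1)
Bio.test <- tibble(
  Bacterial.v.viral = rep(c("bacterial", "viral"), each = 5),
  var_a = rnorm(10),
  var_b = c(rnorm(5), rep(NA, 4), 0.3)
)

Bio.test.long <- Bio.test %>%
  pivot_longer(-Bacterial.v.viral, names_to = "variables", values_to = "value")

# Keep variables where both groups have at least 2 non-missing values
testable <- Bio.test.long %>%
  filter(!is.na(value)) %>%
  count(variables, Bacterial.v.viral, name = "n_obs") %>%
  group_by(variables) %>%
  summarise(ok = n() == 2 && all(n_obs >= 2)) %>%
  filter(ok) %>%
  pull(variables)

# Run the original pipeline on the testable variables only
t.test.bio <- Bio.test.long %>%
  filter(variables %in% testable) %>%
  group_by(variables) %>%
  t_test(value ~ Bacterial.v.viral) %>%
  adjust_pvalue(method = "none") %>%
  add_significance()
t.test.bio
```

The threshold of two observations per group is the minimum t.test() accepts. A variable that passes this filter can still fail for other reasons, for example if its values are constant.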

Related

Using a special character for figure legend

I need to correctly spell an Indigenous name on a figure I am developing in R.
To start, the Geography was "Nisga'a Lands". Ultimately I want it to read "Nisg̱̱a'a Lands". So, the g becomes a g with a dash below (g̱̱).
I tried simply copying and pasting this and mutating the data frame, as well as playing with the encoding:
all_income_data = as.data.frame(all_income_data) %>%
  mutate(Geography = stri_enc_toutf8(Geography)) %>%
  mutate(Geography = ifelse(Geography == "Nisga'a Lands", "Nisg̱̱a'a Lands", Geography))
I unfortunately was only able to produce this result:
Is it possible to get the g the way I need it? Thanks so much in advance for any help
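A minimal sketch, assuming the intended letter is a plain g followed by a single combining macron below (U+0331): writing the character as a Unicode escape keeps the script ASCII-only and makes the code points easy to check. The data frame is a hypothetical stand-in for all_income_data. Whether the label then renders correctly still depends on the font and graphics device, which this sketch does not address.

```r
# Hypothetical stand-in for all_income_data
all_income_data <- data.frame(
  Geography = c("Nisga'a Lands", "Elsewhere"),
  stringsAsFactors = FALSE
)

# "g" followed by U+0331 (combining macron below), written as an escape
target <- "Nisg\u0331a'a Lands"

all_income_data$Geography <- ifelse(
  all_income_data$Geography == "Nisga'a Lands",
  target,
  all_income_data$Geography
)

# Inspect the code points to confirm the combining mark is present
utf8ToInt(all_income_data$Geography[1])
```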

Arranging dates on 2 CSV files

I am trying to parse the dates in the published column so that I can sort them and perform text mining on specific months.
The two CSV files I am working with can be found on Kaggle.
I tried to make the dates sortable with the following code:
library(readr)
library(dplyr)
Guardians_Russia_Ukraine <- read_csv("news_conflict/Guardians_Russia_Ukraine.csv",
                                     col_types = cols(published = col_character()))
Guardians_Russia_Ukraine %>% mutate(published1 = parse_date(published))
published1 gives the result NA and is therefore not usable.

Using tbl_regression with imputed data/pooled regression models

I've had great success using the gtsummary::tbl_regression function to display regression model results. I can't see how to use tbl_regression with pooled regression models from imputed data sets, however, and I'd really like to.
I don't have a reproducible example handy, I just wanted to see if anyone else has found a way to work with, say, mids objects created by the mice package in tbl_regression.
In the current development version of gtsummary, it's possible to summarize models estimated on imputed data from the mice package. Here's an example:
# install dev version of gtsummary
remotes::install_github("ddsjoberg/gtsummary")
library(gtsummary)
packageVersion("gtsummary")
#> [1] ‘1.3.5.9012’
# impute the data
df_imputed <- mice::mice(trial, m = 2)
# build the model
imputed_model <- with(df_imputed, lm(age ~ marker + grade))
# present beautiful table with gtsummary
tbl_regression(imputed_model)
#> pool_and_tidy_mice: Tidying mice model with
#> `mice::pool(x) %>% mice::tidy(exponentiate = FALSE, conf.int = TRUE, conf.level = 0.95)`
Created on 2020-12-16 by the reprex package (v0.3.0)
It's important to note that you pass the mice model object to tbl_regression() BEFORE you pool the results. The tbl_regression() function needs access to the individual models in order to correctly identify the reference row and variable labels (among other things). Internally, the tidying function used on the mice model will first pool the results, then tidy the results. The code used for this process is printed to the console for transparency (as seen in the example above).
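The "pool first, then tidy" step described above can also be sketched by hand. This is an illustration using the nhanes example data that ships with mice, not gtsummary's internal code, and it uses summary() on the pooled object rather than the tidy call printed in the console message.

```r
library(mice)

# Impute the nhanes example data shipped with mice (m = 2 imputations)
imp <- mice(nhanes, m = 2, printFlag = FALSE, seed = 1)

# Fit the same model on each imputed data set; this unpooled object is
# the kind of object that would be passed to tbl_regression()
fits <- with(imp, lm(bmi ~ age + chl))

# Pool the per-imputation estimates, then summarise with confidence intervals
pooled <- pool(fits)
pooled_summary <- summary(pooled, conf.int = TRUE, conf.level = 0.95)
pooled_summary
```

Keeping fits separate from pooled mirrors the point made above: the individual models remain available until the pooling step.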

Is it possible to use the tbl_regression function with the lmer function with random effects?

I work on the antifungal activity of some molecules ("cyclo") combined with fungicides, and I want to assess the impact of these cyclos and their concentration ratio. CMI is a quantitative variable and all other variables are factors.
I have this script:
mod=lmer(CMI ~ cyclo*ratio + (1|fungicide) + (1|strains), data)
And I'd like to know if I can use tbl_regression() (library(gtsummary)) with my lmer() model.
If yes, what do I have to specify for the exponentiate argument?
If I write exponentiate=FALSE, I obtain the same values as the estimates in the classical summary(mod).
Thank you for your help
Steffi
The default behavior of tbl_regression() for mixed-effects models is to print the fixed effects only. To see the full output, including the random components, you need to override the default function for tidying the model results using the tidy_fun= argument.
library(gtsummary)
lme4::lmer(age ~ marker + (1|grade), trial) %>%
  tbl_regression(
    # set the tidying function to broom.mixed::tidy to show random effects
    tidy_fun = broom.mixed::tidy
  )
You can use the label= argument to update the label displayed for the random components if you wish.
The default is exponentiate = FALSE, so you don't need to specify it in the tbl_regression() call.
For more details on the tidy_fun= argument, you can review this help file: http://www.danieldsjoberg.com/gtsummary/reference/vetted_models.html
Hope this helps! Happy Coding!

Calculations in a table based on variable names in MATLAB

I am trying to find a better way to do calculations on data stored in a table. I have a large table with many variables (100+), from which I select a smaller sub-table containing only two observations and their difference, for a smaller selection of variables. The resulting table looks, for example, like this:
              air         bbs          bri
           _________    ________    _________
    test1     12.451       0.549       3.6987
    test2       10.2        0.47         3.99
    diff       2.251    0.078999     -0.29132
Now I need to multiply the ‘diff’ row by coefficients that differ between variables. I can get the result I want with the following code:
T(4,:) = array2table([T.air(3)*0.2*0.25, T.bbs(3)*0.1*0.25, T.bri(3)*0.7*0.6/2]);
However, I need a more flexible solution, since the selection of variables will differ between applications. I was thinking that a better solution might be to use either varfun or rowfun with a specific function that assigns the correct coefficients/equations based on variable names:
T(4,:) = varfun(@func, T(3,:), 'InputVariables', {'air' 'bbs' 'bri'});
or
T(4,:) = rowfun(@func, T(3,:), 'OutputVariableNames', T.Properties.VariableNames);
However, my current solution is just as inflexible as the basic calculation above:
function [air_out, bbs_out, bri_out] = func(air, bbs, bri)
    air_out = air*0.2*0.25;
    bbs_out = bbs*0.1*0.25;
    bri_out = bri*0.7*0.6/2;
since I need to define every input/output variable. What I need is a function that holds the coefficients/equations for every variable and applies them only to the variables that are present in the specific sub-table.
Any suggestions?