How to add the number from a for loop to the end of each new table name?

I have a for loop that makes a new table for every i. How can I name those tables `table_i`?
I have tried saving the new tables under the names table_i, table_[i] and paste("table", i, sep = "_"), but nothing seems to work. This is what I have now:
for (i in 2010:2016) {
  df %>%
    filter(df$year == i) -> table_i
}
I would like to have 7 tables called table_2010, table_2011, ..., table_2016.


Is there a function in R to create a new column in a tibble that depends on values from a previous row?

First time poster and quite new to R.
I'm trying to add a new variable to a tibble ("joined") that takes the value of column 22 ("NurseID") from the previous row, if the value in column 3 ("AccountID") in the current row matches the one in the previous row.
I can do it with a loop over the sorted data, but this is a large dataset, it takes a long time to run, and I wonder if there is a faster or easier way to do this.
joined <- arrange(joined, AccountID, date_day, shift)
tie <- "."
for (i in 2:nrow(joined)) {
  # copy the previous row's NurseID (column 22) when AccountID (column 3) matches
  if (joined[[3]][i] == joined[[3]][i - 1]) {
    temp <- joined[[22]][i - 1]
  } else {
    temp <- "."
  }
  tie <- c(tie, temp)
}
temptie <- as.numeric(tie)
joined <- as_tibble(cbind(joined, temptie))
Any help or input is much appreciated. Please let me know if you need more information about the tibble.

Is there a way to add a column with the test statistic in gtsummary::tbl_regression()?

Is there a way to add the statistic for each variable as a column in a tbl_regression() table with {gtsummary}? In psychology it is not uncommon for reviewers to expect this column in tables.
With {sjPlot}, I just need to add the show.stat parameter, and a Statistic column appears showing the t value from summary(model).
library(gtsummary)
library(sjPlot)
model <- lm(mpg ~ cyl + wt, data = mtcars)
tab_model(model, show.stat = TRUE)
With {gtsummary}, I can't find how to add an equivalent column when using tbl_regression(). I just want to show a column with the values already present in the model.
library(gtsummary)
model <- lm(mpg ~ cyl + wt, data = mtcars)
stats_to_include <- c("r.squared", "adj.r.squared", "nobs")
tbl_regression(model, intercept = TRUE, show_single_row = everything()) %>%
  bold_p() %>%
  add_glance_table(include = all_of(stats_to_include))
Yep! The statistic column is already in the table, but it's hidden by default. You can unhide it (and other columns) with the modify_column_unhide() function. Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'
model <- lm(mpg ~ cyl + wt, data = mtcars)
tbl <-
  tbl_regression(model, intercept = TRUE, show_single_row = everything()) %>%
  bold_p() %>%
  add_glance_table(include = c(r.squared, adj.r.squared, nobs)) %>%
  modify_column_unhide(columns = c(statistic, std.error))
Created on 2022-02-07 by the reprex package (v2.0.1)
FYI, if you're interested, gtsummary also supports journal themes. For example, you can load the JAMA theme and the gtsummary results will be auto-formatted for publication in JAMA. We don't have a psychology theme, but if you file a GitHub issue, we can collaborate on adding one, including things like showing the statistic column by default (and much more).
https://www.danieldsjoberg.com/gtsummary/reference/theme_gtsummary.html

Spark, Scala, Databricks, combine and add columns

I am using Spark/Scala to attempt a "simple" query. After line 1 of the code below runs, my data looks like this:
EmpReg,EmpOT,RegPay,OTPay
Alice,Alice,400,20
Bob,Bob,300,0
Carol,Carol,450,120
Dan,Dan,400,200
Ellen,Ellen,360,40
The first and third columns (EmpReg, RegPay) come from one source, and the second and fourth columns (EmpOT, OTPay) come from a second source. My objective is output that looks like this:
Emp,Pay
Alice,420
Bob,300
Carol,570
Dan,600
Ellen,400
Here is the code I have been trying (at least what I have saved):
var q2 = q.join(q1, q("EmpReg") === q1("EmpOT"), "fullouter")
//q2 = q2.select("EmpReg", ($"RegPay" + $"OTPay"))
//q2 = q2.groupBy($"EmpReg".sum($"RegPay" + $"OTPay"))
var add = q2.select(($"RegPay" + $"OTPay"))
//q2 = q2.sum("RegPay", "OTPay")
//q2 = q2.groupBy("EmpReg", "EmpOT")
//var q2 = q.join(q1).where("EmpReg") === "EmpOT"))
//q2 = q2.select("EmpReg").sum("RegPay", "OTPay")
//q2.show
add.show
[q] is the first file, which represents regular pay. [q1] is the second file, which represents overtime pay. [q2] is the combination shown in the first example above. The primary keys are [EmpReg] and [EmpOT]. I don't really need to combine [EmpReg] and [EmpOT]: they hold the same values, so it makes no difference which one I use.
What I really need is to add [RegPay] and [OTPay] to get [Pay], but for the life of me I can't get it to work. The commented-out lines return various errors. I can add the two pay columns, and I can select an appropriate employee column, but I can't seem to do both in one query. I am constrained to use Scala on Databricks. Otherwise, I might do something like this:
select q.EmpReg as Emp, (q.RegPay + q1.OTPay) as Pay
from q join q1 on q.EmpReg = q1.EmpOT
(Why can't things ever be simple?)
You can use a similar approach as in your SQL query:
val q2 = q.join(q1, q("EmpReg") === q1("EmpOT"), "fullouter")
val add = q2.select(q("EmpReg").as("Emp"), (q("RegPay") + q1("OTPay")).as("Pay"))
Your code has this line:
q2.select("EmpReg", ($"RegPay" + $"OTPay"))
which should work if you add $ before "EmpReg". In Scala you can't mix strings and columns in the same select call; PySpark allows this, but the Scala API does not.
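The join-then-add logic from the answer can be illustrated with plain Scala collections, without Spark. This is a sketch under stated assumptions: each file is assumed to be already loaded into a hypothetical Map from employee name to pay (`regPay`, `otPay`). Taking the union of both key sets mirrors the "fullouter" join, and `getOrElse(emp, 0)` treats a missing side as zero pay. In Spark itself, a missing side of a full outer join yields null, and null + x is null, so for unmatched employees one would likely need to wrap the pay columns in `coalesce(..., lit(0))` and the name columns in `coalesce(q("EmpReg"), q1("EmpOT"))`, both from `org.apache.spark.sql.functions`. With the sample data every employee appears in both files, so this only matters for unmatched rows.

```scala
// Plain-collections sketch of "full outer join on the employee, then add the pay columns".
// regPay / otPay are hypothetical stand-ins for the two source files (q and q1).
object PayMerge {
  def merge(regPay: Map[String, Int], otPay: Map[String, Int]): List[(String, Int)] = {
    // Full outer join on the employee key: every name that appears on either side
    val employees = (regPay.keySet ++ otPay.keySet).toList.sorted
    // A missing side contributes 0 instead of null
    employees.map(emp => emp -> (regPay.getOrElse(emp, 0) + otPay.getOrElse(emp, 0)))
  }

  def main(args: Array[String]): Unit = {
    val regPay = Map("Alice" -> 400, "Bob" -> 300, "Carol" -> 450, "Dan" -> 400, "Ellen" -> 360)
    val otPay  = Map("Alice" -> 20, "Bob" -> 0, "Carol" -> 120, "Dan" -> 200, "Ellen" -> 40)
    println("Emp,Pay")
    merge(regPay, otPay).foreach { case (emp, pay) => println(s"$emp,$pay") }
  }
}
```

Running `main` with the question's sample values prints the same Emp,Pay rows as the desired output (Alice,420 through Ellen,400).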

Merging tbl_svysummary and stacked tbl_regression tables with different variable names but same labels

Follow-up question to "Renaming Rows in gtsummary, tbl_regression/tbl_stack":
I am now trying to merge the renamed, stacked table (Table 1) with a tbl_summary table that includes the prevalence of each outcome (Table 2). However, because each renamed row of Table 1 is really the same variable repeated over and over, it doesn't merge with Table 2. Instead, it creates a table (Table 3) in which the duplicated outcome names are stacked on top of one another. Is there any way to merge these tables so that the rows of Table 1 line up with those of Table 2?
UPDATE:
As of gtsummary v1.4.0, tbl_uvregression() accepts survey objects.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.4.0'
# convert trial data frame to survey object
tbl <-
  survey::svydesign(
    data = trial[c("response", "death", "age", "marker")],
    ids = ~1,
    weights = ~1
  ) %>%
  # build univariate regression models
  tbl_uvregression(
    x = age,
    method = survey::svyglm,
    method.args = list(family = binomial),
    exponentiate = TRUE,
    formula = "{y} ~ {x} + marker",
    label = list(response = "Response", death = "Death"),
    hide_n = TRUE,
    include = -marker
  ) %>%
  add_n() %>%
  add_nevent() %>%
  modify_header(
    label = "**Outcome**",
    estimate = "**Age OR**"
  )
Created on 2021-04-14 by the reprex package (v2.0.0)

Consolidating a data table in Scala

I am working on a small data analysis tool, and practicing/learning Scala in the process. However, I got stuck on a small problem.
Assume data of this form:
X Gr1 x_11 ... x_1n
X Gr2 x_21 ... x_2n
..
X GrK x_k1 ... x_kn
Y Gr1 y_11 ... y_1n
Y Gr3 y_31 ... y_3n
..
Y Gr(K-1) ...
Here I have entries (X, Y, ...) that may or may not exist in up to K groups, with a series of values for each group. What I want to do is pretty simple (in theory): I would like to consolidate the rows that belong to the same "entity" across groups. So instead of multiple lines that start with X, I want one row with all the values from x_11 to x_kn in columns.
What makes things complicated, however, is that not all entities exist in all groups. Wherever there is missing data, I would like to pad with, for instance, zeroes or some string that denotes a missing value. So if I have (X, Y, Z) in up to 3 groups, the table I want looks like this:
X x_11 x_12 x_21 x_22 x_31 x_32
Y y_11 y_12 N/A N/A y_31 y_32
Z N/A N/A z_21 z_22 N/A N/A
I have been stuck trying to figure this out. Is there a smart way to use List functions to solve this?
I wrote this simple loop:
for {
  (id, hitlist) <- hits.groupBy(_.acc)
  h <- hitlist
} println(id + "\t" + h.sampleId + "\t" + h.ratios.mkString("\t"))
to be able to generate tables like the example above. Note that my original data has a different format and layout, but that has little to do with the problem at hand, so I have skipped all the parsing steps. I should be able to use groupBy in a better way that actually solves this for me, but I can't seem to get there.
Then I modified my loop to map the hits to their ratios and concatenate them:
for ((id, hitlist) <- hits.groupBy(_.acc)) {
  val l = hitlist.map(_.ratios).foldRight(List[Double]()) {
    (l1: List[Double], l2: List[Double]) => l1 ::: l2
  }
  println(id + "\t" + l.mkString("\t"))
  //println(id + "\t" + h.sampleId + "\t" + h.ratios.mkString("\t"))
}
That gets me one step closer but still no cigar! Instead of a fully padded "matrix" I get a jagged table. Taking the example above:
X x_11 x_12 x_21 x_22 x_31 x_32
Y y_11 y_12 y_31 y_32
Z z_21 z_22
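The jagged output follows from how each row is built: the foldRight with ::: simply concatenates whatever ratio lists an entity has, so it is equivalent to `hitlist.flatMap(_.ratios)`, and a group the entity lacks contributes nothing. A minimal sketch of this, using a hypothetical `Hit` case class that keeps only the `acc`, `sampleId` and `ratios` fields mentioned in the question:

```scala
// Hypothetical minimal model of the question's data: only the fields the loop uses.
object JaggedDemo {
  case class Hit(acc: String, sampleId: String, ratios: List[Double])

  // The question's fold: right-fold the per-group ratio lists together with :::
  def concatFold(hitlist: List[Hit]): List[Double] =
    hitlist.map(_.ratios).foldRight(List[Double]())((l1, l2) => l1 ::: l2)

  // Equivalent one-liner: flatMap concatenates the same lists in the same order
  def concatFlat(hitlist: List[Hit]): List[Double] =
    hitlist.flatMap(_.ratios)

  def main(args: Array[String]): Unit = {
    val hits = List(
      Hit("X", "Gr1", List(1.1, 1.2)), Hit("X", "Gr2", List(2.1, 2.2)),
      Hit("Y", "Gr2", List(2.1, 2.2)))
    // X has two groups and Y only one, so Y's row comes out half as long
    for ((id, hitlist) <- hits.groupBy(_.acc).toList.sortBy(_._1))
      println(id + "\t" + concatFlat(hitlist).mkString("\t"))
  }
}
```

Nothing in either version knows which group a list came from, which is why some lookup by `sampleId` is needed to insert padding in the right place.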
Any ideas as to how I can pad the table so that values from the respective groups are aligned with one another? I should be able to use _.sampleId, which holds the "group membership" for each "hit", but I am not sure how exactly. `hits` is a List of type Hit, which is practically a wrapper for each row that provides convenience methods for getting individual values, so it is essentially a tuple with "named indices" (such as .acc, .sampleId, ...).
(I would like to solve this problem without hardcoding the number of groups, as it might change from case to case)
Thanks!
This is a bit of a contrived example, but I think you can see where this is going:
case class Hit(acc: String, subAcc: String, value: Int)
val hits = List(Hit("X", "x_11", 1), Hit("X", "x_21", 2), Hit("X", "x_31", 3))
val kMax = 4
val nMax = 2
for {
  (id, hitlist) <- hits.groupBy(_.acc)
  k <- 1 to kMax
  n <- 1 to nMax
} yield {
  val subId = "x_%s%s".format(k, n)
  val row = hitlist.find(h => h.subAcc == subId).getOrElse(Hit(id, subId, 0))
  println(row)
}
//Prints
Hit(X,x_11,1)
Hit(X,x_12,0)
Hit(X,x_21,2)
Hit(X,x_22,0)
Hit(X,x_31,3)
Hit(X,x_32,0)
Hit(X,x_41,0)
Hit(X,x_42,0)
If you provide more information about your hits lists, we could probably come up with something a little more accurate.
I have managed to solve this problem with the following code. I am posting it as an answer in case someone else runs into a similar problem and needs some help. The use of find() from Noah's answer was very useful, so do give him a +1 if this code snippet helps you out.
val samples = hits.groupBy(_.sampleId).keys.toList.sorted
for ((id, hitlist) <- hits.groupBy(_.acc)) {
  val ratios =
    for (sample <- samples)
      yield hitlist.find(h => h.sampleId == sample).map(_.ratios)
        .getOrElse(List(Double.NaN, Double.NaN, Double.NaN, Double.NaN, Double.NaN, Double.NaN))
  println(id + "\t" + ratios.flatten.mkString("\t"))
}
I figure it's not a very elegant or efficient solution, as I have two calls to groupBy, and I would be interested to see better solutions to this problem.
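One possible refinement, sketched under assumptions rather than offered as a definitive answer: build a single lookup map keyed by (acc, sampleId) instead of calling find() for every group, and derive the padding width from the data instead of hardcoding six NaNs. The `Hit` case class below is a hypothetical stand-in for the question's wrapper, and the sketch assumes every hit carries the same number of ratios. It still makes a few passes over the list (for the group list, the width and the index), so it is mainly tidier and avoids the repeated linear find() scans, not a fundamentally different algorithm.

```scala
// Hypothetical stand-in for the question's Hit wrapper; only the fields used below.
object Consolidate {
  case class Hit(acc: String, sampleId: String, ratios: List[Double])

  def consolidate(hits: List[Hit]): List[(String, List[Double])] = {
    // Every group seen anywhere in the data, in a stable (sorted) order
    val samples = hits.map(_.sampleId).distinct.sorted
    // Padding width taken from the data instead of hardcoding six NaNs
    // (assumes every hit carries the same number of ratios)
    val width = if (hits.isEmpty) 0 else hits.map(_.ratios.length).max
    val padding = List.fill(width)(Double.NaN)
    // One lookup table keyed by (entity, group), built in a single pass
    val index = hits.map(h => (h.acc, h.sampleId) -> h.ratios).toMap
    // For each entity, walk the groups in order and pad the missing ones
    hits.map(_.acc).distinct.sorted.map { acc =>
      acc -> samples.flatMap(s => index.getOrElse((acc, s), padding))
    }
  }

  def main(args: Array[String]): Unit = {
    val hits = List(
      Hit("X", "Gr1", List(1.1, 1.2)), Hit("X", "Gr2", List(2.1, 2.2)), Hit("X", "Gr3", List(3.1, 3.2)),
      Hit("Y", "Gr1", List(1.1, 1.2)), Hit("Y", "Gr3", List(3.1, 3.2)),
      Hit("Z", "Gr2", List(2.1, 2.2)))
    for ((acc, row) <- consolidate(hits))
      println(acc + "\t" + row.map(r => if (r.isNaN) "N/A" else r.toString).mkString("\t"))
  }
}
```

Because the group list comes from the data, the number of groups is not hardcoded, which matches the requirement stated in the question.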