Related
How can I use tbl_strata and get the output to show counts by distinct individual rather than rows?
Also, how can I change the order that displays for the variable I am putting in the by= section in tbl_summary?
I have a long table AND a wide table with one row per patient. Not sure how to apply the wide table to this code. I can apply the long table but getting row counts instead of distinct patient counts.
I have included an example of the Long table I have and the wide table and what I would like the output to look like in the picture.
Example code:
#Wide Table Example
df_Wide <- data.frame(patientICN =c(1, 2, 3, 4, 5)
,testtype =c("liquid", "tissue", "tissue", "liquid", "liquid")
,gene1 =c("unk", "pos", "neg", "neg", "unk")
,gene2 =c("pos", "neg", "pos", "unk", "neg")
,gene3 =c("neg", "unk", "unk", "pos", "pos"))
#Long Table Example
df_Long <- data.frame(patientICN =c(1, 1, 2, 2, 3)
,testtype =c("liquid", "tissue", "tissue", "liquid", "liquid")
,gene =c("Gene1", "Gene2", "Gene3", "Gene1", "Gene2")
,result=c("Positve", "Negative", "Unknown","Positive","Unknown"))
#Table Categorized by testtype and result for long table
df_Long %>%
select (result, gene, testtype)%>%
mutate(testcategory=paste("TestType",testtype))%>%
tbl_strata(
strata=testtype,
.tbl_fun =
~.x %>%
tbl_summary(by=result,missing="no")%>%
add_n(),
.header= "**{strata}**, N={n}"
)
##above is giving multiple Rows per patient counts
Is this what you're after? You can install the bstfun pkg from my R-universe: https://ddsjoberg.r-universe.dev/ui#packages
library(gtsummary)
library(dplyr)
#Long Table Example
df_Long <- data.frame(patientICN =c(1, 1, 2, 2, 3)
,testtype =c("liquid", "tissue", "tissue", "liquid", "liquid")
,gene =c("Gene1", "Gene2", "Gene3", "Gene1", "Gene2")
,result=c("Positive", "Negative", "Unknown","Positive","Unknown"))
tbl <-
df_Long %>%
tidyr::pivot_wider(
id_cols = c(patientICN, testtype),
names_from = gene,
values_from = result,
values_fill = "Unknown"
) %>%
mutate(across(starts_with('Gene'), ~factor(.x, levels = c("Positive", "Negative", "Unknown")))) %>%
tbl_strata(
strata = testtype,
~ .x %>%
bstfun::tbl_likert(
include = starts_with("Gene")
)
)
Created on 2022-10-06 with reprex v2.0.2
I am a beginner in r and trying to create some tables using this great gtsummary package. My question is that is it possible to use add_difference to add two separate difference statistics at the same time (or to combine these somehow)? I am able to create a perfect table with p-values or effect size, but not with both. Also, is it possible to use bonferroni adjusted p-values?
My simple code (with t.test) looks like this:
table1 <- tbl_summary(df, by = gr, statistic = list(all_continuous() ~ "{mean} ({sd})")) %>% add_difference(test = list(all_continuous() ~ "t.test"), group = gr, conf.level = 0.95, pvalue_fun = function(x) style_pvalue(x, digits = 2)))
thanks for help.
Yes, the table you're after is possible in gtsummary. Use the tbl_merge() function to merge the tables with standardized differences and the p-values. There is an example of this below. An alternative to this is to use add_stat() to construct customized columns.
library(gtsummary)
#> #BlackLivesMatter
packageVersion("gtsummary")
#> [1] '1.4.1'
# standardized difference
tbl1 <-
trial %>%
select(trt, age, marker) %>%
tbl_summary(by = trt, missing = "no",
statistic = all_continuous() ~ "{mean} ({sd})") %>%
add_difference(all_continuous() ~ "cohens_d")
# table with p-value and corrected p-values
tbl2 <-
trial %>%
select(trt, age, marker) %>%
tbl_summary(by = trt, missing = "no") %>%
add_p(all_continuous() ~ "t.test") %>%
add_q(method = "bonferroni") %>%
# hide the trt summaries, because we don't need them repeated
modify_column_hide(all_stat_cols())
#> add_q: Adjusting p-values with
#> `stats::p.adjust(x$table_body$p.value, method = "bonferroni")`
# merge tbls together
tbl_final <-
tbl_merge(list(tbl1, tbl2)) %>%
# remove spanning headers
modify_spanning_header(everything() ~ NA)
Created on 2021-07-20 by the reprex package (v2.0.0)
I have multiple dataframes I need to concat the addresses and zip based condition.Actually I had sql query which i need to convert to dataframe join
I had written UDF which is working fine for concating multiple columns to obtain a single column,
val getConcatenated = udf( (first: String, second: String,third: String,fourth: String,five: String,six: String) => { first + "," + second + "," +third + "," +fourth + "," +five + "," +six } )
MySQl Query
select
CONCAT(al.Address1,',',al.Address2,',',al.Zip) AS AtAddress,
CONCAT(rl.Address1,',',rl.Address2,',',rl.Zip) AS RtAddress,
CONCAT(d.Address1,',',d.Address2,','d.Zip) AS DAddress,
CONCAT(s.Address1,',',s.Address2,',',s.Zip) AS SAGddress,
CONCAT(vl.Address1,',',vl.Address2,',vl.Zip) AS VAddress,
CONCAT(sg.Address1,',',sg.Address2,',sg.Zip) AS SAGGddress
FROM
si s inner join
at a on s.cid = a.cid and s.cid =a.cid
inner join De d on s.cid = d.cid AND d.aid = a.aid
inner join SGrpM sgm on s.cid = sgm.cid and s.sid =sgm.sid and sgm.status=1
inner join SeG sg on sgm.cid =sg.cid and sgm.gid =sg.gid
inner join bd bu on s.cid = bu.cid and s.sid =bu.sid
inner join locas al on a.ALId = al.lid
inner join locas rl on a.RLId = rl.lid
inner join locas vl on a.VLId = vl.lid
I am facing issue when joining the dataframes which gives me null value.
val DS = DS_SI.join(at,Seq("cid","sid"),"inner").join(DS_DE,Seq("cid","aid"),"inner") .join(DS_SGrpM,Seq("cid","sid"),"inner").join(DS_SG,Seq("cid","gid"),"inner") .join(at,Seq("cid","sid"),"inner")
.join(DS_BD,Seq("cid","sid"),"inner").join(DS_LOCAS("ALId") <=> DS_LOCATION("lid") && at("RLId") <=> DS_LOCAS("lid")&& at("VLId") <=> DS_LOCAS("lid"),"inner")
Iam trying to join my dataFrames like above which is not giving be proper results and then I want to concat by adding the column
.withColumn("AtAddress",getConcatenated())
.withColumn("RtAddress",getConcatenated())....
Any one tell me how effectively we can achieve this and am I joining the dataframes correctly or any better approach for this .....
You can use concat_ws(separator, columns_to_concat).
Example:
import org.apache.spark.sql.functions._
df.withColumn("title", concat_ws(", ", DS_DE("Address2"), DS_DE("Address2"), DS_DE("Zip")))
Update
Created a runnable demo for this problem.
https://github.com/narayanjr/anorm_test
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I am unable to access fields on an aliased table. I keep getting error messages saying the field is not an option and the available fields are the base field name, or the `table_name'.field_name. But not the aliased field name. This makes it impossible to JOIN the same table twice and access all the fields.
var vendor_client_parser_1 = SqlParser.long("vid") ~ SqlParser.str("vname") ~ SqlParser.long("cid") ~ SqlParser.str("cname") map
{
case vid ~ vn ~ cid ~ cn => println(vid + "," + vn + "," + cid + "," + cn + ",")
}
var vendor_client_parser_2 = SqlParser.long("v.business_id") ~ SqlParser.str("v.name") ~ SqlParser.long("c.business_id") ~ SqlParser.str("c.name") map
{
case vid ~ vn ~ cid ~ cn => println(vid + "," + vn + "," + cid + "," + cn + ",")
}
var vendor_client_parser_3 = SqlParser.long(1) ~ SqlParser.str(2) ~ SqlParser.long(3) ~ SqlParser.str(4) map
{
case vid ~ vn ~ cid ~ cn => println(vid + "," + vn + "," + cid + "," + cn + ",")
}
DB.withConnection
{
implicit c =>
var results =
SQL"""
SELECT v.business_id AS vid, v.name AS vname, c.business_id AS cid, c.name AS cname
FROM #$BUSINESS_CONNECTION_TABLE
JOIN #$BUSINESS_TABLE AS v ON (vendor_id = v.business_id)
JOIN #$BUSINESS_TABLE AS c ON (client_id = c.business_id)
LIMIT 20
""".as(vendor_client_parser.*)
}
Expected result:
1, Vendor A, 10, Vendor K
2, Vendor B, 11, Vendor L
2, Vendor B, 1, Vendor A
12, Vendor M, 3, Vendor C
Result from vendor_client_parser_1:
10, Vendor K, 10, Vendor K
11, Vendor L, 11, Vendor L
1, Vendor A, 1, Vendor A
3, Vendor C, 3, Vendor C
Result from vendor_client_parser_2:
Execution exception[[AnormException: 'v.business_id' not found, available columns: business.business_id, business_id, business.name, name, business.business_id, business_id, business.name, name]]
Result from vendor_client_parser_3: (Same as expected)
1, Vendor A, 10, Vendor K
2, Vendor B, 11, Vendor L
2, Vendor B, 1, Vendor A
12, Vendor M, 3, Vendor C
vendor_client_parser_3 works but it relies on using index instead of names. I dis like using indexes because If I mess up an index I might still get a valid response back and not notice. If I mess up a name the column wont exist and I will know something is wrong.
Is there something I am missing? Is there any way to achieve the results I need without having to rely on using the index?
Play Scala 2.4.1
Anorm 2.5.0
Update:
If I do not alias the columns and use vendor_client_parser_2 I get the same result as when the columns are alias.
Modified Query:
SQL"""
SELECT v.business_id, v.name, c.business_id, c.name
FROM #$BUSINESS_CONNECTION_TABLE
JOIN #$BUSINESS_TABLE AS v ON (vendor_id = v.business_id)
JOIN #$BUSINESS_TABLE AS c ON (client_id = c.business_id)
LIMIT 20
""".as(vendor_client_parser_2.*)
Result with vendor_client_parser_2.*:
Execution exception[[AnormException: 'v.business_id' not found, available columns: business.business_id, business_id, business.name, name, business.business_id, business_id, business.name, name]]
I also tested it with a single table aliased and it refuses to see the aliased table name
Single Table Test:
SQL"""
SELECT v.business_id, v.name, business_id, name
FROM #$BUSINESS_TABLE AS v
LIMIT 20
""".as(test_parser.*)
test_parser:
var test_parser = SqlParser.long("v.business_id") ~ SqlParser.str("v.name") ~ SqlParser.long("business_id") ~ SqlParser.str("name") map
{
case vid ~ vn ~ cid ~ cn => println(vid + "," + vn + "," + cid + "," + cn + ",")
}
Result:
[AnormException: 'v.business_id' not found, available columns: business.business_id, business_id, business.name, name, business.business_id, business_id, business.name, name]
I then tested if aliased columns are accessible by both their original and aliased names.
Test Aliased columns:
SQL"""
SELECT business_id AS vid, name AS vname
FROM #$BUSINESS_TABLE
LIMIT 20
""".as(test_parser_2.*)
test_parser_2:
var test_parser_2 = SqlParser.long("business_id") ~ SqlParser.str("name") ~ SqlParser.long("vid") ~ SqlParser.str("vname") map
{
case vid ~ vn ~ cid ~ cn => println(vid + "," + vn + "," + cid + "," + cn + ",")
}
This test did not error out and it properly pulled in values as business_id and vid. As well as name and vname.
I forced it to error out so it would give me a list of column names. And it does seem like Anorm doesn't offer the non-aliased names as suggestions but they do work in this case.
[AnormException: 'forceError' not found, available columns: business.business_id, vid, business.name, vname]
I also tried not using SqlParser.
var businesses = SQL"""
SELECT v.business_id AS vid, v.name AS vname, c.business_id AS cid, c.name AS cname
FROM #$BUSINESS_CONNECTION_TABLE
JOIN #$BUSINESS_TABLE AS v ON (vendor_id = v.business_id)
JOIN #$BUSINESS_TABLE AS c ON (client_id = c.business_id)
LIMIT 20
""".fold(List[(Long, String, Long, String)]())
{
(list, row) =>
list :+ (row[Long]("v.business_id"), row[String]("v.name"), row[Long]("c.business_id"), row[String]("c.name")) //attempt_1
//list :+ (row[Long]("vid"), row[String]("vname"), row[Long]("cid"), row[String]("cname")) //attempt_2
}
If I use attempt_1 I get this error which as you suggested shouldn't work.
Left('v.business_id' not found, available columns: business.business_id, vid, business.name, vname, business.business_id, cid, business.name, cname)))
If I use attempt_2, I get the same results as vendor_client_parser_1
10, Vendor K, 10, Vendor K
11, Vendor L, 11, Vendor L
1, Vendor A, 1, Vendor A
3, Vendor C, 3, Vendor C
If I do not alias the columns and use this same method
SQL"""
SELECT v.business_id, v.name, c.business_id, c.name
FROM #$BUSINESS_CONNECTION_TABLE
JOIN #$BUSINESS_TABLE AS v ON (vendor_id = v.business_id)
JOIN #$BUSINESS_TABLE AS c ON (client_id = c.business_id)
LIMIT 20
""".fold(List[(Long, String, Long, String)]())
{
(list, row) =>
list :+ (row[Long]("v.business_id"), row[String]("v.name"), row[Long]("c.business_id"), row[String]("c.name")) //Attempt_3
}
Using this query without aliasing the columns causes this error,
Left('v.business_id' not found, available columns: business.business_id, business_id, business.name, name, business.business_id, business_id, business.name, name)))
I then tested a simple aliased tabled using this method
SQL"""
SELECT v.business_id, v.name
FROM #$BUSINESS_TABLE AS v
LIMIT 20
""".fold(List[(Long, String)]())
{
(list, row) =>
list :+ (row[Long]("v.business_id"), row[String]("v.name")) //simple_attempt_1
}
I get the same error
Left(List(java.lang.RuntimeException: Left('v.business_id' not found, available columns: business.business_id, business_id, business.name, name)))
So as far as I can tell it isn't possible to access fields that are part of an aliased table if the same table is used twice using the field names instead of index.
Update 2:
I tried reversing the order of fields in the SQL so it was c.business_id AS cid, c.name AS cname, v.business_id AS vid, v.name AS vname and rerunning vendor_client_parser_1. It gave me the inverse results
Result from vendor_client_parser_1 with mysql fields switched:
1, Vendor A, 1, Vendor A
2, Vendor B, 2, Vendor B
2, Vendor B, 2, Vendor B
12, Vendor M, 12, Vendor M
When I force an error and it shows me possible fields I get these,
Fields in original order:
Left('forceError' not found, available columns: business.business_id, vid, business.name, vname, business.business_id, cid, business.name, cname)
Fields in switched order:
Left('forceError' not found, available columns: business.business_id, cid, business.name, cname, business.business_id, vid, business.name, vname)
This makes me think this scenario is happening.
In case several columns are found with same name in query result, for example columns named code in both Country and CountryLanguage tables, there can be ambiguity. By default a mapping like following one will use the last column:
https://www.playframework.com/documentation/2.4.1/ScalaAnorm
If you look at the suggested fields business.business_id and business.name occur twice because the table is referenced twice. It seems like Anorm is associating the last occurrence of business.business_id and business.name with both aliases.
Update
Created a runnable demo for this problem.
https://github.com/narayanjr/anorm_test
I have not tried it in 2.4, but you could possibly use a parser method to generate the column name for you, rather than a variable. My approach would be something like the following:
case class BusinessConnection(vendor_id: Int, client_id: Int)
case class Business(id: Int, name: String)
case class VendorClient(vendor: Business, client: Business)
object VendorClient {
val businessConnectionP =
get[Int]("vendor_id") ~
get[Int]("client_id") map {
case vendor_id ~ client_id =>
BusinessConnection(vendor_id, client_id)
}
def businessP(alias: String) =
getAliased[Int](alias + ".id") ~
getAliased[String](alias + ".name") map {
case id ~ name =>
Business(id, name)
}
val vendorClientP =
businessP("v") ~ businessP("c") map {
case business ~ client =>
VendorClient(business, client)
}
val sqlSelector = "v.id, v.name, c.id, c.name"
def all() = DB.withConnection { implicit c =>
SQL(s"""
select $sqlSelector
from businessConnection
join business as v on vendor_id=v.id
join business as c on client_id=c.id
""").as(vendorClientP *)
}
}
I want to force slick to create queries like
select max(price) from coffees where ...
But slick's documentation doesn't help
val q = Coffees.map(_.price) //this is query Query[Coffees.type, ...]
val q1 = q.min // this is Column[Option[Double]]
val q2 = q.max
val q3 = q.sum
val q4 = q.avg
Because those q1-q4 aren't queries, I can't get the results but can use them inside other queries.
This statement
for {
coffee <- Coffees
} yield coffee.price.max
generates right query but is deprecated (generates warning: " method max in class ColumnExtensionMethods is deprecated: Use Query.max instead").
How to generate such query without warnings?
Another issue is to aggregate with group by:
"select name, max(price) from coffees group by name"
Tried to solve it with
for {
coffee <- Coffees
} yield (coffee.name, coffee.price.max)).groupBy(x => x._1)
which generates
select x2.x3, x2.x3, x2.x4 from (select x5."COF_NAME" as x3, max(x5."PRICE") as x4 from "coffees" x5) x2 group by x2.x3
which causes obvious db error
column "x5.COF_NAME" must appear in the GROUP BY clause or be used in an aggregate function
How to generate such query?
As far as I can tell is the first one simply
Query(Coffees.map(_.price).max).first
And the second one
val maxQuery = Coffees
.groupBy { _.name }
.map { case (name, c) =>
name -> c.map(_.price).max
}
maxQuery.list
or
val maxQuery = for {
(name, c) <- Coffees groupBy (_.name)
} yield name -> c.map(_.price).max
maxQuery.list