JDBC reading ResultSet by column name issue for aliases - Scala

I have a generic repository with a method as:
object Queries {
  def getByFieldId(field: String, id: Int): String = {
    s"""
       |SELECT
       |  DF.id AS fileId,
       |  DF.name AS fileName,
       |  AG.id AS groupId,
       |  AG.name AS groupName
       |FROM $tableName DFG
       |INNER JOIN directory_files DF on DF.id = DFG.file_id
       |INNER JOIN ad_groups AG on AG.id = DFG.group_id
       |WHERE DFG.$field = $id
       |""".stripMargin
  }
}
def getByFieldId(field: String, id: Int): Try[List[Object]] = {
  try {
    val sqlQuery = Queries.getByFieldId("ad_group", 1)
    statement = conn.getPreparedStatement(sqlQuery)
    setParameters(statement, params)
    resultSet = statement.executeQuery()
    val metadata = resultSet.getMetaData
    val columnCount = metadata.getColumnCount
    val columns: ListBuffer[String] = ListBuffer.empty
    for (i <- 1 to columnCount) {
      columns += metadata.getColumnName(i)
    }
    var item: List[Object] = List.empty
    while (resultSet.next()) {
      val row = columns.toList.map(x => resultSet.getObject(x))
      item = row
    }
    Success(item)
  } catch {
    case e: Any => Failure(errorHandler(e))
  } finally conn.closeConnection(resultSet, statement)
}
The problem is that my result set ignores the query aliases and returns the columns as (id, name, id, name) instead of (fileId, fileName, groupId, groupName).
One solution I found is to use column indexes instead of column names, but I'm not sure whether that will work across the entire app without breaking other queries.
Another possible solution I found is that, if I'm right, I can still use column names, but I need to collect them together with their column types and then, inside resultSet.next(), call the matching getter for each, as:
// this part of code is not tested
// this idea came to me writing this topic
while (resultSet.next()) {
  val row = columns.toList.map(x => {
    x.colType match {
      case "string" => resultSet.getString(x.colName)
      case "integer" => resultSet.getInt(x.colName)
      case "decimal" => resultSet.getBigDecimal(x.colName) // JDBC has getBigDecimal, not getDecimal
      case _ => resultSet.getString(x.colName)
    }
  })
  item = row
}

You may try to use getColumnLabel instead of getColumnName, as documented:
Gets the designated column's suggested title for use in printouts and displays. The suggested title is usually specified by the SQL AS clause. If a SQL AS is not specified, the value returned from getColumnLabel will be the same as the value returned by the getColumnName method.
Note that this is highly dependent on the RDBMS used.
For Oracle both methods return the alias and there is no chance to get the original column name.
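Applied to the repository method above, a minimal sketch of the label-based variant (same resultSet setup as in the question; only the metadata call changes):
// sketch: read column labels (aliases) from the metadata instead of column names
val metadata = resultSet.getMetaData
val labels = (1 to metadata.getColumnCount).map(i => metadata.getColumnLabel(i)).toList

var item: List[Object] = List.empty
while (resultSet.next()) {
  // ResultSet.getObject also accepts the column label, so the SQL AS aliases resolve here
  item = labels.map(label => resultSet.getObject(label))
}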

Related

How to map a query result to case class using Anorm in scala

I have 2 case classes like this:
case class ClassTeacherWrapper(
  success: Boolean,
  classes: List[ClassTeacher]
)
2nd one:
case class ClassTeacher(
  clid: String,
  name: String
)
And a query like this:
val query =
  SQL"""
    SELECT
      s.section_sk::text AS clid,
      s.name AS name
    from
      ********************
  """
P.S. I put * in place of the query for security reasons.
So my query is returning 2 values. How do I map it to the case class ClassTeacher?
Currently I am doing something like this:
def getClassTeachersByInstructor(instructor: String, section: String): ClassTeacherWrapper = {
  implicit var conn: Connection = null
  try {
    conn = datamartDatasourceConnectionPool.getDBConnection()
    // Define query
    val query =
      SQL"""
        SELECT
          s.section_sk::text AS clid,
          s.name AS name
        ********
      """
    logger.info("Read from DB: " + query)
    // create a List containing all the datasets from the resultset and return
    new ClassTeacherWrapper(
      success = true,
      query.as(Macro.namedParser[ClassTeacher].*)
    )
    // Trying new approach
    //val users = query.map(user => new ClassTeacherWrapper(true, user[Int]("clid"), user[String]("name")).tolist
  }
  catch {
    case NonFatal(e) =>
      logger.error("getGradebookScores: error getting/parsing data from DB", e)
      throw e
  }
}
With this I am getting this exception:
{
"error": "ERROR: operator does not exist: uuid = character varying\n
Hint: No operator matches the given name and argument type(s). You
might need to add explicit type casts.\n Position: 324"
}
Can anyone help with where I am going wrong? I am new to Scala and Anorm.
What should I modify in the query.as part of the code?
Do you need the success field? Often an empty list would suffice.
I find parsers very useful (and reusable), so something like the following in the ClassTeacher singleton (or similar location):
// assumes import anorm._ and anorm.SqlParser.get
val fields = "s.section_sk::text AS clid, s.name"

val classTeacherP =
  get[String]("clid") ~
  get[String]("name") map {
    case clid ~ name =>
      ClassTeacher(clid, name)
  }

def allForInstructorSection(instructor: String, section: String): List[ClassTeacher] =
  DB.withConnection { implicit c => //-- or injected db
    SQL(s"""select $fields from ******""")
      .on('instructor -> instructor, 'section -> section)
      .as(classTeacherP *)
  }
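If the wrapper is still wanted, it can stay as a thin layer over that helper; a sketch, reusing the getClassTeachersByInstructor signature from the question:
def getClassTeachersByInstructor(instructor: String, section: String): ClassTeacherWrapper =
  ClassTeacherWrapper(
    success = true,
    classes = allForInstructorSection(instructor, section)
  )
The "operator does not exist: uuid = character varying" error is a separate issue: as the hint in the message itself says, an explicit cast is likely needed, e.g. comparing the uuid column against {instructor}::uuid (or binding a java.util.UUID) in whatever WHERE clause was hidden behind the *.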

Iterate and trim string based on condition in spark Scala

I have a dataframe 'regexDf' like the one below:
id,regex
1,(.*)text1(.*)text2(.*)text3(.*)text4(.*)|(.*)text2(.*)text5(.*)text6(.*)
2,(.*)text1(.*)text5(.*)text6(.*)|(.*)text2(.*)
If the length of the regex exceeds some max length, for example 50, then I want to remove the last text token from each '|'-separated part of the regex string for that id. In the above dataframe, the regex for id 1 is longer than 50, so the last tokens 'text4(.*)' and 'text6(.*)' should be removed from each part. Even after removing those, the regex string for id 1 is still longer than 50, so the last tokens 'text3(.*)' and 'text5(.*)' should be removed again. So the final dataframe will be:
id,regex
1,(.*)text1(.*)text2(.*)|(.*)text2(.*)
2,(.*)text1(.*)text5(.*)text6(.*)|(.*)text2(.*)
I am able to trim the last tokens using the following code
val reducedStr = regex.split("\\|").foldLeft(List[String]()) {
  (regexStr, eachRegex) => {
    regexStr :+ eachRegex.replaceAll("\\(\\.\\*\\)\\w+\\(\\.\\*\\)$", "\\(\\.\\*\\)")
  }
}.mkString("|")
I tried using a while loop to check the length and trim the text tokens in each iteration, but it is not working. Also, I want to avoid using var and while loops. Is it possible to achieve this without a while loop?
val optimizeRegexString = udf((regex: String) => {
  if (regex.length >= 50) {
    var len = regex.length
    var resultStr: String = ""
    while (len >= maxLength) {
      val reducedStr = regex.split("\\|").foldLeft(List[String]()) {
        (regexStr, eachRegex) => {
          regexStr :+ eachRegex
            .replaceAll("\\(\\.\\*\\)\\w+\\(\\.\\*\\)$", "\\(\\.\\*\\)")
        }
      }.mkString("|")
      len = reducedStr.length
      resultStr = reducedStr
    }
    resultStr
  } else {
    regex
  }
})
regexDf.withColumn("optimizedRegex", optimizeRegexString(col("regex")))
As per SathiyanS's and Pasha's suggestions, I changed the recursive method to a function:
def optimizeRegex(regexDf: DataFrame): DataFrame = {
  val shrinkString = (s: String) => {
    if (s.length > 50) {
      val extractedString: String = shrinkString(s.split("\\|")
        .map(s => s.substring(0, s.lastIndexOf("text"))).mkString("|"))
      extractedString
    }
    else s
  }
  def shrinkUdf = udf((regex: String) => shrinkString(regex))
  regexDf.withColumn("regexString", shrinkUdf(col("regex")))
}
Now I am getting an error: "recursive value shrinkString needs type"
Error:(145, 39) recursive value shrinkString needs type
  val extractedString: String = shrinkString(s.split("\\|")
    .map(s => s.substring(0, s.lastIndexOf("text"))).mkString("|"));
Recursion:
def shrink(s: String): String = {
  if (s.length > 50)
    shrink(s.split("\\|").map(s => s.substring(0, s.lastIndexOf("text"))).mkString("|"))
  else s
}
It looks like the issue is with how the function is defined and called; here is some additional info.
It can be called as a static function:
object ShrinkContainer {
  def shrink(s: String): String = {
    if (s.length > 50)
      shrink(s.split("\\|").map(s => s.substring(0, s.lastIndexOf("text"))).mkString("|"))
    else s
  }
}
Link with dataframe:
def shrinkUdf = udf((regex: String) => ShrinkContainer.shrink(regex))
df.withColumn("regex", shrinkUdf(col("regex"))).show(truncate = false)
Drawbacks: this is just a basic example of the approach. Some edge cases (e.g. a regex that does not contain "text", or too many parts separated by "|", for example 100) have to be resolved by the author of the question to avoid an infinite recursion loop; see the guarded sketch below.
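A minimal sketch of such a guard, assuming the same 50-character threshold: it leaves parts without a "text" token untouched and stops as soon as a pass no longer shrinks the string:
def shrinkSafe(s: String): String = {
  if (s.length <= 50) s
  else {
    val shortened = s.split("\\|").map { part =>
      val idx = part.lastIndexOf("text")
      if (idx > 0) part.substring(0, idx) else part // nothing left to strip in this part
    }.mkString("|")
    if (shortened == s) s else shrinkSafe(shortened) // stop once no further progress is possible
  }
}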
This is how I would do it.
First, a function for removing the last token from a regex:
def deleteLastToken(s: String): String =
  s.replaceFirst("""[^)]+\(\.\*\)$""", "")
Then, a function that shortens the entire regex string by deleting the last token from all the |-separated fields:
def shorten(r: String) = {
  val items = r.split("[|]").toSeq
  val shortenedItems = items.map(deleteLastToken)
  shortenedItems.mkString("|")
}
Then, for a given input regex string, create the stream of all the shortened strings you get by applying the shorten function repeatedly. This is an infinite stream, but it's lazily evaluated, so only as few elements as required will be actually computed:
val regex = "(.*)text1(.*)text2(.*)text3(.*)text4(.*)|(.*)text2(.*)text5(.*)text6(.*)"
val allShortened = Stream.iterate(regex)(shorten)
Finally, you can treat allShortened as any other sequence. For solving our problem, you can drop all elements while they don't satisfy the length requirement, and then keep only the first one of the remaining ones:
val result = allShortened.dropWhile(_.length > 50).head
You can see all the intermediate values by printing some elements of allShortened:
allShortened.take(10).foreach(println)
// Prints:
// (.*)text1(.*)text2(.*)text3(.*)text4(.*)|(.*)text2(.*)text5(.*)text6(.*)
// (.*)text1(.*)text2(.*)text3(.*)|(.*)text2(.*)text5(.*)
// (.*)text1(.*)text2(.*)|(.*)text2(.*)
// (.*)text1(.*)|(.*)
// (.*)|(.*)
// (.*)|(.*)
// (.*)|(.*)
// (.*)|(.*)
// (.*)|(.*)
// (.*)|(.*)
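To apply this inside the dataframe from the question, the same pipeline can be wrapped in a UDF; a sketch, assuming the shorten function above and the regexDf/regex column names from the question (like the stream itself, it assumes the string eventually drops below the threshold):
import org.apache.spark.sql.functions.{col, udf}

// shorten repeatedly until the regex fits within 50 characters
val shortenToLimit = udf { (regex: String) =>
  Stream.iterate(regex)(shorten).dropWhile(_.length > 50).head
}

regexDf.withColumn("optimizedRegex", shortenToLimit(col("regex")))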
Just to add to #pasha701's answer. Here is the solution that works in Spark.
val df = sc.parallelize(Seq((1,"(.*)text1(.*)text2(.*)text3(.*)text4(.*)|(.*)text2(.*)text5(.*)text6(.*)"),(2,"(.*)text1(.*)text5(.*)text6(.*)|(.*)text2(.*)"))).toDF("ID", "regex")
df.show()
//prints
+---+------------------------------------------------------------------------+
|ID |regex |
+---+------------------------------------------------------------------------+
|1 |(.*)text1(.*)text2(.*)text3(.*)text4(.*)|(.*)text2(.*)text5(.*)text6(.*)|
|2 |(.*)text1(.*)text5(.*)text6(.*)|(.*)text2(.*) |
+---+------------------------------------------------------------------------+
Now you can use #pasha701's shrink function in a udf:
val shrink: String => String = (s: String) => if (s.length > 50) shrink(s.split("\\|").map(s => s.substring(0,s.lastIndexOf("text"))).mkString("|")) else s
def shrinkUdf = udf((regex: String) => shrink(regex))
df.withColumn("regex", shrinkUdf(col("regex"))).show(truncate = false)
//prints
+---+---------------------------------------------+
|ID |regex |
+---+---------------------------------------------+
|1 |(.*)text1(.*)text2(.*)|(.*)text2(.*) |
|2 |(.*)text1(.*)text5(.*)text6(.*)|(.*)text2(.*)|
+---+---------------------------------------------+

Do Aggregation with Slick

My database structure looks like this:
id | content
I want to get the entry with the max id (not just the id).
I read the answer How to make aggregations with slick, but I found there is no first method for the statement Query(Coffees.map(_.price).max).first. How can I do that now?
What if I need the content of the item with the max id?
To retrieve another column, you could do something like the following. The below example calculates the max of one column, finds the row with that maximum value, and returns the value of another column in that row:
val coffees = TableQuery[Coffees]

val mostExpensiveCoffeeQuery =
  for {
    maxPrice <- coffees.map(_.price).max.result
    c <- maxPrice match {
      case Some(p) => coffees.filter(_.price === p).result
      case None => DBIO.successful(Seq())
    }
  } yield c.headOption.map(_.name)

val mostExpensiveCoffee = db.run(mostExpensiveCoffeeQuery)
// Future[Option[String]]
Alternatively, to return a full Coffees object:
val mostExpensiveCoffeeQuery =
  for {
    ...
  } yield c.headOption

val mostExpensiveCoffee = db.run(mostExpensiveCoffeeQuery)
// Future[Option[Coffees]]
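For the original question (the row with the max id, including its content), a shorter alternative sketch is to sort and take one row instead of composing two actions; this assumes a TableQuery named entries for the id | content table:
// sketch: order by id descending and keep only the first row
val maxIdRowQuery = entries.sortBy(_.id.desc).take(1).result.headOption

val maxIdRow = db.run(maxIdRowQuery)
// Future[Option[...]] -- whatever the table's element type is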

Recursive method call in Apache Spark

I'm building a family tree from a database on Apache Spark, using a recursive search to find the ultimate parent (i.e. the person at the top of the family tree) for each person in the DB.
It is assumed that the first person returned when searching for their id is the correct parent
val peopleById = peopleRDD.keyBy(f => f.id)

def findUltimateParentId(personId: String): String = {
  if ((personId == null) || (personId.length() == 0))
    return "-1"
  val personSeq = peopleById.lookup(personId)
  val person = personSeq(0)
  if (person.personId == "0" || person.id == person.parentId) {
    return person.id
  }
  else {
    return findUltimateParentId(person.parentId)
  }
}
val ultimateParentIds = peopleRDD.foreach(f => findUltimateParentId(f.parentId))
It is giving the following error
"Caused by: org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063."
I understand from reading other similar questions that the problem is that I'm calling findUltimateParentId from within the foreach loop; if I call the method from the shell with a person's id, it returns the correct ultimate parent id.
However, none of the other suggested solutions work for me, or at least I can't see how to implement them in my program. Can anyone help?
If I understood you correctly - here's a solution that would work for any size of input (although performance might not be great) - it performs N iterations over the RDD where N is the "deepest family" (largest distance from ancestor to child) in the input:
// representation of input: each person has an ID and an optional parent ID
case class Person(id: Int, parentId: Option[Int])

// representation of result: each person is optionally attached its "ultimate" ancestor,
// or none if it had no parent id in the first place
case class WithAncestor(person: Person, ancestor: Option[Person]) {
  def hasGrandparent: Boolean = ancestor.exists(_.parentId.isDefined)
}

object RecursiveParentLookup {
  // requested method
  def findUltimateParent(rdd: RDD[Person]): RDD[WithAncestor] = {

    // all persons keyed by id
    def byId = rdd.keyBy(_.id).cache()

    // recursive function that "climbs" one generation at each iteration
    def climbOneGeneration(persons: RDD[WithAncestor]): RDD[WithAncestor] = {
      val cached = persons.cache()
      // find which persons can climb further up family tree
      val haveGrandparents = cached.filter(_.hasGrandparent)
      if (haveGrandparents.isEmpty()) {
        cached // we're done, return result
      } else {
        val done = cached.filter(!_.hasGrandparent) // these are done, we'll return them as-is
        // for those who can - join with persons to find the grandparent and attach it instead of parent
        val withGrandparents = haveGrandparents
          .keyBy(_.ancestor.get.parentId.get) // grandparent id
          .join(byId)
          .values
          .map({ case (withAncestor, grandparent) => WithAncestor(withAncestor.person, Some(grandparent)) })
        // call this method recursively on the result
        done ++ climbOneGeneration(withGrandparents)
      }
    }

    // call recursive method - start by assuming each person is its own parent, if it has one:
    climbOneGeneration(rdd.map(p => WithAncestor(p, p.parentId.map(i => p))))
  }
}
Here's a test to better understand how this works:
/**
 * Example input tree:
 *
 *        1       5
 *        |       |
 *   -----2-----  6
 *   |         |
 *   3         4
 *
 */
val person1 = Person(1, None)
val person2 = Person(2, Some(1))
val person3 = Person(3, Some(2))
val person4 = Person(4, Some(2))
val person5 = Person(5, None)
val person6 = Person(6, Some(5))
test("find ultimate parent") {
val input = sc.parallelize(Seq(person1, person2, person3, person4, person5, person6))
val result = RecursiveParentLookup.findUltimateParent(input).collect()
result should contain theSameElementsAs Seq(
WithAncestor(person1, None),
WithAncestor(person2, Some(person1)),
WithAncestor(person3, Some(person1)),
WithAncestor(person4, Some(person1)),
WithAncestor(person5, None),
WithAncestor(person6, Some(person5))
)
}
It should be easy to map your input into these Person objects, and to map the output WithAncestor objects into whatever it is you need; a sketch of that mapping follows. Note that this code assumes that if any person has parentId X, another person with that id actually exists in the input.
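For example, a minimal sketch of that adaptation, assuming the peopleRDD from the question exposes String id and parentId fields (with an empty or null parentId meaning "no parent"):
// adapt the question's records to the Person model above
val persons: RDD[Person] = peopleRDD.map { p =>
  Person(p.id.toInt, Option(p.parentId).filter(_.nonEmpty).map(_.toInt))
}

// run the lookup and keep (person id, ultimate ancestor id) pairs
val ultimateParentIds = RecursiveParentLookup.findUltimateParent(persons)
  .map(w => w.person.id -> w.ancestor.map(_.id).getOrElse(w.person.id))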
I fixed this by using SparkContext.broadcast:
val peopleById = peopleRDD.keyBy(f => f.id)
val broadcastedPeople = sc.broadcast(peopleById.collectAsMap())

def findUltimateParentId(personId: String): String = {
  if ((personId == null) || (personId.length() == 0))
    return "-1"
  val personOption = broadcastedPeople.value.get(personId)
  if (personOption.isEmpty) {
    return "0"
  }
  val person = personOption.get
  if (person.personId == 0 || person.orgId == person.personId) {
    return person.id
  }
  else {
    return findUltimateParentId(person.parentId)
  }
}

val ultimateParentIds = peopleRDD.foreach(f => findUltimateParentId(f.parentId))
working great now!
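One small follow-up: foreach returns Unit, so the ultimateParentIds value above does not actually hold any results; to collect the resolved ids, a map over the RDD would be needed, for example (a sketch, reusing the broadcast-based function above):
// pair each person's id with its resolved ultimate parent id
val ultimateParentIds = peopleRDD.map(f => f.id -> findUltimateParentId(f.parentId))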

Play2's anorm can't work on postgresql

I found that the row parsers of play2's anorm depend on the metadata returned by the JDBC driver.
So in the built-in sample "zentasks" provided by play, I can find such code:
object Project {
  val simple = {
    get[Pk[Long]]("project.id") ~
    get[String]("project.folder") ~
    get[String]("project.name") map {
      case id ~ folder ~ name => Project(id, folder, name)
    }
  }
}
Please notice that the fields all have a project. prefix.
It works well on the H2 database, but not on PostgreSQL. If I use PostgreSQL, I have to write it as:
object Project {
  val simple = {
    get[Pk[Long]]("id") ~
    get[String]("folder") ~
    get[String]("name") map {
      case id ~ folder ~ name => Project(id, folder, name)
    }
  }
}
I've asked this in play's google group, and Guillaume Bort said:
Yes if you are using postgres it's probably the cause. The postgresql
jdbc driver is broken and doesn't return table names.
If the PostgreSQL JDBC driver really has this issue, I think there will be a problem for anorm:
If two tables have fields with the same name and I query them with a join, anorm won't get the correct values, since it can't tell which name belongs to which table.
So I write a test.
1. create tables on postgresql
create table a (
  id text not null primary key,
  name text not null
);
create table b (
  id text not null primary key,
  name text not null,
  a_id text,
  foreign key(a_id) references a(id) on delete cascade
);
2. create anorm models
case class A(id: Pk[String] = NotAssigned, name: String)
case class B(id: Pk[String] = NotAssigned, name: String, aId: String)

object A {
  val simple = {
    get[Pk[String]]("id") ~
    get[String]("name") map {
      case id ~ name =>
        A(id, name)
    }
  }
  def create(a: A): A = {
    DB.withConnection { implicit connection =>
      val id = newId()
      SQL("""
        insert into a (id, name)
        values (
          {id}, {name}
        )
      """).on('id -> id, 'name -> a.name).executeUpdate()
      a.copy(id = Id(id))
    }
  }
  def findAll(): Seq[(A, B)] = {
    DB.withConnection { implicit conn =>
      SQL("""
        select a.*, b.* from a as a left join b as b on a.id=b.a_id
      """).as(A.simple ~ B.simple map {
        case a ~ b => a -> b
      } *)
    }
  }
}

object B {
  val simple = {
    get[Pk[String]]("id") ~
    get[String]("name") ~
    get[String]("a_id") map {
      case id ~ name ~ aId =>
        B(id, name, aId)
    }
  }
  def create(b: B): B = {
    DB.withConnection { implicit connection =>
      val id = UUID.randomUUID().toString
      SQL("""
        insert into b (id, name, a_id)
        values (
          {id}, {name}, {aId}
        )
      """).on('id -> id, 'name -> b.name, 'aId -> b.aId).executeUpdate()
      b.copy(id = Id(id))
    }
  }
}
3. test cases with scalatest
class ABTest extends DbSuite {
"AB" should "get one-to-many" in {
running(fakeApp) {
val a = A.create(A(name = "AAA"))
val b1 = B.create(B(name = "BBB1", aId = a.id.get))
val b2 = B.create(B(name = "BBB2", aId = a.id.get))
val ab = A.findAll()
ab foreach {
case (a, b) => {
println("a: " + a)
println("b: " + b)
}
}
}
}
}
4. the output
a: A(dbc52793-0f6f-4910-a954-940e508aab26,BBB1)
b: B(dbc52793-0f6f-4910-a954-940e508aab26,BBB1,4a66ebe7-536e-4bd5-b1bd-08f022650f1f)
a: A(d1bc8520-b4d1-40f1-af92-52b3bfe50e9f,BBB2)
b: B(d1bc8520-b4d1-40f1-af92-52b3bfe50e9f,BBB2,4a66ebe7-536e-4bd5-b1bd-08f022650f1f)
You can see that the "a"s have the names "BBB1"/"BBB2", not "AAA".
I tried to redefine the parsers with prefixes as:
val simple = {
  get[Pk[String]]("a.id") ~
  get[String]("a.name") map {
    case id ~ name =>
      A(id, name)
  }
}
But it reports errors saying that the specified fields can't be found.
Is this a big issue with Anorm? Or am I missing something?
The latest play2 (RC3) has solved this problem by checking the class name of the meta object:
// HACK FOR POSTGRES
if (meta.getClass.getName.startsWith("org.postgresql.")) {
  meta.asInstanceOf[{ def getBaseTableName(i: Int): String }].getBaseTableName(i)
} else {
  meta.getTableName(i)
}
But be careful if you want to use it with p6spy: it doesn't work, because the class name of the meta object will be "com.p6spy....", not "org.postgresql....".
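If you cannot rely on that driver check (for example behind p6spy), a workaround sketch is to disambiguate in SQL with explicit column aliases and parse by those labels instead of prefixed names; the a_id/a_name/b_* aliases below are made up for illustration:
val joinedParser = {
  get[Pk[String]]("a_id") ~ get[String]("a_name") ~
  get[Pk[String]]("b_id") ~ get[String]("b_name") ~ get[String]("b_a_id") map {
    case aId ~ aName ~ bId ~ bName ~ bAId => A(aId, aName) -> B(bId, bName, bAId)
  }
}

def findAllAliased(): Seq[(A, B)] = {
  DB.withConnection { implicit conn =>
    SQL("""
      select a.id as a_id, a.name as a_name,
             b.id as b_id, b.name as b_name, b.a_id as b_a_id
      from a left join b on a.id = b.a_id
    """).as(joinedParser *)
  }
}
Note that, as in the original findAll, a left join with no matching b rows would make the non-optional B columns fail to parse; an inner join (or optional getters) avoids that.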