Passing DataFrame contents into a SQL stored procedure - Scala

I am trying to pass the contents of a DataFrame into my SQL stored procedure. I use a map function to iterate over the DataFrame rows and send them to the database, but it fails with the following error:
No Encoder found for Any
- field (class: "java.lang.Object", name: "_1")
- root class: "scala.Tuple2"
Could anybody help me correct this?
Below is my code:
val savedDataFrame = dataFrame.map(m => sendDataFrameToDB(m.get(0), m.get(1), m.get(2), m.get(3)))
savedDataFrame.collect()
def sendDataFrameToDB(firstName : String, lastName : String, address : String, age : Long) = {
var jdbcConnection: java.sql.Connection = null
try {
val jdbcTemplate = new JDBCTemplate()
jdbcTemplate.getConfiguration()
jdbcConnection = jdbcTemplate.getConnection
if (jdbcConnection != null) {
val statement = "{call insert_user_details (?,?,?,?)}"
val callableStatement = jdbcConnection.prepareCall(statement)
callableStatement.setString(1, firstName)
callableStatement.setString(2, lastName)
callableStatement.setString(3, address)
callableStatement.setLong(4, age)
callableStatement.executeUpdate
}
} catch {
case e: SQLException => logger.error(e.getMessage)
}
}

m.get(0) returns a value of type Any, so it cannot be passed directly to the String parameter firstName in your example. A DataFrame is different from an RDD: "DataFrame is a Dataset organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood" (Spark SQL programming guide).
When you create the DataFrame, name its columns, for example:
val dataFrame = dataSet.toDF("firstname", "lastName", "address", "age")
Then you can access the elements of each row by column name, with an explicit type, and pass them to your method:
dataFrame.map(m => sendDataFrameToDB(m.getAs[String]("firstname"), m.getAs[String]("lastName"), m.getAs[String]("address"), m.getAs[Long]("age")))
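Since the stored procedure call is a side effect, one option worth considering is foreach rather than map: Dataset.map needs an Encoder for whatever the function returns, while foreach does not. Below is a minimal sketch under assumptions: the sendDataFrameToDB here is a hypothetical stand-in that only formats its arguments instead of opening a JDBC connection, the sample row is made up, and the session runs in local mode.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object SendRowsSketch {
  // Hypothetical stand-in for the question's sendDataFrameToDB: it only formats
  // the values it would bind to the callable statement.
  def sendDataFrameToDB(firstName: String, lastName: String, address: String, age: Long): String =
    s"call insert_user_details ($firstName, $lastName, $address, $age)"

  // Typed extraction from a Row. Declared as a Function1 value so that
  // Dataset.foreach resolves to its Scala overload without ambiguity.
  val sendRow: Row => Unit = row =>
    println(
      sendDataFrameToDB(
        row.getAs[String]("firstname"),
        row.getAs[String]("lastName"),
        row.getAs[String]("address"),
        row.getAs[Long]("age")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run, for illustration only
      .appName("send-rows-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with the column names used in the answer above.
    val dataFrame = Seq(("Ada", "Lovelace", "London", 36L))
      .toDF("firstname", "lastName", "address", "age")

    // foreach runs only for its side effect, so no result Encoder is involved.
    dataFrame.foreach(sendRow)

    spark.stop()
  }
}
```

With a real database, opening one connection per row is costly, so foreachPartition with one connection per partition may be a better fit.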

Related

Is there any way to specify type in scala dynamically

I'm new to Spark and Scala, so apologies if this is a basic question. I have a number of tables:
table_a, table_b, ...
and a corresponding type for each of these tables:
case class classA(...), case class classB(...), ...
Then I need to write methods that read data from these tables and create a Dataset:
def getDataFromSource: Dataset[classA] = {
val df: DataFrame = spark.sql("SELECT * FROM table_a")
df.as[classA]
}
The same goes for the other tables and types. Is there any way to avoid this boilerplate, that is, to get by with one function instead of an individual function for each table? For example:
def getDataFromSource[T: Encoder](table_name: String): Dataset[T] = {
val df: DataFrame = spark.sql(s"SELECT * FROM $table_name")
df.as[T]
}
Then create a list of (table_name, type_name) pairs:
val tableTypePairs = List(("table_a", classA), ("table_b", classB), ...)
Then to call it using foreach:
tableTypePairs.foreach(tupl => getDataFromSource[what should I put here?](tupl._1))
Thanks in advance!
Something like this should work
def getDataFromSource[T](table_name: String, encoder: Encoder[T]): Dataset[T] =
spark.sql(s"SELECT * FROM $table_name").as(encoder)
val tableTypePairs = List(
"table_a" -> implicitly[Encoder[classA]],
"table_b" -> implicitly[Encoder[classB]]
)
tableTypePairs.foreach {
case (table, enc) =>
getDataFromSource(table, enc)
}
Note that the foreach discards the Dataset returned by each call, which is a bit of a code smell. Since Encoder is invariant, tableTypePairs won't have a very useful element type, and neither would the result of something like
tableTypePairs.map {
case (table, enc) =>
getDataFromSource(table, enc)
}
One option is to pass the Class to the method, this way the generic type T will be inferred:
def getDataFromSource[T: Encoder](table_name: String, clazz: Class[T]): Dataset[T] = {
val df: DataFrame = spark.sql(s"SELECT * FROM $table_name")
df.as[T]
}
tableTypePairs.foreach { case (tableName, clazz) => getDataFromSource(tableName, clazz) }
But then I'm not sure of how you'll be able to exploit this list of Dataset without .asInstanceOf.
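One way to keep each table name tied to its type is to bundle the name and the Encoder in a small wrapper with a type parameter. The sketch below is only an illustration under assumptions: TableSource, ClassA and ClassB are hypothetical names, the temp views are made-up sample data, and the session runs in local mode. It also shows the limitation mentioned above: once the sources sit in one list, each loaded value is only a Dataset[_].

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

object TableSourceSketch {
  // Hypothetical row types standing in for classA and classB.
  final case class ClassA(id: Long, name: String)
  final case class ClassB(id: Long, score: Double)

  // Keeps a table name and its Encoder tied to the same type parameter T.
  final case class TableSource[T](tableName: String, encoder: Encoder[T]) {
    def load(spark: SparkSession): Dataset[T] =
      spark.sql(s"SELECT * FROM $tableName").as(encoder)
  }

  // The list can only be typed existentially, because T differs per element.
  val sources: List[TableSource[_]] = List(
    TableSource("table_a", Encoders.product[ClassA]),
    TableSource("table_b", Encoders.product[ClassB])
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run, for illustration only
      .appName("table-source-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample tables registered as temp views.
    Seq(ClassA(1L, "a")).toDS().createOrReplaceTempView("table_a")
    Seq(ClassB(1L, 0.5)).toDS().createOrReplaceTempView("table_b")

    // Each load returns a Dataset[_]: fine for generic actions such as count,
    // but the element type is no longer known statically.
    sources.foreach { source =>
      println(s"${source.tableName}: ${source.load(spark).count()} row(s)")
    }

    spark.stop()
  }
}
```

When the concrete type matters, calling TableSource("table_a", Encoders.product[ClassA]).load(spark) directly still yields a Dataset[ClassA].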

How do I insert JSON into a postgres table using Anorm?

I'm getting a runtime exception when trying to insert a JSON string into a JSON column. The string I have looks like """{"Events": []}""", the table has a column defined as status JSONB NOT NULL. I can insert the string into the table from the command line no problem. I've defined a method to do the insert as:
import play.api.libs.json._
import anorm._
import anorm.postgresql._
def createStatus(
status: String,
created: LocalDateTime = LocalDateTime.now())(implicit c: SQLConnection): Unit = {
SQL(s"""
|INSERT INTO status_feed
| (status, created)
|VALUES
| ({status}, {created})
|""".stripMargin)
.on(
'status -> Json.parse("{}"), // n.b. would be Json.parse(status) but this provides a concise error message
'created -> created)
.execute()
}
and calling it gives the following error:
TypeDoesNotMatch(Cannot convert {}: org.postgresql.util.PGobject to String for column ColumnName(status_feed.status,Some(status)))
anorm.AnormException: TypeDoesNotMatch(Cannot convert {}: org.postgresql.util.PGobject to String for column ColumnName(status_feed.status,Some(status)))
I've done loads of searching for this issue but there's nothing about this specific use case that I could find - most of it is pulling out json columns into case classes. I've tried slightly different formats using spray-json's JsValue, play's JsValue, simply passing the string as-is and casting in the query with ::JSONB and they all give the same error.
Update: here is the SQL which created the table:
CREATE TABLE status_feed (
id SERIAL PRIMARY KEY,
status JSONB NOT NULL,
created TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW()
)
The error is not on values given to .executeInsert, but on the parsing of the INSERT result (inserted key).
import java.sql._
// postgres=# CREATE TABLE test(foo JSONB NOT NULL);
val jdbcUrl = "jdbc:postgresql://localhost:32769/postgres"
val props = new java.util.Properties()
props.setProperty("user", "postgres")
props.setProperty("password", "mysecretpassword")
implicit val con = DriverManager.getConnection(jdbcUrl, props)
import anorm._, postgresql._
import play.api.libs.json._
SQL"""INSERT INTO test(foo) VALUES(${Json.obj("foo" -> 1)})""".
executeInsert(SqlParser.scalar[JsValue].singleOpt)
// Option[play.api.libs.json.JsValue] = Some({"foo":1})
/*
postgres=# SELECT * FROM test ;
foo
------------
{"foo": 1}
*/
BTW, the plain string interpolation (the s prefix on your SQL string) is useless, since the query contains nothing to interpolate.
Turns out cchantep was right, it was the parser I was using. The test framework I am using swallowed the stack trace and I assumed the problem was on the insert, but what's actually blowing up is the next line in the test where I use the parser.
The case class and parser were defined as:
case class StatusFeed(
status: String,
created: LocalDateTime) {
val ItemsStatus: Status = status.parseJson.convertTo[Status]
}
object StatusFeed extends DefaultJsonProtocol {
val fields: String = sqlFields[StatusFeed]() // helper function that results in "created, status"
// used in SQL as RETURNING ${StatusFeed.fields}
val parser: RowParser[StatusFeed] =
Macro.namedParser[StatusFeed](Macro.ColumnNaming.SnakeCase)
// json formatter for Status
}
As defined the parser attempts to read a JSONB column from the result set into the String status. Changing fields to val fields: String = "created, status::TEXT" resolves the issue, though the cast may be expensive. Alternatively, defining status as a JsValue instead of a String and providing an implicit for anorm (adapted from this answer to use spray-json) fixes the issue:
implicit def columnToJsValue: Column[JsValue] = anorm.Column.nonNull[JsValue] { (value, meta) =>
val MetaDataItem(qualified, nullable, clazz) = meta
value match {
case json: org.postgresql.util.PGobject => Right(json.getValue.parseJson)
case _ =>
Left(TypeDoesNotMatch(
s"Cannot convert $value: ${value.asInstanceOf[AnyRef].getClass} to Json for column $qualified"))
}
}

How to map a query result to case class using Anorm in scala

I have 2 case classes like this :
case class ClassTeacherWrapper(
success: Boolean,
classes: List[ClassTeacher]
)
2nd one :
case class ClassTeacher(
clid: String,
name: String
)
And a query like this :
val query =
SQL"""
SELECT
s.section_sk::text AS clid,
s.name AS name
from
********************
"""
P.S. I replaced part of the query with * for security reasons.
So my query returns 2 values. How do I map them to the case class ClassTeacher?
Currently I am doing something like this:
def getClassTeachersByInstructor(instructor: String, section: String): ClassTeacherWrapper = {
implicit var conn: Connection = null
try {
conn = datamartDatasourceConnectionPool.getDBConnection()
// Define query
val query =
SQL"""
SELECT
s.section_sk::text AS clid,
s.name AS name
********
"""
logger.info("Read from DB: " + query)
// create a List containing all the datasets from the resultset and return
new ClassTeacherWrapper(
success =true,
query.as(Macro.namedParser[ClassTeacher].*)
)
//Trying new approch
//val users = query.map(user => new ClassTeacherWrapper(true, user[Int]("clid"), user[String]("name")).tolist
}
catch {
case NonFatal(e) =>
logger.error("getGradebookScores: error getting/parsing data from DB", e)
throw e
}
}
With this I am getting the following exception:
{
"error": "ERROR: operator does not exist: uuid = character varying\n
Hint: No operator matches the given name and argument type(s). You
might need to add explicit type casts.\n Position: 324"
}
Can anyone help me see where I am going wrong? I am new to Scala and Anorm.
What should I modify in the query.as part of the code?
Do you need the success field? Often an empty list would suffice.
I find parsers very useful (and reusable), so I would put something like the following in the ClassTeacher singleton (or a similar location):
import anorm._
import anorm.SqlParser.get
val fields = "s.section_sk::text AS clid, s.name"
// clid is selected as text, so it is read as a String to match ClassTeacher.clid
val classTeacherP: RowParser[ClassTeacher] =
  get[String]("clid") ~
  get[String]("name") map {
    case clid ~ name =>
      ClassTeacher(clid, name)
  }
def allForInstructorSection(instructor: String, section: String): List[ClassTeacher] =
  DB.withConnection { implicit c => //-- or injected db
    SQL(s"""select $fields from ******""")
      .on('instructor -> instructor, 'section -> section)
      .as(classTeacherP.*)
  }

How to get datatype of column in spark dataframe dynamically

I have a DataFrame and converted its dtypes to a map:
val dfTypesMap: Map[String, String] = df.dtypes.toMap
Output:
(PRODUCT_ID,StringType)
(PRODUCT_ID_BSTP_MAP,MapType(StringType,IntegerType,false))
(PRODUCT_ID_CAT_MAP,MapType(StringType,StringType,true))
(PRODUCT_ID_FETR_MAP_END_FR,ArrayType(StringType,true))
When I hardcode the type as [String] in row.getAs[String], there is no compilation error.
df.foreach(row => {
val prdValue = row.getAs[String]("PRODUCT_ID")
})
I want to iterate over the map dfTypesMap above and get the corresponding value type. Is there any way to convert the dtypes column types to general Scala types, like below?
StringType --> String
MapType(StringType,IntegerType,false) ---> Map[String,Int]
MapType(StringType,StringType,true) ---> Map[String,String]
ArrayType(StringType,true) ---> List[String]
As mentioned, Datasets make it easier to work with types.
Dataset is basically a collection of strongly-typed JVM objects.
You can map your data to case classes like so
case class Foo(PRODUCT_ID: String, PRODUCT_NAME: String)
val ds: Dataset[Foo] = df.as[Foo]
Then you can safely operate on your typed objects. In your case you could do
ds.foreach(foo => {
val prdValue = foo.PRODUCT_ID
})
For more on Datasets, check out
https://spark.apache.org/docs/latest/sql-programming-guide.html#creating-datasets
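If you only need to know which Scala type to expect for each column, for logging or for deciding how to handle a value, you could pattern match on the schema's DataType values instead of the dtypes strings. This is a rough sketch, not a complete mapping: scalaTypeName is a hypothetical helper, it covers only a handful of types, and it returns a descriptive name. A name computed at runtime cannot be used as the static type parameter of row.getAs[T]. Arrays are described as Seq, since that is how Spark hands array columns back in a Row.

```scala
import org.apache.spark.sql.types._

object ScalaTypeNames {
  // Hypothetical helper: describes the Scala type that usually corresponds
  // to a Spark SQL DataType. Only a few common types are covered.
  def scalaTypeName(dataType: DataType): String = dataType match {
    case StringType  => "String"
    case IntegerType => "Int"
    case LongType    => "Long"
    case DoubleType  => "Double"
    case BooleanType => "Boolean"
    case ArrayType(elementType, _) =>
      s"Seq[${scalaTypeName(elementType)}]"
    case MapType(keyType, valueType, _) =>
      s"Map[${scalaTypeName(keyType)}, ${scalaTypeName(valueType)}]"
    // Anything not handled above falls back to Spark's own short name.
    case other => other.simpleString
  }
}
```

For a DataFrame df, something like df.schema.fields.map(f => f.name -> ScalaTypeNames.scalaTypeName(f.dataType)).toMap would then give a column-to-type-name map similar to dfTypesMap.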

Selecting specific columns in Slick 3.x throws a type mismatch

In this Slick function I read from a User table and return a SessionUser object (SessionUser has fewer columns than User).
The problem is that this code does not compile, for each field in SessionUser it gives me the error type mismatch; found : slick.lifted.Rep[String] required: String. What's the meaning of this error and how to fix it?
def readByUserid (userid: String) : Option[SessionUser] = {
val db = Database.forConfig(Constant.dbBank)
try {
val users = TableQuery[UserDB]
val action = users.filter(_.userid === userid)
.map(u => SessionUser(u.userid, u.firstName, u.lastName)).result
val future = db.run(action)
val result = Await.result(future, Duration.Inf)
result
}
finally db.close
}
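You're using the map operation in the wrong place. Mapping over a Query object is the equivalent of the SELECT clause in SQL, so inside that map u.userid is a Rep[String] (a column expression), not a String, which is what the type mismatch is telling you. Place your map operation right after the result invocation instead, where it works on the fetched rows (this map on the action needs an implicit ExecutionContext in scope):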
val action = users.filter(_.userid === userid).result.map(_.headOption.map(u => SessionUser(u.userid, u.firstName, u.lastName)))
Since we only use three columns from the users table, we can also express that with a map on the Query object, projecting to a tuple of columns. This way the underlying SQL statement selects only the columns we need:
val action = users.filter(_.userid === userid).map(u => (u.userid, u.firstName, u.lastName)).result.map(_.headOption.map {
case (userid, firstName, lastName) => SessionUser(userid, firstName, lastName)
})