ResultSetExtractorException in SQL-Interpolation - scala

I'm facing some issues with SQL interpolation in ScalikeJDBC. When I try to run the following piece of code
val dbTableSQLSyntax: SQLSyntax = SQLSyntax.createUnsafely(dbTableName)

sql"""
SELECT
  COUNT(*) AS count,
  MIN($distributionColumn) AS min,
  MAX($distributionColumn) AS max
FROM
  $dbTableSQLSyntax
""".stripMargin
  .map(mapResult)
  .single()
  .apply()
  .get()
I get this error
scalikejdbc.ResultSetExtractorException: Failed to retrieve value because For input string: "tab_id". If you're using SQLInterpolation, you may mistake u.id for u.resultName.id.
at scalikejdbc.WrappedResultSet.wrapIfError(WrappedResultSet.scala:27)
at scalikejdbc.WrappedResultSet.get(WrappedResultSet.scala:479)
at scalikejdbc.WrappedResultSet.longOpt(WrappedResultSet.scala:233)
...
How can I get rid of this error without having to use the Query DSL?
Is there anything I can improve (in terms of performance or security) in the above code snippet?
Frameworks / Libraries
Scala 2.11.11
"org.scalikejdbc" %% "scalikejdbc" % "3.2.0"
EDIT-1
In response to @Kazuhiro Sera's answer, here is my mapResult method:
def mapResult(rs: WrappedResultSet): (Long, Long, Long) = {
  val count: Long = rs.long("count")
  val minOpt: Option[Long] = rs.longOpt("min")
  val maxOpt: Option[Long] = rs.longOpt("max")
  (count, minOpt.getOrElse(0), maxOpt.getOrElse(Long.MaxValue))
}

It depends on your mapResult function. I am afraid that mapResult tries to fetch tab_id from the ResultSet, while your SQL query returns only count, min, and max.
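One more thing worth checking, in case your mapResult only reads count, min, and max (as in EDIT-1): how $distributionColumn is interpolated. If distributionColumn is a plain String, the interpolation binds it as a SQL parameter, so MIN($distributionColumn) effectively becomes MIN('tab_id') and the driver hands back the literal text, which rs.longOpt("min") then fails to parse as a number ("For input string: "tab_id""). A sketch of that fix, treating the column name as a SQL identifier (only do this for trusted, non-user-supplied values, since createUnsafely injects the raw string into the statement):

val distributionColumnSyntax: SQLSyntax = SQLSyntax.createUnsafely(distributionColumn)

sql"""
SELECT
  COUNT(*) AS count,
  MIN($distributionColumnSyntax) AS min,
  MAX($distributionColumnSyntax) AS max
FROM
  $dbTableSQLSyntax
""".map(mapResult).single().apply().get()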


scala mongodb document getList

I would like to get the groups attribute as a Seq[Int] from the given MongoDB Document. How do I do that? The getList call throws a runtime exception, which I would like to understand and fix.
n: Document((_id,BsonObjectId{value=613645d689898b7d4ac2b1b2}), (groups,BsonArray{values=[BsonInt32{value=2}, BsonInt32{value=3}]}))
I tried the following, which compiles, but I get the runtime error "Caused by: java.lang.ClassCastException: List element cannot be cast to scala.Int$":
val groups = n.getList("groups", Int.getClass)
Some sbt library dependencies:
scalaVersion := "2.12.14"
libraryDependencies += "org.mongodb.scala" %% "mongo-scala-driver" % "4.3.1"
Setup code:
val collection = db.getCollection("mylist")
Await.result(collection.drop.toFuture, Duration.Inf)
val groupsIn = Seq[Int](2, 3)
val doc = Document("groups" -> groupsIn)
Await.result(collection.insertOne(doc).toFuture, Duration.Inf)
println("see mongosh to verify that a Seq[Int] has been added")
val result = Await.result(collection.find.toFuture, Duration.Inf)
for (n <- result) {
  println("n: " + n)
  val groups = n.getList("groups", Int.getClass)
  println("groups: " + groups)
}
Comments: result is of type Seq[Document], n is of type Document.
The getList hover description in VS Code:
def getList[T](key: Any, clazz: Class[T]): java.util.List[T]
Gets the list value of the given key, casting the list elements to the given Class. This is useful to avoid having casts in client code, though the effect is the same.
With help from sarveshseri and Gael J, I reached this solution:
import collection.JavaConverters._
val groups = n.getList("groups", classOf[Integer]).asScala.toSeq.map(p => p.toInt)
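For reference, Int.getClass returns the class of the Int companion object (scala.Int$), which is why the original cast failed; the BSON Int32 elements come back as java.lang.Integer, so classOf[Integer] is the class to request. A small reusable helper along the same lines (a sketch; the helper name is made up):

import collection.JavaConverters._
import org.mongodb.scala.Document

// Extract a BSON int array field as a Scala Seq[Int]
def intSeq(doc: Document, key: String): Seq[Int] =
  doc.getList(key, classOf[Integer]).asScala.toSeq.map(_.toInt)

val groups: Seq[Int] = intSeq(n, "groups")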

How to pass an array to a slick SQL plain query?

I tried the following, but it fails:
// "com.typesafe.slick" %% "slick" % "3.3.2", // latest version
val ids = Array(1, 2, 3)
db.run(sql"""select name from person where id in ($ids)""".as[String])
Error: could not find implicit value for parameter e: slick.jdbc.SetParameter[Array[Int]]
However this ticket seems to say that it should work:
https://github.com/tminglei/slick-pg/issues/131
Note: I am not interested in the following approach:
db.run(sql"""select name from person where id in #${ids.mkString("(", ",", ")")}""".as[Int])
The issue you linked points to a commit which adds this:
def mkArraySetParameter[T: ClassTag](/* ... */): SetParameter[Seq[T]]
def mkArrayOptionSetParameter[T: ClassTag](/* ... */): SetParameter[Option[Seq[T]]]
Note that they are not implicit.
You'll need to do something like
implicit val setIntArray: SetParameter[Array[Int]] = mkArraySetParameter[Int](...)
and make sure that is in scope when you try to construct your sql"..." string.
I met the same problem and searched for it. I resolved it with an implicit val like this:
implicit val strListParameter: slick.jdbc.SetParameter[List[String]] =
  slick.jdbc.SetParameter[List[String]] { (param, pointedParameters) =>
    pointedParameters.setString(f"{${param.mkString(", ")}}")
  }
Put it into your slick-pg profile and import it, along with the other vals, where needed.
Or, more strictly, like this:
implicit val strListParameter: slick.jdbc.SetParameter[List[String]] =
  slick.jdbc.SetParameter[List[String]] { (param, pointedParameters) =>
    pointedParameters.setObject(param.toArray, java.sql.Types.ARRAY)
  }

implicit val strSeqParameter: slick.jdbc.SetParameter[Seq[String]] =
  slick.jdbc.SetParameter[Seq[String]] { (param, pointedParameters) =>
    pointedParameters.setObject(param.toArray, java.sql.Types.ARRAY)
  }
and use the val like:
val entries: Seq[String]
val query = {
  sql"""select ... from xxx
        where entry = ANY($entries)
        order by ...
    """.as[(Column, Types, In, Here)]
}
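If it helps, one way to keep these SetParameter instances available wherever you build plain SQL statements is to define them once in a shared object and import them at the call site (a sketch; the object name is made up):

import slick.jdbc.SetParameter

object PlainSqlArrayParams {
  // Same instance as above, kept in one importable place
  implicit val strSeqParameter: SetParameter[Seq[String]] =
    SetParameter[Seq[String]] { (param, pointedParameters) =>
      pointedParameters.setObject(param.toArray, java.sql.Types.ARRAY)
    }
}

// at the call site:
// import PlainSqlArrayParams._
// sql"""select ... from xxx where entry = ANY($entries) order by ...""".as[(Column, Types, In, Here)]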

scala insert to redis gives task not serializable

I have the following code:
case class event(imei: String, date: String, gpsdt: String,
                 entrygpsdt: String, lastgpsdt: String)

val result = rdd.map(row => {
  val imei = row.getString(0)
  val date = row.getString(1)
  val gpsdt = row.getString(2)
  event(imei, date, gpsdt, lastgpsdt, "2018-04-06 10:10:10")
}).collect()

val collection = sc.parallelize(result)
collection.saveToCassandra("db", "table", SomeColumns("imei", "date", "gpsdt", "lastgpsdt", "dt"))
This works fine. I'm inserting the result into Cassandra, but I also want to insert part of each RDD into Redis. When I try to do the Redis insert inside the map, it gives an error that the task is not serializable.
I want something like this:
case class event(imei: String, date: String, gpsdt: String,
                 entrygpsdt: String, lastgpsdt: String)

val result = rdd.map(row => {
  val imei = row.getString(0)
  val date = row.getString(1)
  val gpsdt = row.getString(2)
  val zscore = Calendar.getInstance().getTimeInMillis
  val value = row.getString(0) + ',' + row.getString(2)
  val key = row.getString(1)
  client.zadd(key, zscore, value)
  event(imei, date, gpsdt, lastgpsdt, "2018-04-06 10:10:10")
}).collect()

val collection = sc.parallelize(result)
collection.saveToCassandra("db", "table", SomeColumns("imei", "date", "gpsdt", "lastgpsdt", "dt"))
So, how can I do that? "client" is an object of the scala redis library.
Thanks,
Since no answer was provided, I found a solution for my case. I don't know whether the approach is good or not, but it worked for me. The idea is to collect the data from the RDD, which gives you an Array[event]. Then loop over that result again and insert each row into Redis, and finally save "result" to Cassandra (a rough sketch is below). This flow serves both purposes I was looking for.
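A rough sketch of that flow (all Redis writes happen on the driver after collect(); client, rdd, sc, and lastgpsdt are assumed to be the same values as in the question):

import java.util.Calendar

val result: Array[event] = rdd.map { row =>
  event(row.getString(0), row.getString(1), row.getString(2),
    lastgpsdt, "2018-04-06 10:10:10")
}.collect()

result.foreach { e =>
  val zscore = Calendar.getInstance().getTimeInMillis
  client.zadd(e.date, zscore, e.imei + "," + e.gpsdt) // key = date, value = "imei,gpsdt"
}

sc.parallelize(result).saveToCassandra("db", "table",
  SomeColumns("imei", "date", "gpsdt", "lastgpsdt", "dt"))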
Thanks,
The serializable exception is generally caused by how the connection object is created.
Although your code does not show it, I guess you created the client object outside the map/foreach.
If so, the client object is created on the driver, while the closure runs on the executors, which cannot serialize it, and you get the task-not-serializable exception.
You could create the client object inside the foreach, but that opens a connection for every record, which is also not good for performance.
So what you can do is
rdd.foreachPartition(partition => {
  // Create a connection here for Redis
  partition.foreach(record => {
    // send the data here
  })
})
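Filling in that skeleton (a sketch, assuming the debasishg scala-redis client, com.redis.RedisClient, and a local Redis; swap in whichever client and host you actually use):

import java.util.Calendar
import com.redis.RedisClient

rdd.foreachPartition { partition =>
  val redis = new RedisClient("localhost", 6379) // one connection per partition
  partition.foreach { row =>
    val zscore = Calendar.getInstance().getTimeInMillis
    val value = row.getString(0) + "," + row.getString(2)
    val key = row.getString(1)
    redis.zadd(key, zscore, value)
  }
  redis.disconnect // release the connection once the partition is processed
}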
Hope this helps!

Trouble getting Spark aggregators to work

I want to try out Aggregators in Scala Spark, but I cannot seem to get them to work with both the select function and the groupBy/agg functions (with my current implementation the agg version fails). My aggregator is written below and should be self-explanatory.
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{Encoder, Encoders}

/** Stores the number of true counts (tc) and false counts (fc) */
case class Counts(var tc: Long, var fc: Long)

/** Counts the number of true and false occurrences of a predicate */
class BooleanCounter[A](f: A => Boolean) extends Aggregator[A, Counts, Counts] with Serializable {
  // Initialize both counts to zero
  def zero: Counts = Counts(0L, 0L)

  // Add a new value to the intermediate counts
  def reduce(acc: Counts, other: A): Counts = {
    if (f(other)) acc.tc += 1 else acc.fc += 1
    acc
  }

  // Sum counts for intermediate values
  def merge(acc1: Counts, acc2: Counts): Counts = {
    acc1.tc += acc2.tc
    acc1.fc += acc2.fc
    acc1
  }

  // Return results
  def finish(acc: Counts): Counts = acc

  // Encoder for intermediate value type
  def bufferEncoder: Encoder[Counts] = Encoders.product[Counts]

  // Encoder for return type
  def outputEncoder: Encoder[Counts] = Encoders.product[Counts]
}
Below is my test code.
val ds: Dataset[Employee] = Seq(
  Employee("John", 110),
  Employee("Paul", 100),
  Employee("George", 0),
  Employee("Ringo", 80)
).toDS()

val salaryCounter = new BooleanCounter[Employee]((r: Employee) => r.salary < 10).toColumn

// Usage works fine
ds.select(salaryCounter).show()

// Causes an error
ds.groupBy($"name").agg(salaryCounter).show()
The first usage of salaryCounter works fine, but the second fails at runtime with the following error.
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to Employee
Databricks has a tutorial that is rather complicated but appears to target Spark 2.3. There is also this older tutorial that uses an experimental feature from Spark 1.6.
You're incorrectly mixing the "statically typed" and "dynamically typed" APIs. To use the former, you should call agg on a KeyValueGroupedDataset, not a RelationalGroupedDataset:
ds.groupByKey(_.name).agg(salaryCounter)
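For completeness, an end-to-end sketch of the typed usage. Employee is not defined in the question, so the shape below is assumed from the test data, and spark.implicits._ is assumed to be in scope (it already must be for toDS and $"name"):

import org.apache.spark.sql.{Dataset, TypedColumn}

case class Employee(name: String, salary: Long) // assumed shape

val salaryCounter: TypedColumn[Employee, Counts] =
  new BooleanCounter[Employee]((r: Employee) => r.salary < 10).toColumn

val ds: Dataset[Employee] = Seq(
  Employee("John", 110),
  Employee("Paul", 100),
  Employee("George", 0),
  Employee("Ringo", 80)
).toDS()

// Whole-Dataset aggregation: a single Counts row
ds.select(salaryCounter).show()

// Per-key aggregation through the typed API: a Dataset[(String, Counts)]
ds.groupByKey(_.name).agg(salaryCounter).show()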

How to stream Anorm large query results to client in chunked response with Play 2.5

I have a pretty large result set (60k+ records) that I am pulling from a database and parsing with Anorm (though I can use Play's default data access module, which returns a ResultSet, if needed). I need to transform and stream these results directly to the client (without holding them in a big list in memory), where they will then be downloaded directly to a file on the client's machine.
I have been referring to what is demonstrated in the Chunked Responses section in the ScalaStream 2.5.x Play documentation. I am having trouble implementing the "getDataStream" portion of what it shows there.
I've also been referencing what is demoed in the Streaming Results and Iteratee sections in the ScalaAnorm 2.5.x Play documentation. I have tried piping the results as an enumerator like what is returned here:
val resultsEnumerator = Iteratees.from(SQL"SELECT * FROM Test", SqlParser.str("colName"))
into
val dataContent = Source.fromPublisher(Streams.enumeratorToPublisher(resultsEnumerator))
Ok.chunked(dataContent).withHeaders(("ContentType","application/x-download"),("Content-disposition","attachment; filename=myDataFile.csv"))
But the resulting file/content is empty.
And I cannot find any sample code or references on how to convert a function in the data service that returns something like this:
@annotation.tailrec
def go(c: Option[Cursor], l: List[String]): List[String] = c match {
  case Some(cursor) =>
    if (l.size == 10000000) l // custom limit, partial processing
    else go(cursor.next, l :+ cursor.row[String]("VBU_NUM"))
  case _ => l
}

val sqlString = s"select colName FROM ${tableName} WHERE ${whereClauseStr}"
val results: Either[List[Throwable], List[String]] =
  SQL(sqlString).withResult(go(_, List.empty[String]))
results
into something I can pass to Ok.chunked().
So basically my question is: how should I feed each record fetched from the database into a stream that I can transform and send to the client as a chunked response that can be downloaded to a file?
I would prefer not to use Slick for this, but I can go with a solution that does not use Anorm and instead uses the Play dbApi objects that return the raw java.sql.ResultSet, and work with that.
After referencing the Anorm Akka Support documentation and much trial and error, I was able to achieve my desired solution. I had to add these dependencies
"com.typesafe.play" % "anorm_2.11" % "2.5.2",
"com.typesafe.play" % "anorm-akka_2.11" % "2.5.2",
"com.typesafe.akka" %% "akka-stream" % "2.4.4"
to my build.sbt file for Play 2.5,
and I implemented something like this
//...play imports
import anorm.SqlParser._
import anorm._

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

import scala.util.{Failure, Success}

...

private implicit val akkaActorSystem = ActorSystem("MyAkkaActorSytem")
private implicit val materializer = ActorMaterializer()

def streamedAnormResultResponse() = Action {
  implicit val connection = db.getConnection()

  val parser: RowParser[...] = ...
  val sqlQuery: SqlQuery = SQL("SELECT * FROM table")

  val source: Source[Map[String, Any], _] =
    AkkaStream.source(sqlQuery, parser, ColumnAliaser.empty).alsoTo(Sink.onComplete {
      case Success(v) =>
        connection.close()
      case Failure(e) =>
        println("Info from the exception: " + e.getMessage)
        connection.close()
    })

  Ok.chunked(source)
}
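One follow-up note on the last line: Ok.chunked needs content Play knows how to write (for example String or ByteString), so with a parser that produces Map[String, Any] rows as above you will likely want to map them to text first. A sketch, placed inside the action after building source, with column ordering and escaping left deliberately naive:

// Turn each parsed row into a CSV line before chunking
val csvSource: Source[String, _] =
  source.map(row => row.values.mkString(",") + "\n")

Ok.chunked(csvSource)
  .as("text/csv")
  .withHeaders("Content-Disposition" -> "attachment; filename=myDataFile.csv")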