Value sql is not a member of StringContext - scala

Why does Scala report value sql is not a member of StringContext for the following code?
I'm using Slick with Play Framework.
val db = Database.forConfig("db")
val query = sql"""select ID from TEACHER""".as[String]
val people = db.withSession { implicit session =>
  Ok(query.list)
}

Add the import driver.api._ (the library is com.typesafe.play:play-slick_2.11:2.0.0). This should work:
import driver.api._
val db = Database.forConfig("db")
val query = sql"""select ID from TEACHER""".as[String]
val people = db.withSession { implicit session =>
  Ok(query.list)
}
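To illustrate why the import matters, here is a minimal sketch that does not use Slick at all. The names RawQuery, api and SqlInterpolation are hypothetical stand-ins, not Slick's classes. A prefix such as sql"..." is rewritten by the compiler into a method call on StringContext, and that method only exists if an implicit class providing it is in scope:

```scala
object InterpolatorDemo {
  // Hypothetical stand-in for a library's query type; not Slick's actual class.
  final case class RawQuery(text: String, params: Seq[Any])

  // Hypothetical holder playing the role of `driver.api`: the interpolator
  // lives here, so sql"..." compiles only where its members are imported.
  object api {
    implicit class SqlInterpolation(sc: StringContext) {
      // Joins the literal parts with "?" placeholders and keeps the arguments.
      def sql(args: Any*): RawQuery = RawQuery(sc.parts.mkString("?"), args)
    }
  }

  def build(id: Int): RawQuery = {
    // Without this import the next line fails to compile with
    // "value sql is not a member of StringContext".
    import api._
    sql"select ID from TEACHER where ID = $id"
  }

  def main(args: Array[String]): Unit =
    println(build(42))
}
```

Removing the import inside build reproduces the error from the question, which is why each answer here boils down to bringing the right interpolator into scope.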

As mentioned in a comment above by "code4j" (but worthy of a separate answer), I use the following:
import anorm.SqlStringInterpolation
Note: this works with Anorm, independent of Slick.

Try this one:
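// Slick 2.x: Q is the conventional alias for StaticQuery
import scala.slick.jdbc.{StaticQuery => Q}
import Q.interpolation
val db = Database.forConfig("db")
val query = sql"""select ID from TEACHER""".as[String]
val people = db.withSession { implicit session =>
  Ok(query.list)
}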

Issue with mapping in Scala using Cassandra DB

I am trying to connect several Cassandra tables and display them using Scala.
I am getting the error:
error: value map is not a member of model.UserMap
mapResult.map(x => x.map(xx => xx.map(xxx =>
Here is my code:
import database.UserConnProvider
import database.PhantomUserRepository
import database.PhantomUserMapRepository
import com.softwaremill.macwire._
lazy val cassConn = wire[UserConnProvider]
val user = cassConn
import com.outworkers.phantom.connectors.CassandraConnection
import com.datastax.driver.core._
import com.typesafe.config.ConfigFactory
val connection = user.get()
implicit val session: Session = connection.session
implicit val keySpace = connection.provider.space //provider.space
val config: com.typesafe.config.Config = ConfigFactory.load()
val tableName: String = "map"
import controllers.UserController
import com.outworkers.phantom.connectors.CassandraConnection
import scala.concurrent.{ExecutionContext, Future}
lazy val phantomUserMapRepo = new PhantomUserMapRepository(cassConn.get() , ExecutionContext.Implicits.global)
val userMapRepo = phantomUserMapRepo
import scala.concurrent.ExecutionContext.Implicits.global
import play.api.libs.json._
val mapResult = userMapRepo.findMap("tc")
mapResult.map(x => x.map(xx => xx.map(xxx =>
  for (iter <- xxx.ticker.toArray) {
    println(iter)
  }
)))
Database definitions:
Keyspace: users
Tables: user_info, map, tc_codes_map.
user_info connects to map through company_code.
map has a ticker column that connects to tc_codes_map.
Would appreciate any tips!
Thank you!

Processing a big table with Slick fails with OutOfMemoryError

I am querying a big MySQL table with Akka Streams and Slick, but it fails with an OutOfMemoryError. It seems that Slick is loading all the results into memory (it does not fail if the query is limited to a few rows). Why is this the case, and what is the solution?
val dbUrl = "jdbc:mysql://..."
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.alpakka.slick.scaladsl.SlickSession
import akka.stream.alpakka.slick.scaladsl.Slick
import akka.stream.scaladsl.Source
import akka.stream.{ActorMaterializer, Materializer}
import com.typesafe.config.ConfigFactory
import slick.jdbc.GetResult
import scala.concurrent.Await
import scala.concurrent.duration.Duration
val slickDbConfig = s"""
|profile = "slick.jdbc.MySQLProfile$$"
|db {
| dataSourceClass = "slick.jdbc.DriverDataSource"
| properties = {
| driver = "com.mysql.jdbc.Driver",
| url = "$dbUrl"
| }
|}
|""".stripMargin
implicit val actorSystem: ActorSystem = ActorSystem()
implicit val materializer: Materializer = ActorMaterializer()
implicit val slickSession: SlickSession = SlickSession.forConfig(ConfigFactory.parseString(slickDbConfig))
import slickSession.profile.api._
val responses: Source[String, NotUsed] = Slick.source(
  sql"select my_text from my_table".as(GetResult(r => r.nextString())) // limit 100
)
val future = responses.runForeach((myText: String) =>
  println("my_text: " + myText.length)
)
Await.result(future, Duration.Inf)
From the Slick documentation:
Note: Some database systems may require session parameters to be set in a certain way to support streaming without caching all data at once in memory on the client side. For example, PostgreSQL requires both .withStatementParameters(rsType = ResultSetType.ForwardOnly, rsConcurrency = ResultSetConcurrency.ReadOnly, fetchSize = n) (with the desired page size n) and .transactionally for proper streaming.
In other words, to prevent the JDBC driver from loading all the query results into client memory, additional configuration may be needed. This configuration is database dependent. The MySQL documentation states the following:
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate and, due to the design of the MySQL network protocol, is easier to implement. If you are working with ResultSets that have a large number of rows or large values and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row.
To set the above configuration in Slick:
import slick.jdbc._
val query =
  sql"select my_text from my_table".as(GetResult(r => r.nextString()))
    .withStatementParameters(
      rsType = ResultSetType.ForwardOnly,
      rsConcurrency = ResultSetConcurrency.ReadOnly,
      fetchSize = Int.MinValue
    ) //.transactionally <-- I'm not sure whether you need ".transactionally"
val responses: Source[String, NotUsed] = Slick.source(query)

Writing DataFrame to MemSQL Table in Spark

I'm trying to load a .parquet file into a MemSQL database with Spark and the MemSQL Connector.
package com.memsql.spark
import com.memsql.spark.context._
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import com.memsql.spark.connector._
import com.mysql.jdbc._
object readParquet {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReadParquet")
    val sc = new SparkContext(conf)
    sc.addJar("/data/applications/spark-1.5.1-bin-hadoop2.6/lib/mysql-connector-java-5.1.37-bin.jar")
    sc.addJar("/data/applications/spark-1.5.1-bin-hadoop2.6/lib/memsql-connector_2.10-1.1.0.jar")
    Class.forName("com.mysql.jdbc.Driver")
    val host = "xxxx"
    val port = 3306
    val dbName = "WP1"
    val user = "root"
    val password = ""
    val tableName = "rt_acc"
    val memsqlContext = new com.memsql.spark.context.MemSQLContext(sc, host, port, user, password)
    val rt_acc = memsqlContext.read.parquet("tachyon://localhost:19998/rt_acc.parquet")
    val func_rt_acc = new com.memsql.spark.connector.DataFrameFunctions(rt_acc)
    func_rt_acc.saveToMemSQL(dbName, tableName, host, port, user, password)
  }
}
I'm fairly certain that Tachyon is not causing the problem, as the same exceptions occur if the file is loaded from disk, and I can run SQL queries on the DataFrame.
I've seen people suggest df.saveToMemSQL(..), however it seems this method is in DataFrameFunctions now.
Also, the table doesn't exist yet, but saveToMemSQL should do CREATE TABLE, as the documentation and source code tell me.
Edit: OK, I guess I misread something. saveToMemSQL doesn't create the table. Thanks.
Try using createMemSQLTableAs instead of saveToMemSQL.
saveToMemSQL loads a DataFrame into an existing table, whereas createMemSQLTableAs creates the table and then loads it.
It also returns a handy dataframe wrapping that MemSQL table :).

Wiki xml parser - org.apache.spark.SparkException: Task not serializable

I am new to both Scala and Spark and am working through some tutorials; this one is from Advanced Analytics with Spark. The following code is supposed to work:
import com.cloudera.datascience.common.XmlInputFormat
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io._
val path = "/home/petr/Downloads/wiki/wiki"
val conf = new Configuration()
conf.set(XmlInputFormat.START_TAG_KEY, "<page>")
conf.set(XmlInputFormat.END_TAG_KEY, "</page>")
val kvs = sc.newAPIHadoopFile(path, classOf[XmlInputFormat],
classOf[LongWritable], classOf[Text], conf)
val rawXmls = kvs.map(p => p._2.toString)
import edu.umd.cloud9.collection.wikipedia.language._
import edu.umd.cloud9.collection.wikipedia._
def wikiXmlToPlainText(xml: String): Option[(String, String)] = {
  val page = new EnglishWikipediaPage()
  WikipediaPage.readPage(page, xml)
  if (page.isEmpty) None
  else Some((page.getTitle, page.getContent))
}
val plainText = rawXmls.flatMap(wikiXmlToPlainText)
But it gives
scala> val plainText = rawXmls.flatMap(wikiXmlToPlainText)
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1622)
at org.apache.spark.rdd.RDD.flatMap(RDD.scala:295)
...
I am running Spark v1.3.0 locally (and I have loaded only about 21 MB of the wiki articles, just to test it).
None of the results at https://stackoverflow.com/search?q=org.apache.spark.SparkException%3A+Task+not+serializable gave me any clue...
Thanks.
Try:
import com.cloudera.datascience.common.XmlInputFormat
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io._
val path = "/home/terrapin/Downloads/enwiki-20150304-pages-articles1.xml-p000000010p000010000"
val conf = new Configuration()
conf.set(XmlInputFormat.START_TAG_KEY, "<page>")
conf.set(XmlInputFormat.END_TAG_KEY, "</page>")
val kvs = sc.newAPIHadoopFile(path, classOf[XmlInputFormat],
classOf[LongWritable], classOf[Text], conf)
val rawXmls = kvs.map(p => p._2.toString)
import edu.umd.cloud9.collection.wikipedia.language._
import edu.umd.cloud9.collection.wikipedia._
val plainText = rawXmls.flatMap { line =>
  val page = new EnglishWikipediaPage()
  WikipediaPage.readPage(page, line)
  if (page.isEmpty) None
  else Some((page.getTitle, page.getContent))
}
The first guess that comes to mind: all your code is wrapped in the object where the SparkContext is defined, so Spark tries to serialize that whole object in order to ship the wikiXmlToPlainText function to the worker nodes. Try moving wikiXmlToPlainText into a separate object that contains nothing else.
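The effect can be reproduced without Spark. The following is only a sketch using plain Java serialization, under the assumption that it behaves like Spark's closure check in this respect; Wrapper, Parser and ClosureDemo are hypothetical names. A function that calls a method of an enclosing instance carries a reference to that instance, so it cannot be serialized unless the instance can; a function that only calls a standalone object carries nothing extra:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for the wrapper that also holds non-serializable
// state (as an object holding a SparkContext would).
class Wrapper {
  val resource = new Object // not Serializable
  def parse(xml: String): Option[String] = if (xml.isEmpty) None else Some(xml.trim)
  // Calls an instance method, so the function keeps a reference to `this`.
  val viaMethod: String => Option[String] = xml => parse(xml)
}

// A standalone object that contains only the function.
object Parser {
  def parse(xml: String): Option[String] = if (xml.isEmpty) None else Some(xml.trim)
}

object ClosureDemo {
  // Refers only to the standalone object, so nothing else is captured.
  val viaObject: String => Option[String] = xml => Parser.parse(xml)

  // Returns true if Java serialization accepts the value.
  def isSerializable(value: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(value)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    println(isSerializable(new Wrapper().viaMethod))
    println(isSerializable(viaObject))
  }
}
```

The inlined flatMap body in the other answer avoids the problem in the same way: it no longer calls a method that belongs to the enclosing wrapper.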

Unable to convert Spark RDD to Schema RDD

I am trying to run the example provided in the Spark programming guide:
https://spark.apache.org/docs/1.1.0/sql-programming-guide.html
But I get a compilation error.
(I am a Scala newbie)
Below is my code:
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql._
import org.apache.spark.sql
object Temp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local").setAppName("SPARK SQL example")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD
    case class Person(name: String, age: Int)
    val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
    people.registerTempTable("people")
    val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
    teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
  }
}
The compiler reports No TypeTag available for Person at the line people.registerTempTable("people").
How can I resolve this error?
It is failing because the Person class is defined inside of the function and as such the Scala compiler will not create a TypeTag for the class. As Paul suggested you can move it out of the function to the top level.
I'll add that there is a JIRA to relax this restriction: https://issues.apache.org/jira/browse/SPARK-4842
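A minimal sketch of that fix, assuming Spark 1.1.x is on the classpath (the version the linked guide documents). Apart from trimming unused imports, the only change from the question's code is that Person is declared at the top level of the file instead of inside main:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Declared at the top level (outside main) so the compiler can create a TypeTag for it.
case class Person(name: String, age: Int)

object Temp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local").setAppName("SPARK SQL example")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)
    // Brings the implicit RDD-to-SchemaRDD conversion into scope (Spark 1.1 API).
    import sqlContext.createSchemaRDD

    val people = sc.textFile("examples/src/main/resources/people.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))
    people.registerTempTable("people")

    val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
    teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
  }
}
```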