I recently upgraded my Spark version from 2.2 to 2.4.0 and started getting an error in this block (which worked fine with version 2.2):
object Crud_mod {
  def f(df: DataFrame,
        options: JDBCOptions,
        conditions: List[String]) {
    val url = options.url
    val tables = options.table
    val dialect = JdbcDialects_mod.get(url)
error: value table is not a member of org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions
[ERROR] val tables = options.table
So I took a look at the Spark sources, and the value table seems to exist in the JDBCOptions class.
What am I missing, please?
Your link into the sources points to a constructor that accepts table as an argument, but I can't find a table value in the class itself.
However, there is a tableOrQuery method (here) that can be used for your needs, I think.
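For example, a minimal sketch of the adjustment, keeping the rest of the method from the question (including the OP's own JdbcDialects_mod helper) unchanged:

object Crud_mod {
  def f(df: DataFrame,
        options: JDBCOptions,
        conditions: List[String]) {
    val url = options.url
    val tables = options.tableOrQuery // replaces options.table, which is no longer exposed in 2.4.0
    val dialect = JdbcDialects_mod.get(url)
    // ...
  }
}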
We're migrating our ML pipeline from Spark 2.4 (Scala 2.11.11) to Spark 3.3.0 (Scala 2.12.17), and we were not able to read the existing ML model with Spark 3.
This is because Scala does not guarantee binary compatibility across major releases.
This does not seem to be a Spark issue, as I was able to load the PipelineModel itself.
The issue I am facing now is that we store a case class (which contains a Map) as a binary object serialized with Scala 2.11.11, and we are trying to read that object file back with Scala 2.12.17.
While doing that I get the exception below:
java.io.InvalidClassException: scala.collection.immutable.Map$Map4; local class incompatible: stream classdesc serialVersionUID = -7746713615561179240, local class serialVersionUID = -7992135791595275193
Sample case class (showing just one field here):
case class LearningModelOutput(
transformerState: Map[String, Any])
The stored object looks like this (debugger view):
0 = {Tuple2@8285} (targetColumn,)
1 = {Tuple2@8286} (consideredColumns,[Ljava.lang.String;@35b49c11)
2 = {Tuple2@8287} (schema,StructType(StructField(col1,IntegerType,true), StructField(col2,LongType,true
3 = {Tuple2@8288} (transformerStages,Map())
When I try to read this object file using spark.sparkContext.objectFile[Map[String, Any]], I get the serialization error mentioned above.
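For reference, a minimal sketch of the failing read, assuming the state was written with saveAsObjectFile under Scala 2.11 (the path and the session name are placeholders, not from the original code):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("read-legacy-state").getOrCreate()

// objectFile deserializes Java-serialized objects; the InvalidClassException above
// is thrown because scala.collection.immutable.Map$Map4 written by Scala 2.11 has a
// different serialVersionUID than the class on the Scala 2.12 classpath.
val state = spark.sparkContext
  .objectFile[Map[String, Any]]("/path/to/transformer-state") // placeholder path
  .collect()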
The following snippet works fine in Spark 2.2.1 but gives a rather cryptic runtime exception in Spark 2.3.0:
import sparkSession.implicits._
import org.apache.spark.sql.functions._
case class X(xid: Long, yid: Int)
case class Y(yid: Int, zid: Long)
case class Z(zid: Long, b: Boolean)
val xs = Seq(X(1L, 10)).toDS()
val ys = Seq(Y(10, 100L)).toDS()
val zs = Seq.empty[Z].toDS()
val j = xs
  .join(ys, "yid")
  .join(zs, Seq("zid"), "left")
  .withColumn("BAM", when('b, "B").otherwise("NB"))
j.show()
In Spark 2.2.1 it prints the following to the console:
+---+---+---+----+---+
|zid|yid|xid| b|BAM|
+---+---+---+----+---+
|100| 10| 1|null| NB|
+---+---+---+----+---+
In Spark 2.3.0 it results in:
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'BAM
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296)
at org.apache.spark.sql.types.StructType$.fromAttributes(StructType.scala:435)
at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:157)
...
The culprit really seems to be the Dataset being created from an empty Seq[Z]. When you change that to something that also results in an empty Dataset[Z], it works as in Spark 2.2.1, e.g.
val zs = Seq(Z(10L, true)).toDS().filter('zid === 999L)
The migration guide from 2.2 to 2.3 mentions:
Since Spark 2.3, the Join/Filter’s deterministic predicates that are after the first non-deterministic predicates are also pushed down/through the child operators, if possible. In prior Spark versions, these filters are not eligible for predicate pushdown.
Is this related, or a (known) bug?
@user9613318 there is a bug report created by the OP, but it is closed as "Cannot Reproduce" because the dev says:
I am not able to reproduce on current master. This must have been fixed.
However, there is no reference to another underlying issue, so it might remain a mystery.
I have worked around this on 2.3.0 by calling someEmptyDataset.cache() right after the empty Dataset is created. With that (zs.cache()), the OP's example no longer fails, and the actual problem at work also went away with this trick.
(As a side note, the OP's code doesn't fail for me on Spark 2.3.2 run locally. I don't see a related fix in the 2.3.2 changelog, though, so maybe it's due to some other difference in the environment...)
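For concreteness, a sketch of that workaround applied to the snippet above (same case classes and imports as in the question):

val zs = Seq.empty[Z].toDS()
zs.cache() // workaround on 2.3.0: cache the empty Dataset right after creating it

val j = xs
  .join(ys, "yid")
  .join(zs, Seq("zid"), "left")
  .withColumn("BAM", when('b, "B").otherwise("NB"))

j.show()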
I'm trying to set up database columns in Slick with non-primitive objects. I've spent the past day researching MappedColumnType for mapping custom objects to columns, and as far as I can tell I'm implementing it as people recommend. Unfortunately, the following code produces an error:
implicit val localDateMapper = MappedColumnType.base[LocalDate, String](
  // map date to String
  d => d.toString,
  // map String to date
  s => LocalDate.parse(s)
)
And here is the error:
could not find implicit value for evidence parameter of type slick.driver.H2Driver.BaseColumnType[String]
I've seen multiple examples where people map custom objects to and from Strings. I figure there must be something I'm missing?
For reference, I'm using Play Slick 1.1.1 and Scala 2.11.6. The former supports Slick 3.1.
You can import a BaseColumnType[String] with:
import slick.driver.H2Driver.api.stringColumnType
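For example, a minimal sketch of the mapper with that implicit in scope, assuming java.time.LocalDate (the question doesn't say which LocalDate is used); importing the whole api._ works as well and also brings in MappedColumnType:

import java.time.LocalDate
import slick.driver.H2Driver.api._ // provides MappedColumnType and the implicit stringColumnType

implicit val localDateMapper = MappedColumnType.base[LocalDate, String](
  d => d.toString,        // LocalDate -> String
  s => LocalDate.parse(s) // String -> LocalDate
)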
I am encountering a compile-time error while attempting to get Squeryl example code running. The following code is based on the My Adventures in Coding blog post about connecting to SQL Server using Squeryl.
import org.squeryl.adapters.MSSQLServer
import org.squeryl.{ SessionFactory, Session}
import com.company.model.Consumer
class SandBox {
  def tester() = {
    val databaseConnectionUrl = "jdbc:jtds:sqlserver://myservername;DatabaseName=mydatabasename"
    val databaseUsername = "userName"
    val databasePassword = "password"
    Class.forName("net.sourceforge.jtds.jdbc.Driver")
    SessionFactory.concreteFactory = Some(() =>
      Session.create(
        java.sql.DriverManager.getConnection(databaseConnectionUrl, databaseUsername, databasePassword),
        new MSSQLServer))
    val consumers = table[Consumer]("Consumer")
  }
}
I believe I have the build.sbt file configured correctly to import the Squeryl and JTDS libraries. When running SBT after adding the dependencies, it appeared to download the needed libraries.
libraryDependencies ++= List(
  "org.squeryl" %% "squeryl" % "0.9.5-6",
  "net.sourceforge.jtds" % "jtds" % "1.2.4",
  Company.teamcityDepend("company-services-libs"),
  Company.teamcityDepend("ssolibrary")
) ::: Company.teamcityConfDepend("company-services-common", "test,gatling")
I am certain that at least some of the dependencies were successfully installed, since the SessionFactory code block compiles successfully. It is only the line that attempts to set up a mapping from the Consumer class to the Consumer SQL Server table that fails:
val consumers = table[Consumer]("Consumer")
This line causes a compile-time error: the compiler is not able to resolve table.
[info] Compiling 8 Scala sources to /Users/pbeacom/Company/integration/target/scala-2.10/classes...
[error] /Users/pbeacom/Company/integration/src/main/scala/com/company/code/SandBox.scala:25: not found: value table
[error] val consumers = table[Consumer]("Consumer")
The version of Scala in use is 2.10, and if the table line is commented out, the code compiles successfully. Using table to define data model mappings is nearly ubiquitous in the Squeryl examples I've been researching online, and no one else seems to have encountered a similar problem.
Shortly after posting this and reviewing it, I finally noticed my problem. I was not paying enough attention to the heavy use of mixins in Scala: I failed to extend the Schema class, which is why table is unknown in the scope of the SandBox class. I was able to solve the problem with the following class definition:
class SandBox extends Schema {
  def tester() = {
    ...
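Put together, a minimal sketch of the corrected class (same body as in the question; the only changes are the org.squeryl.Schema import and the extends clause):

import org.squeryl.Schema
import org.squeryl.adapters.MSSQLServer
import org.squeryl.{SessionFactory, Session}
import com.company.model.Consumer

class SandBox extends Schema { // Schema provides the table method
  def tester() = {
    val databaseConnectionUrl = "jdbc:jtds:sqlserver://myservername;DatabaseName=mydatabasename"
    val databaseUsername = "userName"
    val databasePassword = "password"
    Class.forName("net.sourceforge.jtds.jdbc.Driver")
    SessionFactory.concreteFactory = Some(() =>
      Session.create(
        java.sql.DriverManager.getConnection(databaseConnectionUrl, databaseUsername, databasePassword),
        new MSSQLServer))
    val consumers = table[Consumer]("Consumer") // now resolves, since table comes from Schema
  }
}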
I work with Scala (version 2.10.4) and Spark. I have moved to Spark 1.0.1 and noticed that one of my scripts is no longer working correctly. It uses the k-means method from the MLlib library in the following manner.
Assume I have a KMeansModel object named clusters:
scala> clusters.toString
res8: String = org.apache.spark.mllib.clustering.KMeansModel@689eab53
Here is the method in question and the error I receive when trying to compile it:
scala> def clustersSize(normData: RDD[Array[Double]]) = {
| normData.map(r => clusters.predict(r))
| }
<console>:28: error: overloaded method value predict with alternatives:
(points: org.apache.spark.api.java.JavaRDD[org.apache.spark.mllib.linalg.Vector])org.apache.spark.api.java.JavaRDD[Integer] <and>
(points: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector])org.apache.spark.rdd.RDD[Int] <and>
(point: org.apache.spark.mllib.linalg.Vector)Int
cannot be applied to (Array[Double])
normData.map(r => clusters.predict(r))
The KMeansModel documentation clearly says that the predict function takes an argument of type Array[Double], and I think I am passing (aren't I?) an argument of exactly that type. Thank you in advance for any suggestions on what I am doing wrong.
You're using Spark 1.0.1 but the documentation page you cite is for 0.9.0. Check the current documentation and you'll see that the API has changed. See the migration guide for background.
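For example, in the 1.0.x API predict takes an org.apache.spark.mllib.linalg.Vector (as the error message above shows), so one fix is to wrap each array with Vectors.dense, roughly like this (same clusters model and RDD as in the question):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.rdd.RDD

def clustersSize(normData: RDD[Array[Double]]) =
  normData.map(r => clusters.predict(Vectors.dense(r))) // Array[Double] -> dense Vector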