Getting error after upgrading Scala to version 2.13.6 - scala

I am getting this error after I upgraded the Scala version to 2.13.6:
value ++ is not a member of java.util.LinkedHashMap[K,V]
The following is the line that throws the error:
case l => Dsl.map(new LinkedHashMap() ++ l)

You probably need to check the import statements in your file and make sure that you're working with the appropriate Scala collection: the ++ operator is defined on Scala's Map implementations, not on java.util.LinkedHashMap.
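If the code previously relied on the implicit Java-to-Scala conversions (scala.collection.JavaConversions was removed in 2.13), one fix is an explicit conversion. A minimal sketch, assuming the 2.13 converters (the map contents here are illustrative, not the OP's Dsl code):
import scala.jdk.CollectionConverters._
val javaMap = new java.util.LinkedHashMap[String, Int]()
javaMap.put("a", 1)
// asScala wraps the Java map as a scala.collection.mutable.Map, which does support ++
val combined = javaMap.asScala ++ Map("b" -> 2)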

Related

type mismatch errors when upgrading from scala 2.9 to 2.13.2

I recently revived an old library that was written in scala 2.9, and I created a new scala project using scala 2.13.2
I am getting errors like the following:
[error] type mismatch;
[error]  found   : scala.collection.mutable.Buffer[Any]
[error]  required: Seq[Any]
Was there a specific change between 2.9 and 2.13.2 that stopped sequences from being implicitly converted, or something else that might explain many of these compile errors?
I had to add .toSeq to many of my function return statements where a val of Buffer[Any] needed to be passed as an argument to a function expecting a Seq.
Quite a lot of things have happened in the last 7+ years (including a rewrite of the collections library).
If adding .toSeq solves your problem - just go for it.
If you want to know exactly what has changed, try upgrading version by version: first upgrade to Scala 2.10.*, then to 2.11.*, then 2.12.*, and finally to 2.13.2.
At each upgrade you'll probably see deprecation warnings. Fix them before upgrading to the next version.
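For reference, a minimal sketch of the .toSeq approach (the names here are illustrative): in 2.13 the default Seq alias points to scala.collection.immutable.Seq, so a mutable Buffer no longer conforms and has to be copied explicitly.
import scala.collection.mutable
// Hypothetical example: a function that builds a Buffer internally
// but exposes an immutable Seq to its callers
def collect(): Seq[Any] = {
  val buf = mutable.Buffer[Any](1, "two", 3.0)
  buf.toSeq // explicit copy into an immutable Seq, required since 2.13
}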
Brave, but perhaps bad form, to disturb the dead. Nevertheless, maybe pass the mutable.Buffer as a mutable.Seq instead of Seq, which by default is immutable.Seq. Consider:
import scala.collection.mutable
val mb = mutable.Buffer(11, Some(42))
val ms: mutable.Seq[Any] = mb // OK: Buffer conforms to mutable.Seq
val is: Seq[Any] = mb // NOK: Seq is immutable.Seq by default, so Buffer does not conform

UnresolvedException: Invalid call to dataType on unresolved object when using a Dataset constructed from Seq.empty (since Spark 2.3.0)

The following snippet works fine in Spark 2.2.1 but gives a rather cryptic runtime exception in Spark 2.3.0:
import sparkSession.implicits._
import org.apache.spark.sql.functions._
case class X(xid: Long, yid: Int)
case class Y(yid: Int, zid: Long)
case class Z(zid: Long, b: Boolean)
val xs = Seq(X(1L, 10)).toDS()
val ys = Seq(Y(10, 100L)).toDS()
val zs = Seq.empty[Z].toDS()
val j = xs
.join(ys, "yid")
.join(zs, Seq("zid"), "left")
.withColumn("BAM", when('b, "B").otherwise("NB"))
j.show()
In Spark 2.2.1 it prints to the console
+---+---+---+----+---+
|zid|yid|xid|   b|BAM|
+---+---+---+----+---+
|100| 10|  1|null| NB|
+---+---+---+----+---+
In Spark 2.3.0 it results in:
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'BAM
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296)
at org.apache.spark.sql.types.StructType$.fromAttributes(StructType.scala:435)
at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:157)
...
The culprit really seems to be the Dataset being created from an empty Seq[Z]. When you change that to something that also results in an empty Dataset[Z], it works as in Spark 2.2.1, e.g.
val zs = Seq(Z(10L, true)).toDS().filter('zid === 999L)
The migration guide from 2.2 to 2.3 mentions:
Since Spark 2.3, the Join/Filter’s deterministic predicates that are after the first non-deterministic predicates are also pushed down/through the child operators, if possible. In prior Spark versions, these filters are not eligible for predicate pushdown.
Is this related, or a (known) bug?
@user9613318 there is a bug report created by the OP, but it was closed as "Cannot Reproduce" because the dev says:
I am not able to reproduce on current master. This must have been fixed.
but there is no reference to another underlying issue so it might remain a mystery.
I worked around this on 2.3.0 by calling someEmptyDataset.cache() right after the empty Dataset was created. The OP's example no longer failed with that change (with zs.cache()), and the actual problem at work also went away with this trick.
(As a side note, the OP's code doesn't fail for me on Spark 2.3.2 run locally. I don't see a related fix in the 2.3.2 changelog, though, so it may be due to some other difference in the environment...)
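For reference, a minimal sketch of that cache() workaround applied to the snippet from the question (reusing Z and the implicits defined there); this is only the workaround described above, not a confirmed fix:
// Spark 2.3.0 workaround: cache the empty Dataset right after creating it
val zs = Seq.empty[Z].toDS()
zs.cache() // with this in place the example reportedly no longer fails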

Scala/Spark can't match function

I'm trying to run the following command:
df = df.withColumn("DATATmp", to_date($"DATA", "yyyyMMdd"))
And getting this error:
<console>:34: error: too many arguments for method to_date: (e: org.apache.spark.sql.Column)org.apache.spark.sql.Column
How can I specify exactly which function to import? Is there another way to avoid this error?
EDIT: Spark version 2.1
As can be seen in the detailed scaladoc, the two-parameter to_date function was added in 2.2.0, whereas the one-argument version has existed since 1.5.
If you are working with an older Spark version, either upgrade, or don't use this function.
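If upgrading is not an option, one possible workaround on Spark 2.1 is to parse the string with the two-argument unix_timestamp (available since 1.5) and cast the result to a date. A sketch, assuming DATA is a string column in yyyyMMdd format and a SparkSession named spark:
import org.apache.spark.sql.functions._
import spark.implicits._
// unix_timestamp parses the string into epoch seconds; casting via timestamp yields a date
val withDate = df.withColumn(
  "DATATmp",
  unix_timestamp($"DATA", "yyyyMMdd").cast("timestamp").cast("date")
)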

Where does Scala's 'NotInferedT' come from?

In Scala, we can get a compile error with a message containing 'NotInferedT'. For example:
expected: (NotInferedT, NotInferedT) => Boolean, actual: (Nothing, Nothing)
(as seen here).
This message is coming from the Scala compiler, and it appears to mean that Scala cannot infer a type. But is 'NotInferedT' itself a type? And is it described in the Scala documentation somewhere?
I can't find 'NotInferedT' in the Scala API docs.
It's the way the Scala plugin (which is basically a Scala compiler) for IntelliJ IDEA names an undefined type it can't resolve:
case UndefinedType(tpt, _) => "NotInfered" + tpt.name

Cannot resolve symbol mapReduceTriplets

I am using Spark 2.2, Scala 2.11 and GraphX. When I try to compile the following code in IntelliJ, I get the error Cannot resolve symbol mapReduceTriplets:
val nodeWeightMapFunc = (e:EdgeTriplet[VD,Long]) => Iterator((e.srcId,e.attr), (e.dstId,e.attr))
val nodeWeightReduceFunc = (e1:Long,e2:Long) => e1+e2
val nodeWeights = graph.mapReduceTriplets(nodeWeightMapFunc,nodeWeightReduceFunc)
I was reading here that it's possible to substitute aggregateMessages for mapReduceTriplets, but it's unclear how exactly I can do that.
mapReduceTriplets belonged to the legacy API and has been removed from the public API. Specifically, if you check the current documentation:
In earlier versions of GraphX neighborhood aggregation was accomplished using the mapReduceTriplets operator
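For the code in the question, a rough aggregateMessages equivalent would look something like the sketch below (assuming graph: Graph[VD, Long] as above): sendToSrc/sendToDst replace the map-side Iterator, and the merge function replaces the reduce function.
import org.apache.spark.graphx._
// "map" side: send the edge attribute (the weight) to both endpoints;
// "reduce" side: sum the weights arriving at each vertex
val nodeWeights: VertexRDD[Long] = graph.aggregateMessages[Long](
  triplet => {
    triplet.sendToSrc(triplet.attr)
    triplet.sendToDst(triplet.attr)
  },
  (w1, w2) => w1 + w2
)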