Cannot resolve symbol mapReduceTriplets - scala

I am using Spark 2.2, Scala 2.11 and GraphX. When I try to compile the following code in IntelliJ, I get the error Cannot resolve symbol mapReduceTriplets:
val nodeWeightMapFunc = (e:EdgeTriplet[VD,Long]) => Iterator((e.srcId,e.attr), (e.dstId,e.attr))
val nodeWeightReduceFunc = (e1:Long,e2:Long) => e1+e2
val nodeWeights = graph.mapReduceTriplets(nodeWeightMapFunc,nodeWeightReduceFunc)
I was reading here that it's possible to substitute mapReduceTriplets with aggregateMessages, but it's unclear how exactly I can do it.

mapReduceTriplets belonged to the legacy API and has been removed from the public API. Specifically, if you check the current documentation:
In earlier versions of GraphX neighborhood aggregation was accomplished using the mapReduceTriplets operator
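A direct translation of the snippet in the question to aggregateMessages could look roughly like this (a sketch, assuming graph: Graph[VD, Long] so that the edge attribute is the Long weight being summed):
import org.apache.spark.graphx._

// Sketch of the aggregateMessages equivalent of the mapReduceTriplets call above.
val nodeWeights: VertexRDD[Long] = graph.aggregateMessages[Long](
  sendMsg = ctx => {
    // mapReduceTriplets emitted (srcId, attr) and (dstId, attr);
    // here the edge attribute is sent to both endpoints instead.
    ctx.sendToSrc(ctx.attr)
    ctx.sendToDst(ctx.attr)
  },
  mergeMsg = (e1, e2) => e1 + e2 // same reduce function: sum the incident edge weights
)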

Related

Equivalent of scala.concurrent.util.Unsafe in Scala 2.12

I have created an empty instance of my object and then initialised it using run-time values. The implementation was based on scala.concurrent.util.Unsafe in Scala 2.11 and it worked fine.
I understand Unsafe is bad and hence has been deprecated in Scala 2.12.
If it's deprecated, then what's the equivalent of Unsafe in Scala 2.12?
Assuming you're running on a JVM where sun.misc.Unsafe is still available (this will limit which JVMs you can run on, but so did using scala.concurrent.util.Unsafe, so no immediate loss):
val unsafeInstance = // use in place of Scala 2.11 usages of scala.concurrent.util.Unsafe.instance
  classOf[sun.misc.Unsafe]
    .getDeclaredFields
    .filter(_.getType == classOf[sun.misc.Unsafe])
    .headOption
    .map { field =>
      field.setAccessible(true)
      field.get(null).asInstanceOf[sun.misc.Unsafe]
    }
    .getOrElse { throw new IllegalStateException("Can't find instance of sun.misc.Unsafe") }
Code is very slightly adapted from the Scala 2.11 source.
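As a usage sketch (the Config class below is purely hypothetical), the instance obtained this way can be used where Unsafe.instance was used in 2.11, for example to allocate an object without running its constructor and fill it in afterwards:
// Hypothetical usage sketch, mirroring the "create empty instance, then initialise it" pattern.
class Config(var name: String) // stand-in for "my object"; has no no-arg constructor

val emptyConfig = unsafeInstance
  .allocateInstance(classOf[Config]) // bypasses the constructor entirely
  .asInstanceOf[Config]
emptyConfig.name = "filled in at run time"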
It's possible that this is an instance of spending so much time thinking about "could" that one didn't think about "should".

When would request.attrs.get in a Scala Play custom filter return None?

In an answer to "How to run filter on demand scala play framework", the following code is suggested:
// in your filter
val handlerDef: Option[HandlerDef] = request.attrs.get(Router.Attrs.HandlerDef)
I'm not sure what's happening here - is it safe to call .get on this val (to get it out of the Option)? In what scenarios would this code result in a None (i.e., when would Router.Attrs.HandlerDef not be present)?
I'm working with Scala and PlayFramework 2.6.
According to Route modifier tags:
Please be aware that the HandlerDef request attribute exists only when using a router generated by Play from a routes file. This attribute is not added when the routes are defined in code, for example using the Scala SIRD or Java RoutingDsl. In this case request.attrs.get(HandlerDef) will return None in Scala or null in Java. Keep this in mind when creating filters.
Hence, if you are using a routes file, Router.Attrs.HandlerDef should always be available. As a shorthand, instead of
val handlerDef: HandlerDef = request.attrs.get(Router.Attrs.HandlerDef).get
you can use the apply sugar, like so:
val handlerDef: HandlerDef = request.attrs(Router.Attrs.HandlerDef)
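If the same filter also has to work with routers defined in code (SIRD / RoutingDsl), matching on the Option keeps it safe in both cases. A minimal sketch, assuming a Play 2.6 Scala filter (the LoggingFilter name and the println calls are just placeholders):
import akka.stream.Materializer
import javax.inject.Inject
import play.api.mvc._
import play.api.routing.{HandlerDef, Router}
import scala.concurrent.Future

// Sketch of a filter that tolerates a missing HandlerDef (e.g. SIRD / RoutingDsl routes).
class LoggingFilter @Inject()(implicit val mat: Materializer) extends Filter {
  def apply(next: RequestHeader => Future[Result])(request: RequestHeader): Future[Result] = {
    request.attrs.get(Router.Attrs.HandlerDef) match {
      case Some(handlerDef: HandlerDef) =>
        // Routes-file router: controller/method metadata is available.
        println(s"Handling ${handlerDef.controller}.${handlerDef.method}")
      case None =>
        // Routes defined in code: the attribute is simply absent.
        println("No HandlerDef for this request")
    }
    next(request)
  }
}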

UnresolvedException: Invalid call to dataType on unresolved object when using DataSet constructed from Seq.empty (since Spark 2.3.0)

The following snippet works fine in Spark 2.2.1 but gives a rather cryptic runtime exception in Spark 2.3.0:
import sparkSession.implicits._
import org.apache.spark.sql.functions._
case class X(xid: Long, yid: Int)
case class Y(yid: Int, zid: Long)
case class Z(zid: Long, b: Boolean)
val xs = Seq(X(1L, 10)).toDS()
val ys = Seq(Y(10, 100L)).toDS()
val zs = Seq.empty[Z].toDS()
val j = xs
.join(ys, "yid")
.join(zs, Seq("zid"), "left")
.withColumn("BAM", when('b, "B").otherwise("NB"))
j.show()
In Spark 2.2.1 it prints to the console
+---+---+---+----+---+
|zid|yid|xid|   b|BAM|
+---+---+---+----+---+
|100| 10|  1|null| NB|
+---+---+---+----+---+
In Spark 2.3.0 it results in:
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'BAM
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296)
at org.apache.spark.sql.types.StructType$.fromAttributes(StructType.scala:435)
at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:157)
...
The culprit really seems to be the Dataset being created from an empty Seq[Z]. When you change that to something that also results in an empty Dataset[Z], it works as in Spark 2.2.1, e.g.
val zs = Seq(Z(10L, true)).toDS().filter('zid === 999L)
The migration guide from 2.2 to 2.3 mentions:
Since Spark 2.3, the Join/Filter’s deterministic predicates that are after the first non-deterministic predicates are also pushed down/through the child operators, if possible. In prior Spark versions, these filters are not eligible for predicate pushdown.
Is this related, or a (known) bug?
@user9613318 there is a bug created by the OP, but it is closed as "Cannot Reproduce" because the dev says that
I am not able to reproduce on current master. This must have been fixed.
but there is no reference to another underlying issue, so it might remain a mystery.
I have worked around this on 2.3.0 by issuing someEmptyDataset.cache() right after the empty Dataset was created. The OP's example no longer failed that way (with zs.cache()), and the actual problem at work also went away with this trick.
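Applied to the snippet in the question, the workaround looks like this (zs being the OP's empty Dataset):
val zs = Seq.empty[Z].toDS()
zs.cache() // workaround for Spark 2.3.0: cache() the empty Dataset right after it is created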
(As a side note, the OP's code doesn't fail for me on Spark 2.3.2 run locally, though I don't see a related fix in the 2.3.2 change log, so it may be due to some other differences in the environment...)

Scala/Spark can't match function

I'm trying to run the following command:
df = df.withColumn("DATATmp", to_date($"DATA", "yyyyMMdd"))
And getting this error:
<console>:34: error: too many arguments for method to_date: (e: org.apache.spark.sql.Column)org.apache.spark.sql.Column
How can I specify exactly which function to import? Is there another way to avoid this error?
EDIT: Spark version 2.1
As can be seen in the detailed scaladoc, the two-parameter to_date function was added in 2.2.0, whereas the one-argument version has existed since 1.5.
If you are working with an older Spark version, either upgrade, or don't use this function.
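If upgrading isn't possible, a common alternative on Spark 2.1 (my suggestion, not part of the original answer) is to parse the string with unix_timestamp, which has accepted a format argument since 1.5, and then truncate to a date:
import org.apache.spark.sql.functions.{to_date, unix_timestamp}

// Sketch of a Spark 2.1-compatible alternative to the two-argument to_date:
// parse with unix_timestamp(col, format), cast to timestamp,
// then truncate to a date with the one-argument to_date.
df = df.withColumn("DATATmp", to_date(unix_timestamp($"DATA", "yyyyMMdd").cast("timestamp")))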

Why has a KMeansModel.predict error started to appear since Spark 1.0.1?

I work with Scala (version 2.10.4) and Spark - I have moved to Spark version 1.0.1 and noticed that one of my scripts is no longer working correctly. It uses the k-means method from the MLlib library in the following manner.
Assume I have a KMeansModel object named clusters:
scala> clusters.toString
res8: String = org.apache.spark.mllib.clustering.KMeansModel@689eab53
Here is my method in question and an error I receive while trying to compile it:
scala> def clustersSize(normData: RDD[Array[Double]]) = {
| normData.map(r => clusters.predict(r))
| }
<console>:28: error: overloaded method value predict with alternatives:
(points: org.apache.spark.api.java.JavaRDD[org.apache.spark.mllib.linalg.Vector])org.apache.spark.api.java.JavaRDD[Integer] <and>
(points: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector])org.apache.spark.rdd.RDD[Int] <and>
(point: org.apache.spark.mllib.linalg.Vector)Int
cannot be applied to (Array[Double])
normData.map(r => clusters.predict(r))
The KMeansModel documentation clearly says that the predict function needs an argument of type Array[Double], and I think I am passing it an argument of exactly that type (aren't I?). Thank you in advance for any suggestions on what I am doing wrong.
You're using Spark 1.0.1, but the documentation page you cite is for 0.9.0. Check the current documentation and you'll see that the API has changed. See the migration guide for background.
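Concretely, in the 1.0.x API predict takes a Vector (as the compiler error in the question also shows), so a minimal adjustment is to wrap each array with Vectors.dense - a sketch:
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.rdd.RDD

// Wrap each Array[Double] in a dense Vector before calling predict.
def clustersSize(normData: RDD[Array[Double]]) =
  normData.map(r => clusters.predict(Vectors.dense(r)))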