Type mismatch in Spark Shell - Scala

I am trying to count some data from a specific field in the Spark Shell using this:
dfEquipmenttorecover.where($"key_number"==="12884612884").count
But I get this error
<console>:51: error: type mismatch;
found : StringContext
required: ?{def $: ?}
Note that implicit conversions are not applicable because they are ambiguous:
both method StringToColumn in class SQLImplicits of type (sc: StringContext)spark.implicits.StringToColumn
and method StringToAttributeConversionHelper in trait ExpressionConversions of type (sc: StringContext)org.apache.spark.sql.catalyst.dsl.expressions.StringToAttributeConversionHelper
are possible conversion functions from StringContext to ?{def $: ?}
dfEquipmenttorecover.where($"key_number"==="12884612884").count
I'm using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201) and Spark version 2.4.7.7.1.7.0-551, and I have imported org.apache.spark.sql.catalyst.dsl.expressions.StringToAttributeConversionHelper and org.apache.spark.sql.catalyst.dsl.expressions._.
I have more imports as well, but I don't know whether the problem comes from one of them.

You need to use org.apache.spark.sql.functions.lit for your string value, because === compares columns: on the right-hand side you need a Column created from the literal "12884612884".
import org.apache.spark.sql.functions.lit
dfEquipmenttorecover.where($"key_number" === lit("12884612884")).count
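If the ambiguity complaint about StringToColumn versus StringToAttributeConversionHelper still appears, one way around it (a sketch, not the only fix) is to skip the $ interpolator entirely and build the column with col, which neither of the competing implicits is involved in:

import org.apache.spark.sql.functions.{col, lit}

// col("key_number") builds the Column directly, so the ambiguous StringContext
// implicits brought in by the catalyst dsl import are never consulted.
dfEquipmenttorecover.where(col("key_number") === lit("12884612884")).count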

Related

value YpartialUnification is not a member of scala.tools.nsc.Settings

I'm trying to run Scala Cats in the REPL. Following Cats' instructions I have installed the Ammonite REPL and put the following lines in predef.sc:
interp.configureCompiler(_.settings.YpartialUnification.value = true)
import $ivy.`org.typelevel::cats-core:2.2.0-M1`, cats.implicits._
I got this error when running amm:
predef.sc:1: value YpartialUnification is not a member of scala.tools.nsc.Settings
val res_0 = interp.configureCompiler(_.settings.YpartialUnification.value = true)
^
Compilation Failed
In Scala 2.13 partial unification is enabled by default, and the -Ypartial-unification flag was removed by "Partial unification unconditional; deprecate -Xexperimental" (#6309):
Partial unification is now enabled unless -Xsource:2.12 is specified.
The -Ypartial-unification flag has been removed and the -Xexperimental
option, which is now redundant, has been deprecated.
Thus the compiler no longer accepts -Ypartial-unification.
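So on Scala 2.13 the fix is simply to drop the configureCompiler line; a minimal predef.sc for this setup (same cats-core version as in the question) would be:

// predef.sc for Ammonite on Scala 2.13: partial unification is already enabled,
// so no compiler flag needs to be set before pulling in Cats.
import $ivy.`org.typelevel::cats-core:2.2.0-M1`, cats.implicits._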

Trouble invoking command with quoted string using Scala's sys.process API

As can be seen in the following console session, the same command invoked from Scala produces different results than when run in the terminal.
~> scala
Welcome to Scala 2.12.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_172).
Type in expressions for evaluation. Or try :help.
scala> import sys.process._
import sys.process._
scala> """emacsclient --eval '(+ 4 5)'""".!
*ERROR*: End of file during parsingres0: Int = 1
scala> :quit
~> emacsclient --eval '(+ 4 5)'
9
Has anyone encountered this issue and/or know of a workaround?
I thought this may have been a library bug, so opened an issue as well: https://github.com/scala/bug/issues/10897
It seems that Scala's sys.process API doesn't do shell-style quote parsing; the command string is simply split on whitespace. The following works instead: Seq("emacsclient", "--eval", "(+ 4 5)").!
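For completeness, here is a small sketch that captures the output as a string with the standard !! operator (this assumes emacsclient is on the PATH and an Emacs server is running):

import sys.process._

// Each argument is its own element; sys.process does not parse shell-style quotes,
// so the Lisp form is passed through to emacsclient untouched.
val result = Seq("emacsclient", "--eval", "(+ 4 5)").!!
println(result.trim) // prints 9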

flink: sortPartition(0, Order.ASCENDING ) error: "not found: value Order"

I am running the following code and getting "error: not found: value Order"
I am not able to figure out a reason. What am I doing wrong?
Version: Flink 0.9.1 (Hadoop 1 build, but not using Hadoop); local execution; shell: Scala shell.
Scala-Flink> val data_avg = data_split.map{x=> ((x._1), (x._2._2/x._2._1))}.sortPartition(1, Order.ASCENDING).setParallelism(1)
<console>:16: error: not found: value Order
val data_avg = data_split.map{x=> ((x._1), (x._2._2/x._2._1))}.sortPartition(0, Order.ASCENDING).setParallelism(1)
The problem is that the enum Order is not automatically imported by Flink's Scala shell. Therefore, you have to add the following import manually.
import org.apache.flink.api.common.operators.Order
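With the import in place, the line from the question compiles in the Scala shell (this sketch assumes data_split as defined earlier in your session):

import org.apache.flink.api.common.operators.Order

// Sort each partition on field 0 of the (key, average) tuples produced by the map.
val data_avg = data_split
  .map { x => (x._1, x._2._2 / x._2._1) }
  .sortPartition(0, Order.ASCENDING)
  .setParallelism(1)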

Type mismatch when utilising a case class in a package object

I am receiving the following error when I try to run my code:
Error:(104, 63) type mismatch;
found : hydrant.spark.hydrant.spark.IPPortPair
required: hydrant.spark.(some other)hydrant.spark.IPPortPair
IPPortPair(it.getHeader.getDestinationIP, it.getHeader.getDestinationPort))
My code uses a case class defined in the package object spark to set up the IP/Port map for each connection.
The package object looks like this:
package object spark {
  case class IPPortPair(ip: Long, port: Long)
}
And the code using the package object looks like this:
package hydrant.spark
import java.io.{File,PrintStream}
object identifyCustomers {
  ……………
  def mapCustomers(suspectTraffic: RDD[Generic]) = {
    suspectTraffic.filter(
      it => !it.getHeader.isEmtpy
    ).map(
      it => IPPortPair(it.getHeader.getDestinationIP, it.getHeader.getDestinationPort)
    )
  }
I am conscious of the strange way my packages are being displayed, as the error makes it seem that I am in hydrant.spark.hydrant.spark, which does not exist.
I am also using Intellij if that makes a difference.
You need to run sbt clean (or the IntelliJ equivalent). You changed something in the project (e.g. Scala version) and this is how the incompatibility manifests.
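For reference, and independent of the stale-build issue, a package object for hydrant.spark is normally declared like this (the file name hydrant/package.scala is just the usual convention, not something from the original post):

// hydrant/package.scala
package hydrant

package object spark {
  // Visible to everything under hydrant.spark without an extra import.
  case class IPPortPair(ip: Long, port: Long)
}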

How to use JDBC from within the Spark/Scala interpreter (REPL)?

I'm attempting to access a database in the Scala interpreter for Spark, but am having no success.
First, I have imported the DriverManager, and I have added my SQL Server JDBC driver to the class path with the following commands:
scala> import java.sql._
import java.sql._
scala> :cp sqljdbc41.jar
The REPL crashes with a long dump message:
Added 'C:\spark\sqljdbc41.jar'. Your new classpath is:
";;C:\spark\bin\..\conf;C:\spark\bin\..\lib\spark-assembly-1.1.1-hadoop2.4.0.jar;;C:\spark\bin\..\lib\datanucleus-api-jdo-3.2.1.jar;C:\spark\bin\..\lib\datanucleus-core-3.2.2.jar;C:\spark\bin\..\lib\datanucleus-rdbms-3.2.1.jar;;C:\spark\sqljdbc41.jar"
Replaying: import java.sql._
error:
while compiling: <console>
during phase: jvm
library version: version 2.10.4
compiler version: version 2.10.4
reconstructed args:
last tree to typer: Apply(constructor $read)
symbol: constructor $read in class $read (flags: <method> <triedcooking>)
symbol definition: def <init>(): $line10.$read
tpe: $line10.$read
symbol owners: constructor $read -> class $read -> package $line10
context owners: class iwC -> package $line10
== Enclosing template or block ==
Template( // val <local $iwC>: <notype>, tree.tpe=$line10.iwC
"java.lang.Object", "scala.Serializable" // parents
ValDef(
private
"_"
<tpt>
<empty>
)
...
== Expanded type of tree ==
TypeRef(TypeSymbol(class $read extends Serializable))
uncaught exception during compilation: java.lang.AssertionError
java.lang.AssertionError: assertion failed: Tried to find '$line10' in 'C:\Users\Username\AppData\Local\Temp\spark-28055904-e7d2-4052-9354-ae3769266cb4' but it is not a directory
That entry seems to have slain the compiler. Shall I replay
your session? I can re-run each line except the last one.
I am able to run a Scala program with the driver and everything works just fine.
How can I initialize my REPL to allow me to access data from SQL Server through JDBC?
It looks like the interactive :cp command does not work on Windows. But I found that if I launch the Spark shell with the following command, the JDBC driver is loaded and available:
C:\spark> .\bin\spark-shell --jars sqljdbc41.jar
In this case, I had copied my jar file into the C:\spark folder.
(One can also pass --help at launch to see the other options available.)
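Once the shell is up with the driver on the classpath, plain java.sql calls work; here is a minimal sketch (the host, database, table and credentials below are placeholders, not from the original setup):

import java.sql._

// Hypothetical SQL Server connection string -- substitute your own server, database and credentials.
val conn = DriverManager.getConnection(
  "jdbc:sqlserver://localhost:1433;databaseName=mydb;user=sa;password=secret")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SELECT TOP 10 * FROM some_table")
while (rs.next()) println(rs.getString(1))
conn.close()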