I am trying to count some data from a specific field in the Spark Shell using this:
dfEquipmenttorecover.where($"key_number"==="12884612884").count
But I get this error:
<console>:51: error: type mismatch;
found : StringContext
required: ?{def $: ?}
Note that implicit conversions are not applicable because they are ambiguous:
both method StringToColumn in class SQLImplicits of type (sc: StringContext)spark.implicits.StringToColumn
and method StringToAttributeConversionHelper in trait ExpressionConversions of type (sc: StringContext)org.apache.spark.sql.catalyst.dsl.expressions.StringToAttributeConversionHelper
are possible conversion functions from StringContext to ?{def $: ?}
dfEquipmenttorecover.where($"key_number"==="12884612884").count
I'm using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
and Spark version 2.4.7.7.1.7.0-551, and I have imported org.apache.spark.sql.catalyst.dsl.expressions.StringToAttributeConversionHelper and org.apache.spark.sql.catalyst.dsl.expressions._.
I have more imports than these, but I don't know whether the problem comes from one of them.
You need to use org.apache.spark.sql.functions.lit for your string value, because === compares columns, and on the right-hand side you need a Column created from the literal "12884612884".
import org.apache.spark.sql.functions.lit
dfEquipmenttorecover.where($"key_number"===lit("12884612884")).count
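Note that the error itself reports an ambiguity on the $ interpolator, caused by having both spark.implicits.StringToColumn and the catalyst DSL's StringToAttributeConversionHelper in scope. A minimal sketch that sidesteps the interpolator entirely by using col (an alternative, not the only possible fix):
import org.apache.spark.sql.functions.{col, lit}

// col("key_number") builds the Column without the $ string interpolator,
// so the two ambiguous implicit conversions never come into play
dfEquipmenttorecover.where(col("key_number") === lit("12884612884")).count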
I'm trying to run scala cats in REPL. Following cat's instructions I have installed ammonite REPL and put following imports in predef.sc
interp.configureCompiler(_.settings.YpartialUnification.value = true)
import $ivy.`org.typelevel::cats-core:2.2.0-M1`, cats.implicits._
I got this error when running amm.
predef.sc:1: value YpartialUnification is not a member of scala.tools.nsc.Settings
val res_0 = interp.configureCompiler(_.settings.YpartialUnification.value = true)
^
Compilation Failed
In Scala 2.13, partial unification is enabled by default and the -Ypartial-unification flag has been removed by "Partial unification unconditional; deprecate -Xexperimental" (#6309):
Partial unification is now enabled unless -Xsource:2.12 is specified.
The -Ypartial-unification flag has been removed and the -Xexperimental
option, which is now redundant, has been deprecated.
Thus the compiler no longer accepts -Ypartial-unification; the configureCompiler line can simply be removed from predef.sc.
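A minimal sketch of the remaining predef.sc on Scala 2.13, keeping the same cats dependency as in the question:
// predef.sc: no compiler flag needed, partial unification is on by default in 2.13
import $ivy.`org.typelevel::cats-core:2.2.0-M1`, cats.implicits._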
As can be seen in the following console session, the same command produces different results when invoked from Scala than when run directly in the terminal.
~> scala
Welcome to Scala 2.12.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_172).
Type in expressions for evaluation. Or try :help.
scala> import sys.process._
import sys.process._
scala> """emacsclient --eval '(+ 4 5)'""".!
*ERROR*: End of file during parsingres0: Int = 1
scala> :quit
~> emacsclient --eval '(+ 4 5)'
9
Has anyone encountered this issue and/or know of a work around?
I thought this might have been a library bug, so I opened an issue as well: https://github.com/scala/bug/issues/10897
It seems that Scala's sys.process API does not interpret shell quoting when it splits a command string, so the single-quoted Lisp form gets broken into separate tokens. Passing the arguments as a Seq works: Seq("emacsclient", "--eval", "(+ 4 5)").!.
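For completeness, a short sketch of the Seq-based call in the REPL; .! runs the command and returns its exit code, while .!! captures stdout as a String (emacsclient is assumed to be on the PATH):
import sys.process._

// Each argument is its own Seq element, so no shell quoting is needed
val exitCode = Seq("emacsclient", "--eval", "(+ 4 5)").!   // prints 9, returns the exit code
val output   = Seq("emacsclient", "--eval", "(+ 4 5)").!!  // captures "9\n" as a String instead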
I am running the following code and getting "error: not found: value Order"
I am not able to figure out a reason. What am I doing wrong?
Version: Flink 0.9.1 (Hadoop 1 build, not using Hadoop); execution: local; shell: Scala shell
Scala-Flink> val data_avg = data_split.map{x=> ((x._1), (x._2._2/x._2._1))}.sortPartition(1, Order.ASCENDING).setParallelism(1)
<console>:16: error: not found: value Order
val data_avg = data_split.map{x=> ((x._1), (x._2._2/x._2._1))}.sortPartition(0, Order.ASCENDING).setParallelism(1)
The problem is that the enum Order is not automatically imported by Flink's Scala shell. Therefore, you have to add the following import manually.
import org.apache.flink.api.common.operators.Order
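With that import in place, the pipeline from the question compiles. A sketch, assuming data_split holds the (key, (count, sum)) tuples implied by the question:
import org.apache.flink.api.common.operators.Order

val data_avg = data_split
  .map { x => (x._1, x._2._2 / x._2._1) }
  .sortPartition(1, Order.ASCENDING)
  .setParallelism(1)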
I am receiving the following error when I try to run my code:
Error:(104, 63) type mismatch;
found : hydrant.spark.hydrant.spark.IPPortPair
required: hydrant.spark.(some other)hydrant.spark.IPPortPair
IPPortPair(it.getHeader.getDestinationIP, it.getHeader.getDestinationPort))
My code uses a case class defined in the package object spark to represent the IP/port pair for each connection.
The package object looks like this:
package object spark {
  case class IPPortPair(ip: Long, port: Long)
}
And the code using the package object looks like this:
package hydrant.spark
import java.io.{File,PrintStream}
object identifyCustomers {
  ……………
  def mapCustomers(suspectTraffic: RDD[Generic]) = {
    suspectTraffic.filter(
      it => !it.getHeader.isEmtpy
    ).map(
      it => IPPortPair(it.getHeader.getDestinationIP, it.getHeader.getDestinationPort)
    )
  }
}
I am puzzled by the strange way my packages are being displayed, as the error makes it seem that I am in hydrant.spark.hydrant.spark, which does not exist.
I am also using IntelliJ, if that makes a difference.
You need to run sbt clean (or the IntelliJ equivalent). You changed something in the project (e.g. Scala version) and this is how the incompatibility manifests.
I'm attempting to access a database in the Scala interpreter for Spark, but am having no success.
First, I have imported the DriverManager, and I have added my SQL Server JDBC driver to the class path with the following commands:
scala> import java.sql._
import java.sql._
scala> :cp sqljdbc41.jar
The REPL crashes with a long dump message:
Added 'C:\spark\sqljdbc41.jar'. Your new classpath is:
";;C:\spark\bin\..\conf;C:\spark\bin\..\lib\spark-assembly-1.1.1-hadoop2.4.0.jar;;C:\spark\bin\..\lib\datanucleus-api-jdo-3.2.1.jar;C:\spark\bin\..\lib\datanucleus-core-3.2.2.jar;C:\spark\bin\..\lib\datanucleus-rdbms-3.2.1.jar;;C:\spark\sqljdbc41.jar"
Replaying: import java.sql._
error:
while compiling: <console>
during phase: jvm
library version: version 2.10.4
compiler version: version 2.10.4
reconstructed args:
last tree to typer: Apply(constructor $read)
symbol: constructor $read in class $read (flags: <method> <triedcooking>)
symbol definition: def <init>(): $line10.$read
tpe: $line10.$read
symbol owners: constructor $read -> class $read -> package $line10
context owners: class iwC -> package $line10
== Enclosing template or block ==
Template( // val <local $iwC>: <notype>, tree.tpe=$line10.iwC
"java.lang.Object", "scala.Serializable" // parents
ValDef(
private
"_"
<tpt>
<empty>
)
...
== Expanded type of tree ==
TypeRef(TypeSymbol(class $read extends Serializable))
uncaught exception during compilation: java.lang.AssertionError
java.lang.AssertionError: assertion failed: Tried to find '$line10' in 'C:\Users\Username\AppData\Local\Temp\spark-28055904-e7d2-4052-9354-ae3769266cb4' but it is not a directory
That entry seems to have slain the compiler. Shall I replay
your session? I can re-run each line except the last one.
I am able to run a Scala program with the driver and everything works just fine.
How can I initialize my REPL to allow me to access data from SQL Server through JDBC?
It looks like the interactive :cp command does not work on Windows. But I found that if I launch the Spark shell with the following command, the JDBC driver is loaded and available:
C:\spark> .\bin\spark-shell --jars sqljdbc41.jar
In this case, I had copied my jar file into the C:\spark folder.
(Also, you can see the other options available at launch using --help.)
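Once the shell starts with the jar on the classpath, plain JDBC works from the REPL. A minimal sketch, where the server, database, credentials, and table are hypothetical placeholders:
import java.sql.DriverManager

// Hypothetical connection details: replace with your own server, database, and credentials
val url  = "jdbc:sqlserver://localhost:1433;databaseName=mydb;user=me;password=secret"
val conn = DriverManager.getConnection(url)
val rs   = conn.createStatement().executeQuery("SELECT TOP 10 * FROM some_table")
while (rs.next()) println(rs.getString(1))
conn.close()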