I tried to install Spark on my Mac using Homebrew, following all the steps described in https://sparkbyexamples.com/spark/install-apache-spark-on-mac/ . However, when I try to validate the Spark installation from the shell, I get the following output. How can I fix this? I have already reinstalled, but nothing changed. Thank you.
scala> import spark.implicits._
import spark.implicits._
scala> val data = Seq(("Java", "20000"), ("Python", "100000"), ("Scala", "3000"))
data: Seq[(String, String)] = List((Java,20000), (Python,100000), (Scala,3000))
scala> val df = data.toDF()
java.lang.NoSuchMethodError: 'boolean org.apache.spark.util.Utils$.isInRunningSparkTask()'
at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:201)
at org.apache.spark.sql.types.DataType.sameType(DataType.scala:99)
at org.apache.spark.sql.catalyst.analysis.TypeCoercionBase.$anonfun$haveSameType$1(TypeCoercion.scala:157)
at org.apache.spark.sql.catalyst.analysis.TypeCoercionBase.$anonfun$haveSameType$1$adapted(TypeCoercion.scala:157)
at scala.collection.LinearSeqOptimized.forall(LinearSeqOptimized.scala:85)
at scala.collection.LinearSeqOptimized.forall$(LinearSeqOptimized.scala:82)
at scala.collection.immutable.List.forall(List.scala:91)
at org.apache.spark.sql.catalyst.analysis.TypeCoercionBase.haveSameType(TypeCoercion.scala:157)
at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataTypeCheck(Expression.scala:1124)
at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataTypeCheck$(Expression.scala:1119)
at org.apache.spark.sql.catalyst.expressions.If.dataTypeCheck(conditionalExpressions.scala:39)
at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.org$apache$spark$sql$catalyst$expressions$ComplexTypeMergingExpression$$internalDataType(Expression.scala:1130)
at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.org$apache$spark$sql$catalyst$expressions$ComplexTypeMergingExpression$$internalDataType$(Expression.scala:1129)
at org.apache.spark.sql.catalyst.expressions.If.org$apache$spark$sql$catalyst$expressions$ComplexTypeMergingExpression$$internalDataType$lzycompute(conditionalExpressions.scala:39)
at org.apache.spark.sql.catalyst.expressions.If.org$apache$spark$sql$catalyst$expressions$ComplexTypeMergingExpression$$internalDataType(conditionalExpressions.scala:39)
at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataType(Expression.scala:1134)
at org.apache.spark.sql.catalyst.expressions.ComplexTypeMergingExpression.dataType$(Expression.scala:1134)
at org.apache.spark.sql.catalyst.expressions.If.dataType(conditionalExpressions.scala:39)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.isSerializedAsStruct(ExpressionEncoder.scala:306)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.isSerializedAsStructForTopLevel(ExpressionEncoder.scala:316)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.<init>(ExpressionEncoder.scala:245)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:61)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:300)
at org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder(SQLImplicits.scala:261)
at org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder$(SQLImplicits.scala:261)
at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:32)
... 49 elided
scala> df.show()
<console>:26: error: not found: value df
df.show()
^
I have defined a variable like this in my Scala notebook:
import java.time.{LocalDate, LocalDateTime, ZoneId, ZoneOffset, Duration}
val fiscalYearStartDate = LocalDate.of(fiscalStartYear,7,1);
I would like to add this as a column to my DataFrame.
SomeDF.lit(fiscalYearStartDate ).cast("date").as("fiscalYearStartDate")
This is throwing an error:
java.lang.RuntimeException: Unsupported literal type class java.time.LocalDate 2020-10-01
The Spark SQL DateType equivalent in Scala is java.sql.Date, so the solution could be one of:
val finalDF = SomeDF.withColumn("fiscalYearStartDate", lit(fiscalYearStartDate.toString).cast("Date"))
or
import java.time.format.DateTimeFormatter
val finalDF = SomeDF.withColumn("fiscalYearStartDate", lit(fiscalYearStartDate.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"))).cast("Date"))
or
import java.sql.Date
val finalDF = SomeDF.withColumn("fiscalYearStartDate", lit(Date.valueOf(fiscalYearStartDate)))
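As a quick sanity check (a sketch, assuming SomeDF and finalDF are defined as above), the new column should come out as a date type in the schema:
finalDF.select("fiscalYearStartDate").printSchema()
finalDF.select("fiscalYearStartDate").show(1)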
I am new to Spark and I am trying to run the code below both from spark-shell and from the Scala Eclipse IDE.
When I run it from the shell, it works perfectly.
But in the IDE, it gives a compilation error.
Please help.
package sparkWCExample.spWCExample
import org.apache.log4j.Level
import org.apache.spark.sql.{ Dataset, SparkSession, DataFrame, Row }
import org.apache.spark.sql.functions._
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql._
object TwitterDatawithDataset {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("Spark Scala WordCount Example")
      .setMaster("local[1]")
    val spark = SparkSession.builder()
      .config(conf)
      .appName("CsvExample")
      .master("local")
      .getOrCreate()

    val csvData = spark.sparkContext
      .textFile("C:\\Sankha\\Study\\data\\bank_data.csv", 3)

    val sqlContext = new org.apache.spark.sql.SQLContext(spark.sparkContext)
    import sqlContext.implicits._

    case class Bank(age: Int, job: String)
    val bankDF = csvData.map(_.split(",")).map(x => Bank(x(0).toInt, x(1)))
    val df = bankDF.toDF()
  }
}
The exception is as below, at compile time itself:
Description Resource Path Location Type
value toDF is not a member of org.apache.spark.rdd.RDD[Bank] TwitterDatawithDataset.scala /spWCExample/src/main/java/sparkWCExample/spWCExample line 35 Scala Problem
To use toDF(), you must enable implicit conversions:
import spark.implicits._
In spark-shell this is enabled by default, which is why the code works there. The :imports command can be used to see which imports are already present in your shell:
scala> :imports
1) import org.apache.spark.SparkContext._ (70 terms, 1 are implicit)
2) import spark.implicits._ (1 types, 67 terms, 37 are implicit)
3) import spark.sql (1 terms)
4) import org.apache.spark.sql.functions._ (385 terms)
This works fine for me in Eclipse Scala IDE:
case class Bank(age: Int, job: String)
val u = Array((1, "manager"), (2, "clerk"))
import spark.implicits._
spark.sparkContext.makeRDD(u).map(r => Bank(r._1, r._2)).toDF().show()
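Applied to the code from the question, a minimal sketch could look like the following (assumptions: the CSV has age in the first column and job in the second, and the case class is moved outside main so an encoder can be derived for it):
package sparkWCExample.spWCExample

import org.apache.spark.sql.SparkSession

// Defined outside main so that Spark can derive an encoder for it.
case class Bank(age: Int, job: String)

object TwitterDatawithDataset {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvExample")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._ // enables .toDF() on RDDs and Seqs of case classes

    val bankDF = spark.sparkContext
      .textFile("C:\\Sankha\\Study\\data\\bank_data.csv", 3)
      .map(_.split(","))
      .map(x => Bank(x(0).toInt, x(1)))
      .toDF()

    bankDF.show()
    spark.stop()
  }
}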
I am trying to run a word count program in Scala. Here's what my code looks like.
package myspark;
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.implicits._
object WordCount {
  def main(args: Array[String]) {
    val sc = new SparkContext( "local", "Word Count", "/home/hadoop/spark-2.2.0-bin-hadoop2.7/bin", Nil, Map(), Map())
    val input = sc.textFile("/myspark/input.txt")
    Val count = input.flatMap(line ⇒ line.split(" "))
      .map(word ⇒ (word, 1))
      .reduceByKey(_ + _)
    count.saveAsTextFile("outfile")
    System.out.println("OK");
  }
}
Then I tried to execute it with spark-shell.
spark-shell -i /myspark/WordCount.scala
And I get this error.
... 149 more
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
^
That file does not exist
Can someone please explain the error in this code? I am very new to Spark and Scala both. I have verified that the input.txt file is in the mentioned location.
You can take a look here to get started: Learning Spark-WordCount.
Other than that, there are a number of errors I can see:
import org.apache.spark..implicits._: the two dots won't work.
Also, have you added the Spark dependency to your project, perhaps even as provided? You must do that at the very least to run the Spark code.
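For example, with an sbt build the dependency could be declared like this (a sketch; the 2.2.0 version is an assumption taken from the spark-2.2.0 path in the question, and the Scala version must match your Spark build):
// build.sbt (sketch)
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.2.0" % "provided"
)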
First of all, check whether you have added the right dependencies. I can also see a few mistakes in your code.
Create a SparkSession, not a SparkContext (see the SparkSession API):
import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
Then use this spark variable:
import spark.implicits._
I am not sure why you have written import org.apache.spark..implicits._ with two dots between spark..implicits.
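Putting that together, a minimal sketch of the word count using a SparkSession (paths taken from the question; adjust as needed):
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Word Count")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // needed for DataFrame/Dataset conversions such as toDF(); the RDD word count below does not use it

    val counts = spark.sparkContext
      .textFile("/myspark/input.txt")
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("outfile")
    spark.stop()
  }
}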
I am using Scala, Spark, IntelliJ, and Maven.
I have used the below code:
val joinCondition = when($"exp.fnal_expr_dt" >= $"exp.nonfnal_expr_dt",
$"exp.manr_cd"===$"score.MANR_CD")
val score = exprDF.as("exp").join(scoreDF.as("score"),joinCondition,"inner")
and
val score= list.withColumn("scr", lit(0))
But when I try to build using Maven, I get the below errors:
error: not found: value when
and
error: not found: value lit
For $ and === I have used import sqlContext.implicits.StringToColumn and it works fine; no error occurs at Maven build time. But for lit(0) and when, what do I need to import, or is there another way to resolve the issue?
Let's consider the following context:
val spark: SparkSession = ??? // or val sqlContext: SQLContext = new SQLContext(sc) for 1.x
val list: DataFrame = ???
To use when and lit, you'll need to import the proper functions:
import org.apache.spark.sql.functions.{col, lit, when}
Now you can use them as follows:
list.select(when(col("column_name").isNotNull, lit(1)))
You can now also use lit in your code:
val score = list.withColumn("scr", lit(0))
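Applied to the join from the question, the same import covers when there as well (a sketch assuming exprDF, scoreDF, and sqlContext are defined as in the question):
import org.apache.spark.sql.functions.{lit, when}
import sqlContext.implicits.StringToColumn // brings in the $"..." syntax already used in the question

val joinCondition = when($"exp.fnal_expr_dt" >= $"exp.nonfnal_expr_dt",
  $"exp.manr_cd" === $"score.MANR_CD")
val joined = exprDF.as("exp").join(scoreDF.as("score"), joinCondition, "inner")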
I am trying to use spark-csv to read a CSV from AWS S3 in spark-shell.
Below are the steps I followed. I started spark-shell using the below command:
bin/spark-shell --packages com.databricks:spark-csv_2.10:1.2.0
In the shell, I executed the following Scala code:
scala> val hadoopConf = sc.hadoopConfiguration
scala> hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
scala> hadoopConf.set("fs.s3.awsAccessKeyId", "****")
scala> hadoopConf.set("fs.s3.awsSecretAccessKey", "****")
scala> val s3path = "s3n://bucket/sample.csv"
scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load(s3path)
I get the below error:
java.io.IOException: No FileSystem for scheme: s3n
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
What is it that I am missing here? Please note that I am able to read the CSV using
scala> sc.textFile(s3path)
The same Scala code works fine in a Databricks notebook as well.
I created an issue on the spark-csv GitHub; I'll update here when I get an answer.
For the URL s3n://bucket/sample.csv, all the properties have to be set for s3n (not s3). Setting the below properties allowed me to read the CSV using spark-csv:
scala> val hadoopConf = sc.hadoopConfiguration
scala> hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
scala> hadoopConf.set("fs.s3n.awsAccessKeyId", "****")
scala> hadoopConf.set("fs.s3n.awsSecretAccessKey", "****")
Refer to https://github.com/databricks/spark-csv/issues/137.