My code is:
import org.apache.spark.sql.functions.to_date
val df2 = df1.withColumn("Order Date", to_date($"Order Date", "dd-MMM-yy"))
Error - Cannot resolve overloaded method 'to_date'
The same code is working fine in Databricks notebook.
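For reference, a minimal sketch that compiles against Spark 2.2+ (the two-argument to_date(col, fmt) overload was only added in 2.2, so an older Spark dependency in the local build can produce exactly this kind of resolution error); the local master and the sample data are assumptions for illustration:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

val spark = SparkSession.builder().master("local[*]").appName("to-date-example").getOrCreate()
import spark.implicits._  // required for the $"..." column syntax

val df1 = Seq("06-Aug-19", "15-Jan-20").toDF("Order Date")
val df2 = df1.withColumn("Order Date", to_date($"Order Date", "dd-MMM-yy"))
df2.printSchema()  // "Order Date" is now a date column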
I have some Scala code that works when I run it manually in Spark on EMR, but I get errors when I try to compile it in Eclipse.
val tmp_df2 = tmp_df1.withColumn("col_one", when($"col_two" === "good", "bad").otherwise($"col_one"))
When I run "Maven install" it fails with "error: not found: value when", but I know the code works on EMR.
So, is there another way to specify that condition without using "when"?
You may need to import the Spark function as follows:
import org.apache.spark.sql.functions.when
or
import org.apache.spark.sql.functions._
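For example, a minimal self-contained sketch with the import in place (the local master and the toy data are assumptions, not from the original code):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when  // brings `when` into scope at compile time

val spark = SparkSession.builder().master("local[*]").appName("when-example").getOrCreate()
import spark.implicits._

val tmp_df1 = Seq(("keep", "good"), ("other", "fine")).toDF("col_one", "col_two")
val tmp_df2 = tmp_df1.withColumn("col_one",
  when($"col_two" === "good", "bad").otherwise($"col_one"))
tmp_df2.show()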
I am trying to use Delta Lake in a Zeppelin notebook with pyspark, and it seems it cannot import the module successfully, e.g.
%pyspark
from delta.tables import *
It fails with the following error:
ModuleNotFoundError: No module named 'delta'
However, there is no problem saving/reading a data frame using the Delta format, and the module can be loaded successfully when using Scala Spark (%spark).
Is there any way to use Delta Lake in Zeppelin and pyspark?
I finally managed to load it in Zeppelin pyspark. You have to explicitly include the jar file:
%pyspark
sc.addPyFile("**LOCATION_OF_DELTA_LAKE_JAR_FILE**")
from delta.tables import *
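An alternative worth trying (untested here) is to let Spark resolve the package itself by setting spark.jars.packages in the Zeppelin Spark interpreter settings before the interpreter starts; the exact Delta version and Scala suffix below are assumptions and must match your Spark build:
spark.jars.packages io.delta:delta-core_2.11:0.6.1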
I am using Hadoop 2.7.2, HBase 1.4.9, Spark 2.2.0, Scala 2.11.8 and Java 1.8 on a Hadoop cluster composed of one master and two slaves.
When I run spark-shell after starting the cluster, it works fine.
I am trying to connect to HBase using Scala by following this tutorial: https://www.youtube.com/watch?v=gGwB0kCcdu0.
But when I try, as he does, to run spark-shell with those jars passed as an argument, I get this error:
spark-shell --jars "hbase-annotations-1.4.9.jar,hbase-common-1.4.9.jar,hbase-protocol-1.4.9.jar,htrace-core-3.1.0-incubating.jar,zookeeper-3.4.6.jar,hbase-client-1.4.9.jar,hbase-hadoop2-compat-1.4.9.jar,metrics-json-3.1.2.jar,hbase-server-1.4.9.jar"
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
^
After that, even if I log out and run spark-shell again, I have the same issue.
Can anyone please tell me what the cause is and how to fix it?
In your import statement, spark should be an object of type SparkSession. That object is normally created for you, or you need to create it yourself (read the Spark docs). I didn't watch your tutorial video.
The point is that it doesn't have to be called spark. It could, for instance, be called sparkSession, and then you could write import sparkSession.implicits._.
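For illustration, a minimal sketch of building the session yourself and importing from it (the name sparkSession and the local master are just examples, not from the tutorial):
import org.apache.spark.sql.SparkSession

// Build the session explicitly instead of relying on the shell-provided `spark` value.
val sparkSession = SparkSession.builder()
  .master("local[*]")
  .appName("hbase-example")
  .getOrCreate()

import sparkSession.implicits._  // works because sparkSession is a stable identifier
import sparkSession.sql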
Running spark-shell --packages "graphframes:graphframes:0.7.0-spark2.4-s_2.11" in the bash shell works and I can successfully import graphframes 0.7, but when I try to use it in a Scala Jupyter notebook like this:
import scala.sys.process._
"spark-shell --packages \"graphframes:graphframes:0.7.0-spark2.4-s_2.11\""!
import org.graphframes._
gives error message:
<console>:53: error: object graphframes is not a member of package org
import org.graphframes._
From what I can tell, this means the bash command runs, but the notebook still cannot find the retrieved package.
I am doing this on an EMR Notebook running a spark scala kernel.
Do I have to set some sort of spark library path in the jupyter environment?
That simply shouldn't work. What your code does is simply attempt to start a new, independent Spark shell. Furthermore, Spark packages have to be loaded when the SparkContext is initialized for the first time.
You should either add (assuming these are correct versions)
spark.jars.packages graphframes:graphframes:0.7.0-spark2.4-s_2.11
to your Spark configuration files, or use the equivalent in your SparkConf / SparkSession.Builder.config before the SparkSession is initialized.
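For example, a sketch of the programmatic variant (package coordinates taken from the question; this only works if it runs before any SparkSession/SparkContext has been created in the process, which is usually not the case in a managed EMR Notebook, where the configuration-file route is the practical option):
import org.apache.spark.sql.SparkSession

// spark.jars.packages is only honoured if it is set before the first SparkContext starts.
val spark = SparkSession.builder()
  .appName("graphframes-example")
  .config("spark.jars.packages", "graphframes:graphframes:0.7.0-spark2.4-s_2.11")
  .getOrCreate()

import org.graphframes._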
I have already read the answers to this question that are on SO. None of those fixes solve my problem.
I am unable to call the function "from_json".
I already had the following in my code:
import org.apache.spark.sql.functions._
I also tried adding:
import org.apache.spark.sql.Column
I am running Scala/Spark through Eclipse. Scala Version 2.11.11, Spark Version 2.0.0.
Any ideas?
The from_json function isn't available in Spark 2.0.
It is available from Spark 2.1.
The release notes of Spark 2.1 mention the addition of from_json functionality.
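For reference, a minimal sketch of from_json on Spark 2.1+ (the JSON column, the schema and the local master below are made up for illustration):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("from-json-example").getOrCreate()
import spark.implicits._

val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("age", IntegerType)))

val df = Seq("""{"name":"a","age":1}""").toDF("json")
val parsed = df.withColumn("parsed", from_json($"json", schema))
parsed.select("parsed.name", "parsed.age").show()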