How to import Delta Lake module in Zeppelin notebook and pyspark? - pyspark

I am trying to use Delta Lake in a Zeppelin notebook with pyspark, but it seems the module cannot be imported successfully. For example:
%pyspark
from delta.tables import *
It fails with the following error:
ModuleNotFoundError: No module named 'delta'
However, there is no problem saving/reading a data frame in the delta format, and the module loads successfully when using Scala Spark (%spark).
Is there any way to use Delta Lake in Zeppelin and pyspark?

Finally managed to load it in Zeppelin pyspark. You have to explicitly include the jar file:
%pyspark
sc.addPyFile("**LOCATION_OF_DELTA_LAKE_JAR_FILE**")
from delta.tables import *

Related

Why is to_date working in Databricks but not in IntelliJ?

My code is
import org.apache.spark.sql.functions.to_date
val df2=df1.withColumn("Order Date",to_date($"Order Date","dd-MMM-yy"))
Error - Cannot resolve overloaded method 'to_date'
The same code is working fine in Databricks notebook.

error: not found: value spark when using import spark.implicits._ and import spark.sql

I am using Hadoop 2.7.2, HBase 1.4.9, Spark 2.2.0, Scala 2.11.8 and Java 1.8 on a Hadoop cluster composed of one master and two slaves.
When I run spark-shell after starting the cluster, it works fine.
I am trying to connect to HBase using Scala by following this tutorial: https://www.youtube.com/watch?v=gGwB0kCcdu0
But when I try, as he does, to run spark-shell with those jars passed as arguments, I get this error:
spark-shell --jars
"hbase-annotations-1.4.9.jar,hbase-common-1.4.9.jar,hbase-protocol-1.4.9.jar,htrace-core-3.1.0-incubating.jar,zookeeper-3.4.6.jar,hbase-client-1.4.9.jar,hbase-hadoop2-compat-1.4.9.jar,metrics-json-3.1.2.jar,hbase-server-1.4.9.jar"
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
^
And after that, even if I log out and run spark-shell again, I have the same issue.
Can anyone please tell me what the cause is and how to fix it?
In your import statement, spark should be an object of type SparkSession. That object should have been created for you previously, or you need to create it yourself (see the Spark docs). I didn't watch your tutorial video.
The point is that it doesn't have to be called spark. It could, for instance, be called sparkSession, and then you can do import sparkSession.implicits._
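A minimal sketch of creating the session yourself, assuming a standalone Scala app (the app name and local master are placeholders for illustration):
import org.apache.spark.sql.SparkSession

// Build (or reuse) the session; in spark-shell this object is normally created for you as "spark".
val sparkSession = SparkSession.builder()
  .appName("hbase-example")   // placeholder name
  .master("local[*]")         // only for local testing; drop this when submitting to a cluster
  .getOrCreate()

// The implicits live on the session object, whatever it happens to be called.
import sparkSession.implicits._

// Now the $"col" syntax and toDF/toDS conversions are available.
val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")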

EMR Notebook Scala kernel import graphframes library

Running spark-shell --packages "graphframes:graphframes:0.7.0-spark2.4-s_2.11" in the bash shell works and I can successfully import graphframes 0.7, but when I try to use it in a Scala Jupyter notebook like this:
import scala.sys.process._
"spark-shell --packages \"graphframes:graphframes:0.7.0-spark2.4-s_2.11\""!
import org.graphframes._
it gives the error message:
<console>:53: error: object graphframes is not a member of package org
import org.graphframes._
From what I can tell, this means it runs the bash command but still cannot find the retrieved package.
I am doing this on an EMR Notebook running a Spark Scala kernel.
Do I have to set some sort of Spark library path in the Jupyter environment?
That simply shouldn't work. What your code does is attempt to start a new, independent Spark shell. Furthermore, Spark packages have to be loaded when the SparkContext is initialized for the first time.
You should either add (assuming these are correct versions)
spark.jars.packages graphframes:graphframes:0.7.0-spark2.4-s_2.11
to your Spark configuration files, or use the equivalent in your SparkConf / SparkSession.Builder.config before the SparkSession is initialized.
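As a rough sketch (the package coordinates are taken from the question; the app name is a placeholder), the builder variant looks like this:
import org.apache.spark.sql.SparkSession

// spark.jars.packages is only picked up when the SparkContext is first created,
// so it must be set before getOrCreate() starts the session.
val spark = SparkSession.builder()
  .appName("graphframes-example")  // placeholder name
  .config("spark.jars.packages", "graphframes:graphframes:0.7.0-spark2.4-s_2.11")
  .getOrCreate()

import org.graphframes._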

import uploaded library to Databricks

I uploaded the spark-ts time series library to Databricks using the Maven coordinate option in Create Library. I was able to successfully create the library and attach it to my cluster. But when I tried to import the spark-ts library in Databricks using org.apache.spark.spark-ts, it throws an error stating: notebook:1: error: object ts is not a member of package org.apache.spark. Please let me know how to handle this issue.

How to Solve "Cannot Import Name UNIX_TIMESTAMP" in PySpark?

Spark version 1.3.0
Python Version: 2.7.8
I am trying to import the following:
from pyspark.sql.functions import unix_timestamp
However, it gives me an error:
ImportError: cannot import name SparkSession
How can I solve this?
Your pyspark Python library is incompatible with the Spark version that you are using (1.3.0); SparkSession was only introduced in Spark 2.0.0.
Try updating Spark to the latest version (2.3.0).