Can't import sqlContext.implicits._ without an error through Jupyter - scala

When I try to use the import sqlContext.implicits._ on my Jupyter notebook, I get the following error:
Name: Compile Error
Message: <console>:25: error: stable identifier required, but $iwC.this.$VAL10.sqlContext.implicits found.
import sqlContext.implicits._
^
I've tried this locally and it works, but it does not work on my Jupyter Notebook server (hosted on EC2). I have tried importing different related libraries, but unfortunately cannot get it to work.

You need to instantiate a sqlContext like so:
val sqlC = new org.apache.spark.sql.SQLContext(sc)
import sqlC.implicits._
You should have seen this error:
stable identifier required

You have to use the val keyword instead of var, since val is the equivalent of const or final in other languages: the import requires a stable identifier, and only a val provides one.
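As a quick illustration (the variable names below are just for this sketch), the same import fails when the context is declared with var and compiles when it is a val:
// Fails: a var is not a stable identifier, so nothing can be imported from it
var sqlCtxVar = new org.apache.spark.sql.SQLContext(sc)
// import sqlCtxVar.implicits._   // error: stable identifier required

// Works: a val is a stable identifier
val sqlCtxVal = new org.apache.spark.sql.SQLContext(sc)
import sqlCtxVal.implicits._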

Related

Pyspark error: Cannot load class when registering a function, make sure it is on the classpath

I am trying to run the code below in a Python notebook on Anaconda, but I am getting an error:
spark = SparkSession.builder.enableHiveSupport().appName('test').getOrCreate()
spark.sql("SET spark.hadoop.hive.mapred.supports.subdirectories=true")
spark.sql("SET mapreduce.input.fileinputformat.input.dir.recursive=true")
spark.sql("create temporary function ptyUnprotectStr as 'com.protegrity.hive.udf.ptyUnprotectStr'")
Getting this on running the above code:
AnalysisException: "Can not load class 'com.protegrity.hive.udf.ptyUnprotectStr' when registering the function 'ptyUnprotectStr', please make sure it is on the classpath"
How can I resolve it?

HttpClient not found in elastic4s?

I am getting the error below while using HttpClient. Can you let me know how to use HttpClient correctly? I am new to elastic4s.
I want to connect Scala with an SSL-configured Elasticsearch. I also want to know how I can pass SSL details such as the keystore path, truststore path, username, and password.
scala> import com.sksamuel.elastic4s.http.{HttpClient, HttpResponse}
import com.sksamuel.elastic4s.http.{HttpClient, HttpResponse}
scala> import com.sksamuel.elastic4s.http.ElasticDsl._
import com.sksamuel.elastic4s.http.ElasticDsl._
scala> val client = HttpClient(ElasticsearchClientUri(uri))
<console>:39: error: not found: value HttpClient
val client = HttpClient(ElasticsearchClientUri(uri))
HttpClient appears to be a trait in the codebase, but you are using it as if it were an object. You can check the implementation here. For your use case I think the better approach would be to use ElasticClient. The code would look something like this:
import com.sksamuel.elastic4s.http._
import com.sksamuel.elastic4s.{ElasticClient, ElasticDsl, ElasticsearchClientUri}
val client = ElasticClient(ElasticsearchClientUri(uri))
I had the same problem: in my setup I got "not found" errors when trying to use HttpClient (elastic4s-core, elastic4s-http-streams and elastic4s-client-esjava version 7.3.1 on Scala 2.12.10).
The solution: you should be able to find and use JavaClient, an implementation of HttpClient that wraps the Elasticsearch Java Rest Client.
An example of how to use the JavaClient can be found here.
Thus, your code should look like the following:
import com.sksamuel.elastic4s.http.JavaClient
import com.sksamuel.elastic4s.{ElasticClient, ElasticDsl, ElasticProperties}
...
val client = ElasticClient(JavaClient(ElasticProperties(uri)))
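Once the client is built this way, a quick smoke test could look like the sketch below (the index name is a placeholder, and the extra imports are only what this snippet itself needs):
import com.sksamuel.elastic4s.ElasticDsl._
import scala.concurrent.ExecutionContext.Implicits.global

// Run a match-all search against a placeholder index to verify connectivity
val response = client.execute {
  search("my-index").query(matchAllQuery())
}.await
println(response.result.totalHits)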

Scala Notebook Type Mismatch error when creating Event Hub for streaming tweets

I want to send messages from a Twitter application to an Azure event hub. However, I am getting an error that says:
notebook:20: error: type mismatch;
found : java.util.concurrent.ExecutorService
required: java.util.concurrent.ScheduledExecutorService
val eventHubClient = EventHubClient.create(connStr.toString(), pool)
I do not know how to call EventHubClient.create now. Please help.
I am referring to code from the link
https://learn.microsoft.com/en-us/azure/azure-databricks/databricks-stream-from-eventhubs.
Also, I have tried the solution from link:
Stream data into Azure Databricks using Event Hubs and it doesn't work for me.
The version of the cluster is 5.2 (includes Apache Spark 2.4.0, Scala 2.11) which should include the Java SE 8 libraries that have the new ScheduledExecutorService member. Also, the libraries attached are com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.9 and org.twitter4j:twitter4j-core:4.0.7, so again all the prerequisites are met.
The code is:
import java._
import java.util._
import scala.collection.JavaConverters._
import com.microsoft.azure.eventhubs._
import java.util.concurrent._
import java.util.concurrent.ExecutorService
import java.util.concurrent.ScheduledExecutorService
val pool = Executors.newFixedThreadPool(1)
val eventHubClient = EventHubClient.create(connStr.toString(), pool)
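The mismatch comes from the thread pool type: Executors.newFixedThreadPool returns a plain ExecutorService, while this overload of EventHubClient.create expects a ScheduledExecutorService. A minimal sketch of the change, assuming the rest of the code stays the same:
import java.util.concurrent.{Executors, ScheduledExecutorService}

// newScheduledThreadPool returns a ScheduledExecutorService, which matches
// the parameter type the compiler is asking for
val pool: ScheduledExecutorService = Executors.newScheduledThreadPool(1)
val eventHubClient = EventHubClient.create(connStr.toString(), pool)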

Error while using truncateTable method of JdbcUtils on a Postgres table using Spark

I am trying to truncate the table postgre_table from Spark using JdbcUtils, but it throws the error below:
<console>:71: error: value truncateTable is not a member of object org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
val trucate_table = JdbcUtils.truncateTable()
I am using the below code:
import org.apache.spark.sql.execution.datasources.jdbc._
import java.sql.DriverManager
import java.sql.Connection
val connection : Connection = DriverManager.getConnection(postgres_host + postgres_database,postgres_username,postgres_password)
val table_existing = JdbcUtils.tableExists(connection, postgres_host + postgres_database, postgre_table)
JdbcUtils.truncateTable(connection, postgres_host + postgres_database, postgre_table)
I am able to drop the table but not truncate it. I can see the truncateTable method in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
Please suggest a solution and how to use it in Databricks.
It looks like your compile-time and runtime Spark libraries are different versions. Please make sure the runtime version matches the compile-time version of Spark. The method only seems to be available from version 2.1 onwards.
It is available from this release:
https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
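A quick way to confirm which Spark version the notebook is actually running (not part of the original answer, just a sanity check):
// truncateTable only exists in Spark 2.1+, so check the runtime version first
println(spark.version) // or sc.version in shells that only expose the SparkContext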

Spark Scala error while loading BytesWritable, invalid LOC header (bad signature)

When using sbt package, I get the following error:
Spark Scala error while loading BytesWritable, invalid LOC header (bad signature)
My code is
....
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
......
object Test {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Test")
    val sc = new SparkContext(conf) // the error is caused by this line
    ......
  }
}
Please reload your JARs and/or library dependencies, as they might have been corrupted while building the jar through sbt; it could be an issue with one of their updates. A second possibility is that you have too many temp files open: check ports 4040-4049 on the master for any hanging jobs and kill them if so. You can also increase the open-files limit on Linux in /etc/security/limits.conf (hard nofile ***** and soft nofile *****), then reboot and run ulimit -n ****.
I was using spark-mllib_2.11 and it gave me the same error. I had to switch to the Scala 2.10 build of Spark MLlib to get rid of it.
Using Maven:
<artifactId>spark-mllib_2.10</artifactId>
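Since the question builds with sbt package, a hedged build.sbt equivalent of the Maven change might be (the version is deliberately left as a placeholder):
// Depend on the Scala 2.10 build of Spark MLlib explicitly
libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "<spark-version>"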