createOrReplaceTempView is not a member of org.apache.spark.rdd.RDD - scala

I am using Hadoop 2.7.2, HBase 1.4.9, Spark 2.2.0, Scala 2.11.8 and Java 1.8.
I run this command without getting any error:
val Patterns_fromHbase = mimic_PatternsFromHbase
  .mapPartitions(f => f.map(row1 => (
    Bytes.toString(row1._2.getRow),
    Bytes.toString(row1._2.getValue(Bytes.toBytes("sepsiscategories"), Bytes.toBytes("subject_id")))
  )))
  .toDF("id", "subject_id")
Then I run this command:
mimic_PatternsFromHbase.createOrReplaceTempView("subject_id_table")
and I get this error:
<console>:57: error: value createOrReplaceTempView is not a member of
org.apache.spark.rdd.RDD[(org.apache.hadoop.hbase.io.ImmutableBytesWritable,
org.apache.hadoop.hbase.client.Result)]
mimic_PatternsFromHbase.createOrReplaceTempView("subject_id_table")
What is the cause of this error and how can I fix it?

I found my mistake; it was just carelessness.
Instead of calling the createOrReplaceTempView method on Patterns_fromHbase (the DataFrame), I called it on mimic_PatternsFromHbase (the RDD).
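For reference, a minimal sketch of the corrected call; it assumes the Patterns_fromHbase DataFrame built above and an active SparkSession named spark:
// createOrReplaceTempView is defined on Dataset/DataFrame, not on RDD,
// so the view must be registered on the result of toDF, not on the raw RDD.
Patterns_fromHbase.createOrReplaceTempView("subject_id_table")

// The registered view can then be queried with Spark SQL:
val subjects = spark.sql("SELECT id, subject_id FROM subject_id_table")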

Related

Pyspark error: Cannot load class when registering a function, make sure it is on the classpath

I am trying to run the code below in a Python notebook on Anaconda, but I am getting an error:
from pyspark.sql import SparkSession
spark = SparkSession.builder.enableHiveSupport().appName('test').getOrCreate()
spark.sql("SET spark.hadoop.hive.mapred.supports.subdirectories=true")
spark.sql("SET mapreduce.input.fileinputformat.input.dir.recursive=true")
spark.sql("create temporary function ptyUnprotectStr as 'com.protegrity.hive.udf.ptyUnprotectStr'")
Running the above code produces:
AnalysisException: "Can not load class 'com.protegrity.hive.udf.ptyUnprotectStr' when regisitering the function 'ptyUnprotectStr' please make sure it is on the classpath"
How can I resolve it?

How to run sample Scala vertx project

I started a new project using the sbt new vert-x3/vertx-scala.g8 command. In the sbt console I entered the following command:
vertx.deployVerticle(nameForVerticle[HttpVerticle])
The following error is reported:
vertx.deployVerticle(nameForVerticle[HttpVerticle])
<console>:12: error: not found: value vertx
vertx.deployVerticle(nameForVerticle[HttpVerticle])
^
<console>:12: error: not found: value nameForVerticle
vertx.deployVerticle(nameForVerticle[HttpVerticle])
^
<console>:12: error: not found: type HttpVerticle
I followed the steps specified on this page: https://github.com/vert-x3/vertx-sbt-starter
How do I get the sample project running?
I think the g8 template is a little bit broken. I made it work with the tricks below:
Use the latest sbt version, 1.2.8, in the file project/build.properties (see the snippet at the end of this answer).
When you run console, import the HttpVerticle class manually. In my case the class name is test.HttpVerticle, because I used 'test' as the package name when running the sbt new command to initialise the project:
scala> import test.HttpVerticle
scala> vertx.deployVerticle(nameForVerticle[HttpVerticle])
// This output will be printed a moment later:
scala> Thread Thread[vert.x-eventloop-thread-0,5,run-main-group-0] has been blocked for 2377 ms, time limit is 2000
Thread Thread[vert.x-eventloop-thread-0,5,run-main-group-0] has been blocked for 3378 ms, time limit is 2000
Thread Thread[vert.x-eventloop-thread-0,5,run-main-group-0] has been blocked for 4384 ms, time limit is 2000
And then try to trigger the server:
curl http://localhost:8666/hello
It should reply with "world".
Again, regarding the class name: if you did not use a package name when running the sbt new initialisation process, then just import the class directly: import HttpVerticle
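For completeness, the sbt version pin from the first trick is a one-line change (1.2.8 being the version suggested above):
# project/build.properties
sbt.version=1.2.8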

Assertion Error while using Scala future with Spark

I have been running Spark code inside a Scala Future.
I am facing a strange error like Assertion Failed <None>.
It comes wrapped in a Boxed Error.
Each time the stack trace starts at the sqlContext.udf.register line.
When I put a synchronized block around the udf register statement, the error goes away.
Scala version - 2.10.8
Spark - 1.6.0
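For illustration, here is a minimal sketch of the synchronized workaround described above; the lock object, UDF name and function body are placeholders, and it assumes a sqlContext and the global ExecutionContext are in scope:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical lock shared by every thread that registers UDFs.
object UdfRegistrationLock

val work = Future {
  // Serialising the registration avoids the boxed AssertionError seen
  // when several futures call sqlContext.udf.register concurrently.
  UdfRegistrationLock.synchronized {
    sqlContext.udf.register("toUpper", (s: String) => s.toUpperCase)
  }
  // ... rest of the Spark work that uses the UDF goes here ...
}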

The initialization of the DataSource's outputs caused an error: The UDF class is not a proper subclass

I have this issue:
The initialization of the DataSource's outputs caused an error: The UDF class is not a proper subclass of org.apache.flink.api.common.functions.MapFunction
generated by this code:
val probes: DataSet[Probe] = env.createInput[InputProbe](new ProbesInputFormat).map { i =>
  new Probe(
    i.rssi,
    0,
    i.macHash,
    i.deviceId,
    0,
    i.timeStamp)
}
I'm using Scala 2.11 on Flink 1.4.0 with IDEA.
On my dev machine I have no issue and the job runs properly, while on a Flink standalone cluster of 3 nodes I encounter the above error.
Can you help me please?
UPDATE:
I resolved it by implementing a class that extends RichMapFunction; I don't know why, but it seems that lambda functions (=>) are not supported properly.
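For reference, a minimal sketch of that RichMapFunction-based replacement; it reuses the question's InputProbe and Probe types and assumes the same imports as the original snippet:
import org.apache.flink.api.common.functions.RichMapFunction

// Explicit mapper class used instead of the Scala lambda passed to map { ... }.
class ProbeMapper extends RichMapFunction[InputProbe, Probe] {
  override def map(i: InputProbe): Probe =
    new Probe(i.rssi, 0, i.macHash, i.deviceId, 0, i.timeStamp)
}

val probes: DataSet[Probe] =
  env.createInput[InputProbe](new ProbesInputFormat).map(new ProbeMapper)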
Now I have a new issue:
java.lang.ClassCastException: hk.huko.aps2.entities.Registry cannot be cast to scala.Product
Should I open a new post?
I resolved the issue. It happened because Flink loads my job JAR multiple times (classloader issues), and somehow that produced the error.
The solution is not to create a fat JAR that bundles all external dependencies, but to copy those libraries into the flink/lib folder along with your job JAR.

Error while using truncateTable method of JDBCUtils in PostgreTable using Spark

I am trying to truncate the table postgre_table from Spark using JdbcUtils, but it throws the error below:
<console>:71: error: value truncateTable is not a member of object org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
val trucate_table = JdbcUtils.truncateTable()
I am using the below code:
import org.apache.spark.sql.execution.datasources.jdbc._
import java.sql.DriverManager
import java.sql.Connection
val connection : Connection = DriverManager.getConnection(postgres_host + postgres_database,postgres_username,postgres_password)
val table_existing = JdbcUtils.tableExists(connection, postgres_host + postgres_database, postgre_table)
JdbcUtils.truncateTable(connection, postgres_host + postgres_database, postgre_table)
I am able to drop the table but not truncate it. I can see the truncateTable method in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
Please suggest a solution and how to use it in Databricks.
It looks like your compile-time and runtime Spark libraries are different versions. Please make sure the runtime version matches the compile-time version of Spark. The method appears to be available only from version 2.1 onwards.
It is available from this release:
https://github.com/apache/spark/blob/branch-2.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
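If upgrading the runtime is not immediately possible, here is a minimal workaround sketch that avoids the internal JdbcUtils helper and truncates through plain JDBC instead; it reuses the connection and postgre_table values from the question and assumes the usual spark session variable available in Databricks:
// Confirm which Spark version the cluster is actually running.
println(spark.version)

// Fallback: truncate directly over the existing JDBC connection.
val stmt = connection.createStatement()
stmt.executeUpdate(s"TRUNCATE TABLE $postgre_table")
stmt.close()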