I have Spark code which connects to Netezza and reads a table.
conf = SparkConf().setAppName("app").setMaster("yarn-client")
sc = SparkContext(conf=conf)
hc = HiveContext(sc)
nz_df = hc.load(source="jdbc", url="jdbc:netezza://<address>/<dbname>;username=;password=", dbtable="")
I run the code with spark-submit in the following way:
spark-submit -jars nzjdbc.jar filename.py
And I get the following exception:
py4j.protocol.Py4JJavaError: An error occurred while calling o55.load.
: java.sql.SQLException: No suitable driver
Am I doing anything wrong here? Is the jar not suitable, or is Spark not able to recognize the jar? Please let me know the correct way if this is not it, and can anyone provide a link to the jar for connecting to Netezza from Spark?
I am using Spark 1.6.0.
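The "No suitable driver" error usually means the Netezza JDBC driver class is never registered on the driver's classpath, even though the jar is shipped. A minimal sketch of one way to make the driver explicit, using the Spark 1.6 DataFrame reader API (the pieces in angle brackets are placeholders, not values from the question; org.netezza.Driver is the driver class shipped in nzjdbc.jar):
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = SparkConf().setAppName("app").setMaster("yarn-client")
sc = SparkContext(conf=conf)
hc = HiveContext(sc)

# The "driver" option forces Spark to register the Netezza JDBC driver class.
nz_df = hc.read.format("jdbc").options(
    url="jdbc:netezza://<address>:5480/<dbname>",
    user="<username>",
    password="<password>",
    dbtable="<table>",
    driver="org.netezza.Driver"
).load()
Submitting with the jar on both the driver and executor classpaths also helps:
spark-submit --jars nzjdbc.jar --driver-class-path nzjdbc.jar filename.py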
I am invoking spark-shell like this
spark-shell --jars kafka-clients-0.10.2.1.jar,spark-sql-kafka-0-10_2.11-2.3.0.cloudera1.jar,spark-streaming-kafka-0-10_2.11-2.3.0.jar,spark-avro_2.11-2.4.0.jar,avro-1.9.1.jar
After that I read from a Kafka topic using readStream():
val df = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "kafka-1.x:9093,kafka-2.x:9093,kafka-0.x:9093")
  .option("kafka.security.protocol", "SASL_SSL")
  .option("kafka.ssl.protocol", "TLSv1.2")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("kafka.sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"token\" password=\"XXXXXXXXXX\";")
  .option("subscribe", "test-topic")
  .option("startingOffsets", "latest")
  .load()
Then I read the Avro schema file:
val jsonFormatSchema = new String(Files.readAllBytes(Paths.get("/root/avro_schema.json")))
Then I build the DataFrame that matches the Avro schema:
val DataLineageDF = df.select(from_avro(col("value"),jsonFormatSchema).as("DataLineage")).select("DataLineage.*")
This throws an error:
java.lang.NoSuchMethodError: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
I could fix this problem by replacing the jar spark-avro_2.11-2.4.0.jar with spark-avro_2.11-2.4.0-palantir.31.jar.
Issue:
DataLineageDF.writeStream.format("console").outputMode("append").trigger(Trigger.ProcessingTime("10 seconds")).start
fails with this error:
Exception in thread "stream execution thread for [id = ad836d19-0f29-499a-adea-57c6d9c630b2, runId = 489b1123-a2b2-48ea-9d24-e6744e0959b0]" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.boxedType(Lorg/apache/spark/sql/types/DataType;)Ljava/lang/String;
which seems to be related to incompatible jars. If anyone has any idea what's going wrong, please comment.
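For what it's worth, both NoSuchMethodErrors look like classpath version skew rather than a coding problem: spark-avro_2.11-2.4.0 is compiled against Spark 2.4 internals, while spark-sql-kafka-0-10_2.11-2.3.0.cloudera1 targets Spark 2.3, and an older Avro already on the cluster classpath can shadow avro-1.9.1. A sketch of a launch where every artifact tracks a single Spark version (assuming the cluster itself runs Spark 2.4.x; adjust the version to whatever spark-submit --version reports):
spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0,org.apache.spark:spark-avro_2.11:2.4.0
Spark 2.4 already bundles a compatible Avro in its own jars, so avro-1.9.1 should not need to be pinned by hand.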
I'm using the tech stack below and trying to connect to Phoenix tables with PySpark. I downloaded the following jars from the URL below and tried executing the code that follows. In the logs the connection to HBase is established, but the console just hangs without doing anything. Please let me know if anybody has encountered and fixed a similar issue.
https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-spark/4.11.0-HBase-1.2
jars:
phoenix-spark-4.11.0-HBase-1.2.jar
phoenix-client.jar
Tech stack (all running on the same host):
Apache Spark 2.2.0
HBase 1.2
Phoenix 4.11.0
I copied hbase-site.xml to /spark/conf/hbase-site.xml.
Command executed:
usr/local/spark> spark-submit phoenix.py --jars /usr/local/spark/jars/phoenix-spark-4.11.0-HBase-1.2.jar --jars /usr/local/spark/jars/phoenix-client.jar
Phoenix.py:
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = SparkConf().setAppName("pysparkPhoenixLoad").setMaster("local")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
df = sqlContext.read.format("org.apache.phoenix.spark").option("table",
"schema.table1").option("zkUrl", "localhost:2181").load()
df.show()
Error log: the HBase connection is established, but the console hangs and a timeout error is thrown:
18/07/30 12:28:15 WARN HBaseConfiguration: Config option "hbase.regionserver.lease.period" is deprecated. Instead, use "hbase.client.scanner.timeout.period"
18/07/30 12:28:54 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=38367 ms ago, cancelled=false, msg=row 'SYSTEM:CATALOG,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=master01,16020,1532591192223, seqNum=0
Take a look at these answers:
phoenix jdbc doesn't work, no exceptions and stuck
HBase Java client - unknown host: localhost.localdomain
Both of the issues happened in Java (with JDBC), but it looks like it's a similar issue here.
Try adding the ZooKeeper hostname (master01, as I see in the error message) to your /etc/hosts:
127.0.0.1 master01
if you are running all your stack locally.
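If editing /etc/hosts is not an option, a related thing to try is pointing zkUrl at the hostname the cluster actually advertises instead of localhost. A sketch reusing the code from the question, with master01 taken from the error message (adjust to your real ZooKeeper quorum):
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("pysparkPhoenixLoad").setMaster("local")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Use the hostname advertised by the cluster (master01 in the log) rather than localhost
df = sqlContext.read.format("org.apache.phoenix.spark") \
    .option("table", "schema.table1") \
    .option("zkUrl", "master01:2181") \
    .load()
df.show()
Also note that spark-submit options such as --jars must come before the application file; everything after phoenix.py is passed to the script as arguments, and multiple jars belong in one comma-separated --jars list.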
I am using Spark 2.0.2 and Cassandra 3.11.2. I am using this code but it gives me a connection error.
./spark-shell --jars ~/spark/spark-cassandra-connector/spark-cassandra-connector/target/full/scala-2.10/spark-cassandra-connector-assembly-2.0.5-121-g1a7fa1f8.jar
import com.datastax.spark.connector._
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
val test = sc.cassandraTable("sensorkeyspace", "sensortable")
test.count
When I enter the test.count command it gives me this error.
java.io.IOException: Failed to open native connection to Cassandra at {127.0.0.1}:9042
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:168)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$8.apply(CassandraConnector.scala:154)
Can you check the cassandra.yaml file? It may be that more concurrent connections are being opened than are allowed at any instance of time.
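For reference, these are the cassandra.yaml settings usually worth checking for this kind of native-connection failure (the values shown are common Cassandra 3.x defaults, not values taken from the question):
start_native_transport: true
native_transport_port: 9042
rpc_address: localhost
# -1 means unlimited; a low value here would cap concurrent client connections
native_transport_max_concurrent_connections: -1
Also note that a SparkConf created inside spark-shell never reaches the already-running SparkContext, so passing --conf spark.cassandra.connection.host=localhost on the spark-shell command line is the more reliable way to set the connection host.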
I have a Scala Spark application that I'm trying to run on a Linux server using a shell script. I am getting the error:
Exception in thread "main" java.lang.IllegalArgumentException: Error
while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
However, I don't understand what is wrong. I am doing this to instantiate Spark:
val sparkConf = new SparkConf().setAppName("HDFStoES").setMaster("local")
val spark: SparkSession = SparkSession.builder.enableHiveSupport().config(sparkConf).getOrCreate()
Am I doing this correctly? If so, what could be the error?
sparkSession = SparkSession.builder().appName("Test App").master("local[*]")
.config("hive.metastore.warehouse.dir", hiveWareHouseDir)
.config("spark.sql.warehouse.dir", hiveWareHouseDir).enableHiveSupport().getOrCreate();
Use the above; you need to specify the "hive.metastore.warehouse.dir" directory to enable Hive support in the Spark session.
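For anyone hitting the same error from PySpark, the equivalent setup looks roughly like this (a sketch; the warehouse path is a placeholder, not a value from the question):
from pyspark.sql import SparkSession

hiveWareHouseDir = "/user/hive/warehouse"  # placeholder, use the warehouse path for your environment
spark = SparkSession.builder \
    .appName("Test App") \
    .master("local[*]") \
    .config("hive.metastore.warehouse.dir", hiveWareHouseDir) \
    .config("spark.sql.warehouse.dir", hiveWareHouseDir) \
    .enableHiveSupport() \
    .getOrCreate()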
I am new to Spark. I am running the query below and it is failing with this error:
val cop_raw = sqlContext.sql("select * from cop.p_id")
cop_raw.show(5)
java.io.IOException:
shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException:
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to compile query:
org.apache.hadoop.hive.ql.parse.ParseException: line 1:400
Failed to recognize predicate 'date'.
Failed rule: 'identifier' in table or column identifier
Can somebody suggest how to fix it?
I can see that setting the property below can fix the issue, but I am not sure how to run this command in Zeppelin when the Hive interpreter is not set.
SET hive.support.sql11.reserved.keywords=false
Have you tried:
sqlContext.sql("SET hive.support.sql11.reserved.keywords=false;")
For me this works in Spark 2:
val spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("SET hive.support.sql11.reserved.keywords=false;")