ERROR HbaseConnector: Can't get the location for replica 0 - scala

I'm trying to perform some read/write operations with HBase using Spark. When I run my Spark code with the spark-submit command
bin/spark-submit --master local[*] --class com.test.driver.Driver /home/deb/computation/target/computation-1.0-SNAPSHOT.jar "function=avg" "signals=('.tagname_qwewf')" "startTime=2018-10-10T13:51:47.135Z" "endTime=2018-10-10T14:36:11.073Z"
it executes without any error.
But when I try to do the same from IntelliJ, I get the errors below:
18/12/17 01:51:45 ERROR HbaseConnector: An exception while reading dataframe from HBase
18/12/17 01:51:45 ERROR HbaseConnector: Can't get the location for replica 0
18/12/17 01:51:45 ERROR Driver: No historical data found for signals in the expression.
Any suggestions on how to resolve this issue?
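For what it's worth, one common difference between a spark-submit run and an IDE run is that hbase-site.xml is not on the IntelliJ runtime classpath, so the HBase client falls back to its defaults (localhost) and cannot locate the meta/replica regions. A minimal sketch of setting the connection explicitly, assuming the standard HBase client configuration is used underneath HbaseConnector (the ZooKeeper hosts below are placeholders):
import org.apache.hadoop.hbase.HBaseConfiguration

val hbaseConf = HBaseConfiguration.create()
// Without hbase-site.xml on the classpath these settings default to localhost,
// which typically surfaces as "Can't get the location for replica 0".
hbaseConf.set("hbase.zookeeper.quorum", "zk-host1,zk-host2,zk-host3") // placeholder hosts
hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")
Alternatively, dropping the cluster's hbase-site.xml into the module's resources folder (so IntelliJ puts it on the classpath) achieves the same thing without code changes.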

Related

spark-submit --py-files gives warning RuntimeWarning: Failed to add file <abc.py> speficied in 'spark.submit.pyFiles' to Python path:

We have a PySpark-based application and we are doing a spark-submit as shown below. The application is working as expected; however, we are seeing a weird warning message. Is there any way to handle this, or why is it appearing?
Note: The cluster is an Azure HDI cluster.
spark-submit --master yarn --deploy-mode cluster --jars file:/<localpath>/* --py-files pyFiles/__init__.py,pyFiles/<abc>.py,pyFiles/<abd>.py --files files/<env>.properties,files/<config>.json main.py
The warning seen is:
/usr/hdp/current/spark3-client/python/pyspark/context.py:256: RuntimeWarning: Failed to add file [file:///home/sshuser/project/pyFiles/abc.py] speficied in 'spark.submit.pyFiles' to Python path:
/mnt/resource/hadoop/yarn/local/usercache/sshuser/filecache/929
  warnings.warn(
The above warning appears for all files, i.e. abc.py, abd.py, etc. (whichever are passed to --py-files).
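One workaround that is sometimes suggested for this warning (a sketch, not verified on HDInsight) is to package the Python modules into a single zip archive and pass that to --py-files instead of the individual .py files, e.g.:
zip -r pyFiles.zip pyFiles/
spark-submit --master yarn --deploy-mode cluster --jars file:/<localpath>/* --py-files pyFiles.zip --files files/<env>.properties,files/<config>.json main.py
The warning itself appears to be raised while the driver tries to copy each entry of spark.submit.pyFiles into its local Python path, so the job can still run correctly as long as the modules were shipped to the executors.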

Container killed by YARN for exceeding memory limits in Spark Scala

I am facing the below error while running my Spark Scala code using the spark-submit command.
ERROR cluster.YarnClusterScheduler: Lost executor 14 on XXXX: Container killed by YARN for exceeding memory limits. 55.6 GB of 55 GB physical memory used.
The line of code where it throws the error is below:
df.write.mode("overwrite").parquet("file")
I am writing a Parquet file. It was working until yesterday; not sure why, but from the last run it started throwing this error with the same input file.
Thanks,
Naveen
By running with the below conf in the spark-submit command, the issue was resolved and the code ran successfully.
--conf spark.dynamicAllocation.enabled=true
Thanks,
Naveen
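A hedged alternative in case the error returns: since the "55.6 GB of 55 GB physical memory used" message means the whole container (heap plus off-heap overhead) went over its YARN limit, raising the overhead allowance is another knob that is commonly tried, for example:
--conf spark.yarn.executor.memoryOverhead=4096
on top of the dynamic allocation setting above, or alternatively lowering --executor-memory slightly so heap plus overhead stays under the container size. The exact value here is an assumption, not something taken from this job.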

Getting an error in Spark Structured Streaming when using the from_json function

I am reading data from Kafka and my Spark code contains the following:
val hiveDf = parsedDf
.select(from_json(col("value"), schema).as("value"))
.selectExpr("value.*")
When I run it from IntelliJ it works, but when I run it as a jar it throws the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.functions$.from_json(Lorg/apache/spark/sql/Column;Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/Column;
My spark-submit command looks like this:
C:\spark>.\bin\spark-submit --jars C:\Users\namaagarwal\Desktop\Spark_FI\spark-sql-kafka-0-10_2.11-2.1.0.cloudera1.jar --class ClickStream C:\Users\namaagarwal\Desktop\Spark_FI\SparkStreamingFI\target\scala-2.11\sparkstreamingfi_2.11-0.1.jar
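This particular NoSuchMethodError usually indicates a compile/runtime version mismatch: the jar was built against one Spark version while spark-submit runs another that does not have that exact from_json(Column, StructType) signature. A sketch of what the sbt dependencies might look like if the cluster runs Spark 2.1.0 (versions here are assumptions; align them with whatever spark-submit --version reports):
// build.sbt
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.1.0"
)
Marking spark-sql as provided keeps a second copy of Spark's classes out of the assembly jar, so the from_json the code was compiled against is the same one the cluster loads at runtime.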

PySpark: java.sql.SQLException: No suitable driver

I have Spark code that connects to Netezza and reads a table.
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = SparkConf().setAppName("app").setMaster("yarn-client")
sc = SparkContext(conf=conf)
hc = HiveContext(sc)
nz_df = hc.load(source="jdbc", url="<address>/<dbname>;username=<username>;password=<password>", dbtable="<table>")
I do spark-submit and run the code in the following way:
spark-submit -jars nzjdbc.jar filename.py
And I get the following exception:
py4j.protocol.Py4JJavaError: An error occurred while calling o55.load.
: java.sql.SQLException: No suitable driver
Am I doing anything wrong here? Is the jar not suitable, or is Spark not able to recognize it? Please let me know the correct way if this is not it, and can anyone provide a link to get the jar for connecting to Netezza from Spark?
I am using Spark version 1.6.0.
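"No suitable driver" generally means the JDBC driver class never made it onto the driver-side classpath. Two things worth checking here (a sketch, not verified against Netezza): the flag is --jars with two dashes, and on Spark 1.6 the jar usually also needs to be on the driver classpath explicitly, e.g.:
spark-submit --jars nzjdbc.jar --driver-class-path nzjdbc.jar filename.py
Passing the driver class name in the load options (for the Netezza JDBC jar this is typically org.netezza.Driver) is another common way to let DriverManager resolve it.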

pyspark job on qubole fails with "Retrying exception reading mapper output"

I have a PySpark job running via Qubole which fails with the following error.
Qubole > Shell Command failed, exit code unknown
Qubole > 2016-12-03 17:36:53,097 ERROR shellcli.py:231 - run - Retrying exception reading mapper output: (22, 'The requested URL returned error: 404 Not Found')
Qubole > 2016-12-03 17:36:53,358 ERROR shellcli.py:262 - run - Retrying exception reading mapper logs: (22, 'The requested URL returned error: 404 Not Found')
The job is run with the following configuration:
--num-executors 38 --executor-cores 2 --executor-memory 12288M --driver-memory 4000M --conf spark.storage.memoryFraction=0.3 --conf spark.yarn.executor.memoryOverhead=1024
The cluster has 30 slave nodes; the master and slave nodes are m2.2xlarge with 4 cores each.
Any insights on the root cause of the issue would be useful.
In many cases the above error is not really the main reason for the failure. In Qubole the Spark job is submitted via a shellCli (a single mapper command which invokes the main PySpark job using spark-submit on one of the slave nodes), and since that same shellCli process invokes the driver in yarn-client mode, you can hit this issue whenever that process goes bad for any reason (e.g. memory issues with the driver).
Another, less probable, reason could be network connectivity: the Qubole tier is unable to connect to the process/slave node where this single mapper invoker job is running.
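If driver memory is the suspect, a hedged first step is simply to give the yarn-client driver more headroom than the 4000M used above (the exact numbers are guesses, not recommendations):
--driver-memory 8000M --conf spark.yarn.driver.memoryOverhead=1024
and then re-check the shellCli/driver logs on the slave node that ran the single mapper, since that is usually where the real failure is recorded.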