Failed to %AddDeps HBase 1.3.1 from Jupyter-Toree-Scala - scala

I'm using this Jupyter Toree notebook in a Docker container (https://github.com/jupyter/docker-stacks/tree/master/all-spark-notebook).
I tried to add the HBase dependency with this %AddDeps command in the notebook:
%AddDeps org.apache.hbase hbase 1.3.1 --transitive --verbose
All dependencies seem to be found, yet I still get this output (null error?):
Magic AddDeps failed to execute with error:
null
I can't import org.apache.hadoop.hbase afterwards, which means the library hasn't actually been loaded. I'd really appreciate any advice, thanks.

I solved this issue: I was using the wrong artifact name. It should have been %AddDeps org.apache.hbase hbase-client 1.2.0 --transitive
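For anyone hitting the same thing, here is a minimal notebook sketch with the corrected artifact; the HBaseConfiguration and ConnectionFactory imports are just an illustrative way to confirm the jars actually loaded:

%AddDeps org.apache.hbase hbase-client 1.2.0 --transitive

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

val conf = HBaseConfiguration.create() // succeeds only if the dependency resolved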

Related

Error while running Scala code - Databricks 7.3LTS and above

I am running Databricks 7.3 LTS and hitting errors while trying to use the Scala bulk copy.
The error is:
object sqldb is not a member of package com.microsoft.
I have installed the correct SQL connector drivers but am not sure how to fix this error.
The installed drivers are:
com.microsoft.azure:spark-mssql-connector_2.12:1.1.0.
I have also installed the JAR dependency as below:
spark_mssql_connector_2_12_1_1_0.jar
I couldn't find any Scala code example for the above configuration on the internet.
My Scala code sample is as follows:
%scala
import com.microsoft.azure.sqldb.spark.config.Config
As soon as I run this command I get the error:
Object sqldb is not a member of package com.microsoft.azure
Any help, please?
The import in your sample targets the old azure-sqldb-spark connector, not the spark-mssql-connector you installed. In the new connector you need to use the com.microsoft.sqlserver.jdbc.spark.SQLServerBulkJdbcOptions class to specify bulk copy options.
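A minimal write sketch, assuming the spark-mssql-connector_2.12:1.1.0 jar mentioned above is attached; the server, database, table, and credential values are placeholders, and the bulk-copy behaviour is driven by writer options:

// df is an existing DataFrame to bulk-insert
df.write
  .format("com.microsoft.sqlserver.jdbc.spark")
  .mode("append")
  .option("url", "jdbc:sqlserver://<server>.database.windows.net;databaseName=<db>")
  .option("dbtable", "dbo.<table>")
  .option("user", "<user>")
  .option("password", "<password>")
  .option("tableLock", "true")    // take a table lock for faster bulk insert
  .option("batchsize", "100000")  // rows per bulk-copy batch
  .save()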

Cannot import Cosmosdb in databricks

I set up a new cluster on Databricks using Databricks Runtime 10.1 (includes Apache Spark 3.2.0, Scala 2.12). I also installed azure_cosmos_spark_3_2_2_12_4_6_2.jar under Libraries.
I created a new Scala notebook with:
import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark.CosmosDBSpark
import com.microsoft.azure.cosmosdb.spark.config.Config
But I still get the error: object cosmosdb is not a member of package com.microsoft.azure
Does anyone know which step I'm missing?
Thanks
It looks like the imports you are using are for the older Spark connector (https://github.com/Azure/azure-cosmosdb-spark).
For the Spark 3.2 Connector, you might want to follow the quickstart guides: https://learn.microsoft.com/azure/cosmos-db/sql/create-sql-api-spark
The official repository is: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/cosmos/azure-cosmos-spark_3-2_2-12
Complete Scala sample: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3_2-12/Samples/Scala-Sample.scala
Here is the configuration reference: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3_2-12/docs/configuration-reference.md
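With the new connector, reads go through the cosmos.oltp data source rather than the old com.microsoft.azure.cosmosdb package. A minimal sketch, assuming the azure-cosmos-spark_3-2_2-12 jar is attached to the cluster; the endpoint, key, database, and container values are placeholders:

val cosmosConfig = Map(
  "spark.cosmos.accountEndpoint" -> "https://<account>.documents.azure.com:443/",
  "spark.cosmos.accountKey"      -> "<account-key>",
  "spark.cosmos.database"        -> "<database>",
  "spark.cosmos.container"       -> "<container>"
)

// Read the container through the new connector's data source
val df = spark.read.format("cosmos.oltp").options(cosmosConfig).load()
df.printSchema()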
You may be missing the pip install step:
pip install azure-cosmos

could not access AnyRef in package Scala

I'm using Apache Toree (the version from GitHub). When I try to execute a query against a PostgreSQL table, I get intermittent Scala compiler errors (when I run the same cell twice, the errors are gone and the code runs fine).
I am looking for advice on how to debug these errors. The errors look weird (they appear in the notebook, not on stdout).
error: missing or invalid dependency detected while loading class file 'QualifiedTableName.class'.
Could not access type AnyRef in package scala,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'QualifiedTableName.class' was compiled against an incompatible version of scala.
error: missing or invalid dependency detected while loading class file 'FunctionIdentifier.class'.
Could not access type AnyRef in package scala,
because it (or its dependencies) are missing. Check your build definition for
missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
A full rebuild may help if 'FunctionIdentifier.class' was compiled against an incompatible version of scala.
error: missing or invalid dependency detected while loading class file 'DefinedByConstructorParams.class'.
...
The code is simple: extract a dataset from a postgres table:
%AddDeps org.postgresql postgresql 42.1.4 --transitive
val props = new java.util.Properties()
props.setProperty("driver", "org.postgresql.Driver")
val df = spark.read.jdbc(
  url = "jdbc:postgresql://postgresql/database?user=user&password=password",
  table = "table",
  predicates = Array("1=1"),
  connectionProperties = props)
df.show()
I checked the obvious (both Toree and Apache Spark use Scala 2.11.8, and I built Apache Toree with APACHE_SPARK_VERSION=2.2.0, which matches the Spark I downloaded).
For reference, this is the part of the Dockerfile I used to set up Toree and Spark:
RUN wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz && tar -zxf spark-2.2.0-bin-hadoop2.7.tgz && chmod -R og+rw /opt/spark-2.2.0-bin-hadoop2.7 && chown -R a1414.a1414 /opt/spark-2.2.0-bin-hadoop2.7
RUN (curl https://bintray.com/sbt/rpm/rpm > /etc/yum.repos.d/bintray-sbt-rpm.repo)
RUN yum -y install --nogpgcheck sbt
RUN (unset http_proxy; unset https_proxy; yum -y install --nogpgcheck java-1.8.0-openjdk-devel.i686)
RUN (git clone https://github.com/apache/incubator-toree && cd incubator-toree && make clean release APACHE_SPARK_VERSION=2.2.0 ; exit 0)
RUN (. /opt/rh/rh-python35/enable; cd /opt/incubator-toree/dist/toree-pip ;python setup.py install)
RUN (. /opt/rh/rh-python35/enable; jupyter toree install --spark_home=/opt/spark-2.2.0-bin-hadoop2.7 --interpreters=Scala)
I had a similar issue, but it appeared to resolve itself by merely reevaluating the cell in the Jupyter notebook, or by restarting the kernel and then reevaluating the cell. Annoying.
As said in cchantep's comment, you are probably using a different Scala version than the one used to build Spark.
The easiest solution is to check which one Spark uses and switch to it, e.g. on Mac:
brew switch scala 2.11.8
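To verify this from inside the notebook, a quick sketch; compare the printed kernel version with the scala-library jar sitting under $SPARK_HOME/jars:

println(scala.util.Properties.versionString) // Scala version of the running Toree kernel
println(spark.version)                       // Spark version the kernel is bound to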

XGBoost4J successfully built but can't import ml.dmlc.xgboost4j.java.{DMatrix => JDMatrix}

I carefully followed the installation instructions from here, taking care to set JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home/
The installation through mvn package returns BUILD SUCCESS. However, when I tried to run BasicWalkThrough.scala from the example folder, I got an error on
<console>:14: error: not found: value ml
import ml.dmlc.xgboost4j.java.{DMatrix => JDMatrix}
Can anybody help to fix this please?
OS: OSX 10.12
You need to run spark-shell with the --conf option pointing to the XGBoost jar you built:
./bin/spark-shell --conf spark.jars=/xgboost/jvm-packages/xgboost4j-example/target/xgboost4j-example-0.8-SNAPSHOT-jar-with-dependencies.jar
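Once the shell comes up with that jar on its classpath, the imports from the walkthrough should resolve, e.g.:

// both the Java and Scala DMatrix classes are packaged in the jar built above
import ml.dmlc.xgboost4j.java.{DMatrix => JDMatrix}
import ml.dmlc.xgboost4j.scala.DMatrix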

(run-main-0) java.lang.NoSuchMethodError

I ran into a problem when using sbt to run a Spark job. The compile finishes, but when I execute the run command I get the error below:
[error] (run-main-0) java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
at akka.actor.ActorCell$.<init>(ActorCell.scala:305)
at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
at akka.actor.RootActorPath.$div(ActorPath.scala:152)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:465)
at akka.remote.RemoteActorRefProvider.<init>(RemoteActorRefProvider.scala:124)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
at scala.util.Try$.apply(Try.scala:191)
Does anyone know what I should do?
I hit the same error when I used the scala-library 2.11 jar, but when I replaced it with the scala-library 2.10 jar, it ran fine.
It is probably caused by using incompatible versions of Scala. When I downgraded from Scala 2.11 to 2.10, I forgot to modify one package's version (so one package used 2.11 and the rest 2.10), which produced the same error.
Note: I only had this problem when using IntelliJ.
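A minimal build.sbt sketch that keeps every dependency on a single Scala line (the versions below are illustrative): the %% operator appends the scalaVersion suffix automatically, so a stray _2.11 artifact can't slip in next to _2.10 ones.

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core" % "1.6.3" % "provided",
  "com.typesafe.akka" %% "akka-actor" % "2.3.11"
)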
If you are getting this error and are here because you cannot run Jupyter notebooks with Spark 2.1 and Scala 2.11, below is how I was able to make it work. This assumes you have installed Jupyter and Toree.
Pre-reqs:
Make sure Docker is running, otherwise make fails.
Make sure gpg is installed, otherwise make fails.
Build steps -
export SPARK_HOME=/Users/<path>/spark-2.1.0-hadoop2.7/
git clone https://github.com/apache/incubator-toree.git
cd incubator-toree
make clean release APACHE_SPARK_VERSION=2.1.0
pip install --upgrade ./dist/toree-pip/toree-0.2.0.dev1.tar.gz
pip freeze |grep toree
jupyter toree install --spark_home=$SPARK_HOME
To start the notebook:
SPARK_OPTS='--master=local[4]' jupyter notebook
I used these versions and everything works now.
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.11.6</version>
</dependency>
<dependency>
<groupId>com.typesafe.akka</groupId>
<artifactId>akka-actor_2.11</artifactId>
<version>2.3.11</version>
</dependency>
Check whether the Scala version you are using corresponds to the precompiled Spark version.
The issue could be reproduced with version 2.11.8.
These days no downgrade is required; just update the scala-library version to 2.12.0.
I had exactly the same problem and fixed it by downgrading Scala from 2.11.8 to 2.10.6.
I have the same issue, but where do I change the scala-library version?
Installation (on Ubuntu 16.04):
sudo apt-get install oracle-java8-installer
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.7.tgz && tar xvf spark-2.0.2-bin-hadoop2.7.tgz
pip install toree && jupyter toree install
So when I start a notebook it tells me that I am using a different Scala version, but I haven't installed anything else.
[screenshot showing the Scala version]
My Spark jars folder contains a scala-library-2.11.8.jar file, but how do I tell Toree to use that (or another) file for Scala?
For me neither Scala 2.11 nor 2.12 worked; downgrading to 2.10.3 did.