Cannot import Cosmosdb in databricks - scala

I set up a new cluster on Databricks using Databricks Runtime 10.1 (includes Apache Spark 3.2.0, Scala 2.12). I also installed azure_cosmos_spark_3_2_2_12_4_6_2.jar under Libraries.
I created a new notebook with Scala:
import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark.CosmosDBSpark
import com.microsoft.azure.cosmosdb.spark.config.Config
But I still get the error: object cosmosdb is not a member of package com.microsoft.azure
Does anyone know which step I'm missing?
Thanks

Looks like the imports you are doing are for the older Spark Connector (https://github.com/Azure/azure-cosmosdb-spark).
For the Spark 3.2 Connector, you might want to follow the quickstart guides: https://learn.microsoft.com/azure/cosmos-db/sql/create-sql-api-spark
The official repository is: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/cosmos/azure-cosmos-spark_3-2_2-12
Complete Scala sample: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3_2-12/Samples/Scala-Sample.scala
Here is the configuration reference: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3_2-12/docs/configuration-reference.md
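For reference, here is a minimal sketch of using the new connector; the account endpoint, key, database, and container values are placeholders you would replace with your own, and everything goes through the standard DataFrame reader/writer with the cosmos.oltp format rather than the old com.microsoft.azure.cosmosdb.spark imports:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// Placeholder connection settings -- replace with your own account values
val cosmosConfig = Map(
  "spark.cosmos.accountEndpoint" -> "https://<your-account>.documents.azure.com:443/",
  "spark.cosmos.accountKey"      -> "<your-account-key>",
  "spark.cosmos.database"        -> "<your-database>",
  "spark.cosmos.container"       -> "<your-container>"
)

// Write a small DataFrame to the container (Cosmos DB requires an "id" column)
val items = Seq(("1", "Alice"), ("2", "Bob")).toDF("id", "name")
items.write.format("cosmos.oltp").options(cosmosConfig).mode("append").save()

// Read the container back into a DataFrame
val df = spark.read.format("cosmos.oltp").options(cosmosConfig).load()
df.show()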

You may be missing the pip install step:
pip install azure-cosmos

Related

Error while running Scala code - Databricks 7.3LTS and above

I am running Databricks 7.3 LTS and getting errors while trying to use the Scala bulk copy.
The error is:
object sqldb is not a member of package com.microsoft.azure
I have installed the correct SQL connector drivers but I'm not sure how to fix this error.
The installed drivers are:
com.microsoft.azure:spark-mssql-connector_2.12:1.1.0.
I have also installed the JAR dependency as below:
spark_mssql_connector_2_12_1_1_0.jar
I couldn't find any Scala code example for the above configuration on the internet.
My Scala code sample is as below:
%scala
import com.microsoft.azure.sqldb.spark.config.Config
As soon as I run this command I get the error:
object sqldb is not a member of package com.microsoft.azure
Any help please.
In the new connector you need to use the com.microsoft.sqlserver.jdbc.spark.SQLServerBulkJdbcOptions class to specify bulk copy options.
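As a rough sketch (not from the original thread), a bulk write with the new connector is typically driven through the DataFrame writer using the com.microsoft.sqlserver.jdbc.spark format, with bulk copy settings such as tableLock and batchsize passed as writer options. The server, database, table, and credential values below are placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// Small example DataFrame to bulk-load (placeholder data)
val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

// Placeholder connection string -- replace with your own server and database
val url = "jdbc:sqlserver://<your-server>.database.windows.net:1433;databaseName=<your-db>"

df.write
  .format("com.microsoft.sqlserver.jdbc.spark")
  .mode("overwrite")
  .option("url", url)
  .option("dbtable", "dbo.MyTable")      // placeholder target table
  .option("user", "<your-user>")         // placeholder credentials
  .option("password", "<your-password>")
  .option("tableLock", "true")           // bulk copy option: take a table lock during the load
  .option("batchsize", "100000")         // bulk copy option: rows per insert batch
  .save()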

How to install kafka module in pyspark

The problem I have when importing KafkaUtils is:
No module named 'pyspark.streaming.kafka'. But I don't know how to install the kafka module.
I use Python 3.6.8, Spark 2.2.0 and kafka_2.12-2.5.0.
As it turns out, KafkaUtils is being deprecated and replaced with Spark Structured Streaming, which means you have two paths forward:
Redesign your application to use Structured Streaming instead (see https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html for a primer and the sketch below).
Downgrade your version of Spark to a version that still includes KafkaUtils as part of the distribution (you'll find that KafkaUtils won't need to be installed separately).
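For the first option, a minimal Structured Streaming read from Kafka looks roughly like the following (shown in Scala to match the rest of this page; the PySpark API is analogous). The broker address and topic name are placeholders, and the spark-sql-kafka-0-10 package matching your Spark version must be on the classpath:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("KafkaStructuredStreamingExample").getOrCreate()

// Read from Kafka as a streaming DataFrame (requires the spark-sql-kafka-0-10 package)
val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder broker
  .option("subscribe", "my-topic")                      // placeholder topic
  .load()

// Kafka keys and values arrive as binary, so cast them to strings
val messages = kafkaDf.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

// Write the stream to the console for a quick check
val query = messages.writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()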

Failed to %AddDeps HBase 1.3.1 from Jupyter-Toree-Scala

I'm using this Jupyter Toree notebook in a Docker container (https://github.com/jupyter/docker-stacks/tree/master/all-spark-notebook).
I tried to add the HBase dependency with this %AddDeps command in the notebook:
%AddDeps org.apache.hbase hbase 1.3.1 --transitive --verbose
All dependencies seem to be found, yet I still get this output (null error?):
Magic AddDeps failed to execute with error:
null
I can't import org.apache.hadoop.hbase afterwards, which means the library wasn't installed. I'd really appreciate any advice, thanks.
I solved this issue. I had imported the wrong artifact name. It should have been: %AddDeps org.apache.hbase hbase-client 1.2.0 --transitive
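Once the hbase-client artifact resolves, a quick smoke test along these lines should work; the ZooKeeper quorum, table name, row key, and column names are placeholders:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes

// Point the client at your cluster (placeholder quorum; normally picked up from hbase-site.xml)
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "localhost")

val connection = ConnectionFactory.createConnection(conf)
val table = connection.getTable(TableName.valueOf("my_table"))  // placeholder table

// Fetch a single cell to verify connectivity (placeholder row key, column family and qualifier)
val result = table.get(new Get(Bytes.toBytes("row1")))
println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))))

table.close()
connection.close()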

SparkContext cannot be initialized in 'yarn-client' mode called from Scala-IDE

I have installed the Cloudera VM (single node) and inside this VM I have Spark running on top of YARN. I would like to use the Eclipse IDE (with the Scala plugin) for testing/learning with Spark.
If I instantiate the SparkContext as follows, everything works as expected:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
val sparkConf = new SparkConf().setAppName("TwitterPopularTags").setMaster("local[2]")
val sc = new SparkContext(sparkConf)
However, if I now want to connect to the local cluster by changing the master to 'yarn-client', it does not work:
val master = "yarn-client"
val sparkConf = new SparkConf().setAppName("TwitterPopularTags").setMaster(master)
val sc = new SparkContext(sparkConf)
Specifically, I'm getting the following errors:
Error details displayed in the Eclipse console:
Error details from the NodeManager logs:
Here are the things I have tried so far:
1. Dependencies
I added all the dependencies through the Maven repository.
The Cloudera version is 5.5, the corresponding Hadoop version is 2.6.0 and the Spark version is 1.5.0.
2. Configurations
I added 3 path variables to the Eclipse classpath:
SPARK_CONF_DIR=/etc/spark/conf/
HADOOP_CONF_DIR=/usr/lib/hadoop/
YARN_CONF_DIR=/etc/hadoop/conf.cloudera.yarn/
Can anybody clarify what the problem is here and how to solve it?
I worked around it! I still don't understand what the exact problem is, but I created a folder with my username in HDFS, i.e. the /user/myusername directory, and it worked. Anyway, I have now switched to the Hortonworks distribution and found it much smoother to get started with than the Cloudera distribution.
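The per-user directory is normally created with the HDFS command line; as a rough programmatic equivalent in Scala (using the Hadoop FileSystem API, with /user/myusername as a placeholder and assuming the cluster configuration is on the classpath, e.g. via HADOOP_CONF_DIR):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Picks up fs.defaultFS from the cluster configuration on the classpath
val conf = new Configuration()
val fs = FileSystem.get(conf)

// Create the per-user home directory that Spark-on-YARN jobs expect (placeholder username)
val userDir = new Path("/user/myusername")
if (!fs.exists(userDir)) {
  fs.mkdirs(userDir)
}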

Eclipse says "apache is not a member of package org"

I am using Scala on Eclipse Luna and trying to connect to Cassandra. My code shows the error object apache is not a member of package org on the following line:
import org.apache.spark.SparkConf
I have already imported the Scala and Spark libraries into the project. Does anyone know how I can make my program import the Spark libraries?