Run Scala code on a remote server with IntelliJ IDEA - scala

I downloaded the Cloudera QuickStart VM and I want to run some Scala code. I found a Cloudera tutorial for running Scala code in the shell.
Now I want to use an IDE (like IntelliJ IDEA), connect to the Spark server, and run my code there. I found this tutorial, but it didn't work for me: I cannot find the interface it describes.
Thanks for any suggestions!
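One approach that does not depend on a particular IDE wizard is to set the master URL in the application itself and run it from IntelliJ as a plain Scala program. A minimal sketch, assuming the QuickStart VM's standalone Spark master is reachable at spark://quickstart.cloudera:7077 (host and port are assumptions; check the Spark master web UI on the VM for the real spark:// URL):

import org.apache.spark.{SparkConf, SparkContext}

object RemoteSparkTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("RemoteSparkTest")
      // Hypothetical master URL; replace with the one your VM reports.
      .setMaster("spark://quickstart.cloudera:7077")

    val sc = new SparkContext(conf)
    val evens = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
    println(s"Number of even values: $evens")
    sc.stop()
  }
}

Note that the driver then runs on your PC, so the VM's executors must be able to reach it over the network; if that is a problem, building a jar and running spark-submit on the VM is the simpler route.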

Related

How to integrate Eclipse IDE with Databricks Cluster

I am trying to integrate my Scala Eclipse IDE with my Azure Databricks cluster so that I can run my Spark program on the Databricks cluster directly from Eclipse.
I followed the official Databricks Connect documentation (https://docs.databricks.com/dev-tools/databricks-connect.html).
I have:
Installed Anaconda.
Installed Python 3.7 and the Databricks Connect library 6.0.1.
Completed the Databricks Connect configuration (the CLI part).
Added the client libraries to the Eclipse IDE.
Set the SPARK_HOME environment variable to the path returned by running 'databricks-connect get-jar-dir' in Anaconda.
I have not set any other environment variables apart from the one mentioned above.
I need help finding out what else must be done to accomplish this integration, e.g. how the connection-related environment variables work when running through the IDE.
If someone has already done this successfully, please guide me.
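As a sanity check, it can help to verify Databricks Connect outside Eclipse first, with a tiny Scala program that obtains the session and runs a trivial job. This is a minimal sketch, assuming 'databricks-connect test' already passes and the jars from 'databricks-connect get-jar-dir' are on the project classpath:

import org.apache.spark.sql.SparkSession

object DatabricksConnectSmokeTest {
  def main(args: Array[String]): Unit = {
    // With Databricks Connect, getOrCreate() picks up the host, token and
    // cluster id written by `databricks-connect configure`.
    val spark = SparkSession.builder().getOrCreate()

    // A trivial remote job: prints 100 if the cluster is reachable.
    println(spark.range(100).count())
  }
}

If this runs from a plain main method but not from Eclipse, the problem is likely the IDE run configuration (classpath or environment variables) rather than Databricks Connect itself.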

Why does Appium Python find_element_by_id fail in AWS Device Farm but work locally?

I am able to run my Appium Python test script on my local Appium server, but when I run it on AWS Device Farm it fails with the following error in the log -
I made sure I provided an empty desired-capabilities object to the driver.
Please help me fix this.
TIA.
Switching from Python 3.x to Python 2.7.x seems to have resolved this issue.

XGBoost on Databricks - outdated Scala version

I am trying to follow along with the XGBoost example on Databricks found here.
Everything seems to work fine until I get to the actual training part:
val xgboostModelRDD = XGBoost.trainWithRDD(trainRDD, ...)
At this point I get an error. Since the stack trace is rather short, I'll paste it here:
java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.overrideParamsAccordingToTaskCPUs(XGBoost.scala:232)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainWithRDD(XGBoost.scala:293)
After doing some research, it appears that the reason for the error is an incompatible Scala version: the XGBoost artifact was built against a different Scala binary version than the one the cluster runs. The Databricks Community Edition cluster comes preconfigured with Scala 2.10, and this cannot be modified.
Does that mean it is impossible to run XGBoost using Community Edition, or is there a way to resolve this issue?
I think the forum post that you linked to is slightly outdated; Databricks Community Edition does in fact let you choose the cluster's Scala version.
First, navigate to the Clusters page and click the blue "Create Cluster" button.
Then, from the "Databricks Runtime Version" dropdown menu, pick a runtime version that contains your desired Scala and Spark versions.

Spark setup on Windows

I am trying to set up Spark on my Windows 10 PC. After executing the spark-shell command, I got the following error:
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect
Installing Spark on a Windows machine is not very difficult, but a few permissions and configuration details need attention during the installation. Please follow the link below for a step-by-step Spark and Scala installation and configuration on a Windows machine.
Apache Spark Installation on Windows 10
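For what it's worth, this particular HiveSessionState error on Windows very often comes down to winutils.exe / HADOOP_HOME and write permissions on C:\tmp\hive. A small Scala check for the first condition (a sketch; the winutils layout shown is the conventional one, not something the linked guide mandates):

object WinutilsCheck {
  def main(args: Array[String]): Unit = {
    // spark-shell on Windows needs HADOOP_HOME (or hadoop.home.dir)
    // pointing at a folder that contains bin\winutils.exe.
    sys.env.get("HADOOP_HOME") match {
      case Some(home) =>
        val winutils = new java.io.File(home, "bin\\winutils.exe")
        println(s"HADOOP_HOME = $home; winutils.exe present: ${winutils.exists()}")
      case None =>
        println("HADOOP_HOME is not set.")
    }
  }
}

If winutils.exe is present, also make sure C:\tmp\hive exists and is writable (conventionally granted via winutils.exe chmod 777 \tmp\hive).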

Yahoo Hadoop tutorial

I am trying to follow the Yahoo Hadoop tutorial:
http://developer.yahoo.com/hadoop/tutorial/module3.html#vm
Everything is fine until I try to connect my Eclipse IDE to the Hadoop server process as described in the "Getting Started With Eclipse" section. The short story is that my map/reduce location, my DFS Location, keeps coming back with "Error: null". My VM is running and I can ping it from my PC. The Hadoop server is running, as I have successfully run the Pi example.
My PC runs Windows XP, and there is no "hadoop.job.ugi" entry in the Advanced list for the Hadoop location.
Also, what does "/hadoop/mapred/system" refer to? There is no such directory in the Hadoop installation that the tutorial has you set up, yet judging by the name of the field it is a pretty important directory. I have gone into the advanced settings and switched every reference to my Windows XP login (Ben) over to "hadoop-user". Folder locations it looks for, such as "/tmp/hadoop-hadoop-user/mapred/temp", are easy to find in the VM.
Am I right in thinking I can run Eclipse in the Windows XP environment and connect to the VMware process via its IP address? Isn't that the point of the article? It does not work.
You read it right. The Eclipse plugin for Hadoop has a lot of caveats, and a couple of things are not well documented. See the second answer by Icn on Installing Hadoop's Eclipse Plugin; hopefully that will solve the problem.
"/hadoop/mapred/system" refers to the directories inside HDFS, so you don't see it from terminal using ls
I did see the "hadoop.job.ugi" entry in the Advanced list, and succeeded in connecting to the VM by following the instructions there.
Are you using the recommended version of Eclipse (3.3.1)?