XGBoost on Databricks - outdated Scala version

I am trying to follow along with the XGBoost example on Databricks found here.
Everything seems to work fine until I get to the actual training part:
val xgboostModelRDD = XGBoost.trainWithRDD(trainRDD, ...)
At this point I get an error. Since the stacktrace is rather short I'll paste it here:
java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.overrideParamsAccordingToTaskCPUs(XGBoost.scala:232)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainWithRDD(XGBoost.scala:293)
After doing some research, it appears that the cause of the error is an incompatible Scala version. The Databricks Community Edition cluster comes preconfigured with Scala 2.10, and this cannot be modified.
Does that mean that it is impossible to run XGBoost on the Community Edition, or is there a way to resolve this issue?

I think that the forum post that you linked to is slightly outdated. Databricks Community edition actually does allow you to choose the cluster's Scala version.
First, navigate to the clusters page and click on the blue "Create Cluster" button:
From the "Databricks Runtime Version" dropdown menu, you can pick a runtime version which contains your desired Scala and Spark versions:
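To confirm which Scala version a cluster actually runs, a small check like the one below can help. It is only a sketch: `scala.util.Properties.versionNumberString` is part of the Scala standard library, but `binaryVersion` and `matchesArtifact` are hypothetical helpers, and the artifact id `some-library_2.11` is a made-up example of the usual `_<binary version>` suffix convention. A library compiled for one Scala binary version and run on another typically fails with a `NoSuchMethodError` like the one in the question.

```scala
object ScalaVersionCheck {
  // Reduce a full version such as "2.11.8" to its binary version "2.11".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  // Hypothetical helper: true if an artifact id carrying a Scala suffix
  // (for example "some-library_2.11") matches the given full Scala version.
  def matchesArtifact(artifactId: String, full: String): Boolean =
    artifactId.endsWith("_" + binaryVersion(full))

  def main(args: Array[String]): Unit = {
    // Version of the Scala library on the current classpath.
    val running = scala.util.Properties.versionNumberString
    println(s"Scala $running (binary version ${binaryVersion(running)})")
  }
}
```

In a notebook cell, evaluating `scala.util.Properties.versionNumberString` on its own should be enough to see the cluster's Scala version.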

Related

How to integrate Eclipse IDE with Databricks Cluster

I am trying to integrate my Scala Eclipse IDE with my Azure Databricks Cluster so that I can directly run my Spark program through Eclipse IDE on my Databricks Cluster.
I followed the official documentation for Databricks Connect (https://docs.databricks.com/dev-tools/databricks-connect.html).
I have:
Installed Anaconda.
Installed Python Lib 3.7 and Databricks Connect library 6.0.1.
Completed the Databricks Connect configuration (the CLI part).
Also added the client libraries in the Eclipse IDE.
Set the SPARK_HOME environment variable to the path returned by running 'databricks-connect get-jar-dir' in Anaconda.
I have not set any other environment variables apart from the one mentioned above.
I need help finding out what else has to be done to complete this integration, for example how the connection-related environment variables work when running through the IDE.
If someone has already done this successfully, guide me please.
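One thing that may be worth checking first: a JVM launched from an IDE does not necessarily see variables that were exported in the Anaconda prompt. The sketch below is only a diagnostic, not part of Databricks Connect itself; `SparkHomeCheck` and `describe` are hypothetical names. It reports what the running JVM sees for SPARK_HOME.

```scala
import java.nio.file.{Files, Paths}

object SparkHomeCheck {
  // Describe the SPARK_HOME value found in the given environment map.
  def describe(env: Map[String, String]): String =
    env.get("SPARK_HOME") match {
      case None =>
        "SPARK_HOME is not set for this JVM"
      case Some(dir) if Files.isDirectory(Paths.get(dir)) =>
        s"SPARK_HOME points to an existing directory: $dir"
      case Some(dir) =>
        s"SPARK_HOME is set but is not a directory: $dir"
    }

  def main(args: Array[String]): Unit =
    // sys.env is the environment of the JVM that the IDE started.
    println(describe(sys.env))
}
```

If this prints that the variable is not set when run from Eclipse, the variable probably has to be added to the run configuration's environment rather than only to the shell.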

Azure Durable Functions, Upgrading nuget package causes local testing to fail

I have a durable functions app that worked perfectly until I upgraded package Microsoft.Azure.WebJobs.Extensions.DurableTask from version 1.5.0 to 1.6.0.
Now running locally causes this error in the console:
[8/31/2018 9:35:58 PM] A ScriptHost error has occurred
[8/31/2018 9:35:58 PM] System.Private.CoreLib: No parameterless constructor defined for this object.
[8/31/2018 9:35:58 PM] Stopping Host
I have made absolutely no code changes. What am I missing?
Thanks in advance for your help.
See comment on our GitHub here: If you're using the Functions V2 runtime, breaking changes were introduced into the Functions V2 host. Durable Functions 1.6.0 accommodates those changes and must be used with version 2.0.12050.0 or higher of the Functions runtime. There's a new version of the Azure Functions Core Tools out to accommodate these changes as well.
If you want to use Durable Functions 1.6.0, you'll need to follow these steps:
You will need to update your Azure Functions Core Tools to the latest version. (2.0.1-beta.37)
If your app is built using Visual Studio, you need to update your Microsoft.NET.Sdk.Functions NuGet package to v1.0.19.
You will need to migrate to the new Functions V2 host.json schema.
If you want to stay with Durable Functions 1.5.0, you will need to pin your core tools to an older version, and in Azure, pin your FUNCTIONS_EXTENSION_VERSION. More detailed information on pinning can be found in the runtime release announcement.
If your Functions app is running on the V1 runtime, Durable Functions 1.6.0 should work without incident. (Please let us know if it's not, that means we need to fix something.)
Functions Runtime 2.0.12050-alpha release notes
Functions Runtime 2.0.12050-alpha announcement
Durable Functions 1.6.0 release notes
I can't comment so I have to answer.
There are several problems with durable functions v2.
With the latest package versions I can't run locally in the Visual Studio IDE.
I tried and checked everything in the previous answer; "Azure Functions and Web Job Tools" has changed, so I also tried versions 15.10.2009.0 and 15.8.5023.0.
The most relevant problem is that at this point, if I downgrade the package versions, I can't use Newtonsoft because of versioning constraints.

Run Scala code in remote server with IntelliJ IDEA

I downloaded the Cloudera QuickStart VM and I want to run some Scala code. I found a Cloudera tutorial for running Scala code in the shell.
Now I want to use an IDE (like IntelliJ IDEA), connect to the Spark server, and run my code. I found this tutorial, but it didn't work for me: I cannot find the same interface as described.
Thanks for any suggestion!

Not able to find ./bin/spark-class for launching spark cluster on standalone mode

While launching a standalone cluster for Spark Streaming, I am not able to find the ./bin/spark-class command.
Please let me know if I need to do any additional configuration to get "spark-class".
Which version of Spark are you using? Starting with Spark 0.9.0, spark-class is located in the bin folder, but in earlier versions it was at the root of SPARK_HOME.
Perhaps you're following instructions for Spark 0.9.0+ even though you've installed an earlier version of Spark? You can find documentation for older releases of Spark on the Spark documentation overview page.
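The two locations mentioned above can be checked with a short sketch like this one. `SparkClassLocator` and `find` are hypothetical names, and the code assumes SPARK_HOME points at the root of the Spark installation.

```scala
import java.nio.file.{Files, Path, Paths}

object SparkClassLocator {
  // Look for spark-class in bin/ first (Spark 0.9.0 and later),
  // then at the root of the installation (earlier versions).
  def find(sparkHome: Path): Option[Path] =
    Seq(
      sparkHome.resolve("bin").resolve("spark-class"),
      sparkHome.resolve("spark-class")
    ).find(p => Files.isRegularFile(p))

  def main(args: Array[String]): Unit =
    sys.env.get("SPARK_HOME") match {
      case Some(home) =>
        val result = find(Paths.get(home))
          .map(_.toString)
          .getOrElse(s"spark-class not found under $home")
        println(result)
      case None =>
        println("SPARK_HOME is not set")
    }
}
```

If the script turns up at the root rather than in bin, that suggests an installation older than 0.9.0, and the matching older documentation applies.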

Binary distributions of old (1.0 - 2.5) versions of Scala?

The Scala website says:
For historical and testing purposes, we also keep an archive of
previous releases (currently since version 2.5). Prior versions of
Scala, from 0.9.x to 2.4.x, have been archived offline.
Is there any way to get these versions? The source code is available in the git repo, but binaries would be nice.
EDIT:
I found some old versions at archive.org, but the oldest that I could get was 1.1.1.3 from http://web.archive.org/web/20040603140225/http://scala.epfl.ch/downloads/index.html
It turns out that, if one looks hard enough, everything from 1.0.0-b4 onward can be found on archive.org. Unfortunately, some versions in between are nowhere to be found, for example 2.3.x and 2.0.x. The question remains why they aren't published on http://scala-lang.org.