AWS Glue - AWSGlueETL dependency not resolved - scala

I am trying to run Glue locally using Scala, so I added the dependency below, as described in the AWS Glue documentation (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html):
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>AWSGlueETL</artifactId>
    <version>2.0</version>
    <!-- A "provided" dependency, this will be ignored when you package your application -->
    <scope>provided</scope>
</dependency>
But this dependency is not found (it does not resolve). Please let me know if it has moved to a different name.
Thank you.
Expected result: AWSGlueETL should resolve in pom.xml.

I found that https://aws-glue-etl-artifacts.s3.amazonaws.com/ only has the 0.9.0, 1.0.0 and 3.0.0 artifacts; no 2.0.0 version has been published. This related issue discusses it: https://github.com/awslabs/aws-glue-libs/issues/15
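Based on the versions listed in that bucket, a minimal sketch of a pom.xml fragment that targets the published 3.0.0 artifact might look like this. It assumes the repository URL (with the `release/` path) shown on the AWS documentation page linked in the question is still current, and that you want the Glue 3.0 libraries; use 1.0.0 or 0.9.0 instead to match an older Glue version. Both elements go directly under `<project>`.

```xml
<!-- Sketch: resolve AWSGlueETL from the AWS-hosted Maven repository (URL assumed from the AWS docs) -->
<repositories>
    <repository>
        <id>aws-glue-etl-artifacts</id>
        <url>https://aws-glue-etl-artifacts.s3.amazonaws.com/release/</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>AWSGlueETL</artifactId>
        <!-- 2.0 / 2.0.0 is not published; 3.0.0 is one of the versions listed in the bucket -->
        <version>3.0.0</version>
        <!-- "provided": the Glue runtime supplies this jar, so it is not packaged with the application -->
        <scope>provided</scope>
    </dependency>
</dependencies>
```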

Related

Implementing Isolation Forest in Spark Scala

I am trying to implement the Isolation Forest algorithm in a Spark Scala Maven project, following this iforest example.
When I try to run the suggested code, I get this error:
object iforest is not a member of package org.apache.spark.ml
I tried to import org.apache.spark.ml and also changed the spark-core dependency to version 2.2.0:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
Any suggestions please?
You can try this Spark/Scala implementation of the isolation forest algorithm, which has artifacts available in the public Maven Central repository.
You can declare the dependency in your project's pom.xml as:
<dependency>
    <groupId>com.linkedin.isolation-forest</groupId>
    <artifactId>isolation-forest_3.2.0_2.12</artifactId>
    <version>2.0.8</version>
</dependency>
Other available artifact versions are listed here.
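The artifact name appears to encode the Spark and Scala versions it was built against (Spark 3.2.0 and Scala 2.12 here), so the Spark dependencies in the same pom.xml should match them. Note also that the `org.apache.spark.ml` package lives in the `spark-mllib` artifact, not in `spark-core`. A minimal sketch, assuming Spark 3.2.0 on Scala 2.12 and a cluster that provides Spark at runtime:

```xml
<dependencies>
    <!-- Spark itself; versions assumed to match the isolation-forest artifact name -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.2.0</version>
        <scope>provided</scope>
    </dependency>
    <!-- org.apache.spark.ml classes are in spark-mllib -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.12</artifactId>
        <version>3.2.0</version>
        <scope>provided</scope>
    </dependency>
    <!-- LinkedIn isolation forest, built for Spark 3.2.0 / Scala 2.12 -->
    <dependency>
        <groupId>com.linkedin.isolation-forest</groupId>
        <artifactId>isolation-forest_3.2.0_2.12</artifactId>
        <version>2.0.8</version>
    </dependency>
</dependencies>
```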
The spark-iforest artifact is neither included in the official Spark distribution nor published to any central artifact repository, so to use it you need to build it yourself, either as a separate library or inside your project.
The library should not have used the org.apache.spark package name in the first place, because that misleadingly suggests it ships with Spark itself.

Maven setup for spark project in Eclipse Scala IDE

I'm using the Scala IDE for Eclipse to develop a Spark application.
It's a Maven project, but when I try to import SparkConf like this:
import org.apache.spark.SparkConf
I get the error:
object apache is not a member of package org
Can you help me set up the Spark dependencies?
You need to declare the Spark dependency in your pom.xml file. At the time of writing, the latest version of Spark could be obtained by adding:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>2.2.0</version>
</dependency>
in the dependencies in your pom.xml.
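The `_2.10` suffix in the artifact ID is the Scala binary version, and it has to match the Scala version your project compiles with. One way to keep them in sync is to declare the versions once as properties. A sketch, assuming Scala 2.10 and Spark 2.2.0 as in the answer; the property names are arbitrary and chosen for illustration, and the exact 2.10.x patch release is an assumption:

```xml
<properties>
    <!-- Scala binary version: must match the _2.xx suffix of every Scala artifact -->
    <scala.binary.version>2.10</scala.binary.version>
    <!-- Full Scala version (assumed; any 2.10.x release shares the same binary version) -->
    <scala.version>2.10.6</scala.version>
    <spark.version>2.2.0</spark.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
```

With this layout, moving to another Scala line only means changing the two Scala properties together.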
See also this tutorial:
https://sparktutorials.github.io/2015/04/02/setting-up-a-spark-project-with-maven.html
and this video tutorial:
https://www.youtube.com/watch?v=4sO-VgqHLp4

sbt maven compatibility issues

I am facing an issue with sbt, reported here: https://github.com/dmlc/xgboost/issues/1858
Strangely, not even the Maven properties are resolved:
com.typesafe.akka#akka-actor_${scala.binary.version};2.3.11: not found
Maven outputs these warnings during the build:
Expected all dependencies to require Scala version: 2.11.8
[WARNING] com.typesafe.akka:akka-actor_2.11:2.3.11 requires scala version: 2.11.5
[WARNING] Multiple versions of scala libraries detected!
On a Mac, hard-coding the Scala version of the akka dependency seems to work around the problem; on Windows or Ubuntu this workaround does not work.
Edit: the property
<scala.binary.version>2.11</scala.binary.version>
is defined in https://github.com/dmlc/xgboost/blob/master/jvm-packages/pom.xml
and the dependency
<dependency>
    <groupId>com.typesafe.akka</groupId>
    <artifactId>akka-actor_${scala.binary.version}</artifactId>
    <version>2.3.11</version>
    <scope>compile</scope>
</dependency>
is declared in https://github.com/dmlc/xgboost/blob/e7fbc8591fa7277ee4c474b7371c48c11b34cbde/jvm-packages/xgboost4j/pom.xml,
which I hard-coded to:
<dependency>
    <groupId>com.typesafe.akka</groupId>
    <artifactId>akka-actor_2.11</artifactId>
    <version>2.3.11</version>
    <scope>compile</scope>
</dependency>
The problem is that xgboost uses properties from pom.xml that are defined in a non-default, per-profile section, and sbt does not seem to be able to handle that.
See my pull request here: https://github.com/dmlc/xgboost/issues/1858
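If, as described above, sbt only fails on properties declared inside a non-default profile, an alternative to hard-coding each artifact ID would be to give the property a default value at the top level of the POM, which an active profile can still override. A sketch under that assumption; the profile id is hypothetical (not taken from the xgboost POM), this is untested against sbt, and both elements go directly under `<project>`:

```xml
<!-- Default value, visible even when no profile is active -->
<properties>
    <scala.binary.version>2.11</scala.binary.version>
</properties>

<profiles>
    <!-- Hypothetical profile: overrides the default only when activated -->
    <profile>
        <id>scala-2.10</id>
        <properties>
            <scala.binary.version>2.10</scala.binary.version>
        </properties>
    </profile>
</profiles>
```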

Can't add the MySQL connector dependency to pom.xml in Eclipse

I can't add the MySQL dependency in Eclipse.
Error: Missing artifact mysql:mysql-connector-java:jar:5.7.9
I used the Maven console to add the dependency, but it doesn't work.
What can I do?
Most likely this is because the current version in the 5.x series is 5.1.40:
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.40</version>
</dependency>
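There is no connector version 5.7.9 (that is a MySQL Server version number; Connector/J is versioned separately). You can check the current version here or all available versions here.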

java.lang.NoClassDefFoundError: scala/reflect/ClassManifest

I am getting this error when trying to run an example on Spark. Can anybody please let me know what changes I need to make to my pom.xml to run programs with Spark?
At the time of this answer, Spark only worked with Scala 2.9.3, not with later versions of Scala. I saw the error you describe when I ran the SparkPi example with SCALA_HOME pointing to a 2.10.2 installation; when I pointed SCALA_HOME at a 2.9.3 installation instead, it worked. Details here.
You should add a dependency on scala-reflect to your Maven build:
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-reflect</artifactId>
    <version>2.10.2</version>
</dependency>
I ran into the same issue using the Scala-Redis 2.9 client (incompatible with Scala 2.10), and adding a dependency on scala-reflect does not help. scala-reflect is packaged as its own jar, but it does not include the missing class, which has been deprecated since Scala 2.10.0 (see this thread).
The correct fix is to point to a Scala installation that includes this class. In my case, with the Scala-Redis client, McNeill's answer helped: I pointed SBT to Scala 2.9.3 and everything worked as expected.
In my case, the error was raised in Kafka's API. Changing the dependency from
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.9.2</artifactId>
    <version>0.8.1.1</version>
</dependency>
to
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
fixed the problem.
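The underlying issue in that last case is mixing Scala binary versions: `kafka_2.9.2` is built for Scala 2.9, while the Spark artifacts use the `_2.10` suffix. A sketch of keeping the Spark Streaming artifacts on the same suffix and version, assuming Spark 1.6.1 on Scala 2.10 as in the answer:

```xml
<dependencies>
    <!-- Core streaming API; suffix and version match the Kafka integration below -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <!-- Spark's Kafka integration, which depends on a Kafka artifact built for Scala 2.10 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
</dependencies>
```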