How to convert RDD[some case class] to a CSV file using Scala?

I have an RDD[some case class] and I want to convert it to a CSV file. I am using Spark 1.6 and Scala 2.10.5.
stationDetails.toDF.coalesce(1).write.format("com.databricks.spark.csv").save("data/myData.csv")
gives the error:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at http://spark-packages.org
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:219)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
I am not able to add the dependency for "com.databricks.spark.csv" to my build.sbt file.
The dependencies I added to my build.sbt file are:
libraryDependencies ++= Seq(
"org.apache.commons" % "commons-csv" % "1.1",
"com.univocity" % "univocity-parsers" % "1.5.1",
"org.slf4j" % "slf4j-api" % "1.7.5" % "provided",
"org.scalatest" %% "scalatest" % "2.2.1" % "test",
"com.novocode" % "junit-interface" % "0.9" % "test"
)
I also tried this
stationDetails.toDF.coalesce(1).write.csv("data/myData.csv")
but it gives the error: csv cannot be resolved.

The com.databricks.spark.csv data source is not on your classpath, and the built-in write.csv shortcut only exists from Spark 2.0 onwards, which is why it cannot be resolved on Spark 1.6. Please change your build.sbt to the following:
libraryDependencies ++= Seq(
"org.apache.commons" % "commons-csv" % "1.1",
"com.databricks" %% "spark-csv" % "1.4.0",
"com.univocity" % "univocity-parsers" % "1.5.1",
"org.slf4j" % "slf4j-api" % "1.7.5" % "provided",
"org.scalatest" %% "scalatest" % "2.2.1" % "test",
"com.novocode" % "junit-interface" % "0.9" % "test"
)
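With that dependency on the classpath, the original write call works unchanged. A minimal sketch, assuming sc is your existing SparkContext and stationDetails is the RDD[some case class] from the question:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._ // enables .toDF() on RDDs of case classes

stationDetails.toDF()
  .coalesce(1) // one output part file instead of one per partition
  .write
  .format("com.databricks.spark.csv")
  .option("header", "true") // optional: write a header row
  .save("data/myData.csv")

Note that save("data/myData.csv") creates a directory of that name containing a part-00000 file, not a single file called myData.csv.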

Related

Error: java.lang.NoClassDefFoundError: scala/Product$class

Test project build.sbt
scalaVersion := "2.12.8"
crossScalaVersions := Seq("2.11.12", "2.12.8")
val scalaTestVersion = "3.0.5"
val rocksDBVersion = "5.17.2"
val kafkaVersion = "2.2.0"
lazy val deps = Seq(
// "javax.ws.rs" % "javax.ws.rs-api" % "2.1" jar(),
"org.apache.kafka" % "kafka-clients" % kafkaVersion,
"org.apache.kafka" % "kafka-clients" % kafkaVersion classifier "test",
"org.apache.kafka" % "kafka-streams" % kafkaVersion,
"org.apache.kafka" % "kafka-streams" % kafkaVersion classifier "test",
"org.apache.kafka" %% "kafka" % kafkaVersion,
"org.apache.kafka" % "kafka-streams-test-utils" % kafkaVersion,
"org.scalatest" %% "scalatest" % scalaTestVersion % "test",
"org.rocksdb" % "rocksdbjni" % rocksDBVersion % "test"
)
Test Project plugins.sbt
addSbtPlugin("com.github.gseitz" % "sbt-release" % "1.0.8")
This is my test application's sbt configuration file. When I run sbt package it creates a jar file, which I then have to use in my other project.
Other project sbt file
scalaVersion := "2.12.6"
object Dependencies {
val deps = Seq(
"org.json4s" %% "json4s-native" % "3.5.3",
"ch.qos.logback" % "logback-classic" % "1.2.3",
"cool.graph" % "cuid-java" % "0.1.1",
"org.apache.kafka" % "kafka-streams" % "2.2.0",
"com.typesafe" % "config" % "1.3.3",
"org.scalaz" %% "scalaz-core" % "7.2.20",
"org.apache.commons" % "commons-lang3" % "3.8.1",
//Test
"org.scalatest" %% "scalatest" % "3.0.4" % Test,
"org.mockito" % "mockito-all" % "1.10.19" % Test,
"com.abc" %% "kafka-streams-test-kit" % "2.2.0" % Test
)
}
Other Project plugins.sbt
addSbtPlugin("se.marcuslonnberg" % "sbt-docker" % "1.5.0")
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.0")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
addSbtPlugin("com.github.gseitz" % "sbt-release" % "1.0.8")
I'm new to Scala and I don't know what I did wrong here: when I put my test jar into the other project, it gives me the error below.
Error:
java.lang.NoClassDefFoundError: scala/Product$class
[error] at com.shepherd.kafka.streams.test.kit.MockedStreams$Builder.<init>(MockedStreams.scala:17)
[error] at com.shepherd.kafka.streams.test.kit.MockedStreams$.apply(MockedStreams.scala:15)
[error] at com.shepherd.integration.WindowedThresholdSpec.$anonfun$new$4(WindowedThresholdSpec.scala:38)
[error] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
Can anyone please help me with this? I'm new to this, and I have already seen java.lang.NoClassDefFoundError: scala/Product$class, but it did not help me, so I am posting my problem here.
I tried both ways, adding the jar directly to the project and pulling it from S3, but the error still exists.
resolvers += "Java.net Maven2 Repository" at "https://repo1.maven.org/maven2/"
resolvers += "S3" at "https://s3-eu-west-1.amazonaws.com/com.abc/" -> Test jar file
NOTE: I have set the same Scala version ("2.12.6") in both applications, but the issue still exists.
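For context (an explanation of the error, not something stated in the question): scala/Product$class is a trait implementation class that only exists in bytecode compiled with Scala 2.11 or earlier; Scala 2.12 compiles traits to Java default methods instead. This NoClassDefFoundError therefore almost always means the jar on the classpath was built against 2.11 while the application runs on 2.12. A sketch of how the two builds could be kept in step, assuming the test-kit project is the one published to S3:

// In the test-kit project: cross-build so a _2.12 artifact exists.
crossScalaVersions := Seq("2.11.12", "2.12.8")
// From the shell, package/publish once per listed Scala version:
//   sbt +package   (or sbt +publish)

// In the consuming project, %% appends the Scala binary suffix (_2.12),
// so sbt resolves the artifact that matches its own scalaVersion:
libraryDependencies += "com.abc" %% "kafka-streams-test-kit" % "2.2.0" % Test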

How to use TestNG in a Scala sbt project?

I am trying to run a sample TestNG test in a Scala sbt project. I am using the dependencies below.
"org.apache.spark" %% "spark-core" % sparkVer % Provided,
"org.apache.hadoop" % "hadoop-common" % sparkVer % Provided,
"org.apache.spark" % "spark-sql_2.11" % sparkVer % Provided,
"org.apache.spark" % "spark-hive_2.11" % sparkVer,
"org.scalactic" %% "scalactic" % scalatestVer,
"org.scalatest" %% "scalatest" % scalatestVer % Test,
"info.cukes" % "cucumber-scala_2.11" % cucumberVer,
"info.cukes" % "cucumber-junit" % cucumberVer,
"junit" % "junit" % "4.12",
"org.scalatest" % "scalatest_2.11" % "2.0" % "test",
"org.scalactic" %% "scalactic" % "3.0.1",
"org.scalatest" %% "scalatest" % "3.0.1" % "test",
[Image: the ExampleSuite class; as you can see, I am not able to use it correctly]
Here is the link I followed for this case: http://www.scalatest.org/getting_started_with_testng_in_scala
I am also on sbt.version = 0.13.16.
Any help is really appreciated.
Add the dependency "org.testng" % "testng" % "6.14.3" % Test, and install the "Create TestNG XML" plugin from the marketplace; that resolves the issue.
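For reference, a minimal TestNG-style suite along the lines of the ScalaTest guide linked above (the class and method names here are illustrative; ScalaTest 3.0.x ships the TestNGSuite base class, which needs TestNG on the test classpath):

import org.scalatest.testng.TestNGSuite
import org.testng.annotations.Test

class ExampleSuite extends TestNGSuite {

  // TestNG discovers this method through the @Test annotation
  @Test
  def additionWorks(): Unit = {
    assert(1 + 1 == 2)
  }
}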

java.lang.NoSuchMethodError: org.apache.http.conn.ssl.SSLConnectionSocketFactory

How can I list the file names of all the parquet files in an S3 directory on Amazon?
I found this way:
import com.amazonaws.services.s3.AmazonS3ClientBuilder

val s3 = AmazonS3ClientBuilder.standard.build()
var objs = s3.listObjects("bucketname", "directory")
val summaries = objs.getObjectSummaries()
// listObjects returns at most 1000 keys per call, so page through the rest
while (objs.isTruncated()) {
  objs = s3.listNextBatchOfObjects(objs)
  summaries.addAll(objs.getObjectSummaries())
}
val listOfFiles = summaries.toArray
But it throws the error:
java.lang.NoSuchMethodError: org.apache.http.conn.ssl.SSLConnectionSocketFactory
I added the dependency for httpclient 4.5.2 as indicated in many answers, but I still get the same error.
Also I did:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion exclude("commons-httpclient", "commons-httpclient"),
"org.apache.spark" %% "spark-mllib" % sparkVersion exclude("commons-httpclient", "commons-httpclient"),
"org.sedis" %% "sedis" % "1.2.2",
"org.scalactic" %% "scalactic" % "3.0.0",
"org.scalatest" %% "scalatest" % "3.0.0" % "test",
"com.github.nscala-time" %% "nscala-time" % "2.14.0",
"com.amazonaws" % "aws-java-sdk-s3" % "1.11.53",
"org.apache.httpcomponents" % "httpclient" % "4.5.2",
"net.java.dev.jets3t" % "jets3t" % "0.9.3",
"org.apache.hadoop" % "hadoop-aws" % "2.6.0",
"com.github.scopt" %% "scopt" % "3.3.0"
)
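A common cause (an assumption here, since the question does not show the full dependency tree): Spark and Hadoop pull in an older httpclient transitively, and if that copy wins on the classpath the AWS SDK links against a version of SSLConnectionSocketFactory without the constructor it expects. A sketch of pinning one version in sbt:

// Force a single httpclient/httpcore version across all transitive dependencies.
// (In sbt 0.13 dependencyOverrides is a Set; in sbt 1.x it is a Seq.)
dependencyOverrides ++= Set(
  "org.apache.httpcomponents" % "httpclient" % "4.5.2",
  "org.apache.httpcomponents" % "httpcore" % "4.4.4"
)

Running sbt evicted (or a dependency-graph plugin) helps confirm which httpclient version actually ends up on the classpath.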

PlaySpecification not found in Play 2.6.3

I've imported Specs2 and everything looks good, but some things are not imported, among them the PlaySpecification trait.
I've tried reloading in sbt and invalidating caches in IntelliJ ... but this trait is missing!
My build.sbt
name := """web2"""
version := "1.0-SNAPSHOT"
scalaVersion := "2.12.2"
lazy val root = (project in file(".")).enablePlugins(PlayScala, LauncherJarPlugin)
pipelineStages := Seq(digest)
libraryDependencies ++= Seq(
evolutions,
jdbc,
ehcache,
ws,
"com.softwaremill.macwire" %% "macros" % "2.3.0" % "provided",
"org.postgresql" % "postgresql" % "42.1.1",
"org.scalikejdbc" %% "scalikejdbc" % "3.0.0",
"org.scalikejdbc" %% "scalikejdbc-config" % "3.0.0",
"ch.qos.logback" % "logback-classic" % "1.2.3",
"de.svenkubiak" % "jBCrypt" % "0.4.1",
//"org.scalatestplus.play" %% "scalatestplus-play" % "3.0.+" % "test",
"org.mockito" % "mockito-core" % "2.7.22" % "test",
"org.specs2" %% "specs2-core" % "3.9.+" % "test"
)
resolvers += "scalaz-bintray" at "http://dl.bintray.com/scalaz/releases"
plugins.sbt
// The Play plugin
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.6.3")
// Web plugins
addSbtPlugin("com.typesafe.sbt" % "sbt-digest" % "1.1.2")
According to the documentation, replace "org.specs2" %% "specs2-core" % "3.9.+" % "test" with specs2 % Test in your dependencies:
libraryDependencies ++= Seq(
evolutions,
jdbc,
ehcache,
...
"org.mockito" % "mockito-core" % "2.7.22" % "test",
specs2 % Test
)
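Once that change is in place, a minimal spec along these lines should compile (the class name is illustrative; PlaySpecification lives in play.api.test):

import play.api.test._

class HealthSpec extends PlaySpecification {
  "the application" should {
    "do basic arithmetic" in {
      (1 + 1) must beEqualTo(2)
    }
  }
}

The specs2 % Test shorthand resolves to Play's own specs2 integration module, which is where PlaySpecification comes from, and it pulls in a specs2 version compatible with Play 2.6.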

IDEA and scalariform configuration - unresolved symbols even if scalariform works from command line

This question is partially related to sbt-scalariform plugin - can't resolve settings. I managed to run Scalariform from the command line as an sbt task.
Now the problem is with IDEA. When I open my build.sbt, which looks like this:
import scalariform.formatter.preferences._
name := """scheduling-backend"""
version := "1.0"
scalaVersion := "2.10.2"
resolvers += "spray repo" at "http://repo.spray.io"
resolvers += "spray nightlies" at "http://nightlies.spray.io"
resolvers += "SpringSource Milestone Repository" at "http://repo.springsource.org/milestone"
resolvers += "Neo4j Cypher DSL Repository" at "http://m2.neo4j.org/content/repositories/releases"
libraryDependencies ++= Seq(
"com.typesafe.akka" %% "akka-actor" % "2.3.0",
"com.typesafe.akka" %% "akka-slf4j" % "2.3.0",
"com.typesafe.akka" %% "akka-testkit" % "2.3.0" % "test",
"com.typesafe.akka" %% "akka-persistence-experimental" % "2.3.0",
"io.spray" % "spray-can" % "1.3.0",
"io.spray" % "spray-routing" % "1.3.0",
"io.spray" % "spray-testkit" % "1.3.0" % "test",
"io.spray" %% "spray-json" % "1.2.5",
"ch.qos.logback" % "logback-classic" % "1.0.13",
"org.specs2" %% "specs2" % "1.14" % "test",
"org.springframework.scala" % "spring-scala" % "1.0.0.M2",
"org.springframework.data" % "spring-data-neo4j" % "3.0.0.RELEASE",
"org.springframework.data" % "spring-data-neo4j-rest" % "3.0.0.RELEASE",
"javax.validation" % "validation-api" % "1.1.0.Final",
"com.github.nscala-time" %% "nscala-time" % "0.8.0",
"org.neo4j" % "neo4j-kernel" % "2.0.1" % "test" classifier "tests",
"com.sun.jersey" % "jersey-core" % "1.9",
"org.mockito" % "mockito-all" % "1.9.5"
)
scalacOptions ++= Seq(
"-unchecked",
"-deprecation",
"-Xlint",
"-Ywarn-dead-code",
"-language:_",
"-target:jvm-1.7",
"-encoding", "UTF-8"
)
org.scalastyle.sbt.ScalastylePlugin.Settings
scalariformSettings
ScalariformKeys.preferences := ScalariformKeys.preferences.value
.setPreference(AlignParameters, true)
.setPreference(CompactControlReadability, true)
IDEA reports problems with my file.
I am getting Cannot resolve symbol scalariformSettings and Cannot resolve symbol ScalariformKeys, even though everything works from the terminal.
Adding import com.typesafe.sbt.SbtScalariform._ to build.sbt seems to fix the error on IDEA 13.1.1 with Scala plugin 0.33.403, though I have to admit it ignored the import at first and then randomly started to see it.
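Put together, the top of the build.sbt from the question would look like this (the sbt shell auto-imports plugin members itself, which is why the terminal worked without the explicit import):

import com.typesafe.sbt.SbtScalariform._
import scalariform.formatter.preferences._

scalariformSettings

ScalariformKeys.preferences := ScalariformKeys.preferences.value
  .setPreference(AlignParameters, true)
  .setPreference(CompactControlReadability, true)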