I am new to Scala and am writing test cases using ScalaTest and spray-json. My code is as follows.
case class MyModel(Point1: String,
Point2: String,
Point3: Seq[String],
Point4: Seq[String])
it should "serialise/deserialize a MyModel to JSON" in {
val json= """{"Point1":"","Point3":[],"Point2":"","Point4":[]}""".parseJson
val myModelViaJson= json.convertTo[MyModel]
myModelViaJson.Point1 shouldBe ""
myModelViaJson.Point3.isEmpty shouldBe true
myModelViaJson.Point2 shouldBe ""
myModelViaJson.Point4.isEmpty shouldBe true
}
On running sbt test I am getting the following error:
should serialise/deserialize a MyModel to JSON *** FAILED ***
[info] spray.json.DeserializationException: Expected String as JsString, but got []
[info] at spray.json.package$.deserializationError(package.scala:23)
[info] at spray.json.ProductFormats.fromField(ProductFormats.scala:63)
[info] at spray.json.ProductFormats.fromField$(ProductFormats.scala:51)
How can I solve this?
Add implicit val format = jsonFormat4(MyModel) before the call to json.convertTo[MyModel], so that spray-json knows how to deserialize the case class.
Refer: jsonformats-for-case-classes
So the code will look like this:
val json= """{"Point1":"","Point3":[],"Point2":"","Point4":[]}""".parseJson
implicit val format = jsonFormat4(MyModel)
val myModelViaJson= json.convertTo[MyModel]
myModelViaJson.Point1 shouldBe ""
myModelViaJson.Point3.isEmpty shouldBe true
myModelViaJson.Point2 shouldBe ""
myModelViaJson.Point4.isEmpty shouldBe true
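For reference, a minimal self-contained sketch of how the pieces fit together (the imports shown are the standard spray-json ones):
import spray.json._
import spray.json.DefaultJsonProtocol._

case class MyModel(Point1: String, Point2: String, Point3: Seq[String], Point4: Seq[String])

// The implicit format must be in scope before convertTo[MyModel] is called.
implicit val myModelFormat: RootJsonFormat[MyModel] = jsonFormat4(MyModel)

val json = """{"Point1":"","Point3":[],"Point2":"","Point4":[]}""".parseJson
val myModelViaJson = json.convertTo[MyModel]   // MyModel("", "", Seq(), Seq())
Note that with jsonFormat4(MyModel) the fields are matched by name, so the order of the keys in the JSON string does not matter.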
Related question:
I am currently getting decoding failures for String types where the JSON file has nulls.
case class User(
updatedAt: String,
name: String
)
The JSON looks like:
{
"updated_at": null,
"name": "John"
}
In my tests I have:
"parse User from json file" in {
val jsonString = Source.fromURL(getClass.getResource("/user.json")).mkString
val expected = User(
null,
"John"
)
parser.decode[User](jsonString) must_=== Right(expected)
}
I have the following implicits:
implicit val customConfig: Configuration = Configuration.default.withSnakeCaseMemberNames
implicit val userDe: Decoder[User] = deriveConfiguredDecoder[User]
implicit val userEn: Encoder[User] = deriveConfiguredEncoder[User]
When I run my tests, I get this test failure:
[error] Actual: Left(DecodingFailure(String, List(DownField(updated_at))))
[error] Expected: Right(User(null,John))
What is wrong with my setup here?
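(Not part of the original post, but for reference: circe's String decoder fails on JSON null, so the usual approach is to model the nullable column as Option[String], which decodes null to None. A sketch under that assumption:)
import io.circe.{Decoder, Encoder}
import io.circe.generic.extras.Configuration
import io.circe.generic.extras.semiauto.{deriveConfiguredDecoder, deriveConfiguredEncoder}

// Sketch: updated_at is nullable in the JSON, so it is modelled as Option[String].
case class User(updatedAt: Option[String], name: String)

object User {
  implicit val customConfig: Configuration = Configuration.default.withSnakeCaseMemberNames
  implicit val userDe: Decoder[User] = deriveConfiguredDecoder[User]
  implicit val userEn: Encoder[User] = deriveConfiguredEncoder[User]
}
The expected value in the test would then be User(None, "John") instead of User(null, "John").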
I have some code like below.
def computeGroupByCount(groupByCountColumns: List[String], df: DataFrame): List[JsValue] = {
  val result: ListBuffer[JsValue] = new ListBuffer[JsValue]()
  val encoder: Encoder[ColumnGroupByCount] = Encoders.product[ColumnGroupByCount]
  groupByCountColumns.par.foreach { colName =>
    val groupByCount: Array[ColumnGroupByCount] = df
      .groupBy(colName)
      .count()
      .map(x => ColumnGroupByCount(colName, x.getString(0), x.getLong(1)))(encoder)
      .collect()
    result += Json.toJson(groupByCount)
  }
  result.toList
}
When I run this code it gives the error below, but it works in IntelliJ.
[info] Cause: org.codehaus.janino.InternalCompilerException: Class 'org.apache.spark.sql.catalyst.expressions.codegen.GeneratedClass' was loaded through a different loader
followed by a confusing error about the generated code. Please help me with this.
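(Not an answer from the original thread, just a commonly suggested workaround: when this classloader mismatch shows up while running Spark code from sbt, forking a separate JVM for the tests often avoids it. A sketch for build.sbt, assuming the error occurs during sbt test:)
// build.sbt – run tests in a forked JVM so Spark's generated classes and the
// application classes are loaded by the same classloader.
fork in Test := true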
I have used backticks for reserved keywords. One example case class is as follows:
case class IPC(
  `type`: String,
  main: Boolean,
  normalized: String,
  section: String,
  `class`: String,
  subClass: String,
  group: String,
  subGroup: String
)
I have declared the SparkSession as follows:
def run(params: SparkApp.Params): Unit ={
val sparkSession = SparkSession.builder.master("local[*]").appName("SparkUsptoParser").getOrCreate()
// val conf = new SparkConf().setAppName("SparkUsptoParser").set("spark.driver.allowMultipleContexts", "true")
val sc = sparkSession.sparkContext
sc.setLogLevel("INFO")
sc.hadoopConfiguration.set("fs.s3a.connection.timeout", "500000")
val (patentParsedRDD, zipPathRDD) = runLocal(sc, params)
logger.info(f"Starting to parse files, appending parquet ${params.outputPath}")
import sparkSession.implicits._
val patentParseDF = patentParsedRDD.toDF().write.mode(SaveMode.Append).parquet(params.outputPath)
logger.info(f"Done parsing and appending parquet")
// save list of processed archive
val logPath = params.outputPath + "/log_%s" format java.time.LocalDate.now.toString
zipPathRDD.coalesce(1).saveAsTextFile(logPath)
logger.info(f"Log file save to $logPath")
}
I am trying to run the jar package assembled with sbt. However, I receive the error "reserved keyword and cannot be used as field name".
Command used:
./bin/spark-submit /Users/Projects/uspto-parser/target/scala-2.11/uspto-parser-assembly-0.1.jar
Error:
Exception in thread "main" java.lang.UnsupportedOperationException: `class` is a reserved keyword and cannot be used as field name
- array element class: "usptoparser.IPC"
- field (class: "scala.collection.immutable.List", name: "ipcs")
- root class: "usptoparser.PatentDocument"
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1$$anonfun$8.apply(ScalaReflection.scala:627)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1$$anonfun$8.apply(ScalaReflection.scala:625)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
Versions:
sparkVersion := "2.3.0"
sbt.version = 0.13.8
scalaVersion := "2.11.2"
You can work around it by using a field name that is not a reserved Java keyword and then renaming the column using as:
scala> case class IPC(name: String, `class`: String)
defined class IPC
scala> val x = Seq(IPC("a", "b"), IPC("d", "e")).toDF
java.lang.UnsupportedOperationException: `class` is a reserved keyword and cannot be used as field name
- root class: "IPC"
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1$$anonfun$8.apply(ScalaReflection.scala:627)
...
scala> case class IPC(name: String, clazz: String)
defined class IPC
scala> val x = Seq(IPC("a", "b"), IPC("d", "e")).toDF
x: org.apache.spark.sql.DataFrame = [name: string, clazz: string]
scala> x.select($"clazz".as("class")).show(false)
+-----+
|class|
+-----+
|b |
|e |
+-----+
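Applied to the IPC case class from the question, the same idea could look like this (a sketch; ipcType and clazz are just example replacement names, and for nested fields such as PatentDocument.ipcs the rename has to happen in the case class itself):
// Sketch: replace the reserved keywords `type` and `class` with plain names.
case class IPC(
  ipcType: String,
  main: Boolean,
  normalized: String,
  section: String,
  clazz: String,
  subClass: String,
  group: String,
  subGroup: String
)

// If the original column names are needed downstream, restore them with .as:
// df.select($"ipcType".as("type"), $"clazz".as("class"), ...)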
case class Response(jobCompleted: String, detailedMessage: String)

override def runJob(sc: HiveContext, runtime: JobEnvironment, data: JobData): JobOutput = {
  val generateResponse = new GenerateResponse(data, sc)
  val response = generateResponse.generateResponse()
  response.prettyPrint
}
I am trying to get output from Spark Job Server in this format from my Scala code:
" result":{
"jobCompleted":true,
"detailedMessage":"all good"
}
However, what is returned to me is the following: result:{"{\"jobCompleted\":\"true\",\"detailedMessage.."}.
Can someone please point out what I am doing wrong and how to get the correct format? I also tried response.toJson, which returns the AST format:
"result": [{
"jobCompleted": ["true"],
"detailedMessage": ["all good"]
}],
I finally figured it out, based on this Stack Overflow question. If there is a better way, kindly post it here, as I am new to Scala and Spark Job Server.
Convert DataFrame to RDD[Map] in Scala
So the key is to convert the response to a Map[String,JsValue]. Below is the sample code I was playing with.
case class Response(param1: String, param2: String, param3: List[SubResult])
case class SubResult(lst: List[String])

object ResultFormat extends DefaultJsonProtocol {
  implicit val subResultFormat = jsonFormat1(SubResult)
  implicit val responseFormat = jsonFormat3(Response)
}

type JobOutput = Map[String, JsValue]

def runJob(....) = {
  import ResultFormat._
  val xlst = List("one", "two")
  val ylst = List("three", "four")
  val subResult1 = SubResult(xlst)
  val subResult2 = SubResult(ylst)
  val subResultList = List(subResult1, subResult2)
  val r = Response("xxxx", "yyy", subResultList)
  r.toJson.asJsObject.fields
  // Returns a Map[String, JsValue], which the Spark Job Server serializes correctly.
}
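Applied to the Response(jobCompleted, detailedMessage) case class from the question, the same trick could look roughly like this (a sketch, assuming type JobOutput = Map[String, JsValue] and an implicit jsonFormat2(Response) in scope):
override def runJob(sc: HiveContext, runtime: JobEnvironment, data: JobData): JobOutput = {
  val response = new GenerateResponse(data, sc).generateResponse()
  response.toJson.asJsObject.fields
  // e.g. Map("jobCompleted" -> JsString("true"), "detailedMessage" -> JsString("all good"))
}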
lazy val buildDb = taskKey[Unit]("Initializes the database")
buildDb := {
(compile in Compile).value
val s: TaskStreams = streams.value
s.log.info("Building database")
try {
...
} catch {
case e: Throwable =>
sys.error("Failed to initialize the database: " + e.getMessage)
}
s.log.info("Finished building database")
}
This produces the following error
C:\work\server\build.sbt:98: error: type mismatch;
found : Unit
required: T
s.log.info("Finished building database")
^
[error] Type error in expression
But if I define it like this, lazy val buildDb = taskKey[String]("Initializes the database"), and add the string "Happy end!" as the last line of the task, everything seems to work. Am I to blame, or is something wrong with the macro?
The same happened to me. I was able to fix the issue, e.g. by adding an explicit : TaskKey[Unit] type annotation to the taskKey definition. Here are my findings for sbt 0.13.5:
The following definition is OK (it seems that it is pure luck that this is OK):
lazy val collectJars = taskKey[Unit]("collects JARs")
collectJars := {
println("these are my JARs:")
(externalDependencyClasspath in Runtime).value foreach println
}
The following definition (the same as above without the first println) yields the same error "found: Unit, required: T":
lazy val collectJars = taskKey[Unit]("collects JARs")
collectJars := {
(externalDependencyClasspath in Runtime).value foreach println
}
My findings are that this is definitely something magical: for example, if I indent the line lazy val collectJars = ... by one space, it compiles. I would expect (but have not checked) that .sbt and .scala build definitions also behave differently.
However, if you add the type signature, it seems to always compile:
lazy val collectJars: TaskKey[Unit] = taskKey[Unit]("collects JARs")
collectJars := {
(externalDependencyClasspath in Runtime).value foreach println
}
Last but not least: the issue seems to be specific to TaskKey[Unit]. Unit tasks are not a good idea - in your example, you could at least return Boolean (true for success / false for failure), as sketched below.
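Applied to the buildDb task from the question, both suggestions together (explicit type annotation plus a Boolean result) could look roughly like this sketch; the body of the try block is still elided:
lazy val buildDb: TaskKey[Boolean] = taskKey[Boolean]("Initializes the database")

buildDb := {
  (compile in Compile).value
  val s: TaskStreams = streams.value
  s.log.info("Building database")
  val ok =
    try {
      // ... database initialization, as in the question ...
      true
    } catch {
      case e: Throwable =>
        sys.error("Failed to initialize the database: " + e.getMessage)
    }
  s.log.info("Finished building database")
  ok
}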