Using JsonPath from Scala to extract full field : value list - scala

I'm trying to get a fully qualified set of path : value pairs from a json document.
i.e. given
{"a":"b", "c":{"d":3"}}
I'd like
a :: "b"
c.d :: 3
or something spiritually similar. There appears to be a Java library which claims to do exactly that:
import $ivy.`com.jayway.jsonpath:json-path:2.6.0`
import com.jayway.jsonpath.Configuration
import com.jayway.jsonpath.Option
import com.jayway.jsonpath.JsonPath._
val conf = com.jayway.jsonpath.Configuration.defaultConfiguration();
val pathList = using(conf).parse("""{"a":"b", "c":{"d":3}}""")
val arg = pathList.read("$..id")
I get this error
java.lang.ClassCastException: net.minidev.json.JSONArray cannot be cast to scala.runtime.Nothing$
at repl.MdocSession$App.<init>(json test.worksheet.sc:38)
at repl.MdocSession$.app(json test.worksheet.sc:3)
Any ideas out there?

val arg = pathList.read[net.minidev.json.JSONArray]("$..*")
Needed an explicit type parameter: without it, Scala infers Nothing for the Java generic return type, hence the ClassCastException.
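For the original goal (a full path :: value listing), a rough and untested sketch that should work with json-path's AS_PATH_LIST option (Scala 2.13 assumed for CollectionConverters; variable names are mine):
import com.jayway.jsonpath.{Configuration, JsonPath, Option => JOption}
import scala.jdk.CollectionConverters._

val json = """{"a":"b", "c":{"d":3}}"""
// First pass: ask json-path for the path of every node under the root.
val pathConf = Configuration.builder().options(JOption.AS_PATH_LIST).build()
val paths = JsonPath.using(pathConf).parse(json)
  .read[net.minidev.json.JSONArray]("$..*")
  .asScala.map(_.toString)
// Second pass: read each path back from a normally parsed document.
val doc = JsonPath.parse(json)
paths.foreach(p => println(s"$p :: ${doc.read[Object](p)}"))
// Expected to print $['a'] :: b and $['c']['d'] :: 3 (intermediate objects like $['c'] also appear).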

Creating a DataFrame containing instances of a case class with integer attributes

Spark fails to convert a sequence of such instances to a DataFrame/Dataset, which is not a great developer experience.
Is this a bug or an expected feature?
Consider an example:
import spark.implicits._
case class Test(`9`: Double)
This fails:
val failingTest = Seq(Test(9.0)).toDF()
This works fine:
val successfulTest = Seq(9.0).toDF("9").as[Test]
successfulTest.show()
If you look at the rules for naming a variable, the way you are specifying the field name is not allowed: an ordinary identifier must start with a letter, not a digit or another symbol, and `9` only compiles at all because of the backticks.
Just change your case class to use a regular field name without backticks, as below, and it should work.
import spark.implicits._
case class Test(Id: Double)
val failingTest = Seq(Test(9.0)).toDS()
The above line of code would give you a Dataset of Test, as below:
failingTest: org.apache.spark.sql.Dataset[Test] = [Id: double]
val failingTest1 = Seq(Test(9.0)).toDF()
The above line would give you a Dataset of Row, i.e. a DataFrame, as below:
failingTest1: org.apache.spark.sql.DataFrame = [Id: double]
It is giving you an error because you are not providing the name in the proper format.
You could also consider it a bug that defining such a case class does not raise an error in the first place, which I think it should.
Your code would also work if you just replace the digit 9 with a letter, as below:
import spark.implicits._
case class Test1(`i`: Double)
val failingTest = Seq(Test1(9.0)).toDF()
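If the column really does need to be called 9 in the output, a possible workaround (untested sketch; Test2 and the field name value are made up for illustration) is to use a legal field name in the case class and rename the column afterwards:
import spark.implicits._
case class Test2(value: Double)
// Build the Dataset with a legal field name, then rename the column for output.
val renamed = Seq(Test2(9.0)).toDF().withColumnRenamed("value", "9")
renamed.show()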

Gatling: Dynamically assemble HttpCheck for multiple Css selectors

I am working on a Gatling test framework that can be parameterized through external config objects. One use case I have is that there may be zero or more CSS selector checks that need to be saved to variables. In my config object, I've implemented that as a Map[String,(String, String)], where the key is the variable name, and the value is the 2-part css selector.
I am struggling with how to dynamically assemble the check. Here's what I got so far:
val captureMap: Map[String, (String, String)] = config.capture
httpRequestBuilder.check(
  captureMap.map((mapping) => {
    val varName = mapping._1
    val cssSel = mapping._2
    css(cssSel._1, cssSel._2).saveAs(varName)
  }).toArray: _* // compilation error here
)
The error I'm getting is:
Error:(41, 10) type mismatch;
found : Array[io.gatling.core.check.CheckBuilder[io.gatling.core.check.css.CssCheckType,jodd.lagarto.dom.NodeSelector,String]]
required: Array[_ <: io.gatling.http.check.HttpCheck]
}).toArray: _*
Apparently, I need to turn my CheckBuilder into an HttpCheck, so how do I do that?
Update:
I managed to get it to work by introducing a variable of type HttpCheck and returning it in the next line:
httpRequestBuilder.check(
  captureMap.map((mapping) => {
    val varName = mapping._1
    val cssSel = mapping._2
    val check: HttpCheck = css(cssSel._1, cssSel._2).saveAs(varName)
    check
  }).toArray: _*
)
While this works, it's ugly as hell. Can this be improved?
I had the same issue.
I had the following imports:
import io.gatling.core.Predef._
import io.gatling.http.Predef.http
I changed these imports to:
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import io.gatling.http.request.builder.HttpRequestBuilder.toActionBuilder
which made it work.
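For what it's worth, once that implicit conversion is in scope, a type ascription inside the map should be enough to trigger it, which avoids the temporary val (untested sketch):
httpRequestBuilder.check(
  captureMap.map { case (varName, (selector, attribute)) =>
    // The ascription applies the implicit CheckBuilder -> HttpCheck conversion.
    css(selector, attribute).saveAs(varName): HttpCheck
  }.toSeq: _*
)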

How can I handle the empty strings present in records and still get them processed via Spark-Scala?

Below is the structure of my table. I have a bunch of records present in it.
Products:
product_id|product_category_id|product_name|product_descrition|product_price|product_image
I want to sort the data based on product_price. Since it contains empty/null data I am getting the below exception. How can I achieve this in Spark-Scala?
val productsRDD = sc.textFile("/user/cloudera/products")
productsRDD.map(rec => (rec.split(",")(4).toFloat, rec)).sortByKey().take(5).foreach(println)
Exception:
java.lang.NumberFormatException: empty String
You can use the filter or filterNot method to filter out the empty strings, like this:
val productsRDD = sc.textFile("/user/cloudera/products")
import scala.util.Try
productsRDD.map { rec =>
  val floatValue = Try(rec.split(",")(4).toFloat).toOption
  (floatValue, rec)
}.filter(_._1.isDefined)
 .map(a => (a._1.get, a._2))
 .sortByKey()
 .take(5)
 .foreach(println)
P.S.: The code is not tested, but it should work!
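Another untested variant of the same idea: RDD.collect with a partial function keeps only the rows whose price parses and unwraps the Option in one pass:
import scala.util.Try
productsRDD
  .map(rec => (Try(rec.split(",")(4).toFloat).toOption, rec))
  .collect { case (Some(price), rec) => (price, rec) }
  .sortByKey()
  .take(5)
  .foreach(println)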
If you want to preserve the data instead of filtering it out, you can use Try and Option:
import scala.util.Try
val productsRDD = sc.textFile("/user/cloudera/products")
productsRDD.map(rec=> (Try(rec.split(",")(4).toFloat).toOption, rec)).sortByKey().take(5).foreach(println)
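Note that this relies on the default Ordering for Option, under which None sorts before Some, so rows with an unparseable price come out first in an ascending sort:
// Quick illustration of the Option ordering sortByKey uses here:
List(Some(2.0f), None, Some(1.0f)).sorted // List(None, Some(1.0), Some(2.0))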
If you want to set a default value, you can try something like this:
import scala.util.Try
val default = Float.MaxValue
val productsRDD = sc.textFile("/user/cloudera/products")
productsRDD.map(rec=> (Try(rec.split(",")(4).toFloat).getOrElse(default), rec)).sortByKey().take(5).foreach(println)
Try this approach, which falls back to 0 when the value is null or empty:
def nullOrFloat(x: String): Float = x match {
  case s: String if s.nonEmpty => java.lang.Float.parseFloat(s)
  case _ => 0f // covers null and the empty string; null.asInstanceOf[Float] is 0.0f anyway
}
val productsRDD = sc.textFile("/user/cloudera/products")
productsRDD.map(rec => (nullOrFloat(rec.split(",")(4)), rec)).sortByKey().take(5).foreach(println)

IntelliJ error with Scala function: "cannot resolve reference format with such signature"

IntelliJ complains about this code:
val document: Node // (initialized further up in the code)
val s: String = (new scala.xml.PrettyPrinter(80, 4)).format(document))
With the error:
Cannot resolve reference format with such signature
However, such a function exists. It has a default value for the second parameter, and it seems IntelliJ isn't identifying it correctly.
I am not sure about that specific error you mention, but you have one parenthesis too many. You have:
val s: String = (new scala.xml.PrettyPrinter(80, 4)).format(document))
It should be:
val s: String = (new scala.xml.PrettyPrinter(80, 4)).format(document)
I just tried your code in sbt (once I made that correction) and it seems fine:
scala> import scala.xml._
import scala.xml._
scala> val document : Node = <test>blah</test>
document: scala.xml.Node = <test>blah</test>
scala> val s: String = (new PrettyPrinter(80, 4)).format(document)
s: String = <test>blah</test>
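If IntelliJ still flags the call after the correction, one thing worth trying (an untested sketch; it relies on format's second parameter defaulting to TopScope) is to pass that parameter explicitly so the IDE sees a two-argument call:
import scala.xml._
val document: Node = <test>blah</test>
// Spell out the namespace-binding argument that format normally defaults to TopScope.
val s: String = new PrettyPrinter(80, 4).format(document, TopScope)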

Reading Basic File Attributes in Scala?

I'm trying to get basic file attributes using Scala, and my reference is this Java question:
Determine file creation date in Java
and this piece of code I'm trying to rewrite in Scala:
static void getAttributes(String pathStr) throws IOException {
Path p = Paths.get(pathStr);
BasicFileAttributes view
= Files.getFileAttributeView(p, BasicFileAttributeView.class)
.readAttributes();
System.out.println(view.creationTime()+" is the same as "+view.lastModifiedTime());
}
The thing I just can't figure out is this line of code. I don't understand how to pass a class this way in Scala, or why Java insists on it in the first place instead of taking an actual constructed object as the parameter. Can someone please help me write this line of code so it works? I must be using the wrong syntax:
val attr = Files.readAttributes(f,Class[BasicFileAttributeView])
Try this:
import java.nio.file.{Files, Paths}
import java.nio.file.attribute.BasicFileAttributeView

def attrs(pathStr: String) =
  Files.getFileAttributeView(
    Paths.get(pathStr),
    classOf[BasicFileAttributeView] // getFileAttributeView expects the *view* class, not BasicFileAttributes
  ).readAttributes
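A quick usage sketch mirroring the Java snippet above (the path is hypothetical):
val a = attrs("/tmp/test.sql")
println(s"${a.creationTime} is the same as ${a.lastModifiedTime}")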
Get file creation date in Scala, from basic file attributes:
// option 1,
import java.nio.file.{Files, Paths}
import java.nio.file.attribute.BasicFileAttributes
val pathStr = "/tmp/test.sql"
Files.readAttributes(Paths.get(pathStr), classOf[BasicFileAttributes]).creationTime
res3: java.nio.file.attribute.FileTime = 2018-03-06T00:25:52Z
// option 2,
import java.nio.file.{Files, Paths}
import java.nio.file.attribute.BasicFileAttributeView
val pathStr = "/tmp/test.sql"
{
Files
.getFileAttributeView(Paths.get(pathStr), classOf[BasicFileAttributeView])
.readAttributes.creationTime
}
res20: java.nio.file.attribute.FileTime = 2018-03-07T19:00:19Z