How do you parse a string from an ArrayBuffer to a Double using Scala? - scala

I'm trying to map strings to doubles from an ArrayBuffer that I parsed with Play Framework, but I keep getting the following error:
Exception in thread "main" java.lang.NumberFormatException: For input string: ""0.04245800""
I'm not sure why this happens. I'm new to Scala, coming from a Python background.
import org.apache.http.client.methods.HttpGet
import play.api.libs.json._
import org.apache.http.impl.client.DefaultHttpClient
object main extends App {
val url = "https://api.binance.com/api/v1/aggTrades?symbol=ETHBTC"
val httpClient = new DefaultHttpClient()
val httpResponse = httpClient.execute(new HttpGet(url))
val entity = httpResponse.getEntity()
val content = ""
if (entity !=null) {
val inputStream = entity.getContent()
val result = io.Source.fromInputStream(inputStream).getLines.mkString
inputStream.close
println("REST API: " + url)
val json: JsValue = Json.parse(result)
var prices = (json\\"p")
println(prices.map(_.toString()).map(_.toDouble))
}
}

toString() on a JsValue renders the value as JSON text, so a JSON string keeps its surrounding double quotes, and toDouble cannot parse the quoted text.
If you know for sure that your list contains only strings, you can cast each element to JsString and read the Double from its underlying value:
println(prices.map(_.as[JsString].value.toDouble))
A JsString is not a String, so you cannot call toDouble on it directly.
Just for completeness: if you are not certain that your list contains only strings, add an isInstanceOf check or use pattern matching.
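The pattern-matching variant mentioned above can be sketched as follows. To keep the sketch runnable without Play, it uses hypothetical stand-in types (JsonValue, JsonString, JsonNumber, JsonNull) that only mirror the shape of Play's JsValue hierarchy; they are assumptions for illustration, not Play's API.

```scala
object PriceParsingSketch {
  // Hypothetical stand-ins for Play's JsValue hierarchy, so the sketch runs without Play.
  sealed trait JsonValue
  final case class JsonString(value: String) extends JsonValue
  final case class JsonNumber(value: BigDecimal) extends JsonValue
  case object JsonNull extends JsonValue

  // Keep only the values that can be read as a Double and skip everything else.
  def toDoubles(values: Seq[JsonValue]): Seq[Double] =
    values.collect {
      case JsonString(s) => s.toDouble // unquoted underlying text, e.g. 0.04245800
      case JsonNumber(n) => n.toDouble
    }
}
```

With play-json itself, the same shape would be prices.collect { case JsString(s) => s.toDouble }, since JsString is a case class wrapping the raw String.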

Related

How to get creation date of a file in a Scala dataframe

How to print the date of a file in Scala is explained here.
My question is how I can get a variable containing this information that can be returned as a column of a dataframe. None of the conversions I expected to work are actually allowed.
My code (using Scala 2.11):
import org.apache.spark.sql.functions._
import java.nio.file.{Files, Paths} // Needed for file time
import java.nio.file.attribute.BasicFileAttributes
import java.util.Date
def GetFileTimeFunc(pathStr: String): String = {
// From: https://stackoverflow.com/questions/47453193/how-to-get-creation-date-of-a-file-using-scala
val FileTime = Files.readAttributes(Paths.get(pathStr), classOf[BasicFileAttributes]).creationTime;
val JavaDate = Date.from(FileTime.toInstant);
return(JavaDate.toString())
}
@transient val GetFileTime = udf(GetFileTimeFunc _)
val filePath = "dbfs:/mnt/myData/" // location of data
val file_df = dbutils.fs.ls(filePath).toDF // Output columns are $"path", $"name", and $"size"
.withColumn("FileTimeCreated", GetFileTime($"path"))
display(file_df)//.select("name", "size"))
Output:
SparkException: Failed to execute user defined function($anonfun$2: (string) => string)
For some reason, Instant is not allowed as a column type, so I cannot use it as a return type. The same goes for FileTime, JavaDate, etc.
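One conversion that might help here: Spark can store java.sql.Timestamp values in a column, so the FileTime can be converted to that type instead of Instant or Date. The sketch below is only the conversion, under the assumption that pathStr is a path java.nio can open on the local file system; FileTimeSketch and creationTimestamp are hypothetical names.

```scala
import java.nio.file.{Files, Paths}
import java.nio.file.attribute.BasicFileAttributes
import java.sql.Timestamp

object FileTimeSketch {
  // Returns the creation time as java.sql.Timestamp rather than FileTime or Instant.
  // Assumption: pathStr points at a file on the local file system.
  def creationTimestamp(pathStr: String): Timestamp = {
    val attrs = Files.readAttributes(Paths.get(pathStr), classOf[BasicFileAttributes])
    new Timestamp(attrs.creationTime.toMillis)
  }
}
```

Wrapped as udf(FileTimeSketch.creationTimestamp _), this should give a timestamp column. It does not by itself address whether a dbfs:/ path can be read through java.nio.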

java.text.ParseException: Unparseable date: "Some(2014-05-14T14:40:25.950)"

I need to fetch the date from a file.
Below is my spark program:
import org.apache.spark.sql.SparkSession
import scala.xml.XML
import java.text.SimpleDateFormat
object Active6Month {
def main(args:Array[String]){
val format = new SimpleDateFormat("yyyy-MM-dd'T'hh:mm:ss.SSS")
val format1 = new SimpleDateFormat("yyyy-MM")
val spark = SparkSession.builder.appName("Active6Months").master("local").getOrCreate()
val data = spark.read.textFile("D:\\BGH\\StackOverFlow\\Posts.xml").rdd
val date = data.filter{line => {
line.toString().trim().startsWith("<row")
}}.filter{line=>{
line.contains("PostTypeId=\"1\"")
}}.map{line=>{
val xml = XML.loadString(line)
var closedDate = format1.format(format.parse(xml.attribute("ClosedDate").toString())).toString()
(closedDate,1)
}}.reduceByKey(_+_)
date.foreach(println)
spark.stop
}
}
And I am getting this error:
java.text.ParseException: Unparseable date: "Some(2014-05-14T14:40:25.950)"
The format of the date in the file is fine, i.e.:
CreationDate="2014-05-13T23:58:30.457"
But the error shows the string "Some" wrapped around it.
My other question is why the same thing works in the code below:
val date = data.filter{line => {
line.toString().trim().startsWith("<row")
}}.filter{line=>{
line.contains("PostTypeId=\"1\"")
}}.flatMap{line=>{
val xml = XML.loadString(line)
xml.attribute("ClosedDate")
}}.map{line=>{
(format1.format(format.parse(line.toString())).toString(),1)
}}.reduceByKey(_+_)
My guess is that xml.attribute("ClosedDate").toString() is actually returning a string with Some wrapped around the value. Have you debugged that to make sure?
Maybe you shouldn't use toString() but instead get the attribute value by using the proper method.
Or you can do it the "ugly" way and include "Some" in the pattern:
val format = new SimpleDateFormat("'Some('yyyy-MM-dd'T'hh:mm:ss.SSS')'")
Your second approach probably works because (this is a guess, as I don't code in Scala) the xml.attribute("ClosedDate") method returns a wrapper object, and calling toString() on that object returns the string with "Some" wrapped around it (why? ask the API authors). When you pass the object through flatMap instead, the wrapper is removed, so the line variable in the following map holds the correct value (without the "Some" part).
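The wrapper in question is Scala's Option: calling toString on a Some includes the "Some(...)" text, while map and flatMap work on the value inside. A minimal sketch of that difference, using a plain Option[String] so it runs without scala-xml (ClosedDateSketch and yearMonth are hypothetical names):

```scala
import java.text.SimpleDateFormat

object ClosedDateSketch {
  // HH is the 24-hour clock; the question's pattern used hh (12-hour clock).
  private val input = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS")
  private val output = new SimpleDateFormat("yyyy-MM")

  // Transform the value inside the Option instead of calling toString on the Option.
  def yearMonth(attribute: Option[String]): Option[String] =
    attribute.map(value => output.format(input.parse(value)))
}
```

Here Some("2014-05-14T14:40:25.950").toString is exactly the "Some(2014-05-14T14:40:25.950)" text the parser rejected, while yearMonth leaves a missing attribute as None. With scala-xml, xml.attribute("ClosedDate") returns an Option[Seq[Node]], so mapping it with _.text should give the Option[String] this sketch expects.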

Extracting a field from a Json string using jackson mapper in Scala

I have a json string:
val message = """{"me":"a",
"version":"1.0",
"message_metadata": "{\"event_type\":\"UpdateName\",\"start_date\":\"1515\"}"
}"""
I want to extract the value of the field event_type from this json string.
I have used below code to extract the value:
val mapper = new ObjectMapper
val root = mapper.readTree(message)
val metadata =root.at("/message_metadata").asText()
val root1 = mapper.readTree(metadata)
val event_type =root1.at("/event_type").asText()
print("eventType:" + event_type.toString) //UpdateName
This works fine and I get the value UpdateName. But when I try to get the event type in a single line, as below:
val mapper = new ObjectMapper
val root = mapper.readTree(message)
val event_type =root.at("/message_metadata/event_type").asText()
print("eventType:" + event_type.toString) //Empty string
Here event_type is an empty string. This might be because message_metadata holds a JSON object as a string value. Is there a way I can get the value of event_type in a single line?
The problem is that your JSON message contains an object whose message_metadata field itself contains JSON text, so it must be decoded separately. I'd suggest that you don't put JSON into JSON, but encode the data structure only once.
Example:
val message = """{"me":"a",
"version":"1.0",
"message_metadata": {
"event_type":"UpdateName",
"start_date":"1515"
}
}"""
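If the producer of the message cannot be changed, the two decoding steps from the question can still be chained into one expression. This is a sketch under the assumption that message_metadata always holds valid JSON text; NestedJsonSketch and eventType are hypothetical names, and the Jackson calls are the same ones the question already uses.

```scala
import com.fasterxml.jackson.databind.ObjectMapper

object NestedJsonSketch {
  private val mapper = new ObjectMapper

  // message_metadata holds JSON text, so that text is parsed a second time
  // before the pointer to event_type is applied.
  def eventType(message: String): String =
    mapper
      .readTree(mapper.readTree(message).at("/message_metadata").asText())
      .at("/event_type")
      .asText()
}
```

A single JSON pointer cannot cross the boundary, because to the outer document message_metadata is just a string value.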
You can parse your JSON using case classes and then get your event_type field from there.
case class Json(me: String, version: String, message_metadata: Message)
case class Message(event_type: String, start_date: String)
object Mapping {
def main(args: Array[String]): Unit = {
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
val objectMapper = new ObjectMapper() with ScalaObjectMapper
objectMapper.registerModule(DefaultScalaModule)
val str = "{\n \"me\": \"a\",\n \"version\": \"1.0\",\n \"message_metadata\": {\n \"event_type\": \"UpdateName\",\n \"start_date\": \"1515\"\n }\n}"
val json = objectMapper.readValue(str, classOf[Json])
//to print event_type
println(json.message_metadata.event_type)
//output: UpdateName
}
}
You can also convert the JSON to a Scala case class and then read the particular field from the case class.
Please find a working and detailed answer which I have provided using generics here.

Returning the custom return type upon exception in scala

I am trying to read xml using the following method to extract data from xml
def xmlparser(xml:String): (String,List[String]) =
Try {
val documentbuilder=DocumentBuilderFactory.newInstance.newDocumentBuilder
val xmldocument = documentbuilder.parse(new InputSource(new java.io.StringReader(xml)))
val nodesofchild=xmldocument.getChildNodes
val xmlvalues=extractvalues(nodesofchild)
("xmlname",xmlvalues)
}
I need to return ("xmlname", xmlvalues) if the XML is valid; otherwise I need to return ("xmlname", null). I tried using ".toOption.orNull", but it returns only "null". Could somebody help me return ("xmlname", null) instead of "null"?
Instead of your current code, wrap only the parsing in Try and build the tuple outside it, so that just the values become optional:
def xmlparser(xml:String): (String, Option[List[String]]) = {
val values = Try {
val documentbuilder=DocumentBuilderFactory.newInstance.newDocumentBuilder
val xmldocument = documentbuilder.parse(new InputSource(new java.io.StringReader(xml)))
val nodesofchild=xmldocument.getChildNodes
// last expression of the block: the Try holds the extracted values
extractvalues(nodesofchild)
}
("xmlname", values.toOption)
}
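For the exact ("xmlname", null) shape the question asks for, the same idea can be sketched in a self-contained form. The extractValues helper below is a hypothetical stand-in for the question's extractvalues (whose body is not shown); it simply collects the text of the root element's child elements.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.{Element, Node}
import org.xml.sax.InputSource
import scala.util.Try

object XmlParserSketch {
  // Hypothetical stand-in for the question's extractvalues helper.
  private def extractValues(root: Element): List[String] = {
    val children = root.getChildNodes
    (0 until children.getLength).toList
      .map(i => children.item(i))
      .filter(_.getNodeType == Node.ELEMENT_NODE)
      .map(_.getTextContent)
  }

  def xmlParser(xml: String): (String, List[String]) = {
    // Only the parsing lives inside Try; the tuple is built outside it.
    val values = Try {
      val builder = DocumentBuilderFactory.newInstance.newDocumentBuilder
      val document = builder.parse(new InputSource(new StringReader(xml)))
      extractValues(document.getDocumentElement)
    }
    // orNull is applied to the values only, so the name survives a failure.
    ("xmlname", values.toOption.orNull)
  }
}
```

Applying .toOption.orNull to a Try that wraps the whole tuple replaces the entire tuple with null on failure, which matches what the question observed. Returning an Option, as in the answer above, is usually the safer choice in Scala.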

How does scala know how to return what it returns from function?

I ran the following code, and it prints out successfully the "URL" from the s3Service.createUnsignedObjectUrl method. My question is, how does the variable even get returned and stored into the "linkForText" variable? I read that scala functions usually need something like a ": Int" and a "return" within the "def"ined function. But I see none of that here. How is the store function able to do this?
package com.justthor
import org.jets3t.service.impl.rest.httpclient.RestS3Service
import org.jets3t.service.security.AWSCredentials
import org.jets3t.service.model.S3Object
import org.jets3t.service.acl.{ AccessControlList, GroupGrantee, Permission }
import java.io.InputStream
object Main extends App{
val classPath = "/"
// upload a simple text file
val textFilename = "test.txt"
val linkForText = store(textFilename, getClass.getResourceAsStream(s"$classPath$textFilename"))
// upload a cat image, taken from http://imgur.com/gallery/bTiwg
// set the content type to "image/jpg"
val imageFilename = "cat.jpg"
val linkForImage = store(imageFilename, getClass.getResourceAsStream(s"$classPath$imageFilename"), "image/jpg")
println(s"Url for the text file is $linkForText")
println(s"Url for the cat image is $linkForImage")
def store(key: String, inputStream: InputStream, contentType: String = "text/plain") = {
val awsAccessKey = "YOUR_ACCESS_KEY"
val awsSecretKey = "YOUR_SECRET_KEY"
val awsCredentials = new AWSCredentials(awsAccessKey, awsSecretKey)
val s3Service = new RestS3Service(awsCredentials)
val bucketName = "test-scala-upload"
val bucket = s3Service.getOrCreateBucket(bucketName)
val fileObject = s3Service.putObject(bucket, {
// public access is disabled by default, so we have to manually set the permission to allow read access to the uploaded file
val acl = s3Service.getBucketAcl(bucket)
acl.grantPermission(GroupGrantee.ALL_USERS, Permission.PERMISSION_READ)
val tempObj = new S3Object(key)
tempObj.setDataInputStream(inputStream)
tempObj.setAcl(acl)
tempObj.setContentType(contentType)
tempObj
})
s3Service.createUnsignedObjectUrl(bucketName,
fileObject.getKey,
false, false, false)
}
}
Inference and scala-isms.
The return type is inferred from the return value, which is the result of the last expression in a method/function body; no return keyword is needed.
So, in your case, whatever the last expression, s3Service.createUnsignedObjectUrl(...), evaluates to is the value returned from store. As there is no branching, the return type is inferred from that value alone. If there is branching, inference takes the least upper bound (the nearest common supertype) of the types of the branches.
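A small sketch of both cases, with hypothetical method names, may make this concrete:

```scala
object InferenceSketch {
  // No ": Int" and no "return": the last expression is the result,
  // and its type (Int) is inferred.
  def double(x: Int) = x * 2

  // Two branches with different types (Some[Int] and None): the inferred
  // result type is their common supertype, Option[Int].
  def positive(x: Int) = if (x > 0) Some(x) else None

  // An explicit annotation is still allowed and documents the intent.
  def label(x: Int): String = s"value=$x"
}
```

Assigning the results to explicitly typed values, for example val p: Option[Int] = InferenceSketch.positive(3), is a simple way to check what the compiler inferred.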