Parse JSON file on S3 using Play JSON in Scala

I want to access a JSON file from S3 using the Play JSON framework:
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.s3.{AmazonS3Client, AmazonS3URI}
import com.amazonaws.services.s3.model.S3Object
import play.api.libs.json.{JsValue, Json}

val creds: DefaultAWSCredentialsProviderChain = new DefaultAWSCredentialsProviderChain
val s3Client = new AmazonS3Client(creds)
val uri: AmazonS3URI = new AmazonS3URI(conf_file)
val s3Object: S3Object = s3Client.getObject(uri.getBucket, uri.getKey)
val json = Json.parse(s3Object.getObjectContent)
val mylist = (json \ "mydata").get.as[List[JsValue]]
But this line:
val mylist = (json \ "mydata").get.as[List[JsValue]]
fails with the error:
no such element "mydata"
Can anyone tell me how to access a JSON file and read its contents using Play JSON in Scala?
I am able to access the same file from my local machine, and to fetch the contents of "mydata" from within the JSON.

Did you try printing the object first to check that it is properly formatted JSON? Everything seems to work fine for me.
val json: JsValue = Json.parse("""{
  "mydata": [
    {"first": "aa"},
    {"second": "bb"},
    {"third": "cc"}
  ]
}""")
Try something like this instead; asOpt returns None instead of throwing when the key is missing:
(json \ "mydata").asOpt[Seq[JsValue]].getOrElse(Seq.empty)
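For debugging, a minimal sketch (reusing the s3Object from the question, and assuming the content is UTF-8) that reads the content into a string first so you can print it before parsing:
import scala.io.Source
import play.api.libs.json.{JsValue, Json}

// Read the S3 object's content fully into a string so it can be inspected.
val raw = Source.fromInputStream(s3Object.getObjectContent, "UTF-8").mkString
println(raw) // verify the payload really is the JSON you expect

val json: JsValue = Json.parse(raw)
// asOpt avoids the "no such element" error when the key is missing
val mylist: Seq[JsValue] = (json \ "mydata").asOpt[Seq[JsValue]].getOrElse(Seq.empty)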

Related

How to store values into a dataframe from a List using Scala for handling nested JSON data

I have the below code, where I am pulling data from an API and storing it into a JSON file; later I will load it into an Oracle table. The data value of the ID column is the column name under velocityStatEntries. With the code below I am able to print the data in completedEntries.values, but I need help putting it into one DataFrame and adding it to the embedded_df.
import java.io.{File, FileWriter}
import org.apache.spark.sql.functions.{col, explode}

val inputStream = scala.io.Source.fromInputStream(connection.getInputStream).mkString
val fileWriter1 = new FileWriter(new File(filename))
fileWriter1.write(inputStream) // inputStream is already a String
fileWriter1.close()

val json_df = spark.read.option("multiLine", true).json(filename)
val embedded_df = json_df.select(explode(col("sprints")) as "x").select("x.*")
val list_df = json_df.select("velocityStatEntries.*").columns.toList
for (i <- list_df) {
  val completed_df = json_df.select(s"velocityStatEntries.$i.completed.value")
  completed_df.show()
}
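If the goal is a single DataFrame rather than a loop of show() calls, one possible sketch (an assumption about the schema, not tested against the actual API response) tags each per-id select with its id and unions them; the result can then be combined with embedded_df:
import org.apache.spark.sql.functions.{col, lit}

// One select per id under velocityStatEntries, each tagged with its id,
// then unioned into a single (id, completed_value) DataFrame.
// Assumes list_df is non-empty.
val perId = list_df.map { id =>
  json_df.select(
    lit(id).as("id"),
    col(s"velocityStatEntries.$id.completed.value").as("completed_value"))
}
val completed_df = perId.reduce(_ union _)
completed_df.show()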

How to send JSON response in Spark

My JSON file (input.json) looks like this:
{"first_name":"Sabrina","last_name":"Mayert","email":"donny54@yahoo.com"}
{"first_name":"Taryn","last_name":"Dietrich","email":"donny54@yahoo.com"}
My Scala code looks like this; I am trying to return first_name and last_name based on the email:
val conf = new SparkConf().setAppName("RowCount").setMaster("local")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val input = sqlContext.read.json("input.json")
val data = input
.select("first_name", "last_name")
.where("email=='donny54#yahoo.com'")
.toJSON
data.write.json("input2")
sc.stop
complete(data.toString)
data.write.json("input2") creating file looks like below
{"value":"{\"first_name\":\"Sabrina\",\"last_name\":\"Mayert\"}"}
{"value":"{\"first_name\":\"Taryn\",\"last_name\":\"Dietrich\"}"}
complete(data.toString) returns the response [value: string].
How can I get the response as an array of JSON objects?
[{"first_name":"Sabrina","last_name":"Mayer"},{"first_name":"Taryn","last_name":"Dietrich"}]
Thanks for the help in advance.
You are converting to JSON twice: once with toJSON and again with write.json. Drop the toJSON call and you should get your desired output:
val data = input
.select("first_name", "last_name")
.where("email=='donny54#yahoo.com'")
data.write.json("input2")
Output:
{"first_name":"Sabrina","last_name":"Mayert"}
{"first_name":"Taryn","last_name":"Dietrich"}
Does this solve your issue, or do you specifically need to convert it to an array?
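If an actual JSON array is needed in the response, one option (a sketch, assuming the result set is small enough to collect to the driver) is to collect the JSON strings and join them into one array literal:
// Produces e.g. [{"first_name":"Sabrina",...},{"first_name":"Taryn",...}]
val jsonArray = data.toJSON.collect().mkString("[", ",", "]")
complete(jsonArray)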

How to convert spark response to JSON object

val conf = new SparkConf().setAppName("test").setMaster("local")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val input = sqlContext.read.json("input.json")
input.select("email", "first_name").where("email=='donny54@yahoo.com'").show()
I am getting the response as the tabular output of show().
How can I get the response as a JSON object?
You can write it to a JSON file: https://www.tutorialkart.com/apache-spark/spark-write-dataset-to-json-file-example/
Or, if you prefer to show it as a Dataset of JSON strings, use the toJSON method (note it takes no parentheses):
input
  .select("email", "first_name")
  .where("email=='donny54@yahoo.com'")
  .toJSON
  .show()

Parse JSON data with Apache Spark and Scala

I have this type of file, where each line is a JSON object except for the first few words (see attached image). I want to parse this type of file using Spark and Scala. I have tried sqlContext.read.json("path to json file"), but it gives an error (corrupt record) because the whole line is not a JSON object. How do I parse this file into a SQL DataFrame?
Try this:
val rawRdd = sc.textFile("path-to-the-file")
val jsonRdd = rawRdd.map(_.substring(32)) // 32 = number of leading characters to skip before the JSON begins
val df = spark.read.json(jsonRdd)
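If the non-JSON prefix is not always exactly 32 characters, a slightly more defensive variant (an assumption, not part of the original answer) is to cut each line at the first '{':
// Drop everything before the first '{'; assumes every line contains one.
val jsonRdd = rawRdd.map(line => line.substring(line.indexOf('{')))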

spark scala dataframes - create an object with attributes in a json file

I have a JSON file of the format:
{"x1": 2, "y1": 6, "x2":3, "y2":7}
I have a Scala class:
class Test(x: Int, y: Int)
Using Spark, I am trying to read this file and create two Test objects for each line of the JSON file. For example:
{"x1": 2, "y1": 6, "x2":3, "y2":7} should create
test1 = new test(2,6) and
test2 = new test(3,7)
Then, for each line of the JSON file, I want to call a function that takes the two Test objects as parameters, e.g. callFunction(test1, test2).
How do I do this with Spark? I see methods that convert the rows of a JSON file to a list of objects, but no way to create multiple objects from the attributes of a single row.
val conf = new SparkConf()
.setAppName("Example")
.setMaster("local")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val coordinates = sqlContext.read.json("c:/testfile.json")
//NOT SURE HOW TO DO THE FOLLOWING
//test1 = new Test(attr1 of json file, attr2 of json file)
//test2 = new Test(attr3 of json file, attr4 of json file)
//callFunction(test1,test2)
//collect the result of callFunction
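One possible approach, sketched under assumptions (the Test class and callFunction from the question exist and are serializable, and spark.read.json infers the numeric fields as Long):
import org.apache.spark.sql.Row

// Map each JSON row to two Test objects and apply callFunction.
// Test and callFunction must be serializable to run on executors.
val results = coordinates
  .select("x1", "y1", "x2", "y2")
  .rdd
  .map { case Row(x1: Long, y1: Long, x2: Long, y2: Long) =>
    callFunction(new Test(x1.toInt, y1.toInt), new Test(x2.toInt, y2.toInt))
  }
  .collect() // gather the results of callFunction on the driver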