I am trying to read some DynamoDB data and update it using Spark/Scala.
I am reading the data as JSON, like this:
{
"2021-11-24": {
"Execution_Steps": {
"Step_1": "OK",
"Step_2": "RUNNING"
},
"status": "RUNNING",
"start_date": "2021-11-25 00:00:00"
},
"2021-11-20": {
"end_date": "2021-11-25 01:00:00",
"status": "OK",
"start_date": "2021-11-25 00:00:00"
}
}
Using Jackson I could deserialize it into Maps:
val dataMap = mapper.readValue(jsonData, classOf[Map[String, String]])
So I got this map:
Map(2021-11-20 -> Map(end_date -> 2021-11-25 01:00:00, status -> OK, start_date -> 2021-11-25 00:00:00), 2021-11-24 -> Map(Execution_Steps -> Map(Step_2 -> RUNNING, Step_1 -> OK), status -> RUNNING, start_date -> 2021-11-25 00:00:00))
How can I update "2021-11-24"."Execution_Steps"."Step_2" in the Map to OK instead of RUNNING?
Thanks!
Well assuming your starting point:
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import com.fasterxml.jackson.databind.node.ObjectNode
val json = "{\n \"2021-11-24\": {\n \"Execution_Steps\": {\n \"Step_1\": \"OK\",\n \"Step_2\": \"RUNNING\"\n },\n \"status\": \"RUNNING\",\n \"start_date\": \"2021-11-25 00:00:00\"\n },\n \"2021-11-20\": {\n \"end_date\": \"2021-11-25 01:00:00\",\n \"status\": \"OK\",\n \"start_date\": \"2021-11-25 00:00:00\"\n }\n}"
You can work with the JsonNode directly and convert to a Map[String, String] as a final step:
val objectMapper = new ObjectMapper()
val jsonNode = objectMapper.readTree(json)
def update(objectNode: JsonNode): JsonNode = {
  objectNode
    .get("2021-11-24")
    .get("Execution_Steps")
    .asInstanceOf[ObjectNode]
    .put("Step_2", "OK")
  objectNode
}
val updated = update(jsonNode)
objectMapper.treeToValue(updated, classOf[java.util.Map[String, String]])
Note: get may return null. Maybe wrap it in Option?
This API doesn't really go well with Scala's immutability philosophy.
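If you want to avoid the raw nulls, here is a minimal sketch of the same update wrapped in Option (using the jsonNode from above; the path segments are simply the ones from your sample JSON):
// Navigate defensively: each get may return null, so wrap every step in Option.
Option(jsonNode.get("2021-11-24"))
  .flatMap(day => Option(day.get("Execution_Steps")))
  .collect { case steps: ObjectNode => steps.put("Step_2", "OK") }
// If any segment is missing, collect simply yields None and nothing is mutated.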
I solved the problem with gatear's help.
Instead of reading the data like this:
val dataMap = mapper.readValue(jsonData, classOf[Map[String, String]])
I read it into an ObjectNode:
val dataMap = mapper.readTree(jsonData).asInstanceOf[ObjectNode]
Then I was able to update it using:
dataMap.get("2021-11-24").asInstanceOf[ObjectNode].get("Execution_Steps").asInstanceOf[ObjectNode].put("Step_2", "OK")
Related
I am trying to write a JSON file using Spark. Some of the keys have null as the value. These show up just fine in the Dataset, but when I write the file, the keys get dropped. How do I ensure they are retained?
Code to write the file:
ddp.coalesce(20).write().mode("overwrite").json("hdfs://localhost:9000/user/dedupe_employee");
Part of the JSON data from the source:
"event_header": {
"accept_language": null,
"app_id": "App_ID",
"app_name": null,
"client_ip_address": "IP",
"event_id": "ID",
"event_timestamp": null,
"offering_id": "Offering",
"server_ip_address": "IP",
"server_timestamp": 1492565987565,
"topic_name": "Topic",
"version": "1.0"
}
Output:
"event_header": {
"app_id": "App_ID",
"client_ip_address": "IP",
"event_id": "ID",
"offering_id": "Offering",
"server_ip_address": "IP",
"server_timestamp": 1492565987565,
"topic_name": "Topic",
"version": "1.0"
}
In the above example the keys accept_language, app_name and event_timestamp have been dropped.
Apparently, Spark does not provide any option to handle nulls here, so the following custom solution should work.
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
import com.fasterxml.jackson.databind.ObjectMapper
import spark.implicits._ // needed for .toDS(); spark is the active SparkSession

case class EventHeader(accept_language: String, app_id: String, app_name: String, client_ip_address: String, event_id: String, event_timestamp: String, offering_id: String, server_ip_address: String, server_timestamp: Long, topic_name: String, version: String)

val ds = Seq(EventHeader(null, "App_ID", null, "IP", "ID", null, "Offering", "IP", 1492565987565L, "Topic", "1.0")).toDS()
val ds1 = ds.mapPartitions(records => {
  val mapper = new ObjectMapper with ScalaObjectMapper
  mapper.registerModule(DefaultScalaModule)
  records.map(mapper.writeValueAsString(_))
})
ds1.coalesce(1).write.text("hdfs://localhost:9000/user/dedupe_employee")
This will produce output like:
{"accept_language":null,"app_id":"App_ID","app_name":null,"client_ip_address":"IP","event_id":"ID","event_timestamp":null,"offering_id":"Offering","server_ip_address":"IP","server_timestamp":1492565987565,"topic_name":"Topic","version":"1.0"}
If you are on Spark 3, you can add
spark.sql.jsonGenerator.ignoreNullFields false
ignoreNullFields is an option you can set when writing a DataFrame out as a JSON file; it is available since Spark 3.
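For example, a minimal sketch that sets it on the session before the write from the question (assuming spark is your active SparkSession and ddp is the Dataset from the question):
// Session-level switch: keep null-valued fields when generating JSON.
spark.conf.set("spark.sql.jsonGenerator.ignoreNullFields", "false")
ddp.coalesce(20).write.mode("overwrite").json("hdfs://localhost:9000/user/dedupe_employee")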
If you need Spark 2 (specifically PySpark 2.4.6), you can try converting the DataFrame to an RDD of Python dicts, and then call saveAsTextFile on the RDD to write the JSON file to HDFS. The following example may help.
cols = ddp.columns
ddp_ = ddp.rdd
ddp_ = ddp_.map(lambda row: dict([(c, row[c]) for c in cols]))
ddp_.repartition(1).saveAsTextFile(your_hdfs_file_path)
This should produce an output file like:
{"accept_language": None, "app_id":"123", ...}
{"accept_language": None, "app_id":"456", ...}
What's more, if you want to replace Python's None with a JSON null, you will need to dump every dict to JSON:
import json

ddp_ = ddp_.map(lambda row: json.dumps(row, ensure_ascii=False))
Since Spark 3, if you are using the DataFrameWriter class
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameWriter.html#json-java.lang.String-
(the same applies to PySpark)
https://spark.apache.org/docs/3.0.0-preview/api/python/_modules/pyspark/sql/readwriter.html
its json method has an option ignoreNullFields=None, where None means True.
So just set this option to false:
ddp.coalesce(20).write().mode("overwrite").option("ignoreNullFields", "false").json("hdfs://localhost:9000/user/dedupe_employee")
To retain null values when converting to JSON, set this config option:
spark = (
    SparkSession.builder.master("local[1]")
    .config("spark.sql.jsonGenerator.ignoreNullFields", "false")
    .getOrCreate()
)
I work with Scala Play and I use WS to get a response from a URL.
My JSON example :
[
{
"object": "001",
"object-description": "MODEL",
"criterion": "TW3",
"criterion-description": "MODELE X07"
},
{
"object": "002",
"object-description": "TYPE",
"criterion": "STANDA",
"criterion-description": "STANDARD TYPE"
}, ...
I want to get only "criterion" field where "object" equal "002". So, in this example the value "STANDA".
A test:
ws.url(url)
  .get()
  .map { response =>
    Right((response.json \ "object="002"" \\ "criterion").map(_.as[String]))
  }
How can I do that?
Thanks for your help.
You can transform the whole response into Scala case classes using automatically derived formatters and then operate on those.
import play.api.libs.json.Json

case class Data(`object`: String, criterion: String)
implicit val dataRead = Json.reads[Data]

response.json.as[List[Data]]
  .filter(_.`object` == "002")
  .map(_.criterion)
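Tying it back to your WS call, a minimal sketch (assuming the endpoint returns the array shown above and you only need the first match):
ws.url(url)
  .get()
  .map { response =>
    response.json.as[List[Data]]
      .find(_.`object` == "002")   // first element whose "object" is "002"
      .map(_.criterion)            // Some("STANDA") for the sample payload
  }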
I've been trying to write a Groovy script that parses a JDBC and a REST response, puts the results into a model, and then compares them. I am following this answer: Dynamically compare Rest XML/JSON response and JDBC using groovy array in SoapUI, but not with much success. My JDBC response is below:
<Results>
<ResultSet fetchSize="128">
<Row rowNumber="1">
<ID>BCE448A4DFB94C6892D957DB8275D2AC</ID>
<NAME>SevDealRecord</NAME>
<AMOUNT/>
<CREATIONDATE>2012-06-20 11:31:48.0</CREATIONDATE>
<MODIFICATIONDATE>2012-06-20 15:20:02.0</MODIFICATIONDATE>
<CURRENCY>EUR</CURRENCY>
<REFERENCEDATE>2012-06-20 00:00:00.0</REFERENCEDATE>
<STATUSCODE>DPE_2</STATUSCODE>
<STATUSDESCRIPTION>2 - Preliminary evaluation in progress (Direct PE)</STATUSDESCRIPTION>
<ASSIGNEDTOUSERIQID>E506565555A6486FBA8FCC431F4E979E</ASSIGNEDTOUSERIQID>
<ASSIGNTOUSERDISPLAYNAME>NMISO</ASSIGNTOUSERDISPLAYNAME>
<WORKFLOWID>140AE208F9334FB9946BFEAF5C89CE18</WORKFLOWID>
<WORKFLOWNAME>1 - Direct Private Equity</WORKFLOWNAME>
</Row>
<Row rowNumber="2">
<ID>D4DBB1B906A04DE49AB1FF3EE4653180</ID>
<NAME>T28678</NAME>
<AMOUNT/>
<CREATIONDATE>2012-06-21 13:45:36.0</CREATIONDATE>
<MODIFICATIONDATE>2012-06-21 13:46:03.0</MODIFICATIONDATE>
<CURRENCY>EUR</CURRENCY>
<REFERENCEDATE>2012-06-21 00:00:00.0</REFERENCEDATE>
<STATUSCODE>DRAFT</STATUSCODE>
<STATUSDESCRIPTION>Draft{F}Brouillon</STATUSDESCRIPTION>
<ASSIGNEDTOUSERIQID>E506565555A6486FBA8FCC431F4E979E</ASSIGNEDTOUSERIQID>
<ASSIGNTOUSERDISPLAYNAME>NMISO</ASSIGNTOUSERDISPLAYNAME>
<WORKFLOWID/>
<WORKFLOWNAME/>
</Row>
And here is the REST response:
[{
"id": "12CF6F8DA3B148D98D63A428EC7F8D7B",
"name": "アコム株式会社",
"amount1": null,
"creationDate": null,
"modificationDate": "2019-01-14T16:28:21.027+00:00",
"currency": "USD",
"referenceDate": "2019-01-04T00:00:00+00:00",
"status": {
"code": "DRAFT",
"description": "Draft"
},
"assignedToUser": {
"id": "E506565555A6486FBA8FCC431F4E979E",
"displayName": "NMISO"
},
"assignedToGroup": null,
"workflow": null
}, {
"id": "AA4F19E5C8B34222865EFED293D52146",
"name": "Lürssen",
"amount1": null,
"creationDate": null,
"modificationDate": "2019-01-14T16:28:20.963+00:00",
"currency": "USD",
"referenceDate": "2019-01-04T00:00:00+00:00",
"status": {
"code": "DRAFT",
"description": "Draft"
},
"assignedToUser": {
"id": "E506565555A6486FBA8FCC431F4E979E",
"displayName": "NMISO"
},
"assignedToGroup": null,
"workflow": null
},
What I tried:
@groovy.transform.Canonical
class Model {
def id
def name
def amount1
def creationDate
def modificationDate
def currency
def referenceDate
def statusCode
def statusDescription
def assignedToUserIqid
def assignedToUserDisplayName
def assignedToGroup
def workflowId
def workflowName
// this will accept jdbc row
def buildJdbcData(row) {
row.with {
id = ID
name = NAME
amount1 = AMOUNT
creationDate = CREATIONDATE
modificationDate = MODIFICATIONDATE
currency = CURRENCY
referenceDate = REFERENCEDATE
statusCode = STATUSCODE
statusDescription = STATUSDESCRIPTION
assignedToUserDisplayName = ASSIGNTOUSERDISPLAYNAME
assignedToGroup = ASSIGNTOUSERDISPLAYNAME
workflowId = WORKFLOWID
workflowName = WORKFLOWNAME
}
}
def buildJsonData(slurp){
id = slurp.id
name = slurp.name
amount1 = slurp.amount1
creationDate = slurp.creationDate
modificationDate = slurp.modificationDate
currency = slurp.currency
referenceDate = slurp.referenceDate
statusCode = slurp.status.code
statusDescription = slurp.status.description
assignedToUserIqid = slurp.assignedToUser.id
assignedToUserDisplayName = slurp.assignedToUser.displayName
assignedToGroup = slurp.assignedToGroup
workflowId = slurp.workflow
}
}
def jdbcResponse = context.expand('${JDBC_DealList#ResponseAsXml}')
def results = new XmlSlurper().parseText(jdbcResponse)
def jdbcDataObjects = []
results.ResultSet.Row.each { row ->
jdbcDataObjects.add(new Model().buildJdbcData(row)) //Objects not added properly to the model
}
log.info jdbcDataObjects
def jsonResponse = testRunner.testCase.testSteps["Deals"].testRequest.response.contentAsString
def jsonObjects = new JsonSlurper().parseText(jsonResponse)
log.info jsonObjects
def jsonDataObjects = []
jsonDataObjects.add(new Model().buildJsonData(jsonObjects))
Now, log.info jdbcDataObjects is only giving me the WORKFLOWNAME elements from the JDBC response, and log.info jsonObjects is giving me the whole JSON model. I am not sure how to add all the elements to the Model defined above. Some help would be much appreciated.
In a previous project we did this for SOAP, not REST, but I believe you can follow the same approach.
1. We put the query in Excel - ESMQuery1.
2. The Excel sheet had 2 columns: the first with the node to compare, the second with the DB value to compare.
E.g. //soap/xmlnode1 ESMQuery1(UserName)
Here UserName is the column name.
3. You simply need to create a loop over all the nodes mentioned and resolve ESMQuery1(UserName).
Instead of an XML path you can use a JSON path.
Thanks.
How can I read a JSON file into a Map using Scala? I've been trying to accomplish this, but the JSON I am reading is nested and I have not found a way to easily extract the keys because of that. Scala seems to want to also convert the nested JSON string into an object. Instead, I want the nested JSON as a String "value". I am hoping someone can clarify or give me a hint on how I might do this.
My JSON source might look something like this:
{
"authKey": "34534645645653455454363",
"member": {
"memberId": "whatever",
"firstName": "Jon",
"lastName": "Doe",
"address": {
"line1": "Whatever Rd",
"city": "White Salmon",
"state": "WA",
"zip": "98672"
},
"anotherProp": "wahtever",
}
}
I want to extract this JSON into a Map of 2 keys without drilling into the nested JSON. Is this possible? Once I have the Map, my intention is to add the key-values to my POST request headers, like so:
val sentHeaders = Map("Content-Type" -> "application/javascript",
"Accept" -> "text/html", "authKey" -> extractedValue,
"member" -> theMemberInfoAsStringJson)
http("Custom headers")
.post("myUrl")
.headers(sentHeaders)
Since the question is tagged 'gatling', and under the hood this lib depends on Jackson/fasterxml for JSON processing, we can make use of it.
There is no way to retrieve a nested, structured part of the JSON as a String directly, but with very little additional code the result can still be achieved.
So, having the input JSON:
val json = """{
| "authKey": "34534645645653455454363",
| "member": {
| "memberId": "whatever",
| "firstName": "Jon",
| "lastName": "Doe",
| "address": {
| "line1": "Whatever Rd",
| "city": "White Salmon",
| "state": "WA",
| "zip": "98672"
| },
| "anotherProp": "wahtever"
| }
|}""".stripMargin
A Jackson ObjectMapper can be created and configured for use in Scala:
// import com.fasterxml.jackson.module.scala.DefaultScalaModule
val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
To parse the input JSON easily, a dedicated case class is useful:
case class SrcJson(authKey: String, member: Any) {
val memberAsString = mapper.writeValueAsString(member)
}
We also include val memberAsString in it, which will contain our target JSON string, obtained through a reverse conversion from the initially parsed member, which is actually a Map.
Now, to parse the input JSON:
val parsed = mapper.readValue(json, classOf[SrcJson])
The references parsed.authKey and parsed.memberAsString will contain the values you are after.
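For completeness, a minimal sketch plugging those values into the headers from the question (they play the roles of extractedValue and theMemberInfoAsStringJson there):
val sentHeaders = Map(
  "Content-Type" -> "application/javascript",
  "Accept" -> "text/html",
  "authKey" -> parsed.authKey,          // extractedValue in the question
  "member" -> parsed.memberAsString)    // theMemberInfoAsStringJson in the question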
Have a look at the Scala Play library - it has support for handling JSON. From what you describe, it should be pretty straightforward to read in the JSON and get the string value from any desired node.
Scala Play - JSON
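A minimal sketch of that idea, assuming play-json is on the classpath and json holds the input shown above:
import play.api.libs.json.{JsValue, Json}

val parsed = Json.parse(json)
val authKey = (parsed \ "authKey").as[String]
// Keep the nested object as a JSON string instead of drilling into it.
val memberAsString = Json.stringify((parsed \ "member").as[JsValue])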
I am trying to update several document fields and return the full document after the update.
I use elastic4s 1.3.4, elasticsearch 1.4.3 (as server).
Here is the code:
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ElasticsearchTester extends App {

  private val settings: Settings = ImmutableSettings.settingsBuilder().put("cluster.name", "clustername").build()
  private val client: ElasticClient = ElasticClient.remote(settings, ("localhost", 9300))

  val initial = """
    |{
    | "name":"jojn",
    | "surname":"olol"
    |}
  """.stripMargin

  val updateString = """
    |{
    | "surname":"123",
    | "global": {
    | "new":"fiedl"
    | }
    |}
  """.stripMargin

  import com.sksamuel.elastic4s.ElasticDsl._

  val future = client.execute {
    create index "my_index"
  }.flatMap { r =>
    client.execute {
      index into "my_index/user" doc StringDocumentSource(initial)
    }.flatMap { re =>
      println("Ololo indexed is: " + initial)
      println("Ololo indexed id: " + re.getId)
      client.execute {
        update id re.getId in "my_index/user" doc StringDocumentSource(updateString) docAsUpsert true params ("fields" -> "_source")
      }.map { res =>
        println("Ololo result is: " + res.getGetResult.sourceAsString())
      }
    }
  }

  Await.result(future, 20.seconds)
  println("Ololo ok")
}
Why do I get a NullPointerException on the line res.getGetResult.sourceAsString()? It seems that the update response does not contain the document after the update operation.
Is it possible to return document _source from update response?
Elastic4s seems to have no API in UpdateDefinition (as of now, 23.07.2015) to set fields. However, its builder supports this operation. The code below is a bit of a dirty hack, but it works as expected: just set the fields directly on _builder:
val updateRequest = update id re.getId in "my_index/user" doc StringDocumentSource(updateString) docAsUpsert true
updateRequest._builder.setFields("_source")

client.execute {
  updateRequest
}.map { res =>
  println("Ololo result is: " + res.getGetResult.sourceAsString())
}
This prints:
Ololo indexed id: AU66n1yiYVxOgU2h4AoG
Ololo result is: {"name":"jojn","surname":"123","global":{"new":"fiedl"}}
Ololo ok
Note
Elasticsearch does support returning fields after an update request.
Updated
Elastic4s, after this commit, supports this via the UpdateDsl.includeSource or UpdateDsl.setFields methods.
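A rough sketch of what that could look like (the includeSource name is taken from the note above; the exact DSL shape is an assumption, not verified here):
// Hypothetical usage, assuming includeSource is exposed on the update definition.
val req = update id re.getId in "my_index/user" doc StringDocumentSource(updateString) docAsUpsert true

client.execute {
  req.includeSource
}.map { res =>
  println(res.getGetResult.sourceAsString())
}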