How to extract values from a JSON string from an API in Scala?

I am trying to extract a specific value from each JSON object in a response from an API.
For example, suppose the HTTP response is a JSON array like this:
[
  {
    "trackerType": "WEB",
    "id": 1,
    "appId": "ap-website",
    "host": {
      "orgId": "ap",
      "displayName": "AP Mart",
      "id": "3",
      "tenantId": "ap"
    }
  },
  {
    "trackerType": "WEB",
    "id": 2,
    "appId": "test-website",
    "host": {
      "orgId": "t1",
      "tenantId": "trn11"
    }
  }
]
I want to keep only the appId and host.tenantId values from each element, like this:
[
  {
    "appId": "ap-website",
    "tenantId": "ap"
  },
  {
    "appId": "test-website",
    "tenantId": "trn11"
  }
]

If your HTTP response is quite big and you don't want to hold it all in memory, consider using I/O streams for parsing the body and serializing the result list.
Below is an example of how it can be done with the dijon library.
Add the dependency to your build file:
libraryDependencies += "me.vican.jorge" %% "dijon" % "0.5.0+18-46bbb74d" // Use %%% instead of %% for Scala.js
Import the following packages:
import com.github.plokhotnyuk.jsoniter_scala.core._
import dijon._
import scala.language.dynamics
Parse your input stream, transforming it value by value in the callback, and write to the output stream:
val in = new java.io.ByteArrayInputStream(
  """[
    {
      "trackerType": "WEB",
      "id": 1,
      "appId": "ap-website",
      "host": {
        "orgId": "ap",
        "displayName": "AP Mart",
        "id": "3",
        "tenantId": "ap"
      }
    },
    {
      "trackerType": "WEB",
      "id": 2,
      "appId": "test-website",
      "host": {
        "orgId": "t1",
        "tenantId": "trn11"
      }
    }
  ]""".getBytes("UTF-8"))
val out = new java.io.BufferedOutputStream(System.out)
out.write('[')
scanJsonArrayFromStream[SomeJson](in) {
  // Scan the top-level JSON array value by value; returning `true`
  // from the callback continues the scan.
  var writeComma = false
  x =>
    if (writeComma) out.write(',') else writeComma = true
    // Keep only the two required fields from each element.
    val json = obj("appId" -> x.appId, "tenantId" -> x.host.tenantId)
    writeToStream[SomeJson](json, out)(codec)
    true
} (codec)
out.write(']')
out.flush()
You can try it in Scastie.
When using this code in your application, replace the source and destination of the input and output streams accordingly.
There are other ways to solve your task. Please add more context to help us choose the simplest and most efficient solution.
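For example, if the response comfortably fits in memory, a plain in-memory transform is simpler. Below is a minimal sketch with Play JSON (the library choice and the helper name are my assumptions for illustration; any JSON library works the same way):
import play.api.libs.json._

// Hypothetical helper for illustration; `response` is the HTTP
// response body shown above.
def extractAppAndTenant(response: String): JsArray = JsArray(
  Json.parse(response).as[List[JsObject]].map { o =>
    // Keep only the two fields of interest from each array element.
    Json.obj(
      "appId"    -> (o \ "appId").as[String],
      "tenantId" -> (o \ "host" \ "tenantId").as[String]
    )
  }
)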
Feel free to comment - I will be happy to help you tune the solution to your needs.

Related

play framework json lookup inside array

I have a simple JSON:
{
  "name": "John",
  "placesVisited": [
    {
      "name": "Paris",
      "data": {
        "weather": "warm",
        "date": "31/01/22"
      }
    },
    {
      "name": "New York",
      "data": [
        {
          "weather": "warm",
          "date": "31/01/21"
        },
        {
          "weather": "cold",
          "date": "28/01/21"
        }
      ]
    }
  ]
}
As you can see, this JSON has a placesVisited field; when the name is "New York" the "data" field is a list, and when the name is "Paris" it is an object.
What I want to do is pull the placesVisited element where "name": "New York", and then I will parse it into a case class I have. I can't use that case class for both objects in placesVisited because they have different types for the same field name.
So what I thought is to do something like:
(myJson \ "placesVisited") and here I need to add something that will give me the element where name is "New York". How can I do that?
My result should be this:
{
  "name": "New York",
  "data": [
    {
      "weather": "warm",
      "date": "31/01/21"
    },
    {
      "weather": "cold",
      "date": "28/01/21"
    }
  ]
}
Something like this could maybe work, but it's horrible haha:
(Json.parse(myjson) \ "placesVisited").as[List[JsObject]]
  .find(item => item.value.get("name").toString.contains("New York"))
  .getOrElse(throw new Exception("could not find New York element"))
  .as[NewYorkModel]
item.value.get("name").toString can be slightly simplified to (item \ "name").as[String], but otherwise there's not much to improve.
Another option is to use a case class Place(name: String, data: JsValue) and do it like this:
(Json.parse(myjson) \ "placesVisited")
  .as[List[Place]]
  .find(_.name == "New York")
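For completeness, a minimal sketch of the Place definition that the snippet above relies on (Json.reads derives the Reads instance):
import play.api.libs.json._

// `data` stays raw JSON, so one case class covers both the object
// shape (Paris) and the array shape (New York).
case class Place(name: String, data: JsValue)
object Place {
  implicit val reads: Reads[Place] = Json.reads[Place]
}
The matching element's data field can then be validated separately as NewYorkModel.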

AWS Stepfunction: Substring from input

As a general question, is it possible to do a substring operation within Step Functions?
I receive the following event:
{
  "input": {
    "version": "0",
    "id": "d9c5fec0-d08d-6abd-4ea5-0107fbbce47d",
    "detail-type": "EBS Multi-Volume Snapshots Completion Status",
    "source": "aws.ec2",
    "account": "12345678",
    "time": "2021-11-12T12:08:16Z",
    "region": "us-east-1",
    "resources": [
      "arn:aws:ec2::us-east-1:snapshot/snap-0a98c2a42ee266123"
    ]
  }
}
but I need the snapshot id as input to DescribeInstances, so I need to extract snap-0a98c2a42ee266123 from arn:aws:ec2::us-east-1:snapshot/snap-0a98c2a42ee266123.
Is there any simple way to do this within Step Functions? That is to say, without having to pass it to a Lambda or something equally convoluted?
This has recently become possible with the addition of new intrinsic functions. ArrayGetItem gets an item by its index. StringSplit splits a string at a delimiter. Use a Pass state to extract the snapshot name from the resource ARN:
{
  "StartAt": "ExtractSnapshotName",
  "States": {
    "ExtractSnapshotName": {
      "Type": "Pass",
      "Parameters": {
        "input.$": "$.input",
        "snapshotName.$": "States.ArrayGetItem(States.StringSplit(States.ArrayGetItem($.input.resources,0), '/'),1)"
      },
      "ResultPath": "$",
      "End": true
    }
  }
}
Output:
{
  "input": {
    "version": "0",
    "id": "d9c5fec0-d08d-6abd-4ea5-0107fbbce47d",
    "detail-type": "EBS Multi-Volume Snapshots Completion Status",
    "source": "aws.ec2",
    "account": "12345678",
    "time": "2021-11-12T12:08:16Z",
    "region": "us-east-1",
    "resources": [
      "arn:aws:ec2::us-east-1:snapshot/snap-0a98c2a42ee266123"
    ]
  },
  "snapshotName": "snap-0a98c2a42ee266123"
}

Can't get Service Alerts Protobuf to include header_text or description_text using Python gtfs_realtime_pb2 module

We are having difficulty adding header_text and description_text to a Service Alerts protobuf file. We are attempting to match the example shown on this page:
https://developers.google.com/transit/gtfs-realtime/examples/alerts
Our data starts in the following dictionary:
alerts_dict = {
    "header": {
        "gtfs_realtime_version": "1",
        "timestamp": "1543318671",
        "incrementality": "FULL_DATASET"
    },
    "entity": [{
        "497": {
            "active_period": [{
                "start": 1525320000,
                "end": 1546315200
            }],
            "url": "http://www.capmetro.org/planner",
            "effect": 4,
            "header_text": "South 183: Airport",
            "informed_entity": [{
                "route_type": "3",
                "route_id": "17",
                "trip": "",
                "stop_id": "3304"
            }, {
                "route_type": "3",
                "route_id": "350",
                "trip": "",
                "stop_id": "3304"
            }],
            "description_text": "Stop closed temporarily",
            "cause": 2
        },
        "460": {
            "active_period": [{
                "start": 1519876800,
                "end": 1546315200
            }],
            "url": "http://www.capmetro.org/planner",
            "effect": 4,
            "header_text": "Ave F / Duval Detour",
            "informed_entity": [{
                "route_type": "3",
                "route_id": "7",
                "trip": "",
                "stop_id": "1167"
            }, {
                "route_type": "3",
                "route_id": "7",
                "trip": "",
                "stop_id": "1268"
            }],
            "description_text": "Stop closed temporarily",
            "cause": 2
        }
    }]
}
Our Python code is as follows:
newfeed = gtfs_realtime_pb2.FeedMessage()
newfeedheader = newfeed.header
newfeedheader.gtfs_realtime_version = '2.0'

for alert_id, alert_dict in alerts_dict["entity"][0].iteritems():
    print(alert_id)
    print(alert_dict)
    newentity = newfeed.entity.add()
    newalert = newentity.alert
    newentity.id = str(alert_id)
    newtimerange = newalert.active_period.add()
    newtimerange.end = alert_dict['active_period'][0]['end']
    newtimerange.start = alert_dict['active_period'][0]['start']
    for informed in alert_dict['informed_entity']:
        newentityselector = newalert.informed_entity.add()
        newentityselector.route_id = informed['route_id']
        newentityselector.route_type = int(informed['route_type'])
        newentityselector.stop_id = informed['stop_id']
    print(alert_dict['description_text'])
    newdescription = newalert.header_text
    newdescription = alert_dict['description_text']
    newalert.cause = alert_dict['cause']
    newalert.effect = alert_dict['effect']

pb_feed = newfeed.SerializeToString()
with open("servicealerts.pb", 'wb') as fout:
    fout.write(pb_feed)
The frustrating part is that we don't receive any sort of error message. Everything appears to run properly, but the resulting .pb file doesn't contain the new header_text or description_text items.
We are able to read the pb file using the following code:
feed = gtfs_realtime_pb2.FeedMessage()
response = open("servicealerts.pb", "rb")
feed.ParseFromString(response.read())
print(feed)
We truly appreciate any help anyone can offer in pointing us in the right direction.
I was able to find the answer. This Python Notebook showed that by properly formatting the dictionary, the .pb file could be generated with a few lines of code. (Incidentally, that also explains the silent failure above: newdescription = newalert.header_text only binds a local name, and rebinding it to a string never touches the protobuf message.)
from google.transit import gtfs_realtime_pb2
from google.protobuf.json_format import ParseDict

newfeed = gtfs_realtime_pb2.FeedMessage()
ParseDict(alerts_dict, newfeed)
pb_feed = newfeed.SerializeToString()
with open("servicealerts.pb", 'wb') as fout:
    fout.write(pb_feed)
All I had to do was format my dictionary properly:
if ALERT_GROUP_ID not in entity_dict.keys():
    entity_dict[ALERT_GROUP_ID] = {
        "id": ALERT_GROUP_ID,
        "alert": {
            "active_period": [{
                "start": int(START_TIME),
                "end": int(END_TIME)
            }],
            "cause": cause_dict.get(CAUSE, ""),
            "effect": effect_dict.get(EFFECT),
            "url": {
                "translation": [{
                    "text": URL,
                    "language": "en"
                }]
            },
            "header_text": {
                "translation": [{
                    "text": HEADER_TEXT,
                    "language": "en"
                }]
            },
            "informed_entity": [{
                'route_id': ROUTE_ID,
                'route_type': ROUTE_TYPE,
                'trip': TRIP,
                'stop_id': STOP_ID
            }],
            "description_text": {
                "translation": [{
                    "text": "Stop closed temporarily",
                    "language": "en"
                }]
            },
        },
    }
    # print(entity_dict[ALERT_GROUP_ID]["alert"]['informed_entity'])
else:
    entity_dict[ALERT_GROUP_ID]["alert"]['informed_entity'].append({
        'route_id': ROUTE_ID,
        'route_type': ROUTE_TYPE,
        'trip': TRIP,
        'stop_id': STOP_ID
    })

Cannot use Nested VariableOperators.mapItemsOf in Spring Data MongoDb

I'm forced to use the aggregation framework and the project operation of Spring Data MongoDB.
What I'd like to do is create an array of objects as the result of a project operation.
Consider this intermediate aggregation result:
{
  "processes": [
    {
      "id": "101a",
      "assignees": [
        {
          "id": "201a",
          "username": "carl93"
        },
        {
          "id": "202a",
          "username": "susan"
        }
      ]
    },
    {
      "id": "101b",
      "assignees": [
        {
          "id": "201a",
          "username": "carl93"
        },
        {
          "id": "202a",
          "username": "susan"
        }
      ]
    }
  ]
}
I'm trying to get, for each process, all the assignee usernames and ids. Hence, what I want to obtain is something like this:
[
  {
    "results": [
      {
        "id": "201a",
        "value": "carl93",
        "parentObjectId": "101a"
      },
      {
        "id": "202a",
        "value": "susan",
        "parentObjectId": "101a"
      },
      {
        "id": "201a",
        "value": "carl93",
        "parentObjectId": "101b"
      },
      {
        "id": "202a",
        "value": "susan",
        "parentObjectId": "101b"
      }
    ]
  }
]
To reach this goal I'm using two nested VariableOperators.mapItemsOf, obtaining:
org.springframework.data.mapping.MappingException: Cannot convert [Document{{id=201a, value=carl93, parentObjectId=101a}}, Document{{id=202a, value=susan, parentObjectId=101a}}]
of type class java.util.ArrayList into an instance of class java.lang.Object!
Implement a custom Converter<class java.util.ArrayList, class java.lang.Object> and register it with the CustomConversions.
Here's the code that I'm currently using:
new ProjectionOperation().and(
    VariableOperators.mapItemsOf("processes")
        .as("pr")
        .andApply(
            VariableOperators.mapItemsOf("$pr.ownership.assignees")
                .as("ass")
                .andApply(aggregationOperationContext -> {
                    Document document = new Document();
                    document.append("id", "$$ass.id");
                    document.append("value", "$$ass.username");
                    document.append("parentObjectId", "$$pr.id");
                    return document;
                })
        )
).as("results");
The code produces this:
[
  [
    {
      "id": "201a",
      "value": "carl93",
      "parentObjectId": "101a"
    },
    {
      "id": "202a",
      "value": "susan",
      "parentObjectId": "101a"
    }
  ],
  [
    {
      "id": "201a",
      "value": "carl93",
      "parentObjectId": "101b"
    },
    {
      "id": "202a",
      "value": "susan",
      "parentObjectId": "101b"
    }
  ]
]
As you can see, there are two nested arrays, [[],[]]. This is the reason why the exception is thrown.
Nevertheless, what I want to obtain is just one array containing all the objects (possibly without duplicates or null values). I've tried the addToSet operator and other aggregation operators, without any success.
Use $reduce with $concatArrays to join the arrays: $reduce walks the outer processes array, and $concatArrays appends each mapped inner array to the accumulator, flattening the nesting in a single pass.
new ProjectionOperation().and(
    ArrayOperators.arrayOf("processes")
        .reduce(ArrayOperators.ConcatArrays.arrayOf("$$value").concat(
            VariableOperators.mapItemsOf("$$this.ownership.assignees")
                .as("ass")
                .andApply(aggregationOperationContext -> {
                    Document document = new Document();
                    document.append("id", "$$ass.id");
                    document.append("value", "$$ass.username");
                    document.append("parentObjectId", "$$this.id");
                    return document;
                })
        )).startingWith(Arrays.asList())
).as("results");

Create index from raw JSON using elastic4s

I need to create an index that has some context completion suggester mappings, as in https://www.elastic.co/guide/en/elasticsearch/reference/current/suggester-context.html. As I see in https://github.com/sksamuel/elastic4s/issues/452, this doesn't seem to be supported by the DSL.
So it would be nice to create an index from a raw JSON string (similar to raw queries). Is it possible to achieve this?
Considering that you have the JSON mapping in a variable rawMapping like this:
val rawMapping =
  """{
    "service": {
      "properties": {
        "name": {
          "type": "string"
        },
        "tag": {
          "type": "string"
        },
        "suggest_field": {
          "type": "completion",
          "context": {
            "color": {
              "type": "category",
              "path": "color_field",
              "default": ["red", "green", "blue"]
            },
            "location": {
              "type": "geo",
              "precision": "5m",
              "neighbors": true,
              "default": "u33"
            }
          }
        }
      }
    }
  }"""
You can create the index using the raw mapping like this:
client.execute {
  create index "services" source rawMapping
}
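Note that the infix create index ... source ... syntax belongs to older elastic4s releases. In newer versions the same call is written in dotted style; a minimal sketch, assuming your version exposes createIndex(...).source(...) (the DSL and import path have changed across major versions, so check your release):
import com.sksamuel.elastic4s.ElasticDsl._ // import path varies by elastic4s version

client.execute {
  createIndex("services").source(rawMapping)
}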