Extract values from JSON file in Talend

I have json file like this:
{"2020-04-28":
{
"37,N7L2H4,Carry,CHOPA,PLX":
{
"inter_results": {
"inter_mark": "GITA"
,"down": null
,"up": null
,"wiki": {"included": "false", "options": ["RRR", "SSS","HHH"]
}}
,"38, N5L2J4, HURT, SERRA, PZT": {
"inter_results": {
"inter_mark": "MARI"
,"down": "250"
,"up": "1250"
,"wiki": {"included": "true", "options": ["XXX", "YYY"]
}}
,"39, N4L2H4, HIBA, FILA, PFG": {
"inter_results": {
"inter_mark": "HILO"
,"down": "100"
,"up": "250"
,"wiki": {"included": "true", "options": ["RTG", "VTH","HJI","JKL"]
}}
}
}
And I want to extract the values N7L2H4, N5L2J4, N4L2H4 from this JSON file using tFileInputJson with JSONPath.

Using the native components of Talend, this is hard to achieve. You can do it with some Java code, but it's not elegant.
Here's a solution using the JSON components suite, which you can download from Talend Exchange.
The component tJSONDocTraverseFields allows you to list all fields, paths and values of your JSON.
For the sample above, it gives this output:
$.2020-04-28.37,N7L2H4,Carry,CHOPA,PLX.inter_results.inter_mark|4|inter_mark|"GITA"|false|21
$.2020-04-28.37,N7L2H4,Carry,CHOPA,PLX.inter_results.down|4|down|null|false|21
$.2020-04-28.37,N7L2H4,Carry,CHOPA,PLX.inter_results.up|4|up|null|false|21
$.2020-04-28.37,N7L2H4,Carry,CHOPA,PLX.inter_results.wiki.included|5|included|"false"|false|21
$.2020-04-28.37,N7L2H4,Carry,CHOPA,PLX.inter_results.wiki.options[0]|6|options|"RRR"|true|21
$.2020-04-28.37,N7L2H4,Carry,CHOPA,PLX.inter_results.wiki.options[1]|6|options|"SSS"|true|21
$.2020-04-28.37,N7L2H4,Carry,CHOPA,PLX.inter_results.wiki.options[2]|6|options|"HHH"|true|21
$.2020-04-28.38, N5L2J4, HURT, SERRA, PZT.inter_results.inter_mark|4|inter_mark|"MARI"|false|21
$.2020-04-28.38, N5L2J4, HURT, SERRA, PZT.inter_results.down|4|down|"250"|false|21
$.2020-04-28.38, N5L2J4, HURT, SERRA, PZT.inter_results.up|4|up|"1250"|false|21
You can then parse the JSON path to get the values you want:
I split the path on "." to get the field "37,N7L2H4,Carry,CHOPA,PLX", then split the result again on "," and take the second token (N7L2H4).
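Here's a minimal sketch of that parsing in a tJavaRow (the jsonpath and code column names are hypothetical; adjust them to your actual schema):

// input_row.jsonpath holds a path emitted by tJSONDocTraverseFields, e.g.
// "$.2020-04-28.37,N7L2H4,Carry,CHOPA,PLX.inter_results.inter_mark"
String[] pathParts = input_row.jsonpath.split("\\.");
// pathParts[2] is the composite key "37,N7L2H4,Carry,CHOPA,PLX"
String[] keyParts = pathParts[2].split(",");
// the value we're after (N7L2H4) is the second comma-separated token
output_row.code = keyParts[1].trim();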
tJSONDocOpen allows you to initialize your JSON file; it acts as a connection. You then select it in tJSONDocTraverseFields.

Related

How to map a json string into object type in sink transformation

I'm using Azure Data Factory and a data transformation flow. I have a CSV that contains a column with a JSON object string; below is an example including the header:
"Id","Name","Timestamp","Value","Metadata"
"99c9347ab7c34733a4fe0623e1496ffd","data1","2021-03-18 05:53:00.0000000","0","{""unit"":""%""}"
"99c9347ab7c34733a4fe0623e1496ffd","data1","2021-03-19 05:53:00.0000000","4","{""jobName"":""RecipeB""}"
"99c9347ab7c34733a4fe0623e1496ffd","data1","2021-03-16 02:12:30.0000000","state","{""jobEndState"":""negative""}"
"99c9347ab7c34733a4fe0623e1496ffd","data1","2021-03-19 06:33:00.0000000","23","{""unit"":""kg""}"
I want to store the data in a JSON structure like this:
{
  "id": "99c9347ab7c34733a4fe0623e1496ffd",
  "name": "data1",
  "values": [
    {
      "timestamp": "2021-03-18 05:53:00.0000000",
      "value": "0",
      "metadata": {
        "unit": "%"
      }
    },
    {
      "timestamp": "2021-03-19 05:53:00.0000000",
      "value": "4",
      "metadata": {
        "jobName": "RecipeB"
      }
    }
    ....
  ]
}
The challenge is that metadata has dynamic content: it will always be a JSON object, but its content can vary, so I cannot define a schema for it. Currently the column "metadata" in the sink schema is defined as object, but whenever I run the transformation I run into an exception:
Conversion from ArrayType(StructType(StructField(timestamp,StringType,false),
StructField(value,StringType,false), StructField(metadata,StringType,false)),true) to ArrayType(StructType(StructField(timestamp,StringType,true),
StructField(value,StringType,true), StructField(metadata,StructType(StructField(,StringType,true)),true)),false) not defined
We can get the output you expected; we need an expression to get the object Metadata.value.
Please refer to my steps. Here's my source:
Derived Column expressions create a JSON schema to convert the data:
#(id=Id,
name=Name,
values=#(timestamp=Timestamp,
value=Value,
metadata=#(unit=substring(split(Metadata,':')[2], 3, length(split(Metadata,':')[2])-6))))
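To make the string surgery concrete, here's roughly what that metadata expression does to one cell, sketched in plain Java. This is only an illustration: ADF's split and substring are 1-based, and the exact offsets in the expression above depend on how the CSV quote-escaping comes through, so treat the constants as illustrative.

public class MetadataDemo {
  public static void main(String[] args) {
    // One Metadata cell after CSV unescaping, e.g. {"unit":"%"}
    String metadata = "{\"unit\":\"%\"}";
    // Keep the part after the colon: "\"%\"}"
    String afterColon = metadata.split(":")[1];
    // Trim the leading quote and the trailing quote + brace, leaving the raw value
    String unit = afterColon.substring(1, afterColon.length() - 2);
    System.out.println(unit); // prints: %
  }
}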
Sink mapping and output data preview:
The key is that your metadata value is an object and may have a different schema and content each time; the inner key may be 'unit' or something else. We can only build the schema manually; it doesn't support expressions. That's the limit.
We can't achieve that within Data Factory.
HTH.

Passing multiple json as a payload for a request in Gatling

Sample JSON payload:
'{
  "Stub1": "XXXXX",
  "Stub2": "XXXXX-3047-4ed3-b73b-83fbcc0c2aa9",
  "Code": "CodeX",
  "people": [
    {
      "ID": "XXXXX-6425-EA11-A94A-A08CFDCA6C02",
      "customer": {
        "Id": 173,
        "Account": 275,
        "AFile": "tel"
      },
      "products": [
        {
          "product": 1,
          "type": "A",
          "stub1": "XXXXX-42E1-4A13-8190-20C2DE39C0A5",
          "Stub2": "XXXXX-FC4F-41AB-92E7-A408E7F4C632",
          "stub3": "XXXXX-A2B4-4ADF-96C5-8F3CDCF5821D",
          "Stub4": "XXXXX-1948-4B3C-987F-B5EC4D6C2824"
        },
        {
          "product": 2,
          "type": "B",
          "stub1": "XXXXX-42E1-4A13-8190-20C2DE39C0A5",
          "Stub2": "XXXXX-FC4F-41AB-92E7-A408E7F4C632",
          "stub3": "XXXXX-A2B4-4ADF-96C5-8F3CDCF5821D",
          "Stub4": "XXXXX-1948-4B3C-987F-B5EC4D6C2824"
        }
      ]
    }
  ]
}'
I am working on a POST call. Is there any way to feed multiple JSON files as payloads in Gatling? I am using body(RawFileBody("file.json")) as the JSON body here.
This works fine for a single JSON file. I want to check the response for multiple JSON files. Is there any way we can parameterize this and get a response for each JSON file?
As far as I can see, there are a couple of ways you could do this.
Use a JSON feeder (https://gatling.io/docs/current/session/feeder#json-feeders). This would need your multiple JSON files to be in a single file, with the root element being a JSON array. Essentially you'd put the JSON objects you have inside an array inside a single JSON file, as sketched below.
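Here's a rough sketch of the feeder approach, assuming Gatling 3.7+ and its Java DSL (with the classic Scala DSL, jsonFile and StringBody work the same way). It also assumes you wrap each request body under a single payload key inside the array file, so jsonStringify() can turn it back into a JSON string; payloads.json and /endpoint are placeholder names:

import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.*;
import io.gatling.javaapi.http.*;

public class MultiPayloadSimulation extends Simulation {

  // payloads.json is a single file whose root is a JSON array, e.g.
  // [ {"payload": { ...first body... }}, {"payload": { ...second body... }} ]
  FeederBuilder<Object> payloads = jsonFile("payloads.json").circular();

  HttpProtocolBuilder httpProtocol = http.baseUrl("https://example.org");

  ScenarioBuilder scn = scenario("post each payload")
      .feed(payloads)
      .exec(http("create")
          .post("/endpoint")
          // jsonStringify() renders the fed object back into a JSON string
          .body(StringBody("#{payload.jsonStringify()}"))
          .asJson());

  {
    setUp(scn.injectOpen(atOnceUsers(2))).protocols(httpProtocol);
  }
}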
Create a Scala Iterator and have the names of the JSON files you're going to use in it. e.g:
val fileNames = Iterator("file1.json", "file2.json")
// and later, in your scenario
body(RawFileBody(fileNames.next()))
Note that this method cannot be used across users, as the iterator will initialize separately for each user. You'd have to use repeat or something similar to send multiple files as a single user.
You could do something similar by maintaining the file names as a list inside Gatling's session variable, but this session would still not be shared between different users you inject into your scenario.

GraphQL schema from a single object JSON file in Gatsby

So I'd like to query a single JSON file, which is not an array, from Gatsby's GraphQL, but I don't really know how to do it.
As far as I understand the gatsby-transformer-json docs, it only supports loading arrays and making them accessible via the allFileNameJson schema.
My gatsby-config plugins (only the necessary ones for this question):
{
  resolve: 'gatsby-source-filesystem',
  options: {
    name: 'data',
    path: `${__dirname}/src/data`
  }
},
'gatsby-transformer-json'
And then let's say in src/data I have a something.json file, like this:
{
"key": "value"
}
Now I'd like to query the data from the something.json file, but there is no somethingJson schema I can query (tried with Gatsby's GraphiQL).
Could someone point out what am I doing wrong or how can I solve this problem?
Ok, so it is possible to query single-object files as long as they have a parent (folder).
Let's take these parameters:
gatsby-source-filesystem configured to src/data
test.json file positioned at src/data/test.json with { "key": "value" } content in it
Now, as the test.json file actually has a parent (the data folder), you can query the fields from test.json like this:
{
  dataJson {
    key
  }
}
But putting those files directly in your data root folder is bad practice, because when you store another JSON file, e.g. secondtest.json with { "key2": "value2" } in it, and query it with the same query as above, you will get data from only a single node (I'm not sure whether it takes the first or the last encountered node).
So, the perfect solution for this case is to store your single-object JSON files in separate folders, with just one JSON file per folder.
For example, say you have some "About Me" data:
create an about folder in your src/data
create an index.json file in it with e.g. { "name": "John" }
query your data
Like this:
{
  aboutJson {
    name
  }
}
That's it.

Talend read JSON data from tRESTRequest

I am trying to learn Talend.
Scenario:
I have to create a REST endpoint (I am using tRESTRequest) which takes a POST request at http://localhost:8086/emp/create, accepts the JSON below, prints each JSON field, and sends a sample JSON response containing only the name field.
How can I do so?
How do I read the JSON data into a Java component like tJava?
Structure:
{
"emp" :
[
{
"id":"123",
"name": "testemp1"
},
{
"id":"456",
"name": "testemp2"
}
]
}
Expected Response:
{
"emp" :
[
{
"name": "testemp1"
},
{
"name": "testemp2"
}
]
}
I am using tRESTRequest -> tExtractJSONFields -> tRESTResponse.
For looping on the right elements and parsing the contents, please see my answer to JSON Deserialization on Talend.
I did not understand the second question. When deserializing JSON, the data will already be available in the usual row format for further processing. Beginner tutorials will show you the standard structure. The component tJava is, of course, an exception to that rule: handling data is different in this component and not necessarily row-based.
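As a rough sketch of the middle step (the settings and column names here are assumptions, not a definitive setup): point tExtractJSONFields at the request body with the loop JSONPath query "$.emp[*]" and map two columns, id and name. A tJavaRow after it can then print each record and pass only the name through to the flow feeding tRESTResponse:

// tJavaRow fragment: input_row carries the columns extracted by tExtractJSONFields
System.out.println("id: " + input_row.id + ", name: " + input_row.name);
// forward only the name, so the response flow contains just that field
output_row.name = input_row.name;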
Talend has an excellent knowledge base for components and examples, see https://help.talend.com/

Retrieve UserName from ServiceNow

I am able to retrieve records for a particular Incident ID using Invoke-RestMethod. However, while retrieving the data, values like Resolved By, Updated By, etc. get populated with a sys_id.
Resolved By comes in this format:
https://devinstance.servicenow.com/api/sysid, value = sysid
I would like to view the username instead of the sysid.
The 'User ID' (user_name) isn't on the Incident; it's on the sys_user table, so you'll have to dot-walk to it.
If you're using the table API, you'll need to specify a dot-walked field to return, using the sysparm_fields query parameter.
This is no problem, just specify your endpoint like this:
$uri = "https://YOUR_INSTANCE.service-now.com/api/now/table/incident?sysparm_query=number%3DINC0000001&sysparm_fields=resolved_by.user_name"
I've specified a query requesting a specific incident number, but you can replace that with whatever your query is. The important part is sysparm_fields=resolved_by.user_name. You'll want to specify any other fields you need here as well.
The JSON I get as a result of running this API call is the following:
{
"result": [
{
"resolved_by.user_name": "admin"
}
]
}
Note the element name: "resolved_by.user_name".
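If you want to reproduce the same call outside PowerShell, here's a minimal sketch using Java's built-in HttpClient (the instance name and credentials are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class ResolvedByDemo {
  public static void main(String[] args) throws Exception {
    String uri = "https://YOUR_INSTANCE.service-now.com/api/now/table/incident"
        + "?sysparm_query=number%3DINC0000001&sysparm_fields=resolved_by.user_name";
    // Basic auth with placeholder credentials
    String auth = Base64.getEncoder().encodeToString("admin:password".getBytes());
    HttpRequest request = HttpRequest.newBuilder(URI.create(uri))
        .header("Authorization", "Basic " + auth)
        .header("Accept", "application/json")
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    // Expected shape: {"result":[{"resolved_by.user_name":"admin"}]}
    System.out.println(response.body());
  }
}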
Another option for doing this would be to tell the API to return both display and actual values by specifying the sysparm_display_value parameter, setting it to all to return both the sys_id and the display value, or to true to return only display values.
Your URI would then look like this:
https://dev12567.service-now.com/api/now/table/incident?sysparm_query=resolved_byISNOTEMPTY%5Enumber%3DINC0000001&sysparm_display_value=all
And your JSON would contain the following:
"number": {
"display_value": "INC0000001",
"value": "INC0000001"
},
"resolved_by": {
"display_value": "System Administrator",
"link": "https://YOUR_INSTANCE.service-now.com/api/now/table/sys_user/6816f79cc0a8016401c5a33be04be441",
"value": "6816f79cc0a8016401c5a33be04be441"
},
"sys_updated_by": {
"display_value": "admin",
"value": "admin"
},
This would be accessed by:
answer.result[n].resolved_by.display_value