Creating a JSON structure in PDI without blocks - pentaho-spoon

I'm trying to get a simple JSON output value in PDI from a field that was defined in an earlier step.
The field is id_trans, and I want the result to look like {"id_trans":"1A"} when the id_trans value is 1A.
However, when using the JSON Output step and leaving the JSON block name empty, I get this: {"":[{"id_trans":"1A"}]}, which is expected, since the JSON Output step generates JSON blocks, as specified in the documentation.
How can I get rid of the block structure (i.e. the []) in a simple manner? I thought of using an external Python script, but I would rather use steps in PDI.

You can easily do that with another JSON Input step. Just specify the output value from the JSON Output step as the Select field, and under the Fields tab, specify a field name and data[0] as the Path.
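For illustration, a minimal sketch of the chained configuration, assuming the JSON Output block name is set to data and its output field is called outputValue (both names are illustrative):

JSON Output result:          {"data":[{"id_trans":"1A"}]}
JSON Input - Select field:   outputValue
JSON Input - Fields tab:     Name = id_trans_json, Path = $.data[0]

The JSONPath $.data[0] picks the single object out of the array, so the new field should hold {"id_trans":"1A"} without the surrounding block.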

Related

Parse JSON column value in a CSV file

I have a CSV file with the below format:
where column c3 is in JSON format. I need to parse that JSON value into different columns and generate a CSV file.
I am using the Parse functionality:
But I am getting the below error:
Can someone help me identify whether I am doing something wrong?
Looks like you have selected Document per line under JSON settings in the Parse transformation. As your file contains a single JSON document, change the setting to Single document. (By default, the document form is set to Document per line.)
I tried the sample by selecting Document per line and got the same error as you. After changing the setting to Single document, I can preview the data without errors.
Changed the JSON settings:
Make sure the value is valid JSON.
The format of the c3 column in your file is not exactly valid JSON. When you preview the data, the value is not parsed and shows up as NULL/blank in the parsed columns.
I tried parsing with a valid JSON value and it works fine.
An example of a valid JSON value looks like:
{"Name": "None", "status": "None", "processdby": 2}
Reference: Parse transformation in mapping data flow

How to validate data for a fixed-length file in Azure Data Factory

I am reading a fixed-width file in a Mapping Data Flow and loading it into a table. I want to validate the fields, data types, and lengths of the fields that I am extracting in a Derived Column using substring.
How can I achieve this in ADF?
Use a Conditional Split and add a condition for each property of the field that you wish to test. For data type checking, we literally just landed new isInteger(), isString(), etc. functions today. The docs are still being published, but you'll find the functions in the expression builder. For length checks, use length().
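As a hedged sketch of what the split conditions could look like, assume the Derived Column produced string fields empId (expected to be a 6-digit integer) and empName (expected to be at most 30 characters); the names and rules are illustrative:

badEmpId:    !isInteger(empId) || length(empId) != 6
badEmpName:  length(empName) > 30

Rows that match a condition go to that stream (for example, to an error sink), and everything else falls through to the default stream as valid data.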

ADF copy task field type boolean as lowercase

In ADF I have a copy activity that copies data from JSON to delimited text. I get the result as:
A | B | C
"name"|False|"description"
The JSON record is like:
{"A":"name","B":"false","C":"description"}
The expected result is as below:
A | B | C
"name"|false|"description"
The boolean value has to be lowercase in the resulting delimited text file. What am I missing?
I can reproduce this. The reason is that you are converting the string to the ADF data type "Boolean", which for some reason renders the values in proper case.
Do you really have a receiving process which is case-sensitive? If you need to maintain the case of the source value, simply remove the mapping, i.e.:
If you do need some kind of custom mapping, then simply change the mapping data type to String instead of Boolean.
UPDATE after new JSON provided
OK, so your first JSON sample has the "false" value in quotes, so it is treated as a string. In your second example, the "true" is not in quotes, so it is a genuine JSON boolean value. ADF auto-detects this at run time and, as far as I can tell, it cannot be overridden. Happy to be corrected. As an alternative, consider altering your original JSON to a string, as per your original example, OR copying the file to Blob Storage or Azure Data Lake, running some transform on it (e.g. Databricks) and then outputting the file. Alternatively, consider Mapping Data Flows.
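If you do go the Mapping Data Flows route, one hedged option is a Derived Column that rewrites the flag as a lowercase string before the sink (B is the column from the sample; lower() and toString() are standard data flow expression functions):

B : lower(toString(B))

The sink then receives a plain string column whose value is always written as lowercase "true" or "false".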

Is it possible to Group By a field extracted from JSON input in a Siddhi Query?

I currently have a stream with one string attribute that contains a JSON event.
This stream receives different events, to which I want to apply JSON path expressions so I can use those attributes in filters and functions.
JsonPath extractors work like a charm in filters and selectors; unfortunately, I am not able to use them in the 'Group By' part.
I am actually doing this in an embedded Siddhi app with the siddhi-execution-json extension added manually, but for the discussion, so everybody can easily check and test it, I will paste an example app that works on WSO2 Stream Processor.
The objective looks like the following App:
#App:name("Group_by_json_attribute")
define stream JsonStream(json string);
#sink(type='log')
define stream LogStream(myField string, count long);
#info(name='query1')
from JsonStream#window.time(10 sec)
select json:getString(json, '$.myField') as myField, count() as count
group by myField having count > 1
insert into LogStream;
and it can accept the following events:
{"myField": "my_value"}
However, this query will raise the error:
Cannot find attribute type as 'myField' does not exist in 'JsonStream'; define stream JsonStream(json string)
I have also tried to use the JSON extractor directly in the 'Group by':
group by json:getString(json, '$.myField') as myField having count > 1
However, the error now is:
mismatched input ':' expecting {',', ORDER, LIMIT, OFFSET, HAVING, INSERT, DELETE, UPDATE, RETURN, OUTPUT}
which suggests that an extension function is not expected at this point.
I am just wondering if it is possible to group by attributes that are not directly defined in the input stream. In this case it is a field extracted from a JSON object, but it could be any other function that generates another attribute.
I am also using versions from the Maven Central repository:
Siddhi: io.siddhi:siddhi-core:5.0.1
siddhi-execution-json: io.siddhi.extension.execution.json:siddhi-execution-json:2.0.1
(Edit) Clarification
The objective is to use attributes that are not directly defined in the stream in the Group By.
The reason is that I currently have an embedded app which defines the whole set of input streams coming from external sources formatted as JSON, and there is also a set of output streams to inform external components when a query matches.
This app allows users to create custom queries on this set of predefined streams, but they are not able to create streams of their own.
Many thanks!
It seems we expect the group by fields to come from the query's input stream, in this case JsonStream. Use another query before this one for the extraction, and do the aggregation and filtering in the following query:
#App:name("Group_by_json_attribute")
define stream JsonStream(json string);
#sink(type='log')
define stream LogStream(myField string, count long);
#info(name='extract_stream')
from JsonStream
select json:getString(json, '$.myField') as myField
insert into ExtractedStream;
#info(name='query1')
from ExtractedStream#window.time(10 sec)
select myField, count() as count
group by myField
having count > 1
insert into LogStream;
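If you are running this embedded, as mentioned in the question, the following is a minimal Java sketch of wiring up the two-query app, assuming siddhi-core and siddhi-execution-json are on the classpath; the class and variable names are illustrative:

import io.siddhi.core.SiddhiAppRuntime;
import io.siddhi.core.SiddhiManager;
import io.siddhi.core.event.Event;
import io.siddhi.core.stream.input.InputHandler;
import io.siddhi.core.stream.output.StreamCallback;

public class GroupByJsonDemo {
    public static void main(String[] args) throws InterruptedException {
        String app =
            "define stream JsonStream(json string); " +
            "define stream LogStream(myField string, count long); " +
            "from JsonStream " +
            "select json:getString(json, '$.myField') as myField " +
            "insert into ExtractedStream; " +
            "from ExtractedStream#window.time(10 sec) " +
            "select myField, count() as count " +
            "group by myField having count > 1 " +
            "insert into LogStream;";

        SiddhiManager manager = new SiddhiManager();
        SiddhiAppRuntime runtime = manager.createSiddhiAppRuntime(app);

        // Print whatever reaches LogStream instead of using the #sink(type='log') annotation
        runtime.addCallback("LogStream", new StreamCallback() {
            @Override
            public void receive(Event[] events) {
                for (Event event : events) {
                    System.out.println(event);
                }
            }
        });

        InputHandler input = runtime.getInputHandler("JsonStream");
        runtime.start();

        // Two identical events so the "having count > 1" condition fires
        input.send(new Object[]{"{\"myField\": \"my_value\"}"});
        input.send(new Object[]{"{\"myField\": \"my_value\"}"});

        Thread.sleep(1000);
        runtime.shutdown();
        manager.shutdown();
    }
}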

Talend - put logic before tOutput

I have a Talend job that has the following components:
- 1 database input component that fetches data from the database
- 1 output component that writes the fetched data into a flat file.
Now I have a scenario wherein, from two fields in the fetched data, only one should be put in the output, based on some if/else logic.
Can anyone assist me with this? Is it done using tMap?
Your if/else logic can be put in the tMap, as a ternary operator in the output of your tMap:
condition?resultifOK:resultifKO
For example, if you want to insert into your file the value of ColumnA when it equals "MY CONDITION", and otherwise insert the value of ColumnB:
You'll have, directly in the output row of the tMap:
"MY CONDITION".equals(row1.ColumnA)?row1.ColumnA:row1.ColumnB
You can't directly use an if/else statement in tMap. If you have several conditions, consider using a Routine instead of chaining multiple ternary operators.
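If the logic does outgrow a single expression, here is a minimal sketch of such a routine (the routine name, method name, and condition are illustrative; Talend routines are plain Java classes in the routines package):

package routines;

public class OutputSelector {

    // Returns columnA when it matches the expected marker value,
    // otherwise falls back to columnB.
    public static String pickColumn(String columnA, String columnB) {
        if ("MY CONDITION".equals(columnA)) {
            return columnA;
        }
        return columnB;
    }
}

The tMap output expression then becomes a single call:
OutputSelector.pickColumn(row1.ColumnA, row1.ColumnB)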