PDI Metadata Injection for JSON Input - metadata

I need to create a transformation that reads a JSON file from a directory on the file system and renames the JSON fields (keys) based on metadata inputs, then writes the modified JSON to a '.js' file using the JSON Output step. This conversion must be done using the ETL Metadata Injection step.
Since I am new to Pentaho Data Integration, can anyone help me with sample '.ktr' files for the above scenario?
Thanks in advance.

The same use case is covered in the official Pentaho documentation here, except that it uses Excel files rather than JSON objects.
Note that the Metadata Injection step requires building some rather sophisticated machinery, while renaming JSON keys is simple to do with a small JavaScript step. So, where do you get the "dictionary" (source field name -> target field name) from?
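To make the "dictionary" idea concrete, here is a minimal sketch in plain Python (outside PDI) of the key renaming itself; the mapping, file names, and field names are assumptions, not part of the original question:

```python
import json

# Hypothetical source -> target field name mapping ("the dictionary")
FIELD_MAP = {"cust_id": "customerId", "cust_name": "customerName"}

def rename_keys(obj, field_map):
    """Recursively rename dictionary keys according to field_map."""
    if isinstance(obj, dict):
        return {field_map.get(k, k): rename_keys(v, field_map) for k, v in obj.items()}
    if isinstance(obj, list):
        return [rename_keys(item, field_map) for item in obj]
    return obj

with open("input.json") as src:
    data = json.load(src)

# Write the renamed structure to the '.js' output file
with open("output.js", "w") as dst:
    json.dump(rename_keys(data, FIELD_MAP), dst, indent=2)
```

Whatever supplies that mapping (a file, a table, a parameter) is also what you would feed into the Metadata Injection step if you go that route.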

Related

Automatically map contents of REST JSON body as flat table in Data Flow

With the Copy Data activity it is possible to retrieve data from a REST call (an array of flat JSON objects, similar to OData) and copy the contents to a flat table, keeping the data types from the source, without having to set the schema for that specific data.
When I try to recreate this with a Data Flow, I can't get it to work. When I check the Data Preview of my source, I get a hierarchy with a body (containing my OData-like data) and a header, and if I send that to my sink (Avro) it is saved in the same hierarchical structure (including the header). I know I can fix this manually by using a Select transformation (body.column1, body.column2, etc.), but I want to make my Data Flow dynamic so I can use it with multiple tables/endpoints.
So I receive it like this with my REST source:
link
And I want it to be like this at my Sink without hardcoding my schema:
link
The only workaround I can come up with is retrieving the data using Copy Data, storing it somewhere temporarily, and then using my Data Flow to further transform it. Is there an easier way to do this? I can't imagine that I'm the only one with this issue.
Hopefully this is clear and somebody is able to help. Thank you very much in advance.
The Data Flow projection gets its schema from the API, including both body and header, so when you use auto mapping everything gets saved.
Here are the workarounds you can consider:
As you mentioned, use Copy Data first and then a Data Flow to further transform.
Use Select or Derived Column transformations to transform your data and get the column names you want before the sink. For this you can use the column pattern matching syntax, so that one condition can match and transform multiple columns; a rough sketch of the intended flattening follows the link below.
Check the link below to learn about column pattern mappings.
https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-column-pattern
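For reference only, this is a small Python sketch (outside Data Flow) of the flattening being asked for: drop the header wrapper and keep only the records under body. The response shape and column names are assumptions:

```python
import json

# Hypothetical REST response shape: {"header": {...}, "body": [ {...}, {...} ]}
response = json.loads("""
{
  "header": {"requestId": "abc-123"},
  "body": [
    {"column1": 1, "column2": "a"},
    {"column1": 2, "column2": "b"}
  ]
}
""")

# Discard the header and emit each body object as a flat record;
# this is what the manual Select (body.column1, body.column2, ...) achieves in the Data Flow.
flat_rows = [dict(row) for row in response["body"]]
print(flat_rows)
```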

Azure Data Flow converting boolean to string

I am trying to split large JSON files into smaller chunks using Azure Data Flow. It splits the file, but it changes the boolean column type to string in the output files. The same data flow will be used for different JSON files with different schemas, so I can't define any fixed schema mapping and have to use the auto-mapping option. Please suggest how I could solve this issue of automatic data type conversion, or any other approach to splitting the file in Azure Data Factory.
Here, with my dataset, I tried a JSON file as the source and JSON as the sink. If you have a fixed schema and import it, the data flow works fine and returns a boolean value after running the pipeline.
But, as you stated, the "same data flow will be used for different json files with different schemas therefore can't have any fixed schema mapping defined". Hence, you need a Derived Column to explicitly convert the values to boolean.
(Screenshots: Import Schema, sink inspection, Data Preview.)
In your ADF Data Flow Source transformation, click on the Projection Tab and click "Define default format". Set explicit values for Boolean True/False so that ADF can use that hint for proper data type inference for your data.
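As an alternative to splitting inside a Data Flow, here is a minimal sketch of chunking a large JSON array outside ADF in plain Python; json.load/json.dump round-trip booleans as true/false rather than strings. File names and chunk size are assumptions:

```python
import json

CHUNK_SIZE = 10000  # records per output file (assumption)

with open("large_input.json") as src:
    records = json.load(src)  # assumes the file is a single JSON array

for i in range(0, len(records), CHUNK_SIZE):
    chunk = records[i:i + CHUNK_SIZE]
    with open(f"chunk_{i // CHUNK_SIZE:04d}.json", "w") as dst:
        # Booleans stay booleans: written as true/false, not "true"/"false"
        json.dump(chunk, dst)
```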

How can you get XML out of a Data Factory?

How can you get XML out of a Data Factory?
Great, there is an XML format, but it is only supported as a source ... not as a sink.
So how can ADF write XML output?
I've looked around and there have been suggestions of using external services, but I'd like to keep it all "in Data Factory".
e.g. I could knock together an Azure Function that takes JSON and converts it to XML, using an example like so.
But how can I then get ADF to, e.g., write this XML to a file system?
No, this is not possible.
If you just want to copy, then using the binary format is fine. But if you are trying to get ADF itself to output XML, it is not possible (as the document you mentioned says).
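If you do go the Azure Function route mentioned in the question, the core JSON-to-XML conversion is small. Here is a hedged Python sketch (element names and input shape are assumptions), which such a function could return for ADF, or anything else, to land on a file system:

```python
import json
import xml.etree.ElementTree as ET

def json_to_xml(payload: dict, root_name: str = "root") -> bytes:
    """Convert a (possibly nested) JSON object into an XML document."""
    def build(parent, value):
        if isinstance(value, dict):
            for key, child in value.items():
                build(ET.SubElement(parent, key), child)
        elif isinstance(value, list):
            for item in value:
                build(ET.SubElement(parent, "item"), item)
        else:
            parent.text = "" if value is None else str(value)

    root = ET.Element(root_name)
    build(root, payload)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    # Hypothetical input payload
    data = json.loads('{"order": {"id": 1, "paid": true, "lines": [{"sku": "A"}]}}')
    print(json_to_xml(data).decode("utf-8"))
```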

Talend - Insert data into DB using Django REST API

I am trying to utilise Django REST APIs to insert data into the database, instead of writing to it directly. I've been able to read JSON data using the tRESTClient component, but I am not too sure about the insertion/POST. Could someone point me to the components (and how to connect them) that I should use?
The current job that I have is mostly:
Read data from raw file -> tMap -> DB
and I wish to do something like:
Read data from raw file -> tMap -> (pass on data to REST endpoint via POST)
I used the tRESTClient component after my tMap and I could see records getting inserted into the DB, but all of them are empty. Strangely, nowhere was I asked to specify the JSON tree. The number of records getting inserted equals the number of rows read from the raw file, so at least something is right. But I couldn't locate the menu/options to specify which data element read from the raw file should map to which JSON element.
How do I specify the data-to-JSON mapping?
PS: I realise that this might not be the most efficient way to ingest data, but that's what the business wants, since it brings in an additional layer of control.
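Outside Talend, what the job needs to produce per row boils down to building a JSON body and POSTing it to the Django REST endpoint. A minimal Python sketch of that, just to show the field-to-JSON mapping involved (the endpoint URL and field names are assumptions):

```python
import requests

# Hypothetical rows coming out of the tMap stage
rows = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
]

API_URL = "http://localhost:8000/api/customers/"  # Django REST endpoint (assumption)

for row in rows:
    # Each raw-file field is mapped explicitly to a JSON element here;
    # this is the mapping that has to be configured in the Talend job as well.
    body = {"name": row["name"], "email": row["email"]}
    resp = requests.post(API_URL, json=body, timeout=10)
    resp.raise_for_status()
```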

Conditional routing in Apache NiFi

I'm using NiFi to get data from an Oracle database and put some of this data into Kafka (using the PutKafka processor).
For example: only send the data to Kafka if the attribute "id" contains "aaabb".
Is that possible in Apache NiFi? How can I do it?
This should definitely be possible; the flow might look something like this (a rough Python equivalent of steps 3-5 is sketched after the list)...
1) ExecuteSQL or QueryDatabaseTable to get the data from the database, these produce Avro
2) ConvertAvroToJson processor to convert the Avro to Json
3) EvaluateJsonPath to extract the id field into an attribute
4) RouteOnAttribute to route flow files where the id attribute contains "aaabb"
5) PutKafka to deliver any of the matching results from RouteOnAttribute
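Purely to make the routing condition in steps 3-5 concrete, here is a rough Python equivalent (the Kafka topic, broker, and record shape are assumptions), not a replacement for the NiFi flow:

```python
import json
from kafka import KafkaProducer  # kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Hypothetical rows extracted from the database
records = [{"id": "xxaaabbyy", "value": 1}, {"id": "zzz", "value": 2}]

for record in records:
    # EvaluateJsonPath ($.id) + RouteOnAttribute (contains) + PutKafka, in plain code
    if "aaabb" in record.get("id", ""):
        producer.send("oracle-data", value=record)

producer.flush()
```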
To add on to Bryan's example flow, I wanted to point you to some great documentation that should help introduce you to Apache NiFi.
Firstly, I would suggest checking out the NiFi documentation. It is very good and should help a lot. In addition to providing details on each of the processors Bryan mentioned, it also has general documentation for every type of user.
For a basic introduction to building a NiFi flow, check out this video.
For example templates, check out this repo. It has an Excel file at its root level with a description and a list of processors for each template.