Parse json column value in CSV file - azure-data-factory

I have a CSV file with the below format:
where column c3 is in JSON format. I need to parse that JSON value into different columns and generate a CSV file.
I am using the Parse functionality:
But I am getting the below error:
Can someone help me identify whether I am doing something wrong?

Looks like you have selected Document per line under JSON settings in the Parse transformation. As each value is a single JSON document, change the setting to Single document. (By default, the document form is Document per line.)
I tried the sample with Document per line selected and got the same error as you. After changing the setting to Single document, I can preview the data without errors.
Changed the JSON settings:
Make sure the value is valid JSON.
The c3 column in your file is not in valid JSON format. When you preview the data, the value is not parsed and shows up as NULL/blank in the parsed columns.
I tried parsing a value in valid JSON format and it works fine.
An example of valid JSON data looks like:
{"Name": "None", "status": "None", "processdby": 2}
Reference: Parse transformation in mapping data flow

Related

Conversion from XML to JSON removes leading 0 in Azure Data Factory copy activity

I am converting XML files to JSON (gzip compression) using an Azure Data Factory copy activity.
However, I observe that in the XML file the value is stored as 0123456789, but when it is converted to JSON it is saved as "value": 123456789, without the leading 0.
I would like to keep the JSON values as-is from the XML. Please provide suggestions.
PS: I can't use a data flow and I can't modify the XML file.
The solution was to uncheck the detect data type option in the copy activity.
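For illustration, the difference is just whether the value is inferred as a number or kept as a string (a minimal sketch, not the exact payload):
with detect data type checked:     "value": 123456789
with detect data type unchecked:   "value": "0123456789"
With detection off the value stays a string, so the leading zero is preserved.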

Google Cloud Data Fusion is appending a column to original data

When I am loading encrypted data from a GCS source to a GCS sink, one additional column is getting added.
Original data
Employee ID,Employee First Name,Employee Last Name,Employee Joining Date,Employee location
1,Vinay,Argekar,01/01/2017,India
2,Thirukkumaran,Haridass,02/02/2017,USA
3,David,Wu,03/04/2000,Canada
4,Vinod,Kumar,04/02/2002,India
5,Joshua,Abraham,04/15/2010,France
6,Allaudin,Dastigar,09/24/2012,UK
7,Senthil,Kumar,08/15/2009,Germany
8,Sudha,Narayanan,12/14/2016,India
9,Ravi,Prasad,11/11/2011,Costa Rica
Data in the file after running the pipeline
0,Employee ID,Employee First Name,Employee Last Name,Employee Joining Date,Employee location
91,1,Vinay,Argekar,01/01/2017,India
124,2,Thirukkumaran,Haridass,02/02/2017,US
164,3,David,Wu,03/04/2000,Canada
193,4,Vinod,Kumar,04/02/2002,India
224,5,Joshua,Abraham,04/15/2010,France
259,6,Allaudin,Dastigar,09/24/2012,UK
293,7,Senthil,Kumar,08/15/2009,Germany
328,8,Sudha,Narayanan,12/14/2016,India
363,9,Ravi,Prasad,11/11/2011,Costa Rica
The first column was not present in the original file.
When you configured the GCS source, did you set the Format to CSV or was it left as Text? When the Format is Text, the output schema contains an offset field (the byte offset of each line in the file), and that is the first column you see in the output data. When you set the Format to CSV, you have to specify the output schema of the file.

ADF copy task field type boolean as lowercase

In ADF I have a copy task that copies data from JSON to delimited text, and I get the result as
A | B | C
"name"|False|"description"
The JSON record is like
{"A":"name","B":"false","C":"description"}
Expected result is as below
A | B | C
"name"|false|"description"
The boolean value has to be in lowercase in the resulting delimited text file. What am I missing?
I can reproduce this. The reason is you are converting the string to the ADF datatype "Boolean", which for some reason renders the values in Proper case.
Do you really have a receiving process which is case-sensitive? If you need to maintain the case of the source value, simply remove the mapping, i.e.
If you do need some kind of custom mapping, then simply change the mapping data type to String and not Boolean.
UPDATE after new JSON provided
OK, so your first JSON sample has the "false" value in quotes, so it is treated as a string. In your second example, the "true" is not in quotes, so it is a genuine JSON boolean value. ADF auto-detects this at run time and, as far as I can tell, it cannot be overridden. Happy to be corrected. As an alternative, consider altering your original JSON to a string, as per your original example, OR copying the file to Blob Storage or Azure Data Lake, running some transform on it (e.g. Databricks) and then outputting the file. Alternatively, consider Mapping Data Flows.
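For illustration, the only difference between the two cases is the quoting of the value:
string value:   {"A":"name","B":"false","C":"description"}   (copied through as-is)
boolean value:  {"A":"name","B":false,"C":"description"}     (mapped to ADF's Boolean type and written as False)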

Scala - Writing dataframe to a file as binary

I have a Hive table stored as Parquet, with a column Content storing various documents as base64-encoded strings.
Now I need to read that column and write it to a file in HDFS, so that the base64 column is converted back into a document for each row.
val profileDF = sqlContext.read.parquet("/hdfspath/profiles/")
profileDF.registerTempTable("profiles")
val contentsDF = sqlContext.sql("select unbase64(contents) as contents from profiles where file_name = 'file1'")
Now contentsDF stores the binary content of the document in each row, which I need to write to a file. I tried different options but couldn't get the dataframe content back out to a file.
Appreciate any help regarding this.
I would suggest saving as Parquet:
https://spark.apache.org/docs/1.6.3/api/java/org/apache/spark/sql/DataFrameWriter.html#parquet(java.lang.String)
Or convert to an RDD and save as an object file:
https://spark.apache.org/docs/1.6.3/api/java/org/apache/spark/rdd/RDD.html#saveAsObjectFile(java.lang.String)
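For example, a minimal sketch building on contentsDF from the question (output paths are placeholders):
// Option 1: write the binary column back out as Parquet
contentsDF.write.parquet("/hdfspath/contents_parquet/")

// Option 2: convert to an RDD of byte arrays and save as an object file
contentsDF.rdd
  .map(row => row.getAs[Array[Byte]]("contents"))
  .saveAsObjectFile("/hdfspath/contents_objectfile/")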

Creating a JSON structure in PDI without blocks

I'm trying to get a simple JSON output value in PDI from a field that was defined in an earlier step.
The field is id_trans, and I want the result to look like {"id_trans":"1A"} when the id_trans value is 1A.
However, when using the JSON Output step and setting the JSON block name to empty, I get this: {"":[{"id_trans":"1A"}]}, which is normal given that the JSON Output step generates JSON blocks, as specified in the docs.
How can I get rid of the block (i.e. []) structure in a simple manner? I thought of using an external Python script, but I would rather use PDI steps.
You can easily do that with another JSON Input step. Just specify your output value from the JSON Output step as the Select field, and under the Fields tab, specify a field name and data[0] as the Path.
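For illustration, assuming the JSON Output block is named data rather than left empty, the intermediate value would be {"data":[{"id_trans":"1A"}]}, and a JSON Input step reading that field with data[0] as the Path returns just the inner object {"id_trans":"1A"}.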