Data not flowing from tFileInputDelimited - Talend

I have a tFileInputDelimited in my Job. I am just trying to pass the data from it to a tFilterRow. The data is coming from the repository, i.e. metadata. But only 4 rows reach tFilterRow, whereas attaching a tLogRow to tFileInputDelimited shows an error for an input string. I have not declared any of the fields as String.
What might be the cause of data not flowing from tFileInputDelimited?
ITEM_ID,ITEM_NAME,ITEM_DESC,PRICE_NUMBER,WHOLESALE_COST,DISCONTINUED_FLAG,MANUFACTURER_ID,DISTRIBUTOR_ID
1313,'Regulator System','Air Regulators',250.00,150.00,'0',100,2012
I declared the schema as int, String, String, float, float, String, int, int.
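For reference, a minimal Scala sketch (not Talend) of how a "for input string" style error arises when a non-numeric value reaches a numeric conversion; the header-row scenario below is only an assumption for illustration:

// Illustrative only: converting the sample data row works, but feeding the
// header line into the same numeric conversion fails.
val header = "ITEM_ID,ITEM_NAME,ITEM_DESC,PRICE_NUMBER,WHOLESALE_COST,DISCONTINUED_FLAG,MANUFACTURER_ID,DISTRIBUTOR_ID"
val row    = "1313,'Regulator System','Air Regulators',250.00,150.00,'0',100,2012"

val goodId = row.split(",")(0).toInt    // 1313
// The next line would throw java.lang.NumberFormatException: For input string: "ITEM_ID"
// val badId = header.split(",")(0).toInt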

Related

Convert String to Int in Azure Data Factory Derived Column expression

I've created a dataflow task in Azure Data Factory and used a derived column transformation. One of the source derived column values is '678396', which is extracted through the Substring function and has the datatype "String" by default. I want to convert it to "Integer" because my target column datatype is "Integer".
I've tried to convert the column with this expression:
ToInteger(Substring(Column_1,1,8))
Please help me with the correct expression.
Kind regards,
Rakesh
You don't need to build the expression. If your column data are all integer strings like "678396", or the output of Substring(Column_1,1,8) is an integer string, then Data Factory can convert the string to the integer data type directly from source to sink; you don't need to convert it again.
Make sure you set the column mapping correctly in the sink settings. Everything should work well.
Update:
This is my csv dataset:
You can set the Quote character to single quote, which should solve the problem. See the source data preview in the Copy activity and the Data Flow:
Copy activity source:
Data Flow overview:
In the Data Flow we will get the alert you mentioned in the comment; we can ignore it and debug the data flow directly:
HTH.
You don't even need to strip the quotes '', as the ToInteger function can convert numbers held as strings.
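Outside ADF, a minimal Scala sketch of the equivalent logic, using the value from the question (the quote-stripping step only matters if the quotes actually reach the column):

// Strip optional enclosing single quotes, then convert the digits to Int.
val raw = "'678396'"
val digits = raw.stripPrefix("'").stripSuffix("'")
val asInt: Int = digits.toInt   // 678396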

ADF copy task field type boolean as lowercase

In ADF I have a copy task that copies data from JSON to delimited text, and I get this result:
A | B | C
"name"|False|"description"
The JSON record is like
{"A":"name","B":"false","C":"description"}
The expected result is as below
A | B | C
"name"|false|"description"
The bool value has to be lowercase in the resulting delimited text file; what am I missing?
I can reproduce this. The reason is that you are converting the string to the ADF datatype "Boolean", which for some reason renders the values in Proper case.
Do you really have a receiving process which is case-sensitive? If you need to maintain the case of the source value, simply remove the mapping, i.e.:
If you do need some kind of custom mapping, then simply change the mapping data type to String and not Boolean.
UPDATE after new JSON provided
OK, so your first JSON sample has the "false" value in quotes, so it is treated as a string. In your second example, your JSON "true" is not in quotes, so it is a genuine JSON boolean value. ADF auto-detects this at run time, and as far as I can tell it cannot be overridden; happy to be corrected. As an alternative, consider altering your original JSON to a string, as per your original example, OR copying the file to Blob Storage or Azure Data Lake, running some transform on it (e.g. Databricks) and then outputting the file. Alternatively, consider Mapping Data Flows.
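If you take the Databricks route, a minimal Spark (Scala) sketch of that kind of transform could look like the following; the paths and the column name B are assumptions for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

val spark = SparkSession.builder().appName("lowercase-bool").getOrCreate()

// Read the JSON records (hypothetical path).
val df = spark.read.json("/mnt/input/records.json")

// Render the boolean column as a lowercase string so the delimited
// output contains "false"/"true" rather than "False"/"True".
val fixed = df.withColumn("B", lower(col("B").cast("string")))

// Write pipe-delimited text (hypothetical path).
fixed.write.option("sep", "|").csv("/mnt/output/records")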

Topic data format in Kafka for KSQL operations

I just started using KSQL. When I do print topic from beginning, I get data in the below format.
rowtime: 4/12/20, 9:00:05 AM MDT, key: {"messageId":null}, value: {"WHS":[{"Character Set":"UTF-8","action":"finished","Update-Date-Time":"2020-04-11 09:00:02:25","Number":0,"Abbr":"","Name":"","Name2":"","Country-Code":"","Addr-1":"","Addr-2":"","Addr-3":"","Addr-4":"","City":"","State":""}]}
But all the examples in KSQL have the data in below format
{"ROWTIME":1537436551210,"ROWKEY":"3375","rating_id":3375,"user_id":2,"stars":3,"route_id":6972,"rating_time":1537436551210,"channel":"web","message":"airport refurb looks great, will fly outta here more!"}
so I'm not able to perform any operations; the format is showing as
Key format: JSON or SESSION(KAFKA_STRING) or HOPPING(KAFKA_STRING) or TUMBLING(KAFKA_STRING) or KAFKA_STRING
Value format: JSON or KAFKA_STRING
on my topic. How can I modify the data into the expected format?
Thanks
ksqlDB does not yet support JSON message keys (see the tracking GitHub issue).
However, you can still access the data, both in the key and the value. The JSON key is just a string after all!
The value, when reformatted, looks like this:
{
  "WHS": [
    {
      "Character Set": "UTF-8",
      "action": "finished",
      "Update-Date-Time": "2020-04-11 09:00:02:25",
      "Number": 0,
      "Abbr": "",
      "Name": "",
      "Name2": "",
      "Country-Code": "",
      "Addr-1": "",
      "Addr-2": "",
      "Addr-3": "",
      "Addr-4": "",
      "City": "",
      "State": ""
    }
  ]
}
Which, assuming all rows share a common format, ksqlDB can easily handle.
To import your stream you should be able to run something like this:
-- assuming v0.9 of Kafka
create stream stuff (
    ROWKEY STRING KEY,
    WHS ARRAY<
        STRUCT<
            `Character Set` STRING,
            action STRING,
            `Update-Date-Time` STRING,
            Number STRING,
            ... etc
        >
    >
)
WITH (kafka_topic='?', value_format='JSON');
The value column WHS is an array of structs (where there will be only one element), and the struct defines all the fields you need to access. Note that some field names needed quoting as they contain invalid characters, e.g. spaces and dashes.

How to execute hql files with multiple SQL queries per single file?

I have an hql file which has a lot of Hive queries, and I would like to execute the whole file using Spark SQL.
This is what I have tried.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
Usually to execute individual queries we do it this way:
sqlContext.sql("SELECT * from table")
However, when we have an hql file with hundreds of queries, I do it like this.
import scala.io.Source
val filename = "/path/to/file/filename.hql"
for (line <- Source.fromFile(filename).getLines) {
sqlContext.sql(line)
}
However, I get an error saying:
NoViableAltException
This is the top of the file.
DROP TABLE dat_app_12.12_app_htf;
CREATE EXTERNAL TABLE dat_app_12.12_app_htf(stc string,
ftg string,
product_type string,
prod_number string,
prod_ident_number string,
prod_family string,
frst_reg_date date, gth_name string,
address string,
tel string,
maker_name string) ROW format serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
stored AS inputformat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file_location';
When the queries are multi-line queries like the above one, it doesn't work.
However, when I format the queries and put all the lines into one line, it works.
CREATE EXTERNAL TABLE dat_app_12.12_app_htf(stc string, ftg string, product_type string, prod_number string, prod_ident_number string, prod_family string, frst_reg_date date, gth_name string, address string, tel string, maker_name string) ROW format serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' stored AS inputformat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'file_location';
But I have thousands of lines like this. What is the proper way to do it?
Can anyone help me get this solved?
tl;dr I don't think it's possible.
Spark SQL uses AstBuilder as the ANTLR-based SQL parser and accepts a single SQL statement at a time (see SqlBase.g4 for the full coverage of all supported SQL queries).
With that said, the only way to do it is to parse the multi-query input file yourself before calling Spark SQL's sqlContext.sql (or spark.sql as of Spark 2.0).
You could rely on empty lines as separators perhaps, but that depends on how input files are structured (and they could easily use semicolon instead).
In your particular case I've noticed that the end markers are actually semicolons.
// one query that ends with semicolon
DROP TABLE dat_app_12.12_app_htf;
// another query that also ends with semicolon
CREATE EXTERNAL TABLE dat_app_12.12_app_htf(stc string,
ftg string,
product_type string,
prod_number string,
prod_ident_number string,
prod_family string,
frst_reg_date date, gth_name string,
address string,
tel string,
maker_name string) ROW format serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
stored AS inputformat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file_location';
If that's consistent, you could parse the file line by line (as you do with the for expression) and read until a ; is found. Multi-line SQL queries are fine for Spark SQL, so you should have your solution.
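A minimal sketch of that approach, assuming statements are terminated by semicolons and that no semicolon appears inside a string literal (a simplification):

import scala.io.Source

val filename = "/path/to/file/filename.hql"

// Read the whole file, split on ';', and run each non-empty statement.
val statements = Source.fromFile(filename).mkString
  .split(";")
  .map(_.trim)
  .filter(_.nonEmpty)

statements.foreach(stmt => sqlContext.sql(stmt)) // spark.sql(stmt) as of Spark 2.0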
I had a similar use case in a project and simply gave up trying to parse all the possible ways people write SQL before I could hand it over to Spark.
Hey, did you try this command?
spark-sql --master yarn-client --conf spark.ui.port=port -i /hadoop/sbscript/hql_for_dml.hql

BizTalk and DB2 CLOB

Has anyone had experience dealing with DB2 stored procedures which take a CLOB input parameter, and with calling such a stored procedure from BizTalk?
I've tried changing the schema type to string, base64binary, hexbinary, byte, but no matter what I get this error:
Error details: The parameter value for parameter 1 could not be converted to a native data type. Parameter Name: P_EML_BODY, Data Type: Long strings of input text
More long strings of input text
More long strings of input text, Value : CharForBit
It might be the way the parameters are being created and in what order. Take a look here. Are any of them null or empty?