KSQL: How can I change the separator (comma) of the DELIMITED format? - apache-kafka

I am trying to load a large number of messages (350M) into a customer topic (the source topic), with values formatted like this:
10957402000||10965746672||2|2756561822|452048703649890|8984048701003649890
I then create some streams and a table on that topic, but the DELIMITED format supported by KSQL only uses a comma separator. I have some questions:
Is there any way to configure KSQL to understand my format, or do I have to convert my data to KSQL's default (comma-separated)?
Given the original value in the source topic above, how can the following command map the value onto the table columns? Or do I have to convert the format to JSON?
CREATE STREAM customer_stream (sub_id BIGINT, contract_id BIGINT, cust_id BIGINT, account_id BIGINT, telecom_service_id BIGINT, isdn BIGINT, imsi BIGINT) \
WITH (KAFKA_TOPIC='customer', VALUE_FORMAT='DELIMITED');
Thank you.

Edit 26 February 2021: ksqlDB now supports configurable delimiters via the VALUE_DELIMITER (or KEY_DELIMITER) property. For example:
CREATE STREAM TEST (COL1 INT, COL2 VARCHAR)
WITH (KAFKA_TOPIC='test', VALUE_FORMAT='DELIMITED', VALUE_DELIMITER='TAB');
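Applied to the question's topic, a minimal sketch (the stream name is illustrative; note that with a single-character '|' delimiter, each doubled pipe '||' in the sample value is read as delimiting an empty field, so the declared column list has to line up with those positions):
CREATE STREAM customer_stream (sub_id BIGINT, contract_id BIGINT, cust_id BIGINT,
    account_id BIGINT, telecom_service_id BIGINT, isdn BIGINT, imsi BIGINT)
WITH (KAFKA_TOPIC='customer', VALUE_FORMAT='DELIMITED', VALUE_DELIMITER='|');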
Original answer:
Currently KSQL only supports comma-separated data for the DELIMITED value format, so you'll need to use commas, or JSON, or Avro, for your source data.

Related

How to insert a comma-separated document into PostgreSQL?

,0,1,2,3,4,5,6
0,1,1,1,76.0,2.99,2005-05-25 11:30:37.000,2019-04-11 18:11:50
1,2,1,1,573.0,0.99,2005-05-28 10:35:23.000,2019-04-11 18:11:50
2,3,1,1,1185.0,5.99,2005-06-15 00:54:12.000,2019-04-11 18:11:50
3,4,1,2,1422.0,0.99,2005-06-15 18:02:53.000,2019-04-11 18:11:50
4,5,1,2,1476.0,9.99,2005-06-15 21:08:46.000,2019-04-11 18:11:50
5,6,1,1,1725.0,4.99,2005-06-16 15:18:57.000,2019-04-11 18:11:50
Hello, I want to know how to insert this comma-separated document into PostgreSQL; the document's details are shown above.
I know a timestamp needs to be inserted like '2019-04-11 18:11:50', but I don't want to add quotes around every timestamp value.
The data types I want to use, in column order, are:
integer, integer, smallint, smallint, integer, numeric, integer,
timestamp without time zone, timestamp without time zone
Please let me know.
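A minimal sketch of one way to do this (the question lists nine types but the sample rows show eight fields, so this maps the eight visible columns; the table and column names are illustrative). COPY in CSV mode parses unquoted timestamp literals directly, so no quoting is needed:
CREATE TABLE payments (
    idx         integer,
    payment_id  integer,
    customer_id smallint,
    staff_id    smallint,
    rental_id   numeric,
    amount      numeric,
    payment_ts  timestamp without time zone,
    load_ts     timestamp without time zone
);
-- psql client-side copy; HEADER skips the first line of the file
\copy payments FROM 'payments.csv' WITH (FORMAT csv, HEADER true)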

Hive table text file upload special characters

I have a pipe-delimited text file that I'm trying to create a Hive external table from. However, in COL_2 for a particular value (d’Algerie) the ’ character is getting replaced by a box, i.e. d�Algerie. I've tried some of the online solutions, such as:
ALTER TABLE pi_aarrepos_analysis.tbl_input_accounts SET SERDEPROPERTIES ('serialization.encoding'='GBK');
but I've had no luck in keeping the special character. Below is my code:
DROP TABLE IF EXISTS TABLE_NAME purge;
CREATE EXTERNAL TABLE IF NOT EXISTS TABLE_NAME
(
COL_1 STRING,
COL_2 STRING,
COL_3 STRING,
COL_4 STRING,
COL_5 STRING,
COL_6 STRING,
COL_7 STRING,
COL_8 STRING,
COL_9 STRING,
COL_10 STRING,
COL_11 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/location/'
tblproperties ("skip.header.line.count"="1");
LOAD DATA INPATH '/location/' INTO TABLE TABLE_NAME;
Would anyone know any solutions to keeping the special characters in the table upload?
Edit:
Output of:
select "${system:file.encoding}";
gave me "UTF-8"
This isn't my answer; a colleague was able to help, so this is her solution and credit goes to her, but I think it's important to record for future reference. The text file was ANSI-encoded, so she suggested putting the following line between the CREATE TABLE command and the LOAD DATA statement above:
ALTER TABLE TABLE_NAME SET SERDEPROPERTIES ('serialization.encoding'='CP1252');
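For reference, a condensed sketch of the full sequence (the table name, columns, and paths are illustrative):
CREATE EXTERNAL TABLE IF NOT EXISTS tbl_accounts (col_1 STRING, col_2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/location/'
TBLPROPERTIES ("skip.header.line.count"="1");
-- Declare the file's actual encoding before loading so the SerDe decodes it correctly
ALTER TABLE tbl_accounts SET SERDEPROPERTIES ('serialization.encoding'='CP1252');
LOAD DATA INPATH '/staging/path/' INTO TABLE tbl_accounts;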

How to store string spaces as null in numeric column

I want to load records from my local txt file into a PostgreSQL table.
I have created the following table:
create table player_info
(
Name varchar(20),
City varchar(30),
State varchar(30),
DateOfTour date,
pay numeric(5),
flag char
)
And, my local txt file contains following data.
John|Mumbai| |20170203|55555|Y
David|Mumbai| |20170305| |N
Romcy|Mumbai| |20170405|55555|N
Gotry|Mumbai| |20170708| |Y
I am executing this:
copy player_info (Name,
City,
State,
DateOfTour,
pay,
flag)
from local 'D:\sample_player_info.txt'
delimiter '|' null as ''
exceptions 'D:\Logs\player_info'
What I want is: for my numeric column, if the field contains 3 spaces, insert NULL for pay; otherwise insert the 5-digit number as-is. pay is a column in my table whose data type is numeric.
Is it possible to do this, and is my approach correct?
You cannot store strings in a numeric column at all. 3 spaces is a string, so it cannot be stored in the column pay, as that is defined as numeric.
A common approach to this conundrum is to create a staging table whose columns use less precise data types. Import the source data into the staging table, then process that data so that it can be reliably added to the final table: e.g. in the staging table, set a column called pay_str to NULL where pay_str = '   ' (or perhaps LIKE ' %').
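A minimal sketch of that approach in standard PostgreSQL (the staging-table and column names are illustrative; the question's COPY ... FROM LOCAL / EXCEPTIONS syntax is not PostgreSQL's, so a plain server-side COPY is assumed):
CREATE TABLE player_info_stage (
    name text, city text, state text,
    date_of_tour text, pay text, flag text
);
COPY player_info_stage FROM 'D:\sample_player_info.txt' DELIMITER '|';
-- Blank-only pay values become NULL; everything else is cast to numeric
INSERT INTO player_info (name, city, state, dateoftour, pay, flag)
SELECT name, city, state,
       to_date(date_of_tour, 'YYYYMMDD'),
       NULLIF(trim(pay), '')::numeric(5),
       flag
FROM player_info_stage;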

Date/time formatting for table creation

I am creating a table that will be populated with a COPY. Here's the format of that data:
6/30/2014 2:33:00 PM
MM-DD-YYYY HH:MM:SS ??
What would I use as the formatting for the CREATE TABLE statement?
CREATE TABLE practice (
Data_Time ????
)
One alternative might be to read it as varchar and format it later, but that seems convoluted.
Always store timestamps as timestamp (or timestamptz).
Never use string types (text, varchar, ...) for that.
CREATE TABLE practice (
practice_id serial PRIMARY KEY
, data_time timestamp NOT NULL
);
If your timestamp literals are clean and follow the standard MDY format, you can set the DateStyle temporarily for the transaction to read proper timestamp types directly:
BEGIN;
SET LOCAL datestyle = 'SQL, MDY'; -- works for your example
COPY practice (data_time) FROM '/path/to/file.csv';
COMMIT;
Else, your idea is not that bad: COPY to a temporary table with a text column, sanitize the data and INSERT timestamps from there possibly using to_timestamp(). Example:
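A minimal sketch of that route, assuming the file holds one timestamp literal per line (the staging names are illustrative):
BEGIN;
CREATE TEMP TABLE practice_stage (raw_ts text);
COPY practice_stage FROM '/path/to/file.csv';
INSERT INTO practice (data_time)
SELECT to_timestamp(raw_ts, 'MM/DD/YYYY HH12:MI:SS AM')  -- parses 6/30/2014 2:33:00 PM
FROM practice_stage;
COMMIT;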
You should pretty much never use varchar(n) in Postgres; always use text. But it sounds to me like you want 2 columns:
create table practice (date_time timestamp, format text);

exporting to csv from db2 with no delimiter

I need to export the contents of a DB2 table to a CSV file.
I read that nochardel would prevent having a separator between each field, but that is not happening.
Suppose I have a table
MY_TABLE
-----------------------
Field_A varchar(10)
Field_B varchar(10)
Field_C varchar(10)
I am using this command
export to myfile.csv of del modified by nochardel select * from MY_TABLE
I get this written to myfile.csv:
data1 ,data2 ,data3
but I would like it with no ',' separator, like below:
data1 data2 data3
Is there a way to do that?
You're asking how to eliminate the comma (,) in a comma separated values file? :-)
NOCHARDEL tells DB2 not to surround character-fields (CHAR and VARCHAR fields) with a character-field-delimiter (default is the double quote " character).
Anyway, when exporting from DB2 using the delimited format, you have to have some kind of column delimiter. There isn't a NOCOLDEL option for delimited files.
The EXPORT utility can't write fixed-length (positional) records - you would have to do this in one of the following ways:
Writing a program yourself,
Using a separate utility (IBM sells the High Performance Unload utility)
Writing an SQL statement that concatenates the individual columns into a single string:
Here's an example for the last option:
export to file.del
of del
modified by nochardel
select
cast(col1 as char(20)) ||
cast(intcol as char(10)) ||
cast(deccol as char(30))
from my_table;
This last option can be a pain since DB2 doesn't have an sprintf() function to help format strings nicely.
Yes, there is another way of doing this. I always do the following:
Put the select statement into a file (input.sql):
select
cast(col1 as char(20)),
cast(col2 as char(10)),
cast(col3 as char(30))
from my_table;
Call the db2 CLP like this:
db2 -x -tf input.sql -r result.txt
This will work for you because you need to cast varchar to char. Like Ian said, casting numbers or other data types to char might bring unexpected results.
PS: I think Ian is right about the difference between CSV and fixed-length format ;-)
Use "of asc" instead of "of del". Then you can specify the fixed column locations instead of delimiting.