Is there a better implementation to extract a multiline text structure from one file based on an index from another file? - perl

I have two files. One is a dump of table-creation statements from a database; the other lists the table names, each prefixed with "prompt " and suffixed with "...". Just as below:
file A (index):
prompt branch...
prompt branch_param...
prompt branch_pre_param...
prompt business...
prompt business_map...
prompt business_type...
file B (dump):
CREATE TABLE "KS"."BRANCH"
("BRANCH_CODE" CHARACTER(3) NOT NULL DEFAULT '',
"BRANCH_NAME" CHARACTER(40) NOT NULL DEFAULT '',
"PARAM_LEVEL" INTEGER NOT NULL DEFAULT 0
)
DATA CAPTURE NONE
IN "LONG_DATA_TBS";
CREATE TABLE "KS"."BRANCH2BANK"
("BRANCH_CODE" CHARACTER(3) NOT NULL DEFAULT '',
"BANK_CODE" CHARACTER(6) NOT NULL DEFAULT '',
"ACC_COMP_RESULT" CHARACTER(1) NOT NULL DEFAULT ''
)
DATA CAPTURE NONE
IN "SMALL_TBS";
CREATE TABLE "KS"."BRANCH2BOND"
("BRANCH_CODE" CHARACTER(3) NOT NULL DEFAULT '',
"BOND_CODE" CHARACTER(8) NOT NULL DEFAULT '',
"BOND_NAME" CHARACTER(20) NOT NULL DEFAULT '',
"TOTAL_AMT" DECIMAL(19, 4) NOT NULL DEFAULT 0,
"FINANCING_CUST_NO" CHARACTER(10) NOT NULL DEFAULT '',
"SET_DATE" CHARACTER(8) NOT NULL DEFAULT '',
"SET_TIME" CHARACTER(8) NOT NULL DEFAULT '',
"SET_EMP" CHARACTER(6) NOT NULL DEFAULT '',
"SPARE1" CHARACTER(20) NOT NULL DEFAULT '',
"SPARE2" CHARACTER(20) NOT NULL DEFAULT ''
)
DATA CAPTURE NONE
IN "SMALL_TBS";
CREATE TABLE "KS"."BRANCH_PARAM"
("BRANCH_CODE" CHARACTER(3) NOT NULL DEFAULT '',
"PARAM_CODE" CHARACTER(4) NOT NULL DEFAULT '',
"SET_DATE" CHARACTER(8) NOT NULL DEFAULT '',
"SET_TIME" CHARACTER(8) NOT NULL DEFAULT ''
)
DATA CAPTURE NONE
IN "SMALL_TBS";
CREATE TABLE "KS"."BRANCH_RESERVE_CREDIT_STOCK"
("BRANCH_CODE" CHARACTER(3) NOT NULL DEFAULT '',
"SET_TIME" CHARACTER(8) NOT NULL DEFAULT ''
)
DATA CAPTURE NONE
IN "TX_DATA_TBS"
INDEX IN "TX_INDEX_TBS";
I have written a Perl implementation, but I think it is much too ugly and inefficient. Is there a better way?
My code (rewritten with Richard's and lilydjwg's advice; latest version):
#!/usr/bin/perl
use 5.016;
my (%hash, $cont);
open IN, '<', shift;
while (<IN>) {
    chomp;
    $hash{$1} = 1 if /prompt (\w+)\.\.\./;
}
close IN;
open IN, '<', shift;
while (<IN>) {
    chomp;
    $cont = (defined $hash{lc $1} ? say "prompt $1..." : 0) if /CREATE TABLE "KS"\."(\w+)"/;
    say if $cont == 1;
}
close IN;

Presumably it's the repeated reads you don't like.
So - read the CREATE TABLE file once, checking for:
CREATE TABLE "KS"."(\w+)"
Then you can build up the table definition until the next CREATE TABLE at which point you put the table-definition into a hash keyed by the table-name.
Then, read your prompts and grab the definitions one by one from the hash printing them out.
Alternatively, you could just read the CREATE TABLE file into a single string and search+replace the table-name part since that's all you seem to be changing at the moment. The first approach is more flexible though.
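A minimal sketch of the first approach, a two-pass version (it assumes, as in your samples, that each definition starts at its CREATE TABLE line and runs until the next one):
#!/usr/bin/perl
use strict;
use warnings;

my ($prompt_file, $dump_file) = @ARGV;

# Pass 1: read the dump once, collecting each table's definition
# into a hash keyed by lower-cased table name.
my (%def, $name);
open my $dump, '<', $dump_file or die "cannot open $dump_file: $!";
while (my $line = <$dump>) {
    $name = lc $1 if $line =~ /CREATE TABLE "KS"\."(\w+)"/;
    $def{$name} .= $line if defined $name;
}
close $dump;

# Pass 2: read the prompts and emit the matching definitions in that order.
open my $idx, '<', $prompt_file or die "cannot open $prompt_file: $!";
while (my $line = <$idx>) {
    if ($line =~ /^prompt (\w+)\.\.\./ && exists $def{lc $1}) {
        print "prompt $1...\n", $def{lc $1};
    }
}
close $idx;
Note that this emits the definitions in prompt-file order rather than dump order, which is what makes the hash approach more flexible.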
Edit:
You could make the defined bit a little clearer perhaps with:
while (my $line = <IN>) {
    chomp($line);
    if ($line =~ /CREATE TABLE "KS"\."(\w+)"/ && $hash{lc $1}) {
        $line = ...
    }
    say $line;
}
I like to use an explicit variable in my while-loops once I get beyond a couple of lines too.

It seems that file A is relatively small. You can read it and build a set (or the like) containing all the table names. Then read the SQL dump file and, for each table-creation statement, check whether its table name is in your set.
I don't quite know Perl, but this Python code seems to do what you want:
import sys

# Build the set of wanted table names from file A.
tableNames = {l[7:-4] for l in open(sys.argv[1]) if l.startswith('prompt ')}

printing = False
for l in open(sys.argv[2]):
    if l.startswith('CREATE TABLE "KS"."'):
        name = l.split('"')[3].lower()
        printing = name in tableNames
        if printing:
            print("prompt {0}...".format(name))
    if printing:
        print(l, end='')
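For comparison, the same single-pass idea in Perl can read the dump statement-by-statement by changing the input record separator (a sketch, assuming every statement ends with ";" at the end of a line):
#!/usr/bin/perl
use strict;
use warnings;

# Build the set of wanted table names from file A.
open my $idx, '<', shift or die "cannot open index file: $!";
my %want;
while (<$idx>) {
    $want{lc $1} = 1 if /^prompt (\w+)\.\.\./;
}
close $idx;

# Read file B one whole statement at a time by treating ";\n"
# as the record separator, then print matching statements.
local $/ = ";\n";
open my $dump, '<', shift or die "cannot open dump file: $!";
while (my $stmt = <$dump>) {
    if ($stmt =~ /CREATE TABLE "KS"\."(\w+)"/ && $want{lc $1}) {
        print "prompt \L$1\E...\n";
        print $stmt;
    }
}
close $dump;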

Related

Snowflake null values quoted in CSV break PostgreSQL load

I am trying to move data from Snowflake to PostgreSQL, and to do so I first unload it to S3 in CSV format. Since commas can appear in the text columns, I use Snowflake's FIELD_OPTIONALLY_ENCLOSED_BY unloading option to quote the contents of the problematic cells. However, when that is combined with NULL values, I can't manage to produce a CSV that is valid for PostgreSQL.
I created a simple table so you can see the issue. Here it is:
CREATE OR REPLACE TABLE PUBLIC.TEST(
TEXT_FIELD VARCHAR,
NUMERIC_FIELD INT
);
INSERT INTO PUBLIC.TEST VALUES
('A', 1),
(NULL, 2),
('B', NULL),
(NULL, NULL),
('Hello, world', NULL)
;
COPY INTO @STAGE/test
FROM PUBLIC.TEST
FILE_FORMAT = (
COMPRESSION = NONE,
TYPE = CSV,
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ''
)
OVERWRITE = TRUE;
From that, Snowflake will create the following CSV:
"A",1
"",2
"B",""
"",""
"Hello, world",""
But after that, it is impossible for me to copy this CSV into a PostgreSQL table as it is.
This is despite the PostgreSQL documentation saying, under the NULL option:
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.
Not setting the NULL option in PostgreSQL's COPY results in a failed load. It won't work anyway, as we also have to specify the quote character using QUOTE; here it will be QUOTE '"'.
Therefore, during the PostgreSQL load, using:
FORMAT csv, HEADER false, QUOTE '"' will give:
DataError: invalid input syntax for integer: "" CONTEXT: COPY test, line 3, column numeric_field: ""
FORMAT csv, HEADER false, NULL '""', QUOTE '"' will give:
NotSupportedError: CSV quote character must not appear in the NULL specification
FYI, to test the load from S3 I use this command in PostgreSQL:
CREATE TABLE IF NOT EXISTS PUBLIC.TEST(
TEXT_FIELD VARCHAR,
NUMERIC_FIELD INT
);
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
SELECT aws_s3.table_import_from_s3(
'PUBLIC.TEST',
'',
'(FORMAT csv, HEADER false, NULL ''""'', QUOTE ''"'')',
'bucket',
'test_0_0_0.csv',
'aws_region'
)
Thanks a lot for any ideas on what I could do to make this work. I would love to find a solution that doesn't require modifying the CSV between Snowflake and PostgreSQL. I think it is an issue more on the Snowflake side, as it doesn't really make sense to quote null values, but PostgreSQL is not helping either.
When you set the NULL_IF value to '', you are actually telling Snowflake to convert NULLs to a blank string, which then gets quoted. When you are copying out of Snowflake, the copy options are "backwards" in a sense, and NULL_IF acts more like an IFNULL.
This is the code that I'd use on the Snowflake side, which will result in an unquoted empty string in your CSV file:
FILE_FORMAT = (
COMPRESSION = NONE,
TYPE = CSV,
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ()
)
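With NULL_IF = (), no NULL-to-string conversion happens on unload, so NULLs should come out as unquoted empty fields while actual strings stay quoted, e.g.:
"A",1
,2
"B",
,
"Hello, world",
PostgreSQL's CSV reader treats an unquoted empty string as NULL by default, so the load should then work with just (FORMAT csv, HEADER false, QUOTE '"'), and the problematic NULL '""' option can be dropped from the aws_s3.table_import_from_s3 call.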

How can I combine these two statements?

I'm currently trying to insert data into a database from text boxes, with $enter / $enter2 holding the entered text.
The table consists of three columns: ID, name, and nametwo.
ID is auto-incrementing and works fine.
Both statements work fine on their own, but because they are being issued separately the first leaves nametwo blank and the second leaves name blank.
I've tried combining both but haven't had much luck, hope someone can help.
$dbh->do("INSERT INTO $table(name) VALUES ('".$enter."')");
$dbh->do("INSERT INTO $table(nametwo) VALUES ('".$enter2."')");
To paraphrase what others have said:
my $sth = $dbh->prepare("INSERT INTO $table(name,nametwo) values (?,?)");
$sth->execute($enter, $enter2);
So you don't have to worry about quoting.
You should read the database manual.
The query should be:
$dbh->do("INSERT INTO $table(name,nametwo) VALUES ('".$enter."', '".$enter2."')");
The SQL syntax is
INSERT INTO MyTable (
    name_one,
    name_two
) VALUES (
    'value_one',
    'value_two'
)
Your way of generating SQL statements is very fragile. For example, it will fail if the table name is Values or the value is Jester's.
Solution 1:
$dbh->do("
INSERT INTO ".$dbh->quote_identifier($table_name)."
name_one,
name_two
) VALUES (
".$dbh->quote($value_one).",
".$dbh->quote($value_two)."
)
");
Solution 2: Placeholders
$dbh->do(
    "INSERT INTO ".$dbh->quote_identifier($table_name)." (
        name_one,
        name_two
    ) VALUES (
        ?, ?
    )",
    undef,
    $value_one,
    $value_two,
);
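Putting those pieces together, a minimal sketch (assuming $dbh is a connected DBI handle and $table, $enter, and $enter2 are the variables from the question):
# Placeholders for the values, quote_identifier for the table name.
$dbh->do(
    "INSERT INTO ".$dbh->quote_identifier($table)." (name, nametwo) VALUES (?, ?)",
    undef,
    $enter, $enter2,
);
With RaiseError => 1 set when connecting, a failed INSERT throws an error instead of failing silently.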

How to import data into Teradata tables from a delimited file using BTEQ import?

I am trying to execute the following BTEQ script in a Linux environment, but the data doesn't load properly into the Teradata DB server. Can someone please advise how to resolve the issue I am facing while loading?
BTEQ script used:
.SET width 64000;
.SET session transaction btet;
.logmech ldap
.logon XXXXXXX/XXXXXXXX,********;
DATABASE corecm;
.PACK 1000
.IMPORT VARTEXT '~' FILE=/v/global/user/application_event_bus_evt
.REPEAT *
USING(APPLICATION_EVENT_ID CHAR(24),BUS_EVT_ID CHAR(24),BUS_EVT_VID BIGINT,BUS_EVT_RESTATE_IN SMALLINT)
insert into corecm.application_event_bus_evt (APPLICATION_EVENT_ID
, BUS_EVT_ID
, BUS_EVT_VID
, BUS_EVT_RESTATE_IN
)
values
( COALESCE(:APPLICATION_EVENT_ID,1)
, COALESCE(:BUS_EVT_ID,1)
, COALESCE(:BUS_EVT_VID,1)
, COALESCE(:BUS_EVT_RESTATE_IN,1)
) ;
.LOGOFF;
.EXIT;
SAMPLE INPUT FILE, DELIMITER "~" [ /v/global/user/application_event_bus_evt ]:
Ckn3gMxLEeOgIQBQVgErYA==~g+GDDtlaY3n7BdUrYshDFA==~1~1
CL1kEcxLEeOgIQBQVgErYA==~qoKoiuGDbClpcGt/z6RKGw==~1~1
oYIVcMxKEeOgIQBQVgErYA==~mfmQiwl7yAteevzJfilMvA==~1~1
5N7ME5bM4xGhM7exj3ykUw==~yFM2FZbM4xGhM7exj3ykUw==~1~0
JLBH4JfM4xGDH9s5+Ds/8w==~doZ/7pfM4xGDH9s5+Ds/8w==~1~0
fGvpoMxKEeOgIQBQVgErYA==~mQUQIK2mY6WIPcszfp5BTQ==~1~1
Table Definition :
CREATE MULTISET TABLE CORECM.APPLICATION_EVENT_BUS_EVT ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
APPLICATION_EVENT_ID CHAR(26) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
BUS_EVT_ID CHAR(26) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
BUS_EVT_VID BIGINT NOT NULL,
BUS_EVT_RESTATE_IN SMALLINT)
UNIQUE PRIMARY INDEX ( APPLICATION_EVENT_ID ,BUS_EVT_ID ,BUS_EVT_VID )
INDEX APPLICATION_EVENT_BUS_EVT_IDX1 ( APPLICATION_EVENT_ID )
INDEX APPLICATION_EVENT_BUS_EVT_IDX2 ( BUS_EVT_ID ,BUS_EVT_VID );
The result set in the DB server looks like this:
APPLICATION_EVENT_ID BUS_EVT_ID BUS_EVT_VID BUS_EVT_RESTATE_IN
1 Ckn3gMxLEeOgIQBQVgErYA == g+GDDtlaY3n7BdUrYshD 85,849,873,219,141,958 12,544
2 CL1kEcxLEeOgIQBQVgErYA == qoKoiuGDbClpcGt/z6RK 85,849,873,219,155,783 12,544
3 oYIVcMxKEeOgIQBQVgErYA == mfmQiwl7yAteevzJfilM 85,849,873,219,142,006 12,544
4 5N7ME5bM4xGhM7exj3ykUw == JAf0GpbM4xGhM7exj3yk 85,849,873,219,155,797 12,288
5 JLBH4JfM4xGDH9s5+Ds/8w == Du6T7pfM4xGDH9s5+Ds/ 85,849,873,219,155,768 12,288
6 fGvpoMxKEeOgIQBQVgErYA == mQUQIK2mY6WIPcszfp5B 85,849,873,219,146,068 12,544
Looking at the data, we can see two issues:
The first two columns' data is 24 characters long (as per the input file), but it has been shifted so that two characters spill into the next column.
Columns BUS_EVT_VID and BUS_EVT_RESTATE_IN hold wrong data, 85,849,873,219,141,958 and 12,544 instead of 1 and 1 respectively (probably because the first two columns' data got shifted).
I tried the following options to resolve the issue, but none of them worked:
Modified the table definition, i.e. changed the datatypes to CHAR(28), CHAR(24), CHAR(26)
Modified the table definition column datatypes to VARCHAR(24), VARCHAR(26)
Modified the BTEQ script, i.e. altered the datatypes in the line below:
USING(APPLICATION_EVENT_ID CHAR(24),BUS_EVT_ID CHAR(24),BUS_EVT_VID BIGINT,BUS_EVT_RESTATE_IN SMALLINT)
Thanks in advance.
When you use .IMPORT VARTEXT, all input columns in the USING clause must be defined as VARCHAR, but you used CHAR, BIGINT, and SMALLINT.
This should work, with the VARCHAR lengths based on the definition of your target table (the INSERT itself can stay as it is; Teradata implicitly casts the VARCHAR values to the numeric target columns):
USING(
APPLICATION_EVENT_ID VARCHAR(26),
BUS_EVT_ID VARCHAR(26),
BUS_EVT_VID VARCHAR(19),
BUS_EVT_RESTATE_IN VARCHAR(6)
)

String getting converted to number when inserting it into a database through Perl's DBI $sth->execute() function

I'm using Perl's DBI and SQLite database (I have DBD::SQLite installed). I have the following code:
my $dbh = DBI->connect("dbi:SQLite:dbname=$db", "", "", { RaiseError => 1, AutoCommit => 1 });
...
my $q = "INSERT OR IGNORE INTO books (identica, book_title) VALUES (?, ?)";
my $sth = $dbh->prepare($q);
$sth->execute($book_info->{identica}, $book_info->{book_title});
The problem I have is that when $book_info->{identica} begins with zeros, they get dropped and a number gets inserted into the database.
For example, an identica of 00123 will get converted to 123.
I know SQLite is loosely typed, so how do I make DBI insert the identica as a string rather than a number?
I tried quoting it as "$book_info->{identica}" when passing to $sth->execute but that didn't help.
EDIT
Even if I insert the value directly in the query, it doesn't work:
my $i = $book_info->{identica};
my $q = "INSERT OR IGNORE INTO books (identica, book_title) VALUES ('$i', ?)";
my $sth = $dbh->prepare($q);
$sth->execute($book_info->{book_title});
This still converts 00123 to 123, and 0000000009 to 9...
EDIT
Holy sh*t, I did this on the command line, and I got this:
sqlite> INSERT INTO books (identica, book_title) VALUES ('0439023521', 'a');
sqlite> select * from books where id=28;
28|439023521|a|
It was dropped by SQLite!
Here is how the schema looks:
CREATE TABLE books (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    identica STRING NOT NULL,
    book_title TEXT
);
CREATE UNIQUE INDEX IDX_identica on books(identica);
CREATE INDEX IDX_book_title on books(book_title);
Any ideas what is going on?
SOLUTION
It's an SQLite problem; see the answer by Jim in the comments. STRING has to be TEXT in SQLite, otherwise it treats the values as numbers!
Changing schema to the following solved it:
CREATE TABLE books (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    identica TEXT NOT NULL,
    book_title TEXT
);
Use bind params with an explicit type:
use DBI qw(:sql_types);

my $sth = $dbh->prepare($q);
$sth->bind_param(1, $book_info->{identica}, { TYPE => SQL_VARCHAR });
$sth->bind_param(2, $book_info->{book_title});
$sth->execute();
UPDATE:
Read about type affinity in SQLite. Because your declared column type is STRING (not one of the recognized type names), it falls back to NUMERIC affinity, so anything that looks like a number is stored as a number. You need to create your column as TEXT instead.
According to the docs, if the column has TEXT affinity the value is stored as a string; otherwise it will be a number.
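To see the affinity difference end to end, here is a small self-contained check (a sketch; the test.db file name is just for illustration):
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=test.db", "", "",
                       { RaiseError => 1, AutoCommit => 1 });

# With TEXT affinity the value is stored exactly as inserted.
$dbh->do("CREATE TABLE IF NOT EXISTS books (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    identica TEXT NOT NULL,
    book_title TEXT
)");

my $sth = $dbh->prepare(
    "INSERT INTO books (identica, book_title) VALUES (?, ?)");
$sth->execute('0439023521', 'a');

my ($identica) = $dbh->selectrow_array(
    "SELECT identica FROM books WHERE book_title = 'a'");
print "$identica\n";    # prints 0439023521, leading zero kept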

DB2 Case Sensitivity

I'm having great difficulty making my DB2 (AS/400) queries case-insensitive.
For example:
SELECT *
FROM NameTable
WHERE LastName = 'smith'
Will return no results, but the following returns thousands of results:
SELECT *
FROM NameTable
WHERE LastName = 'Smith'
I've read about putting SortSequence/SortType into the connection string but have had no luck... does anyone have experience with this?
Edit:
Here's the stored procedure:
BEGIN
DECLARE CR CURSOR FOR
SELECT T.ID,
       T.LASTNAME,
       T.FIRSTNAME,
       T.MIDDLENAME,
       T.STREETNAME || ' ' || T.ADDRESS2 || ' ' || T.CITY || ' ' || T.STATE || ' ' || T.ZIPCODE AS ADDRESS,
       T.GENDER,
       T.DOB,
       T.SSN,
       T.OTHERINFO,
       T.APPLICATION
FROM
    (SELECT R.*, ROW_NUMBER() OVER () AS ROW_NUM
     FROM CPSAB32.VW_MYVIEW R
     WHERE R.LASTNAME = IFNULL(#LASTNAME, LASTNAME)
       AND R.FIRSTNAME = IFNULL(#FIRSTNAME, FIRSTNAME)
       AND R.MIDDLENAME = IFNULL(#MIDDLENAME, MIDDLENAME)
       AND R.DOB = IFNULL(#DOB, DOB)
       AND R.STREETNAME = IFNULL(#STREETNAME, STREETNAME)
       AND R.CITY = IFNULL(#CITY, CITY)
       AND R.STATE = IFNULL(#STATE, STATE)
       AND R.ZIPCODE = IFNULL(#ZIPCODE, ZIPCODE)
       AND R.SSN = IFNULL(#SSN, SSN)
     FETCH FIRST 500 ROWS ONLY
    ) AS T
WHERE ROW_NUM <= #MAXRECORDS
OPTIMIZE FOR 500 ROWS;
OPEN CR;
RETURN;
END
Why not do this:
WHERE lower(LastName) = 'smith'
If you're worried about performance (i.e. the query not using an index), keep in mind that DB2 has function indexes, which you can read about here. So essentially, you can create an index on UPPER(LastName), for example: CREATE INDEX ix_upper_lastname ON NameTable (UPPER(LastName)).
EDIT
To do the debugging technique I discussed in the comments, you could do something like this:
create table log (msg varchar(100), dt date);
Then in your SP, you can insert messages to this table for debugging purposes:
insert into log (msg, dt) select 'inside the SP', current_date from sysibm.sysdummy1;
Then after the SP runs, you can select from this log table to see what happened.
If you want case-insensitive in your procedure, try using this option in it:
SET OPTION SRTSEQ = *LANGIDSHR ;
You should also create an index to support it for performance. Create the index when you have *LANGIDSHR as a connection attribute, and the shared-weight index should then be available to later jobs. (There are various ways to get the appropriate setting into effect.)
*LANGIDSHR relates to the language-ID for your jobs. Characters in the character set that might be considered as "equals", such as 'A' and 'a' or 'ü' and 'u', should be given equal weights (shared) and so select together.
I did something similar when I wanted a case insensitive search. I used UPPER(mtfield) = 'SEARCHSTRING'. I know this works.
See: https://stackoverflow.com/a/47181640/5507619
Database setting
There is a database configuration setting you can choose at database creation. It is based on Unicode, though.
CREATE DATABASE yourDB COLLATE USING UCA500R1_S1
The default Unicode Collation Algorithm is implemented by the UCA500R1 keyword without any attributes. Since the default UCA cannot simultaneously encompass the collating sequence of every language supported by Unicode, optional attributes can be specified to customize the UCA ordering. The attributes are separated by the underscore (_) character. The UCA500R1 keyword and any attributes form a UCA collation name.
The Strength attribute determines whether accent or case is taken into account when collating or comparing text strings. In writing systems without case or accent, the Strength attribute controls similarly important features.
The possible values are: primary (1), secondary (2), tertiary (3), quaternary (4), and identity (I). To ignore:
accent and case, use the primary strength level
case only, use the secondary strength level
neither accent nor case, use the tertiary strength level
Almost all characters can be distinguished by the first three strength levels, therefore in most locales the default Strength attribute is set at the tertiary level. However if the Alternate attribute (described below) is set to shifted, then the quaternary strength level can be used to break ties among white space characters, punctuation marks, and symbols that would otherwise be ignored. The identity strength level is used to distinguish among similar characters, such as the MATHEMATICAL BOLD SMALL A character (U+1D41A) and the MATHEMATICAL ITALIC SMALL A character (U+1D44E).
Setting the Strength attribute to higher level will slow down text string comparisons and increase the length of the sort keys.
Examples:
UCA500R1_S1 will collate "role" = "Role" = "rôle"
UCA500R1_S2 will collate "role" = "Role" < "rôle"
UCA500R1_S3 will collate "role" < "Role" < "rôle"
This worked for me. As you can see, ..._S2 ignores case, too.
Using a newer standard version, it should look like this:
CREATE DATABASE yourDB COLLATE USING CLDR181_S1
Collation keywords:
UCA400R1 = Unicode Standard 4.0 = CLDR version 1.2
UCA500R1 = Unicode Standard 5.0 = CLDR version 1.5.1
CLDR181 = Unicode Standard 5.2 = CLDR version 1.8.1
If your database is already created, there is supposed to be a way to change the setting.
CALL SYSPROC.ADMIN_CMD( 'UPDATE DB CFG USING DB_COLLNAME UCA500R1_S1 ' );
I do have problems executing this, but for all I know it is supposed to work.
Generated table column
Another option is, for example, generating an uppercase column:
CREATE TABLE t (
id INTEGER NOT NULL PRIMARY KEY,
str VARCHAR(500),
ucase_str VARCHAR(500) GENERATED ALWAYS AS ( UPPER(str) )
)#
INSERT INTO t(id, str)
VALUES ( 1, 'Some String' )#
SELECT * FROM t#
ID STR UCASE_STR
----------- ------------------------------------ ------------------------------------
1 Some String SOME STRING
1 record(s) selected.
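A case-insensitive lookup can then compare against the generated column, for example:
SELECT * FROM t WHERE ucase_str = UPPER('sOmE sTrInG')#
Since ucase_str always holds UPPER(str), both sides of the comparison are uppercase, and an ordinary index on ucase_str can support the lookup.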