Copy into snowflake table from raw data file using Perl DBI - perl

There's not much information out there on Perl DBI with Snowflake, so I'll give this a shot. I have a raw file whose headers are in line 1. This exact COPY INTO command works from the Snowflake GUI, but I'm not sure whether I can take it as-is and run it through a Perl prepare and execute.
COPY INTO DBTABLE.LND_LND_STANDARD_DATA FROM (
SELECT SPLIT_PART(METADATA$FILENAME,'/',4) as SEAT_ID,
$1:auction_id_64 as AUCTION_ID_64,
DATEADD(S,$1:date_time,'1970-01-01') as DATE_TIME,
$1:user_tz_offset as USER_TZ_OFFSET,
$1:creative_width as CREATIVE_WIDTH,
$1:creative_height as CREATIVE_HEIGHT,
$1:media_type as MEDIA_TYPE,
$1:fold_position as FOLD_POSITION,
$1:event_type as EVENT_TYPE
FROM @DBTABLE.lnd.S3_STAGE_READY/pr/data/standard/data_dt=20200825/00/STANDARD_FILE.gz.parquet)
pattern = '.*.parquet' file_format = (TYPE = 'PARQUET' SNAPPY_COMPRESSION = TRUE)
ON_ERROR = 'SKIP_FILE_10%'
my $sql = "COPY INTO DBTABLE.LND_LND_STANDARD_DATA FROM (
SELECT SPLIT_PART(METADATA\$FILENAME,'/',4) as SEAT_ID,
\$1:auction_id_64 as AUCTION_ID_64,
DATEADD(S,\$1:date_time,'1970-01-01') as DATE_TIME,
\$1:user_tz_offset as USER_TZ_OFFSET,
\$1:creative_width as CREATIVE_WIDTH,
\$1:creative_height as CREATIVE_HEIGHT,
\$1:media_type as MEDIA_TYPE,
\$1:fold_position as FOLD_POSITION,
\$1:event_type as EVENT_TYPE
FROM \@DBTABLE.lnd.S3_STAGE_READY/pr/data/standard/data_dt=20200825/00/STANDARD_FILE.gz.parquet)
pattern = '.*.parquet' file_format = (TYPE = 'PARQUET' SNAPPY_COMPRESSION = TRUE)
ON_ERROR = 'SKIP_FILE_10%'";
my $sth = $dbh->prepare($sql);
$sth->execute;
Looking at the output from Snowflake, I see this error:
syntax error line 3 at position 4 unexpected '?'.
syntax error line 4 at position 13 unexpected '?'.
COPY INTO DBTABLE.LND_LND_STANDARD_DATA FROM (
SELECT SPLIT_PART(METADATA$FILENAME,'/',4) as SEAT_ID,
$1? as AUCTION_ID_64,
DATEADD(S,$1?,'1970-01-01') as DATE_TIME,
$1? as USER_TZ_OFFSET,
$1? as CREATIVE_WIDTH,
$1? as CREATIVE_HEIGHT,
$1? as MEDIA_TYPE
Do I need to create bind variables for each of the columns? I usually read the data from the file and put it into variables, but this is different: I can't read the raw file first, because the data has to come directly from the COPY INTO command.
Any help would be appreciated.

It was interpreting the : as the start of a bind variable rather than as a path into a variant value. I used bracket notation instead, like the following:
my $SQL = "COPY INTO DBTABLE.LND_LND_STANDARD_DATA FROM (
SELECT SPLIT_PART(METADATA\$FILENAME,'/',4) as SEAT_ID,
\$1['auction_id_64'] as AUCTION_ID_64,
DATEADD(S,\$1['date_time'],'1970-01-01') as DATE_TIME,
\$1['user_tz_offset'] as USER_TZ_OFFSET,
\$1['creative_width'] as CREATIVE_WIDTH,
etc...
That worked
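
As a minimal sketch of the same fix, the statement can also be written in a non-interpolating heredoc, so that the $ and @ characters need no backslashes. The stage path and the shortened column list here are hypothetical stand-ins, and no database connection is made; the snippet only builds and checks the SQL text.

```perl
use strict;
use warnings;

# <<'SQL' is a non-interpolating heredoc: $1, METADATA$FILENAME and the
# @stage reference stay literal, so none of them needs a backslash.
# The stage path below is a shortened, hypothetical stand-in.
my $sql = <<'SQL';
COPY INTO DBTABLE.LND_LND_STANDARD_DATA FROM (
  SELECT SPLIT_PART(METADATA$FILENAME, '/', 4) AS SEAT_ID,
         $1['auction_id_64'] AS AUCTION_ID_64,
         DATEADD(S, $1['date_time'], '1970-01-01') AS DATE_TIME,
         $1['user_tz_offset'] AS USER_TZ_OFFSET
  FROM @my_stage/some/path/
)
FILE_FORMAT = (TYPE = 'PARQUET')
SQL

# Guard against reintroducing the $1:field form, which the driver in the
# question rewrote into '?' placeholders.
die "colon path notation found\n" if $sql =~ /\$\d+:\w/;

print $sql;
# With a connected handle (not shown here): $dbh->do($sql);
```

The guard is only a convenience for catching the colon form early; the actual fix is the bracket notation itself.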

Related

Snowflake null values quoted in CSV breaks PostgreSQL unload

I am trying to move data from Snowflake to PostgreSQL, and to do so I first unload it to S3 in CSV format. Commas can appear in the text columns, so I use Snowflake's FIELD_OPTIONALLY_ENCLOSED_BY unload option to quote the affected cells. However, when that is combined with NULL values, I can't manage to produce a CSV that is valid for PostgreSQL.
I created a simple table to illustrate the issue:
CREATE OR REPLACE TABLE PUBLIC.TEST(
TEXT_FIELD VARCHAR(),
NUMERIC_FIELD INT
);
INSERT INTO PUBLIC.TEST VALUES
('A', 1),
(NULL, 2),
('B', NULL),
(NULL, NULL),
('Hello, world', NULL)
;
COPY INTO @STAGE/test
FROM PUBLIC.TEST
FILE_FORMAT = (
COMPRESSION = NONE,
TYPE = CSV,
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ''
)
OVERWRITE = TRUE;
From that, Snowflake creates the following CSV:
"A",1
"",2
"B",""
"",""
"Hello, world",""
But after that, I can't copy this CSV into a PostgreSQL table as it is.
This is even though the PostgreSQL documentation says, for the NULL option:
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.
Leaving the COPY options unset on the PostgreSQL side makes the load fail, because the quote character also has to be specified with QUOTE; here that is QUOTE '"'.
So when loading into PostgreSQL, using
FORMAT csv, HEADER false, QUOTE '"' gives:
DataError: invalid input syntax for integer: "" CONTEXT: COPY test, line 3, column numeric_field: ""
FORMAT csv, HEADER false, NULL '""', QUOTE '"' gives:
NotSupportedError: CSV quote character must not appear in the NULL specification
FYI, to test the load from S3 I use this command in PostgreSQL:
CREATE TABLE IF NOT EXISTS PUBLIC.TEST(
TEXT_FIELD VARCHAR(),
NUMERIC_FIELD INT
);
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
SELECT aws_s3.table_import_from_s3(
'PUBLIC.TEST',
'',
'(FORMAT csv, HEADER false, NULL ''""'', QUOTE ''"'')',
'bucket',
'test_0_0_0.csv',
'aws_region'
)
Thanks a lot for any ideas on how to make this work. I would love a solution that doesn't require modifying the CSV between Snowflake and PostgreSQL. I think the issue is more on the Snowflake side, as it doesn't really make sense to quote null values, but PostgreSQL is not helping either.
When you set NULL_IF to '', you are actually telling Snowflake to convert NULLs to an empty string, which then gets quoted. When you are copying out of Snowflake, the copy options work "backwards" in a sense, and NULL_IF acts more like an IFNULL.
This is the file format I'd use on the Snowflake side, which will result in an unquoted empty string in your CSV file:
FILE_FORMAT = (
COMPRESSION = NONE,
TYPE = CSV,
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ()
)

How to Perl Placeholder Variable for a DBD Mysql

I recently had some help with DBI and using placeholders in Perl for MySQL queries. However, I am having an issue when a later statement uses the variable set by a previously prepared statement in the DBI script.
Code:
use strict;
use warnings;
use DBI qw(:sql_types);
my $dbh = DBI->connect("DBI:mysql:...", ...);
## TABLE CREATION
$dbh->do("USE test;");
$dbh->do("CREATE TEMPORARY TABLE day5 (id INT, temp VARCHAR(4), time TIME, sumadd INT(11))");
$dbh->do("CREATE TEMPORARY TABLE humid (temp VARCHAR(4), i24 INT (10))");
$dbh->do("INSERT INTO day5 (id,temp,time) VALUES(1,'30','03:00:00') ");
$dbh->do("INSERT INTO humid (temp,i24) VALUES('30',8321) ");
## FAILING CODE
my $inter1 = 'i24'; # Generated value
my $sth = $dbh->prepare("SET \@sumadd5 = (SELECT ? FROM humid WHERE temp ='30') ");
$sth->bind_param( 1, $inter1 );
$sth->finish();
$dbh->do("UPDATE day5 SET sumadd= (SELECT \@sumadd5) WHERE time= '03:00:00' ");
my $sumadd = $dbh->selectrow_array("SELECT sumadd FROM day5");
print "$sumadd\n";
$dbh->disconnect();
$sumadd is undefined, but I expect 8321.
My question is: how can I get the @sumadd5 variable into the query above (UPDATE day5 SET sumadd=)? I have added some remarks in the code for clarity.
If I run the statements manually in MySQL, with the Perl specifics removed, my table updates. If I run the script, nothing happens to the table and no errors are displayed.
I suspect the break is in the UPDATE, but I can't confirm that $inter1 is being passed to the placeholder.
Problem #1
You never execute the statement you prepare. Use
my $sth = $dbh->prepare(q{...});
$sth->bind_param(1, $inter1);
$sth->execute();   # <--- this call was missing
$sth->finish();
Shortcut #1:
my $sth = $dbh->prepare(q{...});
$sth->execute($inter1);
$sth->finish();
Shortcut #2:
$dbh->do(
q{...},
undef,
$inter1,
);
Problem #2
SELECT ? ...
behaves as if you had done
SELECT "i24" ...   -- As opposed to: SELECT i24 ...
just like
SELECT time ...
behaves as if you had done
SELECT '03:00:00' ...
A placeholder can't change the SQL statement you are executing; it only supplies a value. You need to build the statement, column name included, before you execute it. Use
$dbh->do(q{
SET @sumadd5 = (
SELECT }.( $dbh->quote_identifier($inter1) ).q{
FROM humid
WHERE temp = '30'
)
});
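
Because the column name ends up in the statement text, it is worth checking it against a list of known columns before quoting it. The following is a minimal sketch under stated assumptions: the whitelist contents and the build_sumadd_sql name are hypothetical, and the backtick quoting is done by hand so the snippet runs without a database connection. With a live handle, $dbh->quote_identifier does the quoting instead.

```perl
use strict;
use warnings;

# Hypothetical whitelist of columns that may be selected from humid.
my %allowed_column = map { $_ => 1 } qw(temp i24);

# Build the statement text before it is prepared. The column name is
# checked against the whitelist, then quoted MySQL-style with backticks.
sub build_sumadd_sql {
    my ($column) = @_;
    die "Unknown column: $column\n" unless $allowed_column{$column};
    (my $quoted = $column) =~ s/`/``/g;   # double any embedded backtick
    return "SET \@sumadd5 = (SELECT `$quoted` FROM humid WHERE temp = '30')";
}

my $sql = build_sumadd_sql('i24');
print "$sql\n";
```

The whitelist is the design choice that matters here: quoting makes the identifier syntactically safe, while the lookup makes sure it is a column you intended to expose.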

How do I use Placeholders in Perl to Inject a Variable into Mysql

I keep getting errors when attempting to use placeholders in my perl script for a Mysql routine.
Code :
use DBI;
my $driver = "mysql";
my $database = "database";
my $user = "exxxxxx";
my $password = "xxxxx";
my $dsn = "DBI:mysql:$database;mysql_local_infile=ON";
my $dbh = DBI->connect($dsn,$user,$password);
$dbh->do("SET \@tempc5 = (SELECT temp FROM day5 WHERE time = '00:00') ");
my $inter1 = i24;
$sth = $dbh->prepare( "SET \@sumadd5 = (SELECT ? FROM humid WHERE temp=\@tempc5) " );
$sth->bind_param( 1, $inter1 );
$sth->finish();
$dbh->disconnect();
This produces the following error:
Global symbol "$sth" requires explicit...
If I add a my $sth I get the following error:
Scalar found where operator expected...
Note that I have no objection to trying this with $dbh->do("SET" if possible.
Placeholders are not allowed for column names, according to the MySQL manual for mysql_stmt_prepare(), which is the function behind prepare:
The markers are legal only in certain places in SQL statements. For
example, they are permitted in the VALUES() list of an INSERT
statement (to specify column values for a row), or in a comparison
with a column in a WHERE clause to specify a comparison value.
However, they are not permitted for identifiers (such as table or
column names), or to specify both operands of a binary operator such
as the = equal sign. The latter restriction is necessary because it
would be impossible to determine the parameter type. In general,
parameters are legal only in Data Manipulation Language (DML)
statements, and not in Data Definition Language (DDL) statements.
If you think about it, it would not make sense to prepare a statement in which a column can change. Preparing a statement includes building an execution plan, and you can't plan the execution of a statement when you don't know whether a given column has an index on it.
You can't use a placeholder there.
When you call prepare, all structural information about your tables is baked into the query, waiting for you to pass in data values to replace placeholders when you execute the query.
But you're trying to use a placeholder for a column name, which is part of the table's structure.
If you fix the Perl syntax to be:
my $inter1 = 'i24';
my $sth = $dbh->prepare( "SET \@sumadd5 = (SELECT ? FROM humid WHERE temp=\@tempc5) " );
$sth->execute($inter1);
it should run, but the ? will be treated as a data value rather than a column name (structural information). So you'll get the results of the SQL query
SET @sumadd5 = (SELECT 'i24' FROM humid WHERE temp=@tempc5)
instead of
SET @sumadd5 = (SELECT i24 FROM humid WHERE temp=@tempc5)
The subquery will return the literal value "i24" for each matching row rather than the value found in column i24.
You didn't quote the value of $inter1. Change $inter1 = i24; to $inter1 = 'i24';. With that edit applied to your code, it no longer gives a syntax error:
use warnings;
use strict;
use DBI;
my $driver = "mysql";
my $database = "database";
my $user = "exxxxxx";
my $password = "xxxxx";
my $dsn = "DBI:mysql:$database;mysql_local_infile=ON";
my $dbh = DBI->connect($dsn,$user,$password);
$dbh->do("SET \@tempc5 = (SELECT temp FROM day5 WHERE time = '00:00') ");
my $inter1 = 'i24';
my $sth = $dbh->prepare( "SET \@sumadd5 = (SELECT ? FROM humid WHERE temp=\@tempc5) " );
$sth->bind_param( 1, $inter1 );
$sth->finish();
$dbh->disconnect();

Using date in ref cursor with pl/sql

I have a variable acc_date of type DATE.
It takes its value from a cursor, and when I insert it into the logger table as:
insert into logger values(1,acc_date);
the output when I select it from logger is
1 01-JAN-10
but when I use it to compare with another DATE value in another cursor, as in
OPEN c_get_date_id
for 'SELECT Date_D.DATEKEY from Date_D where Date_D.DATEVALUE='||acc_date;
EXIT WHEN c_get_date_id%NOTFOUND;
FETCH c_get_date_id
INTO date_id;
insert into logger values (1,'Now with date_id'||date_id);
CLOSE c_get_date_id;
an error occurs:
Error report:
ORA-00904: "JAN": invalid identifier
ORA-06512: at "HW.FILLFACT", line 82
ORA-06512: at line 1
00904. 00000 - "%s: invalid identifier"
*Cause:
*Action:
You need to at least add some quotes around the date:
....' where Date_D.DATEVALUE='''||acc_date||'''';
Double apostrophes within a string literal collapse to a single apostrophe, so the expression becomes
where Date_D.DATEVALUE='....';
To make this more foolproof, I'd also add an explicit to_date:
.... ' where Date_D.DATEVALUE=to_date(''' || acc_date || ''', ''dd-mon-yy'')';
At the moment your dynamic query is being interpreted as:
SELECT Date_D.DATEKEY from Date_D where Date_D.DATEVALUE=01-JAN-10
The error occurs because the string representation of the date isn't quoted, so JAN is seen as an identifier, and nothing matches that name. You could enclose the date value in quotes:
open c_get_date_id
for 'SELECT Date_D.DATEKEY from Date_D where Date_D.DATEVALUE='''||acc_date||'''';
But you're treating the date as a string, and forcing conversion of all your table values to strings to be compared, using your session's NLS_DATE_FORMAT. It would be better to compare it as a date (although this somewhat assumes all your values have the time portion set to midnight):
open c_get_date_id
for select date_d.datekey from date_d where date_d.datevalue = acc_date;
Your exit is in the wrong place though, and you aren't looping, so maybe you want:
open c_get_date_id
for select date_d.datekey from date_d where date_d.datevalue = acc_date;
loop
fetch c_get_date_id into date_id;
exit when c_get_date_id%notfound;
insert into logger values (1, 'Now with date_id'||date_id);
end loop;
close c_get_date_id;
If you only have one value in the first place though, you probably don't want a loop or cursor at all, and could do a simple select ... into instead:
select date_d.datekey into date_id from date_d
where date_d.datevalue = acc_date;
insert into logger values (1, 'Now with date_id'||date_id);
Though of course that would error if you had no matching date in your table, or more than one, and you'd need to deal with that - but then I guess you'd want to anyway.

DBI::Sybase data-conversion resulted in overflow

I am writing a Perl script that is using the DBI module and is connecting to a Sybase DB. I am calling a stored procedure (one that I don't have access to so I cannot post sample code) and when I get data back I get an error that reads "error_handler: Data-conversion resulted in overflow". I still get data back and after doing some intensive research it seems that some data types in the columns (such as BigInt, nvarchar, etc) are the culprits. Now the question is, how can I fix this? Can this be fixed on the client side or can it only be fixed on the server side?
my $dbh = DBI->connect("DBI:Sybase:server=$server", $username, $password, {PrintError => 0}) or die;
$dbh->do("use $database") or die;
my $sql = &getQuery;
my $sth = $dbh->prepare($sql) or die;
$sth->execute() or die;
while ($rowRef = $sth->fetchrow_arrayref) #Error seems to occur here
{
#Parse through each row
}
Part of the FreeTDS 0.82 log that explains the problem:
_ct_bind_data(): column 7 is type 38 and has length 8
_ct_get_server_type(0)
_ct_get_client_type(type 38, user 0, size 8)
cs_convert(0x18dfed40, 0x7fff73216050, 0x18e44250, 0x7fff73215fa0, 0x18e387c0, 0x18e45a64)
_ct_get_server_type(30)
_ct_get_server_type(0)
converting type 127 (8 bytes) to type = 47 (9 bytes)
cs_convert() calling tds_convert
cs_convert() tds_convert returned 10
cs_prretcode(0)
cs_convert() returning CS_FAIL
cs_convert-result = 1
The problem is on the FreeTDS side. I've had the same problem before and successfully fixed it by converting the returned fields to varchar in the select statement.
Given you don't have access to modify the original query, you can do some regex search and replace on the returned $sql variable in your code. In particular, if the original query has a part that looks like
SELECT field1, field2, field3 FROM ...
After you retrieve the query statement, you may run
my $new_sql;
if ($sql =~ /SELECT\s+(.*)\s+FROM/i) { # match selected field string
my $field_str = $1;
my @fields = split ",", $field_str; # parse individual fields
s/\s//g for @fields; # get rid of spaces
my $new_str = join ", ", (map { "convert(varchar, $_)" } @fields); # construct new query string
my $quoted_field_str = quotemeta($field_str); # prepare regex replacement string
$new_sql = $sql;
$new_sql =~ s/$quoted_field_str/$new_str/i; # actual replacement
}
print $new_sql;
Of course, if your original statement is more complex, you should print it out and check how to modify it with a generic replacement bearing the same spirit. Alternatively, you can ask your DBA (or whoever has access to the stored procedure) to modify the actual query directly.
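
As a rough sketch of the same idea packaged for reuse, the rewrite can be put in a subroutine and tried on a sample query. The name wrap_fields_in_convert and the table name some_table are hypothetical, and the approach still assumes a flat, comma-separated select list with no nested functions or subqueries. Unlike the snippet above, it only trims whitespace around the commas, so anything inside a field is left alone.

```perl
use strict;
use warnings;

# Wrap every field of a simple "SELECT a, b, c FROM ..." list in
# convert(varchar, ...). Queries that don't match are returned unchanged.
sub wrap_fields_in_convert {
    my ($sql) = @_;
    return $sql unless $sql =~ /SELECT\s+(.*)\s+FROM/is;
    my $field_str = $1;

    # Split on commas, dropping the whitespace around each comma.
    my @fields  = split /\s*,\s*/, $field_str;
    my $new_str = join ', ', map { "convert(varchar, $_)" } @fields;

    # \Q...\E quotes regex metacharacters in the original field list.
    (my $new_sql = $sql) =~ s/\Q$field_str\E/$new_str/;
    return $new_sql;
}

my $rewritten = wrap_fields_in_convert('SELECT field1, field2, field3 FROM some_table');
print "$rewritten\n";
# prints: SELECT convert(varchar, field1), convert(varchar, field2), convert(varchar, field3) FROM some_table
```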
Hope this helps.