Snowflake - Fail COPY INTO (Can't parse '0' as date with format 'YYYYMMDD') - copy

My pipe runs a COPY INTO command every time a Parquet file lands in a staged location in AWS S3. The execution itself is triggered correctly.
This is my COPY query (summarized):
copy into table_name
from (
select
TRY_TO_DATE(
$1:int_field::varchar,
'YYYYMMDD'
) as date_field
from @"stage-location"/path/path2/ (FILE_FORMAT => c000)
) ON_ERROR = "SKIP_FILE_1%" PATTERN = ".*part.*"
So I cast $1:int_field (type int) to VARCHAR (::varchar) and then parse that string as a DATE using the 'YYYYMMDD' format. That works for int_field values that conform to the format, but when the field is 0 the load fails (only when it is executed by the pipe).
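For readers unfamiliar with TRY_TO_DATE, the behaviour expected here (a NULL instead of an error when the value does not parse) can be sketched in Python. This is only an emulation of the expected semantics, not Snowflake's implementation; the function name try_to_date and the strptime pattern "%Y%m%d" are stand-ins chosen for illustration.

```python
from datetime import datetime

def try_to_date(value, fmt="%Y%m%d"):
    """Return a date if value parses with fmt, else None (never raises)."""
    try:
        # Cast to string first, mirroring the ::varchar cast in the query.
        return datetime.strptime(str(value), fmt).date()
    except ValueError:
        return None

print(try_to_date(20210111))  # 2021-01-11
print(try_to_date(0))         # None
```

Under these semantics a 0 in int_field should simply produce a NULL date_field, which is why the hard failure reported by the pipe is surprising.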
When the pipe executed the COPY command by itself, I checked COPY_HISTORY and found the following error:
Can't parse '0' as date with format 'YYYYMMDD'
And of course the load fails...
FAILED LOAD
Here is where it gets interesting: when I execute this SAME COPY command myself in a worksheet, the load goes smoothly:
OK LOAD
I tried:
VALIDATE, VALIDATION_MODE and VALIDATE_PIPE_LOAD, but got "This function does not support COPY INTO statements that transform data during a load", which is exactly what mine does.
FILE_FORMAT = (FORMAT_NAME=c000 DATE_FORMAT='YYYYMMDD') ON_ERROR = "SKIP_FILE_1%" >>> SAME ISSUE: the file is only loaded when I run the COPY command by hand.
I thought the problem was the ON_ERROR option, but I don't think I can remove it: I need it to filter out the REAL errors :(
Maybe it is some session-level problem. I read something about DATE_INPUT_FORMAT, but I can't pin down the exact cause.
Can someone help me? Thanks!

In my tests, it fails every time (even the stand-alone COPY does not work). On the other hand, querying the staged file directly works as expected.
select TRY_TO_DATE(
$1::varchar,
'YYYYMMDD'
) as date_field
from @my_stage; -- works
copy into testing
from (
select
TRY_TO_DATE(
$1::varchar,
'YYYYMMDD'
)
from @my_stage
) ON_ERROR = "SKIP_FILE_1%"; -- fails with "Date '0' is not recognized"
It seems there is an issue with TRY_TO_DATE when it runs as part of a COPY transformation. By the way, I tested TRY_TO_NUMBER, and it works.
You should submit a case to Snowflake support so the development team can investigate the issue.

Related

Import CSV file to mariadb

Lately, I have been having problems importing CSV files. I am using
MariaDB : 10.3.32-MariaDB-0ubuntu0.20.04.1
On Ubuntu : Ubuntu 20.04.3 LTS
I am using this command
LOAD DATA LOCAL INFILE '/path_to_file/data.csv' INTO TABLE tab
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
After searching and experimenting, I found that I can only load a file from the tmp folder, i.e.
SELECT load_file('/tmp/data.csv');
But it didn't work with other paths.
Secondly, I found that even if the CSV file is in the tmp folder, MariaDB still fails to load it when it contains a lot of fields. The main problem is that the LOAD DATA command gives no error or even a warning, except when the file does not exist. Otherwise nothing is shown, and nothing is imported.
I only succeeded in importing a very simple CSV from the tmp folder.
What I suspect is that
MariaDB has been updated, and the new version has some flags or configuration options that prohibit importing CSV files from anywhere other than the tmp folder, and
MariaDB fails to load the CSV because of some unknown problem, maybe a special character (although I made sure there are none).
There must be some option that makes MariaDB produce verbose errors and warnings, but I don't know it. The only log I know of is /var/log/mysql/error.log, which contains nothing about the failed CSV load.
Any help would be appreciated.
Below is the first record of the CSV. The actual CSV contains 49 fields and 1862 records (the sample below contains only the header and one record).
"S.No","Training Code","Intervention Type (NRM/Emp. Skill
Training)","Training Title/Activity","Start Month","Ending
Month","No. of Days Trainings","Start Date Training","End Date
Training","Name of Person","Father
Name","CNIC","Gender","Age","Education","Skill Level","CO Ref
#","COName","Village Name","Tehsil Name","District","Type of Farm
production","Total Land (if applicable)","Total Trees (if
applicable)","Sheeps/goats","Buffalo/Cows","Profession","Person's
Income Emp. Skill (Pre-Intervention)","Income from NRM (Pre-
Intervention)","HH Other Sources of Income","Total HH Income","Type
of Support provided","Tool Kit/Inputs Received or Not","Date of
Tool Kit receiving","Other intervention , like exposure market
trial, followup support, Advance Training etc","Production (Pre-
Intervention)","Production (Post-Intervention)","Change in
Production","Unit (kg, Maund,Liter, etc)","Income gain from
production (Post-Intervention)","Change in Income (NRM)","Income
gain by Employment -Emp.Skill (Post Intervention)","Change in
Income (Emp. Skill)","Outcome Trend","Employment/Self-
Employment/Other","Outcome Result","Remarks","Beneficiaries Contact
No.","Activity Location"
1,"AUP-0001","NRM","Dates Processing &
Packaging","Sep/2018","Sep/2018",2,"25/Sep/2018","26/Sep/2018",
"Some name","Barkat Gul",1234567891234,"Male",34,"Primary","Semi-
Skilled","AUP-NWD-073","MCO Haider Khel Welfare Committee","Haider
Khel","Mir Ali","North
Waziristan","Dates",,20,,,"Farming",,5000,"Farming",5000,"Training,
Packaging Boxes","Yes","10/10/2018",,180,320,140,"Kg",8000,3000,,,
"Positive","Self Employed","Value addition to the end product
(Packaging increase the Price per KG to 25%)",,,"Field NW"
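Since LOAD DATA reports nothing here, one way to rule out a malformed file before loading is to count the fields in every record. The following is a minimal sketch using Python's csv module; the function name find_bad_records and the three-column sample are made up for illustration (the real file has 49 fields), and it assumes quotes inside fields are escaped by doubling them, as the ESCAPED BY '"' clause above implies.

```python
import csv
import io

def find_bad_records(csv_file, expected_fields):
    """Return (record_number, field_count) for records with the wrong field count."""
    reader = csv.reader(csv_file, delimiter=",", quotechar='"')
    return [
        (number, len(record))
        for number, record in enumerate(reader, start=1)
        if len(record) != expected_fields
    ]

# Hypothetical sample: record 2 has a quoted field containing a comma and a
# line break (still one field); record 3 is missing a field.
sample = io.StringIO(
    'S.No,"Training Code","Type of Support provided"\n'
    '1,"AUP-0001","Training,\nPackaging Boxes"\n'
    '2,"AUP-0002"\n'
)
print(find_bad_records(sample, expected_fields=3))  # [(3, 2)]
```

For a real file, open it with open(path, newline='') so that line breaks inside quoted fields reach the csv module unchanged, and pass expected_fields=49.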
BTW, I am non-technical :-O
Using MariaDB version 10.5.13-3.12.1, I am able to import CSV files into the tables I have set up,
except with dates:
https://dba.stackexchange.com/questions/283966/tradedate-import-tinytext-how-to-show-date-format-yyyymmdd-of-20210111-or-2021?noredirect=1#comment555600_283966
There I am still struggling to import text-format dates and to convert text dates into the (YYYY-MM-DD) date format.

pgAdmin argument formats can't be mixed

Background
Ubuntu 18.04
Postgresql 11.2 in Docker
pgAdmin4 3.5
I have a column named alias with type character varying(64)[]. Values were previously set on some rows using psycopg2, and everything was fine then.
SQL = 'UPDATE public."mytable" SET alias=%s WHERE id=%s'
query = cursor.mogrify(SQL, ([values] , id))
cursor.execute(query)
conn.commit()
Recently, when I tried to add more values using the pgAdmin GUI as shown in the first figure, the error in the second figure appeared, which says Argument formats can't be mixed:
It turns out that if I insert the values using a script, for example through psql or the query tool in pgAdmin, the error does not happen; i.e., it only happens when using the pgAdmin GUI.
Example script:
UPDATE public."mytable" SET alias='{a, b}' WHERE id='myid'
But since the GUI makes it much easier to modify values, I really want to figure this out. Any ideas?
It's a bug in pgAdmin 4.17.
It looks like it happens whenever you edit a char(n)[] or varchar(n)[] cell in a table (although char[] and varchar[] are unaffected).
It should be fixed in 4.18.
In the meantime, you can fix it yourself without much trouble. The pgAdmin4 backend is written in Python, so there's no need to rebuild anything; you can just dive in and change the source.
Find the directory where pgAdmin4 is installed, and open web/pgadmin/tools/sqleditor/__init__.py in an editor. Find the line:
typname = '%s(%s)[]'.format(
...and change it to:
typname = '{}({})[]'.format(
You'll need to restart the pgAdmin4 service for the change to take effect.
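To see why that one-line change matters: str.format only substitutes {} fields and leaves %s untouched, so the original line produced a type name that still contained literal %s markers. A small standalone illustration (the variable names and values are made up, not taken from the pgAdmin source):

```python
# Hypothetical values, for illustration only.
base_type, length = "character varying", 64

broken = '%s(%s)[]'.format(base_type, length)  # str.format ignores %s
fixed = '{}({})[]'.format(base_type, length)   # {} fields are substituted

print(broken)  # %s(%s)[]
print(fixed)   # character varying(64)[]
```

Presumably the leftover %s markers then end up in a query that also uses named parameters, which would explain the "argument formats can't be mixed" message.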
I wasn't able to get this working with the Character Varying data type but it worked once I converted the column data type to Text.

How to insert similar value into multiple locations of a psycopg2 query statement using dict? [duplicate]

I have a Python script that runs a pgSQL file through SQLAlchemy's connection.execute function. Here's the block of code in Python:
results = pg_conn.execute(sql_cmd, beg_date = datetime.date(2015,4,1), end_date = datetime.date(2015,4,30))
And here's one of the areas where the variable gets inputted in my SQL:
WHERE
( dv.date >= %(beg_date)s AND
dv.date <= %(end_date)s)
When I run this, I get a cryptic Python error:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) argument formats can't be mixed
…followed by a huge dump of the offending SQL query. I've run this exact code with the same variable convention before. Why isn't it working this time?
I encountered a similar issue as Nikhil. I have a query with LIKE clauses which worked until I modified it to include a bind variable, at which point I received the following error:
DatabaseError: Execution failed on sql '...': argument formats can't be mixed
The solution is not to give up on the LIKE clause. That would be pretty crazy if psycopg2 simply didn't permit LIKE clauses. Rather, we can escape the literal % with %%. For example, the following query:
SELECT *
FROM people
WHERE start_date > %(beg_date)s
AND name LIKE 'John%';
would need to be modified to:
SELECT *
FROM people
WHERE start_date > %(beg_date)s
AND name LIKE 'John%%';
More details in the psycopg2 docs: http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
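If the query lives in a file and has many literal % characters, the doubling can be done programmatically. The helper below is a rough sketch, not part of psycopg2: escape_literal_percents is a made-up name, and it assumes the query only uses named %(name)s placeholders. Python's own %-formatting is used at the end purely as a stand-in to show that %% collapses back to a single %; real code should still pass the parameters to execute() rather than format them into the string.

```python
import re

def escape_literal_percents(sql):
    """Double every % that does not start a %(name)s placeholder.

    Sequences that are already doubled (%%) are left unchanged.
    """
    return re.sub(r"%%|%(?!\()", "%%", sql)

query = "SELECT * FROM people WHERE start_date > %(beg_date)s AND name LIKE 'John%'"
escaped = escape_literal_percents(query)
print(escaped)
# SELECT * FROM people WHERE start_date > %(beg_date)s AND name LIKE 'John%%'

# Stand-in for the driver's substitution step (illustration only).
print(escaped % {"beg_date": "'2015-04-01'"})
# SELECT * FROM people WHERE start_date > '2015-04-01' AND name LIKE 'John%'
```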
As it turned out, I had used a SQL LIKE operator in the new SQL query, and the % wildcard was interfering with Python's parameter formatting. For instance:
dv.device LIKE 'iPhone%' or
dv.device LIKE '%Phone'
Another answer offered a way to un-escape and re-escape, which I felt would add unnecessary complexity to otherwise simple code. Instead, I used pgSQL's ability to handle regex to modify the SQL query itself. This changed the above portion of the query to:
dv.device ~ E'iPhone.*' or
dv.device ~ E'.*Phone$'
So for others: you may need to change your LIKE operators to regex '~' to get it to work. Just remember that it'll be WAY slower for large queries. (More info here.)
For me, it turned out I had a % in an SQL comment:
/* Any future change in the testing size will not require
a change here... even if we do a 100% test
*/
This works fine:
/* Any future change in the testing size will not require
a change here... even if we do a 100pct test
*/

Is there a way to use User Activity Variables to store SQL in Datastage

I am considering using RCP to run a generic DataStage job, but the initial SQL changes each time it's called. Is there a way to use a User Activity Variable to inject SQL from a text file or something similar, so I can reuse the same DataStage job?
I know this Routine can read a file to look up parameters:
Routine = 'ReadFile'
vFileName = Arg1
vArray = ''
vCounter = 0
OPENSEQ vFileName to vFileHandle
Else Call DSLogFatal("Error opening file list: ":vFileName,Routine)
Loop
While READSEQ vLine FROM vFileHandle
vCounter = vCounter + 1
vArray = Fields(vLine,',',1)
vArray = Fields(vLine,',',2)
vArray = Fields(vLine,',',3)
Repeat
CLOSESEQ vFileHandle
Ans = vArray
Return Ans
But does that mean I have to store the SQL on one single line, even if it's long?
Thanks.
Why not just have the SQL within the routine itself and propagate parameters?
I have multiple queries within a single routine that does just that (one for source and one for AfterSQL statement)
This is an example and apologies I'm answering this on my mobile!
InputCol=Trim(pTableName)
If InputCol='Table1' then column='Day'
If InputCol='Table2' then column='Quarter, Day'
SQLCode = ' Select Year, Month, '
SQLCode := column:", Time, "
SQLCode := " to_date(current_timestamp, 'YYYY-MM-DD HH24:MI:SS'), "
SQLCode := \ "This is example text as output" \
SQLCode := "From DATE_TABLE"
crt SQLCode
I've used several different string delimiters in the example above. When passing the result out to a parameter, make sure the ' and " characters have either been escaped or are displayed correctly.
Again, apologies for the quality but I hope it gives you some ideas!
You can give this a try
As you mentioned, maintain the SQL in a file (again, if the SQL keeps changing, you need to build logic to automate populating the new SQL).
In the DataStage sequencer, use an Execute Command activity to read the SQL file,
e.g.: cat /home/bk/query.sql
In the Job activity that calls your generic job, map the command output of your EC activity to a job parameter.
So if the EC activity name is exec_query, then the job parameter will be
exec_query.$CommandOutput
When you run the sequence, your query will flow from
SQL file --> EC activity --> Parameter in Job activity --> DB stage (query parameterised)
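If the parameter does have to be a single line, as the question suggests, the SQL file can be kept readable on disk and flattened just before it is handed over. A minimal sketch in Python (flatten_sql is a made-up helper, not a DataStage function), assuming the SQL contains no -- line comments and no string literals in which whitespace matters:

```python
def flatten_sql(sql_text):
    """Collapse every run of whitespace, including newlines, into one space."""
    return " ".join(sql_text.split())

# Hypothetical multi-line query, as it might be stored in the SQL file.
query = """
SELECT year, month
FROM date_table
WHERE year = 2015
"""
print(flatten_sql(query))  # SELECT year, month FROM date_table WHERE year = 2015
```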
Have you thought about invoking a shell script that connects to the database and executes the SQL script from the sequential job? You could use sqlplus in the shell script to connect, read the file with the SQL, and run it. To execute the shell script from the sequential job, use an ExecCommand stage (sh, ./, ...), depending on the interpreter.
Another way to solve this, depending on how much your SQL changes: you could invoke a base routine that handles the parameters and invokes your parallel job.
The main problem I think you could run into is the length limit of the variable in which you store the parameter.
Tell me which option you choose and I can help you more.

SQLAlchemy, Psycopg2 and Postgresql COPY

It looks like Psycopg has a custom command for executing a COPY:
psycopg2 COPY using cursor.copy_from() freezes with large inputs
Is there a way to access this functionality from within SQLAlchemy?
The accepted answer is correct, but if you want more than just EoghanM's comment to go on, the following worked for me for COPYing a table out to CSV:
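from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

eng = create_engine("postgresql://user:pwd@host:5432/db")
ses = sessionmaker(bind=eng)
copy_sql = 'COPY some_table TO STDOUT WITH CSV HEADER'
# raw_connection() returns the underlying DBAPI (psycopg2) connection
fake_conn = eng.raw_connection()
fake_cur = fake_conn.cursor()
# copy_expert streams the COPY output into the open file object
with open('/tmp/some_table_copy.csv', 'wb') as dbcopy_f:
    fake_cur.copy_expert(copy_sql, dbcopy_f)
fake_conn.close()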
The sessionmaker isn't necessary, but if you're in the habit of creating the engine and the session at the same time, you'll need to separate them to use raw_connection (unless there is some way to access the engine through the session object that I don't know of). The SQL string passed to copy_expert is also not the only way to do it; there is a basic copy_to function that accepts a subset of the parameters you could pass to a normal COPY TO query. Overall performance of the command seems fast to me when copying out a table of ~20000 rows.
http://initd.org/psycopg/docs/cursor.html#cursor.copy_to
http://docs.sqlalchemy.org/en/latest/core/connections.html#sqlalchemy.engine.Engine.raw_connection
If your engine is configured with a psycopg2 connection string (which is the default, so either "postgresql://..." or "postgresql+psycopg2://..."), you can create a psycopg2 cursor from an SQL Alchemy session using
cursor = session.connection().connection.cursor()
which you can use to execute
cursor.copy_from(...)
The cursor will be active in the same transaction as your session currently is. If a commit or rollback happens, any further use of the cursor will throw a psycopg2.InterfaceError; you would have to create a new one.
You can use:
import io

def to_sql(engine, df, table, if_exists='fail', sep='\t', encoding='utf8'):
    # Create the (empty) table from the DataFrame's schema
    df[:0].to_sql(table, engine, if_exists=if_exists)

    # Prepare data in an in-memory text buffer
    # (io.StringIO replaces Python 2's cStringIO)
    output = io.StringIO()
    df.to_csv(output, sep=sep, header=False, encoding=encoding)
    output.seek(0)

    # Insert data with COPY through the raw psycopg2 connection
    connection = engine.raw_connection()
    cursor = connection.cursor()
    cursor.copy_from(output, table, sep=sep, null='')
    connection.commit()
    cursor.close()
I insert 200000 lines in 5 seconds instead of 4 minutes
It doesn't look like it.
You may have to just use psycopg2 to expose this functionality and forego the ORM capabilities. I guess I don't really see the benefit of ORM in such an operation anyway since it's a straight bulk insert and dealing with individual objects a la an ORM would not really make a whole lot of sense.
If you're starting from SQLAlchemy, you need to first get to the connection engine (also known by the property name bind on some SQLAlchemy objects):
engine = create_engine('postgresql+psycopg2://myuser:password@localhost/mydb')
# or
engine = session.get_bind()
# or any other way you know to get to the engine
From the engine you can isolate a psycopg2 connection:
# get a psycopg2 connection
connection = engine.connect().connection
# get a cursor on that connection
cursor = connection.cursor()
Here are some templates for the COPY statement to use with cursor.copy_expert(), a more complete and flexible option than copy_from() or copy_to() as it is indicated here: https://www.psycopg.org/docs/cursor.html#cursor.copy_expert.
# to dump to a file
dump_to = """
COPY mytable
TO STDOUT
WITH (
FORMAT CSV,
DELIMITER ',',
HEADER
);
"""
# to copy from a file:
copy_from = """
COPY mytable
FROM STDIN
WITH (
FORMAT CSV,
DELIMITER ',',
HEADER
);
"""
Check out what the options above mean and others that may be of interest to your specific situation https://www.postgresql.org/docs/current/static/sql-copy.html.
IMPORTANT NOTE: The link to the documentation of cursor.copy_expert() indicates to use STDOUT to write out to a file and STDIN to copy from a file. But if you look at the syntax on the PostgreSQL manual, you'll notice that you can also specify the file to write to or from directly in the COPY statement. Don't do that, you're likely just wasting your time if you're not running as root (who runs Python as root during development?) Just do what's indicated in the psycopg2's docs and specify STDIN or STDOUT in your statement with cursor.copy_expert(), it should be fine.
# running the copy statement
with open('/path/to/your/data/file.csv') as f:
    cursor.copy_expert(copy_from, file=f)
# don't forget to commit the changes.
connection.commit()
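The file handed to cursor.copy_expert() does not have to live on disk; any file-like object works. As a rough sketch, rows already held in memory could be serialized to a CSV buffer that matches the FORMAT CSV, HEADER template above. The helper name rows_to_csv_buffer and the sample rows are made up for illustration, and no database connection is opened here.

```python
import csv
import io

def rows_to_csv_buffer(header, rows):
    """Serialize rows to an in-memory CSV file object, header line first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=",", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer

buf = rows_to_csv_buffer(["id", "name"], [(1, "a,b"), (2, "plain")])
print(buf.getvalue())

# With a live psycopg2 cursor (not created in this sketch), the buffer could
# then be passed in place of the open file:
#   cursor.copy_expert(copy_from, file=buf)
```

The csv module quotes the field that contains a comma, which is what the server expects when the statement uses FORMAT CSV.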
You don't need to drop down to psycopg2, use raw_connection, or get a cursor.
Just execute the sql as usual, you can even use bind parameters with text():
from sqlalchemy import text

engine.execute(
    text('''copy some_table from :csv
            delimiter ',' csv''').execution_options(autocommit=True),
    csv='/tmp/a.csv')
You can drop the execution_options(autocommit=True) if this PR is accepted