Improve SQL*Loader (sqlldr) performance for a 120 million record upload from CSV - oracle-sqldeveloper

It is taking almost 10 hours to finish loading into the table.
Here is my ctl file.
OPTIONS (
skip=1,
ERRORS=4000000,
READSIZE=5000000,
BINDSIZE=8000000,
direct=true
)
UNRECOVERABLE
load data
INFILE 'weeklydata1108.csv'
INSERT INTO TABLE t_location_data
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' TRAILING NULLCOLS
(f_start_ip,
f_end_ip,
f_country,
f_count_code,
f_reg,
f_stat,
f_city,
f_p_code,
f_area,
f_lat,
f_long,
f_anon_stat,
f_pro_detect date "YYYY-MM-DD",
f_date "SYSDATE")
And the sqlldr command for running it is:
sqlldr username#\"\(DESCRIPTION=\(ADDRESS=\(HOST=**mydbip***\)\(PROTOCOL=TCP\)\(PORT=1521\)\)\(CONNECT_DATA=\(SID=Real\)\)\)\"/geolocation control='myload.ctl' log='insert.log' bad='insert.bad'
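For context (and not part of the original post): with direct=true, READSIZE and BINDSIZE mainly matter for conventional-path loads, so with a direct-path load most of the remaining time usually goes into index maintenance and per-row parsing. Below is a hedged tuning sketch of the OPTIONS clause using documented sqlldr parameters; the values are illustrative only, not tested against this table.

-- Hypothetical variant of myload.ctl (values are examples only)
OPTIONS (
skip=1,
ERRORS=4000000,
direct=true,
PARALLEL=true,           -- allow several concurrent sqlldr sessions, one per file chunk
MULTITHREADING=true,     -- overlap column-array conversion and stream loading on multi-CPU hosts
COLUMNARRAYROWS=50000,   -- build larger column arrays per load stream
STREAMSIZE=10485760,     -- 10 MB direct-path stream buffer
DATE_CACHE=5000          -- cache converted f_pro_detect date values
)
UNRECOVERABLE
load data
...

Splitting weeklydata1108.csv into chunks and running one sqlldr session per chunk with PARALLEL=true, and disabling or dropping indexes on t_location_data before the load and rebuilding them afterwards, usually help more than any single OPTIONS value, but both need to be checked against your uniqueness and recoverability requirements.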

Related

Snowflake CSV copy failures

I'm trying to load CSV data into Snowflake using Snowflake COPY. The CSV column separator is the pipe character (|). One of the columns in the input CSV has a | character in the data, which is escaped by a backslash. The datatype in the target Snowflake database is VARCHAR(n). With the addition of the escape character (\) in the data, the data size exceeds the target column definition size, which causes the copy to fail.
Is there a way I can remove the escape character (\) from the data before it is loaded into the table?
copy into table_name from 's3path.csv.gz' file_format=(type = 'csv', field_optionally_enclosed_by ='"' escape_unenclosed_field = NONE empty_field_as_null = true escape ='\' field_delimiter ='|' skip_header=1 NULL_IF = ('\\N', '', '\N')) ON_ERROR ='ABORT_STATEMENT' PURGE = FALSE
Sample data that causes the failure: "Data 50k | $200K "
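One hedged possibility, not from the original post: with escape_unenclosed_field = NONE the backslash in an unenclosed field is loaded as literal data, whereas letting Snowflake treat it as the escape character makes the parser consume it, so only the | reaches the VARCHAR column. A minimal sketch reusing the placeholder table and path names from the question:

-- Hypothetical variant: treat backslash as the escape character for unenclosed fields,
-- so a raw value like Data 50k \| $200K loads with the backslash removed.
copy into table_name from 's3path.csv.gz'
file_format = (type = 'csv'
               field_delimiter = '|'
               field_optionally_enclosed_by = '"'
               escape_unenclosed_field = '\\'
               empty_field_as_null = true
               skip_header = 1
               NULL_IF = ('\\N', '', '\N'))
ON_ERROR = 'ABORT_STATEMENT' PURGE = FALSE;

If the problematic field is actually enclosed in quotes, ESCAPE = '\\' is the corresponding setting; either way this only changes how the backslash is parsed, so it is worth verifying on a sample file first.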

Snowflake null values quoted in CSV break PostgreSQL unload

I am trying to move data from Snowflake to PostgreSQL, and to do so I first unload it to S3 in CSV format. Commas can appear in the text columns, so I use the FIELD_OPTIONALLY_ENCLOSED_BY Snowflake unloading option to quote the content of the problematic cells. However, when this happens together with null values, I can't manage to produce a CSV that is valid for PostgreSQL.
I created a simple table so you can understand the issue. Here it is:
CREATE OR REPLACE TABLE PUBLIC.TEST(
TEXT_FIELD VARCHAR,
NUMERIC_FIELD INT
);
INSERT INTO PUBLIC.TEST VALUES
('A', 1),
(NULL, 2),
('B', NULL),
(NULL, NULL),
('Hello, world', NULL)
;
COPY INTO @STAGE/test
FROM PUBLIC.TEST
FILE_FORMAT = (
COMPRESSION = NONE,
TYPE = CSV,
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ''
)
OVERWRITE = TRUE;
From that, Snowflake will create the following CSV:
"A",1
"",2
"B",""
"",""
"Hello, world",""
But after that, it is impossible for me to copy this CSV into a PostgreSQL table as it is.
Even though the PostgreSQL documentation says, next to the NULL option:
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.
Not setting any COPY options in the PostgreSQL load results in failure too; it won't work, since we also have to specify the quote character using QUOTE, here QUOTE '"'.
Therefore, during the PostgreSQL load, using:
FORMAT csv, HEADER false, QUOTE '"' will give:
DataError: invalid input syntax for integer: "" CONTEXT: COPY test, line 3, column numeric_field: ""
FORMAT csv, HEADER false, NULL '""', QUOTE '"' will give:
NotSupportedError: CSV quote character must not appear in the NULL specification
FYI, to test the file unloaded to S3 I will use this command in PostgreSQL:
CREATE TABLE IF NOT EXISTS PUBLIC.TEST(
TEXT_FIELD VARCHAR,
NUMERIC_FIELD INT
);
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
SELECT aws_s3.table_import_from_s3(
'PUBLIC.TEST',
'',
'(FORMAT csv, HEADER false, NULL ''""'', QUOTE ''"'')',
'bucket',
'test_0_0_0.csv',
'aws_region'
)
Thanks a lot for any ideas on what I could do to make it happen. I would love to find a solution that doesn't require modifying the CSV between Snowflake and Postgres. I think it is more of an issue on the Snowflake side, as it doesn't really make sense to quote null values. But PostgreSQL is not helping either.
When you set the NULL_IF value to '', you are actually telling Snowflake to convert NULLs to a blank string, which then gets quoted. When you are copying out of Snowflake, the copy options are "backwards" in a sense, and NULL_IF acts more like an IFNULL.
This is the code that I'd use on the Snowflake side, which will result in an unquoted empty string in your CSV file:
FILE_FORMAT = (
COMPRESSION = NONE,
TYPE = CSV,
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ()
)
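Not part of the original answer, but to close the loop on the PostgreSQL side: with NULL_IF = () Snowflake writes NULLs as unquoted empty fields, which PostgreSQL's CSV reader already treats as NULL by default, so the aws_s3 call from the question should then work without any NULL option. A hedged sketch reusing the placeholder bucket and file names from above:

-- Assumes the unloaded CSV now has unquoted empty fields for NULLs, e.g. ,2 and "B",
SELECT aws_s3.table_import_from_s3(
'PUBLIC.TEST',
'',                                         -- import all columns
'(FORMAT csv, HEADER false, QUOTE ''"'')',  -- no NULL option: unquoted empty field = NULL
'bucket',
'test_0_0_0.csv',
'aws_region'
);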

How to ignore errors, but not skip lines, in the COPY command in Redshift?

I have the COPY statement below. It skips lines because of maxerror. Is there any way to COPY data over to Redshift, forcing any errors into the column regardless of type? I don't want to lose information.
sql_prelim = """copy table1 from 's3://dwreplicatelanding/file.csv.gz'
access_key_id 'xxxx'
secret_access_key 'xxxx'
DELIMITER '\t' timeformat 'auto'
GZIP IGNOREHEADER 1
trimblanks
CSV
BLANKSASNULL
maxerror as 100000
"""
The error I want to skip is below, but ideally I want to skip all errors and maintain data:
1207- Invalid digit, Value 'N', Pos 0, Type: Decimal
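A possible workaround, not from the original thread: COPY will not coerce a value like 'N' into a DECIMAL column, but loading the file into an all-VARCHAR staging table keeps every row, and the cast into the typed table can then be done in SQL where bad values can be inspected or defaulted. A sketch with hypothetical column names and precisions:

-- Hypothetical staging table: every column is VARCHAR so COPY rejects nothing.
create table table1_stage (
col1 varchar(65535),
col2 varchar(65535)
-- ... one varchar column per field in file.csv.gz
);

copy table1_stage from 's3://dwreplicatelanding/file.csv.gz'
access_key_id 'xxxx'
secret_access_key 'xxxx'
DELIMITER '\t' timeformat 'auto'
GZIP IGNOREHEADER 1 trimblanks CSV BLANKSASNULL;

-- Cast into the typed table; values such as 'N' in the decimal column become NULL here.
insert into table1
select col1,
       case when col2 ~ '^-?[0-9]+([.][0-9]+)?$' then col2::decimal(18,2) else null end
from table1_stage;

The trade-off is a second pass over the data, but no rows are dropped and the offending values remain queryable in the staging table.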

Oracle external table with date column and skip header

I have a file,
ID,DNS,R_D,R_A
1,123456,2014/11/17,10
2,987654,2016/05/20,30
3,434343,2017/08/01,20
that I'm trying to load into Oracle using external tables. I have to skip the header row and also load the date column.
This is my query:
DECLARE
FILENAME VARCHAR2(400);
BEGIN
FILENAME := 'actual_data.txt';
EXECUTE IMMEDIATE 'CREATE TABLE EXT_TMP (
ID NUMBER(25),
DNS VARCHAR2(20),
R_D DATE,
R_A NUMBER(25)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY USER_DIR
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY '',''
MISSING FIELD VALUES ARE NULL
SKIP 1
(
"ID",
"DNS",
"R_D" date "dd-mon-yy",
"RECHARGE_AMOUNT"
)
)
LOCATION (''' || FILENAME || ''')
)
PARALLEL 5
REJECT LIMIT UNLIMITED';
END;
I get following exception:
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-00554: error encountered while parsing access parameters
KUP-01005: syntax error: found "skip": expecting one of: "column, exit, (,
reject"
KUP-01007: at line 4 column 5
ORA-06512: at "SYS.ORACLE_LOADER", line 19
I'm using sqlplus.
Could some Oracle veterans please help me out and tell me what I'm doing wrong here? I'm very new to Oracle.
You don't want to create any kind of tables (including external ones) in PL/SQL; not that it is impossible, but it is the opposite of best practice.
Have a look at my attempt, based on the information you provided; it works OK.
SQL> alter session set nls_date_format = 'dd.mm.yyyy';
Session altered.
SQL> create table ext_tmp
2 (
3 id number,
4 dns varchar2(20),
5 r_d date,
6 r_a number
7 )
8 organization external
9 (
10 type oracle_loader
11 default directory kcdba_dpdir
12 access parameters
13 (
14 records delimited by newline
15 skip 1
16 fields terminated by ',' lrtrim
17 missing field values are null
18 (
19 id,
20 dns,
21 r_d date 'yyyy/mm/dd',
22 r_a
23 )
24 )
25 location ('actual_data.txt')
26 )
27 parallel 5
28 reject limit unlimited;
Table created.
SQL> select * from ext_tmp;
ID DNS R_D R_A
---------- -------------------- ---------- ----------
1 123456 17.11.2014 10
2 987654 20.05.2016 30
3 434343 01.08.2017 20
SQL>
In my case, skip 1 didn't work even when placed between records delimited by newline and fields terminated by ',' lrtrim, until I used load when. Now skip 1 works with the following access parameters:
access parameters (
records delimited by newline
load when (someField != BLANKS)
skip 1
fields terminated by '','' lrtrim
missing field values are null
reject rows with all null fields
)
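Tying this back to the question (not from either answer): the KUP-01005 error comes from SKIP appearing after the field list, so inside the EXECUTE IMMEDIATE string the same reordering applies, with doubled quotes because it sits in a PL/SQL string literal. A hedged fragment of the asker's access parameters, also aligning the date mask and the last column name with the table definition:

ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
SKIP 1
FIELDS TERMINATED BY '','' LRTRIM
MISSING FIELD VALUES ARE NULL
(
"ID",
"DNS",
"R_D" date ''yyyy/mm/dd'',
"R_A"
)
)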

Import from CSV into SQL Server 2005

On the first day of the month I import information from the test.csv file into my SQL Server 2005 database. I have this information in test.csv, all in one column:
Receiver_number|Card_Number|Lname|purchase_date|tr_verif|station_name|prod_grp|product|unit_price|vol|amount|discount|sum_no_vat|vat|sum_with_vat|country|currency|milage|origin_amount|station_id
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1003680708|704487347252000033|3|2014-02-02T19:00:00|005375|IX Fortas|01|95 Miles|3.574|109.82|474.88|-35.78|392.46|82.42|510.66|LT|LTL||510.66|65059
1003680708|704487347252000034|3|2014-02-02T19:00:00|005375|IX Fortas|24|Cola|2.893|1.00|3.50|0|2.89|.61|3.50|LT|LTL||3.50|65059
Every value is separated by a | symbol.
How can I get it in with a SQL query?
I have used SSIS, and I have tried converting to Excel and changing the regional settings on the computer, but I could not get this result into SQL Server 2005.
Use BULK INSERT like below:
BULK INSERT <your_table>
FROM '<your_path>\test.csv'
WITH
(
FIRSTROW = 2,                      -- skip the header row
FIELDTERMINATOR = '|',             -- values are separated by |
ROWTERMINATOR = '\n',
ERRORFILE = 'C:\import_error.csv', -- rejected rows are written here
TABLOCK                            -- table lock for a faster bulk load
)
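Not in the original answer, but for completeness: BULK INSERT needs a target table whose columns line up with the pipe-delimited fields. A hedged sketch derived from the header row in the question; the table name and the numeric precisions are guesses from the sample rows and should be adjusted to the real data:

-- Hypothetical target table matching the 20 pipe-delimited fields
CREATE TABLE dbo.fuel_transactions (
Receiver_number BIGINT,
Card_Number VARCHAR(20),
Lname VARCHAR(50),
purchase_date DATETIME,       -- values like 2014-02-02T19:00:00 convert to DATETIME
tr_verif VARCHAR(10),
station_name VARCHAR(100),
prod_grp VARCHAR(10),
product VARCHAR(50),
unit_price DECIMAL(10,3),
vol DECIMAL(10,2),
amount DECIMAL(10,2),
discount DECIMAL(10,2),
sum_no_vat DECIMAL(10,2),
vat DECIMAL(10,2),
sum_with_vat DECIMAL(10,2),
country CHAR(2),
currency CHAR(3),
milage DECIMAL(10,2),         -- empty in the sample rows, so nullable
origin_amount DECIMAL(10,2),
station_id INT
);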