Discarding rows containing an empty string in a CSV from being uploaded through a SQL*Loader control file - oracle12c

I am trying to upload a CSV in which a column may or may not contain an empty value in a given row.
I want to discard the rows that contain an empty value so they are not loaded into the DB through SQL*Loader.
How can this be handled in the control file? I have tried the conditions below in the ctl file:
when String_Value is not null
when String_Value <> ''
but the rows are still getting inserted

This worked for me using either '<>' or '!='. I suspect the order of the clauses was incorrect for you. Note that colc (the third column in the data file) matches the column name in the table.
load data
infile 'c:\temp\x_test.dat'
TRUNCATE
into table x_test
when colc <> ''
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
cola char,
colb char,
colc char,
cold integer external
)
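For context, here is a minimal sketch of a matching setup; the table definition and the data rows below are hypothetical and only the column names come from the control file:
create table x_test (
cola varchar2(30),
colb varchar2(30),
colc varchar2(30),
cold number
);
And hypothetical contents of c:\temp\x_test.dat, where the second record has an empty third field:
a1,b1,c1,1
a2,b2,,2
a3,b3,c3,3
With this data, the second record fails the WHEN colc <> '' test and is skipped (written to the discard file if one is specified) instead of being inserted.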

Related

value too long for type character varying(512) -- why can't I import the data?

The maximum size of limited character types (e.g. varchar(n)) in Postgres is 10485760; see the description of the max length of PostgreSQL's varchar.
Please download the file for testing and extract it into /tmp/2019q4; we only use pre.txt to import the data.
sample data
Enter your psql and create a database:
postgres=# create database edgar;
postgres=# \c edgar;
Create the table according to the webpage (fields in pre table definitions):
edgar=# create table pre(
id serial ,
adsh varchar(20),
report numeric(6,0),
line numeric(6,0),
stmt varchar(2),
inpth boolean,
rfile char(1),
tag varchar(256),
version varchar(20),
plabel varchar(512),
negating boolean
);
CREATE TABLE
Try to import data:
edgar=# \copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with delimiter E'\t' csv header;
We analyse the error info:
ERROR: value too long for type character varying(512)
CONTEXT: COPY pre, line 1005798, column plabel: "LIABILITIES AND STOCKHOLDERS EQUITY 0
0001493152-19-017173 2 11 BS 0 H LiabilitiesAndStockholdersEqu..."
Time: 1481.566 ms (00:01.482)
1. The size I set for the field is just 512, far less than 10485760.
2. The content of line 1005798 is not the same as in the error info:
0001654954-19-012748 6 20 EQ 0 H ReclassificationAdjustmentRelatingToAvailableforsaleSecuritiesNetOfTaxEffect 0001654954-19-012748 Reclassification adjustment relating to available-for-sale securities, net of tax effect" 0
Now I drop the previous table, change the plabel field to text, and re-create it:
edgar=# drop table pre;
DROP TABLE
Time: 22.763 ms
edgar=# create table pre(
id serial ,
adsh varchar(20),
report numeric(6,0),
line numeric(6,0),
stmt varchar(2),
inpth boolean,
rfile char(1),
tag varchar(256),
version varchar(20),
plabel text,
negating boolean
);
CREATE TABLE
Time: 81.895 ms
Import the same data with the same copy command:
edgar=# \copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with delimiter E'\t' csv header;
COPY 275079
Time: 2964.898 ms (00:02.965)
edgar=#
No error info in the psql console. Let me check the raw data '/tmp/2019q4/pre.txt', which contains 1043000 lines.
wc -l /tmp/2019q4/pre.txt
1043000 /tmp/2019q4/pre.txt
There are 1043000 lines, so how many rows were actually imported?
edgar=# select count(*) from pre;
count
--------
275079
(1 row)
Why was so little data imported, and with no error info?
The sample data you provided is obviously not the data you are really loading. It does still show the same error, but of course the line numbers and markers are different.
That file occasionally has double quote marks where there should be single quote marks (apostrophes). Because you are using CSV mode, these stray double quotes will start multi-line strings, which span all the way until the next stray double quote mark. That is why you have fewer rows of data than lines of input, because some of the data values are giant multiline strings.
Since your data clearly isn't CSV, you probably shouldn't be using \copy in CSV format. It loads fine in text format as long as you specify "header", although that option didn't become available for text format until v15. For versions before that, you could manually remove the header line, or use PROGRAM to skip it, like FROM PROGRAM 'tail +2 /tmp/pre.txt'. Alternatively, you could keep using CSV format but choose a different quote character, one that never shows up in your data, such as with (delimiter E'\t', format csv, header, quote E'\b').
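A minimal sketch of both variants, using the same table and file as above (the header option in text format assumes PostgreSQL 15 or newer):
edgar=# \copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with (format text, delimiter E'\t', header)
Or, keeping CSV format but with a quote character that never appears in the data:
edgar=# \copy pre(adsh,report,line,stmt,inpth,rfile,tag,version,plabel,negating) from '/tmp/2019q4/pre.txt' with (format csv, delimiter E'\t', header, quote E'\b')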

copy columns of a csv file into postgresql table

I have a CSV file whose lines have 12, 11, 10, or 5 columns.
After creating a PostgreSQL table with 12 columns, I want to copy this CSV into the table.
I use this request:
COPY absence(champ1, champ2, num_agent, nom_prenom_agent, code_gestion, code_service, calendrier_agent, date_absence, code_absence, heure_absence, minute_absence, periode_absence)
FROM 'C:\temp\absence\absence.csv'
DELIMITER '\'
CSV
My CSV file contains 80000 lines.
Example:
20\05\ 191\MARKEY CLAUDIE\GA0\51110\39H00\21/02/2020\1471\03\54\Matin
21\05\ 191\MARKEY CLAUDIE\GA0\51110\39H00\\8130\7H48\Formation avec repas\
30\05\ 191\MARKEY CLAUDIE\GA0\51430\39H00\\167H42\
22\9993\Temps de déplacement\98\37
When I execute the request, I get a message indicating that there is missing data for the lines with fewer than 12 fields.
Is there a trick?
copy is extremely fast and efficient, but less flexible because of that. Specifically, it can't cope with files that have a different number of "columns" on each line.
You can either use a different import tool, or, if you want to stick to built-in tools, copy the file into a staging table that only has a single column, then use Postgres string functions to split the lines into the columns:
create unlogged table absence_import
(
line text
);
\COPY absence_import(line) FROM 'C:\temp\absence\absence.csv' DELIMITER E'\b' CSV
E'\b' is the "backspace" character which can't really appear in a text file, so no column splitting is taking place.
Once you have imported the file, you can split each line using string_to_array() and then insert the pieces into the real table:
insert into absence(champ1, champ2, num_agent, nom_prenom_agent, code_gestion, code_service, calendrier_agent, date_absence, code_absence, heure_absence, minute_absence, periode_absence)
select line[1], line[2], line[3], .....
from (
select string_to_array(line, '\') as line
from absence_import
) t;
If there are non-text columns, you might need to cast the values to the target data type explicitly, e.g. line[3]::int.
You can add additional expressions to deal with missing columns, e.g. something like: coalesce(line[10], 'default value')
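Putting the pieces together, here is a rough sketch of the full insert; the date handling is an illustrative assumption (that date_absence is a date column fed in DD/MM/YYYY form), not something stated in the question:
insert into absence(champ1, champ2, num_agent, nom_prenom_agent, code_gestion, code_service, calendrier_agent, date_absence, code_absence, heure_absence, minute_absence, periode_absence)
select line[1],
line[2],
line[3],
line[4],
line[5],
line[6],
line[7],
to_date(nullif(line[8], ''), 'DD/MM/YYYY'),  -- empty or missing date becomes NULL
line[9],
line[10],
line[11],
coalesce(line[12], 'default value')          -- fill a missing trailing column
from (
select string_to_array(line, '\') as line
from absence_import
) t;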

How to remove commas from number

When I run this query in postgres :
select id, name from schema.table;
I get this output
In this output, the id numbers come with commas. Because of this, when I download the result in .csv format, I have formatting issues in the .csv file. How can I handle this?
You shouldn't have any commas in your integer id values, since Postgres stores neither commas nor dots as numeric separators.
However, you may use the function ltrim to remove leading commas from string fields:
select ltrim(column1, ','), column2 from schema.table
https://www.postgresql.org/docs/9.1/static/functions-string.html
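For example, ltrim only strips the given characters from the start of the string; to remove commas anywhere in the value, replace is an alternative (not part of the original answer):
select ltrim(',,1234', ',');            -- returns '1234'
select replace('1,234,567', ',', '');   -- returns '1234567'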

How to store string spaces as null in numeric column

I want to load records from my local txt file into a PostgreSQL table.
I have created the following table:
create table player_info
(
Name varchar(20),
City varchar(30),
State varchar(30),
DateOfTour date,
pay numeric(5),
flag char
)
And my local txt file contains the following data:
John|Mumbai| |20170203|55555|Y
David|Mumbai| |20170305| |N
Romcy|Mumbai| |20170405|55555|N
Gotry|Mumbai| |20170708| |Y
I am just executing this,
copy player_info (Name,
City,
State,
DateOfTour,
pay_id,
flag)
from local 'D:\sample_player_info.txt'
delimiter '|' null as ''
exceptions 'D:\Logs\player_info'
What I want is: for my numeric column, if the field contains 3 spaces, then insert NULL for pay; otherwise insert the 5-digit number as-is.
pay is a column in my table whose datatype is numeric.
Is it correct or even possible to do this?
You cannot store strings in a numeric column, at all. 3 spaces is a string, so it cannot be stored in the column pay as that is defined as numeric.
A common approach to this conundrum is to create a staging table which uses less precise data types in the column definitions. Import the source data into the staging table. Then process that data so that it can be reliably added to the final table. e.g. in the staging table set a column called pay_str to NULL where pay_str = ' ' (or perhaps LIKE ' %')
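A rough sketch of that staging approach for this table (the staging table name, the psql \copy form, and the nullif/cast expression are assumptions rather than something from the question):
create table player_info_stage
(
Name varchar(20),
City varchar(30),
State varchar(30),
DateOfTour date,
pay_str text,  -- loose type so blank values load without error
flag char
);
\copy player_info_stage (Name, City, State, DateOfTour, pay_str, flag) from 'D:\sample_player_info.txt' delimiter '|'
insert into player_info (Name, City, State, DateOfTour, pay, flag)
select Name, City, State, DateOfTour,
nullif(trim(pay_str), '')::numeric(5),  -- blanks become NULL, otherwise cast the 5-digit value
flag
from player_info_stage;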

"extra data after last expected column" while trying to import a csv file into postgresql

I am trying to copy the content of a CSV file into my PostgreSQL DB, and I get the error "extra data after last expected column".
The content of my CSV is
agency_id,agency_name,agency_url,agency_timezone,agency_lang,agency_phone
100,RATP (100),http://www.ratp.fr/,CET,,
and my postgresql command is
COPY agency (agency_name, agency_url, agency_timezone) FROM 'myFile.txt' CSV HEADER DELIMITER ',';
Here is my table
CREATE TABLE agency (
agency_id character varying,
agency_name character varying NOT NULL,
agency_url character varying NOT NULL,
agency_timezone character varying NOT NULL,
agency_lang character varying,
agency_phone character varying,
agency_fare_url character varying
);
Column | Type | Modifiers
-----------------+-------------------+-----------
agency_id | character varying |
agency_name | character varying | not null
agency_url | character varying | not null
agency_timezone | character varying | not null
agency_lang | character varying |
agency_phone | character varying |
agency_fare_url | character varying |
Your table has 7 fields.
You need to map the 6 fields from the CSV into 6 corresponding fields of the table.
You cannot map only 3 fields when the CSV has 6, like you do in:
\COPY agency (agency_name, agency_url, agency_timezone) FROM 'myFile.txt' CSV HEADER DELIMITER ',';
All fields from the csv file need to be mapped in the COPY FROM command.
And since you specified CSV, the comma delimiter is the default, so you don't need to specify it.
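For example, a corrected command might map all six columns present in the CSV header, leaving the seventh table column (agency_fare_url) to default to NULL:
\COPY agency (agency_id, agency_name, agency_url, agency_timezone, agency_lang, agency_phone) FROM 'myFile.txt' CSV HEADER;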
Not sure this counts as an answer, but I just hit this with a bunch of CSV files, and found that simply opening them in Excel and re-saving them with no changes made the error go away. In other words, there is possibly some incorrect formatting in the source file that Excel is able to clean up automatically.
This error can also occur when the Postgres table and the CSV file have the same number of columns and you have specified delimiter ',' in the \copy command: you also need to specify CSV.
In my case, one of my columns contained comma-separated data, and I executed:
db=# \copy table1 FROM '/root/db_scripts/input_csv.csv' delimiter ','
ERROR: invalid input syntax for integer: "id"
CONTEXT: COPY quiz_quiz, line 1, column id: "id"
It worked after adding CSV:
db=# \copy table1 FROM '/root/db_scripts/input_csv.csv' delimiter ',' CSV
COPY 47871
For future visitors: when I had this problem, it was because I was using a loop that wrote to the same io.StringIO() variable before committing the query to the database (context).
If you're encountering this problem, make sure your code is like this:
for tableName in tableNames:
    output = io.StringIO()
    ...
    output.seek(0)
    cur.copy_expert(f"COPY {tableName} FROM STDIN", output)
    conn.commit()
And not like this:
output = io.StringIO()
for tableName in tableNames:
    ...
    output.seek(0)
    cur.copy_expert(f"COPY {tableName} FROM STDIN", output)
    conn.commit()
I tried your example and it works fine, but your command from the psql command line is missing the leading \:
database=# \COPY agency FROM 'myFile.txt' CSV HEADER DELIMITER ',';
And next time please include the DDL; I created the DDL from the CSV headers.