Postgres pgloader - transformation in columns - postgresql

Loading a flat file into a Postgres table. I need to do a few transformations while reading the file and loading it, like:
-->Check for characters; if present, default to some value. Reg_Exp can be used in Oracle. How can such functions be called in the syntax below?
-->TO_DATE function from text format
-->Check for NULL and default to some value
-->Trim functions
-->Only a few columns from the source file should be loaded
-->Defaulting values; say, for instance, the source file has only 3 columns but we need to load 4. One column should be defaulted with some value.
LOAD CSV
FROM 'filename'
INTO postgresql://role@host:port/database_name?tablename
TARGET COLUMNS
(
alphanm,alphnumnn,nmrc,dte
)
WITH truncate,
skip header = 0,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by '|',
batch rows = 100,
batch size = 1MB,
batch concurrency = 64
SET work_mem to '32 MB', maintenance_work_mem to '64 MB';
Kindly help me: how can this be accomplished using pgloader?
Thanks

Here's a self-contained test case for pgloader that reproduces your use-case, as best as I could understand it:
/*
Sorry pgloader version "3.3.2" compiled with SBCL 1.2.8-1.el7 Doing kind
of POC, to implement in real time work. Sample data from file:
raj|B|0.5|20170101|ABCD Need to load only first,second,third and fourth
column; Table has three column, third column should be defaulted with some
value. Table structure: A B C-numeric D-date E-(Need to add default value)
*/
LOAD CSV
FROM inline
(
alphanm,
alphnumnn,
nmrc,
dte [date format 'YYYYMMDD'],
other
)
INTO postgresql:///pgloader?so.raja
(
alphanm,
alphnumnn,
nmrc,
dte,
col text using "constant value"
)
WITH truncate,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by '|'
SET work_mem to '12MB',
standard_conforming_strings to 'on'
BEFORE LOAD DO
$$ drop table if exists so.raja; $$,
$$ create table so.raja (
alphanm text,
alphnumnn text,
nmrc numeric,
dte date,
col text
);
$$;
raj|B|0.5|20170101|ABCD
Now here's the extract from running the pgloader command:
$ pgloader 41287414.load
2017-08-15T12:35:10.258000+02:00 LOG Main logs in '/private/tmp/pgloader/pgloader.log'
2017-08-15T12:35:10.261000+02:00 LOG Data errors in '/private/tmp/pgloader/'
2017-08-15T12:35:10.261000+02:00 LOG Parsing commands from file #P"/Users/dim/dev/temp/pgloader-issues/stackoverflow/41287414.load"
2017-08-15T12:35:10.422000+02:00 LOG report summary reset
             table name       read   imported     errors      total time
-----------------------  ---------  ---------  ---------  --------------
                  fetch          0          0          0          0.007s
            before load          2          2          0          0.016s
-----------------------  ---------  ---------  ---------  --------------
                so.raja          1          1          0          0.019s
-----------------------  ---------  ---------  ---------  --------------
        Files Processed          1          1          0          0.021s
COPY Threads Completion          2          2          0          0.038s
-----------------------  ---------  ---------  ---------  --------------
      Total import time          1          1          0          0.426s
And here's the content of the target table when the command is done:
$ psql -q -d pgloader -c 'table so.raja'
 alphanm │ alphnumnn │ nmrc │    dte     │      col
═════════╪═══════════╪══════╪════════════╪════════════════
 raj     │ B         │ 0.5  │ 2017-01-01 │ constant value
(1 row)
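The other transformations from your list can be spelled with pgloader's per-field options and using expressions. Here's a sketch only, assuming the field options documented in the pgloader reference (null if blanks, trim both whitespace); the (or ...) form is a plain Common Lisp expression and the default values here are made up:
LOAD CSV
     FROM inline
          (
             alphanm   [trim both whitespace],
             alphnumnn [null if blanks],
             nmrc,
             dte       [date format 'YYYYMMDD'],
             other
          )
     INTO postgresql:///pgloader?so.raja
          (
             alphanm,
             alphnumnn text using (or alphnumnn "some default"),
             nmrc,
             dte,
             col text using "constant value"
          )
     WITH truncate,
          fields optionally enclosed by '"',
          fields escaped by double-quote,
          fields terminated by '|';
Note that only the fields named in the target column list are loaded, which covers the "only few columns" requirement: other is read from the file but never reaches the table. A regexp check would take the same shape, a Lisp expression in a using clause.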

Related

formatting the data while exporting to a file?

I want the output in the format below in the output file.
Expected OUTPUT:
Table CUSTOMERMSTR
---------------------------
custno         cjd         custtype
-------------- ----------- ------------
cust123        01-OCT-1900 1
cust123        08-SEP-1997 1
cust123        01-JAN-1996 1
3 rows
As of now:
Table CUSTOMERMSTR
----------------------------
custno|to_char|custtype
cust123|01-OCT-1900|1
cust123|08-SEP-1997|1
cust123|01-JAN-1996|1
This is my expdata.psql file on the UNIX server.
This is not giving the expected format.
Please help me with what I have to add to get the desired output in my .psql script.
--col custno format 9999999 heading "custno"
--col cjd format A25 heading "cjd"
--col custtype format 999999999 heading "custtype"
\t off
\a
\echo
\echo Table CUSTOMERMSTR
\echo ----------------------------
\echo
select custno,TO_CHAR(cjd,'DD-MON-YYYY'),custtype from CUSTOMERMSTR;
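A minimal sketch of the likely fixes, for what it's worth: the pipe separators come from unaligned output (the \a toggle), the to_char header comes from the missing column alias, and the commented col ... format lines are SQL*Plus commands that psql simply ignores:
\t off
\pset format aligned
\echo
\echo Table CUSTOMERMSTR
\echo ----------------------------
\echo
select custno,
       to_char(cjd, 'DD-MON-YYYY') as cjd,
       custtype
  from CUSTOMERMSTR;
With aligned format, psql prints the column headers, a dashed separator line, and a (3 rows) footer, which is close to the expected output above.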

oracle external table with date column and skip header

I have a file,
ID,DNS,R_D,R_A
1,123456,2014/11/17,10
2,987654,2016/05/20,30
3,434343,2017/08/01,20
that I'm trying to load into Oracle using external tables. I have to skip the header row and also load the date column.
This is my query:
DECLARE
FILENAME VARCHAR2(400);
BEGIN
FILENAME := 'actual_data.txt';
EXECUTE IMMEDIATE 'CREATE TABLE EXT_TMP (
ID NUMBER(25),
DNS VARCHAR2(20),
R_D DATE,
R_A NUMBER(25)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY USER_DIR
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY '',''
MISSING FIELD VALUES ARE NULL
SKIP 1
(
"ID",
"DNS",
"R_D" date "dd-mon-yy",
"RECHARGE_AMOUNT"
)
)
LOCATION (''' || FILENAME || ''')
)
PARALLEL 5
REJECT LIMIT UNLIMITED';
END;
I get following exception:
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-00554: error encountered while parsing access parameters
KUP-01005: syntax error: found "skip": expecting one of: "column, exit, (,
reject"
KUP-01007: at line 4 column 5
ORA-06512: at "SYS.ORACLE_LOADER", line 19
I'm using sqlplus.
Could some Oracle veterans please help me out and tell me what I'm doing wrong here? I'm very new to Oracle.
You don't want to create any kind of table (including external ones) in PL/SQL; not that it is impossible, but it is the opposite of best practices.
Have a look at my attempt, based on the information you provided - it works OK.
SQL> alter session set nls_date_format = 'dd.mm.yyyy';
Session altered.
SQL> create table ext_tmp
2 (
3 id number,
4 dns varchar2(20),
5 r_d date,
6 r_a number
7 )
8 organization external
9 (
10 type oracle_loader
11 default directory kcdba_dpdir
12 access parameters
13 (
14 records delimited by newline
15 skip 1
16 fields terminated by ',' lrtrim
17 missing field values are null
18 (
19 id,
20 dns,
21 r_d date 'yyyy/mm/dd',
22 r_a
23 )
24 )
25 location ('actual_data.txt')
26 )
27 parallel 5
28 reject limit unlimited;
Table created.
SQL> select * from ext_tmp;
        ID DNS                  R_D               R_A
---------- -------------------- ---------- ----------
         1 123456               17.11.2014         10
         2 987654               20.05.2016         30
         3 434343               01.08.2017         20
SQL>
In my case skip 1 didn't work, even when placed between records delimited by newline and fields terminated by ',' lrtrim, until I used load when. Now skip 1 works with the following access parameters:
access parameters (
records delimited by newline
load when (someField != BLANK)
skip 1
fields terminated by '','' lrtrim
missing field values are null
reject rows with all null fields
)

PostgreSQL full text search yielding weird results

I have a schema like this (simplified):
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name text NOT NULL
);
CREATE INDEX users_idx
ON users
USING GIN (to_tsvector('finnish', name));
But I'm getting completely invalid results with my queries:
# select name from users where to_tsvector('finnish', name) @@ to_tsquery('lemmin');
name
------
(0 rows)
# select name from users where to_tsvector('finnish', name) @@ to_tsquery('lemmink');
name
--------------------
Riitta ja Lemminki
Riitta ja Lemminki
(2 rows)
# select name from users where name ilike 'lemmink%';
name
----------------------
Lemminkäinen Matilda
Lemminkäinen Matias
Lemminkäinen Kyösti
Lemminkäinen Tuomas
(4 rows)
Another example:
# select name from users where to_tsvector('finnish', name) @@ to_tsquery('partu');
name
----------
Partuuna
(1 row)
# select name from users where to_tsvector('finnish', name) @@ to_tsquery('partur');
name
------------------------
Parturi-Kampaamo Raija
Parturi-Kampaamo Siema
(2 rows)
I was expecting to get the bottom two results on both queries...
Using the following version:
psql (9.4.6, server 9.5.2)
WARNING: psql major version 9.4, server major version 9.5.
Some psql features might not work.
I don't speak Finnish, but this seems to be the expected result. FTS looks for lexemes, not for parts of words. E.g., do is not a lexeme for dog, but dog is for dogs:
t=# select to_tsvector('english', 'Dogs eats bone') @@ to_tsquery('do');
NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored
?column?
----------
f
(1 row)
t=# select to_tsvector('english', 'Dogs eats bone') @@ to_tsquery('dog');
?column?
----------
t
(1 row)
So I believe that in Parturi the last i is an optional ending - right?..
Update:
from https://en.wiktionary.org/wiki/parturi :
partur[i], partur[eita] => lexeme will be partur
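If matching on word prefixes (rather than whole lexemes) is what's actually wanted, tsquery supports the :* prefix notation; a sketch only, with the caveat that the prefix token itself still goes through the Finnish stemmer:
select name
  from users
 where to_tsvector('finnish', name) @@ to_tsquery('finnish', 'lemmin:*');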

Importing bytea data into PostgreSQL by using COPY FROM stdin

I generated a (UTF-8) file with an external program for importing into PostgreSQL 9.6.1. The problem is the bytea field (PWHASH).
Snippet from this file (using TAB as delimiter):
COPY USERS (ID,CODE,PWHASH,EMAIL) FROM stdin;
7 test1 E'\\\\x657B954D27B4AC56FA997D24A5FF2563' test@amce.org
\.
When importing with
psql mydb myrole -f test.sql
everything goes well.
However, if I query the result, the byte array is not 16 bytes but 37 bytes:
select passwordhash,length(passwordhash) from users;
passwordhash | length
------------------------------------------------------------------------------+--------
\x45275c78363537423935344432374234414335364641393937443234413546463235363327 | 37
What is the correct syntax for this?
The format of the input file is wrong. It should be like this:
7 test1 \\x657B954D27B4AC56FA997D24A5FF2563 test@amce.org
I will have to "prepare" the data, I believe. Something like here:
t=# insert into u select 'x657B954D27B4AC56FA997D24A5FF2563';
INSERT 0 1
Time: 5990.809 ms
t=# select b from u;
b
----------------------------------------------------------------------
\x783635374239353444323742344143353646413939374432344135464632353633
(1 row)
Time: 0.234 ms
t=# insert into u select decode('657B954D27B4AC56FA997D24A5FF2563','hex');
INSERT 0 1
Time: 62.767 ms
t=# select b from u;
b
----------------------------------------------------------------------
\x783635374239353444323742344143353646413939374432344135464632353633
\x657b954d27b4ac56fa997d24a5ff2563
(2 rows)
Time: 0.208 ms
So in your case you can:
create table t as select ID,CODE,PWHASH::text,EMAIL from users where false;
COPY t (ID,CODE,PWHASH,EMAIL) FROM stdin;
insert into users select ID,CODE,decode(substr(PWHASH,4),'hex'),EMAIL from t;

Postgres copy data & evaluate expression

Is it possible for a COPY command to evaluate expressions upon insertion?
For example, consider the following table:
create table test1 ( a int, b int)
and we have a file to import:
5 , case when b = 1 then 100 else 101
25 , case when b = 1 then 100 else 101
145, case when b = 1 then 100 else 101
The following command will fail:
COPY test1 FROM 'file' USING DELIMITERS ',';
with the following error:
ERROR: invalid input syntax for integer
which means that it cannot evaluate the case expression. Is there any workaround?
The COPY command only copies data (obviously) and does not evaluate SQL code, as explained in the documentation: http://www.postgresql.org/docs/9.3/static/sql-copy.html
As far as I know, there is no workaround to make COPY evaluate SQL code.
You must preprocess your CSV file and convert it to a standard SQL script with INSERT statements of this form:
INSERT INTO your_table VALUES(145, CASE WHEN 1 = 1 THEN 100 ELSE 101 END);
Then execute the SQL script with the client you are using. E.g., with psql you would use the -f option:
psql -d your_database -f your_sql_script
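Another common workaround, sketched here under the assumption that the file can be regenerated to carry a raw flag value instead of SQL text (test1_staging and the flag column are hypothetical names): COPY into a staging table, then evaluate the expression once with INSERT ... SELECT:
-- staging table for the raw file contents
create table test1_staging (a int, flag int);
-- plain data only; no expressions in the file
copy test1_staging from '/path/to/file' with (format csv);
-- evaluate the CASE once while moving rows into the real table
insert into test1 (a, b)
select a, case when flag = 1 then 100 else 101 end
  from test1_staging;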