Load contents from a CSV file into a PostgreSQL table - postgresql

Below is a description of the procedure I went through to try to load data from a file into a PostgreSQL 8.0 database running on a Red Hat Linux 7.2 host.
Now, my issue is that the FOR EACH ROW trigger is getting called and the procedure is executing once for every row.
What I'd like it to do, however, is check the appropriate row of my table once I have supplied the filename, and decide based on the contents of that record whether to do a DUMP BULK DATA or a DUMP WHOLE CSV FILE, only once (on the trigger) rather than per row.
Please help me solve this issue...
My logfile.tmp is as follows:
27/Apr/2013:17:03:42 +0530#192.168.1.3#16#0##$http://localhost/images/banner-left.jpg##$10.1ff.ff.ff#-#Y#-
27/Apr/2013:17:03:42 +0530#192.168.1.3#16#0##$http://localhost/images/banner-left.jpg##$10.ff.ff.2ff05#-#Y#-
The COPY command I am using:
/usr/local/pgsql/bin/psql localhost -d d1 -U u1 -tc "COPY tblaccesslog ( accesstime, clientip, username, request, bytes, urlpath, url, contenttype, issite, webcatname ) FROM 'logfile.tmp' WITH DELIMITER AS '#';" >> /tmp/parselog.log 2>&1
The trigger (insert_accesslog_trigger) in question:
insert_accesslog_trigger BEFORE INSERT ON tblaccesslog FOR EACH ROW EXECUTE PROCEDURE accesslog_insert_trigger()
and finally the trigger function (accesslog_insert_trigger()) being used:
accesslog_insert_trigger()
DECLARE
    tablemaxtuples NUMERIC(10);
    tableno        NUMERIC(10);
    newtable       TEXT;
    query          TEXT;
    tablecount     NUMERIC(10);
    min_limit      NUMERIC(10);
    max_limit      NUMERIC(10);
BEGIN
    tablemaxtuples := 100000;
    tableno := ( NEW.id - ( NEW.id % tablemaxtuples ) ) / tablemaxtuples + 1;
    newtable := 'tblaccesslog'||to_char(CURRENT_DATE,'YYYYMMDD')||'_child_'||tableno;
    SELECT trim(count(tablename)) INTO tablecount FROM pg_tables WHERE tablename = newtable;
    IF tablecount = 0 THEN
        min_limit := (tableno - 1) * tablemaxtuples;
        max_limit := min_limit + tablemaxtuples;
        query := 'CREATE TABLE '||newtable||' ( PRIMARY KEY (id), CHECK ( id >= '||min_limit||' AND id < '||max_limit||' ) ) INHERITS (tblaccesslog)';
        EXECUTE query;
    END IF;
    query := 'INSERT INTO '||newtable||' ( id, username, clientip, url, accesstime, requestbytes, contenttype, issite, urlpath, webcatname ) VALUES ('||NEW.id||','''||NEW.username||''','''||NEW.clientip||''','''||NEW.url||''','''||NEW.accesstime||''','''||NEW.requestbytes||''','''||NEW.contenttype||''','''||NEW.issite||''','''||replace(NEW.urlpath,'\'','')||''','''||NEW.webcatname||''')';
    EXECUTE query;
    RETURN NULL;
END;

The PostgreSQL documentation overview of triggers makes clear that there is no type of trigger that suits your requirements: a FOR EACH ROW trigger will, as its name says, be executed once for each row, and as the manual page states "Statement-level triggers do not currently have any way to examine the individual row(s) modified by the statement."
However, what you can do instead is put your actual COPY command inside a function. The function could COPY the file into a temporary table and then perform the appropriate steps to determine where the data should go from there.
Then your copy command (which I'm guessing is in a cron job or similar) would just run SELECT bulk_insert_access_log(); rather than the long line currently listed.
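For illustration, here is a minimal sketch of what such a wrapper function might look like, assuming the staging-table approach described above. The name staging_accesslog is made up, and on servers older than 8.3 the statements touching the temp table may need to run via EXECUTE because of plan caching:
CREATE OR REPLACE FUNCTION bulk_insert_access_log() RETURNS void AS $$
DECLARE
    batch_rows bigint;
BEGIN
    -- Stage the whole file first; ON COMMIT DROP cleans the table up automatically.
    CREATE TEMP TABLE staging_accesslog (LIKE tblaccesslog) ON COMMIT DROP;
    COPY staging_accesslog ( accesstime, clientip, username, request, bytes, urlpath, url, contenttype, issite, webcatname ) FROM 'logfile.tmp' WITH DELIMITER AS '#';
    -- The whole batch is now visible at once, so the DUMP BULK DATA vs
    -- DUMP WHOLE CSV FILE decision can be made a single time here,
    -- for example by inspecting a control record or the batch size.
    SELECT count(*) INTO batch_rows FROM staging_accesslog;
    IF batch_rows > 0 THEN
        INSERT INTO tblaccesslog SELECT * FROM staging_accesslog; -- or insert straight into the right child table
    END IF;
END;
$$ LANGUAGE plpgsql;
The cron job would then call it with something like: /usr/local/pgsql/bin/psql localhost -d d1 -U u1 -tc "SELECT bulk_insert_access_log();"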

Related

postgresql unique non-sequential id for url

I've looked at a few methods of creating alphanumeric IDs on Stack Overflow, but they all had their weaknesses: some did not check for collisions, and others used sequences, which are not a good option when using logical replication.
After some Googling I found a website with the following script, which checks for collisions and does not use sequences. However, this is done as a trigger when a row is inserted into the table.
-- Create a trigger function that takes no arguments.
-- Trigger functions automatically have OLD, NEW records
-- and TG_TABLE_NAME as well as others.
-- Requires the pgcrypto extension for gen_random_bytes().
CREATE OR REPLACE FUNCTION unique_short_id()
RETURNS TRIGGER AS $$
-- Declare the variables we'll be using.
DECLARE
    key TEXT;
    qry TEXT;
    found TEXT;
BEGIN
    -- Generate the first part of a query as a string with a safely
    -- escaped table name, using || to concat the parts.
    qry := 'SELECT id FROM ' || quote_ident(TG_TABLE_NAME) || ' WHERE id=';
    -- This loop will probably only run once per call until we've generated
    -- millions of ids.
    LOOP
        -- Generate our random bytes and re-encode them as a base64 string.
        key := encode(gen_random_bytes(6), 'base64');
        -- Base64 encoding contains 2 URL-unsafe characters by default;
        -- the URL-safe version uses these replacements.
        key := replace(key, '/', '_'); -- url-safe replacement
        key := replace(key, '+', '-'); -- url-safe replacement
        -- Concat the generated key (safely quoted) with the generated query
        -- and run it, e.g. SELECT id FROM "test" WHERE id='blahblah' INTO found.
        -- Afterwards "found" holds the duplicated id, or NULL if there is none.
        EXECUTE qry || quote_literal(key) INTO found;
        -- Check whether found IS NULL. Testing found = NULL would never
        -- be TRUE, because (NULL = NULL) yields NULL, not TRUE.
        IF found IS NULL THEN
            -- We didn't find a collision, so leave the LOOP.
            EXIT;
        END IF;
        -- We haven't EXITed yet, so return to the top of the LOOP
        -- and try again.
    END LOOP;
    -- NEW and OLD are available in trigger procedures.
    -- NEW is the mutated row that will actually be INSERTed.
    -- We're replacing id, regardless of what it was before,
    -- with our key variable.
    NEW.id := key;
    -- The record returned here is what will actually be INSERTed,
    -- or what the next trigger will get if there is one.
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
I have a table which already contains data, and I have added a new column called pid. Would it be possible to modify this and use the function call as a default, so that all my prior data gets a short id?
Suppose you have a table test:
DROP TABLE IF EXISTS test;
CREATE TABLE test (foo text, bar int);
INSERT INTO test (foo, bar) VALUES ('A', 1), ('B', 2);
You could add an id column to it:
ALTER TABLE test ADD COLUMN id text;
and attach the trigger:
DROP TRIGGER IF EXISTS unique_short_id_on_test ON test;
CREATE TRIGGER unique_short_id_on_test
BEFORE INSERT ON test
FOR EACH ROW EXECUTE PROCEDURE unique_short_id();
Now make a temporary table, temp, with the same structure as test (but with no data):
DROP TABLE IF EXISTS temp;
CREATE TABLE temp (LIKE test INCLUDING ALL);
CREATE TRIGGER unique_short_id_on_temp
BEFORE INSERT ON temp
FOR EACH ROW EXECUTE PROCEDURE unique_short_id();
Pouring test into temp:
INSERT INTO temp (foo, bar)
SELECT foo, bar
FROM test
RETURNING *
yields something like:
| foo | bar | id       |
|-----+-----+----------|
| A   | 1   | 9yt9XQwm |
| B   | 2   | LCeiA-P8 |
If other tables have foreign key references on the test table or if test must remain online,
it may not be possible to drop test and rename temp to test.
Instead, it is safer to update test with the ids from temp.
Assuming test has a primary key (for concreteness, let's call it testid), then
you could update test with the ids from temp using:
UPDATE test
SET id = temp.id
FROM temp
WHERE test.testid = temp.testid;
Then you could drop the temp table:
DROP TABLE temp;

Postgres SQL - error: must be superuser to copy to or from a file

I have copied a function (from one of the web portals, modified accordingly) to copy data from a CSV file to a table.
create or replace function public.load_csv_file
(
    target_table text,
    csv_path     text,
    col_count    integer
)
returns void as $$
declare
    iter      integer; -- dummy integer to iterate columns with
    col       text;    -- variable to keep the column name at each iteration
    col_first text;    -- first column name, e.g. top left corner of the csv file or spreadsheet
begin
    set schema 'public';
    create table insert_from_csv ();
    -- add just enough columns
    for iter in 1..col_count
    loop
        execute format('alter table insert_from_csv add column col_%s text;', iter);
    end loop;
    -- copy the data from the csv file
    execute format('copy insert_from_csv from %L with delimiter '','' quote ''"'' csv ', csv_path);
    iter := 1;
    col_first := (select col_1 from insert_from_csv limit 1);
    -- rename the columns based on the first row, which holds the column names
    for col in execute format('select unnest(string_to_array(trim(insert_from_csv::text, ''()''), '','')) from insert_from_csv where col_1 = %L', col_first)
    loop
        execute format('alter table insert_from_csv rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;
    -- delete the header row
    execute format('delete from insert_from_csv where %s = %L', col_first, col_first);
    -- rename the table to the name given as a parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table insert_from_csv rename to %I', target_table);
    end if;
end;
$$ language plpgsql;
And I am passing parameters as
select load_csv_file('Customer','C:\Insert_postgres.csv' ,4)
but I am getting the error message
ERROR: must be superuser to COPY to or from a file
Hint: Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.
The idea is, I will create an automated test, and if I want to test on a different instance then the test should automatically create the function and copy data from the CSV file.
Is there any workaround to copy data without superuser?
It looks like Insert_postgres.csv is on the C drive root, which usually does not have read/write permission. Move the file to a directory where read/write is granted at least to some groups or to everyone.
Hope this resolves the issue.
Try running the following command. It allows your user to use the COPY command, since you can't COPY to or from a file without being a superuser.
ALTER USER <username> WITH SUPERUSER;
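Alternatively, as the hint says, psql's \copy works for any user, because the file is read by the psql client rather than by the server process, so no superuser rights are needed. A rough sketch of loading the same file that way, assuming the target table already exists (the database and user names here are placeholders):
psql -d mydb -U myuser -c "\copy Customer FROM 'C:\Insert_postgres.csv' WITH (FORMAT csv)"
Note that \copy is a psql meta-command, so it cannot be called from inside the PL/pgSQL function; it would have to replace the server-side COPY step.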

PostgreSQL trigger not firing on update

I am using PostgreSQL 9.3.4
There are two linked tables updated remotely: table1 and table2;
table2 has a foreign key dependency on table1.
The tables are updated in the following manner on the remote server:
Insert into table1 returning id;
Insert into table2 using the previous id.
The query is sent as one transaction with two insert statements.
I need to duplicate the new rows to a remote db using dblink, so I created two BEFORE INSERT OR UPDATE triggers for table1 and table2.
The problem is that only the table2 trigger is firing; the first one isn't (this happens on the remote update;
running a test query from pgAdmin under the same user, both triggers fire OK).
I assumed it is because the update is processed as one transaction/query on the remote server, so I tried to process both tables in the second trigger, but still no luck: only table2 is processed.
What could be the reason?
Thanks
P.S.
Trigger codes
Version 1
PROCEDURE fn_replicate_data:
DECLARE
BEGIN
    PERFORM DBLINK_EXEC('myconn','INSERT INTO table1(dataid,sessionid,uid) VALUES('||new.dataid||','||new.sessionid||','||new.uid||') ');
    RETURN new;
END;
PROCEDURE fn_replicate_data2:
DECLARE
BEGIN
    PERFORM DBLINK_EXEC('myconn','INSERT INTO table2(dataid,data) VALUES('||new.dataid||','''||new.data||''') ');
    RETURN new;
END;
CREATE TRIGGER tr_remote_insert_data
BEFORE INSERT OR UPDATE ON table1
FOR EACH ROW EXECUTE PROCEDURE fn_replicate_data();
CREATE TRIGGER tr_remote_insert_data2
BEFORE INSERT OR UPDATE ON table2
FOR EACH ROW EXECUTE PROCEDURE fn_replicate_data2();
Version 2
PROCEDURE fn_replicate_data:
DECLARE
    var table1%ROWTYPE;
BEGIN
    SELECT * INTO var FROM table1 WHERE dataid = new.dataid;
    PERFORM DBLINK_EXEC('myconn','INSERT INTO table1(dataid,sessionid,uid) VALUES('||var.dataid||','||var.sessionid||','||var.uid||') ');
    PERFORM DBLINK_EXEC('myconn','INSERT INTO table2(dataid,data) VALUES('||new.dataid||','''||new.data||''') ');
    RETURN new;
END;
CREATE TRIGGER tr_remote_insert_data
BEFORE INSERT OR UPDATE ON table2
FOR EACH ROW EXECUTE PROCEDURE fn_replicate_data();
The reason was a NULL value in the uid field. It was of type bigint and had no default value in the db, which caused the trigger to not work properly: concatenating the NULL into the query string made the whole string NULL, so dblink_exec received no query at all.
The fix is either
IF NEW.uid IS NULL THEN
    uid := 'DEFAULT';
ELSE
    uid := NEW.uid;
END IF;
before building the insert query, or (simpler) adding a default value in the db.
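Another way to express the same fix is to stop a NULL from swallowing the whole command string. A sketch using coalesce (the column and connection names are taken from the question's trigger):
PERFORM DBLINK_EXEC('myconn',
    'INSERT INTO table1(dataid,sessionid,uid) VALUES('
    || new.dataid || ','
    || new.sessionid || ','
    || coalesce(new.uid::text, 'DEFAULT') -- emit the DEFAULT keyword when uid is NULL
    || ')');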

Error while passing parameters to UDA from nzplsql in netezza

I have a table test, which has columns listed as x0, x1, x2, x3.
I have a UDA which takes in two columns as arguments and does some computation.
I am trying to call the UDA from my nzplsql.
When I directly call the UDA like:
create table newtable as select ncorrFactor(x0,x2) from test;
It works
but when I try to do this:
p varchar;
p := X || 0 || '';
create table newtable as select ncorrFactor(p,x2) from test;
It gives me this error:
ERROR: pg_atoi: error in "x0": can't parse "x0"
What do I need to fix?
Assuming the first snippet is a stored procedure written in NZPLSQL: for p to be treated as the column x0, you need to build your query dynamically and use "execute immediate" on that query.
eg:
declare
    query varchar;
begin
    query := 'create table newtable as select ncorrFactor(x' || 0 || ',x2) from test';
    execute immediate query;
end;

postgresql copy with schema support

I'm trying to load some data from CSV using the postgresql COPY command. The trick is that I'd like to implement multi-tenancy on a userid (which is contained in the CSV). Is there an easy way to tell the postgres copy command to filter based on this userid when loading the csv?
i.e. all rows with userid=x go to schema=x, rows with userid=y go to schema=y.
There is no way of doing this with just the COPY command, but you could copy all your data into a master table and then put together a simple PL/pgSQL function that does the routing for you. Something like this -
CREATE OR REPLACE FUNCTION public.spike()
  RETURNS void AS
$BODY$
DECLARE
    user_id            integer;
    destination_schema text;
BEGIN
    FOR user_id IN SELECT userid FROM master_table GROUP BY userid LOOP
        CASE user_id
            WHEN 1 THEN
                destination_schema := 'foo';
            WHEN 2 THEN
                destination_schema := 'bar';
            ELSE
                destination_schema := 'baz';
        END CASE;
        EXECUTE 'INSERT INTO '|| destination_schema ||'.my_table SELECT * FROM master_table WHERE userid=$1' USING user_id;
        -- EXECUTE 'DELETE FROM master_table WHERE userid=$1' USING user_id;
    END LOOP;
    TRUNCATE TABLE master_table;
    RETURN;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
This gets all unique user_ids from master_table, uses a CASE statement to determine the destination schema, and then executes an INSERT ... SELECT to move the rows; finally it truncates master_table to clear out the moved rows (the per-user DELETE is left commented out as an alternative).
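End to end, the load would then be a plain COPY into the master table followed by a call to the function; a short sketch, where the file path and column list are illustrative:
COPY master_table (userid, payload) FROM '/tmp/tenants.csv' WITH CSV;
SELECT public.spike();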