How to import XML data into a Postgres database table? - plpgsql

I have an XML-format file for the account table, created using this query:
SELECT table_to_xml('account', true, FALSE, '');
The table structure is:
CREATE TABLE public.account (
account_id INTEGER NOT NULL,
name VARCHAR(1) NOT NULL,
type VARCHAR(20),
group_name VARCHAR(50),
CONSTRAINT account_pkey PRIMARY KEY(account_id));
Question: how can I load data from the XML file directly into the account table in PostgreSQL?

I had to use varchar(2) due to the conversion from xml. I've used a SELECT ... INTO (which creates public.account):
select account_id::text::int,
       account_name::varchar(2),
       account_type::varchar(20),
       account_group::varchar(50)
into public.account
from (
    WITH x AS (
        SELECT '<accounts>
  <account>
    <account_id>1</account_id>
    <account_name> A </account_name>
    <account_type> off shore</account_type>
    <account_group> slush fund </account_group>
  </account>
  <account>
    <account_id>3</account_id>
    <account_name> C </account_name>
    <account_type> off shore</account_type>
    <account_group> slush fund </account_group>
  </account>
</accounts>'::xml AS t
    )
    SELECT unnest(xpath('/accounts/account/account_id/text()', t)) AS account_id,
           unnest(xpath('/accounts/account/account_name/text()', t)) AS account_name,
           unnest(xpath('/accounts/account/account_type/text()', t)) AS account_type,
           unnest(xpath('/accounts/account/account_group/text()', t)) AS account_group
    FROM x
) as accounts;
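As an aside, on PostgreSQL 10 or later the same unpivoting can be written with xmltable(), which avoids the repeated unnest()/xpath() calls. A sketch of the equivalent SELECT, reusing the same CTE x as above (note the values keep their surrounding whitespace, so trim or cast as needed):

WITH x AS (
    SELECT '<accounts>...</accounts>'::xml AS t  -- same XML literal as above
)
SELECT xt.*
FROM x,
     XMLTABLE('/accounts/account' PASSING x.t
              COLUMNS account_id    int  PATH 'account_id',
                      account_name  text PATH 'account_name',
                      account_type  text PATH 'account_type',
                      account_group text PATH 'account_group') AS xt;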
If you're interested in reading the XML file in, then this may be useful: see the Stack Exchange question "sql to read xml from file into postgresql". A sketch of that approach follows.
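A minimal sketch, assuming the file lives at /tmp/accounts.xml (a hypothetical path). pg_read_file() reads on the server, not the client; recent versions let a superuser (or a member of pg_read_server_files) pass an absolute path, while older versions restrict reads to the data directory:

WITH x AS (
    SELECT pg_read_file('/tmp/accounts.xml')::xml AS t
), a AS (
    SELECT unnest(xpath('/accounts/account/account_id/text()', t)) AS account_id,
           unnest(xpath('/accounts/account/account_name/text()', t)) AS account_name,
           unnest(xpath('/accounts/account/account_type/text()', t)) AS account_type,
           unnest(xpath('/accounts/account/account_group/text()', t)) AS account_group
    FROM x
)
INSERT INTO public.account (account_id, name, type, group_name)
SELECT account_id::text::int,
       trim(account_name::text)::varchar(1),   -- trim the padding so it fits varchar(1)
       trim(account_type::text)::varchar(20),
       trim(account_group::text)::varchar(50)
FROM a;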
I hope this helps

Related

Unable to create a table and import the data from excel using SQL query in PostgreSQL

I'm using the SQL below to create a table and copy the values from an Excel-exported CSV:
CREATE TABLE teacher (
DAILY_ALLOWANCE INTEGER
,DATE_OF_BIRTH DATE
,EMAIL VARCHAR(35)
,FIRST_NAME VARCHAR(35)
,FULL_NAME VARCHAR(35)
,ID VARCHAR(35)
,NAME VARCHAR(35)
,NUMBER_OF_STUDENTS INTEGER
,OWNERID VARCHAR(35)
,SALARY INTEGER
,TEACHER_UNIQUE_ID VARCHAR(15)
,YEARS_OF_EXPERIENCE INTEGER
) copy teacher (
DAILY_ALLOWANCE
,DATE_OF_BIRTH
,EMAIL
,FIRST_NAME
,FULL_NAME
,ID
,NAME
,NUMBER_OF_STUDENTS
,OWNERID
,SALARY
,TEACHER_UNIQUE_ID
,YEARS_OF_EXPERIENCE
)
FROM 'C:\Users\Surendra Anand R\Desktop\Note!\Files\Teacher.csv' csv header;
But I'm getting an error:
ERROR: syntax error at or near "copy"
LINE 14: ) copy teacher (
Could anyone please explain what I'm missing?
As @a_horse_with_no_name clarified, the queries need to be run one by one: each statement must be terminated with a semicolon, as in the sketch below.
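A sketch of the corrected script; the only change is the semicolon that separates CREATE TABLE from COPY, so they run as two statements:

CREATE TABLE teacher (
    DAILY_ALLOWANCE INTEGER
    ,DATE_OF_BIRTH DATE
    ,EMAIL VARCHAR(35)
    ,FIRST_NAME VARCHAR(35)
    ,FULL_NAME VARCHAR(35)
    ,ID VARCHAR(35)
    ,NAME VARCHAR(35)
    ,NUMBER_OF_STUDENTS INTEGER
    ,OWNERID VARCHAR(35)
    ,SALARY INTEGER
    ,TEACHER_UNIQUE_ID VARCHAR(15)
    ,YEARS_OF_EXPERIENCE INTEGER
);  -- this semicolon was missing

-- note: server-side COPY reads the path on the database server;
-- from psql, \copy is the client-side alternative
COPY teacher
FROM 'C:\Users\Surendra Anand R\Desktop\Note!\Files\Teacher.csv' csv header;

COPY without a column list loads every column in table order, which matches the full list in the question.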

How can I access the large object store on a foreign table?

I have set up postgres_fdw to access a 'remote' database (in fact it's on the same server). It works fine, except that one of the columns is the oid of a large object. How can I read that data?
I worked out how to do this. The large object store can also be accessed via the pg_largeobject catalog table. So I did:
create foreign table if not exists global_lo (
loid oid not null,
pageno integer not null,
data bytea
)
server glob_serv options(table_name 'pg_largeobject', schema_name 'pg_catalog');
Now I can read a large object (all of it at once; there is no streaming) with:
select data from global_lo where loid = 1234
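Note that pg_largeobject stores one row per page (2 kB by default), so this query returns several bytea chunks per object. To get a single value back, reassemble the pages in pageno order:

-- reassemble one large object from its pages; the ORDER BY matters
select string_agg(data, ''::bytea order by pageno) as lo_data
from global_lo
where loid = 1234;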
If you have access to the foreign database, you could create a view on it to convert the lobs to either bytea or text so they can be used by the local database.
On the foreign database, you would create the view:
drop view if exists tmp_view_produto_descricao;
create view tmp_view_produto_descricao as
select *
from (
    select dado.*, lo_get(dado.descricaoExtendida_oid) as descricaoEstendida
    from (
        select itm.id as item_id,
               case when itm.descricaoExtendida is null then null
                    else cast(itm.descricaoExtendida as oid)
               end as descricaoExtendida_oid
        from Item itm
        where itm.descricaoExtendida is not null
          and cast(itm.descricaoExtendida as text) != ''
    ) dado
) dado
where cast(descricaoEstendida as text) != '';
On the local database, you would declare the foreign view so you could use it:
create foreign table tmp_origem.tmp_view_produto_descricao (
item_id bigint,
descricaoExtendida_oid oid,
descricaoEstendida bytea
) server tmp_origem options (schema_name 'public');
This is slightly messier and wordier, but it will give you better performance than you would get by accessing pg_largeobject directly.
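On the local side you can then query the foreign view as usual. If the stored content is text, the bytea column can be decoded; a small sketch, assuming the content is UTF-8:

select item_id, convert_from(descricaoEstendida, 'UTF8') as descricao
from tmp_origem.tmp_view_produto_descricao;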

Psql insert large sql data

I have searched around Stack Overflow for relevant problems, but I did not find any.
I have a table defined as follows (call this file create_table.sql):
CREATE TABLE object (
id BIGSERIAL PRIMARY KEY,
name_c VARCHAR(10) NOT NULL,
create_timestamp TIMESTAMP NOT NULL,
change_timestamp TIMESTAMP NOT NULL,
full_id VARCHAR(10),
mod VARCHAR(10) NOT NULL CONSTRAINT mod_enum CHECK (mod IN ('original', 'old', 'delete')),
status VARCHAR(10) NOT NULL CONSTRAINT status_enum CHECK (status IN ('temp', 'good', 'bad')),
vers VARCHAR(10) NOT NULL REFERENCES vers (full_id),
frame_id BIGINT NOT NULL REFERENCES frame (id),
name VARCHAR(10),
definition VARCHAR(10),
order_ref BIGINT REFERENCES order_ref (id),
UNIQUE (id, name_c)
);
This table is stored in Google Cloud. I have about 200,000 rows to insert, and I use an "insert block" method that looks like this (call this file object_name.sql):
INSERT INTO object(
name_c,
create_timestamp,
change_timestamp,
full_id,
mod,
status,
vers,
frame_id,
name)
VALUES
('Element', current_timestamp, current_timestamp, 'Element:1', 'current', 'temp', 'V1', (SELECT id FROM frame WHERE frame_id='Frame:data'), 'Description to element 1'),
('Element', current_timestamp, current_timestamp, 'Element:2', 'current', 'temp', 'V1', (SELECT id FROM frame WHERE frame_id='Frame:data'), 'Description to element 2'),
...
...
('Element', current_timestamp, current_timestamp, 'Element:200000', 'current', 'temp', 'V1', (SELECT id FROM frame WHERE frame_id='Frame:data'), 'Description to object 200000');
I have a bash script in which a psql command uploads the data in object_name.sql to the table in Google Cloud:
PGPASSWORD=password psql -d database --username username --port 1234 --host 11.111.111 << EOF
BEGIN;
\i object_name.sql
COMMIT;
EOF
(Source: single transaction)
When I run this, I get this error:
BEGIN
psql:object_name.sql:60002: SSL SYSCALL error: EOF detected
psql:object_name.sql:60002: connection to server was lost
The current "solution" I have done now, is to chunk the file so each file can only have max 10000 insert statements. Running the psql command on these files works, but they take around 7 min.
Instead of having one file with 200000 insert statements, I divided them into 12 files where each file had max 10000 insert statements.
My questions:
1. Is there a limit on how large a file can be?
2. I also saw this post about how to speed up inserts, but I could not get COPY to work (a sketch of a COPY-based approach follows below).
I hope someone out there has time to help me 🙂
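One common way to make COPY work despite the frame_id subquery is to stage the raw values first and resolve the reference in a single INSERT ... SELECT afterwards. A minimal sketch, in which object_name.csv and the staging column names are assumptions:

-- Hypothetical staging table; plain text columns avoid length errors during the load.
CREATE TEMP TABLE object_stage (
    name_c        text,
    full_id       text,
    mod           text,
    status        text,
    vers          text,
    frame_full_id text,   -- holds 'Frame:data' etc.
    name          text
);

-- From psql; \copy streams the file from the client side:
-- \copy object_stage FROM 'object_name.csv' WITH (FORMAT csv)

INSERT INTO object (name_c, create_timestamp, change_timestamp,
                    full_id, mod, status, vers, frame_id, name)
SELECT s.name_c, current_timestamp, current_timestamp,
       s.full_id, s.mod, s.status, s.vers, f.id, s.name
FROM object_stage s
JOIN frame f ON f.frame_id = s.frame_full_id;

COPY skips per-row statement overhead, which is why it is typically much faster than chunked INSERTs at this scale.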

PostgreSQL: foreign table and discarded records

Where does PostgreSQL store records that were discarded from the foreign table during a SELECT? I have the following table:
CREATE FOREIGN TABLE ext.alternatenamesext (
altid BIGINT,
geoid BIGINT,
isolanguage VARCHAR(7),
alternatename TEXT,
ispreferredname INTEGER,
isshortname INTEGER,
iscolloquial INTEGER,
ishistoric INTEGER
)
SERVER edrive_server
OPTIONS (
delimiter E'\t',
encoding 'UTF-8',
filename '/mnt/storage/edrive/data/alternateNames.txt',
format 'csv');
alternateNames.txt contains ~11 million records, but when I do "SELECT * FROM ext.alternatenamesext" it returns only ~9.5 million. Where are the remaining ~1.5 million? Is there a way to put them into a separate file, like Oracle's sql*ldr does?
The problem was solved by the following CREATE FOREIGN TABLE ... syntax. (The likely cause of the missing rows: in csv format a bare double quote starts a quoted field, which can silently swallow several physical lines into one logical row; the text format applies no quote processing, so every line yields a row.)
CREATE FOREIGN TABLE ext.alternatenamesext (
altid BIGINT,
geoid BIGINT,
isolanguage VARCHAR(7),
alternatename VARCHAR(400),
isPreferredName INT,
isShortName INT,
isColloquial INT,
isHistoric INT
)
SERVER edrive_server
OPTIONS (
delimiter E'\t',
encoding 'UTF-8',
filename '/mnt/storage/edrive/data/alternateNames.txt',
format 'text', -- not 'csv'!
null ''); -- treat empty strings as NULL (something like TRAILING NULLCOLS in Oracle's SQL*Loader, I guess)

How to insert values from another table in PostgreSQL?

I have a table which references other tables:
CREATE TABLE scratch
(
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
rep_id INT NOT NULL REFERENCES reps,
term_id INT REFERENCES terms
);
CREATE TABLE reps (
id SERIAL PRIMARY KEY,
rep TEXT NOT NULL UNIQUE
);
CREATE TABLE terms (
id SERIAL PRIMARY KEY,
terms TEXT NOT NULL UNIQUE
);
I wish to add a new record to scratch given the name, the rep and the terms values, i.e. I have neither corresponding rep_id nor term_id.
Right now the only idea that I have is:
insert into scratch (name, rep_id, term_id)
values ('aaa', (select id from reps where rep='Dracula' limit 1), (select id from terms where terms='prepaid' limit 1));
My problem is this. I am trying to use the parameterized query API (from node using the node-postgres package), where an insert query looks like this:
insert into scratch (name, rep_id, term_id) values ($1, $2, $3);
and then an array of values for $1, $2 and $3 is passed as a separate argument. Once I am comfortable with parameterized queries, the idea is to promote them to prepared statements, the most efficient and safest way to query the database.
However, I am puzzled as to how I can do this with my example, where different tables have to be subqueried.
P.S. I am using PostgreSQL 9.2 and have no problem with a PostgreSQL specific solution.
EDIT 1
C:\Users\markk>psql -U postgres
psql (9.2.4)
WARNING: Console code page (437) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
Type "help" for help.
postgres=# \c dummy
WARNING: Console code page (437) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
You are now connected to database "dummy" as user "postgres".
dummy=# DROP TABLE scratch;
DROP TABLE
dummy=# CREATE TABLE scratch
dummy-# (
dummy(# id SERIAL NOT NULL PRIMARY KEY,
dummy(# name text NOT NULL UNIQUE,
dummy(# rep_id integer NOT NULL,
dummy(# term_id integer
dummy(# );
NOTICE: CREATE TABLE will create implicit sequence "scratch_id_seq" for serial column "scratch.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "scratch_pkey" for table "scratch"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "scratch_name_key" for table "scratch"
CREATE TABLE
dummy=# DEALLOCATE insert_scratch;
ERROR: prepared statement "insert_scratch" does not exist
dummy=# PREPARE insert_scratch (text, text, text) AS
dummy-# INSERT INTO scratch (name, rep_id, term_id)
dummy-# SELECT $1, r.id, t.id
dummy-# FROM reps r, terms t
dummy-# WHERE r.rep = $2 AND t.terms = $3
dummy-# RETURNING id, name, $2 rep, $3 terms;
PREPARE
dummy=# DEALLOCATE insert_scratch2;
ERROR: prepared statement "insert_scratch2" does not exist
dummy=# PREPARE insert_scratch2 (text, text, text) AS
dummy-# INSERT INTO scratch (name, rep_id, term_id)
dummy-# VALUES ($1, (SELECT id FROM reps WHERE rep=$2 LIMIT 1), (SELECT id FROM terms WHERE terms=$3 LIMIT 1))
dummy-# RETURNING id, name, $2 rep, $3 terms;
PREPARE
dummy=# EXECUTE insert_scratch ('abc', 'Snowhite', '');
id | name | rep | terms
----+------+-----+-------
(0 rows)
INSERT 0 0
dummy=# EXECUTE insert_scratch2 ('abc', 'Snowhite', '');
id | name | rep | terms
----+------+----------+-------
1 | abc | Snowhite |
(1 row)
INSERT 0 1
dummy=# EXECUTE insert_scratch ('abcd', 'Snowhite', '30 days');
id | name | rep | terms
----+------+----------+---------
2 | abcd | Snowhite | 30 days
(1 row)
INSERT 0 1
dummy=# EXECUTE insert_scratch2 ('abcd2', 'Snowhite', '30 days');
id | name | rep | terms
----+-------+----------+---------
3 | abcd2 | Snowhite | 30 days
(1 row)
INSERT 0 1
dummy=#
EDIT 2
We can utilize the fact that rep_id is required while term_id is optional, and use the following version of INSERT ... SELECT:
PREPARE insert_scratch (text, text, text) AS
INSERT INTO scratch (name, rep_id, term_id)
SELECT $1, r.id, t.id
FROM reps r
LEFT JOIN terms t ON t.terms = $3
WHERE r.rep = $2
RETURNING id, name, $2 rep, $3 terms;
This version, however, has two problems:
No distinction is made between a missing terms value (i.e. '') and an invalid terms value (i.e. a non-empty value missing from the terms table entirely). Both are treated as missing terms. (The INSERT with two subqueries suffers from the same problem.)
The version depends on the fact that the rep is required. But what if rep_id was optional too?
EDIT 3
Found the solution for item 2: eliminating the dependency on rep being required. Also, the WHERE-clause version has the problem that the SQL does not fail if the rep is invalid; it just inserts 0 rows, whereas I want an explicit failure in that case. My solution is simply a dummy one-row CTE:
PREPARE insert_scratch (text, text, text) AS
WITH stub(x) AS (VALUES (0))
INSERT INTO scratch (name, rep_id, term_id)
SELECT $1, r.id, t.id
FROM stub
LEFT JOIN terms t ON t.terms = $3
LEFT JOIN reps r ON r.rep = $2
RETURNING id, name, rep_id, term_id;
If rep is missing or invalid, this SQL will try to insert NULL into the rep_id field, and since the field is NOT NULL, an error is raised, which is precisely what I need. And if I later decide to make rep optional, no problem: the same SQL works for that too.
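To illustrate the failure mode, a hypothetical run (the exact error wording is from memory and may vary by version):

EXECUTE insert_scratch ('xyz', 'NoSuchRep', '30 days');
-- ERROR:  null value in column "rep_id" violates not-null constraint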
INSERT into scratch (name, rep_id, term_id)
SELECT 'aaa'
, r.id
, t.id
FROM reps r , terms t -- essentially a cross join
WHERE r.rep = 'Dracula'
AND t.terms = 'prepaid'
;
Notes:
You don't need the ugly LIMITs, since r.rep and t.terms are unique (candidate keys)
you could replace the FROM a, b with FROM a CROSS JOIN b
the scratch table will probably need a UNIQUE constraint on (rep_id, term_id) (the nullability of term_id is questionable)
UPDATE: the same query as a prepared statement, as found in the documentation:
PREPARE hoppa (text, text,text) AS
INSERT into scratch (name, rep_id, term_id)
SELECT $1 , r.id , t.id
FROM reps r , terms t -- essentially a cross join
WHERE r.rep = $2
AND t.terms = $3
;
EXECUTE hoppa ('bbb', 'Dracula' , 'prepaid' );
SELECT * FROM scratch;
UPDATE2: test data
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE reps ( id SERIAL PRIMARY KEY, rep TEXT NOT NULL UNIQUE);
CREATE TABLE terms ( id SERIAL PRIMARY KEY, terms TEXT NOT NULL UNIQUE);
CREATE TABLE scratch ( id SERIAL PRIMARY KEY, name TEXT NOT NULL, rep_id INT NOT NULL REFERENCES reps, term_id INT REFERENCES terms);
INSERT INTO reps(rep) VALUES( 'Dracula' );
INSERT INTO terms(terms) VALUES( 'prepaid' );
Results:
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table tmp.reps
drop cascades to table tmp.terms
drop cascades to table tmp.scratch
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
PREPARE
INSERT 0 1
id | name | rep_id | term_id
----+------+--------+---------
1 | aaa | 1 | 1
2 | bbb | 1 | 1
(2 rows)