How to COPY CSV data as JSON fields in PostgreSQL

Is there a way to COPY CSV file data directly into a JSON or jsonb array?
Example:
CREATE TABLE mytable (
  id serial PRIMARY KEY,
  info jsonb -- or JSON
);
COPY mytable(info) FROM '/tmp/myfile.csv' CSV HEADER;
NOTE: each CSV line is mapped to a JSON array. It is a normal CSV.
Normal CSV (no embedded JSON)... /tmp/myfile.csv =
a,b,c
100,Mum,Dad
200,Hello,Bye
The correct COPY command must be equivalent to the usual COPY below.
Usual COPY (ugly but works fine)
CREATE TEMPORARY TABLE temp1 (
  a int, b text, c text
);
COPY temp1(a,b,c) FROM '/tmp/myfile.csv' CSV HEADER;
INSERT INTO mytable(info) SELECT json_build_array(a,b,c) FROM temp1;
It is ugly because:
it needs a priori knowledge of the fields, and a prior CREATE TABLE to match them;
for "big data" it needs a big temporary table, wasting CPU, disk and my time (and the table mytable has CHECK and UNIQUE constraints that are evaluated for each line);
it needs more than one SQL command.

Perfect solution!
No need to know all the CSV columns, only extract the ones you know.
Run CREATE EXTENSION plpythonu; in SQL: if the command produces an error like "could not open extension control file ... No such file", you need to install the extra PL/Python packages. On standard Ubuntu (16.04 LTS) it is simple: apt install postgresql-contrib postgresql-plpython.
CREATE FUNCTION get_csvfile(
  file text,
  delim_char char(1) = ',',
  quote_char char(1) = '"')
returns setof text[] stable language plpythonu as $$
import csv
# plpythonu is Python 2, where the csv module wants files opened in binary mode
reader = csv.reader(
    open(file, 'rb'),
    quotechar=quote_char,
    delimiter=delim_char,
    skipinitialspace=True,
    escapechar='\\'
)
reader.next()  # skip the header line, mirroring COPY ... CSV HEADER
# each remaining CSV line is returned as one text[] set element
return reader
$$;
INSERT INTO mytable(info)
SELECT jsonb_build_array(c[1],c[2],c[3])
FROM get_csvfile('/tmp/myfile.csv') c;
This is based on the split_csv() function defined here. Python's csv.reader is very reliable (!).
Not tested on very big CSV files... but Python is expected to do the job.
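For example, a minimal sketch that extracts only the columns you know (here the first two), building a JSON object instead of an array; the key names 'a' and 'b' are simply the header labels from the question's CSV:
INSERT INTO mytable(info)
SELECT jsonb_build_object('a', c[1], 'b', c[2]) -- extra CSV columns are simply ignored
FROM get_csvfile('/tmp/myfile.csv') c;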
PostgreSQL workaround
It is not a perfect solution, but it solves the main problem, the
"... big temporary table, so lost CPU, disk and my time" one.
This is the way we do it, a workaround with file_fdw!
1. Adopt conventions to avoid file-copy and file-permission confusion... Use a standard file path for the CSV. Example: /tmp/pg_myPrj_file.csv
2. Initialise your database or SQL script with the magic extension:
CREATE EXTENSION file_fdw;
CREATE SERVER files FOREIGN DATA WRAPPER file_fdw;
3. For each new CSV file, myNewData.csv:
3.1. Make a symbolic link (or an scp remote copy) for your new file: ln -sf $PWD/myNewData.csv /tmp/pg_myPrj_file.csv
3.2. Configure file_fdw for your new table (suppose mytable):
CREATE FOREIGN TABLE temp1 (a int, b text, c text)
SERVER files OPTIONS (
  filename '/tmp/pg_myPrj_file.csv',
  format 'csv',
  header 'true'
);
PS: if you hit a permission problem after running the SQL script with psql, change the owner of the link: sudo chown -h postgres:postgres /tmp/pg_myPrj_file.csv.
3.3. Use the file_fdw table as source (suppose populating mytable):
INSERT INTO mytable(info)
SELECT json_build_array(a,b,c) FROM temp1;
Thanks to @JosMac (and his tutorial)!
NOTE: if there is a STDIN way to do it (does one exist?), it would be easy, avoiding permission problems and the use of absolute paths. See this answer/discussion.
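One client-side sketch in that direction uses psql's \copy, which runs COPY ... FROM STDIN under the hood and streams the file from the client, so no server-side file permissions or absolute server paths are involved; it still needs the temporary table from the question, though:
CREATE TEMPORARY TABLE temp1 (a int, b text, c text);
\copy temp1(a,b,c) FROM 'myNewData.csv' CSV HEADER
INSERT INTO mytable(info) SELECT json_build_array(a,b,c) FROM temp1;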

Related

Populate a table using COPY into an extension?

I am making an extension in Postgres.
To do that, I made a plain-text backup of my functions, types, etc., and I use this file for my extension.
Now I want to add an auxiliary table too. But the dump of the table in the file looks like this (after it has created the table "tAcero" and the sequence):
COPY sdmed."tAcero" (id, area, masa, tipo, tamanno) FROM stdin;
44 65.30 502.000 HEB 180
45 78.10 601.000 HEB 200
.....
more values
\.
and I wonder if it would be possible to use this COPY statement to populate the table in the extension, or can I only do it using INSERT?
Thank you.
You can indeed load tables in PostgreSQL using the COPY statement.
An example using the psql client and a CSV file:
CREATE TABLE test_of_copy (my_column text);
\copy test_of_copy FROM './a_file_stored_locally' CSV HEADER;
Where the contents of a_file_stored_locally are:
my_column
"test_input"
Please have a read of the documentation: https://www.postgresql.org/docs/9.2/sql-copy.html. If you have any issues with this, perhaps add some more detail to your question.
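Note also that the dump fragment from your question can be replayed as-is by feeding the script to psql, which reads the inline tab-separated rows following COPY ... FROM stdin; up to the \. terminator. A sketch, with mydb and mydump.sql as placeholder names:
psql -d mydb -f mydump.sql
No conversion to INSERT statements is needed for plain loading this way.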

Downloading a binary file from a bytea field in postgres

I successfully put a binary file (a jpg) into a bytea field in postgres, with the following code.
CREATE TABLE file_locker_test
(
ID integer PRIMARY KEY,
THE_FILE_ITSELF bytea
);
INSERT INTO file_locker_test (ID, THE_FILE_ITSELF)
VALUES (1, bytea('\\Users\\My Name\\Pictures\\picture.jpg'));
Now, I'm trying to download the file back to make sure that it has uploaded correctly.
I tried this:
\copy (SELECT encode(the_file_itself, 'hex') FROM file_locker_test LIMIT 1) TO '\\Users\\My Name\\Desktop\\picture.hex';
And got this error:
//Users/My Name/Desktop/picture.hex: No such file or directory
Does anyone have any insights?
You are confused.
The INSERT didn't insert the binary file, but the (binary) string \\Users\\My Name\\Pictures\\picture.jpg.
The file /Users/My Name/Desktop/picture.hex cannot be created on the database server, probably because one of the directories on the path does not exist.
If you want to insert the content of the binary file into the database, you'll have to write a program that opens and reads the file into memory and then inserts that.
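A minimal alternative sketch, in case the file sits on the database server itself and your role is allowed to read server files (superuser, or a suitable grant such as membership in pg_read_server_files): the built-in pg_read_binary_file() returns the file content as bytea. The path below is hypothetical:
INSERT INTO file_locker_test (id, the_file_itself)
VALUES (2, pg_read_binary_file('/tmp/picture.jpg')); -- server-side read into bytea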

COPY FROM csv file into Postgresql table and skip id first row

Simple question, I think, but I can't seem to find the answer through Googling etc.
I am importing csv data into a postgresql table via psql. I can do this fine through the pgAdmin III GUI but am now using Codio Online IDE where it is all done through psql.
How can I import into the Postgresql table and skip the first 'id' auto incrementing column?
In pgAdmin it was as simple as unselecting the id column on the 'columns to import' tab.
So far I have in the SQL Query toolbox
COPY products FROM '/media/username/rails_projects/app/db/import/bdname_products.csv' DELIMITER ',' CSV;
Alternatively, is it possible to see the SQL that pgAdmin III ran after you execute an import using the menu's Import command?
Thank you for your consideration.
As explained in the manual, COPY allows you to specify a column list to read, like this:
COPY table_name ( column_name [, ...] ) FROM 'filename'
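Applied to your case, list every column except the auto-incrementing id; the names name and price are hypothetical stand-ins for your real columns:
COPY products (name, price)
FROM '/media/username/rails_projects/app/db/import/bdname_products.csv'
DELIMITER ',' CSV;
The omitted id column is then filled in from its sequence default.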

PostgreSQL Creating an Insert Trigger which Remaps Columns

I'm wondering if I can use a trigger on a table to "ignore" columns that are in a COPY statement from STDIN but which are not in the target table. Sorry if the wording/syntax of the question is off, but here is an explanation of what I'm trying to say. I'm new to triggers, so any advice is helpful.
I'm using the PostGIS Shapefile importer to copy shapefiles to the spatial tables in my PostgreSQL database.
This creates a COPY statement which contains all the fields in the shapefile, something like:
COPY "public"."stations" ("column1","column2","column3","column4", geom) FROM stdin;
column1 and column2 are in the file but not in the target table, so the COPY fails.
Is there a way to create a trigger that would achieve the same result as:
COPY "public"."stations" ("column3","column4", geom) FROM stdin;
No, you cannot skip columns that are present in the input file. This will error out before triggers are even invoked. And you cannot use rules either. I quote the manual:
COPY FROM will invoke any triggers and check constraints on the
destination table. However, it will not invoke rules.
You can either edit the file or use a temporary staging table:
COPY to a temporary table with matching columns.
Use INSERT to write the desired columns to the final target table(s) - or use the whole range of SQL commands for more sophisticated matters. A minimal sketch follows.
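A sketch of that staging route, assuming hypothetical column types (text and a PostGIS geometry; adjust to the shapefile's actual fields):
CREATE TEMPORARY TABLE stations_staging
("column1" text, "column2" text, "column3" text, "column4" text, geom geometry);
-- point the shapefile importer's COPY at the staging table instead:
COPY stations_staging ("column1","column2","column3","column4", geom) FROM stdin;
INSERT INTO "public"."stations" ("column3","column4", geom)
SELECT "column3", "column4", geom
FROM stations_staging; -- only the wanted columns reach the real table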

Postgresql: inserting value of a column from a file

For example, there is a table named 'testtable' that has following columns: testint (integer) and testtext (varchar(30)).
What I want to do is pretty much something like this:
INSERT INTO testtable VALUES(15, CONTENT_OF_FILE('file'));
While reading the PostgreSQL documentation, all I could find is the COPY TO/FROM command, but that one applies to tables, not single columns.
So, what shall I do?
If this SQL code is executed dynamically from your programming language, use the means of that language to read the file, and execute a plain INSERT statement.
However, if this SQL code is meant to be executed via the psql command line tool, you can use the following construct:
\set content `cat file`
INSERT INTO testtable VALUES(15, :'content');
Note that this syntax is specific to psql and makes use of the cat shell command.
It is explained in detail in the PostgreSQL manual:
psql / SQL Interpolation
psql / Meta-Commands
If I understand your question correctly, you could read the single string(s) into a temp table and use that for the insert:
DROP SCHEMA str CASCADE;
CREATE SCHEMA str;
SET search_path='str';
CREATE TABLE strings
( string_id INTEGER PRIMARY KEY
, the_string varchar
);
CREATE TEMP TABLE string_only
( the_string varchar
);
COPY string_only(the_string)
FROM '/tmp/string'
;
INSERT INTO strings(string_id,the_string)
SELECT 5, t.the_string
FROM string_only t
;
SELECT * FROM strings;
Result:
NOTICE: drop cascades to table str.strings
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "strings_pkey" for table "strings"
CREATE TABLE
CREATE TABLE
COPY 1
INSERT 0 1
string_id | the_string
-----------+---------------------
5 | this is the content
(1 row)
Please note that the file is "seen" by the server as the server sees the filesystem. The "current directory" from that point of view is probably $PGDATA, but you should assume nothing and specify the complete pathname, which should be reachable and readable by the server. That is why I used '/tmp', which is unsafe (but an excellent rendezvous point ;-)