Import CSV file via PSQL With Array Values - postgresql

I have a very simple table into which I would like to insert data from a CSV file.
create table products(
id integer primary key,
images text[]
);
Here is what I am currently trying with my csv:
1,"['hello.jpg', 'world.jpg']"
2,"['hola.jpg', 'mundo.jpg']"
When I do the following, I get a syntax error from psql, with no additional information about what could have gone wrong.
\copy products 'C:\Users\z\Downloads\MOCK_DATA.csv' WITH DELIMITER ',';
Does anyone know how to format my array values properly?

If you remove the square brackets from the csv file, then I would define the table like this (images as text rather than text[]):
create table products_raw
(
id integer primary key,
images text
);
plus this view
create view products as
select id, ('{'||images||'}')::text[] as images
from products_raw;
and use the view henceforth. Anyway I would rather have the CSV file like this, no formatting, just data:
1,"hello.jpg,world.jpg"
2,"hola.jpg,mundo.jpg"
It is also worth considering attaching the csv file as a foreign table using file_fdw. It is a bit more complicated, but it usually pays off with several benefits.
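A minimal sketch of the file_fdw route, assuming the extension is installed and using placeholder names for the server and foreign table (note that with file_fdw the file must be readable by the database server process):
CREATE EXTENSION file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE products_csv (
id integer,
images text
) SERVER csv_files
OPTIONS (filename 'C:\Users\z\Downloads\MOCK_DATA.csv', format 'csv');
The same brace-and-cast expression used in the view can then be applied when selecting from products_csv, and the file is re-read on every query without an explicit import step.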

Related

Basic questions about Cloud SQL

I'm trying to populate a cloud sql database using a cloud storage bucket, but I'm getting some errors. The csv has the headers (or column names) as first row and does not have all the columns (some columns in the database can be null, so I'm loading the data I need for now).
The database is in postgresql and this is the first database in GCP I'm trying to configure and I'm a little bit confused.
Does it matter if the csv file has the column names?
Does the order of the columns in the csv file matter? (I guess it does if not all of them are present in the csv)
The PK of the table is a serial number, which I'm not including in the csv file. Do I need to include the PK as well? I mean, because it's a serial number it should be "auto assigned", right?
Sorry for the noob questions and thanks in advance :)
This is all covered by the COPY documentation.
It matters in that you will have to specify the HEADER option so that the first line is skipped:
[...] on input, the first line is ignored.
The order matters, and if the CSV file does not contain all the columns in the same order as the table, you have to specify them with COPY:
COPY mytable (col1, col2, col4, ...) FROM '/dir/afile' WITH (...);
Same as above: if you omit a table column from the column list, it will be filled with its default value, which in this case is the autogenerated number.
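Combining the two points, an import of a CSV file that has a header row and only some of the table's columns might look roughly like this (reusing the placeholder names from above):
COPY mytable (col2, col4)
FROM '/dir/afile'
WITH (FORMAT csv, HEADER);
Columns that are not in the list, such as the serial primary key, are filled with their default values.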

Handling the output of jsonb_populate_record

I'm a real beginner when it comes to SQL and I'm currently trying to build a database using postgres. I have a lot of data in JSON files that I want to put into my database, but I have trouble converting it into tables. The JSON is nested and contains many variables, but the behavior of jsonb_populate_record allows me to ignore the structure I don't want to deal with right now. So far I have:
CREATE TABLE raw (records JSONB);
COPY raw from 'home/myuser/mydocuments/mydata/data.txt';
create type jsonb_type as (time text, id numeric);
create table test as (
select jsonb_populate_record(null::jsonb_type, raw.records) from raw
);
When running the select statement only (without the create table) the data looks great in the GUI I use (DBeaver). However it does not seem to be an actual table as I cannot run select statements like
select time from test;
or similar. The column in my table 'test' is also called 'jsonb_populate_record(jsonb_type)' in the GUI, so something seems to be going wrong there. I do not know how to fix it; I've read about people using lateral joins with json_populate_record, but due to my limited SQL knowledge I can't understand or replicate what they are doing.
jsonb_populate_record() returns a single column (which is a record).
If you want to get multiple columns, you need to expand the record:
create table test
as
select (jsonb_populate_record(null::jsonb_type, raw.records)).*
from raw;
A "record" is a a data type (that's why you need create type to create one) but one that can contain multiple fields. So if you have a column in a table (or a result) that column in turn contains the fields of that record type. The * then expands the fields in that record.

Do I need to match the column order of my CREATE TABLE statement to that of my import data sheet?

I have an Excel CSV data sheet and I would like it to be imported into a new PostgreSQL table.
The Excel CSV data sheet has the columns in this order:
OrderDate, Region, Rep, Item, Units, Unit Price
This is my CREATE TABLE statement:
CREATE TABLE officesupplies (
region varchar(20),
order_date date,
rep_first_name varchar(30),
unit_price float,
units float
);
Notice how the order of the columns in my CREATE TABLE statement does not match the Excel file. I've tested this and it does not work, but I'm wondering about the 'why' behind it not being able to import. Just wondering, thanks in advance!
You can specify the columns in COPY. That way you can load data where the column order is different from your table.
In your case, you should use
COPY officesupplies
(order_date, region, rep_first_name, ...)
FROM 'filename';
I see that you have a column Item in your file that does not seem to match any table column. That won't work; in that case you will have to load the data into a "staging table" first, or use third-party software like pgLoader.
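A rough sketch of that staging-table approach, assuming the column layout of the CSV from the question and a header row in the file (the extra Item column is simply left out of the final INSERT):
CREATE TABLE officesupplies_staging (
order_date date,
region varchar(20),
rep_first_name varchar(30),
item text,
units float,
unit_price float
);
COPY officesupplies_staging FROM 'filename' WITH (FORMAT csv, HEADER);
INSERT INTO officesupplies (region, order_date, rep_first_name, unit_price, units)
SELECT region, order_date, rep_first_name, unit_price, units
FROM officesupplies_staging;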

Import column from file with additional fixed fields

Can I somehow import a column or columns from a file, where I specify one or more fields held fixed for all rows?
For example:
CREATE TABLE users(userid int PRIMARY KEY, fname text, lname text);
COPY users (userid,fname) from 'users.txt';
but where lname is assumed to be 'SMITH' for all the rows in users.txt?
My actual setting is more complex, where the field I want to supply for all rows is part of the PRIMARY KEY.
Possibly something of this nature:
COPY users (userid,fname,'smith' as lname) from 'users.txt';
Since I can't find a native solution to this in Cassandra, my solution was to perform a preparation step with Perl so the file contained all the relevant columns prior to calling COPY. This works fine, although I would prefer an answer that avoided this intermediate step.
e.g. adding a column with 'Smith' for every row to users.txt and calling:
COPY users (userid,fname,lname) from 'users.txt';
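For what it's worth, if the target were PostgreSQL rather than Cassandra, a column default plus a COPY column list would give the same effect without preprocessing the file; a minimal sketch using the users table above:
ALTER TABLE users ALTER COLUMN lname SET DEFAULT 'SMITH';
COPY users (userid, fname) FROM 'users.txt';
ALTER TABLE users ALTER COLUMN lname DROP DEFAULT;
Columns omitted from the COPY column list are filled with their defaults, so every imported row gets lname = 'SMITH'.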

How do I specify columns when loading new rows into PostgreSQL using pg_bulkload

I'm experimenting with using the pg_bulkload project to import millions of rows of data into a database. However, none of the new rows have a primary key and only two of several columns are available in my input file. How do I tell pg_bulkload which columns I'm importing, and how do I generate the primary key field? Do I need to edit my import file to match exactly what the output of a COPY command would be and generate the id field myself?
For example, let's say my database columns might be:
id title body published
The data that I have is limited to title and published and is listed in a tab-delimited file. My .ctl file looks like this:
TABLE = posts
INFILE = stdin
TYPE = CSV
DELIMITER = " "
You can use the FILTER functionality of pg_bulkload. Something like:
In database
CREATE FUNCTION pg_bulkload_filter(text, text) RETURNS record
AS $$
-- return one value per column of the target table: id, title, body, published
SELECT nextval('tablename_id_seq'), $1, NULL, $2
$$ LANGUAGE SQL;
And in pg_bulkload control file:
FILTER = pg_bulkload_filter
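Putting the pieces together, the control file from the question would then gain one extra line for the filter (the other settings are copied verbatim from the question; if the input really is tab-separated, the DELIMITER value should be an actual tab character rather than a space):
TABLE = posts
INFILE = stdin
TYPE = CSV
DELIMITER = " "
FILTER = pg_bulkload_filter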