Inserting timestamps into Postgres in bulk with COPY or pgcopy - postgresql

I'm trying to bulk import some data into Postgres using COPY, but I can't get my timestamp column to load.
Reduced example:
DROP TABLE IF EXISTS "demo";
CREATE TABLE "demo"(
time TIMESTAMP WITH TIME ZONE NOT NULL
);
I have a CSV file which looks like this:
1609459247.579
I try and insert my data using COPY:
COPY public.demo FROM '/home/myuser/demo.csv' WITH (FORMAT csv)
I get:
invalid input syntax for type timestamp with time zone: "1609459247.579"
If I want to insert the data manually I need to use to_timestamp.
INSERT INTO public.demo VALUES(to_timestamp(1609459247.579))
Is there either a way I can get COPY to use to_timestamp at insert time, or some pre-processing I can do to the CSV file so that COPY will insert it? Doing it one INSERT at a time is very slow.
Thanks!

I've solved this. I used COPY TO on my table to export a CSV in the opposite direction, in a format I know Postgres is happy with, then experimented with Python string formatting until I found a format that works.
Short version: in Python, a datetime formatted with datetime.isoformat() produces something Postgres will ingest via COPY.
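For example, here is a minimal preprocessing sketch (the file names and the assumption that the epoch values are UTC are mine, not from the original post) that rewrites the epoch-seconds CSV as ISO 8601 timestamps COPY will accept:
import csv
from datetime import datetime, timezone

# Rewrite demo.csv (epoch seconds, e.g. 1609459247.579) as demo_iso.csv
# (ISO 8601 strings that a timestamptz column accepts).
with open("demo.csv", newline="") as src, open("demo_iso.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for (epoch,) in reader:
        ts = datetime.fromtimestamp(float(epoch), tz=timezone.utc)
        writer.writerow([ts.isoformat()])  # e.g. 2021-01-01T00:00:47.579000+00:00
After that, the original COPY ... WITH (FORMAT csv) command should load the new file cleanly.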

Related

Selecting data from a BYTEA data type in Postgres that contains CSV data and storing it in a table

I have a table ("file_upload") in a postgreSQL (11,8) database, which we use for storing the original CSV file that was used for loading some data to our system (I guess the question of best practices is up for debate here, but for now lets just assume it is).
The files are stored in a column ("file") which is of the data type "bytea"
So one row of this table contains
id - file_name - upload_date - uploaded_by - file <-- this being the column in question.
This column then stores the data of a csv file:
item_id;item_type_id;item_date;item_value
11;1;2022-09-22;123.45
12;4;2022-09-20;235.62
13;1;2022-09-21;99.99
14;2;2022-09-19;654.32
What I need to be able to do is query this column, extract the data, and store it in a temporary table (note: the structure of these CSV files is always the same, so the table structure can be pre-defined and does not have to be dynamic or anything).
Any help would be greatly appreciated
Use
COPY (SELECT file FROM file_upload WHERE id = 1)
TO '/tmp/blob' (FORMAT 'binary');
to re-export the data to a file. Then create the temporary table and use COPY to read the data back in. Make sure to use the proper ENCODING.
You can wrap that in a loop that performs this operation for all rows in your table.
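If a client-side round trip is easier in your environment, here is a minimal psycopg2 sketch of the same idea (the temp table mirrors the sample CSV above; the connection string, UTF-8 encoding, and the header row are my assumptions):
import io
import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder connection string
with conn, conn.cursor() as cur:
    # Temp table matching the semicolon-delimited sample shown in the question.
    cur.execute("""
        CREATE TEMPORARY TABLE file_contents (
            item_id      integer,
            item_type_id integer,
            item_date    date,
            item_value   numeric
        )
    """)
    cur.execute("SELECT file FROM file_upload WHERE id = 1")
    raw = cur.fetchone()[0]            # bytea arrives as bytes/memoryview
    text = bytes(raw).decode("utf-8")  # use whatever encoding the files were saved in
    cur.copy_expert(
        "COPY file_contents FROM STDIN WITH (FORMAT csv, HEADER, DELIMITER ';')",
        io.StringIO(text),
    )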

Load data with default values into Redshift from a parquet file

I need to load data into a Redshift table that has a default value column, as outlined in the AWS docs.
Unfortunately the COPY command doesn't allow loading data with default values from a parquet file, so I need to find a different way to do that.
My table requires a column with the getdate function from Redshift:
LOAD_DT TIMESTAMP DEFAULT GETDATE()
If I use the COPY command and add the column names as arguments I get the error:
Column mapping option argument is not supported for PARQUET based COPY
What is a workaround for this?
Can you post a reference for Redshift not supporting default values for a Parquet COPY? I haven't heard of this restriction.
As for work-arounds, I can think of two.
COPY the file into a temp table and then insert from this temp table into your table with the default value (a rough sketch of this follows below).
Define an external table that uses the parquet file as its source and insert from this table into the table with the default value.
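Here is a rough sketch of the first work-around, driven from Python with psycopg2 (Redshift speaks a Postgres-compatible protocol, so psycopg2 is one common way to run the statements); the connection details, bucket, IAM role, and table/column names are placeholders, not taken from the question:
import psycopg2

conn = psycopg2.connect(
    "host=my-cluster.example.com port=5439 dbname=dev user=admin password=secret"
)
with conn, conn.cursor() as cur:
    # 1. Stage the parquet data in a temp table whose columns match the file.
    cur.execute("CREATE TEMP TABLE target_stage (event_id BIGINT, payload VARCHAR(256))")
    cur.execute("""
        COPY target_stage
        FROM 's3://my-bucket/path/to/data.parquet'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-copy-role'
        FORMAT AS PARQUET
    """)
    # 2. Insert into the real table without listing LOAD_DT,
    #    so its DEFAULT GETDATE() fills in at insert time.
    cur.execute("""
        INSERT INTO target_table (event_id, payload)
        SELECT event_id, payload FROM target_stage
    """)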

Convert a BLOB to VARCHAR instead of VARCHAR FOR BIT

I have a BLOB field in a table that I am selecting. This field data consists only of JSON data.
If I do the following:
Select CAST(JSONBLOB as VARCHAR(2000)) from MyTable
--> this returns the value in VARCHAR FOR BIT DATA format.
I just want it as a standard string or varchar, not in bit format.
That is because I need to use the JSON2BSON function to convert the JSON to BSON. JSON2BSON accepts a string, but it will not accept a VARCHAR FOR BIT DATA type...
This conversion should be easy.
I am able to do the select as VARCHAR FOR BIT DATA, manually copy it using the UI, paste it into a select literal, and convert that to BSON. But I need to migrate a bunch of data in this BLOB column from JSON to BSON, and doing it manually won't be fast. I only mention this to show how simple a use case this should be.
What is the select command to essentially get this to work:
Select JSON2BSON(CAST(JSONBLOB as VARCHAR(2000))) from MyTable
--> Currently this fails because the CAST converts this (even though it's only text characters) to VARCHAR FOR BIT DATA and not standard VARCHAR.
What is the suggestion to fix this?
DB2 11 on Windows.
If the data is JSON, then the table column should be CLOB in the first place...
Having the table column a BLOB might make sense if the data is actually already BSON.
You could change the BLOB into a CLOB using the CONVERTTOCLOB procedure, and then you should be OK.
https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.apdv.sqlpl.doc/doc/r0055119.html
You can use this function to remove the "FOR BIT DATA" flag from a column:
CREATE OR REPLACE FUNCTION DB_BINARY_TO_CHARACTER(A VARCHAR(32672 OCTETS) FOR BIT DATA)
RETURNS VARCHAR(32672 OCTETS)
NO EXTERNAL ACTION
DETERMINISTIC
BEGIN ATOMIC
RETURN A;
END
Or, if you are on Db2 11.5, the function SYSIBMADM.UTL_RAW.CAST_TO_VARCHAR2 will also work.

Can I import CSV data into a table without knowing the columns of the CSV?

I have a CSV file file.csv.
In Postgres, I have made a table named grants:
CREATE TABLE grants
(
)
WITH (
OIDS=FALSE
);
ALTER TABLE grants
OWNER TO postgres;
I want to import file.csv data without having to specify columns in Postgres.
But if I run COPY grants FROM '/PATH/TO/grants.csv' CSV HEADER;, I get this error: ERROR: extra data after last expected column.
How do I import the CSV data without having to specify columns and types?
The error is normal.
You created a table with no columns. The COPY command tries to import data into a table with a matching structure.
So you have to create a table that corresponds to your CSV file before executing the COPY command.
I discovered pgfutter:
"Import CSV and JSON into PostgreSQL the easy way. This small tool abstract all the hassles and swearing you normally have to deal with when you just want to dump some data into the database"
Perhaps a solution ...
The best method for me was to convert the CSV to a dataframe and then follow
https://github.com/sp-anna-jones/data_science/wiki/Importing-pandas-dataframe-to-postgres
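For example, a minimal sketch with pandas and SQLAlchemy (the connection string is a placeholder; to_sql infers the columns and types from the CSV and creates the table for you):
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://postgres:secret@localhost:5432/mydb")  # placeholder
df = pd.read_csv("/PATH/TO/grants.csv")
# Creates (or replaces) the grants table with columns inferred from the file.
df.to_sql("grants", engine, if_exists="replace", index=False)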
No, it is not possible using the COPY command
If a list of columns is specified, COPY will only copy the data in the
specified columns to or from the file. If there are any columns in the
table that are not in the column list, COPY FROM will insert the
default values for those columns.
COPY does not create columns for you.
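If you would rather stay close to COPY itself, a hedged alternative is to derive the column list from the CSV header first and create the table from that (everything loaded as text; psycopg2, the staging table name, and the file path are placeholders of mine):
import csv
import psycopg2

path = "/PATH/TO/grants.csv"
with open(path, newline="") as f:
    header = next(csv.reader(f))  # first line of the file = column names

# Build a column list; fine for a trusted file, since names go straight into SQL.
columns = ", ".join('"{}" text'.format(name) for name in header)

conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder
with conn, conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS grants_raw ({})".format(columns))
    with open(path, newline="") as f:
        cur.copy_expert("COPY grants_raw FROM STDIN WITH (FORMAT csv, HEADER)", f)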

Importing csv into Postgres database with improper date value

I have a query which has a date field with values that look like this in the query results window:
2013-10-01 00:00:00
However, when I save the results to csv, it gets saved like this:
2013-10-01T00:00:00
This is causing a problem when I'm trying to COPY the csv into a table in Redshift, where it gives me an error stating that the value is not a valid timestamp (the field I'm importing to is a timestamp field).
How can I get it so that it either strips out the time component completely, leaving just the date, or at least removes the "T" from the results?
I'm exporting results to csv using Aginity SQL Workbench for Redshift.
According to this knowledgebase article:
After import, add new TIMESTAMP columns and use the CAST() function to
populate them:
ALTER TABLE events ADD COLUMN received_at TIMESTAMP DEFAULT NULL;
UPDATE events SET received_at = CAST(received_at_raw as timestamp);
ALTER TABLE events ADD COLUMN generated_at TIMESTAMP DEFAULT NULL;
UPDATE events SET generated_at = CAST(generated_at_raw as timestamp);
Finally, if you foresee no more imports to this table, the raw VARCHAR
timestamp columns may be removed. If you foresee importing more events
from S3, do not remove these columns. To remove the columns, run:
ALTER TABLE events DROP COLUMN received_at_raw;
ALTER TABLE events DROP COLUMN generated_at_raw;
Hope that helps...
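Alternatively, if it is easier to fix the CSV before loading it, here is a small preprocessing sketch (the file names and the assumption that the timestamp is the first column are placeholders of mine) that swaps the ISO "T" separator for a space:
import csv

with open("results.csv", newline="") as src, open("results_fixed.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        # 2013-10-01T00:00:00 -> 2013-10-01 00:00:00
        row[0] = row[0].replace("T", " ", 1)
        writer.writerow(row)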