\copy in is unable to fit utf-8 into varying(30) - postgresql

In order to build the staging database from my production data to test migrations, etc., I do a regular \copy of a subset of production records to CSV files and import the result. I do this for the specific tables which are very large (600G), as I don't need them all for testing.
One such column is a character varying(30) [for hysterical raisins involving Django 1].
I have data in that column which is UTF-8 encoded. Some of it is exactly 30 glyphs wide, but takes more than 30 bytes to encode. Strangely, it fits just fine in the original database, but after creating a new database it does not fit.
I copy in with:
\copy public.cdrviewer_cdr from '/sg1/backups/2017-02-20/cdr.csv'
with (format 'csv', header, encoding 'utf-8') ;
This seems like it may be a bug, or maybe it's just a limitation of copy.
(I am using postgresql 9.6 on Devuan Jessie)
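For reference, a quick way to compare character count vs. byte count for the widest values, and to check each database's encoding, is something like the following (mycol is a placeholder for the actual varying(30) column):
-- how many characters vs. how many bytes the longest values take
select length(mycol) as chars, octet_length(mycol) as bytes
from public.cdrviewer_cdr
order by octet_length(mycol) desc
limit 5;
-- list the encoding of every database in the cluster
select datname, pg_encoding_to_char(encoding) as enc from pg_database;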

Related

Is it possible to dump from Timescale without hypertable insertions?

I followed the manual on: https://docs.timescale.com/v1.0/using-timescaledb/backup
When I dump it into a binary file everything works out as expected (I can restore it easily).
However, when I dump it into plain-text SQL, the INSERTs are generated against the internal hypertable chunks. Is it possible to create INSERTs against the table itself?
Say I have an 'Auto' table with columns of id,brand,speed
and with only one row: 1,Opel,170
dumping to SQL will result in something like this:
INSERT INTO _timescaledb_catalog.hypertable VALUES ...
INSERT INTO _timescaledb_internal._hyper_382_8930_chunk VALUES (1, 'Opel',170);
What I need is this (and let TS do the work in the background):
INSERT INTO Auto VALUES (1,'Opel',170);
Is that possible somehow? (I know I can exclude tables from pg_dump, but that wouldn't create the needed insertions.)
Beatrice. Unfortunately, pg_dump will dump commands that mirror the underlying implementation of Timescale. For example, _hyper_382_8930_chunk is a chunk underlying the auto hypertable that you have.
Might I ask why you don't want pg_dump to behave this way? The dump that pg_dump creates is intended to be restored as-is (with psql for a plain SQL dump, or with pg_restore for the custom format). So as long as you dump and restore and see the correct state, there is no problem with dump/restore.
Perhaps you are asking a different question?
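If you really need plain INSERTs or COPY against the hypertable itself (say, to load the data somewhere without the chunk tables), one workaround sketch, using the Auto example and placeholder paths, is to export and re-import the logical rows yourself rather than relying on pg_dump; on re-import TimescaleDB routes the rows into chunks for you:
-- export the hypertable's logical rows
\copy (select * from auto) to '/tmp/auto.csv' with (format csv, header)
-- load them back through the hypertable itself
\copy auto from '/tmp/auto.csv' with (format csv, header)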

PostgreSQL 9.6 COPY created a file bigger than the table

I'm trying to export an Oracle table into a local PostgreSQL dump via the COPY command:
\copy (select * from remote_oracle_table) to /postgresql/table.dump with binary;
The Oracle table's size is 25 GB. However, the COPY command created a 50 GB file. How is that possible?
I'm able to select from the remote Oracle table because I have the oracle_fdw extension.
A few factors are likely at work here, including:
Small numbers in integer and numeric fields use more space in binary format than text format;
Oracle probably stores the table with some degree of compression, which the binary dump won't have.
You'll likely find that if you compress the resulting dump it'll be a lot smaller.
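For example, if the goal is a smaller file, you could compress the dump on the way out, or dump in text format instead of binary (same query and path as in your command, gzip used just as an illustration):
-- compress the binary dump as it is written (runs gzip on the client)
\copy (select * from remote_oracle_table) to program 'gzip > /postgresql/table.dump.gz' with binary
-- or dump in CSV/text format, which is often smaller when many values are small numbers
\copy (select * from remote_oracle_table) to '/postgresql/table.csv' with (format csv)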

Encoding in temp tables in RedShift

I am using temp staging tables (TempStaging) for doing some merges. Some columns in the main table, MainTable, are LZO-encoded; call one of them C1. The merge output goes back into MainTable.
In order to ensure the same dist key for TempStaging, I am creating it using CREATE TABLE. For some reasons I cannot use CREATE TABLE AS.
So should I encode the column C1 in LZO, or leave it without encoding? Would Redshift short-circuit the cycle of [decode while selecting from MainTable, encode while writing into TempStaging, decode while selecting from TempStaging for the merge, encode again while writing back into MainTable]?
Because I am thinking that if that short-circuiting is not happening, I am better off leaving out the encoding, trading away some memory for CPU gains.
-Amit
Data in Redshift is always decoded when it's read from the table AFAIK. There are a few DBs that can operate directly on compressed data but Redshift does not.
There is no absolute rule on whether you should use encoding in a temp table. It depends on how much data is being written. I've found it's faster with encoding 90+% of the time so that's my default approach.
As you note, ensuring that the temp table uses the same dist key is No. 1 priority. You can specify the dist key (and column encoding) in a CREATE TABLE AS though:
CREATE TABLE my_new_table
DISTKEY(my_dist_key_col)
AS
SELECT *
FROM my_old_table
;
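And if you stay on the plain CREATE TABLE route as the question requires, a minimal sketch (column names and types are placeholders) that pins both the dist key and the LZO encoding on C1 would be:
CREATE TABLE TempStaging (
  my_dist_key_col BIGINT,
  C1 VARCHAR(256) ENCODE lzo
)
DISTKEY(my_dist_key_col);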

DB2 DBCLOB data INSERT with Unicode data

The problem at hand is to insert data into a DB2 table which has a DBCLOB column. The table's encoding is Unicode. The subsystem is a MIXED YES with Japanese CCSID set of (290, 930, 300). The application is bound ENCODING CCSID.
I was successful in FETCHING the DBCLOB's data in Unicode, no problem there. But when I turn around and try to INSERT it back, the inserted data is interpreted as not being Unicode; it seems DB2 thinks it's EBCDIC DBCS/GRAPHIC, and the inserted row shows Unicode 0xFEFE. When I manually update the data being inserted to valid DBCS, the data inserts OK and shows the expected Unicode DBCS values.
To insert the data I am using a dynamically prepared INSERT statement with a placeholder for the DBCLOB column. The SQLVAR entry associated with the placeholder is a DBCLOB_LOCATOR with the CCSID set to 1200.
A DBCLOB locator is created by doing a SET dbclobloc = SUBSTR(dbclob, 1, length). The created locator is put into the SQLDA, and then the prepared INSERT is executed.
It seems DB2 is ignoring the 1200 CCSID associated with the DBCLOB_LOCATOR SQLVAR. Attempts to put a CAST(? AS DBCLOB CCSID UNICODE) on the placeholder in the INSERT do not help because at that time DB2 seems to have made up its mind about the encoding of the data to be inserted.
I am stuck :( Any ideas?
Greg
I think I figured it out and it is not good: the SET statement for the DBCLOB_LOCATOR is static SQL and the DBRM is bound ENCODING EBCDIC. Hence DB2 has no choice but to assume the data is in the CCSID of the plan.
I also tried what the books suggest and used a SELECT ... FROM SYSIBM.SYSDUMMYU to set the DBCLOB_LOCATOR. This should have told DB2 that the data was coming in Unicode. But it failed again, with symptoms indicating it still assumed the DBCS EBCDIC CCSID.
Not good.

How to insert (raw bytes from file data) using a plain text script

Database: Postgres 9.1
I have a table called logos defined like this:
create type image_type as enum ('png');
create table logos (
    id      UUID primary key,
    bytes   bytea not null,
    type    image_type not null,
    created timestamp with time zone default current_timestamp not null
);
create index logo_id_idx on logos(id);
I want to be able to insert records into this table in 2 ways.
The first (and most common) way rows will be inserted in the table is that a user will provide a PNG image file via an HTML file-upload form. The code processing the request on the server will receive a byte array containing the data in the PNG image file and insert a record in the table using something very similar to what is explained here. There are plenty of examples of how to insert byte arrays into a PostgreSQL field of type bytea on the internet. This is an easy exercise. An example of the insert code would look like this:
insert into logos (id, bytes, type, created) values (?, ?, ?, now())
And the bytes would be set with something like:
...
byte[] bytes = ... // read PNG file into a byte array.
...
ps.setBytes(2, bytes);
...
The second way rows will be inserted in the table will be from a plain text file script. The reason this is needed is only to populate test data into the table for automated tests, or to initialize the database with a few records for a remote development environment.
Regardless of how the data is entered in the table, the application will obviously need to be able to select the bytea data from the table and convert it back into a PNG image.
Question
How does one properly encode a byte array, to be able to insert the data from within a script, in such a way that only the original bytes contained in the file are stored in the database?
I can write code to read the file and spit out insert statements to populate the script. But I don't know how to encode the byte array for the plain-text script such that, when running the script from psql, the image data will be the same as if the file had been inserted using the setBytes JDBC code.
I would like to run the script with something like this:
psql -U username -d dataBase -a -f test_data.sql
The easiest way, IMO, to represent bytea data in an SQL file is to use the hex format:
8.4.1. bytea Hex Format
The "hex" format encodes binary data as 2 hexadecimal digits per byte, most significant nibble first. The entire string is preceded by the sequence \x (to distinguish it from the escape format). In some contexts, the initial backslash may need to be escaped by doubling it, in the same cases in which backslashes have to be doubled in escape format; details appear below. The hexadecimal digits can be either upper or lower case, and whitespace is permitted between digit pairs (but not within a digit pair nor in the starting \x sequence). The hex format is compatible with a wide range of external applications and protocols, and it tends to be faster to convert than the escape format, so its use is preferred.
Example:
SELECT E'\\xDEADBEEF';
Converting an array of bytes to hex should be trivial in any language that a sane person (such as yourself) would use to write the SQL file generator.
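So a row in test_data.sql for the logos table could look like the following sketch (the UUID is a placeholder, and the hex literal is just the 8-byte PNG signature rather than a full image):
insert into logos (id, bytes, type, created)
values ('00000000-0000-0000-0000-000000000001', -- placeholder UUID
        E'\\x89504e470d0a1a0a',                  -- hex-encoded file bytes (PNG signature only)
        'png',
        now());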