Postgres copy query connection lost error - postgresql

I'm trying to bulk load around 200M lines (3.5GB) of data into an Amazon RDS PostgreSQL DB using the following command:
cat data.csv | psql -h<host>.rds.amazonaws.com -U<user> <db> -c "COPY table FROM STDIN DELIMITER AS ','"
After a couple of minutes I get this error:
connection not open
connection to server was lost
If I instead send only the first 100M lines with head -n 100000000 data.csv, the command succeeds. I'm guessing that there's a timeout somewhere that's causing the query with the full dataset to fail, but I haven't been able to find any timeout settings or parameters.
How can I make the bulk insert succeed with the full dataset?

As I read it, the statement you're using basically creates one giant string, connects to the database, and then tries to feed the entire string through in one go.
If you load psql and run something like \copy ... from '/path/to/data.csv' ..., I'd imagine the connection might stay alive while the file's contents are streamed chunk by chunk.
That would be my hunch as to why the first 100M lines work (= the data finishes uploading before the connection times out) but the full file doesn't (= the upload is still in progress when the timeout hits).
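If it helps, a minimal sketch of that approach, reusing the host, user and table placeholders from your command (the WITH (DELIMITER ',') option is just the modern spelling of your DELIMITER AS ','):

psql -h <host>.rds.amazonaws.com -U <user> <db> \
     -c "\copy table FROM 'data.csv' WITH (DELIMITER ',')"

Here psql opens data.csv itself and streams it over the connection as COPY ... FROM STDIN, rather than relying on the shell pipe.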

Related

Postgres Error | COPY Command Disk Quota Exceeded

I have 2 tables, one with 7.7 million records and the other with 160 million records.
I want a dump of the tables on my NAS drive (700GB+ available space). I'm using the following command to export the data to a csv file:
\COPY (select * from table_name) to '/path_to_nas_drive/test.csv' csv header;
After running the above command, it throws this error:
Could not write COPY data : disk quota exceeded
Why is it throwing this error? Is it because of a space issue on the database server, so that it's not able to create a buffer/temporary file, or is there any way to handle this?
Let's try scaling back the request to a single row, just to make sure COPY works at all:
\COPY (select * from table_name limit 1) to '/path_to_nas_drive/test.csv' csv header;
You can also try compressing the output with gzip like so:
psql -c "\COPY (select * from table_name) to stdout" yourdbname | gzip > /path_to_nas_drive/test.csv.gz
It is a client issue, not a server issue. "could not write COPY data" is generated by psql, and "disk quota exceeded" comes from the OS and is then passed on by psql.
So your NAS is telling you that you are not allowed to write as much to it as you would like to. Maybe it has plenty of space, but it isn't going to give it to you; and "you" here is whoever is running psql, not whoever is running the PostgreSQL server.
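To see what the client side is actually allowed to write, a couple of quick checks might look like this (the mount path is the placeholder from the question):

df -h /path_to_nas_drive    # free space on the NAS mount
quota -s                    # your per-user limits, if quotas are enforced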

Backing a very large PostgreSQL DB "by parts"

I have a PostgreSQL DB that is about 6TB. I want to transfer this database to another server using, for example, pg_dumpall. The problem is that I only have a 1TB HD. How can I copy this database to the new server, which has enough space? Let's suppose I cannot get another HD. Is it possible to create partial backup files, upload them to the new server, erase them from the HD, and produce the next batch of backup files until the transfer is complete?
This works here (proof of concept):
- the shell command is issued from the receiving side
- the remote side dumps through the network connection
- the local psql just accepts the commands arriving over this connection
- the data is never stored in a physical file
(For brevity, I only sent the table definitions, not the actual data: --schema-only.)
You could have some problems with users and tablespaces (these are global for a Postgres installation); pg_dumpall will dump and restore these too, IIRC.
#!/bin/bash
# Dump the remote database and replay it straight into the local (receiving)
# installation; nothing is ever written to a file on disk.
remote=10.224.60.103
dbname=myremotedbname
pg_dump -h ${remote} --schema-only -c -C ${dbname} | psql
#eof
As suggested above, if you have a fast network connection between source and destination, you can do it without any extra disk.
However for a 6 TB DB (which includes indexes I assume) using the archive dump format (-Fc) could yield a database dump of less than 1 TB.
Regarding the "by parts" question: yes, it is possible using the table pattern option (-t, --table):
pg_dump -t TABLE_NAME ...
You can also exclude tables using -T, --exclude-table:
pg_dump -T TABLE_NAME ...
The above options (-t, -T) can be specified multiple times and can even be combined.
They also support patterns for specifying the tables:
pg_dump -t 'employee_*' ...
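Putting the two ideas together, a "by parts" transfer could look roughly like this (a sketch only; the table names, hosts and paths are placeholders, and the target database is assumed to already exist on the new server):

# Batch 1: dump a subset of tables, ship it, restore it, then free the local disk.
pg_dump -Fc -t 'big_table_1' -t 'big_table_2' -f part1.dump mydb
scp part1.dump newserver:/tmp/
ssh newserver 'pg_restore -d mydb /tmp/part1.dump && rm /tmp/part1.dump'
rm part1.dump

# Batch 2: everything except the tables already transferred.
pg_dump -Fc -T 'big_table_1' -T 'big_table_2' -f part2.dump mydb
scp part2.dump newserver:/tmp/
ssh newserver 'pg_restore -d mydb /tmp/part2.dump && rm /tmp/part2.dump'
rm part2.dump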

Error importing postgres DB dump into empty database

I need to import my psql dump into my fresh psql database. When I execute the following command, I get errors.
psql -U user new_database < filename.sql
Error I got:
ERROR: out of memory
DETAIL: Cannot enlarge string buffer containing 0 bytes by 1208975751 more bytes.
How do I fix this? Also, is there any method to log the import process?
Thanks.
I think the most common cause is a "corrupt" SQL file. There's no easy fix. Split the file in half (man split), fix the SQL statements at the tail end of one resulting file and at the head end of the other, and run again. In the past, I seem to recall seeing "Invalid byte sequence for UTF-8", or something like that. I'm sure there are other causes.
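For the splitting step, something along these lines might work (GNU split is assumed; the file and database names are taken from the question):

# Split the dump into two halves on line boundaries; any multi-line statement
# cut at the boundary still needs to be repaired by hand before loading.
split -n l/2 filename.sql part_
psql -U user new_database -f part_aa
psql -U user new_database -f part_ab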
PostgreSQL has a lot of logging options; set them in postgresql.conf and restart PostgreSQL. Look at
log_destination
logging_collector
client_min_messages
log_error_verbosity
log_statement
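For example, a minimal way to turn these on from the shell (the postgresql.conf path is installation-specific and the values are only illustrative):

# Append illustrative logging settings; adjust the path for your installation.
sudo tee -a /etc/postgresql/14/main/postgresql.conf >/dev/null <<'EOF'
logging_collector = on
log_destination = 'stderr'
log_error_verbosity = verbose
log_statement = 'all'
EOF
# logging_collector requires a full restart (service name varies by system).
sudo systemctl restart postgresql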

Create query that copies from a CSV file on my computer to the DB located on another computer in Postgres

I am trying to create a query that will copy data from a CSV file that is located on my computer to a Postgres DB that is on a different computer.
Our Postgres DB is located on another computer, and I work on my own to import and query data. I have successfully copied data from the CSV file on MY computer TO the DB in PSQL Console using the following:
\COPY table_name FROM 'c:\path\to\file.csv' CSV DELIMITER E'\t' HEADER;
But when I write a query using the SQL Editor, using the same command as above without the leading '\', I get the following error:
ERROR: could not open file "c:\pgres\dmi_vehinventory.csv" for reading: No such file or directory
********** Error **********
ERROR: could not open file "c:\pgres\dmi_vehinventory.csv" for reading: No such file or directory
SQL state: 58P01
I assume the query is actually trying to find the file on the DB's computer rather than my own.
How do I write a query that tells Postgres to look for the file on MY computer rather than the DB's computer?
Any help will be much appreciated!
\COPY is the correct way if you want to upload a file from the local computer (the computer where you've started psql).
COPY is correct when you want to load a file that already lives on the remote host, from a directory on that host.
Here is an example; I've connected with psql to a remote server:
test=# COPY test(i, i1, i3)
FROM './test.csv' WITH DELIMITER ',';
ERROR: could not open file "./test.csv" for reading: No such file
test=# \COPY test(i, i1, i3)
FROM './test.csv' WITH DELIMITER ',';
test=# select * from test;
i | i1 | i3
---+----+----
1 | 2 | 3
(1 row)
There are several common misconceptions when dealing with PostgreSQL's COPY command.
Even though psql's \COPY FROM '/path/to/file/on/client' command has identical syntax (other than the backslash) to the backend's COPY FROM '/path/to/file/on/server' command, they are totally different. When you include a backslash, psql actually rewrites it to a COPY FROM STDIN command instead, and then reads the file itself and transfers it over the connection.
Executing a COPY FROM 'file' command tells the backend to open the given path itself and load it into the given table. As such, the file must be accessible on the server's filesystem and the backend process must have permission to read it. However, the upside of this variant is that it is supported by any PostgreSQL client that can issue raw SQL.
Successfully executing a COPY FROM STDIN places the connection into a special COPY_IN state during which an entirely different (and much simpler) sub-protocol is spoken between the client and server, which allows for data (which may or may not come from a file) to be transferred from the client to the server. As such, this command is not well supported outside of libpq, the official client library for C. If you aren't using libpq, you may or may not be able to use this command, but you'll have to do your own research.
COPY FROM STDIN/COPY TO STDOUT doesn't really have anything to do with standard input or standard output; rather the client needs to speak the sub-protocol on the database connection. In the COPY IN case, libpq provides two commands, one to send data to the backend, and another to either commit or roll back the operation. In the COPY OUT case, libpq provides one function that receives either a row of data or an end of data marker.
I don't know anything about SQL Editor, but it's likely that issuing a COPY FROM STDIN command will leave the connection in an unusable state from its point of view, especially if it's connecting via an ODBC driver. As far as I know, ODBC drivers for PostgreSQL do not support COPY IN.
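To make the client-side/server-side distinction concrete, these two invocations end up doing effectively the same thing: in both cases the data comes from a file on the client and is streamed over the connection as COPY ... FROM STDIN (the database, table and file names are placeholders):

psql -d mydb -c "\copy table_name FROM 'file.csv' WITH (FORMAT csv, HEADER)"
psql -d mydb -c "COPY table_name FROM STDIN WITH (FORMAT csv, HEADER)" < file.csv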

Can PostgreSQL COPY read CSV from a remote location?

I've been using JDBC with a local Postgres DB to copy data from CSV files into the database with the Postgres COPY command. I use Java to parse the existing CSV file into a CSV format that matches the tables in the DB. I then save this parsed CSV to my local disk and have JDBC execute a COPY command against my local DB using the parsed CSV. Everything works as expected.
Now I'm trying to perform the same process on a Postgres DB on a remote server using JDBC. However, when JDBC tries to execute the COPY I get
org.postgresql.util.PSQLException: ERROR: could not open file "C:\data\datafile.csv" for reading: No such file or directory
Am I correct in understanding that the COPY command tells the DB to look for this file locally? I.e. the remote server is looking on its own C: drive (where the file doesn't exist).
If this is the case, is there any way to tell the COPY command to look on my computer rather than "locally" on the remote machine? Reading through the COPY documentation I didn't find anything that indicated this functionality.
If the functionality doesn't exist, I'm thinking of just populating the whole database locally then copying to database to the remote server but just wanted to check that I wasn't missing anything.
Thanks for your help.
Create your SQL file as follows on your client machine:
COPY testtable (column1, c2, c3) FROM STDIN WITH CSV;
1,2,3
4,5,6
\.
Then execute, on your client
psql -U postgres -f /mylocaldrive/copy.sql -h remoteserver.example.com
If you use JDBC, the best solution for you is to use the PostgreSQL COPY API
http://jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/CopyManager.html
Otherwise (as already noted by others) you can use \copy from psql, which allows accessing local files on the client machine.
To my knowledge, the COPY command can only read locally (either from stdin or from a file) on the machine where the database is running.
You could make a shell script where you run the Java conversion and then use psql to do a \copy command, which reads from a file on the client machine.
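A sketch of such a wrapper script (the jar name and file names are placeholders; the host, user and table are taken from the earlier example):

#!/bin/bash
# Convert the raw CSV with the existing Java tool, then stream the result
# to the remote database using psql's client-side \copy.
java -jar csv-converter.jar raw.csv parsed.csv
psql -h remoteserver.example.com -U postgres -d mydb \
     -c "\copy testtable FROM 'parsed.csv' WITH (FORMAT csv)"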