My goal is to migrate a local CSV file to a Postgres DB that already exists on Azure.
My attempt so far was as follows:
In my K8s cluster I connect to the pod (a Spring app) that is connected to the Postgres DB on Azure. I copy (kubectl cp) the CSV file to the pod's /tmp folder. On the pod I install the Postgres client so I can connect to the DB. So far so good; now I can run psql commands like \copy to migrate the CSV file into the specific table, but my problem is that the file path cannot be found. How can I access the file in the pod's /tmp folder?
For example:
\copy my_imports from "HERE PATH from POD ->/tmp/my_import.csv" csv header;
Is my attempt totally wrong, or how can I make this happen? Any ideas?
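For illustration, here is roughly what I am trying (the pod name, namespace, and connection details below are placeholders):

# copy the CSV from my machine into the pod's /tmp folder
kubectl cp ./my_import.csv my-namespace/my-spring-pod:/tmp/my_import.csv

# open psql inside the pod and load the file into the table
kubectl exec -it -n my-namespace my-spring-pod -- \
  psql "host=myserver.postgres.database.azure.com dbname=mydb user=myuser sslmode=require" \
  -c "\copy my_imports from '/tmp/my_import.csv' csv header"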
As part of my Flask and Celery application, I'm trying to move data from AWS-Aurora Postgres DB to Redshift.
I'll be running this application in Kubernetes.
My approach is to query the Aurora Postgres database, write the result set to a CSV file saved on an attached volume, upload that file to S3, and then import it into Redshift.
However, I came across an article that describes uploading the result set directly to S3 as a CSV file instead of going through an intermediate volume:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html
It mentions the use of the OUTFILE command, but only for MySQL; it doesn't say anything about Postgres.
Is it even possible to use that command on an Aurora Postgres DB and export to S3?
If you can connect to the database with psql, you can use the \copy command to export the output of any select statement to a CSV file:
https://codeburst.io/two-handy-examples-of-the-psql-copy-meta-command-2feaefd5dd90
https://dba.stackexchange.com/questions/7651/postgres-client-copy-copy-command-doesnt-have-access-to-a-temporary-table
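For example, a rough sketch (the host, database, table, bucket, and file names below are placeholders):

# export the result of a query to a CSV file on the client machine
psql "host=my-aurora-host dbname=mydb user=myuser" \
  -c "\copy (select * from my_table) to '/tmp/my_table.csv' with csv header"

# then push the file to S3, from where Redshift can COPY it in
aws s3 cp /tmp/my_table.csv s3://my-bucket/my_table.csv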
Yes, Aurora runs on MySQL so you can use the outfile command. Did you even try running a query with outfile?
I am currently running Neo4j with 3 core servers and 3 replicas within my Kubernetes (PKS) environment. I am able to successfully connect to the remote cluster from my local machine via Neo4j Desktop. I am now trying to import a large (3 GB) CSV file from our Hadoop environment into the Neo4j cluster. I downloaded the file to my local machine and was able to use "neo4j-admin import" to create a graph.db. I am now wondering how to get that graph.db or CSV file directly into the Neo4j Kubernetes cluster.
The documented way to transfer a Neo4j DB is to use:
The neo4j-admin dump command on the originating machine to create a dump file.
The neo4j-admin load command on the destination machine to fill a fresh DB from that dump file.
Refer to the documentation for more details.
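For example, a minimal sketch, assuming Neo4j 3.x-style commands (the database name, file paths, and pod name are placeholders, and the database must be stopped while loading):

# on the machine where the graph.db was created
neo4j-admin dump --database=graph.db --to=/tmp/graph.db.dump

# copy the dump into a core pod and load it there
kubectl cp /tmp/graph.db.dump neo4j-core-0:/tmp/graph.db.dump
kubectl exec -it neo4j-core-0 -- neo4j-admin load --from=/tmp/graph.db.dump --database=graph.db --force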
Until now I've been backing up my PostgreSQL data using pg_dump, which exports the data to an SQL file mydb.sql, and then restoring from that file using psql -U user -d db < mydb.sql.
For one reason or another it would be more convenient to restore the database content more directly, in an environment where psql does not exist... specifically on a host server where PostgreSQL is installed in a Docker container running on the host, but not on the host itself.
My plan is to back up the content of /var/lib/postgresql/data/ to a tar file, and when required (e.g. when a new server is created to host the PostgreSQL container) just restore it to the same path. The folder /var/lib/postgresql/data/ in the Docker container is mapped to a folder on the host server, so I would create this backup on the host, not inside the Postgres container.
Is this a valid approach? Any "gotchas"? And are there any subfolders within /var/lib/postgresql/data/ that I can exclude from the tar file? I don't want to back up mere 'housekeeping' information.
You can do that, but you have to do it properly if you don't want your database to become corrupted.
Either stop PostgreSQL before copying the data directory or follow the instructions from the documentation for an online backup.
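For example, a minimal sketch of the offline variant, assuming the container is named postgres and the data directory is mapped to /srv/pgdata on the host:

# stop the database, archive the mapped data directory, start it again
docker stop postgres
tar czf pgdata-backup.tar.gz -C /srv pgdata
docker start postgres

# on the new host, restore the archive before starting the container
tar xzf pgdata-backup.tar.gz -C /srv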
I've been given a project to extract data from a PostgreSQL database. I've no previous experience with PostgreSQL but the project I have is to bug fix existing code, so all the logic to connect to the engine and get data is already in place.
The problem I have is that the database has been given to me in the form of the folders and files straight from the source HDD, not a backup (and a backup isn't going to happen, so "get the customer to give you a backup instead" isn't an option here).
The folders also contained the actual PostgreSQL binaries, so I looked at the version (9.4.14), downloaded the nearest one (9.4.18) from the PostgreSQL site, and installed it. Now all I have to do is somehow get it to look at the data files I was given.
I tried the obvious approach of copying the contents of the given data folder into the installed data folder, but after that the PostgreSQL service won't start.
I did find an option in the conf file:
#data_directory = 'ConfigDir'
I changed this to:
data_directory = 'C:\customer\data'
But again the service won't start after this.
The data directory used by the service is defined through the service command line, which overrides any setting defined in postgresql.conf.
You need to re-create the service in order to change the data directory, e.g.:
Remove the service:
pg_ctl unregister -N postgresql-9.1
postgresql-9.1 is the "real" name of the service, not the "Display Name". You can see that in the properties of the service inside the "services" app.
Then re-create the service with the correct data directory:
pg_ctl register -D c:\customer\data -N postgresql-9.1
Another way of "debugging" startup errors on Windows is to start Postgres from the command line (not through the service), because some errors during startup are not logged in the Postgres logfile but are displayed on the command line. You can do that with e.g.:
pg_ctl start -D c:\customer\data
If the bin directory is not in your PATH you need to specify the full path to it on the command line, e.g.: c:\Postgres9.1\bin\pg_ctl
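Putting that together for the 9.4 install from the question, it might look roughly like this (the install path and service name are guesses; check the "services" app for the real service name):

c:\Postgres9.4\bin\pg_ctl unregister -N postgresql-x64-9.4
c:\Postgres9.4\bin\pg_ctl register -N postgresql-x64-9.4 -D c:\customer\data
c:\Postgres9.4\bin\pg_ctl start -D c:\customer\data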
I've been using JDBC with a local Postgres DB to copy data from CSV files into the database with the Postgres COPY command. I use Java to parse the existing CSV file into a CSV format that matches the tables in the DB. I then save this parsed CSV to my local disk and have JDBC execute a COPY command to load the parsed CSV into my local DB. Everything works as expected.
Now I'm trying to perform the same process on a Postgres DB on a remote server using JDBC. However, when JDBC tries to execute the COPY I get:
org.postgresql.util.PSQLException: ERROR: could not open file "C:\data\datafile.csv" for reading: No such file or directory
Am I correct in understanding that the COPY command tells the DB to look locally for this file? I.e. the remote server is looking on its own C: drive (where the file doesn't exist).
If this is the case, is there any way to tell the COPY command to look on my computer rather than "locally" on the remote machine? Reading through the COPY documentation I didn't find anything that indicated this functionality.
If the functionality doesn't exist, I'm thinking of just populating the whole database locally and then copying the database to the remote server, but I wanted to check that I wasn't missing anything.
Thanks for your help.
Create your SQL file as follows on your client machine:
COPY testtable (column1, c2, c3) FROM STDIN WITH CSV;
1,2,3
4,5,6
\.
Then execute, on your client:
psql -U postgres -f /mylocaldrive/copy.sql -h remoteserver.example.com
If you use JDBC, the best solution for you is to use the PostgreSQL COPY API
http://jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/CopyManager.html
Otherwise (as already noted by others) you can use \copy from psql, which allows accessing local files on the client machine.
To my knowledge, when the COPY command is given a file name, the file is read on the machine where the database server is running; COPY FROM STDIN instead reads data sent over the client connection.
You could make a shell script that runs the Java conversion and then uses psql to execute a \copy command, which reads from a file on the client machine.
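For example, a rough sketch of such a script (the jar name, connection details, and table/file names are placeholders):

#!/bin/sh
# run the Java conversion to produce the parsed CSV
java -jar csv-parser.jar input.csv parsed.csv

# load the parsed CSV into the remote DB; \copy reads the file on this (client) machine
psql -h remote-db.example.com -U myuser -d mydb \
  -c "\copy my_table from 'parsed.csv' with csv header"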