Is there any easy tool to load CSVs into PostgreSQL?
I know there's the POSTGIS DBF loader tool but I was wondering if there's any non-commercial or commercial add-on that allows one to easily load in a CSV.
The COPY command, built into PostgreSQL, does exactly what you want. It's most useful when used in its \copy variant via psql.
Check the documentation for your particular Pg version, as COPY options vary. In future, please mention your Pg version when posting. Assuming you're on 9.1, then from a psql client you could use:
\copy target_table from 'the_file.csv' with (format csv)
and possibly other options, as documented in the link above, depending on the details of your CSV dialect.
Note that the \copy command will not work from PgAdmin-III or other clients; it's specific to psql. Regular COPY works from any client, but requires that the file be accessible by the database server's postgres process, so it's much less convenient.
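For comparison, here is a minimal sketch of the server-side equivalent; the table name and path are placeholders, and the file must be readable by the server's postgres process:
COPY target_table FROM '/path/on/server/the_file.csv' WITH (FORMAT csv);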
You can also use pg_bulkload or ETL tools like Talend and Pentaho if the job is huge or more complicated.
I'm used to working with SQL Server, and SQL Server Management Studio has the option to automatically generate a script to drop and recreate everything in a database (tables/views/procedures/etc.). When developing a new application and writing a bunch of junk into a local database for basic testing, I find it very helpful to have the option to just nuke the whole thing and recreate it from a clean slate, so I'm looking for similar functionality within Postgres/pgAdmin.
pgAdmin has an option to generate a create script for a specific table, but right-clicking each table would be very tedious, and I'm wondering if there's another way to do it.
To recreate a clean, schema-only database you can use the pg_dump client included with a Postgres server install. The options to use are:
-c
--clean
Output commands to clean (drop) database objects prior to outputting the commands for creating them. (Unless --if-exists is also specified, restore might generate some harmless error messages, if any objects were not present in the destination database.)
This option is ignored when emitting an archive (non-text) output file. For the archive formats, you can specify the option when you call pg_restore.
and:
-s
--schema-only
Dump only the object definitions (schema), not data.
This option is the inverse of --data-only. It is similar to, but for historical reasons not identical to, specifying --section=pre-data --section=post-data.
(Do not confuse this with the --schema option, which uses the word “schema” in a different meaning.)
To exclude table data for only a subset of tables in the database, see --exclude-table-data.
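For example, a drop-and-recreate script for a hypothetical database named mydb could be generated and replayed roughly like this (database name and file name are placeholders; --if-exists avoids the harmless error messages mentioned above):
pg_dump --clean --if-exists --schema-only mydb > mydb_schema.sql
psql -d mydb -f mydb_schema.sql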
clean in Flyway
The database migration tool Flyway offers a clean command that drops all objects in the configured schemas.
To quote the documentation:
Clean is a great help in development and test. It will effectively give you a fresh start, by wiping your configured schemas completely clean. All objects (tables, views, procedures, …) will be dropped.
Needless to say: do not use against your production DB!
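A minimal invocation sketch with the Flyway command-line client, assuming a local development database; the JDBC URL and credentials are placeholders:
flyway -url=jdbc:postgresql://localhost:5432/devdb -user=dev -password=secret clean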
EXPORT TO myFile.ixf OF ixf SELECT * FROM TABLE_NAME WHERE SSN='DATA' AND EMPLOYER_ID=DATA AND CREATED_TS='DATA'
I am using this statement to export a couple of rows; for privacy purposes, DATA has been inserted where necessary. However, the following error is produced. I have followed IBM's guide on export and feel like this should be correct, but I'm unsure exactly what is wrong. The error log is as follows:
Error: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=myFile;EXPORT TO ;JOIN, DRIVER=3.53.70
SQLState: 42601
ErrorCode: -104
As already remarked, you cannot directly run Db2 commands (such as import, export, load, etc.) from plain SQL, as you are trying to do via JDBC.
Instead, if your Db2-server runs on Linux/Unix/Windows, you can either use a stored procedure, or (for any Db2-server operating system) you can use the command-line.
However, when you use stored-procedure SYSPROC.ADMIN_CMD for Db2-LUW, all file-names in stored-procedure parameters are relative to the Db2-server (and not your remote jdbc-client, if you are running remotely).
That means after a successful export via stored-procedure, if you really need the exported IXF file to be on your workstation then you must do file-transfer to your workstation using whatever tools you have for that purpose.
For example, this shows an export on Unix to an IXF file in /tmp on the Db2-server:
call sysproc.admin_cmd('EXPORT TO /tmp/myFile.ixf OF ixf SELECT * FROM user1.stk1 with ur') ;
If you don't want to use a stored procedure, you must use the command-line shell (for example, on Windows use db2ntcmd.bat, or on Unix use bash or ksh), connect to the database in the shell, and perform the export. This requires the workstation to have a Db2 client and also that the relevant database and node be catalogued first.
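For example, from a Db2 command-line shell on the workstation, the flow might look roughly like this; host, port, database alias, user, and table are placeholders:
db2 CATALOG TCPIP NODE mynode REMOTE db2host SERVER 50000
db2 CATALOG DATABASE mydb AT NODE mynode
db2 CONNECT TO mydb USER db2user
db2 "EXPORT TO myFile.ixf OF IXF SELECT * FROM user1.stk1 WITH UR"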
If you specify your Db2-version and the operating-system on which your Db2-server runs, then you will get more details.
I need to load data from S3 into Postgres RDS (around 50-100 GB). I don't have the option to use AWS Data Pipeline, and I am looking for something similar to using the COPY command to load data from S3 into Amazon Redshift.
I would appreciate any suggestions on how I can accomplish this.
Originally, this answer was trying to use the S3 to Postgres RDS Functionality. That whole enterprise failed (see below).
The way I have finally been able to do this is:
Set up an EC2 instance with psql installed (see below near the end of the post)
Copy the relevant CSVs to import from S3 to the local instance
Use the psql \copy command to import the files
This last part is really, really important. If you use the SQL COPY command, the entire RDS Postgres role structure will frustrate you to no end. It has a wonky SUPERRDSADMIN role which is not very super at all. However, if you use the psql \copy command you apparently can do anything. I have confirmed this to be the case and have started my uploads successfully. I will come back and re-edit this post (time permitting) to add relevant documentation steps for the above.
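A minimal sketch of those three steps; the bucket, file, RDS endpoint, and table names are placeholders:
aws s3 cp s3://my-bucket/data.csv /tmp/data.csv
psql -h my-rds-endpoint.rds.amazonaws.com -U myuser -d mydb
and then, at the psql prompt:
\copy target_table from '/tmp/data.csv' with (format csv)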
Caveat emptor: the post below is all of the original work I did trying to get this implemented. I don't want to bury the lead: despite multiple efforts (including what can only be described as pathetic tech support from AWS), I don't believe that this feature is ready for prime time. Despite a very simple, easy-to-replicate test environment, AWS has not provided an effective way to keep the copy statement from failing. The actual call to aws_s3.table_import_from_s3(...) reports a permission problem between RDS and S3; from my research work with psql, this appears to be a C library, probably installed by AWS. The failure looks like this:
NOTICE: CURL error code: 28 when attempting to validate pre-signed URL, 1 attempt(s) remaining
NOTICE: HINT: make sure your instance is able to connect with S3.
S3 to Postgres RDS Functionality Now Added
On 2019-04-24 AWS released functionality allowing a Postgres RDS to load directly from S3. You can read the announcement here, and see the documentation page here.
I am sharing this with the OP because it appears to be the AWS-supported way of solving the question posed.
Key summary points:
Requires Postgres 11.1 or greater
Need access to psql and the ability to connect it to the RDS instance
Need to install the aws_s3 extension, which pulls in aws_commons (see the sketch after this list)
You can get to the S3 bucket by specifying credentials or by assigning IAM roles to RDS
It advertises supporting all of the same data formats as the postgres COPY command
It currently appears to support only a single file at a time (i.e. no regex)
The instructions are fairly detailed and provide a variety of paths to configuring (AWS CLI scripts, Console instructions, etc). Additionally, the option to use your IAM keys rather than have to set-up roles is nice.
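A sketch of what that looks like from psql, following the documented pattern; the bucket, key, region, and table names are placeholders:
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
SELECT aws_s3.table_import_from_s3(
   'target_table', '', '(format csv)',
   aws_commons.create_s3_uri('my-bucket', 'data.csv', 'us-east-1')
);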
I did not find a way to download just psql, so I had to bring a full Postgres install down to my Mac, but that was no big deal with brew:
brew install postgres
and since the DB service does not get activated, it is the quickest way to get psql.
Update: I decided that having psql on my Mac was a security hole (port forwarding, etc.). I found that there is a simple Postgres install available for Amazon Linux 2 under the Amazon Linux Extras rubric. The install command on your instance is fairly simple:
sudo amazon-linux-extras install postgresql10
psql is fairly easy to use; however, it is important to keep in mind that any instructions to psql itself are prefixed with a backslash (\). Documentation on psql can be found here. I recommend going through it at least once before executing the AWS-recommended scripts.
To the extent you run tight security and have access to your RDS instances seriously restricted (which I do), don't forget to open up the ports from your EC2 instance running Postgres to your RDS instance.
If your preference is a GUI, then you can try PGAdmin4. It is the AWS-recommended way of connecting to RDS Postgres instances according to the docs. I was unable to get any of the SSH tunneling features to work (which is why I ended up doing the localhost SSH mapping that I used for psql). I also found it to be rather buggy in other ways. Reading reviews of the product, it seems that version 4 may not be the most stable of releases.
http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html
Use the COPY command to load a table in parallel from data files on Amazon S3. You can specify the files to be loaded by using an Amazon S3 object prefix or by using a manifest file.
The syntax to specify the files to be loaded by using a prefix is as follows:
copy <table_name> from 's3://<bucket_name>/<object_prefix>'
authorization;
Update
Another option is to mount S3 and use a direct path to the CSV with the COPY command. I'm not sure if it will handle 100 GB effectively, but it's worth trying. Here is a list of software options.
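For instance, with a FUSE mount such as s3fs-fuse (bucket, mount point, credentials file, and table are placeholders, and the mount must live on the database server for server-side COPY), the idea would be roughly:
s3fs my-bucket /mnt/s3 -o passwd_file=${HOME}/.passwd-s3fs
and then from psql:
COPY target_table FROM '/mnt/s3/data.csv' WITH (FORMAT csv);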
Yet another option would be to "parse" the S3 file part by part, with something like what is described here, into a file, and then COPY from a named pipe, as described here.
And the most obvious option, to just download the file to local storage and use COPY, I don't cover at all.
Also worth mentioning is s3_fdw (status: unstable). The README is very laconic, but I assume you could create a foreign table pointing at an S3 file, which in turn means you could load the data into another relation...
I have a "cinema" DB in postgres and I want to dump all its tables and data in a cinema.sql file.
This file will contain all the sql code for re-creating the schema, tables and filling them with the data.
I already have a bank.sql file (for the "bank" DB) which I can execute via the psql console in pgAdmin III and import using the command
\i *path to my bank.sql file*
Now, I want to produce a cinema.sql file like bank.sql, but I don't know how to do it.
It's not the backup/restore feature of course, because it produces a .backup file.
I've also tried
pg dump > cinema.dump
in the psql console, but I can't find a .sql file anywhere, so I don't think that is what I'm looking for either.
I couldn't find anything useful for what I need in the Postgres documentation, unfortunately, so I hope you can help me, because I'm just a beginner.
Thanks.
As mentioned in the comments and the documentation, you should use the pg_dump command-line tool.
pg_dump cinema > cinema.sql
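You can then replay cinema.sql into an empty database the same way you load bank.sql, either with \i from the psql prompt or non-interactively; the target database name below is a placeholder:
psql -d cinema_copy -f cinema.sql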
I've done it! I don't know if it's 100% the right way to do it, but I think it is.
I'll report here just in case someone else might need this in the future and it can be of help.
I selected the DB I wanted to dump to a .sql file and then right-click -> Backup.
Here, as the format, I chose Plain instead of Custom, and "mydbdump.sql" as the file name.
In Dump Options #1 and Dump Options #2 I checked the checkboxes to include everything I needed (e.g. "include CREATE DATABASE statement").
I compared this newly created .sql dump with the one I already had (using Notepad++) and they look the same (even though, of course, they are from different DBs).
We have the "coverity" tool setup and are trying to find a way to backup the database to a file, it uses I believe PostgreSQL.
How can we do this, is it using its own independent installation of PostgreSQL?
An even better answer:
cov-admin-db backup c:/mybackupfile
When you installed Coverity Integrity Manager, it asked you if you want it to install and manage a PostgreSQL instance or if you want to connect to your own existing PostgreSQL instance that you then have to manage.
If you chose the former, then you would use the provided cov-admin-db command.
If you chose the latter, then presumably you already do regular backups of your databases with pg_dump; you should do the same for the Coverity database.
Without knowing which of the two you chose, it's not clear which of the two answers already given is correct.
You can check which option you chose by looking in the file /config/system.properties - if the first line is "embedded_db=true", then use the cov-admin-db command, which is documented in the manual as well as in its own --help option.
If it does use PostgreSQL, then there should be a pg_dump utility somewhere in the PostgreSQL installation.
Taking backups using pg_dump is very well explained in the manual:
http://www.postgresql.org/docs/current/static/backup-dump.html
http://www.postgresql.org/docs/current/static/app-pgdump.html
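For example, a plain-SQL dump of the Coverity database might look like this; the host, port, user, and database name are placeholders for whatever your PostgreSQL instance was configured with:
pg_dump -h localhost -p 5432 -U cov_user cov_db > coverity_backup.sql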