Import CSV with many columns to pgAdmin v4.1 - postgresql

I'm new to pgAdmin and GIS databases in general. I want to upload a CSV file into pgAdmin 4.1 and I'm trying to understand the logic for doing so. I can do it by creating a new table under the desired DB, manually defining each column (name, type, etc.), and only then loading the CSV through the GUI. This seems like a cumbersome way to import a CSV file: if the file has, say, 200 columns, it is not practical to define them all manually. There must be a way to tell pgAdmin: here is the CSV file, read the column names yourself, infer (or at least guess) the column types, and create a new table, much like how pandas reads a CSV in Python. As I'm new to this topic, please elaborate your answer/comment as much as possible.

NO: Unfortunately, we can only import a CSV after the table has been created.
YES: There is no GUI method, but:
There is a command-line utility called pgFutter which will do exactly what you want. Binaries are available on its project page.
You can also write a function that does it; a sketch of one approach is shown below.
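A minimal sketch of such a function (this is not the linked example; it assumes a comma-separated file with an unquoted header row, a path readable by the database server, Unix 'head' available there, and privileges to run COPY ... FROM PROGRAM; the name load_csv_to_new_table is made up). Every column is created as text, so you would refine the types afterwards with ALTER TABLE:

CREATE OR REPLACE FUNCTION load_csv_to_new_table(file_path text, target_table text)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    header  text;
    columns text;
BEGIN
    -- Read only the header line of the file (server-side path).
    CREATE TEMP TABLE _csv_header(line text);
    EXECUTE format('COPY _csv_header FROM PROGRAM %L', 'head -n 1 ' || file_path);
    SELECT line INTO header FROM _csv_header;
    DROP TABLE _csv_header;

    -- Turn "col1,col2,col3" into quoted column definitions, all typed text.
    SELECT string_agg(format('%I text', btrim(c)), ', ' ORDER BY ord)
      INTO columns
      FROM unnest(string_to_array(header, ',')) WITH ORDINALITY AS t(c, ord);

    -- Create the table and load the whole file into it.
    EXECUTE format('CREATE TABLE %I (%s)', target_table, columns);
    EXECUTE format('COPY %I FROM %L WITH (FORMAT csv, HEADER true)',
                   target_table, file_path);
END;
$$;

-- Usage: SELECT load_csv_to_new_table('/tmp/data.csv', 'my_new_table');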

I would look into using GDAL to upload your CSV file into PostGIS.
I used it recently to do a similar job.
ogr2ogr -f "PostgreSQL" -lco GEOMETRY_NAME=geometry -lco FID=gid PG:"host=127.0.0.1 user=username dbname=dbname password=********" postgres.vrt -nln th_new_data_2019 -t_srs EPSG:27700
This is the command I used to upload a CSV to PostGIS and transform the coordinate system. The options are:
-f format_name: output file format name. Some possible values are "ESRI Shapefile", "TIGER", "MapInfo File", "GML", and "PostgreSQL".
-lco NAME=VALUE: layer creation option (format specific).
-nln name: assign an alternate name to the new layer.
-t_srs srs_def: target spatial reference system. The coordinate systems that can be passed are anything supported by the OGRSpatialReference.SetFromUserInput() call, which includes EPSG PCS and GCSes (i.e. EPSG:4296), PROJ.4 declarations, or the name of a .prj file containing well known text.
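The postgres.vrt in the command above is presumably a GDAL virtual-format (VRT) file that wraps the CSV. If your GDAL build is 2.0 or newer, you may be able to skip the VRT entirely and let the CSV driver build point geometries and guess column types from the data via open options. A rough sketch (the input file, table name, lon/lat column names and SRS below are placeholders to adjust to your data):
ogr2ogr -f "PostgreSQL" PG:"host=127.0.0.1 user=username dbname=dbname password=********" input.csv -nln my_table -oo AUTODETECT_TYPE=YES -oo X_POSSIBLE_NAMES=lon -oo Y_POSSIBLE_NAMES=lat -a_srs EPSG:4326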
The best and simplest guide for installing GDAL that I have used is:
https://sandbox.idre.ucla.edu/sandbox/tutorials/installing-gdal-for-windows

Related

How does pgAdmin encode the file path in backups?

I'm trying to restore dump files from locations whose paths contain characters from languages other than English.
So here is what I did:
From inside pgAdmin I used the backup tool, and in the Filename input I provided a path inside an actual folder named "א":
C:\א\toc.dump
The actual file argument (-f file) was automatically converted to:
pg_dump.exe --file "C:\\0F04~1\\TOC~1.DUM"
My question is: what encoding scheme does pgAdmin use to convert the file path argument?
How did it come up with 0F04~1 from א?
I'm asking because pg_restore does not support file paths that contain non-English characters (from cmd):
pg_dump.exe --file "C:\\0F04~1\\TOC1.DUMP" .... WORKS OK!
pg_dump.exe --file "C:\\א\\TOC1.DUMP" ... Not Working!
pg_restore: [custom archiver] could not open input file "..."
As in this question: if I can find the encoding scheme pgAdmin uses, I'll apply it from code.
My goal is to encode a path that contains non-English characters from a batch script so that it will work.
This is not something weird pgAdmin does; rather, it is something weird Windows itself does when it needs to represent such file names in a DOS-like setting, for example when the name is longer than 8 characters or the extension longer than 3 (the 8.3 short-name scheme).
In my hands the weird presentation only appears in the logs and status messages. If I use the GUI file chooser, the file names look normal and restore successfully.
If you really want to know what Windows is doing, I think that is a better question for Super User with a Windows tag. I don't know why you can't restore these files. Are you using the pgAdmin GUI file chooser, or trying to type the names directly into something?
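As a possible workaround for the batch-script goal in the question: if 8.3 short-name generation is enabled on the volume, cmd itself can hand you that short path through the %~s for-variable modifier. A rough sketch using the path from the question (inside a .bat file, double the percent signs):
for %A in ("C:\א\toc.dump") do @echo %~sA
The echoed short form (something like the C:\0F04~1\... path shown above) can then be passed to pg_restore.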

Is it possible to automatically create a table in PostgreSQL from a csv file with headers?

According to the documentation https://www.postgresql.org/docs/current/sql-copy.html , the COPY command cannot create a table from a TSV or CSV file. You need to create the table and its columns before you can COPY into it.
Is there any workaround for this issue?
There are some ways to work around this issue.
You can find some scripts around the internet that will do what you're looking for, but the best way I know of is using this Data Mover Project.
It's already published to Docker as techindicium/spark-datamover:v0.1.
You can call it from the command line:
docker run --network host techindicium/spark-datamover:v0.1 -s /home/path/your_file.csv --filetype csv --destination "jdbc:postgresql://localhost:PORT/DATABASE?user=USERNAME&password=PASSWD" --destination-table MY_DEST_TABLE

What is the purpose of the sql script file in a tar dump?

In a tar dump
$ tar -tf dvdrental.tar
toc.dat
2163.dat
...
2189.dat
restore.sql
After extraction
$ file *
2163.dat: ASCII text
...
2189.dat: ASCII text
restore.sql: ASCII text, with very long lines
toc.dat: PostgreSQL custom database dump - v1.12-0
What is the purpose of restore.sql?
toc.dat is binary, but I can open it and it looks like an SQL script too. How different are the purposes of restore.sql and toc.dat?
The following quote from the documentation doesn't answer my question:
with one file for each table and blob being dumped, plus a so-called Table of Contents file describing the dumped objects
in a machine-readable format that pg_restore can read.
Since a tar dump contains restore.sql besides the .dat files, what is the difference between the SQL script files restore.sql and toc.dat in a tar dump, and how do they relate to a plain dump (which has only one SQL script file)?
Thanks.
restore.sql is not used by pg_restore. See this comment from src/bin/pg_dump/pg_backup_tar.c:
* The tar format also includes a 'restore.sql' script which is there for
* the benefit of humans. This script is never used by pg_restore.
toc.dat is the table of contents. It contains commands to create and drop each object in the dump and is used by pg_restore to create the objects. It also contains COPY statements that load the data from the *.dat files.
You can extract the table of contents in human-readable form with pg_restore -l, and you can edit the result to restore only specific objects with pg_restore -L.
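For example, with the dvdrental.tar dump from the question (the target database name is a placeholder):
$ pg_restore -l dvdrental.tar > dump.toc
Edit dump.toc and comment out the entries you don't want by putting a semicolon at the start of their lines, then:
$ pg_restore -L dump.toc -d mydb dvdrental.tar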
The <number>.dat files are the files containing the table data, they are used by the COPY statements in toc.dat and restore.sql.
This looks like a script to restore the data to PostgreSQL; the script was created using pg_dump.
If you'd like to restore, please have a look at pg_restore.
The .dat files contain the data to be restored by those \copy commands in the SQL script.
The toc.dat file is not referenced inside the SQL file. If you peek inside it using cat toc.dat | strings, you'll find that it contains data very similar to the SQL file, but with a few more internal IDs.
I think it might have been intended to work without the SQL at some point, but that's not how it works right now. See the code that generates the TOC here.

Export and Append Data to CSV

I am trying to export data to an existing CSV file.
I have been using these methods to export data.
Microsoft.Jet.OLEDB.4.0
SQLCMD
Data Export Wizard
However, I cannot find any parameter/option to append the exported data to an existing file. Is there any way to do this? Thanks.
Note: this answer is biased towards *nix operating systems; I'm not too familiar with Windows.
If you can run your SQL query from the command line, either through a scripting language with a library that opens the database connection (an example of this is a node.js program I authored, https://github.com/skilbjo/aqtl, but any tool will do) or through a Windows binary that runs something like sqlcmd, then you can simply pipe the output to the CSV file with the shell's append operator. For example:
$ node runquery.js myquery.sql >> existing_csv_file.csv
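On Windows, a rough sqlcmd equivalent might look like this (server, database, query and table names are placeholders; -s sets the column separator, -W trims trailing spaces, and -h -1 suppresses the header row so it is not repeated inside the existing file):
sqlcmd -S myserver -d mydb -Q "SELECT col1, col2 FROM my_table" -s"," -W -h -1 >> existing_csv_file.csv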

How to import Zipped file into Postgres Table

I would like to import a file into my PostgreSQL system (specifically Redshift). I have found an argument for COPY that allows importing a gzip file, but the provider of the data I am trying to load only produces it as a .zip. Are there any built-in Postgres commands for opening a .zip?
From within Postgres:
COPY table_name FROM PROGRAM 'unzip -p input.csv.zip' DELIMITER ',';
From the man page for unzip:
-p  extract files to pipe (stdout). Nothing but the file data is sent to stdout, and the files are always extracted in binary format, just as they are stored (no conversions).
Can you just do something like
unzip -p myfile.zip | gzip > myfile.gz
Easy enough to automate if you have enough files; see the loop sketch below.
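For instance, a rough shell loop over a directory of archives (assuming each .zip contains a single data file):
for f in *.zip; do unzip -p "$f" | gzip > "${f%.zip}.gz"; done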
This might only work when loading Redshift from S3, but you can actually just include a "gzip" flag when copying data to Redshift tables, as described here:
This is the format that works for me if my S3 bucket contains a gzipped .csv:
copy <table> from 's3://mybucket/<foldername>' '<aws-auth-args>' delimiter ',' gzip;
unzip -c /path/to/.zip | psql -U user
The 'user' must have superuser rights, or else you will get:
ERROR: must be superuser to COPY to or from a file
To learn more about this see
https://www.postgresql.org/docs/8.0/static/backup.html
Basically, this command is used when handling large databases.