How to merge gz files from a postgres dump into one big file?

There is a folder with postgres dump files like:
0001.dat.gz
0002.dat.gz
...
6000.dat.gz
toc.dat
How can I merge all these files into a single gz archive that postgres recognizes when restoring?

So it looks like you have the directory format. pg_restore will recognize that format. I don't think there is any supported way to convert it to one of the other formats. You can tar it up into a single file, but you will have to untar it before restoring. Next time you run pg_dump, you should tell it to use the format you want used.
There are subtle differences in the toc.dat file between the directory format and the tar format, so if you just uncompress and then tar up the directory, it will not work (at least in my hands). It does work the other way around, however.
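As a rough sketch of the "tar it up, untar before restoring" route described above: the helper names pack_dump and unpack_dump, and all paths and database names, are made up for illustration. The pg_dump and pg_restore lines are left as comments because they need a real database.

```shell
#!/usr/bin/env bash
# Sketch: bundle a directory-format dump into one tar file and unpack it again.
# Function names and paths are hypothetical, not part of PostgreSQL.

# Pack the dump directory into a single uncompressed tar file.
# The .dat.gz members are already compressed, so no -z is used.
pack_dump() {
    local dump_dir=$1 archive=$2
    tar -cf "$archive" -C "$(dirname "$dump_dir")" "$(basename "$dump_dir")"
}

# Unpack the archive into target_dir, recreating the dump directory there.
unpack_dump() {
    local archive=$1 target_dir=$2
    mkdir -p "$target_dir"
    tar -xf "$archive" -C "$target_dir"
}

# Typical use (not run here; needs a real dump and a target database):
#   pack_dump /backups/dumpdir /backups/dumpdir.tar
#   unpack_dump /backups/dumpdir.tar /restore
#   pg_restore -d mydb /restore/dumpdir
# To get a single file directly next time, dump in custom format instead:
#   pg_dump -Fc -f mydb.dump mydb
```

The resulting tar file is only a transport container here; pg_restore is still pointed at the unpacked directory, not at the tar file.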

Related

Import data in gzip archive to mongodb

I have data stored in gzip archive folders, every archive contains a big file that includes json in the following format:
{key:value, key:value}
{key:value, key:value}
{key:value, key:value}
I need to import the data to MongoDB. What is the best way to do that? I can't extract the gzip on my PC as each file (not archived) is about 1950MB.

You can decompress the files to STDOUT and pipe the stream into mongoimport. Then you don't need to save the uncompressed file to your local disk:
gunzip --stdout your_file.json.gz | mongoimport --uri=<connection string> --collection=<collection> --db=<database>

I've imported tens of billions of lines of CSV and JSON to MongoDB in the past year, even from zipped formats. Having tried them all to save precious time, here's what I would like to recommend:
unzip the file
pass it as an argument to mongoimport
create the index on the fields you want, but ONLY at the end of the entire data insert process.
You can find the mongoimport documentation at: https://www.mongodb.com/docs/database-tools/mongoimport/
If you have a lot of files, you may want to write a for loop in bash that unzips each file and passes the filename as an argument to mongoimport.
If you are worried about not having enough disk space, you can also delete the unzipped file after each mongoimport run finishes.
Hope it helped!
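A possible shape for such a loop, combined with the streaming approach from the first answer so that nothing uncompressed is written to disk. This is only a sketch: import_gz_dir is a made-up helper, and the mongoimport arguments in the usage comment are placeholders for your own connection details.

```shell
# import_gz_dir DIR CMD [ARGS...]
# Decompress each DIR/*.json.gz to stdout and pipe it into CMD, one file at
# a time, so no uncompressed copy is written to disk.
# (Hypothetical helper; CMD would normally be mongoimport with its options.)
import_gz_dir() {
    local dir=$1
    shift
    local f
    for f in "$dir"/*.json.gz; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        echo "importing $f" >&2
        gunzip -c "$f" | "$@" || return 1
    done
}

# Typical use (not run here; connection details are placeholders):
#   import_gz_dir /data/archives mongoimport --uri="$MONGO_URI" --db=mydb --collection=mycoll
```

Indexes can then be created once, after the last file has been imported.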

How does pgAdmin encode the file path in backups?

I'm trying to restore dump files from locations whose paths contain characters from languages other than English.
Here is what I did:
From inside pgAdmin I used the backup tool.
In the FileName input I provided a real folder named "א":
C:\א\toc.dump
The actual file argument (-f file) was automatically converted to:
pg_dump.exe --file "C:\\0F04~1\\TOC~1.DUM"
My question is: what scheme does pgAdmin use to convert the file path argument?
How did it come up with 0F04~1 from א?
I'm asking because pg_restore does not support file paths that contain non-English characters (from cmd):
pg_dump.exe --file "C:\\0F04~1\\TOC1.DUMP" .... WORKS OK!
pg_dump.exe --file "C:\\א\\TOC1.DUMP" ... Not Working!
pg_restore: [custom archiver] could not open input file "..."
As in this question, so if I find the encoding scheme pgAdmin uses, I can apply it from code.
My goal is to encode paths that contain non-English characters from a batch script so that they work.

This is not something weird that pgAdmin does. It is something Windows itself does when it needs to represent such file names in a DOS-like setting (8.3 short names), for example when the name is longer than 8 characters or the extension is longer than 3.
In my hands the odd representation appears only in the logs and status messages. If I use the GUI file chooser, the file names look normal and the restore succeeds.
If you really want to know what Windows is doing, I think that is a better question for superuser with a Windows tag. I don't know why you can't restore these files. Are you using the pgAdmin GUI file chooser, or typing the names directly into something?

What is the purpose of the sql script file in a tar dump?

In a tar dump
$ tar -tf dvdrental.tar
toc.dat
2163.dat
...
2189.dat
restore.sql
After extraction
$ file *
2163.dat: ASCII text
...
2189.dat: ASCII text
restore.sql: ASCII text, with very long lines
toc.dat: PostgreSQL custom database dump - v1.12-0
What is the purpose of restore.sql?
toc.dat is binary, but I can open it and it looks like a SQL script too. How do the purposes of restore.sql and toc.dat differ?
The following quote from the documentation doesn't answer my question:
with one file for each table and blob being dumped, plus a so-called Table of Contents file describing the dumped objects
in a machine-readable format that pg_restore can read.
Since a tar dump contains restore.sql besides the .dat files, what is the difference between the SQL script files restore.sql and toc.dat in a tar dump and a plain dump (which has only one SQL script file)?
Thanks.

restore.sql is not used by pg_restore. See this comment from src/bin/pg_dump/pg_backup_tar.c:
* The tar format also includes a 'restore.sql' script which is there for
* the benefit of humans. This script is never used by pg_restore.
toc.dat is the table of contents. It contains commands to create and drop each object in the dump and is used by pg_restore to create the objects. It also contains COPY statements that load the data from the *.dat file.
You can extract the table of contents in human-readable form with pg_restore -l, and you can edit the result to restore only specific objects with pg_restore -L.
The <number>.dat files contain the table data; they are used by the COPY statements in toc.dat and restore.sql.
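To make the pg_restore -l / -L workflow a bit more concrete, here is a sketch of the "edit the list" step. keep_entries is a hypothetical helper, and the pattern and database name in the usage comment are placeholders. It relies on the documented behaviour that a line starting with a semicolon in the list file is treated as a comment.

```shell
# keep_entries PATTERN < full.list > filtered.list
# Comment out (with a leading ';') every TOC entry that does not match PATTERN.
# Lines that are already comments pass through unchanged.
# (Hypothetical helper, not part of PostgreSQL.)
keep_entries() {
    local pattern=$1
    awk -v pat="$pattern" '
        /^;/      { print; next }        # existing comment lines
        $0 ~ pat  { print; next }        # entries to restore
                  { print ";" $0 }       # everything else is disabled
    '
}

# Typical use (not run here; needs pg_restore, the archive and a database):
#   pg_restore -l dvdrental.tar > full.list
#   keep_entries 'public actor ' < full.list > actor.list
#   pg_restore -L actor.list -d mydb dvdrental.tar
```

Commenting entries out rather than deleting them keeps the list easy to diff against the original.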

This looks like a script to restore the data to PostgreSQL. The script was created using pg_dump.
If you'd like to restore, please have a look at pg_restore.
The dat files contain the data to be restored by the \copy commands in the SQL script.
The toc.dat file is not referenced inside the SQL file. If you peek inside using cat toc.dat|strings, you'll find that it contains data very similar to the SQL file, but with a few more internal ids.
I think it might have been intended to work without the SQL at some point, but that's not how it works right now. See the code to generate toc here.

How to import Zipped file into Postgres Table

I would like to import a file into my PostgreSQL system (specifically Redshift). I have found an argument for copy that allows importing a gzip file, but the provider of the data I am trying to load only produces it as a .zip. Are there any built-in postgres commands for opening a .zip?

From within Postgres:
COPY table_name FROM PROGRAM 'unzip -p input.csv.zip' DELIMITER ',';
From the man page for unzip -p:
-p extract files to pipe (stdout). Nothing but the file data is sent to stdout, and the files are always extracted in binary
format, just as they are stored (no conversions).

Can you just do something like
unzip -p myfile.zip | gzip > myfile.gz
Easy enough to automate if you have many files.
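One way the automation could look, as a sketch only: zip_to_gz is a made-up helper name, and it assumes each .zip holds a single data file. It uses unzip -p so that only the file data, without file names, reaches the pipe, and it redirects gzip's output to the new .gz file.

```shell
# zip_to_gz DIR
# For each DIR/*.zip, stream its contents through gzip into DIR/<name>.gz.
# Assumes each .zip holds a single data file; with several members,
# unzip -p would concatenate them into one stream.
# (Hypothetical helper name.)
zip_to_gz() {
    local dir=$1 z
    for z in "$dir"/*.zip; do
        [ -e "$z" ] || continue          # skip if the glob matched nothing
        unzip -p "$z" | gzip > "${z%.zip}.gz" || return 1
    done
}

# Typical use:
#   zip_to_gz /data/incoming
```

The original .zip files are left in place, so they can be removed once the .gz files have been loaded.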

This might only work when loading Redshift from S3, but you can include a "gzip" flag when copying data to Redshift tables, as described here:
This is the format that works for me if my S3 bucket contains a gzipped .csv.
copy <table> from 's3://mybucket/<foldername>' '<aws-auth-args>' delimiter ',' gzip;

unzip -p /path/to/.zip | psql -U user
The user must have superuser rights, otherwise you will get:
ERROR: must be superuser to COPY to or from a file
To learn more about this, see
https://www.postgresql.org/docs/8.0/static/backup.html
This kind of command is mainly used when handling large databases.

Compress a folder using tar in MATLAB

I am trying to compress a folder in MATLAB using tar. I want to use the current date as the name of the archive file. When I try
tar 'datestr(now)' FooFolder
nothing happens. With
tar datestr(now) FooFolder
the archive file is literally named datestr(now).tar, as expected. What is the solution?

The documentation is quite clear, use the function syntax:
tar(tarfilename,files)
Example:
tar(datestr(now),'FooFolder')
