Is it possible to omit specific tables from an existing dump file when using psql to import data? - postgresql

I have a dump file that I want to import, but it contains a log table with millions of records, so I need to exclude it when I run psql < dump_file. Note: I cannot use pg_restore.
Edit:
Since the best option is to edit the file manually, are there any suggestions for removing 650K lines from a 690K-line file on Windows?

The correct way is to fix whatever problem is preventing you from using pg_restore (I guess you have already taken the dump in the wrong format?).
The quick and dirty way is to use a program to exclude what you don't want. I'd use perl, because that is what I would use. sed or awk have similar features, and I'm sure there are ways to do it in every other language you would care to look at.
perl -ne 'print unless /^COPY public.pgbench_accounts/../^\\\.$/' dump.file | psql
This excludes every line from the one that starts with COPY public.pgbench_accounts up to and including the next \. line.
Of course you would replace public.pgbench_accounts with your table's name, making sure to quote it properly if that is needed.
It might get confused if your database contains a row whose first column starts with the text "COPY public.pgbench_accounts"...
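A roughly equivalent sed sketch (same idea, not something I have run against your dump) would be:
sed '/^COPY public\.pgbench_accounts/,/^\\\.$/d' dump.file | psql
It deletes the same inclusive range, from the COPY line through the terminating \. line.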

Then you have to edit the file manually.
A crude alternative might be: create a table with the same name as the log table, but with an incompatible definition or no permissions for the importing user. Then restoring that table will fail. If you ignore these errors, you have reached your goal.
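A minimal sketch of that trick (table and role names here are only placeholders):
CREATE TABLE public.log_table (blocker integer);   -- deliberately incompatible definition
-- or keep the real definition and just remove the permission:
-- REVOKE INSERT ON public.log_table FROM import_user;
The COPY into that table during the import then fails, and those are the errors you ignore.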

Related

COPY a CSV file with 108 columns into PostgreSQL

I have a CSV file with 108 columns that I am trying to import into my PostgreSQL table. Obviously I don't want to specify every column in my CREATE TABLE statement. But when I enter
\COPY 'table_name' FROM 'directory' DELIMITER ',' CSV HEADER; this error message shows up: "ERROR: Extra Data after Last Expected Column". With a few columns I would know how to fix this, but, like I said, I don't want to specify all 108 columns. By the way, my table doesn't contain any columns at all. Any help on how I could do that? Thanks!
When dealing with problems like this, I often cheat. Plenty of tools exist online for converting CSV to SQL, https://www.convertcsv.com/csv-to-sql.htm being one of them.
Copy/paste your CSV, copy/paste the generated SQL. Not the most elegant solution, but it will work as a one-off.
Now, if you're looking to repeat this process regularly (automated, I hope), then Python may be an interesting language to explore to quickly write a script that does this for you, and then schedule it as a cron job or via whatever method you prefer for invoking it automatically with the correct input (the CSV file).
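For instance, a rough Python sketch of that idea (psycopg2, the file name, table name and connection string are all assumptions on my part) could read the header and create the table for you:
import csv
import psycopg2

csv_path = "data.csv"
table = "my_table"

# Read the header row to get all 108 column names
with open(csv_path, newline="") as f:
    header = next(csv.reader(f))

# Create every column as text; types can be tightened later with ALTER TABLE
columns = ", ".join(f'"{name}" text' for name in header)

conn = psycopg2.connect("dbname=mydb user=me")
with conn, conn.cursor() as cur:
    cur.execute(f"CREATE TABLE {table} ({columns})")
    with open(csv_path, newline="") as f:
        cur.copy_expert(f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER)", f)
That keeps the CREATE TABLE out of your hands entirely, at the cost of every column landing as text.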
Please feel free to let me know if I've misunderstood your original question, or if I can provide any more help give me a shout and I'll do my best!

Restore a PostgreSQL 9.6 database from an old complete dump and/or an up-to-date base directory plus other rescued files

I'm trying to restore/rescue a database from the pieces that I have:
I have all the recent files in PGDATA/base (/var/lib/postgresql/9.6/main/base/), but I do not have the complete /var/lib/postgresql/9.6/main/.
I have all the files from an old backup dump (not very different from the current state) that I restored into a new installation of PostgreSQL 9.6.
I have a lot of files rescued from the hard drive (with ddrescue), including thousands of files without names (they have a "#" followed by a number instead, and sit in the lost+found directory). So, for instance:
I have the pg_class file
I have the pg_clog directory with 0000 file
Edit:
I probably have the contents of pg_xlog, but I don't have the file names. I have 5 files of 16777216 bytes each:
#288294 (date 2019-04-01)
#288287 (date 2019-05-14)
#288293 (date 2019-07-02)
#261307 (date 2019-11-27)
#270185 (date 2020-01-28)
Also, my old dump is from 2019-04-23, so the first one could be from around the same point in time?
So my next step is to try to read those files with pg_xlogdump and/or to rename them as WAL segment files (starting at 00000001000000000000000A and ordered by date, since that is the naming scheme I saw the system using; could that work?) and put them into the pg_xlog directory of the new installation. I also realized that the last one has the date of the day the hard drive crashed, so I do have the most recent one.
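For example, I guess I could check one of them along these lines (the segment name and paths are only my guess):
cp '#288294' /var/lib/postgresql/9.6/main/pg_xlog/00000001000000000000000A
/usr/lib/postgresql/9.6/bin/pg_xlogdump -n 10 /var/lib/postgresql/9.6/main/pg_xlog/00000001000000000000000A
If pg_xlogdump prints valid records, the renaming guess is probably right.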
The PGDATA/base directory I rescued from the (damaged) hard drive contains directories 1, 12406, 12407 and 37972 with a lot of files inside. I checked with pg_filedump -fi that my up-to-date data is stored in the files in directory 37972.
The same (but older) data is stored in the files in directory PGDATA/base/16387 of the restored dump.
I tried to copy the files directly from one to the other, mixing the updated data over the old database, but it doesn't work. After solving some permission errors I can get into the resulting "Frankenstein" database this way:
postgres@host:~$ postgres --single -P -D /var/lib/postgresql/9.6/main/ dbname
And when I try to do something, like a reindex, I get this error:
PostgreSQL stand-alone backend 9.6.16
backend> reindex system dbname;
ERROR: could not access status of transaction 136889
DETAIL: Could not read from file "pg_subtrans/0002" at offset 16384: Success.
CONTEXT: while checking uniqueness of tuple (1,7) in relation "pg_toast_2619"
STATEMENT: reindex system dbname;
Certainly the pg_subtrans/0002 file is part of the "Frankenstein" and not the good one (I haven't found the good one yet, at least not under that name), so I tried two things: first copying another file that seemed similar, and then filling the file with 8192 zero bytes generated with dd. In both cases I get the same error (and if the file doesn't exist at all, I get DETAIL: Could not open file "pg_subtrans/0002": No such file or directory.). Anyway, I have no idea what should be in that file. Do you think I could recover that data from another file? Or could I find the missing file using some tool? Also, pg_filedump shows the other file in that directory, pg_subtrans/0000, as empty.
Extra note: I found this useful blog post that talks about restoring from just rescued files using pg_filedump, the pg_class file, reindex system and other tools, but it is hard for me to understand how to adapt it to my concrete and simpler problem (I think my problem is simpler because I have a dump): https://www.commandprompt.com/blog/recovering_a_lost-and-found_database/
In the end we completely restored the database from the PGDATA/base/37972 directory, in 4 steps:
Checking, by grepping the output of pg_filedump -fi, which file corresponds to each table. To make the grepping easier we made this script:
#!/bin/bash
# For every file in the current directory, print its name followed by
# any pg_filedump -fi output lines that match the search string ($1).
for filename in ./*; do
    echo "$filename"
    pg_filedump -fi "$filename" | grep "$1"
done
NOTE: Only useful with small strings.
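A hypothetical invocation (the script name and search string are made up here) is run from inside the directory holding the data files:
cd /var/lib/postgresql/9.6/main/base/37972
bash find_relation.sh "some_known_value"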
Running the great tool pg_filedump with -D. -D is a newer option (available from postgresql-filedump version ≥ 10) that decodes tuples using a given comma-separated list of types.
Since we know the types (we built the database), we "just" need to give the comma-separated list of types for each table. I wrote "just" because in some cases it can be a little complicated. One of our tables needed this kind of command:
pg_filedump -D text,text,text,text,text,text,text,text,timestamp,text,text,text,text,int,text,text,int,text,int,text,text,text,text,text,text,text,text,text,text,int,int,int,int,int,int,int,int,text,int,int 38246 | grep COPY > restored_table1.txt
From pg_filedump -D manual:
Supported types:
bigint
bigserial
bool
char
charN -- char(n)
date
float
float4
float8
int
json
macaddr
name
oid
real
serial
smallint
smallserial
text
time
timestamp
timetz
uuid
varchar
varcharN -- varchar(n)
xid
xml
~ -- ignores all attributes left in a tuple
All of those text columns were actually of type character varying(255), but varcharN didn't work for us, so after more tests we finally replaced it with text.
Our timestamp column was actually of type timestamp with time zone, but timetz didn't work for us, so after more tests we settled on timestamp and accepted losing the time zone data.
These changes worked perfectly for this table.
Other tables were much easier:
pg_filedump -D int,date,int,text 38183 | grep COPY > restored_table2.txt
Since this gives us only "raw" data, we have to reformat it into CSV. So we wrote a Python program to convert the pg_filedump -D output into CSV.
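A minimal sketch of such a converter (the assumed input format, column names and file names are illustrations, not our original program):
import csv
import sys

# Usage: python to_csv.py restored_table2.txt table2.csv col1 col2 ...
infile, outfile, header = sys.argv[1], sys.argv[2], sys.argv[3:]

with open(infile) as src, open(outfile, "w", newline="") as dst:
    writer = csv.writer(dst, delimiter="|")
    writer.writerow(header)  # header row expected by COPY ... CSV HEADER
    for line in src:
        # assumes decoded tuples look like: COPY: value<TAB>value<TAB>...
        if line.startswith("COPY:"):
            writer.writerow(line[len("COPY:"):].strip().split("\t"))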
We loaded each CSV into PostgreSQL (after recreating each empty table):
COPY scheme."table2"(id_comentari,id_observacio,text,data,id_usuari,text_old)
FROM '<path>/table2.csv' DELIMITER '|' CSV HEADER;
I hope this will help other people :)
That is doomed. Without the information in pg_xlog and (particularly) pg_clog, you cannot get the information back.
A knowledgeable forensics expert might be able to salvage some of your data, but it is not a straightforward process.

PostgreSQL: Import columns into table, matching key/ID

I have a PostgreSQL database. I had to extend an existing, big table with a few more columns.
Now I need to fill those columns. I thought I could create a .csv file (out of Excel/Calc) that contains the IDs/primary keys of the existing rows - and the data for the new, empty fields. Is it possible to do so? If it is, how?
I remember doing exactly this pretty easily in Microsoft SQL Server Management Studio, but for PostgreSQL I am using pgAdmin (though I am of course willing to switch tools if that would help). I tried the import function of pgAdmin, which uses PostgreSQL's COPY, but it seems COPY isn't suitable, as it can only create whole new rows.
Edit: I guess I could write a script which loads the csv and iterates over the rows, using UPDATE. But I don't want to reinvent the wheel.
Edit2: I've found this question here on SO which provides an answer by using a temp table. I guess I will use it - although it's more of a workaround than an actual solution.
PostgreSQL can import data directly from CSV files with COPY statements; however, as you stated, this only works for new rows.
Instead of creating a CSV file you could just generate the necessary SQL UPDATE statements.
Suppose this were your CSV file:
PK;ExtraCol1;ExtraCol2
1;"foo";42
4;"bar";21
Then just produce the following
UPDATE my_table SET ExtraCol1 = 'foo', ExtraCol2 = 42 WHERE PK = 1;
UPDATE my_table SET ExtraCol1 = 'bar', ExtraCol2 = 21 WHERE PK = 4;
You seem to work under Windows, so I don't really know how to accomplish this there (probably with PowerShell), but under Unix you could generate the SQL from a CSV easily with tools like awk or sed. An editor with regular expression support would probably suffice too.
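For example, a rough awk one-liner under Unix (assuming the semicolon-separated file above is saved as data.csv and the values need no further escaping) could generate exactly those statements:
awk -F';' 'NR > 1 { gsub(/"/, "\047", $2); printf "UPDATE my_table SET ExtraCol1 = %s, ExtraCol2 = %s WHERE PK = %s;\n", $2, $3, $1 }' data.csv
The gsub just turns the CSV double quotes into SQL single quotes.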

boolean field in redshift copy

I am producing a comma-separated file in S3 that needs to be copied to a staging table in a Redshift database using the COPY command.
It has one boolean field. With every sensible way I can think of to represent the boolean value in the file, Redshift COPY complains, usually with "Unknown boolean format".
I'm going to give up and change the staging table field to a smallint so that I can proceed with the copy and translate the value on the load from staging to the final redshift table, but I'm curious if anyone knows the correct incantation.
A zero or one works just fine for us.
Check your loads carefully; it may well be another issue that's 'pushing' invalid data into your boolean column.
For instance, we had all kinds of crazy characters embedded in our data that would cause errors like that. I eventually settled on using the ASCII US (unit separator) character as the record separator.
Check to make sure you're excluding the headers during the COPY command.
I ran into the same problem, but adding the ignoreheader 1 option (ignores 1 header line during import) solved the issue.
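For reference, a minimal sketch of such a load (bucket, role and table names are placeholders) might look like:
COPY staging_table
FROM 's3://my-bucket/path/data.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
CSV
IGNOREHEADER 1;
with the boolean column written as 0 or 1 in the file, as suggested above.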

Export tables to Flat File with some logic

I'm writing scripts to export some tables to flat files every day. I'm looking at the BCP utility, but I'm not sure it has the kind of features I really need.
For example, I need to output the fields out of order. That is, the 15th field in the MSSQL database should be the 2nd field in the flat file, etc.
More importantly, some of the fields need to be altered. For example, if a certain field is null or contains some special values, I need to replace them with codes.
Is BCP the right tool for this? My gut tells me to do this in Perl instead.
You can write a stored procedure and do all data transformations there.
Then feed this stored procedure to bcp.
It will surely be faster than Perl.
SSIS is fast too; it could be an option if the transformations are very complex.
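For example, something along these lines (database, procedure and file names are hypothetical) exports the procedure's result set, assuming it returns a single result set:
bcp "EXEC MyDb.dbo.usp_ExportDaily" queryout daily_export.dat -T -c -S myserver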
You can use a query to order and format the columns directly with BCP.
From the bcp Utility documentation:
"query" Is a Transact-SQL query that returns a result set.
Example:
bcp "SELECT Name FROM AdventureWorks.Sales.Currency" queryout Currency.Name.dat -T -c