SELECT query doesn't work with a specific column condition in PostgreSQL

Background:
The PostgreSQL service suffered corruption after a server power outage, and I used the pg_resetwal command to fix it, as suggested here.
After the service started successfully, I'm facing this weird issue.
When I filter on the id column, the query returns nothing, even though the row exists and the column type matches:
# SELECT id, email FROM users WHERE id=1;
id | email
----+-------
(0 rows)
But if I filter on another column (in this example, the email column), the row is returned:
# SELECT id, email FROM users WHERE email='john#gmail.com';
id | email
----+--------------------------
1 | john#gmail.com
(1 row)
Any suggestions?
PostgreSQL version: 12.7

OK, so after you have managed to get your PostgreSQL instance running, what you should have done is:
Take an immediate backup of all your databases
Audit them + check for any damage
Drop the existing dbs and restore from the audited backups
Identify the misconfiguration in your server that resulted in the data corruption.
I assume you haven't done these things and have a corrupted index.
Your hardware is lying to PostgreSQL about persisting data to disk. It isn't safe to trust the existing data - anything that was being updated (not just directly updated by you, but by vacuum processes too) is suspect.
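If you want to confirm the corrupted-index theory before going further, a minimal sketch (assuming the index on users.id is the damaged one) is to force a sequential scan, compare the results, and rebuild the table's indexes only after you have the backup:
-- Disable index access so the planner cannot use the suspect index.
SET enable_indexscan = off;
SET enable_bitmapscan = off;
SELECT id, email FROM users WHERE id = 1;  -- if the row shows up now, the index is corrupt
RESET enable_indexscan;
RESET enable_bitmapscan;
-- Rebuild all indexes on the table (run only after the backup and audit above).
REINDEX TABLE users;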

Related

PostgreSQL select query with where clause 'COLUMN_1 is null' hangs

Recently we restored a PostgreSQL database from a backup which was created without stopping the database (I know this was very wrong and now we are paying the price). The backup was a simple copy of the database directory.
Now we noticed that when we execute
select *
from table
where COLUMN_1 is null
query in one of our tables, the query just hangs (freezes) and never finishes. Other queries on the same table run fine, and distinct(COLUMN_1) returns all the values. The same query runs correctly on the other column (COLUMN_2 is null). It seems there is something wrong with that one column.
How can I repair such possibly damaged table?
Dump the whole database with pg_dump and restore it to a new cluster. If that works, it will get rid of all data corruption.
If that fails, you should hire a specialist.
If you attach to the hanging backend with a debugger, you can investigate what it is doing (if you are familiar with the source).
I ran VACUUM FULL and it solved my problem. The query with the IS NULL clause now executes on that table.
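For reference, that amounts to something like the following (table_name stands in for the affected table; VACUUM FULL rewrites the table and takes an exclusive lock, so run it in a maintenance window):
-- Rewrites the table and all of its indexes from scratch.
VACUUM (FULL, VERBOSE) table_name;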

After numerous (error-free) inserts to an Aurora (PostgreSQL) RDS serverless cluster with SQLAlchemy I can't see the table. What happened to my data?

After changes to some Terraform code, I can no longer access the data I've added into an Aurora (PostgreSQL) database. The data gets added into the database as expected without errors in the logs but I can't find the data after connecting to the database with AWS RDS Query Editor.
I have added thousands of rows with Python code that uses the SQLAlchemy/PostgreSQL engine object to insert a batch of rows from a mappings dictionary, like so:
if (count % batch_size) == 0:
    self.engine.execute(Building.__table__.insert(), mappings)
    self.session.commit()
The logs from this data ingest show no errors; the commits all appear to have completed successfully. So the data was inserted someplace, I just can't work out where, as it's not showing up in the AWS Console RDS Query Editor. I run the SQL below to find the table, and zero rows are returned:
SELECT * FROM information_schema.tables WHERE table_name = 'buildings'
This has worked as expected before (i.e. I could see the data in the Aurora database via the Query Editor) so I'm trying to work out which of the recently modified Terraform settings have caused the issue.
Where else can I look to find where the data was inserted, assuming that it was actually inserted somewhere? If I can work that out it may help reveal the culprit.
I suspect misleading capitalization. Like "Buildings". Search again with:
SELECT * FROM information_schema.tables WHERE table_name ~* 'building';
Or:
SELECT * FROM pg_catalog.pg_tables WHERE tablename ~* 'building';
Or maybe your target wasn't a table? You can "write" to simple views. Check with:
SELECT * FROM pg_catalog.pg_class WHERE relname ~* 'building';
None of this is specific to RDS. It's the same in plain Postgres.
If the last query returns nothing, you are in the wrong database. (You are aware that there can be multiple databases in one DB cluster?) Or you have a serious problem.
See:
How to check if a table exists in a given schema
Are PostgreSQL column names case-sensitive?
Once I logged more information regarding the connection, I discovered that the database name being used was incorrect, so I had been querying the Aurora instance using the wrong database name. Once I worked this out and used the correct database name, the SELECT statements in the AWS RDS Query Editor worked as expected.
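To rule this out quickly next time, a one-line check run over the same connection shows where you actually landed (nothing here is RDS-specific):
-- Reports the database, role, and search_path of the current connection.
SELECT current_database(), current_user, current_setting('search_path');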

Odoo 10 is not backing up DB in PostgreSQL 9.5. Shows "SQL state: 22008. Timestamp out of range on account_bank_statement_line."

At our company we had a DB crash a few days ago due to hardware reasons. We recovered from that, but since then we've been getting the following error every time we try to back up our DB.
pg_dump: ERROR: timestamp out of range
pg_dump: SQL command to dump the contents of table "account_bank_statement_line"
The error is in the "account_bank_statement_line" table, where we have 5 rows in which only the 'create_date' column has a value - a date in the year 4855 (!!!!) - while the rest of the columns are null, even the id (primary key). We can't even delete or update those rows using pgAdmin 4 or the PostgreSQL terminal.
We're in a very risky situation right now, with no backup of a few days of retail sales. Any hints at least would be very highly appreciated.
First, if the data are important, hire a specialist.
Second, run your pg_dump with the option --exclude-table=account_bank_statement_line so that you at least have a backup of the rest of your database.
The next thing you should do is to stop the database and take a cold backup of all the files. That way you have something to go back to if you mess up.
The key to proceeding is to find the ctids (physical addresses) of the problematic rows. Then you can use those to delete the rows.
You can approach that by running queries like
SELECT create_date FROM account_bank_statement_line
WHERE ctid < '(42,0)';
and try to find the ctids where you get an error. Once you have found a row where the following falls over:
SELECT * FROM account_bank_statement_line
WHERE ctid = '(42,14)';
you can delete the row by its ctid.
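For example (using the hypothetical bad row located above):
-- Remove the damaged row by its physical address. Note that ctids can change
-- after a VACUUM, so locate and delete the row in the same session.
DELETE FROM account_bank_statement_line WHERE ctid = '(42,14)';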
Once you are done, take a pg_dumpall of the database cluster, create a new one and restore the dump. It is dangerous to continue working with a cluster that has experienced corruption, because corruption can remain unseen and spread.
I know what we did might not be the most technically advanced, but it solved our issue. We consulted a few experts and what we did was:
migrated all the data to a new table (account_bank_statement_line2), which transferred all the rows that had valid data;
after that we could DROP the old "account_bank_statement_line" table; and
renamed the new table to "account_bank_statement_line".
Then the db backup ran smoothly like always.
Hope this helps anyone who's in deep trouble like us. Cheers!
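A rough SQL sketch of that workaround (the WHERE filter is an assumption based on the corrupted rows having NULL ids; also note that CREATE TABLE ... AS does not copy constraints, defaults, or indexes, so those have to be recreated afterwards):
-- Copy only the readable rows; the broken rows had NULL ids, so they are skipped.
CREATE TABLE account_bank_statement_line2 AS
SELECT * FROM account_bank_statement_line
WHERE id IS NOT NULL;

DROP TABLE account_bank_statement_line;
ALTER TABLE account_bank_statement_line2 RENAME TO account_bank_statement_line;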

Moving a table from one database to another - only insert missing rows

I have two databases that are alike, one called datastore and the other called datarestore.
datarestore is a copy of datastore which was created from a backup image. The problem is that I accidentally deleted a little too much data from datastore.
Both databases are located on different AWS instances and I typically connect to them using pgAdmin III or Python to create scripts that handle the data.
I want to get the rows that I accidentally deleted from datastore, and which still exist in datarestore, back into datastore. Does anyone have any idea how this can be achieved? Both databases contain close to 1,000,000,000 rows and are on version 9.6.
I have seen some backup/import/restore options within pgAdmin III; I just don't know how they work and whether they support my needs. I also thought about creating a Python script, but querying my database has become pretty slow, so this seems not to be an option either.
-----------------------------------------------------
| id (serial - auto incrementing int) | - primary key
| did (varchar) |
| sensorid (int) |
| timestamp (bigint) |
| data (json) |
| db_timestamp (bigint) |
-----------------------------------------------------
If you preserved primary keys between those databases, then you could create foreign tables pointing from datarestore to datastore and check which keys are missing (using, for example, select pk from old_table except select pk from new_table), then fetch those missing rows using the same foreign table you created. This limits the first check for missing PKs to index-only scans (plus network transfer), and fetching the missing data is then an index scan. If you are missing only a small part of the data, it shouldn't take long.
If you require more detailed example then I'll update my answer.
EDIT:
Example of foreign table/server usage
These commands need to be executed on datarestore (or on datastore, if you choose to push data instead of pulling it).
If you don't have the foreign data wrapper "installed" yet:
CREATE EXTENSION postgres_fdw;
This will create a virtual server on your datarestore host. It is just metadata pointing at the foreign server:
CREATE SERVER foreign_datastore FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'foreign_hostname', dbname 'foreign_database_name',
port '5432_or_whatever_you_have_on_datastore_host');
This will tell your datarestore host which user it should connect as when using the FDW on the server foreign_datastore. It will be used only when your_local_role_name is logged in on datarestore:
CREATE USER MAPPING FOR your_local_role_name SERVER foreign_datastore
OPTIONS (user 'foreign_username', password 'foreign_password');
You need to create a schema on datarestore. This is where the new foreign tables will be created.
CREATE SCHEMA schema_where_foreign_tables_will_be_created;
This will log in to the remote host and create foreign tables on datarestore, pointing to the tables on datastore. Only the table definitions are created this way: no data is copied, just the structure of the tables.
IMPORT FOREIGN SCHEMA foreign_datastore_schema_name_goes_here
FROM SERVER foreign_datastore INTO schema_where_foreign_tables_will_be_created;
This will return the list of ids that are missing from your datarestore database for this table:
SELECT id FROM foreign_datastore_schema_name_goes_here.table_a
EXCEPT
SELECT id FROM datarestore_schema.table_a
You can either store them in a temp table (CREATE TABLE table_a_missing_pk AS followed by the query above - see the sketch further down), or use them right away:
INSERT INTO datarestore_schema.table_a (id, did, sensorid, timestamp, data, db_timestamp)
SELECT id, did, sensorid, timestamp, data, db_timestamp
FROM foreign_datastore_schema_name_goes_here.table_a
WHERE id = ANY((
    SELECT array_agg(id)
    FROM (
        SELECT id FROM foreign_datastore_schema_name_goes_here.table_a
        EXCEPT
        SELECT id FROM datarestore_schema.table_a
    ) sub
)::int[]);
From my tests, this should push down (meaning: send to the remote host) something like this:
Remote SQL: SELECT id, did, sensorid, timestamp, data, db_timestamp
FROM foreign_datastore_schema_name_goes_here.table_a WHERE ((id = ANY ($1::integer[])))
You can make sure it does by running explain verbose on your full query to see what plan it will execute. You should see Remote SQL in there.
In case it does not work as expected, you can instead create the temp table as mentioned earlier and make sure that this temp table is on the datastore host.
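A rough sketch of that temp table, using the same placeholder names as above:
-- Materialize the list of missing ids so it can be inspected and reused.
CREATE TABLE table_a_missing_pk AS
SELECT id FROM foreign_datastore_schema_name_goes_here.table_a
EXCEPT
SELECT id FROM datarestore_schema.table_a;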
An alternative approach would be to create a foreign server on datastore pointing to datarestore and push the data from your old database to the new one (you can insert into foreign tables). This way you won't have to worry about the list of ids not being pushed down to datastore, which would otherwise mean fetching all the data and filtering it afterwards (which would be extremely slow).

How to commit a ghost transaction in Postgresql 9.3?

I am working on a sample dataset restored from my customer's backup.
For some tables, the select count(*) returns a number, but the select * returns nothing.
I already rebuilt all indexes.
I suspect that these tables were loaded by an ETL process, using a transaction that was not committed, and that it all came to me with the backup.
How can I commit pending transactions, if any, in this case?
Thanks.
Fred
The source of the issue was the name of one column of the table: PCF_index.
psql generated "PCF_index" when exporting the schema. This caused trouble for Squirrel when trying to read it.
After the column was renamed to pcf_index, the issue was solved.
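For anyone hitting the same thing: mixed-case identifiers must be double-quoted in PostgreSQL, so the rename described above looks roughly like this (my_table is a placeholder for the actual table name):
-- "PCF_index" was created as a quoted, mixed-case identifier, so it must be
-- quoted to reference it; renaming it to all lowercase avoids the quoting issue.
ALTER TABLE my_table RENAME COLUMN "PCF_index" TO pcf_index;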