I am using Postgres 9.4.5 and I detected corruption in one of my tables. I noticed this when running queries on a specific table caused the entire database to go into recovery mode. The symptoms lined up with those found in this article:
https://www.endpoint.com/blog/2010/06/01/tracking-down-database-corruption-with
I tried following the steps to zero out the corrupt block, and they led me to a ctid in block 507578:
database=# \set FETCH_COUNT 1
database=# \pset pager off
Pager usage is off.
database=# SELECT ctid, left(coded_element_key::text, 20) FROM coded_element WHERE ctid >= '(507577,1)';
ctid | left
------------+----------
(507577,1) | 30010491
(507577,2) | 30010507
(507577,3) | 30010552
(507577,4) | 30010556
(507577,5) | 30010559
(507577,6) | 30010564
(507577,7) | 30010565
(507577,8) | 30010625
...
...
...
(507578,26) | 0A1717281.0002L270&.
(507578,27) | L270&.*)0000.0000000
(507578,28) | 30011452
(507578,29) | -L0092917\x10)*(0117001
(507578,30) | 0.00003840\x10)*)300114
ERROR: invalid memory alloc request size 1908473862
The problem is that when I went to my /data/base directory and found the corresponding file for my table, the file was only 1073741824 bytes. With a block size of 8192 bytes this only gives me a block count of 131072, way under the 507578 value where the supposed corruption is. Is this the correct way to determine the block offset or is there a different way?
PostgreSQL stores table data in segment files of 1GB each.
Assuming that the result of
SELECT relfilenode
FROM pg_class
WHERE relname = 'coded_element';
is 12345, those files would be called 12345, 12345.1, 12345.2 and so on.
Since each of these 1GB segments contains 131072 blocks and the segments are numbered from 0 (the first file has no suffix), block 507578 is actually block 114362 in 12345.3 (507578 = 3 * 131072 + 114362).
The data corruption is either in that block or the following one.
At this point, make sure you have a backup of the complete data directory.
To zero out block 507578, you can use
dd if=/dev/zero of=12345.3 bs=8192 seek=114362 count=1 conv=notrunc,nocreat,fsync
If that doesn't do the trick, try the next block.
To salvage data from the block before zeroing it, you can use the pageinspect extension.
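The segment arithmetic can be checked with a short Python sketch. It assumes the default 8192-byte block size and 1GB segment size; the function name locate_block and the relfilenode 12345 are only illustrative.

```python
BLOCK_SIZE = 8192                      # default PostgreSQL block size (assumed)
SEGMENT_SIZE = 1024 ** 3               # default 1GB segment size (assumed)
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE  # 131072


def locate_block(relfilenode, block):
    """Return (segment file name, block offset inside that file)."""
    segment, offset = divmod(block, BLOCKS_PER_SEGMENT)
    # Segment 0 is the bare relfilenode; later segments get a ".N" suffix.
    name = str(relfilenode) if segment == 0 else f"{relfilenode}.{segment}"
    return name, offset


print(locate_block(12345, 507578))  # ('12345.3', 114362)
```

The offset is what goes into the seek= argument of dd when bs is set to the block size.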
I followed the steps outlined in this tutorial to use the pg_prewarm extension for pre-warming the buffer cache in PostgreSQL:
https://ismailyenigul.medium.com/pg-prewarm-extention-to-pre-warming-the-buffer-cache-in-postgresql-7e033b9a386d
The first time I ran it, I got 1 as the result:
mydb=> SELECT pg_prewarm('useraccount');
pg_prewarm
------------
1
After that I had to drop the table and recreate it. Since then, the same command always returns 0. Is that expected, or am I missing something?
mydb=> SELECT pg_prewarm('useraccount');
pg_prewarm
------------
0
The function comment in contrib/pg_prewarm/pg_prewarm.c says:
* [...] The
* return value is the number of blocks successfully prewarmed.
So the first time, there was 1 block in the table. After dropping and re-creating the table, it is empty, so 0 blocks are cached.
On a development server I'd like to remove unused databases. To do that, I need to know whether a database is still used by anyone.
Is there a way to get the last access or modification date of a given database, schema or table?
You can do it by checking the last modification time of the table's file.
In PostgreSQL, every table corresponds to one or more OS files. You can find the file name like this:
select relfilenode from pg_class where relname = 'test';
The relfilenode is the file name of table "test". You can then find that file in the database's directory.
In my test environment:
cd /data/pgdata/base/18976
ls -l -t | head
The last command lists the files ordered by last modification time, newest first.
There is no built-in way to do this - and all the approaches that check the file mtime described in other answers here are wrong. The only reliable option is to add triggers to every table that record a change to a single change-history table, which is horribly inefficient and can't be done retroactively.
If you only care about "database used" vs "database not used" you can potentially collect this information from the CSV-format database log files. Detecting "modified" vs "not modified" is a lot harder; consider SELECT writes_to_some_table(...).
If you don't need to detect old activity, you can use pg_stat_database, which records activity since the last stats reset. e.g.:
-[ RECORD 6 ]--+------------------------------
datid | 51160
datname | regress
numbackends | 0
xact_commit | 54224
xact_rollback | 157
blks_read | 2591
blks_hit | 1592931
tup_returned | 26658392
tup_fetched | 327541
tup_inserted | 1664
tup_updated | 1371
tup_deleted | 246
conflicts | 0
temp_files | 0
temp_bytes | 0
deadlocks | 0
blk_read_time | 0
blk_write_time | 0
stats_reset | 2013-12-13 18:51:26.650521+08
so I can see that there has been activity on this DB since the last stats reset. However, I don't know anything about what happened before the stats reset, so if I had a DB showing zero activity since a stats reset half an hour ago, I'd know nothing useful.
PostgreSQL 9.5 and later can track the commit timestamp of the transaction that last modified each row.
1. Check whether commit timestamp tracking is on or off using the following query:
show track_commit_timestamp;
2. If it returns "on", go to step 3; otherwise modify postgresql.conf:
cd /etc/postgresql/9.5/main/
vi postgresql.conf
Change
track_commit_timestamp = off
to
track_commit_timestamp = on
Restart PostgreSQL, then repeat step 1.
3. Use the following queries to see when each row was last committed. Timestamps are only recorded for transactions committed after tracking was enabled.
SELECT pg_xact_commit_timestamp(xmin), * FROM YOUR_TABLE_NAME;
SELECT pg_xact_commit_timestamp(xmin), * FROM YOUR_TABLE_NAME where COLUMN_NAME=VALUE;
My way to get the modification date of my tables:
Python Function
CREATE OR REPLACE FUNCTION py_get_file_modification_timestamp(afilename text)
RETURNS timestamp without time zone AS
$BODY$
import os
import datetime
return datetime.datetime.fromtimestamp(os.path.getmtime(afilename))
$BODY$
LANGUAGE plpythonu VOLATILE
COST 100;
SQL Query
SELECT
schemaname,
tablename,
py_get_file_modification_timestamp('*postgresql_data_dir*/*tablespace_folder*/'||relfilenode)
FROM
pg_class
INNER JOIN
pg_catalog.pg_tables ON (tablename = relname)
WHERE
schemaname = 'public'
I'm not sure whether things like vacuum can interfere with this approach, but in my tests it's a pretty accurate way to find tables that are no longer used, at least for INSERT/UPDATE operations.
I guess you should activate some log options. You can get information about logging on PostgreSQL here.
The GoogleCloudSQL FAQ states that
For MySQL Second Generation instances, InnoDB is the only storage engine supported
My experiment indicates that engine=memory is possible, at least for temporary tables.
CREATE TEMPORARY TABLE mt (c CHAR(20)) ENGINE=memory;
Query OK, 0 rows affected
SHOW CREATE TABLE mt;
+---------+----------------+
| Table | Create Table |
|---------+----------------|
| mt | CREATE TEMPORARY TABLE `mt` (
`c` char(20) DEFAULT NULL
) ENGINE=MEMORY DEFAULT CHARSET=utf8 |
+---------+----------------+
1 row in set
Time: 0.022s
INSERT INTO mt (c) VALUES ('waaa' );
Query OK, 1 row affected
Time: 0.017s
SELECT * FROM mt;
+------+
| c |
|------|
| waaa |
+------+
1 row in set
Time: 0.019s
Is this available but unsupported? Might Google disable this without giving notice? Is this just left out of the FAQ because the message is that one should use InnoDB instead of MyISAM?
Thanks for your time.
Even though it is possible to create tables with the MEMORY engine (temporary tables only), this is not supported by Google Cloud, as it does not provide the same consistency as the InnoDB engine and may be prone to errors.
Besides, in newer Cloud SQL instances with 2nd Generation MySQL the use of any storage engine other than InnoDB will result in an error, such as:
ERROR 3161 (HY000): Storage engine MEMORY is disabled (Table creation is disallowed)
As of this moment, for Cloud SQL instances that use 2nd Generation MySQL, the only supported storage engine is InnoDB. If you can use the MEMORY engine on your instance, that means it is an older version. As the MEMORY engine is unsupported, Google may disable this feature without giving notice, as you suspect.
My advice would be that although right now you can use the MEMORY engine for temporary tables in your Cloud SQL instance, please stick to the InnoDB engine as it is the only one supported by Google. The same message that mentions MyISAM also applies to other storage engines.
Can you suggest how to create a table whose name starts with digits in PostgreSQL?
Use double quotes, e.g.:
t=# create table "42 Might be not The be$t idea" (i serial);
CREATE TABLE
t=# \d+ "42 Might be not The be$t idea"
Table "public.42 Might be not The be$t idea"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------------------------------------------------------------------------+---------+--------------+-------------
i | integer | not null default nextval('"42 Might be not The be$t idea_i_seq"'::regclass) | plain | |
Please look closely at what this leads to. Using mixed case, special characters, or a leading digit in relation names is generally considered bad practice. Although Postgres understands and works with such names, you risk hitting bugs in other software.
Without experience you will most probably shoot yourself in the foot. E.g. pg_dump -t "badName" won't work: bash interprets the double quotes itself, as it is meant to, so pg_dump receives badName without quotes. You have to specify pg_dump -t '"badName"' to find the table. And if you merely fail to find a table, you are lucky. Disaster is when you have badname and Badname in the same schema.
The fact that it is doable does not mean you should jump into using it.
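To see what the shell actually hands to pg_dump, here is a small Python sketch using shlex.split, which follows POSIX shell quoting rules and so approximates what bash does with these two command lines. The table name badName is just the example from above.

```python
import shlex

# Double quotes are consumed by the shell; pg_dump sees badName without quotes.
unquoted = shlex.split('pg_dump -t "badName"')

# Single quotes protect the inner double quotes, so they reach pg_dump intact.
quoted = shlex.split("pg_dump -t '\"badName\"'")

print(unquoted)  # ['pg_dump', '-t', 'badName']
print(quoted)    # ['pg_dump', '-t', '"badName"']
```

Only the second form passes a quoted identifier to pg_dump, which is what a mixed-case table name needs.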
Let's say I have some customer data like the following saved in a text file:
|Mr |Peter |Bradley |72 Milton Rise |Keynes |MK41 2HQ |
|Mr |Kevin |Carney |43 Glen Way |Lincoln |LI2 7RD | 786 3454
I copied the aforementioned data into my customer table using the following command:
\copy customer(title, fname, lname, addressline, town, zipcode, phone) from 'customer.txt' delimiter '|'
However, as it turns out, there are some extra space characters before and after various parts of the data. What I'd like to do is call trim() before copying the data into the table - what is the best way to achieve this?
Is there a way to call trim() on every value of every row and avoid inserting unclean data in the first place?
Thanks,
I think the best way to go about this is to add a BEFORE INSERT trigger to the table you're inserting into. This way, you can write a trigger function that executes before every record is inserted and trims whitespace (or does any other transformation you may need) on any columns that need it. When you're done, simply remove the trigger (or leave it, which will improve data integrity if you never want that whitespace in those columns). I think explaining how to create a trigger and trigger function in PostgreSQL is probably outside the scope of this question, but I will link to the documentation for each.
I think this is the best way because it is simpler than parsing through a text file or writing shell code to do this. This kind of sanitization is the kind of thing triggers do very well and very simply.
Creating a Trigger
Creating a Trigger Function
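For comparison, the file-side alternative (cleaning the text before \copy sees it) can be sketched in a few lines of Python. This is only an illustration under the assumption that no field contains the delimiter itself; the function name trim_fields is hypothetical.

```python
def trim_fields(line, delimiter="|"):
    """Strip surrounding whitespace from every delimited field of one line."""
    fields = line.rstrip("\n").split(delimiter)
    return delimiter.join(field.strip() for field in fields)


sample = "|Mr |Kevin |Carney |43 Glen Way |Lincoln |LI2 7RD | 786 3454"
print(trim_fields(sample))  # |Mr|Kevin|Carney|43 Glen Way|Lincoln|LI2 7RD|786 3454
```

Whitespace inside a value, such as the spaces in "43 Glen Way", is left untouched; only the padding around each field is removed.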
I have a somewhat similar use case in one of my projects. My input files:
have the number of lines in the file as the last line;
need to have line numbers added on every line;
need to have file_id added to every line.
I use the following piece of shell code:
FACT=$( dosql "TRUNCATE tab_raw RESTART IDENTITY;
COPY tab_raw(file_id,lnum,bnum,bname,a_day,a_month,a_year,a_time,etype,a_value)
FROM stdin WITH (DELIMITER '|', ENCODING 'latin1', NULL '');
$(sed -e '$d' -e '=' "$FILE"|sed -e 'N;s/\n/|/' -e 's/^/'$DSID'|/')
\.
VACUUM ANALYZE tab_raw;
SELECT count(*) FROM tab_raw;
" | sed -e 's/^[ ]*//' -e '/^$/d'
)
dosql is a shell function that runs psql with the proper connection info and executes everything given as an argument.
As a result of this operation, the $FACT variable holds the total count of inserted records (for error detection).
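What the two sed commands do to the file can be sketched in Python. This is an illustration of the transformation only, not part of the original script; the function name prepare_lines and the sample values are hypothetical.

```python
def prepare_lines(lines, file_id):
    """Drop the trailing count line, then prefix each row with file_id and line number."""
    body = lines[:-1]  # the last line only holds the line count (sed '$d')
    # sed '=' emits the line number, the second sed joins it and prepends the file id.
    return [f"{file_id}|{number}|{line}" for number, line in enumerate(body, start=1)]


rows = prepare_lines(["a|b", "c|d", "2"], file_id=7)
print(rows)  # ['7|1|a|b', '7|2|c|d']
```

Each resulting row starts with the file_id and lnum columns that the COPY column list expects first.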
Later I do another dosql call:
dosql "SET work_mem TO '800MB';
SELECT tab_prepare($DSID);
VACUUM ANALYZE tab_raw;
SELECT tab_duplicates($DSID);
SELECT tab_dst($DSID);
SELECT tab_gaps($DSID);
SELECT tab($DSID);"
to analyze the data and move it from the auxiliary table into the final tables.