PostgreSQL features a contrib module called amcheck. This module provides functions that allow you to verify the logical consistency of the structure of relations. Its functionalities originally appeared in PostgreSQL 10.
Where does the name come from? I understand the check part of the name, but what is the meaning of am?
“am” stands for access method, that is, a way to access the data in a relation. You can find the same abbreviation in the system catalog pg_am, which lists all access methods known to the server.
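For example, you can list them with a simple query like this:

-- List all access methods known to the server; amtype is 'i' for index
-- access methods and 't' for table access methods.
SELECT amname, amtype FROM pg_am ORDER BY amtype, amname;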
A historical note: the module was added in v10 with this commit after this discussion. Originally it only supported checks on B-tree indexes. Back then, “am” was roughly equivalent to “index type” in PostgreSQL slang, so the module name would have been understood as “index integrity check”. PostgreSQL v12 introduced table access methods (other ways to store table data), so the meaning of “access method” has broadened. In v14, amcheck gained a new function, verify_heapam, to check the integrity of tables, so the name “amcheck” has stood the test of time.
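To illustrate, a typical check session with the module looks like this (my_index and my_table are placeholder names):

-- The module ships as an extension and must be installed per database.
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Verify the structure of a B-tree index (available since the original v10 module).
SELECT bt_index_check('my_index');

-- Check a table (heap) for corruption; this function was added in v14.
SELECT * FROM verify_heapam('my_table');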
If we just drop a database from PostgreSQL, can we say that the data is deleted permanently and securely?
How can we follow industry-standard sanitization processes, e.g. NIST 800-88, with PostgreSQL?
There is no support for that built into PostgreSQL, since it is a matter of the physical properties of the file system, and PostgreSQL uses the kernel's file system API and has no knowledge of the file system's inner workings.
Even if PostgreSQL went as far as overwriting files with random data before deleting them (which it doesn't), that wouldn't achieve anything on a copy-on-write file system after a snapshot has been taken.
You are approaching this on the wrong layer. This requirement has to be handled on the file system level.
I need to implement a schema migration mechanism for PostgreSQL.
Just to remove ambiguity: by schema migration I mean that I need to upgrade my database structures to the latest version regardless of their current state on a particular server instance.
For example, in version one I created some tables, then in version two I renamed some columns, and in version three I removed one table and created another one. I have multiple servers, and some of them run version one, some version three, and so on.
My idea:
Generate a hash of the output produced by
pg_dump --schema-only
every time before I change the database schema. This will be a reliable way to identify the database version that a future patch should apply to.
Keep a list of patches together with the hashes of the schema versions they apply to.
When I need to upgrade a database, I will run an application that searches for the hash corresponding to the current database structure (by calculating the hash of the local database and comparing it with the set of hashes I have) and applies the associated patch.
Repeat until no matching hash is found.
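A minimal sketch of the bookkeeping this plan implies, assuming the hashes are computed externally from the pg_dump --schema-only output and the patch scripts live on disk (the table and column names are made up for illustration):

-- Hypothetical lookup table: which patch applies to which schema version.
CREATE TABLE schema_patches (
    source_hash text PRIMARY KEY,   -- hash of "pg_dump --schema-only" before the patch
    target_hash text NOT NULL,      -- expected hash after the patch has been applied
    patch_file  text NOT NULL       -- path to the SQL script implementing the change
);

-- The upgrade tool repeatedly looks up the current hash and runs the patch,
-- stopping when this query returns no row.
SELECT patch_file, target_hash
FROM schema_patches
WHERE source_hash = :current_hash;  -- :current_hash is supplied by the tool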
Could you please point any weak sides of this approach?
Have you ever heard of https://pgmodeler.io? At the company where I work we decided to go with it, since it can perform a schema diff even between a local and a remote database. We are very satisfied with it.
Otherwise, if you prefer a free solution, you could develop a migration tool that applies migrations you store in a single repository. This tool could rely on a migration table kept in a separate schema, so that your database(s) always know which migrations have been applied; a sketch of such a table follows below.
The beauty of this approach is that migrations can cover both schema changes and data changes.
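A minimal sketch of what such a migration table could look like (the schema, table and column names are illustrative, not a standard):

-- Keep the bookkeeping out of the application schema.
CREATE SCHEMA IF NOT EXISTS migrations;

CREATE TABLE IF NOT EXISTS migrations.applied_migration (
    id         bigint PRIMARY KEY,                -- sequential migration number from the repository
    filename   text NOT NULL,                     -- e.g. 0003_rename_columns.sql
    applied_at timestamptz NOT NULL DEFAULT now()
);

-- Before running migration 3, the tool checks whether it has already been applied:
SELECT EXISTS (SELECT 1 FROM migrations.applied_migration WHERE id = 3);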
I hope this can give you some ideas.
Is it possible to get the table structure, like db2look does, from SQL?
Or is the command line the only way? I could wrap db2look in an external stored procedure written in C, but that is not what I am looking for.
Clarification added later:
I want to find out from SQL which tables have the not-logged option.
It is possible to reconstruct the table structure from regular SQL and the public DB2 catalog; however, it is complex and requires some deeper skills.
The metadata is available in the DB2 catalog views in the SYSCAT schema. For a regular table you would start by looking at the values in SYSCAT.TABLES and SYSCAT.COLUMNS. From there you would need to branch off to other views, depending on which table and column options you are after, whether time-travel tables, special partitioning rules, or many other options are involved.
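For example, a starting point for listing the basic column layout of one table could look like this (it uses only standard SYSCAT views; the schema and table names are placeholders, and it does not yet cover options such as logging attributes):

-- Basic column layout of a regular table, reconstructed from the DB2 catalog.
SELECT c.colno,
       c.colname,
       c.typename,
       c.length,
       c.scale,
       c.nulls                      -- 'Y' if the column is nullable
FROM   syscat.columns c
JOIN   syscat.tables t
       ON t.tabschema = c.tabschema
      AND t.tabname   = c.tabname
WHERE  t.tabschema = 'MYSCHEMA'
  AND  t.tabname   = 'MYTABLE'
  AND  t.type      = 'T'            -- regular tables only
ORDER BY c.colno;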
Serge Rielau published an article on developerWorks called Backup and restore SQL schemas for DB2 Universal Database that provides a set of stored procedures that will do exactly what you're looking for.
The article is quite old (2006) so you may need to put some time in to update the procedures to be able to handle features that were added to DB2 since the date of publication, but the procedures may work for you now and are a nice jumping off point.
Say we have an SQL database with a table Person and several applications accessing it. For some reason we would like to modify the Person table in a backward-incompatible way.
One potential solution for keeping compatibility is to rename the table to User and to create a Person view that provides the same interface as the legacy table (adding INSTEAD OF insert, update and delete triggers as needed; see the sketch below).
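A minimal sketch of that approach in PostgreSQL syntax (the column list and trigger body are placeholders; simple single-table views are often automatically updatable, so the trigger is only needed when the view is not):

-- Rename the real table and expose the old name as a compatibility view.
ALTER TABLE person RENAME TO "user";

CREATE VIEW person AS
SELECT id, name                      -- only the columns the legacy interface expects
FROM "user";

-- Writes through the view can be supported with INSTEAD OF triggers.
CREATE FUNCTION person_view_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO "user" (id, name) VALUES (NEW.id, NEW.name);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER person_view_insert
INSTEAD OF INSERT ON person
FOR EACH ROW EXECUTE FUNCTION person_view_insert();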
The approach has the problem that we might run out of available semantically correct names after a few changes.
Is there a well-known best practice for "namespacing" the schema "interface" according to the DB version?
Alternatively, is there a better way to maintain backward-compatibility?
Is there a well-known best practice for "namespacing" the schema "interface" according to the DB version?
It's not a common requirement, but when I've seen the need for similar things I've tended to put a backwards-compatible wrapper view for the table into a separate schema (namespace). I then set search_path on a per-user basis, so that the user who needs the backward-compatible table sees it, and the others see the new version.
The BC view has a RULE or (in newer PostgreSQL versions) an INSTEAD OF trigger that refers to the current version of the table explicitly from its normal schema, e.g. public.People, to support writes if required.
This only works if you need BC on a per-login-user basis, where you can ALTER USER ... SET search_path, or (less likely) where you can make the application that needs BC run a SET search_path command on each session.
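A rough sketch of that setup, assuming the current table lives in public and the legacy application logs in as a role called legacy_app (all names are placeholders):

-- The current table stays in its normal schema; the wrapper goes into its own one.
CREATE SCHEMA compat_v1;

CREATE VIEW compat_v1.people AS
SELECT id, full_name AS name         -- map the new columns back to the legacy shape
FROM public.people;

-- The user that still needs the old interface sees compat_v1 first;
-- everybody else keeps the default search_path and sees public.people directly.
ALTER USER legacy_app SET search_path = compat_v1, public;

-- Writes through compat_v1.people would need an INSTEAD OF trigger (or rules)
-- referring explicitly to public.people, as described above.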
How do you manage a major schema change when you are using a Nosql store like SimpleDB?
I know that I am still thinking in SQL terms, but after working with SimpleDB for a few weeks I need to make a change to a running database. I would like to change one of the object classes to have a unique id rather than a business name, and as it is referenced by another object, I will also need to update the reference value in those objects.
With a SQL database you would run a set of SQL statements as part of the client software deployment process. Obviously this will not work with something like SimpleDB, because:
there is no equivalent of a SQL update statement;
due to the distributed nature of SimpleDB, there is no way of knowing when the changes you have made to the database have 'filtered' out to all the nodes running your client software.
Some solutions I have thought of are:
Each domain has a version number. The client software knows which version of the domain it should use. Write some code that copies the data from one domain version to another, making any required changes as you go. You can then install new client software that accesses the new domain version. This approach will not work unless you can 'freeze' all write access during the update process.
Each item has a version attribute that indicates the format used when it was stored. The client uses this attribute when loading the object into memory. Objects can then be converted to the latest format when they are written back to SimpleDB. The problem with this is that the new software needs to be deployed to all servers before any writes in the new format occur, or clients running the old software will not know how to read the new format.
It all seems rather complex, and I am wondering if I am missing something.
Thanks
Richard
I use something similar to your second option, but without the version attribute.
First, try to keep your changes to things that are easy to make backward compatible - changing the primary key is the worst case scenario for this.
Removing a field is easy - just stop writing to that field once all servers are running a version that doesn't require it.
Adding a field requires that you never write that object using code that won't save that field. If you can't deploy the new version everywhere at once, use an intermediate version that supports saving the field before you deploy a version that requires it.
Changing a field is just a combination of these two operations.
With this approach changes are applied as needed - write using the new version, but allow reading of the old version with default or derived values for the new field.
You can use the same code to update all records at once, though this may not be appropriate on a large dataset.
Changing the primary key can be handled the same way, but could get really complex depending on which nosql system you are using. You are probably stuck with designing custom migration code in this case.
RavenDB, another NoSQL database, uses migrations to achieve this:
http://ayende.com/blog/66563/ravendb-migrations-rolling-updates
http://ayende.com/blog/66562/ravendb-migrations-when-to-execute
Normally these types of changes are handled by your application, which changes the schema to a newer one by loading version X, converting it to version Y, and persisting it.