Liquibase changesets can not be rerun - postgresql

Our Liquibase script cannot be rerun because the underlying column is already gone.
Consider the following changesets:
1. A table "foo" is created, and "domain" is one of the columns in this table;
2. A constraint (in the form of an index) is placed on the column "domain";
3. Column "domain" is dropped from the table "foo".
Now when we try to rerun all liquibase scripts (over already existing DB structure), changeset 2 fails with
[ERROR] Reason: liquibase.exception.DatabaseException: ERROR: column "domain" named in key does not exist
all because the "domain" column in the actual DB is already gone before changeset 2 is run.
Is there any better way to make these changesets runnable other than recreating the "domain" column in the table so that all 3 changesets can run?
there are hundreds of changesets in the system besides the 2 above;
the solution is strongly preferred to avoid any manual steps because there are dozens of environments in which the changesets must be rerun;
In a perfect world, a developer would have placed preConditions on changeset 2 to check not only that the index is missing but also that the underlying column exists, but we have to deal with what we have. It is my understanding that rewriting existing changesets is strongly discouraged in Liquibase.

You can always add a preCondition to the changeSets #2 and #3 to check that the domain column exists, e.g.:
<preConditions onFail="MARK_RAN">
<columnExists tableName="foo" columnName="domain"/>
</preConditions>
If these changeSets start to fail with a "different checksum" error, then you can always provide the new checksum or just add <validCheckSum>ANY</validCheckSum>.
This way you'll be able to run these changeSets in all environments you need.
Rewriting the changeSets is discouraged, but writing preConditions for the changeSets is quite encouraged.

According to the comment, your problem is caused by changing the directory location of the Liquibase scripts.
Liquibase compares each script's relative path when it executes the changesets in your scripts. You can find this relative path in the filename column of the databasechangelog table.
The first thing to understand is that the problem is not with the checksum, so changing the checksum will not solve it.
The easiest thing you can do is change the values in the filename column of the databasechangelog table. If you have more than one Liquibase script file, I suggest changing them one by one. A simple SQL query like this can do the job:
update databasechangelog set filename='<new_filename>' where filename='<old_filename>'
Note: You can make the situation worse if you get this wrong. Double-check everything before you make any changes.
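As a rough illustration of doing that update cautiously, here is a minimal Python sketch. It uses an in-memory SQLite table as a stand-in for databasechangelog with only three columns, and the helper names (rename_changelog_path, make_demo_db) and the file paths are made up for the example. With a PostgreSQL driver the placeholder syntax and connection setup would differ.

```python
import sqlite3


def rename_changelog_path(conn, old_filename, new_filename):
    """Repoint recorded changesets from old_filename to new_filename.

    Returns the number of rows updated. Raises ValueError if rows are already
    recorded under new_filename, which would suggest a half-finished rename.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT COUNT(*) FROM databasechangelog WHERE filename = ?",
        (new_filename,),
    )
    if cur.fetchone()[0] > 0:
        raise ValueError(f"rows already recorded under {new_filename!r}")
    cur.execute(
        "UPDATE databasechangelog SET filename = ? WHERE filename = ?",
        (new_filename, old_filename),
    )
    updated = cur.rowcount
    conn.commit()
    return updated


def make_demo_db():
    """Build an in-memory stand-in with only the columns this sketch needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE databasechangelog (id TEXT, author TEXT, filename TEXT)"
    )
    conn.executemany(
        "INSERT INTO databasechangelog VALUES (?, ?, ?)",
        [
            ("1", "dev", "old/dir/changelog.xml"),  # hypothetical paths
            ("2", "dev", "old/dir/changelog.xml"),
            ("3", "dev", "other/changelog.xml"),
        ],
    )
    conn.commit()
    return conn
```

Checking for rows under the new name first is just one possible safeguard; taking a backup of the table before the update is the more important one.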

Sphinx Error: (type='index') already exists

I have my sphinx configuration broken out into 3 files as I have several different indexes that each use common files.
So I have a master sphinx.conf that defines the paths for the source and index files.
Then each index has a source file (sql select, fields) and an index file with all the wordform paths and any rules for that specific index of data.
So I've been making changes to one of the indexes but all of a sudden am getting this error:
ERROR: section 'MyIndex_0' (type='index') already exists in /etc/sphinxsearch/sphinx.conf line 17863 col 19
Yet I did not touch the sphinx.conf file.
Update: As a test I found an old version of the index file I am trying to rotate, and I still get the error. So it is not caused by changes I made to this file (once in a while when making changes I get an error, which is always due to some typo or another).
Can some file have gotten corrupted?
I did stop and start sphinx to no avail.

Merging LiquiBase changesets

I'm developing a system with database version control in LiquiBase. The system is still in pre-alpha development and there are a lot of changes that were reverted or supplemented by other changes (tables removed, columns added and removed).
The current changelog reflects the whole development history with many failed experiments, and all of it is rolled out when initializing the database.
Because there is NO release version, I can start from scratch and pull the actual DB state into a single XML changeset.
Is there a way to tell LiquiBase to merge all change sets into one file, or is the only way to do it by hand?
Just use your existing database to generate a changelog that will be used from now on. For this you can use the generateChangeLog command from the command line; it will generate a changelog file with all the changeSets that represent the current state of your database. You can use this file in your project as the initial DB creation file, to be run on an empty database.
There is a page in the Liquibase docs which discusses this scenario in detail:
http://www.liquibase.org/documentation/trimming_changelogs.html
To summarise, they recommend that you don't bother since consolidating your changelogs is both risky and low-reward.
If you do want to push ahead with this, then restarting the changelog using generateChangeLog, as suggested by @veljkost, is probably the easiest way. This is documented at http://www.liquibase.org/documentation/existing_project.html
Since I didn't find an automatic solution for this problem when the changelog is already deployed on several databases in different states, I will describe my solution here:
1. Generate a changelog of the current development state of your database using Liquibase's generateChangeLog, like:
mvn liquibase:generateChangeLog -Dliquibase.outputChangeLogFile=current_state.yml
2. Audit the generated changelog and check whether it looks good (Liquibase is not perfect, and it often generates stupid statements). Also, if your schema contains static data, such as dictionaries, that was previously populated using Liquibase, you have to add it to the generated changelog as well. You can export data from your database using the generateChangeLog command mentioned above with the -Dliquibase.diffTypes=data property.
3. Prevent the generated changelog from executing where the schema already exists (it will obviously fail on existing databases: prod, test, and other developers' local environments). You can do this using, for example, liquibase changelogSync or Liquibase contexts, but all these options require some manual work on every database. You can achieve an automatic result by adding preConditions to your changeSets.
For changesets intended to run on empty databases (the changelog you generated in step 1 above), you can add something like this:
preConditions:
  - onFail: MARK_RAN
  - not:
      - tableExists:
          tableName: t_project
Here t_project is the name of a table that existed before (most likely this should be a table added in the first changeSet, so every database that has run at least one changeSet will have it). This will mark the generated changelog as run on environments with an existing schema, and will run the generated changelog for every new database you would like to migrate.
Unfortunately you have to adjust all legacy changesets as well (I haven't found a better solution yet; I made this change using regex and sed). You have to add something like this:
preConditions:
  - onFail: MARK_RAN
  - tableExists:
      tableName: t_project
This is the opposite of the condition above. With it, all databases that have run at least one changeset in the past will continue to migrate (changesets get EXECUTED status) up to the changeset generated in step 1 above, and will mark the generated changesets as MARK_RAN. For new databases, all previous changesets will be skipped, and the first one executed will be the one generated in step 1 above.
With this solution you can push your merged changelog at any time, and no environment or developer will have to deal with manual syncing.
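To make the effect of the two opposite preConditions easier to follow, here is a small Python simulation. It is only a model of the scheme described above, not Liquibase itself: the ChangeSet class, the run_changelog function, and the changeset ids are invented for illustration, and "marker" stands for the t_project table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSet:
    id: str
    precondition: str  # "tableExists", "not tableExists", or "none"
    creates_marker: bool = False  # True if running it creates the marker table


def run_changelog(changesets, already_ran, marker_exists):
    """Return {changeset id: status} for changesets not yet recorded as run."""
    statuses = {}
    for cs in changesets:
        if cs.id in already_ran:
            continue  # already recorded as run on this database, so skipped
        if cs.precondition == "tableExists":
            ok = marker_exists
        elif cs.precondition == "not tableExists":
            ok = not marker_exists
        else:
            ok = True
        if ok:
            statuses[cs.id] = "EXECUTED"
            marker_exists = marker_exists or cs.creates_marker
        else:
            statuses[cs.id] = "MARK_RAN"  # models onFail: MARK_RAN
    return statuses


# Legacy changesets come first, then the generated baseline, then new work.
CHANGELOG = [
    ChangeSet("legacy-1", "tableExists"),
    ChangeSet("legacy-2", "tableExists"),
    ChangeSet("generated-baseline", "not tableExists", creates_marker=True),
    ChangeSet("new-feature", "none"),
]

# A brand-new database: nothing has run and the marker table is absent.
fresh = run_changelog(CHANGELOG, already_ran=set(), marker_exists=False)

# An existing database that stopped after legacy-1.
existing = run_changelog(CHANGELOG, already_ran={"legacy-1"}, marker_exists=True)
```

In this model the fresh database skips both legacy changesets and executes the baseline, while the existing database executes legacy-2 and marks the baseline as ran.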

What will happen if two processes modify data in two transactions at the same time and there is a unique constraint on the table?

I am thinking about a race condition in a production system I am working on. Database is PostgreSQL. Application is written in Java, but this is not relevant.
There is a table called "versions", which contains columns "entity_ID" and "version" (and some other fields). This table contains versions of a certain entity.
There is an application where user can modify those entities.
Every modification of an entity adds a new version to the table "versions" (using a trigger). This trigger finds the last version in the same table "versions" and inserts a new row with the same entity_ID, but version = (last version + 1).
There is a nightly job that is run in PostgreSQL every day at 4:00 that also changes those entities and therefore updates data in the table "versions". This procedure was designed to finish its work by the morning (before users of the application start to use it), but unfortunately it runs into the day. As this procedure is run in a function, it is one big transaction. Therefore the changes done by it are not visible to the application.
The nightly job uses the following workflow:
1. Set "failed_counter" = 0
2. Iterate over entities that need to be modified
3. Do modifications to the entity inside a BEGIN .. EXCEPTION .. END block
4. If there is an EXCEPTION, increase the "failed_counter". Log the exception and the failed entity to a log table.
5. If "failed_counter" > 10, cancel work.
6. End work
This has caused the following race condition to happen a few times (let's assume that X is the last version of entity A):
1. Nightly job starts
2. Nightly job modifies entity A, creating version X+1
3. Application is used to also modify entity A, also creating version X+1 (because the nightly job transaction has not COMMITed, so the version X+1 is not visible to the application)
4. Nightly job ends, causing COMMIT
There are now two versions with version number X+1, which causes application to break.
I thought that I could just solve the problem by using a UNIQUE CONSTRAINT over fields (entity_ID, version). I thought that it would cause the application to receive an error (due to violating the UNIQUE CONSTRAINT) at race condition step 3. But I am not sure how the unique constraint works in this situation. In race condition step 3, when the application adds a version, does the database check the UNIQUE CONSTRAINT? I suppose not, since the transaction of the nightly process has not been completed. If I am correct and the UNIQUE CONSTRAINT is checked only at race condition step 4, when COMMIT is made, then this causes the whole nightly procedure to fail, which is not the desired result.
So, the question is the following.
When is the UNIQUE CONSTRAINT checked: At race condition step 3 or race condition step 4?
If the answer to the last question is "race condition step 4", then how could I change the design of the system to avoid the above-mentioned problems?
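By default, unique constraints in PostgreSQL are checked as each statement runs, not at COMMIT. In your scenario, the application's insert at step 3 would wait while the nightly job's transaction holds an uncommitted row with the same key, then fail with a unique violation once that transaction commits (or succeed if it rolls back). It's easy to test the behavior using two psql sessions.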
Some big, red flags . . .
"As this procedure is run in a function, then it is one big transaction."
It's not one, big transaction because you're running a function. It's one, big transaction because you haven't run the function several times over smaller subsets of the data. Whether you can run the function over subsets is application-dependent.
"Iterate over entities that need to be modified"
Rough rule of thumb for SQL databases: iteration is always a mistake.
SQL is a set-oriented language. Dealing with sets is usually faster than iteration, and often by several orders of magnitude.
"If "failed_counter" > 10, cancel work."
This looks suspicious. Why are nine failures ok? Why are any failures ok?
"I thought that I could just solve the problem by using an UNIQUE CONSTRAINT over fields (entity_ID, version)."
That you don't already have a unique constraint on those two columns is a big, waving red flag. Fix this first.
The fact that an application should apparently be waiting for a batch job to finish, but isn't waiting, might or might not be a system design issue. (It smells like a system design issue.)
"There is a nightly job that is run in PostgreSQL every 4:00 ..."
Did you think of starting at 3:00?
Test this, but not on your production server.
Drop the trigger.
Add a column of type timestamp with time zone.
Set that column's default value. Most applications will use current_timestamp, but you might want clock_timestamp() instead.
Add a unique constraint on {entity_id, new timestamp column}.
Eliminating the trigger might speed things up enough for you.
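A minimal Python sketch of that table shape follows, using in-memory SQLite as a stand-in for PostgreSQL. The column name changed_at and the helper functions are hypothetical, timestamps are passed in explicitly (rather than through a column default) to keep the example reproducible, and deriving a sequential version number at read time is an assumption that goes beyond the steps above.

```python
import sqlite3


def create_versions_table(conn):
    """Versions are identified by (entity_id, timestamp); no trigger involved."""
    conn.execute(
        """
        CREATE TABLE versions (
            entity_id  INTEGER NOT NULL,
            changed_at TEXT    NOT NULL,
            UNIQUE (entity_id, changed_at)
        )
        """
    )


def record_change(conn, entity_id, changed_at):
    """Insert one version row; a duplicate key raises sqlite3.IntegrityError."""
    conn.execute(
        "INSERT INTO versions (entity_id, changed_at) VALUES (?, ?)",
        (entity_id, changed_at),
    )


def numbered_versions(conn):
    """Derive 1-based version numbers per entity from timestamp order."""
    return conn.execute(
        """
        SELECT v.entity_id,
               v.changed_at,
               (SELECT COUNT(*)
                  FROM versions AS earlier
                 WHERE earlier.entity_id = v.entity_id
                   AND earlier.changed_at <= v.changed_at) AS version
          FROM versions AS v
         ORDER BY v.entity_id, v.changed_at
        """
    ).fetchall()
```

Because no writer has to read the "last version" before inserting, two writers no longer compute the same X+1; they only collide if they produce the identical timestamp for the same entity.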

Managing database changes

I'm starting to move more logic into the database, using triggers, views, functions, CTEs, etc. When plv8/json comes out for postgres, I can see myself putting lots of logic in there.
I'm having problems with the "standard" way of doing database migrations in sequel and activerecord. Both sequel and activerecord let you put arbitrary SQL code into timestamped files. When each file is run, a schema_versions table is updated with the filename (or the timestamp in the filename), which keeps a record of which migrations have been applied to the current database.
If a lot of coding is being done at the database level, that means that modifications to existing views, functions, etc follow the below pattern:
Migration 1 defines a function and a view that uses that function.
-- Migration 1
create function calculate(x int) returns int as $$
    select x + 1;
$$ language sql;
create view foos as (
select something, calculate(something) from a_table
);
Requirements change, and I need to change the function's type. In Migration 2 I have to drop all objects that depend on the function, and recreate them by copying their entire body -- even if there weren't any changes in most of the other code!
-- Migration 2
-- Have to drop all views and functions that depend on the
-- `calculate(int)` function.
drop view foos;
create or replace function calculate(x bigint) returns bigint as $$
    select x + 1;
$$ language sql;
-- I could do `drop function calculate(int) cascade`,
-- but I might accidentally drop some objects that wouldn't get recreated below.
-- Now I have to recreate foos.
create view foos as (
select something, calculate(something) from a_table
);
If I'm building a system based on views and functions and triggers, my migrations would be filled with duplicated code, and it's difficult to find the latest version of the code. You might say "don't do that!", but for my purposes (e-commerce, shipping, transactions), I'm finding it's a lot easier and faster to have the database ensure the integrity of the data by doing the logic inside the database.
You can (of course) dump the current database schema (which includes all the code definitions), but I think you lose comments. And you wouldn't generally want to edit a giant file that contains the whole schema.
Any ideas on how to solve this problem?
My best idea is to have the SQL code contained in its own canonical files (app/sql/orders/shipping.sql, app/sql/orders/creation.sql, etc). Everyone develops directly on these. Whenever it's time for a release, you'd make a new migration file, look at all the code changed since the previous release, figure out the dependency chain of the database objects that need to be dropped and recreated, and then copy the SQL from the canonical files into a new sequel/activerecord migration file. But it's a pain. :/
Thoughts are very welcome. I hope I explained this well enough, I'm cutting back on my caffeine intake and I'm a little groggy atm.
Oh, I asked a similar question on Stack Overflow: "Changing the type of a column used in other views". The answer was a function that let me pass in:
sql code to run
database views to drop and recreate
The function would retrieve the view definition, drop the views, run the sql code, then recreate the view definition (in reverse order of dropping). Perhaps a system of functions like this would help solve the problem of having to copy/paste sql code into the migration files.
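The "figure out the dependency chain" step can be sketched in Python with the standard library's graphlib (Python 3.9+). This is only an illustration under assumptions: the dependency map is written by hand here (in practice it would have to come from the database catalog or the canonical files), and the object names other than calculate and foos are hypothetical.

```python
from graphlib import TopologicalSorter


def migration_order(depends_on, changed):
    """Work out which objects to drop and recreate, and in what order.

    depends_on maps each object to the set of objects it uses.
    changed is the set of objects whose definition was edited.
    Returns (drop_order, create_order) covering the changed objects and
    everything that transitively depends on them.
    """
    # Invert the map so we can walk from an object to its dependents.
    dependents = {}
    for obj, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(obj)

    # Collect the changed objects plus all transitive dependents.
    affected = set()
    stack = list(changed)
    while stack:
        obj = stack.pop()
        if obj in affected:
            continue
        affected.add(obj)
        stack.extend(dependents.get(obj, ()))

    # Restrict the graph to affected objects; dependencies are created first.
    graph = {obj: depends_on.get(obj, set()) & affected for obj in affected}
    create_order = list(TopologicalSorter(graph).static_order())
    drop_order = list(reversed(create_order))
    return drop_order, create_order


# The view foos uses calculate; foo_report and unrelated are made-up objects.
DEPENDS_ON = {
    "foos": {"calculate"},
    "foo_report": {"foos"},
    "unrelated": set(),
}

drop_order, create_order = migration_order(DEPENDS_ON, {"calculate"})
```

Dropping in the reverse of the creation order mirrors the "recreate in reverse order of dropping" behaviour of the function described above, and objects that do not depend on the change are left alone.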
I'd recommend liquibase.
You create files which track the changes to your database and these will be run into the database in the correct migration order.
You might find Dave Wheeler's blog-posts interesting starting from here:
http://justatheory.com/computers/databases/simple-sql-change-management.html
My rate of database change is fairly small but I tend to be careless and make small changes to the schema directly, so I've had to come up with a fair bit of infrastructure to catch when I've done so. The basic elements are:
1. A makefile that can rebuild a development database from scratch
2. A set of schema-files separated into "modules" (lookups_schema.sql, lookup_data.sql)
3. A set of update files that transition from one revision to the next
4. I don't usually have the corresponding downgrade scripts, some people do
5. A script to populate my database with a plausible amount of test data
6. Crucially, a test suite via pgTAP that checks my various functions, views and also the upgrade scripts. The upgrade tests can be run against a live database too.
If you have a separate instance of PostgreSQL set up with fsync turned off / on ramdisk etc then rebuilding the whole DB and populating it can take seconds (if you don't have too much test data).
Start with #1, #2, then add #6 (pgTAP is very cool), then the rest. The crucial thing is a test suite that checks your in-database code.
There are tools that try to automate schema changes for you, but they are really only good at adding a new column to a table and that sort of thing. Once you have code in your db then they're not much help.

Does DBIx::Class::Schema::Loader cache its moniker map?

Recently we added an "audit_logs" table to the database, and after some frustration I realised that there was already an "auditlog" table in the database for some reason. It wasn't being used so I dropped it. I deleted the Auditlog.pm and AuditLogs.pm files from my schema, and then regenerated. For some reason DCSL again created AuditLogs.pm for the "audit_logs" table, even though there was no longer an "auditlog" table or Auditlog.pm file that would conflict with it.
I have tried just about everything I can think of to get it to generate Log.pm without success. The only thing that I can figure is that it is caching the moniker map somewhere, and I cannot seem to reset it.
I eventually tracked this problem down to an issue with the Lingua inflector. It was picking up "logs" as a singular verb instead of a plural noun. This happened because it followed the word "audit" which ends with "it." Basically, I had to write a custom moniker_map function that added an exception for audit_logs.