Mongock inserting duplicate changeset on runAlways=true - mongodb

My application uses Mongock 4.1.19, and whenever there is a changeSet with runAlways=true, duplicate entries get created in the dbchangelog collection.
The line linked below does not seem to consider the already-executed case and may be resulting in duplicate changelog entries.
Any pointers on how this can be addressed?
https://github.com/cloudyrock/mongock-core/blob/91d15d65a22234f4a2e8d28c759d0641d36750e0/mongock-runner/mongock-runner-core/src/main/java/com/github/cloudyrock/mongock/runner/core/executor/MigrationExecutor.java#L139
The following is logged at startup:
RE-APPLIED - ChangeEntry{...}
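For reference, the kind of changeSet involved looks roughly like the sketch below (class name, ids, and the collection update are made up for illustration; only the runAlways flag matters here):
import com.github.cloudyrock.mongock.ChangeLog;
import com.github.cloudyrock.mongock.ChangeSet;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

// Minimal sketch of a Mongock 4.x changeLog with a runAlways changeSet.
@ChangeLog(order = "001")
public class SampleChangeLog {

    // Runs on every startup; each execution shows up in dbchangelog,
    // logged as RE-APPLIED.
    @ChangeSet(id = "refresh-config", order = "001", author = "dev", runAlways = true)
    public void refreshConfig(MongoDatabase db) {
        db.getCollection("config").updateMany(
                new Document(),
                new Document("$set", new Document("refreshed", true)));
    }
}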

It's not really duplicated; Mongock creates a changelog entry per execution.
However, we understand this is not the most commonly desired behaviour, so we are releasing a bugfix (4.3.8) for version 4 in the next few days, probably today.
In version 5, which is under development, we'll keep this behaviour by default, additionally updating a new last_execution field we'll add, and we'll provide the option to insert a new entry per execution if desired.

Data syncing with pouchdb-based systems client-side: is there a workaround to the 'deleted' flag?

I'm planning on using rxdb + hasura/postgresql in the backend. I'm reading this rxdb page for example, which off the bat requires sync-able entities to have a deleted flag.
Q1 (main question)
Is there ANY point at which I can finally hard-delete these entities? What conditions would have to be met - e.g. could I simply use "older than X months" and then force my app to only ever display data for less than X months?
Is such a hard-delete, if possible, best carried out directly in the central db, since it will be the source of truth? Would there be any repercussions client-side that I'm not foreseeing/understanding?
I foresee the number of deleted records growing rapidly in my app, and I don't want to have to store all this extra data forever.
Q2 (bonus / just curious)
What is the (algorithmic) basis for needing a 'deleted' flag? Is it that it's just faster to check a flag rather than to check for the omission of an object from, say, a very large list? I apologize if it's kind of a stupid question :(
Ultimately it comes down to a decision that's informed by your particular business/product with regards to how long you want to keep deleted entities in your system. For some applications it's important to always keep a history of deleted things or even individual revisions to records stored as a kind of ledger or history. You'll have to make a judgement call as to how long you want to keep your deleted entities.
I'd recommend that you also add a deleted_at column if you haven't already and then you could easily leverage something like Hasura's new Scheduled Triggers functionality to run a recurring job that fully deletes records older than whatever your threshold is.
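For illustration, the recurring job behind such a scheduled trigger could boil down to a single DELETE. A rough sketch run from a plain JDBC client follows; the documents table, the deleted/deleted_at columns and the six-month threshold are all assumptions:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Hypothetical hard-delete job: purge rows soft-deleted more than 6 months ago.
public class PurgeSoftDeleted {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/app", "app", "secret");
             PreparedStatement ps = c.prepareStatement(
                 "DELETE FROM documents "
                 + "WHERE deleted = true AND deleted_at < now() - interval '6 months'")) {
            int purged = ps.executeUpdate();
            System.out.println("hard-deleted rows: " + purged);
        }
    }
}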
You could also leverage Hasura's permissions system to ensure that rows that have been deleted aren't returned to the client. There is documentation with examples of ways to work with soft deletes in Hasura.
For your second question it is definitely much faster to check for the deleted flag on records than to have to try and diff the entire dataset looking for things that are now missing.

TYPO3 8.7.24 Scheduler is doing nothing

I could not find an answer here, so I'm asking :-)
I'm using TYPO3 8.7.24 and want to use the Scheduler to remove deleted DB entries (recycler). I created the task and executed it manually.
The result: nothing happens. The Recycler still shows the deleted entries.
Does somebody have an idea why this is not working?
BR
Jürgen
Have you configured the scheduler task correctly?
Aside from configuring how often the task should be executed, you need to configure:
"Delete entries older than (in days)"
how long ago should a deletion have occurred so that the record can be removed?
"Tables"
which tables should be considered to clean up?
Either your deletion wasn't that long ago or your records do not belong to any monitored table.

Merging LiquiBase changesets

I'm developing a system with database version control in LiquiBase. The system is still in pre-alpha development and there are a lot of changes that were reverted or supplemented by other changes (tables removed, columns added and removed).
The current set of changes reflects the whole development history, with many failed experiments, and all of it is rolled out when the database is initialized.
Because there is NO release version yet, I can start from scratch and capture the actual DB state in a single XML changeset.
Is there a way to tell LiquiBase to merge all change sets into one file, or is the only way to do that by hand?
Just use your existing database to generate the changelog that will be used from now on. For this you can use the generateChangeLog command from the command line; it will generate a changelog file with all the changeSets that represent the current state of your database. You can use this file in your project as the initial DB creation file, to be used on an empty database. Here's a link to the docs.
There is a page in the Liquibase docs which discusses this scenario in detail:
http://www.liquibase.org/documentation/trimming_changelogs.html
To summarise, they recommend that you don't bother since consolidating your changelogs is both risky and low-reward.
If you do want to push ahead with this, then restarting the changelog using generateChangeLog, as suggested by #veljkost, is probably the easiest way. This is documented at http://www.liquibase.org/documentation/existing_project.html
Since I didn't find an automatic solution to this problem for the case where the changelog is already deployed on several databases in different states, I will describe my solution here:
Generate a changelog of the current development state of your database using Liquibase's generateChangeLog, like:
mvn liquibase:generateChangeLog -Dliquibase.outputChangeLogFile=current_state.yml
Audit the generated changelog and check whether it looks good (Liquibase is not perfect; it often generates clumsy statements). Also, if your schema contains static data, such as dictionaries, that was previously populated using Liquibase, you have to add it to the generated changelog as well; you can export data from your database using the generateChangeLog command mentioned above with the -Dliquibase.diffTypes=data property.
Now you need to prevent execution of the generated changelog on existing databases (it would obviously fail on prod, test, and other developers' local environments). You could do this with, for example, liquibase changelogSync or Liquibase contexts, but all of these options require manual work on every database. You can achieve an automatic result by adding preConditions to your changeSets.
For changeSets intended to run on empty databases (the changelog you generated in step 1 above), you can add something like this:
preConditions:
  - onFail: MARK_RAN
  - not:
      - tableExists:
          tableName: t_project
Here t_project is a table that existed before (most likely it should be a table added in the first changeSet, so every database that ran at least one changeSet will have it). This will mark the generated changelog as run on environments with an existing schema, and will run the generated changelog on every new database you want to migrate.
Unfortunately, you have to adjust all the legacy changeSets as well (I haven't found a better solution yet; I made this change using regex and sed) by adding something like this:
preConditions:
  - onFail: MARK_RAN
  - tableExists:
      tableName: t_project
This is the opposite of the condition above. With it, every database that ran at least one changeSet in the past will continue to migrate (EXECUTED status for those changeSets) up to the changelog generated in step 1, and will mark the generated changeSets as MARK_RAN. On new databases, all the legacy changeSets will be skipped, and the first ones executed will be those generated in step 1.
With this solution you can push your merged changelog at any time, and no environment or developer will have any problem with manual syncing.

What will happen if two processes modify data in two transactions at the same time and there is a unique constraint on the table?

I am thinking about a race condition in a production system I am working on. Database is PostgreSQL. Application is written in Java, but this is not relevant.
There is a table called "versions", which contains columns "entity_ID" and "version" (and some other fields). This table contains versions of a certain entity.
There is an application where user can modify those entities.
Every modification of an entity creates a new version in the table "versions" (via a trigger). The trigger finds the last version in the same "versions" table and inserts a new row with the same entity_ID, but version = (last version + 1).
There is a nightly job that runs in PostgreSQL every day at 4:00 that also changes those entities and therefore updates data in the table "versions". This procedure was designed to finish its work by the morning (before users of the application start using it), but unfortunately it runs into the day. As this procedure runs inside a function, it is one big transaction. Therefore the changes it makes are not visible to the application until it commits.
The nightly job uses the following workflow:
Set "failed_counter" = 0
Iterate over entities that need to be modified
Do modifications to the entity inside a BEGIN .. EXCEPTION .. END block
If there is an EXCEPTION, increase the "failed_counter". Log the exception and the failed entity to a log table.
If "failed_counter" > 10, cancel work.
End work
This has caused the following race condition to happen a few times (let's assume that X is the last version of entity A):
Nightly job starts
Nightly job modifies entity A, creating version X+1
The application is also used to modify entity A, again creating version X+1 (because the nightly job's transaction has not been committed, version X+1 is not visible to the application)
Nightly job ends, causing COMMIT
There are now two versions with version number X+1, which causes the application to break.
I thought that I could just solve the problem by using a UNIQUE CONSTRAINT over the fields (entity_ID, version). I thought that it would cause the application to receive an error (due to violating the UNIQUE CONSTRAINT) at race condition step 3. But I am not sure how the unique constraint works in this situation. In race condition step 3, when the application adds a version, does the database check the UNIQUE CONSTRAINT? I suppose not, since the transaction of the nightly process has not been completed. If I am correct and the UNIQUE CONSTRAINT is checked only at race condition step 4, when the COMMIT is made, then this would cause the whole nightly procedure to fail, which is not the desired result.
So, the question is the following.
When is the UNIQUE CONSTRAINT checked: at race condition step 3 or at race condition step 4?
If the answer to the previous question is "race condition step 4", then how could I change the design of the system to avoid the above-mentioned problems?
By default, unique constraints in PostgreSQL are checked at the end of each statement. It's easy to test the behavior using psql.
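Since the application is written in Java anyway, here is a rough two-session sketch of that test over JDBC instead of psql (table, credentials and version numbers are made up). It walks through steps 2-4 of the race: the application's INSERT in step 3 neither succeeds nor fails immediately; it blocks on the nightly job's uncommitted row and then raises a unique violation in the application's session once the nightly job commits, so the nightly procedure itself is not the one that fails.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Two-session sketch of the race (table, columns and credentials are hypothetical).
public class UniqueCheckDemo {
    static final String URL = "jdbc:postgresql://localhost:5432/test";

    public static void main(String[] args) throws Exception {
        try (Connection nightly = DriverManager.getConnection(URL, "test", "test");
             Connection app = DriverManager.getConnection(URL, "test", "test")) {

            nightly.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS versions "
                + "(entity_id int, version int, UNIQUE (entity_id, version))");

            nightly.setAutoCommit(false);
            app.setAutoCommit(false);

            // Step 2: the nightly job inserts version X+1 but has not committed yet.
            nightly.createStatement().execute("INSERT INTO versions VALUES (1, 11)");

            // Step 3: the application inserts the same (entity_id, version).
            // This statement blocks, waiting for the nightly transaction to finish.
            Thread appSession = new Thread(() -> {
                try {
                    app.createStatement().execute("INSERT INTO versions VALUES (1, 11)");
                    app.commit();
                    System.out.println("application insert succeeded");
                } catch (SQLException e) {
                    // SQLState 23505 = unique_violation, raised in the application's session
                    System.out.println("application insert failed: " + e.getSQLState());
                }
            });
            appSession.start();

            Thread.sleep(2000);   // give the application statement time to block
            nightly.commit();     // step 4: nightly job commits and is unaffected
            appSession.join();    // the application's INSERT now fails with 23505
        }
    }
}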
Some big, red flags . . .
As this procedure runs inside a function, it is one big transaction.
It's not one, big transaction because you're running a function. It's one, big transaction because you haven't run the function several times over smaller subsets of the data. Whether you can run the function over subsets is application-dependent.
Iterate over entities that need to be modified
Rough rule of thumb for SQL databases: iteration is always a mistake.
SQL is a set-oriented language. Dealing with sets is usually faster than iteration, and often by several orders of magnitude.
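As a rough illustration (table and column names are made up), compare updating entities one by one with doing the same work in a single set-based statement:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical comparison: per-entity iteration vs. one set-based UPDATE.
public class SetVsIteration {

    // Row-by-row: one statement and one round trip per entity.
    static void markProcessedOneByOne(Connection c, List<Long> entityIds) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(
                "UPDATE versions SET processed = true WHERE entity_id = ?")) {
            for (long id : entityIds) {
                ps.setLong(1, id);
                ps.executeUpdate();
            }
        }
    }

    // Set-based: let the database handle the whole set in a single statement.
    static void markProcessedAsSet(Connection c) throws SQLException {
        try (Statement st = c.createStatement()) {
            st.executeUpdate(
                "UPDATE versions SET processed = true "
                + "WHERE entity_id IN (SELECT entity_id FROM entities_to_process)");
        }
    }
}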
If "failed_counter" > 10, cancel work.
This looks suspicious. Why are nine failures ok? Why are any failures ok?
I thought that I could just solve the problem by using a UNIQUE CONSTRAINT over the fields (entity_ID, version).
That you don't already have a unique constraint on those two columns is a big, waving red flag. Fix this first.
The fact that an application should apparently be waiting for a batch job to finish, but isn't waiting, might or might not be a system design issue. (It smells like a system design issue.)
There is a nightly job that runs in PostgreSQL every day at 4:00 ...
Did you think of starting at 3:00?
Test this, but not on your production server.
Drop the trigger.
Add a column of type timestamp with time zone.
Set that column's default value. Most applications will use current_timestamp, but you might want clock_timestamp() instead (see the docs).
Add a unique constraint on {entity_id, new timestamp column}.
Eliminating the trigger might speed things up enough for you.
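In concrete terms, those steps amount to something like the following; the trigger, constraint and column names are hypothetical, and the statements are issued over JDBC here only because the application is Java:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch of the suggested redesign; all object names are made up.
public class ReplaceTriggerWithTimestamp {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/test", "test", "test");
             Statement st = c.createStatement()) {
            // 1. Drop the version-numbering trigger (name is hypothetical).
            st.execute("DROP TRIGGER IF EXISTS versions_numbering ON versions");
            // 2. Add a timestamptz column. clock_timestamp() returns the wall-clock
            //    time of each call, while current_timestamp is fixed at transaction
            //    start (relevant for the long nightly transaction).
            st.execute("ALTER TABLE versions "
                + "ADD COLUMN created_at timestamptz NOT NULL DEFAULT clock_timestamp()");
            // 3. Unique constraint on {entity_id, new timestamp column}.
            st.execute("ALTER TABLE versions "
                + "ADD CONSTRAINT versions_entity_created_uq UNIQUE (entity_id, created_at)");
        }
    }
}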

Committing to database during process phase in spring batch job

I have a conventional spring-batch job where I read from database, process domain objects and write it out to a file.
I need to slightly tweak the functionality during the processor phase so that I can update and commit the domain object to the database and then write it out to a file. I would need the commit to happen instantly as I would require the database ID for the write phase.
When I tried updating the domain object and saving it, I noticed that the entity was getting committed after the write phase.
Is there any way to force the commit to happen instantly during the processor phase and continue as before?
I would need the commit to happen instantly as I would require the
database ID for the write phase.
I am not sure what ID you need, as you should already have one when you are trying to update an (existing) entry.
If you meant insert, you can work around this issue by using database-specific functions to get the ID of the inserted but not yet committed object,
e.g. for Oracle: Obtain id of an insert in the same statement
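In plain JDBC there is also a portable variant of that idea: ask the driver to return the generated key as part of the INSERT itself, since the key is visible inside the still-uncommitted transaction. A minimal sketch with hypothetical table, column and connection details:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hypothetical sketch: obtain the generated ID of a row that is inserted
// but not yet committed, using JDBC generated keys.
public class InsertAndGetId {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/test", "test", "test")) {
            c.setAutoCommit(false);
            try (PreparedStatement ps = c.prepareStatement(
                    "INSERT INTO domain_object (name) VALUES (?)",
                    new String[] {"id"})) {           // ask for the generated "id" column
                ps.setString(1, "example");
                ps.executeUpdate();
                try (ResultSet keys = ps.getGeneratedKeys()) {
                    if (keys.next()) {
                        long id = keys.getLong(1);    // available before commit
                        System.out.println("generated id = " + id);
                    }
                }
            }
            c.commit();  // in the batch job, the chunk transaction commits later
        }
    }
}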
When I tried updating the domain object and saving it, I noticed that
the entity was getting committed after the write phase.
That is the desired behaviour, because the write part is the last one within the (chunk) transaction: if it is successful, commit; if not, rollback. Imagine a successful commit followed by a problem with the file - the item in the database would then have a wrong state.