What is the purpose of pre and post deployment? - deployment

i am new to pre and post deployment
To understand this i came across this:
“”When databases are created or upgraded, data may need to be added, changed, or deleted. Moreover, certain actions may have to occur on the database before and/or after the process completes. Deployment scripts can be used to accomplish this.””
I want to understand how this exactly works with an example
https://www.mssqltips.com/sqlservertutorial/3006/working-with-pre-and-post-deployment-scripts/

As pointed out in the site, a good example of a post deployment step is insertion of seed data.
For instance, you create a new currency table as part of the schema migration step. Then you insert the most commonly used currencies (say USD, EUR, etc.) so that they don't have to be inserted with a manual step.
Another example of post deployment step is populating data for a newly added column. For example you add a new column called IsPremium to the Customers table and want to set all customers with a start date > 5 years as true. A post deployment script is good place to do that.
Similarly scripts that run before the migration go into pre-deployment scripts. One example is locking certain table to ensure that the migration script is run only once, or setting a flag to indicate a migration is in progress.

Related

Execute Data Factory pipeline based on event in table

I have a requirement to trigger Azure Data Factory pipeline whenever there is a new record in a table. Is there any way to achieve this.
No, Event-based trigger only support Azure Blob Storage by now.
You can vote here to progress this feature in Azure Data Factory.
Currently it is not possible.
There is a similar discussion going on in the below thread :
https://learn.microsoft.com/en-us/answers/questions/197465/how-to-trigger-an-adf-based-on-any-data-changes-wi.html
While this is not currently supported, here's an idea on how to fake it. (Just sharing an idea, not sure whether this is feasible.)
Create a Trigger on INSERT
Trigger executes a Stored Procedure
Stored Procedure uses Polybase to create text file in Blob Storage with the relevant information (like new row ID).
Create a BlobCreated event trigger over that Storage location in ADF or Logic App.
Doing this should end up with an Event Trigger that fires whenever a new row is inserted.
We can use logic app to trigger a pipeline based on a query that finds data inserted in past x minutes/seconds
There's no direct way to do this in ADF (yet). As others have pointed out, you can vote for the feature to be added and Microsoft might add it. But there's still a way to do this:
You can set up a Logic App. There's a really nice video tutorial on this here: https://www.youtube.com/watch?v=z0sMIN4xMSY
You can set up a logic app to use a SQL database as the trigger, and then you can decide if you want to have it trigger based on something new created in a specific table or something modified in a specific table. You can then action the logic app to trigger the run of an Azure data factory pipeline. You can set the Logic App to check the table as often as you like, for example, every minute, 3 minutes or longer etc.
If you want an ADF pipeline to run every time a certain table is modified or row inserted, but the table in question is very large, you can reduce compute by creating a track changes table. Basically have another, very small table that logs changes and then use this table for the one the logic app monitors. This would make it faster and more efficient in such a case.
Give the video a watch.

Keeping database table clean using Entity Framework

I am using Entity Framework and I have a table that I use to record some events that gets generated by a 3rd party. These events are valid for 3 days. So I like to clean out any events that is older than 3 days to keep my database table leaner. Is there anyway I can do this? Some approach that won't cause any performance issue during clean up.
To do the mentioned above there are some options :
1) Define stored procedure mapped to your EF . And you can use Quarts Trigger to execute this method using timely manner.
https://www.quartz-scheduler.net
2) SQL Server Agent job which runs every day at the least peak time and remove your rows.
https://learn.microsoft.com/en-us/sql/ssms/agent/schedule-a-job?view=sql-server-2017
If only the method required for cleaning purpose nothing else i recommend you go with option 2
first of all. Make sure the 3rd party write a timestamp on every record, in that way you will be able to track how old the record is.
then. Create a script that deletes all record that are older than 3 days.
DELETE FROM yourTable WHERE DATEDIFF(day,getdate(),thatColumn) < -3
now create a scheduled task in SQL management Studio:
In SQL Management Studio, navigate to the server, then expand the SQL Server Agent item, and finally the Jobs folder to view, edit, add scheduled jobs.
set script to run once every day or whatever please you :)

Merging LiquiBase changesets

I'm developing a system with database version control in LiquiBase. The system is still in pre-alpha development and there are a lot of changes that were reverted or supplemented by other changes (tables removed, columns added and removed).
The current change set reflects the whole development history with many failed experiments, and this whole is rollouted when initializing the database.
Because there is NO release version, I can start from scratch and pull actual DB state in single XML changeset.
Is there a way to tell LiquiBase to merge all change sets into one file, or the only way to do that is per hand?
Just use your existing database to generate change log that will be used from now on. For this you can use generateChangeLog command from command line, it will generate the changelog file with all the changeSets that represent your current state of the database. You can use this file in your project as initial db creation file, to be used on an empty database. Here's a link to docs.
There is a page in the Liquibase docs which discusses this scenario in detail:
http://www.liquibase.org/documentation/trimming_changelogs.html
To summarise, they recommend that you don't bother since consolidating your changelogs is both risky and low-reward.
If you do want to push ahead with this, then restarting the changelog using generateChangeLog, as suggested by #veljkost, is probably the easiest way. This is documented at http://www.liquibase.org/documentation/existing_project.html
Hence I didn't find automatic solution for this problem in case the changelog is already deployed on several databases in different states, I will describe here my solution for this problem:
Generate changelog of your current development state of database using liquibase generate changelog, like:
mvn liquibase:generateChangeLog -Dliquibase.outputChangeLogFile=current_state.yml
Audit generated changelog, check whether it looks good (liquibase is not perfect, it often generates stupid statements). Also if you have in your schema some static data, like dictionaries or so, which were previosuly populated using liquibase, you have to add these to generated changelog as well, you can export data from your database using generate changelog command mentioned above with -Dliquibase.diffTypes=data property.
Now to prevent the execution of generated changelog (it will obviously fail on existing databases, on prod, test, and other developers local envs), you can do this using for example liquibase changelogSync, or using liquibase contexts, but all this options require you to do some manual work on every database. You can achieve automatic result by adding the preConditions statements for your changeSets.
For changesets intended to run on empty databases (changelogs you generated in step 1. above) you can add something like this:
preConditions:
- onFail: MARK_RAN
- not:
- tableExists:
tableName: t_project
Where t_project is the table name that existed before (most likely this should be table added in first changeSet, so every database which runned at least one changeSet will have this). This will mark generated changelog as run on environments with existing schema, and will run generated changelog for every new database you would like to migrate.
Unfortunatelly you have to adjust all legacy changesets as well (I didn't found better solution yet, I did this change using regex and sed), you have to add something like that:
preConditions:
- onFail: MARK_RAN
- tableExists:
tableName: t_project
So opposite condition, from above one. With this all databases which runned at least one changeset in past, will continue to migrate (EXECUTED status of changesets) until changeset generated in step 1. above, and mark generated changesets with MARK_RAN. And for new databases, all previous changesets will be skipped, and first executed will be one generated in step 1. above.
With this solution you can push your merged changelog at any time, and every environment and developer won't have any problem with manual syncing or so.

Data expiration

I developed an application where the user can create tasks (like a an agenda, ERP or CRM)
So, the user creates a task and that task has an expiration date. The idea is to erase the entry when the task expires.
I've been thinking solutions, like having a timer and so, but:
There is any method or way to create data that expires?? I mean, does MySQL, MSSQL (or any DB Manager) support something like this natively?
I would be great to have something like this:
CREATE TABLE [MyTASK] (expires on mydate action = DELETE){
mydate,
mysomething,
myagain
}
And then, the FakeSQL erases the data when the field "mydate" expires.
Data won't expire itself. You'll have to run something to find and remove it.
MS SQL Server supports the concept of scheduled jobs. So you can specify a stored proc that erases anything over (say) a week old, and configure that job to run every night.
mySQL seems to have Event Schedulers since 5.1.
If storage space is not an issue, it's worth a thought about putting in a bit field to mark a record as deleted so that none of your data is lost in case you want to add a part to your application that can look at all previous tasks. Like some administrative tools. Let's say employees are making tasks, and a task has expired and you don't want normal users to be able to modify or access expired tasks, so you "Delete" them when they expire. But one day the big boss comes along and says he wants a list of all tasks in the last year. This bit field could come in handy. It's just a thought and there may be better suggestions out there

Last Updated Date: Antipattern?

I keep seeing questions floating through that make reference to a column in a database table named something like DateLastUpdated. I don't get it.
The only companion field I've ever seen is LastUpdateUserId or such. There's never an indicator about why the update took place; or even what the update was.
On top of that, this field is sometimes written from within a trigger, where even less context is available.
It certainly doesn't even come close to being an audit trail; so that can't be the justification. And if there is and audit trail somewhere in a log or whatever, this field would be redundant.
What am I missing? Why is this pattern so popular?
Such a field can be used to detect whether there are conflicting edits made by different processes. When you retrieve a record from the database, you get the previous DateLastUpdated field. After making changes to other fields, you submit the record back to the database layer. The database layer checks that the DateLastUpdated you submit matches the one still in the database. If it matches, then the update is performed (and DateLastUpdated is updated to the current time). However, if it does not match, then some other process has changed the record in the meantime and the current update can be aborted.
It depends on the exact circumstance, but a timestamp like that can be very useful for autogenerated data - you can figure out if something needs to be recalculated if a depedency has changed later on (this is how build systems calculate which files need to be recompiled).
Also, many websites will have data marking "Last changed" on a page, particularly news sites that may edit content. The exact reason isn't necessary (and there likely exist backups in case an audit trail is really necessary), but this data needs to be visible to the end user.
These sorts of things are typically used for business applications where user action is required to initiate the update. Typically, there will be some kind of business app (eg a CRM desktop application) and for most updates there tends to be only one way of making the update.
If you're looking at address data, that was done through the "Maintain Address" screen, etc.
Such database auditing is there to augment business-level auditing, not to replace it. Call centres will sometimes (or always in the case of financial services providers in Australia, as one example) record phone calls. That's part of the audit trail too but doesn't tend to be part of the IT solution as far as the desktop application (and related infrastructure) goes, although that is by no means a hard and fast rule.
Call centre staff will also typically have some sort of "Notes" or "Log" functionality where they can type freeform text as to why the customer called and what action was taken so the next operator can pick up where they left off when the customer rings back.
Triggers will often be used to record exactly what was changed (eg writing the old record to an audit table). The purpose of all this is that with all the information (the notes, recorded call, database audit trail and logs) the previous state of the data can be reconstructed as can the resulting action. This may be to find/resolve bugs in the system or simply as a conflict resolution process with the customer.
It is certainly popular - rails for example has a shorthand for it, as well as a creation timestamp (:timestamps).
At the application level it's very useful, as the same pattern is very common in views - look at the questions here for example (answered 56 secs ago, etc).
It can also be used retrospectively in reporting to generate stats (e.g. what is the growth curve of the number of records in the DB).
there are a couple of scenarios
Let's say you have an address table for your customers
you have your CRM app, the customer calls that his address has changed a month ago, with the LastUpdate column you can see that this row for this customer hasn't been touched in 4 months
usually you use triggers to populate a history table so that you can see all the other history, if you see that the creationdate and updated date are the same there is no point hitting the history table since you won't find anything
you calculate indexes (stock market), you can easily see that it was recalculated just by looking at this column
there are 2 DB servers, by comparing the date column you can find out if all the changes have been replicated or not etc etc ect
This is also very useful if you have to send feeds out to clients that are delta feeds, that is only the records that have been changed or inserted since the data of the last feed are sent.