Data expiration - scheduled-tasks

Data expiration - scheduled-tasks

I developed an application where the user can create tasks (like a an agenda, ERP or CRM)
So, the user creates a task and that task has an expiration date. The idea is to erase the entry when the task expires.
I've been thinking solutions, like having a timer and so, but:
There is any method or way to create data that expires?? I mean, does MySQL, MSSQL (or any DB Manager) support something like this natively?
I would be great to have something like this:
CREATE TABLE [MyTASK] (expires on mydate action = DELETE){
mydate,
mysomething,
myagain
}
And then, the FakeSQL erases the data when the field "mydate" expires.

Data won't expire itself. You'll have to run something to find and remove it.
MS SQL Server supports the concept of scheduled jobs. So you can specify a stored proc that erases anything over (say) a week old, and configure that job to run every night.

mySQL seems to have Event Schedulers since 5.1.

If storage space is not an issue, it's worth a thought about putting in a bit field to mark a record as deleted so that none of your data is lost in case you want to add a part to your application that can look at all previous tasks. Like some administrative tools. Let's say employees are making tasks, and a task has expired and you don't want normal users to be able to modify or access expired tasks, so you "Delete" them when they expire. But one day the big boss comes along and says he wants a list of all tasks in the last year. This bit field could come in handy. It's just a thought and there may be better suggestions out there

Related

What is the purpose of pre and post deployment?

i am new to pre and post deployment
To understand this i came across this:
“”When databases are created or upgraded, data may need to be added, changed, or deleted. Moreover, certain actions may have to occur on the database before and/or after the process completes. Deployment scripts can be used to accomplish this.””
I want to understand how this exactly works with an example
https://www.mssqltips.com/sqlservertutorial/3006/working-with-pre-and-post-deployment-scripts/

As pointed out in the site, a good example of a post deployment step is insertion of seed data.
For instance, you create a new currency table as part of the schema migration step. Then you insert the most commonly used currencies (say USD, EUR, etc.) so that they don't have to be inserted with a manual step.
Another example of post deployment step is populating data for a newly added column. For example you add a new column called IsPremium to the Customers table and want to set all customers with a start date > 5 years as true. A post deployment script is good place to do that.
Similarly scripts that run before the migration go into pre-deployment scripts. One example is locking certain table to ensure that the migration script is run only once, or setting a flag to indicate a migration is in progress.

Data syncing with pouchdb-based systems client-side: is there a workaround to the 'deleted' flag?

I'm planning on using rxdb + hasura/postgresql in the backend. I'm reading this rxdb page for example, which off the bat requires sync-able entities to have a deleted flag.
Q1 (main question)
Is there ANY point at which I can finally hard-delete these entities? What conditions would have to be met - eg could I simply use "older than X months" and then force my app to only ever displays data for less than X months?
Is such a hard-delete, if possible, best carried out directly in the central db, since it will be the source of truth? Would there be any repercussions client-side that I'm not foreseeing/understanding?
I foresee the number of deleted's growing rapidly in my app and i don't want to have to store all this extra data forever.
Q2 (bonus / just curious)
What is the (algorithmic) basis for needing a 'deleted' flag? Is it that it's just faster to check a flag rather than to check for the omission of an object from, say, a very large list. I apologize if it's kind of a stupid question :(

Ultimately it comes down to a decision that's informed by your particular business/product with regards to how long you want to keep deleted entities in your system. For some applications it's important to always keep a history of deleted things or even individual revisions to records stored as a kind of ledger or history. You'll have to make a judgement call as to how long you want to keep your deleted entities.
I'd recommend that you also add a deleted_at column if you haven't already and then you could easily leverage something like Hasura's new Scheduled Triggers functionality to run a recurring job that fully deletes records older than whatever your threshold is.
You could also leverage Hasura's permissions system to ensure that rows that have been deleted aren't returned to the client. There is documentation and examples for ways to work with soft deletes and Hasura
For your second question it is definitely much faster to check for the deleted flag on records than to have to try and diff the entire dataset looking for things that are now missing.

Keeping database table clean using Entity Framework

I am using Entity Framework and I have a table that I use to record some events that gets generated by a 3rd party. These events are valid for 3 days. So I like to clean out any events that is older than 3 days to keep my database table leaner. Is there anyway I can do this? Some approach that won't cause any performance issue during clean up.

To do the mentioned above there are some options :
1) Define stored procedure mapped to your EF . And you can use Quarts Trigger to execute this method using timely manner.
https://www.quartz-scheduler.net
2) SQL Server Agent job which runs every day at the least peak time and remove your rows.
https://learn.microsoft.com/en-us/sql/ssms/agent/schedule-a-job?view=sql-server-2017
If only the method required for cleaning purpose nothing else i recommend you go with option 2

first of all. Make sure the 3rd party write a timestamp on every record, in that way you will be able to track how old the record is.
then. Create a script that deletes all record that are older than 3 days.
DELETE FROM yourTable WHERE DATEDIFF(day,getdate(),thatColumn) < -3
now create a scheduled task in SQL management Studio:
In SQL Management Studio, navigate to the server, then expand the SQL Server Agent item, and finally the Jobs folder to view, edit, add scheduled jobs.
set script to run once every day or whatever please you :)

Dynamics CRM workflow failing with infinite loop detection - but why?

I want to run a plug-in every 30 minutes, to poll an external system for changes. I am in CRM Online, so I don't have ready access to a scheduling engine.
To run the plug-in, I have a 'trigger' entity with a timezone independent date-
Updating the field also triggers a workflow, which in pseudocode has this logic:
If (Trigger_WaitUntil >= [Process-Execution Time])
{
Timeout until Trigger:WaitUntil
{
Set Trigger_WaitUntil to [Process-Execution Time] + 30 minutes
Stop Workflow with status of: Succeeded
}
}
If Trigger_WaitUntil < [Process-Execution Time])
{
Send email //Tell an admin that the recurring task has self-terminated
Stop Workflow with status of: Canceled
}
So, the behaviour I expect is that every 30 minutes, the 'WaitUntil' field gets updated (and the Plug-in and workflow get triggered again); unless the WaitUntil date is before the Execution time, in which case stop the workflow.
However, 4 hours or so later (probably 8 executions, although I haven't verified that yet) I get an infinite loop warning "This workflow job was canceled because the workflow that started it included an infinite loop. Correct the workflow logic and try again. For information about workflow".
My question is why? Do workflows have a correlation id like plug-ins, which is being carried through to the child workflow? If so, is there anyway I can prevent this, whilst maintaining the current basic mechanism of using a single trigger record to manage the schedule (I've seen other solutions in which workflows create new records, but then you've got to go round tidying up the old trigger records as well)

Yes, this behavior is well-known. The only way to implement recurring workflows without issues with infinite loops in Dynamics CRM and using only OOB features is usage of Bulk Deletion functionality. This article describes how to implement it - http://www.crmsoftwareblog.com/2012/08/using-the-bulk-deletion-process-to-schedule-recurring-workflows/
UPD: If you want to run your code every 30 mins then you will have to create 48 bulkdelete jobs with correspond startdatetime like 12:00, 12: 30, 1:00 ...

The current supported method for CRM is to use the Azure Scheduler.
Excerpt:
create a Web API application to communicate with CRM and our external
provider running on a shared (free) Azure web site and also utilize
the Azure Scheduler to manage the recurrence pattern.
The free version of the Azure Scheduler limits us to execution no more
than once an hour and a maximum of 5 jobs. If you have a lot going on
$20 a month will get you executions every minute and up to 50 jobs -
which sounds like a pretty good deal.
so if you wanted every 30 minutes, you could create two jobs, one on the half hour, and one on the hour.

The Bulk Deletion is an interesting work around and something we've used before. It creates extra work and maintenance though so I try to avoid it if possible.
I would generally recommend building a windows application and using the windows scheduling feature (I know you said you don't have a scheduler available but this is often forgotten). This approach works really well and is very easy to troubleshoot. Writing to logs and sending error email alerts is pretty easy to make it robust. The server doesn't need to be accessible externally, it only needs to reach CRM. If you had CRM on-prem, you could just use the same server.
Azure Scheduler is a great suggestion. This keeps you in the cloud which is nice.
SSIS is another option if you already have KingswaySoft or Cozy Roc in place.
You could build a workflow that creates another record and cleans up after itself; however, this is really using the wrong tool for the job. Also, it's very easy for it to fail and then not initiate the next record.

There is a solution called "Scheduled Workflow Runner". You create a FetchXML query to create a record set to run against, and point it at an on-demand workflow that you want it to run on each record.
http://alexanderdevelopment.net/post/2013/05/18/scheduling-recurring-dynamics-crm-workflows-with-fetchxml/

Last Updated Date: Antipattern?

I keep seeing questions floating through that make reference to a column in a database table named something like DateLastUpdated. I don't get it.
The only companion field I've ever seen is LastUpdateUserId or such. There's never an indicator about why the update took place; or even what the update was.
On top of that, this field is sometimes written from within a trigger, where even less context is available.
It certainly doesn't even come close to being an audit trail; so that can't be the justification. And if there is and audit trail somewhere in a log or whatever, this field would be redundant.
What am I missing? Why is this pattern so popular?

Such a field can be used to detect whether there are conflicting edits made by different processes. When you retrieve a record from the database, you get the previous DateLastUpdated field. After making changes to other fields, you submit the record back to the database layer. The database layer checks that the DateLastUpdated you submit matches the one still in the database. If it matches, then the update is performed (and DateLastUpdated is updated to the current time). However, if it does not match, then some other process has changed the record in the meantime and the current update can be aborted.

It depends on the exact circumstance, but a timestamp like that can be very useful for autogenerated data - you can figure out if something needs to be recalculated if a depedency has changed later on (this is how build systems calculate which files need to be recompiled).
Also, many websites will have data marking "Last changed" on a page, particularly news sites that may edit content. The exact reason isn't necessary (and there likely exist backups in case an audit trail is really necessary), but this data needs to be visible to the end user.

These sorts of things are typically used for business applications where user action is required to initiate the update. Typically, there will be some kind of business app (eg a CRM desktop application) and for most updates there tends to be only one way of making the update.
If you're looking at address data, that was done through the "Maintain Address" screen, etc.
Such database auditing is there to augment business-level auditing, not to replace it. Call centres will sometimes (or always in the case of financial services providers in Australia, as one example) record phone calls. That's part of the audit trail too but doesn't tend to be part of the IT solution as far as the desktop application (and related infrastructure) goes, although that is by no means a hard and fast rule.
Call centre staff will also typically have some sort of "Notes" or "Log" functionality where they can type freeform text as to why the customer called and what action was taken so the next operator can pick up where they left off when the customer rings back.
Triggers will often be used to record exactly what was changed (eg writing the old record to an audit table). The purpose of all this is that with all the information (the notes, recorded call, database audit trail and logs) the previous state of the data can be reconstructed as can the resulting action. This may be to find/resolve bugs in the system or simply as a conflict resolution process with the customer.

It is certainly popular - rails for example has a shorthand for it, as well as a creation timestamp (:timestamps).
At the application level it's very useful, as the same pattern is very common in views - look at the questions here for example (answered 56 secs ago, etc).
It can also be used retrospectively in reporting to generate stats (e.g. what is the growth curve of the number of records in the DB).

there are a couple of scenarios
Let's say you have an address table for your customers
you have your CRM app, the customer calls that his address has changed a month ago, with the LastUpdate column you can see that this row for this customer hasn't been touched in 4 months
usually you use triggers to populate a history table so that you can see all the other history, if you see that the creationdate and updated date are the same there is no point hitting the history table since you won't find anything
you calculate indexes (stock market), you can easily see that it was recalculated just by looking at this column
there are 2 DB servers, by comparing the date column you can find out if all the changes have been replicated or not etc etc ect

This is also very useful if you have to send feeds out to clients that are delta feeds, that is only the records that have been changed or inserted since the data of the last feed are sent.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Data expiration - scheduled-tasks

Data won't expire itself. You'll have to run something to find and remove it. MS SQL Server supports the concept of scheduled jobs. So you can specify a stored proc that erases anything over (say) a week old, and configure that job to run every night.

mySQL seems to have Event Schedulers since 5.1.

Related

What is the purpose of pre and post deployment?

Data syncing with pouchdb-based systems client-side: is there a workaround to the 'deleted' flag?

Keeping database table clean using Entity Framework

Dynamics CRM workflow failing with infinite loop detection - but why?

Last Updated Date: Antipattern?

Categories

Resources