Update/overwrite DNS record Google Cloud - google-cloud-dns

Does anyone know the best practice for overwriting records in Google Cloud DNS using the API? https://cloud.google.com/dns/api/v1/changes/create does not help!
I could delete and create, but it is not nice ;) and could cause an outage.
Regards

The Cloud DNS API uses Changes objects to perform the update actions; you can create Changes but you don't ever delete them. In the Cloud DNS API, you never operate directly on the resource record sets. Instead, you create a Changes object with your desired additions and deletions and if that is created successfully, it applies those updates to the specified resource record sets in your managed DNS zone.
It's an unusual mental model, sort of like editing a file by specifying a diff to be applied, or appending to the commit history of a Git repository to change the contents of a file. Still, you can certainly achieve what you want to do using this API, and the change is applied atomically at the authoritative servers (although the DNS system as a whole does not really do anything atomically, due to caching, so if you know you will be making changes, reduce your TTLs beforehand). The atomicity here is more about the updates themselves: if you have multiple applications making changes to your managed zones, and there are conflicts in changes to particular record sets, the create operation will fail, and you will have to retry the change with modified deletions (rather than having changes be silently overwritten).
Anyhow, what you want to do is to create a Changes object with deletions that specifies the current resource record set, and additions that specifies your desired replacement. This can be rather verbose, especially if you have a domain name with a lot of records of the same type. For example, if you have four A records for mydomain.example (1.1.1.1, 2.2.2.2, 3.3.3.3, and 4.4.4.4) and want to change the 3.3.3.3 address to 5.5.5.5, you need to list all four original A records in deletions and then the new four (1.1.1.1, 2.2.2.2, 4.4.4.4, and 5.5.5.5) in additions.
The Cloud DNS documentation provides example boilerplate code that you can adapt to do what you want: https://cloud.google.com/dns/api/v1/changes/create#examples. You just need to set the deletions and additions for the Changes object you are creating.
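To make the four-A-record example above concrete, here is a minimal sketch that only builds the request body for the changes.create call; it does not send anything. The helper name build_replace_change, the TTL of 300, and the trailing dot on the domain name are my assumptions, not something from the question. As far as I know, the deletions entry has to match the existing record set exactly (TTL included), so adjust the values to whatever your zone actually contains.

```python
from typing import Dict, List


def build_replace_change(name: str, rtype: str, ttl: int,
                         current: List[str], desired: List[str]) -> Dict:
    """Build a Changes body that swaps one record set for another.

    Hypothetical helper: it only assembles the dictionary and makes no API call.
    """
    def rrset(rrdatas: List[str]) -> Dict:
        return {"name": name, "type": rtype, "ttl": ttl, "rrdatas": list(rrdatas)}

    return {
        # The whole existing record set is listed, not just the value being dropped.
        "deletions": [rrset(current)],
        # The whole replacement record set.
        "additions": [rrset(desired)],
    }


# TTL 300 and the trailing dot are assumed values for illustration.
change = build_replace_change(
    "mydomain.example.", "A", 300,
    current=["1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"],
    desired=["1.1.1.1", "2.2.2.2", "4.4.4.4", "5.5.5.5"],
)
```

The resulting dictionary is what you would pass as the body of the create request in the boilerplate from the documentation.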

I have never used the API for this purpose, but if you use the command line (i.e. gcloud) to update DNS records, it bundles the deletion of the old record and the addition of the updated record into a single transaction. Since transactions are atomic in nature, it shouldn't cause any outage.
Personally, I never witnessed any outage while using gcloud for updating DNS settings for my domain.

Related

Data syncing with pouchdb-based systems client-side: is there a workaround to the 'deleted' flag?

I'm planning on using rxdb + hasura/postgresql in the backend. I'm reading this rxdb page for example, which off the bat requires sync-able entities to have a deleted flag.
Q1 (main question)
Is there ANY point at which I can finally hard-delete these entities? What conditions would have to be met - e.g. could I simply use "older than X months" and then force my app to only ever display data newer than X months?
Is such a hard-delete, if possible, best carried out directly in the central db, since it will be the source of truth? Would there be any repercussions client-side that I'm not foreseeing/understanding?
I foresee the number of deleted entities growing rapidly in my app and I don't want to have to store all this extra data forever.
Q2 (bonus / just curious)
What is the (algorithmic) basis for needing a 'deleted' flag? Is it that it's just faster to check a flag rather than to check for the omission of an object from, say, a very large list. I apologize if it's kind of a stupid question :(
Ultimately it comes down to a decision that's informed by your particular business/product with regards to how long you want to keep deleted entities in your system. For some applications it's important to always keep a history of deleted things or even individual revisions to records stored as a kind of ledger or history. You'll have to make a judgement call as to how long you want to keep your deleted entities.
I'd recommend that you also add a deleted_at column if you haven't already and then you could easily leverage something like Hasura's new Scheduled Triggers functionality to run a recurring job that fully deletes records older than whatever your threshold is.
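The recurring cleanup job could look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs on its own; the items table, its columns, and the 90-day threshold are all assumptions for illustration, and in practice the DELETE would be whatever your scheduled job runs against the central database.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_soft_deleted(conn: sqlite3.Connection, cutoff: datetime) -> int:
    """Hard-delete rows that were soft-deleted before `cutoff`; return the count."""
    cur = conn.execute(
        "DELETE FROM items WHERE deleted = 1 AND deleted_at < ?",
        (cutoff.isoformat(),),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical table: a soft-delete flag plus the time of deletion (ISO 8601 text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO items (id, deleted, deleted_at) VALUES (?, ?, ?)",
    [
        (1, 0, None),                                                    # live row
        (2, 1, datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat()),   # deleted long ago
        (3, 1, datetime(2024, 5, 20, tzinfo=timezone.utc).isoformat()),  # deleted recently
    ],
)

# A fixed "now" keeps the example deterministic; 90 days is an assumed threshold.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
purged = purge_soft_deleted(conn, now - timedelta(days=90))
remaining = [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]
```

One thing to keep in mind when picking the threshold: a client that stays offline for longer than it would probably never learn that the purged rows were deleted, so the threshold should be longer than the longest offline period you want to support.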
You could also leverage Hasura's permissions system to ensure that rows that have been deleted aren't returned to the client. There are documentation and examples covering ways to work with soft deletes in Hasura.
For your second question it is definitely much faster to check for the deleted flag on records than to have to try and diff the entire dataset looking for things that are now missing.

LiquiBase and Kubernetes database rolling updates

Let's say I have a database with schema of v1, and an application which is tightly coupled to that schema of v1. i.e. SQLException is thrown if the records in the database don't match the entity classes.
How should I deploy a change which alters the database schema and deploys the application without having a race condition, i.e. a user queries the app for a field which no longer exists?
This problem actually isn't specific to kubernetes, it happens in any system with more than one server -- kubernetes just makes it more front-and-center because of how automatic the rollover is. The words "tightly coupled" in your question are a dead giveaway of the real problem here.
That said, the "answer" actually will depend on which of the following mental models is better for your team:
do not make two consecutive schemas contradictory
use a "maintenance" page that keeps traffic off of the pods until they are fully rolled out
just accept the SQLExceptions and add better retry logic to the consumers
We use the first one, because the kubernetes rollout is baked into our engineering culture and we know that pod-old and pod-new will be running simultaneously and thus schema changes need to be incremental and backward compatible for at minimum one generation of pods.
However, sometimes we just accept that the engineering effort to do that is more cost than the 500s that a specific breaking change will incur, so we cheat and scale the replicas low, then roll it out and warn our monitoring team that there will be exceptions but they'll blow over. We can do that partially because the client has retry logic built into it.
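As an illustration of the first option (consecutive schemas that don't contradict each other), here is a rough sketch using an in-memory SQLite database. The users table, the display_name column, and the function names are invented for the example; with Liquibase the ALTER TABLE would live in a changeset instead. The point is only that the migration adds a nullable column and removes nothing, so a pod still running the old code keeps working while new pods roll out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")


def old_pod_read(conn: sqlite3.Connection) -> list:
    """Query issued by the old application version (schema v1)."""
    return [r[0] for r in conn.execute("SELECT name FROM users ORDER BY id")]


def new_pod_read(conn: sqlite3.Connection) -> list:
    """Query issued by the new version; falls back to the old column when unset."""
    return [
        r[0]
        for r in conn.execute(
            "SELECT COALESCE(display_name, name) FROM users ORDER BY id"
        )
    ]


def migrate_expand(conn: sqlite3.Connection) -> None:
    """Backward-compatible step: add a nullable column, drop nothing."""
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


migrate_expand(conn)
old_rows = old_pod_read(conn)  # the old pod is unaffected by the extra column
new_rows = new_pod_read(conn)  # the new pod works before any backfill has happened
```

Dropping or renaming the old column would then wait for a later release, once no pod that reads it is still running.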

Is the age of an object in Google Cloud Storage affected by calls to set meta?

I'm trying to use Google Cloud Storage's lifecycle management features on a bucket, but I want to circumvent it for certain files (basically auto delete all files after 1 day, except for specific files that I want to keep). If I call the set metadata API endpoint will that update the age of the object and prevent the delete from occurring?
Set metadata changes the last updated time, not the creation time. TTL is keyed off of creation time, so that will not prevent TTL cleanup.
However, you could do a copy operation, and just set the destination to be the same as the source. That would update the creation time, and would be a fast operation as it can copy in the cloud.
That being said, it would probably be safer to just use a different bucket for these files. If your job to keep touching the files goes down they may get deleted.

Preventing update loops for multiple databases using CDC

We have a number of legacy systems that we're unable to make changes to - however, we want to start taking data changes from these systems and applying them automatically to other systems.
We're thinking of some form of service bus (no specific tech picked yet) sitting in the middle, and a set of bus adapters (one per legacy application) to translate between database specific concepts and general update messages.
One area I've been looking at is using Change Data Capture (CDC) to monitor update activity in the legacy databases, and use that information to construct appropriate messages. However, I have a concern - how best could I, as a consumer of CDC information, distinguish changes applied by the application vs changes applied by the bus adapter on receipt of messages - because otherwise, the first update that gets distributed by the bus will get re-distributed by every receiver when they apply that change to their own system.
If I were implementing "poor man's" CDC, i.e. triggers, then those triggers would execute within the context/transaction/connection of the original DML statements, so I could either design them to ignore one particular user (the user applying incoming updates from the bus), or set and detect a session property to similarly ignore certain updates.
Any ideas?
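To make the "ignore one particular user" idea concrete, here is a minimal sketch of the filter I have in mind. It assumes the capture mechanism can record which database user made each change, which a trigger can do; I am not claiming that log-based CDC exposes this. The ChangeEvent type, the changed_by field, and the bus_adapter account name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical account used only by the bus adapter when it applies incoming messages.
BUS_ADAPTER_USER = "bus_adapter"


@dataclass(frozen=True)
class ChangeEvent:
    table: str
    key: int
    changed_by: str  # assumed to be recorded by the capture mechanism


def changes_to_publish(events: Iterable[ChangeEvent]) -> List[ChangeEvent]:
    """Keep only changes made by the application, dropping the adapter's own writes."""
    return [e for e in events if e.changed_by != BUS_ADAPTER_USER]


captured = [
    ChangeEvent("Employee", 1, "legacy_app"),   # genuine application change
    ChangeEvent("Employee", 2, "bus_adapter"),  # applied from the bus, must not be re-sent
    ChangeEvent("Employee", 3, "legacy_app"),
]
published_keys = [e.key for e in changes_to_publish(captured)]
```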
If I understand your question correctly, you're trying to define a message routing structure that works with a design you've already selected (using an enterprise service bus) and a message implementation that you can use to flow data off your legacy systems that only forward-ports changes to your newer systems.
The difficulty is you're trying to apply changes in such a way that they don't themselves generate a CDC message from the clients receiving the data bundle from your legacy systems. In fact, all you're concerned about is having your newer systems consume the data and not propagate messages back to your bus, creating unnecessary crosstalk that might exponentiate, overloading your infrastructure.
The secret is how MSSQL's CDC features reconcile changes as they propagate through the network. Specifically, note this caveat:
All the changes are logged in terms of LSN or Log Sequence Number. SQL
distinctly identifies each operation of DML via a Log Sequence Number.
Any committed modifications on any tables are recorded in the
transaction log of the database with a specific LSN provided by SQL
Server. The __$operation column values are: 1 = delete, 2 = insert, 3 =
update (values before update), 4 = update (values after update).
cdc.fn_cdc_get_net_changes_dbo_Employee gives us all the records net
changed falling between the LSN we provide in the function. We have
three records returned by the net_change function; there was a delete,
an insert, and two updates, but on the same record. In case of the
updated record, it simply shows the net changed value after both the
updates are complete.
For getting all the changes, execute
cdc.fn_cdc_get_all_changes_dbo_Employee; there are options either to
pass 'ALL' or 'ALL UPDATE OLD'. The 'ALL' option provides all the
changes, but for updates, it provides the after updated values. Hence
we find two records for updates. We have one record showing the first
update when Jason was updated to Nichole, and one record when Nichole
was updated to EMMA.
While this documentation is somewhat terse and difficult to understand, it appears that changes are logged and reconciled in LSN order. Competing changes should be discarded by this system, allowing your consistency model to work effectively.
Note also:
CDC is by default disabled and must be enabled at the database level
followed by enabling on the table.
Option B then becomes obvious: institute CDC on your legacy systems, then use your service bus to translate these changes into updates that aren't bound to CDC (using, for example, raw transactional update statements). This should allow for the one-way flow of data that you seek from the design of your system.
For additional methods of reconciling changes, consider the concepts raised by this Wikipedia article on "eventual consistency". Best of luck with your internal database messaging system.

Last Updated Date: Antipattern?

I keep seeing questions floating through that make reference to a column in a database table named something like DateLastUpdated. I don't get it.
The only companion field I've ever seen is LastUpdateUserId or such. There's never an indicator about why the update took place; or even what the update was.
On top of that, this field is sometimes written from within a trigger, where even less context is available.
It certainly doesn't even come close to being an audit trail, so that can't be the justification. And if there is an audit trail somewhere in a log or whatever, this field would be redundant.
What am I missing? Why is this pattern so popular?
Such a field can be used to detect whether there are conflicting edits made by different processes. When you retrieve a record from the database, you get the previous DateLastUpdated field. After making changes to other fields, you submit the record back to the database layer. The database layer checks that the DateLastUpdated you submit matches the one still in the database. If it matches, then the update is performed (and DateLastUpdated is updated to the current time). However, if it does not match, then some other process has changed the record in the meantime and the current update can be aborted.
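The check described above can be sketched like this, using an in-memory SQLite database so it runs on its own. The customer table, its columns, and the function name are made up for the example. The update only goes through if the DateLastUpdated value the caller read earlier is still the one stored in the row.

```python
import sqlite3


def update_if_unchanged(conn: sqlite3.Connection, record_id: int, new_address: str,
                        seen_last_updated: str, now: str) -> bool:
    """Apply the update only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE customer SET address = ?, date_last_updated = ? "
        "WHERE id = ? AND date_last_updated = ?",
        (new_address, now, record_id, seen_last_updated),
    )
    conn.commit()
    return cur.rowcount == 1  # no row matched means someone else got there first


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT, date_last_updated TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, '1 Old Street', '2024-01-01T10:00:00')")

# Two processes both read the row while it still carried the 10:00 timestamp.
first = update_if_unchanged(conn, 1, "2 New Road",
                            "2024-01-01T10:00:00", "2024-01-02T09:00:00")
second = update_if_unchanged(conn, 1, "3 Other Street",
                             "2024-01-01T10:00:00", "2024-01-02T09:05:00")
final_address = conn.execute("SELECT address FROM customer WHERE id = 1").fetchone()[0]
```

The second caller gets False back and can reload the record and retry, or report the conflict to the user.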
It depends on the exact circumstance, but a timestamp like that can be very useful for autogenerated data - you can figure out if something needs to be recalculated because a dependency has changed later on (this is how build systems work out which files need to be recompiled).
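A minimal sketch of that recalculation check, assuming each derived value and each of its dependencies carries a last-updated timestamp (the function and variable names are invented for the example):

```python
from datetime import datetime
from typing import Iterable


def needs_recalculation(output_updated: datetime,
                        dependency_updates: Iterable[datetime]) -> bool:
    """True if any dependency changed after the derived value was last produced."""
    return any(dep > output_updated for dep in dependency_updates)


report_built = datetime(2024, 3, 1, 12, 0)

# One dependency changed on 2 March, after the report was built.
stale = needs_recalculation(report_built, [datetime(2024, 2, 28), datetime(2024, 3, 2)])

# Every dependency is older than the report.
fresh = needs_recalculation(report_built, [datetime(2024, 2, 28)])
```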
Also, many websites will have data marking "Last changed" on a page, particularly news sites that may edit content. The exact reason isn't necessary (and there likely exist backups in case an audit trail is really necessary), but this data needs to be visible to the end user.
These sorts of things are typically used for business applications where user action is required to initiate the update. Typically, there will be some kind of business app (eg a CRM desktop application) and for most updates there tends to be only one way of making the update.
If you're looking at address data, that was done through the "Maintain Address" screen, etc.
Such database auditing is there to augment business-level auditing, not to replace it. Call centres will sometimes (or always in the case of financial services providers in Australia, as one example) record phone calls. That's part of the audit trail too but doesn't tend to be part of the IT solution as far as the desktop application (and related infrastructure) goes, although that is by no means a hard and fast rule.
Call centre staff will also typically have some sort of "Notes" or "Log" functionality where they can type freeform text as to why the customer called and what action was taken so the next operator can pick up where they left off when the customer rings back.
Triggers will often be used to record exactly what was changed (eg writing the old record to an audit table). The purpose of all this is that with all the information (the notes, recorded call, database audit trail and logs) the previous state of the data can be reconstructed as can the resulting action. This may be to find/resolve bugs in the system or simply as a conflict resolution process with the customer.
It is certainly popular - Rails for example has a shorthand for it, as well as a creation timestamp (:timestamps).
At the application level it's very useful, as the same pattern is very common in views - look at the questions here for example (answered 56 secs ago, etc).
It can also be used retrospectively in reporting to generate stats (e.g. what is the growth curve of the number of records in the DB).
There are a couple of scenarios.
Let's say you have an address table for your customers.
You have your CRM app, and a customer calls to say that his address changed a month ago. With the LastUpdate column you can see that the row for this customer hasn't been touched in 4 months.
Usually you use triggers to populate a history table so that you can see all the other history. If the creation date and the updated date are the same, there is no point hitting the history table, since you won't find anything.
You calculate indexes (stock market): you can easily see whether one was recalculated just by looking at this column.
There are 2 DB servers: by comparing the date column you can find out whether all the changes have been replicated, etc.
This is also very useful if you have to send delta feeds out to clients, that is, feeds containing only the records that have been changed or inserted since the date of the last feed.
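A delta feed built on such a column can be sketched as below. The row layout and names are assumptions for illustration; in a real system this would normally be a WHERE clause on the last-updated column rather than a filter in application code.

```python
from datetime import datetime
from typing import Dict, List


def delta_feed(rows: List[Dict], last_feed_at: datetime) -> List[Dict]:
    """Return only the rows changed or inserted after the previous feed was cut."""
    return [row for row in rows if row["last_updated"] > last_feed_at]


rows = [
    {"id": 1, "last_updated": datetime(2024, 2, 20)},  # unchanged since the last feed
    {"id": 2, "last_updated": datetime(2024, 3, 5)},   # changed after the last feed
    {"id": 3, "last_updated": datetime(2024, 3, 1)},   # exactly at the cut-off, already sent
]
last_feed_at = datetime(2024, 3, 1)
changed_ids = [row["id"] for row in delta_feed(rows, last_feed_at)]
```

The strict comparison assumes the previous feed included everything up to and including its own timestamp; if your feeds are cut differently, the boundary handling needs to match.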