Is the age of an object in Google Cloud Storage affected by calls to set meta? - google-cloud-storage

I'm trying to use Google Cloud Storage's lifecycle management features on a bucket, but I want to circumvent it for certain files (basically auto delete all files after 1 day, except for specific files that I want to keep). If I call the set metadata API endpoint, will that update the age of the object and prevent the delete from occurring?

Set metadata changes the last updated time, not the creation time. TTL is keyed off of creation time, so that will not prevent TTL cleanup.
However, you could do a copy operation, and just set the destination to be the same as the source. That would update the creation time, and would be a fast operation as it can copy in the cloud.
That being said, it would probably be safer to just use a different bucket for these files. If the job that keeps touching the files goes down, they may get deleted.
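For reference, here is a minimal sketch of that copy-in-place idea using the google-cloud-storage Python client (the bucket and object names are hypothetical); it assumes, as described above, that rewriting an object over itself produces a new generation and therefore a fresh creation time:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")       # hypothetical bucket name

# One-time setup: delete every object one day after it was created.
bucket.add_lifecycle_delete_rule(age=1)
bucket.patch()

# To keep a specific object alive, rewrite it over itself. The rewrite
# creates a new generation with a fresh creation time, resetting the clock
# on the age-based rule.
blob = bucket.get_blob("keep-me.txt")         # hypothetical object name
token, rewritten, total = blob.rewrite(blob)
while token is not None:                      # large objects may take several calls
    token, rewritten, total = blob.rewrite(blob, token=token)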

Related

How do you perform a real 'move' operation of a GCS storage object, where real 'move' is one that maintains the Last Modified date?

When I move a Google Cloud Storage Object from one bucket to another bucket, or from one "folder" to another "folder" within the same bucket (i.e., a name identity change within the bucket), the Last Modified date is always changed to the date/time of the move.
I would like to move a storage object and have Google Cloud maintain the Last Modified time. If the storage object's contents has not changed, I do not want a move to change the metadata's Last Modified date/time.
I have tried the following tests, but none maintain Modified time:
gsutil mv .\File.txt gs://bucket-name1/
gsutil mv gs://bucket-name1/File.txt gs://bucket-name1/SameBucketNameChange/
gsutil mv gs://bucket-name1/File.txt gs://bucket-name2
Using the GCS portal/console to manually select File.txt, choose move, select a destination bucket that is different from the first bucket.
In all cases, both the Last Modified and Created times change. I would expect at least the Last Modified time to remain unchanged just as it does in both Windows/Linux when there is a move operation.
Especially with cloud storage objects, I would expect date/time integrity to be a value add: at least one date/time should be tied to storage object content changes, so that it does not change unless the content does (usually this would be the Last Modified date/time).
The best option I have found so far is a Transfer Job with metadata preservation specified for the Created time. Even then, GCS still sets new Created/Modified times on the destination object; it only carries over (copies) the original object's Created time into a new "Custom time" field, which seems somewhat odd.
There is no Last Modified date in Cloud Storage. Objects cannot be modified, therefore there cannot be a date on which you modified the object. You can only change an object by creating a new object and copying the data. Even changing the name requires a copy operation.
Cloud Storage does not support Move. That is emulated with Copy and Delete.
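As an illustration of the copy-and-delete emulation, here is a sketch using the google-cloud-storage Python client (bucket and object names are hypothetical); the only way to keep the original timestamp around is to record it yourself, for example in the Custom-Time field mentioned above:

from google.cloud import storage

client = storage.Client()
src_bucket = client.bucket("bucket-name1")
dst_bucket = client.bucket("bucket-name2")
src_blob = src_bucket.get_blob("File.txt")

# "Move" is emulated as copy + delete, so the destination is a brand new
# object with a new creation time.
dst_blob = src_bucket.copy_blob(src_blob, dst_bucket, src_blob.name)

# Preserve the original timestamp by stashing it yourself, e.g. as the
# destination's Custom-Time (or in custom metadata).
dst_blob.custom_time = src_blob.updated
dst_blob.patch()

src_blob.delete()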

Logic App Blob Trigger for a group of blobs

I'm creating a Logic App that has to process all blobs that in a certain container. I would like to periodically check whether there are any new blobs and, if yes, start a run. I tried using the "When a blob is added or modified". However, if at the time of checking there are several new blobs, several new runs are initiated. Is there a way to only initiate one run if one or more blobs are added/modified?
I experimented with the "Number of blobs to return from the trigger" and also with the split-on setting, but I haven't found a way yet.
If you want to trigger on multiple blob files, then yes, you have to use "When a blob is added or modified". From the connector description you can see:
This operation triggers a flow when one or more blobs are added or modified in a container.
You must also set the maxFileCount. And as you have already found, the result is split into separate runs because the splitOn setting is ON by default; if you want the result delivered as a whole, you need to set it OFF.
The result should then be what you want.

How to mark an object for deletion in x days time?

I have a regional object store. I would like to be able to tell a particular object that I want it deleted 5 days from now.
How do you suggest I implement this?
I don't really want to keep track of the object in a database and send delete commands from a separate process based on time. Is there any tag that could be set to have the deletion occur at a later time (from now, not a specific time in the past)?
There's no functionality built into Google Cloud Storage to do this.
You can configure Lifecycle Management to delete objects according to a number of criteria (including age), but deleting at a particular date in the future isn't one of the supported conditions, and in fact there's no guarantee that a lifecycle action will run on the same day its condition becomes true. Instead you would have to implement this functionality yourself (e.g., in a Compute Engine or App Engine implementation).
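One way to implement it yourself (a sketch using the google-cloud-storage Python client; the bucket name, metadata key, and function names are hypothetical) is to tag each object with a delete-after timestamp in its custom metadata and run a periodic sweep from cron, Cloud Scheduler, etc.:

from datetime import datetime, timedelta, timezone
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")            # hypothetical bucket name

def mark_for_deletion(blob_name, days=5):
    # Stamp a delete-after timestamp into the object's custom metadata.
    blob = bucket.get_blob(blob_name)
    deadline = datetime.now(timezone.utc) + timedelta(days=days)
    blob.metadata = {**(blob.metadata or {}), "delete-after": deadline.isoformat()}
    blob.patch()

def sweep():
    # Run periodically: delete any object whose deadline has passed.
    now = datetime.now(timezone.utc)
    for blob in bucket.list_blobs():
        deadline = (blob.metadata or {}).get("delete-after")
        if deadline and datetime.fromisoformat(deadline) <= now:
            blob.delete()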

Update/overwrite DNS record Google Cloud

Does anyone know the best practice for overwriting records in Google Cloud DNS using the API? https://cloud.google.com/dns/api/v1/changes/create does not help!
I could delete and create, but it is not nice ;) and could cause an outage.
Regards
The Cloud DNS API uses Changes objects to perform the update actions; you can create Changes but you don't ever delete them. In the Cloud DNS API, you never operate directly on the resource record sets. Instead, you create a Changes object with your desired additions and deletions and if that is created successfully, it applies those updates to the specified resource record sets in your managed DNS zone.
It's an unusual mental model, sort of like editing a file by specifying a diff to be applied, or appending to the commit history of a Git repository to change the contents of a file. Still, you can certainly achieve what you want to do using this API, and it is applied atomically at the authoritative servers (although the DNS system as a whole does not really do anything atomically, due to caching, so if you know you will be making changes, reduce your TTLs before you make the changes). The atomicity here is more about the updates themselves: if you have multiple applications making changes to your managed zones, and there are conflicts in changes to the particular record sets, the create operation will fail, and you will have to retry the change with modified deletions (rather than having changes be silently overwritten).
Anyhow, what you want to do is to create a Changes object with deletions that specifies the current resource record set, and additions that specifies your desired replacement. This can be rather verbose, especially if you have a domain name with a lot of records of the same type. For example, if you have four A records for mydomain.example (1.1.1.1, 2.2.2.2, 3.3.3.3, and 4.4.4.4) and want to change the 3.3.3.3 address to 5.5.5.5, you need to list all four original A records in deletions and then the new four (1.1.1.1, 2.2.2.2, 4.4.4.4, and 5.5.5.5) in additions.
The Cloud DNS documentation provides example code boilerplate that you can adapt to do what you want: https://cloud.google.com/dns/api/v1/changes/create#examples; you just need to set the deletions and additions for the Changes object you are creating.
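For example, a minimal sketch using the google-cloud-dns Python client (the project, zone, and record values are hypothetical) that swaps 3.3.3.3 for 5.5.5.5 as described above:

from google.cloud import dns

client = dns.Client(project="my-project")
zone = client.zone("my-zone", "mydomain.example.")

# The deletions must list the current record set exactly as it exists...
old = zone.resource_record_set(
    "mydomain.example.", "A", 300,
    ["1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"])
# ...and the additions list the full desired replacement.
new = zone.resource_record_set(
    "mydomain.example.", "A", 300,
    ["1.1.1.1", "2.2.2.2", "4.4.4.4", "5.5.5.5"])

changes = zone.changes()
changes.delete_record_set(old)
changes.add_record_set(new)
changes.create()   # applied atomically; fails on conflict instead of silently overwriting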
I have never used the API for this purpose, but if you use the command line, i.e. gcloud, to update DNS records, it bundles the deletion of the old record and the addition of the updated record into a single transaction. Since transactions are atomic in nature, this shouldn't cause any outage.
Personally, I never witnessed any outage while using gcloud for updating DNS settings for my domain.

Can watchman send why a file changed?

Is watchman capable of telling the configured command why it is sending a file to that command?
For example:
a file that is new to a folder would possibly send a FILE_CREATE flag;
a file that is deleted would send to the command the FILE_DELETE flag;
a file that's modified would send a FILE_MOD flag etc.
Perhaps even when a folder gets deleted (and therefore the files thereunder), it would send a FOLDER_DELETE parameter naming the folder, as well as a FILE_DELETE for the files thereunder / FOLDER_DELETE for the folders thereunder.
Is there such a thing?
No, it can't do that. The reasons why are pretty fundamental to its design.
The TL;DR is that it is a lot more complicated than you might think for a client to correctly process those individual events and in almost all cases you don't really want them.
Most file watching systems are abstractions that simply translate from the system specific notification information into some common form. They don't deal, either very well or at all, with the notification queue being overflown and don't provide their clients with a way to reliably respond to that situation.
In addition to this, the filesystem can be subject to many and varied changes in a very short amount of time, and from multiple concurrent threads or processes. This makes this area extremely prone to TOCTOU issues that are difficult to manage. For example, creating and writing to a file typically results in a series of notifications about the file and its containing directory. If the file is removed immediately after this sequence (perhaps it was an intermediate file in a build step), by the time you see the notifications about the file creation there is a good chance that it has already been deleted.
Watchman takes the input stream of notifications and feeds it into its internal model of the filesystem: an ordered list of observed files. Each time a notification is received watchman treats it as a signal that it should go and look at the file that was reported as changed and then move the entry for that file to the most recent end of the ordered list.
When you ask Watchman for information about the filesystem it is possible or even likely that there may be pending notifications still due from the kernel. To minimize TOCTOU and ensure that its state is current, watchman generates a synchronization cookie and waits for that notification to be visible before it responds to your query.
The combination of the two things above means that watchman result data has two important properties:
You are guaranteed to have observed all notifications that happened before your query
You receive the most recent information for any given file only once in your query results (the change results are coalesced together)
Let's talk about the overflow case. If your system is unable to keep up with the rate at which files are changing (e.g.: you have a big project, are very quickly creating and deleting files, and the system is heavily loaded), the OS can't fit all of the pending notifications in the buffer resources allocated to the watches. When that happens, it blows those buffers and sends an overflow signal. What that means is that the client of the watching API has missed some number of events and is no longer in synchronization with the state of the filesystem. If that client maintains state about the filesystem, it is no longer valid.
Watchman addresses this situation by re-examining the watched tree and synthetically marking all of the files as being changed. This causes the next query from the client to see everything in the tree. We call this a fresh instance result set because it is the same view you'd get when you are querying for the first time. We set a flag in the result so that the client knows that this has happened and can take appropriate steps to repair its own state. You can configure this behavior through query parameters.
In these fresh instance result sets, we don't know whether any given file really changed or not (it's possible that it changed in such a way that we can't detect via lstat) and even if we can see that its metadata changed, we don't know the cause of that change.
There can be multiple events that contribute to why a given file appears in the results delivered by watchman. We don't record them individually because we can't track them with unbounded history; imagine a file that is incrementally written once every second, all day long. Do we keep 86400 change entries per day for it on hand and deliver those to our clients? What if there are hundreds of thousands of files like this? We'd have to truncate that data, and at that point the loss in the data reduces how well you can reason about it.
At the end of all of this, it is very rare for a client to do much more than try to read a file or look at its metadata, and generally speaking, they want to do that only when the file has stopped changing. For this use case, watchman-wait, watchman-make and trigger all have the concept of a settle period that causes the change notifications to be delayed in delivery until after the filesystem has stopped changing.
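To make the model concrete, here is a minimal sketch using the pywatchman client (the watched path is hypothetical): the query returns coalesced per-file state rather than individual events, and the is_fresh_instance flag is how a client detects the overflow/first-query case described above:

import pywatchman

client = pywatchman.client()
root = client.query("watch-project", "/path/to/project")["watch"]

# Record a clock value, then later ask for everything changed since it.
clock = client.query("clock", root)["clock"]
result = client.query("query", root, {
    "since": clock,
    "fields": ["name", "exists", "new", "mtime"],
})

if result.get("is_fresh_instance"):
    # Overflow or first query: everything is reported; rebuild local state.
    pass

for f in result["files"]:
    # exists == False means the file is gone; new == True means it is newly
    # observed -- the closest watchman gets to create/delete flags.
    print(f["name"], f["exists"], f["new"])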