Deleting Folders From Azure Storage Containers or File Shares That Are Older than X Days - PowerShell

I am using Azure Storage accounts and trying to work with PowerShell to delete folders that exist in a container (I know a container is just a two-level hierarchy of blobs and that folders do not actually exist per se).
Apart from not being able to check a folder's date/time properties, the only property I could find on the blobs themselves is "Last Modified", which is generally OK for our purpose, although a creation property would be better.
As I understand it, the only solution for this is to create a table and list each file with its creation date and time? That seems like a lot of work for this.
I can enumerate a file from that folder, since they are all copied together, and then delete all blobs sharing the root "folder", but I would prefer to know the actual last modified time of the folder itself rather than of the files in it. Is there any way to achieve this? I am not locked into using Azure storage containers; file shares are also possible. But when I tried those, enumerating the folders was possible, yet the modified date/time property is simply not populated for some reason, and it is the only property there aside from "ETag".
Thanks in advance.

As far as I know, allowing users to define expiration policies on blobs natively in Storage is still only planned; you can find it in this Azure Storage feedback item.
If you'd like to delete "expired" folders/files using a PowerShell script, you can include path information with a datetime in the blob names (such as 2017/10/test.txt), then list and traverse the blobs, compare the datetime part of each blob name with the current datetime, and delete the blob if it is older than X days.
Alternatively, if you do not want to include a datetime in the blob names, you can store the creation datetime in the blob's properties or metadata, retrieve it from there later, and compare it with the current datetime to decide whether to delete the blob.
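As a rough illustration, here is a hedged PowerShell sketch using the Az.Storage module. It relies on the blob's LastModified property (which the question notes is generally acceptable for this purpose); the same traversal would work if you instead read a creation datetime from blob metadata. The account, key, container, prefix and retention values are placeholders.

# Hedged sketch: delete blobs under a virtual "folder" prefix that are older than X days.
$ctx       = New-AzStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<account-key>"
$container = "mycontainer"
$prefix    = "myfolder/"          # the virtual "folder" to clean up
$cutoff    = (Get-Date).ToUniversalTime().AddDays(-30)

Get-AzStorageBlob -Container $container -Prefix $prefix -Context $ctx |
    Where-Object { $_.LastModified.UtcDateTime -lt $cutoff } |
    ForEach-Object {
        Write-Host "Deleting $($_.Name) (last modified $($_.LastModified))"
        Remove-AzStorageBlob -Blob $_.Name -Container $container -Context $ctx -Force
    }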

Related

How do you perform a real 'move' operation of a GCS storage object, where real 'move' is one that maintains the Last Modified date?

When I move a Google Cloud Storage object from one bucket to another bucket, or from one "folder" to another "folder" within the same bucket (i.e., a change of the object's name within the bucket), the Last Modified date is always changed to the date/time of the move.
I would like to move a storage object and have Google Cloud maintain the Last Modified time. If the storage object's contents have not changed, I do not want a move to change the metadata's Last Modified date/time.
I have tried the following tests, but none maintain Modified time:
gsutil mv .\File.txt gs://bucket-name1/
gsutil mv gs://bucket-name1/File.txt gs://bucket-name1/SameBucketNameChange/
gsutil mv gs://bucket-name1/File.txt gs://bucket-name2
Using the GCS portal/console to manually select File.txt, choose Move, and select a destination bucket different from the first bucket.
In all cases, both the Last Modified and Created times change. I would expect at least the Last Modified time to remain unchanged, just as it does on both Windows and Linux when there is a move operation.
Especially with cloud storage objects, I would think date/time integrity would be a valuable feature: at least one date/time should be tied to changes in the object's content, so that it does not change unless the content changes (usually this would be the Last Modified date/time).
The best option I have found so far is a Transfer Job with metadata preservation specified for the Created time, but GCS still changes the destination's Created/Modified times; it merely carries over (copies) the original object's Created time into a new "Custom time" field, which seems somewhat odd.
There is no Last Modified date in Cloud Storage. Objects are immutable, so there cannot be a date on which you modified an object. You can only change an object by creating a new object and copying the data; even changing the name requires a copy operation.
Cloud Storage does not support a move operation; it is emulated with a copy followed by a delete.
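To make that concrete, what gsutil presents as a move is effectively the two commands below (bucket and object names are placeholders), which is why the destination object always gets a fresh creation time:

gsutil cp gs://bucket-name1/File.txt gs://bucket-name2/File.txt
gsutil rm gs://bucket-name1/File.txt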

Nextcloud - mass removal of collaborative tags from files

Due to an oversight in a Flow routine that was meant to tag certain folders on upload into the cloud, a huge number of unwanted files were also tagged in the process. Now there are thousands upon thousands of files that have the wrong tag and need to be untagged. Neither doing this by hand nor re-uploading with the correct Flow routine is really a workable option. Is there a way to do the following:
Crawl through every entry in a folder
If it's a file, untag it; if it's a folder, don't
Everything I found about tags and Nextcloud was concerned with handling them at upload time, never with going over existing files and changing their tags.
Is this possible?
Nextcloud stores this data in the configured database, so you could simply remove the assignments from the DB.
The assignments are stored in oc_systemtag_object_mapping, while the tags themselves are in oc_systemtag. Once you have found the ID of the tag to remove (let's say 4), you can simply remove all of its assignments from the DB:
DELETE FROM oc_systemtag_object_mapping WHERE systemtagid = 4;
If you would like to do this only for a specific folder, it does not get much more complicated. Files (including their folder structure!) are stored in oc_filecache, and oc_systemtag_object_mapping.objectid references oc_filecache.fileid. So with some joining and LIKE filtering, you can limit the rows to delete, as sketched below. If your tag is also used for non-file objects, your condition should include oc_systemtag_object_mapping.objecttype = 'files'.
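A hedged MySQL/MariaDB sketch of that join (the tag ID 4 and the folder path 'files/SomeFolder/%' are placeholders, the table and column names are taken from the description above, and you should back up the database before running anything like this):

-- Remove tag 4 only from files below files/SomeFolder/ (placeholder path).
DELETE m
FROM oc_systemtag_object_mapping AS m
JOIN oc_filecache AS f ON f.fileid = m.objectid
WHERE m.systemtagid = 4
  AND m.objecttype = 'files'
  AND f.path LIKE 'files/SomeFolder/%';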

Is there a way to find the oldest file in a directory using Azure Data Lake?

I had assumed I could use the Get Metadata activity to get all the file names and dates (which I can). I then thought I could use a ForEach activity to set two pipeline variables (Name and Date) from the values in the list whenever an item was older than the variables' current values. This does not work because all the files are processed in parallel. This really should not be this hard.
Yes, the ForEach activity in Azure Data Factory runs in parallel by default, but you can change it to run sequentially by checking the Sequential option.
For more details, you can refer to this documentation.
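If running the check outside the pipeline is acceptable, a different approach is a hedged PowerShell sketch with the Az.Storage module against ADLS Gen2 (the account, filesystem and directory names are placeholders). Note this bypasses the ForEach/variables pattern entirely rather than fixing it inside Data Factory.

# Hedged sketch: find the oldest file (by LastModified) in an ADLS Gen2 directory.
$ctx    = New-AzStorageContext -StorageAccountName "mydatalake" -UseConnectedAccount
$oldest = Get-AzDataLakeGen2ChildItem -FileSystem "myfilesystem" -Path "mydirectory/" -Context $ctx |
    Where-Object { -not $_.IsDirectory } |
    Sort-Object LastModified |
    Select-Object -First 1
"$($oldest.Path) was last modified $($oldest.LastModified)"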

Azure Data factory, How to incrementally copy blob data to sql

I have an Azure blob container where some JSON data files get dropped every 6 hours, and I want to use Azure Data Factory to copy them to an Azure SQL DB. The file pattern for the files is like this: "customer_year_month_day_hour_min_sec.json.data.json"
The blob container has other JSON data files as well, so I have to filter for the files in the dataset.
First question: how can I set the file path on the blob dataset to only look at the JSON files that I want? I tried the wildcard *.data.json, but that doesn't work. The only filename wildcard I have gotten to work is *.json.
Second question: how can I copy data to Azure SQL only from the new files (with the specific file pattern) that land in the blob storage? I have no control over the process that puts the data in the blob container and cannot move the files to another location, which makes this harder.
Please help.
You could use an ADF event trigger to achieve this.
Define your event trigger as 'blob created' and specify the blobPathBeginsWith and blobPathEndsWith properties based on your filename pattern.
For the first question: when an event trigger fires for a specific blob, the event captures the folder path and file name of the blob in the properties @triggerBody().folderPath and @triggerBody().fileName. You need to map these properties to pipeline parameters and pass an @pipeline().parameters.parameterName expression to your fileName in the copy activity.
This also answers the second question: each time the trigger fires, you'll get the folder path and file name of the newly created blob in @triggerBody().folderPath and @triggerBody().fileName.
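A hedged sketch of what such a trigger definition could look like in JSON, assuming the customer_*.data.json pattern from the question (the container, scope IDs, pipeline name and parameter names are placeholders):

{
  "name": "CustomerDataFileCreated",
  "properties": {
    "type": "BlobEventsTrigger",
    "typeProperties": {
      "blobPathBeginsWith": "/yourcontainer/blobs/customer_",
      "blobPathEndsWith": ".data.json",
      "ignoreEmptyBlobs": true,
      "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>",
      "events": [ "Microsoft.Storage.BlobCreated" ]
    },
    "pipelines": [
      {
        "pipelineReference": { "referenceName": "CopyBlobToSql", "type": "PipelineReference" },
        "parameters": {
          "sourceFolder": "@triggerBody().folderPath",
          "sourceFile": "@triggerBody().fileName"
        }
      }
    ]
  }
}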
Thanks.
I understand your situation. It seems they've used a new platform to recreate a decades-old problem. :)
The pattern I would set up first looks something like this:
Create a Storage Account Trigger that will fire on every new file in the source container.
In the triggered pipeline, examine the blob name to see if it fits your pattern (see the expression sketch after this list). If not, just end, taking no action. If so, binary-copy the blob to an account/container your app owns, leaving the original in place.
Create another Trigger on your container that runs the import Pipeline.
Run your import process.
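For the name check in step 2, an If Condition could use an expression along these lines (the fileName parameter is a placeholder passed in from the trigger, and the pattern matches the customer_*.data.json files from the question):

@and(
    startsWith(pipeline().parameters.fileName, 'customer_'),
    endsWith(pipeline().parameters.fileName, '.data.json')
)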
A couple of caveats your management has to understand: you can be very, very reliable, but you cannot guarantee compliance because there is no transaction/contract between you and the source container. Also, there may be a sequence gap, since a small file can usually finish processing while a larger file is still being processed.
If for any reason you do miss a file, all you need to do is copy it to your container, where your process will pick it up. You can load all previous blobs in the same way.

The default value for a column is not set when copying a file using the REST API (SharePoint 2013 standalone)

I am trying to copy files from one folder to another folder using the SharePoint REST API. Some columns in the destination folder have a default value defined. Even though the files are copied successfully, some files do not get the default values for those columns.
On closer inspection, I found that the new Office document types (.docx, .xlsx, .pptx, etc.) get the default values, while the old Office document types (.doc, .xls, .ppt) do not.
Also, the old Office documents get the values only when they come from a source folder that already contains the same columns as the destination folder.
I am wondering why the old Office documents do not get the values and whether anything can be done about it.
Is this a bug in SharePoint Server, or am I missing some configuration to make all files work?
My understanding is that this is expected. Because you are copying files, the copy includes not only the file itself but also its metadata. If the file in the source folder doesn't have values in those columns, it makes sense that, when you copy it to a destination folder, those same columns won't have values either. Now, why do some files (.docx, .pptx, etc.) get values in the destination? Probably because of the SharePoint document parser feature (document property promotion and demotion). So in your case, instead of copying the files, you can download and re-upload them, for instance with code like the sketch below.
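A hedged PnP PowerShell sketch of that download/re-upload approach (the site URL, library and folder names are placeholders, and the PnP cmdlets are an assumption on my part, not necessarily what the original answer linked to):

# Hedged sketch: re-upload a .doc file so the destination library applies its own
# column defaults, instead of copying the file together with its source metadata.
Connect-PnPOnline -Url "https://sharepoint.contoso.com/sites/ProjectX" -Credentials (Get-Credential)

# Download the file from the source folder to a local temp path.
Get-PnPFile -Url "/sites/ProjectX/SourceLibrary/Old.doc" -Path $env:TEMP -FileName "Old.doc" -AsFile -Force

# Upload it as a new file into the destination folder; column default values apply here.
Add-PnPFile -Path (Join-Path $env:TEMP "Old.doc") -Folder "DestinationLibrary/SubFolder"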