Upload same-name files to Google Cloud Storage, then download them with their original names - google-cloud-storage

In Google Cloud Storage, if you upload more than one file with the same name, the last upload overwrites whatever was uploaded before it.
If I want to upload more than one file with the same name, I should append something unique to the file name, e.g. a timestamp or a random UUID.
But by doing so I'll lose the original file name on download, because I want to serve the file directly from Google.

If we use the unique identifier as a folder instead of appending it to the file name, e.g. UUID + "/" + fileName, then we can download the file with its original name.

You could turn on Object Versioning, which will keep the old versions of the object around.
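For example, with the gsutil tool (the bucket name is a placeholder):
gsutil versioning set on gs://my-bucket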
Alternatively, you can set the Content-Disposition header when uploading the object, which should preserve whatever filename you want on download.
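A minimal sketch with the Python client library, setting Content-Disposition on the blob before uploading; the bucket name, object name, and file paths are placeholders:
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")
# Unique object name, e.g. with a UUID prepended
blob = bucket.blob("b1946ac9_report.pdf")
# The filename the browser should use on download; note the double quotes
blob.content_disposition = 'attachment; filename="report.pdf"'
blob.upload_from_filename("/local/path/report.pdf")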

Instead of using Object Versioning, you can attach the UUID (or any other unique identifier) to the object name and then update the object's metadata (specifically the Content-Disposition). The following is part of a Python script I've used to remove the forward slashes - added by Google Cloud buckets to represent directories - from multiple objects. It's based on this blog post. Please keep in mind the double quotes around the filename in the Content-Disposition value.
import pathlib

def update_blob_download_name(bucket_name):
    """Update the download name of blobs and remove the path.

    :returns: None
    :rtype: None
    """
    # Storage client; initialization not shown for brevity
    client = initialize_google_storage_client()
    bucket = client.bucket(bucket_name)
    for blob in bucket.list_blobs():
        if "/" in blob.name:
            remove_path = blob.name[blob.name.rfind("/") + 1:]  # rfind gives the last occurrence of the char
            ext = pathlib.Path(remove_path).suffix
            remove_id = remove_path[:remove_path.rfind("_id_")]
            new_name = remove_id + ext
            blob.content_disposition = f'attachment; filename="{new_name}"'
            blob.patch()
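Assuming a bucket named my-bucket (a placeholder), you would then call:
update_blob_download_name("my-bucket")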

Related

Azure Data Factory - Check If Any Zip File Exists

I am trying to check if any zip file exists in my SFTP folder. The GetMetadata activity works fine if I explicitly provide the filename, but I can't know the file name here, as it is embedded with a timestamp and sequence number, which are dynamic.
I tried specifying *.zip, but that never works: the GetMetadata activity always returns false even though the zip file actually exists. Is there any way to get this to work? Suggestions, please.
Sample file name as below, in this the last part 0000000004_20210907080426 is dynamic and will change every time:
TEST_TEST_9999_OK_TT_ENTITY_0000000004_20210907080426
You could possibly do a Get Metadata on the folder and include the Child items under the Field List.
You'll have to iterate with a ForEach using the expression
@activity('Get Folder Files').output.childItems
and then check if item().name (within the ForEach) ends with '.zip'.
I know it's a pain when the wildcard stuff doesn't work for a given dataset, but this alternative ought to work for you.
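For example, the If Condition inside the ForEach could use an expression along these lines (endswith is a built-in string function; the activity name matches the example above):
@endswith(item().name, '.zip')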
If you are using exists in the Get Metadata activity, you need to provide the file name in it.
As a workaround, you can get the child items (with filename *.zip) using the Get Metadata activity.
Pass the output to an If Condition activity to check whether the required file exists:
@contains(string(json(string(activity('Get Metadata1').output.childItems))),'.zip')
You can use other activities inside the True and False branches of the If Condition.
If no file exists, the Get Metadata activity finds no child items and the If Condition evaluates to false.
For an SFTP dataset, if you want to use a wildcard to filter files under the specified folderPath, you have to skip that setting and specify the file name in the activity source settings (Get Metadata activity).
But a wildcard filter on folders/files is not supported for the Get Metadata activity.

How to get a list of all cached audio?

For example, my podcast app has a list of all downloaded podcasts; how do I get a list of all LockCachingAudioSource instances that have been downloaded using the request() method?
When you create your LockCachingAudioSource instances, you can choose the location where you want them to be saved. If you create a directory for that purpose, you can obtain a directory listing using Dart's file I/O API. The directory listing will also show partially downloaded files and other temporary files, which you want to ignore. These have extensions .mime and .part.
Having explained that, here is a solution. First, create your cache directory during app init:
import 'dart:io';

final cacheDir = Directory('/your/choice/of/location');
...
await cacheDir.create(recursive: true);
Then for each audio source, create it like this:
import 'package:path/path.dart' as p;
...
source = LockCachingAudioSource(
  uri,
  cacheFile: File(p.joinAll([cacheDir.path, 'yourChoiceOfName.mp3'])),
);
Now you can get a list of downloaded files at any time by listing the cacheDir and ignoring any temporary files:
final downloadedFiles = cacheDir.list().where((f) =>
    !['mime', 'part'].contains(f.path.replaceAll(RegExp(r'^.*\.'), '')));
If you need to turn these files back into the original URIs, you could either create your own database that records which file belongs to which URI, or choose the name of each cache file by encoding the URI in base64 (or something else reversible), so that given a file name you can decode it back into the original URI.
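A minimal sketch of the reversible-name idea, reusing cacheDir and the package:path import from above (the uri and file variables are illustrative):
import 'dart:convert';
...
// Encode the URI so it can serve as the cache file name.
final encodedName = base64Url.encode(utf8.encode(uri.toString()));
final source = LockCachingAudioSource(
  uri,
  cacheFile: File(p.joinAll([cacheDir.path, '$encodedName.mp3'])),
);
...
// Later, decode a file name back into the original URI.
final name = p.basenameWithoutExtension(file.path);
final originalUri = Uri.parse(utf8.decode(base64Url.decode(name)));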

What parts of a Blob do I need to store in my database to retrieve files back?

I am now able to store files in buckets:
Blob blob = storage.create(
    BlobInfo.newBuilder(
        BUCKET_NAME,
        Objects.requireNonNull(multipartFile.getOriginalFilename()))
        .build(),
    multipartFile.getBytes()
);
but I am not sure what I am supposed to keep in my own database in order to be able to retrieve my files back.
Now, the naive approach would be to just store a URL in a google_bucket_url column.
However, there's already the mediaLink and the selfLink - both of which look like they might not be valid forever.
I could also store the bucket plus the blob id
Storage storage = StorageOptions.getDefaultInstance().getService();
// Blob blob = storage.create( .. );
BlobId blobId = BlobId.of(blob.getBucket(), blob.getName());
storage.get(blobId);
but instead of guessing what's the "right" way, I would like to be sure what I am doing.
I must be missing it in the docs but I can't find any recommendations on this.
I recommend storing the full URI of your blobs: gs://bucket_name/path/to/file. The full URI is important: tomorrow you may have to store files on AWS S3 or locally, and you will need to identify the protocol to use for retrieving the file content.
When you have your full GCS URI gs://bucket_name/path/to/file, the split for getting the file back is easy:
// One line to strip the "gs://" prefix, then split only at the first "/" character
s := strings.SplitN(strings.TrimPrefix(URI, "gs://"), "/", 2)
// Then you can use this
bucketName := s[0]
pathToFile := s[1]
Note: This code (here in Go) is easily implementable in any language
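Since the snippets in the question are in Java, the same split there might look like this (a sketch; storage is the Storage client from the question):
String uri = "gs://bucket_name/path/to/file";
String[] s = uri.replaceFirst("^gs://", "").split("/", 2);
String bucketName = s[0];
String pathToFile = s[1];
Blob blob = storage.get(BlobId.of(bucketName, pathToFile));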

Save variables as mat files on S3

I would like to save variables as .mat files on S3. The example on the official site shows a "tall table" only. Maybe I could use the "system" command to step outside MATLAB, but I am looking for a straightforward solution.
Any suggestions?
It does look like save does not support saving to remote filesystems.
You can, however, write matrices, cells, tables and timetables.
An example which uses writetable:
LastName = {'Smith';'Johnson';'Williams';'Jones';'Brown'};
Age = [38;43;38;40;49];
T = table(Age,LastName)
writetable(T,'s3://.../table.txt')
Note:
To write to a remote location, filename must contain the full path of
the file specified as a uniform resource locator (URL) of the form:
scheme_name://path_to_file/my_file.ext
To obtain the right URL for the bucket, you can navigate to the contents of the S3 bucket, select a file in there, choose Copy path, and remove the name of the file (e.g. table.txt).
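Following the same pattern, a plain matrix can be written directly to the bucket with writematrix (the bucket path is a placeholder); note this produces a text/CSV file rather than a .mat file:
a = rand(5);
writematrix(a,'s3://mybucket/a.csv')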
The alternative is, as you mentioned, a system call:
a = rand(5);
save('matExample','a');
system('aws s3api put-object --bucket mybucket --key=s3mat.mat --body=matExample.mat')
The mat file matExample.mat is saved as s3mat.mat on the server.

Get path of uploaded image in Moodle

I have added a custom column to store a company logo. I have used the File API of Moodle like:
$mform->addElement('filepicker', 'certificatelogo', 'Company Logo', null,
    array('maxbytes' => $maxbytes, 'accepted_types' => '*'));
$mform->setDefault('certificatelogo', '0');
$mform->addHelpButton('certificatelogo', 'certificatelogo', 'certificate');
Once the form is submitted, the itemid will be stored in the custom column, say "648557354".
Now I need to get the image to print the logo on the certificate. How can I get the image path from the itemid? Do I need to store any other information to retrieve the image?
The itemid returned is the temporary id of the draft area where the file is stored whilst the form is being displayed. You need to copy the file into its 'real' location, when the form is submitted, otherwise the file will be automatically deleted after a few days (and it will only be accessible to the user who originally uploaded it).
I'd always recommend using the filemanager element, if you are planning on keeping the file around (filepicker elements are for files you want to process and discard, such as when uploading a CSV file data to parse and add to the database).
Details of how to use it are here:
https://docs.moodle.org/dev/Using_the_File_API_in_Moodle_forms#filemanager
But the basic steps are:
Copy any existing files from the 'real' area to the draft area (file_prepare_standard_filemanager).
Display the form.
On submission, copy files from the draft area to the 'real' area (file_postupdate_standard_filemanager).
When you want to display the file to the user, get a list of the files stored in the file area (defined by the component, filearea, context and, optionally, itemid that you used in file_prepare_standard_filemanager and file_postupdate_standard_filemanager). You can do this with: $fs = get_file_storage(); $fs->get_area_files().
For those files (maybe only 1 file, in your case), generate the URL with moodle_url::make_pluginfile_url.
Make sure your plugin has a PLUGINNAME_pluginfile() function in lib.php, to examine incoming file requests, do security checks on them, then serve the file.
There is a reasonable example of all of this at: https://github.com/AndyNormore/filemanager
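To make those steps concrete, here is a minimal sketch; the component name mod_certificate, the file area certificatelogo, and the $options values are assumptions to adapt to your plugin:
$context = context_module::instance($cm->id);
$options = array('subdirs' => 0, 'maxbytes' => $maxbytes, 'maxfiles' => 1);

// Step 1: copy any existing files into the draft area before displaying the form.
$data = file_prepare_standard_filemanager($data, 'certificatelogo', $options,
    $context, 'mod_certificate', 'certificatelogo', 0);

// Step 3: on submission, copy files from the draft area to the 'real' area.
$data = file_postupdate_standard_filemanager($data, 'certificatelogo', $options,
    $context, 'mod_certificate', 'certificatelogo', 0);

// Steps 4-5: list the stored files and build a URL for each one.
$fs = get_file_storage();
$files = $fs->get_area_files($context->id, 'mod_certificate', 'certificatelogo',
    0, 'itemid, filepath, filename', false);
foreach ($files as $file) {
    $url = moodle_url::make_pluginfile_url($file->get_contextid(),
        $file->get_component(), $file->get_filearea(), $file->get_itemid(),
        $file->get_filepath(), $file->get_filename());
}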