I would like to save variables as .mat files on S3. The example on the official site shows "tall table" only. Maybe I could use the "system" command to step outside MATLAB, but I am looking for a straightforward solution.
Any suggestions?
It does look like save does not support saving to remote filesystems.
You can, however, write matrices, cells, tables and timetables.
An example which uses writetable:
LastName = {'Smith';'Johnson';'Williams';'Jones';'Brown'};
Age = [38;43;38;40;49];
T = table(Age,LastName)
writetable(T,'s3://.../table.txt')
Note:
To write to a remote location, filename must contain the full path of
the file specified as a uniform resource locator (URL) of the form:
scheme_name://path_to_file/my_file.ext
To obtain the right URL of the bucket, you can navigate to the contents of the S3 bucket, select a file in there, choose Copy path and remove the name of the file (e.g. table.txt).
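Note also that MATLAB reads the AWS credentials for remote locations from environment variables, so before writing you may need something like the following (the key values and region below are placeholders):
setenv('AWS_ACCESS_KEY_ID', 'YOUR_ACCESS_KEY_ID');
setenv('AWS_SECRET_ACCESS_KEY', 'YOUR_SECRET_ACCESS_KEY');
setenv('AWS_DEFAULT_REGION', 'us-east-1');   % region of the bucket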
The alternative is, as you mentioned, a system call:
a = rand(5);
save('matExample','a');
system('aws s3api put-object --bucket mybucket --key s3mat.mat --body matExample.mat')
The MAT-file matExample.mat is saved as s3mat.mat in the bucket.
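If you do this often, the two steps can be wrapped in a small helper. A minimal sketch, assuming the AWS CLI is installed and configured; save_to_s3 and its arguments are hypothetical names:
function save_to_s3(a, bucket, key)
    % Save the variable to a temporary MAT-file, then upload it via the AWS CLI.
    tmp = [tempname '.mat'];
    save(tmp, 'a');
    cmd = sprintf('aws s3api put-object --bucket %s --key %s --body %s', ...
                  bucket, key, tmp);
    status = system(cmd);
    assert(status == 0, 'upload to S3 failed');
    delete(tmp);   % remove the local copy
end
Calling save_to_s3(rand(5), 'mybucket', 's3mat.mat') reproduces the example above without leaving a local MAT-file behind.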
I am trying to create a new dataset in ADF that looks for csv files that meet a certain naming convention. These files are located within a series of different folders in my Azure Blob Storage.
For instance, in the sample directory below, I am trying to pull out csv files that contain the word "cars".
Folder A
fastcars.csv
fasttrucks.csv
Folder B
slowcars.csv
slowtrucks.csv
Ideally, I would end up with the files "slowcars.csv" and "fastcars.csv". I've seen examples out there where people were able to wildcard the file name. I have been playing around with that, but have had no luck.
Is what I am trying to do even possible? Would appreciate any advice you guys may have. Please let me know if I can provide further clarification.
According to the description of filename in this documentation,
The file name under the given fileSystem + folderPath. If you want to
use a wildcard to filter files, skip this setting and specify it in
activity source settings.
so you need to specify the wildcard in the activity's source settings, not in the dataset's file path.
An easy sample in a copy activity:
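For instance, a minimal sketch of the matching source settings in the copy-activity JSON; the *cars*.csv pattern and the DelimitedText/Blob type names are assumptions based on your example and the documentation quoted above:
"source": {
    "type": "DelimitedTextSource",
    "storeSettings": {
        "type": "AzureBlobStorageReadSettings",
        "recursive": true,
        "wildcardFolderPath": "*",
        "wildcardFileName": "*cars*.csv"
    }
}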
Hope this can help you.
I was assigned a MATLAB assignment where I was given 25000 pictures of cats and dogs, all stored in one folder. My question is: how can I use the imageDatastore function in MATLAB to store these files in a single variable containing two labels (cats and dogs)? Each image stored in the folder follows this naming format:
cat.1.png,
cat.2.png,
.....,
cat.N.png,
dog.1.png,
dog.2.png,
.....,
dog.N.png,
Ideally, I think labeling them based on the image name would probably be the best approach to this. However, I've tried doing this using various implementation methods but I keep failing. Any advice on this would be greatly appreciated!
The steps for both image data stores are the same:
Find all the image files with a matching name with dir.
Rebuild the full path to these files with fullfile.
Create the image data store with the files.
My code assumes that you are running the script in the same folder in which the images are located. Here is the code:
cats = dir('cat.*.png');
files_cats = fullfile({cats.folder}.', {cats.name}.');
imds_cats = imageDatastore(files_cats);
dogs = dir('dog.*.png');
files_dogs = fullfile({dogs.folder}.', {dogs.name}.');
imds_dogs = imageDatastore(files_dogs);
You could also use the short path:
imds_cats = imageDatastore('cat.*.png');
imds_dogs = imageDatastore('dog.*.png');
If you want to use a single image data store and split files into categories within it (without using folder names, since all your files seem to be located in the same directory):
cats = dir('cat.*.png');
cats_labs = repmat({'Cat'},numel(cats),1);
dogs = dir('dog.*.png');
dogs_labs = repmat({'Dog'},numel(dogs),1);
labs = [cats_labs; dogs_labs];
imds = imageDatastore({'cat.*.png' 'dog.*.png'},'Labels',labs);
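To double-check that the labels line up with the files, you can summarize the datastore:
countEachLabel(imds)   % table with one row per label and the number of files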
In Google Cloud Storage, if you upload more than one file with the same name, the last upload will overwrite whatever was uploaded before it.
If I want to upload more than one file with the same name, I should append something unique to the file name, e.g. a timestamp or a random UUID.
But by doing so I'll lose the original file name when downloading, because I want to serve the file directly from Google.
If we use the unique identifier as a folder instead of appending it to the file name, e.g. UUID + "/" + fileName, then we can download the file with its original name.
You could turn on Object Versioning, which will keep the old versions of the object around.
Alternatively, you can set the Content-Disposition header when uploading the object, which should preserve whatever filename you want on download.
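For instance, with the google-cloud-storage Python client you can set the disposition when you first upload the object. A minimal sketch following the UUID-as-folder idea from the question (the bucket name and paths are placeholders):
import uuid
from google.cloud import storage

def upload_with_original_name(bucket_name, local_path, original_name):
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    # unique key, so repeated uploads of the same filename never collide
    blob = bucket.blob(f"{uuid.uuid4()}/{original_name}")
    # served back with the original filename on download
    blob.content_disposition = f'attachment; filename="{original_name}"'
    blob.upload_from_filename(local_path)
    return blob.name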
Instead of using Object Versioning, you can attach the UUID (or any other unique identifier) and then update the metadata of the object (specifically the Content-Disposition). The following is part of a Python script I've used to remove the forward slashes, which Google Cloud buckets add to represent directories, from multiple objects. It's based on this blog post; please keep in mind the double quotes around the Content-Disposition "file name".
import pathlib

def update_blob_download_name(bucket_name):
    """Update the download name of blobs and remove the path.

    :returns: None
    :rtype: None
    """
    # Storage client, not added to the code for brevity
    client = initialize_google_storage_client()
    bucket = client.bucket(bucket_name)
    for blob in bucket.list_blobs():
        if "/" in blob.name:
            # rfind gives the last occurrence of the char
            remove_path = blob.name[blob.name.rfind("/") + 1:]
            ext = pathlib.Path(remove_path).suffix
            # assumes the name contains an "_id_" marker before the extension
            remove_id = remove_path[:remove_path.rfind("_id_")]
            new_name = remove_id + ext
            blob.content_disposition = f'attachment; filename="{new_name}"'
            blob.patch()
The 'pickAndStore' method allows me to specify the full path to the file, but I don't know its extension at this point (the file path has to be defined before the file is uploaded, so it's not possible to provide a path with the correct extension).
If I use 'pick' and then 'store' I end up with 2 files (because both methods upload the file to S3). I can delete the 'old' file, but that's not optimal and can be a pain (take ages) with really big files.
Is there any better solution? Ideally, renaming the existing file.
Currently, there is no workaround for renaming a file.
However, in our JavaScript API v2 we are planning to add a new callback function. The onStart callback will be fired after the user picks a file but before the file is uploaded. There could be an option to rename the file based on the original filename.
We will keep you updated.
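In the meantime, the pick-then-store route you describe can at least give the final path the right extension, since the filename is known once the pick completes. A rough sketch against the v1 JavaScript API (the target path is a placeholder, and filepicker.remove is assumed to be available to clean up the intermediate copy):
filepicker.pick(function (fpfile) {
    // the extension is only known after the pick
    var ext = fpfile.filename.substr(fpfile.filename.lastIndexOf('.'));
    filepicker.store(fpfile, { path: '/uploads/myfile' + ext }, function (stored) {
        filepicker.remove(fpfile);   // delete the intermediate copy
    });
});
As you note, this still uploads the content twice, so it remains a stopgap for very large files.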
I'm evaluating the InkFilePicker service. How do I make sure that uploading a new file to my S3 bucket won't overwrite an existing file with an identical name already in that bucket?
I'm currently using another third party upload solution that allows me to rename a file with a GUID as its file name to prevent such accidental overwrite situations.
How do I rename files using InkFilePicker? Or what is the right approach with InkFilePicker to prevent unintended overwrites?
Thanks,
Sam
It looks like InkFilePicker prepends a unique key to the file name during upload.
myfile.pdf becomes something like DNjimbeSQWVrcd0Uv8lJ_myfile.pdf when it's saved on Amazon S3, so it inherently prevents overwrites.