Managing Versions of Images on Amazon S3 and MongoDB

I have an app that will use Amazon S3 to host images in several different sizes. I would like to understand the best way to do this.
When a user uploads an image, I create 3 different sizes that are required in different parts of the site or the mobile app; in total I have 4 files of the same image.
Questions:
How should I store them in Amazon S3? Any ideas on how to name these files to make this easier?
Do I need to store the file names of all 4 files in MongoDB?

You don't need to keep all the file names in the db; just keep a parent folder name.
Let's say an image uploaded by a user has the id 1234567 (for the sake of uniqueness; you could also use a timestamp). Create a folder named 1234567 and store all the variants in it under fixed names like original, thumbnail, medium, and large. Whenever you need a specific one, just stream it.
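For illustration, here is a minimal sketch of that layout using the AWS SDK for Java v1 (on S3 the "folder" is just a key prefix); the bucket name, the local file paths, and the uploadRendition helper are assumptions:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.io.File;

public class ImageUploader {
    private static final String BUCKET = "my-image-bucket"; // hypothetical bucket name

    // Store every rendition of one image under a single id-based prefix:
    // 1234567/original.jpg, 1234567/thumbnail.jpg, 1234567/medium.jpg, 1234567/large.jpg
    static void uploadRendition(AmazonS3 s3, String imageId, String variant, File file) {
        s3.putObject(BUCKET, imageId + "/" + variant + ".jpg", file);
    }

    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String imageId = "1234567";
        uploadRendition(s3, imageId, "original", new File("original.jpg"));
        uploadRendition(s3, imageId, "thumbnail", new File("thumb.jpg"));
        uploadRendition(s3, imageId, "medium", new File("medium.jpg"));
        uploadRendition(s3, imageId, "large", new File("large.jpg"));
        // In MongoDB you now only need to persist imageId; each variant's key
        // can be derived as imageId + "/" + variant + ".jpg".
    }
}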

You could simply append a short string, either the resolution or a usage identifier like 'thumbnail', e.g.
foobar.jpg -> 2342342_thumb.jpg
           -> 2342342_gallery.jpg
           -> 2342342_full.jpg
That way, you won't need to store the names of the four files; you can just follow the convention.
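A minimal sketch of that convention (the class and method names, and the fixed .jpg extension, are illustrative):

public class ImageKeys {
    // Derive each variant's key from the one stored id; no per-variant
    // file names need to be kept in the database.
    static String keyFor(String imageId, String variant) {
        return imageId + "_" + variant + ".jpg";
    }

    public static void main(String[] args) {
        System.out.println(keyFor("2342342", "thumb"));   // 2342342_thumb.jpg
        System.out.println(keyFor("2342342", "gallery")); // 2342342_gallery.jpg
        System.out.println(keyFor("2342342", "full"));    // 2342342_full.jpg
    }
}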

Related

Nextcloud - mass removal of collaborative tags from files

Due to an oversight in a flow routine that was meant to tag certain folders on upload into the cloud, a huge number of unwanted files were also tagged in the process. Now there are thousands upon thousands of files that have the wrong tag and need to be untagged. Neither doing this by hand nor reuploading with the corrected flow routine is really a workable option. Is there a way to do the following:
Crawl through every entry in a folder
If it's a file, untag it; if it's a folder, don't
Everything I found about tags and Nextcloud dealt with handling them when files were uploaded, never with going over existing files and their tags.
Is this possible?
Nextcloud stores this data in the configured database, so you can simply remove the assignments from the db.
The assignments are stored in oc_systemtag_object_mapping, while the tags themselves are in oc_systemtag. Once you have found the ID of the tag to remove (let's say 4), you can remove all of its assignments from the db:
DELETE FROM oc_systemtag_object_mapping WHERE systemtagid = 4;
If you would like to do this only for a specific folder, it doesn't get much more complicated. Files (including their folder structure!) are stored in oc_filecache, and oc_systemtag_object_mapping.objectid references oc_filecache.fileid. So with some joining and LIKEing, you can limit the rows to delete, as in the sketch below. If the tag is also used for non-file objects, your condition should include oc_systemtag_object_mapping.objecttype = 'files'.
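As an illustration, a sketch of that scoped delete over JDBC, assuming a MySQL-backed Nextcloud; the connection details and the folder path are placeholders, and since oc_systemtag_object_mapping.objectid is stored as a string in some schemas, the join condition may need a CAST:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class UntagFolder {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details for the Nextcloud database.
        try (Connection db = DriverManager.getConnection(
                "jdbc:mysql://localhost/nextcloud", "oc_admin", "secret")) {
            // MySQL multi-table DELETE: remove only mappings whose file
            // lives below the given folder in oc_filecache.
            String sql =
                "DELETE m FROM oc_systemtag_object_mapping m " +
                "JOIN oc_filecache f ON f.fileid = m.objectid " +
                "WHERE m.systemtagid = ? " +
                "  AND m.objecttype = 'files' " +
                "  AND f.path LIKE ?";
            try (PreparedStatement st = db.prepareStatement(sql)) {
                st.setInt(1, 4);                     // the tag ID to remove
                st.setString(2, "files/MyFolder/%"); // placeholder folder path
                int removed = st.executeUpdate();
                System.out.println("Removed " + removed + " tag assignments");
            }
        }
    }
}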

Reading file from Google Drive with Talend

I need to read an uploaded file from Google Drive and perform X transformation on it. From what I have read, the only way to do this is to download the file to my local machine with the Talend component and then read it from there.
If that is correct, I cannot figure out what the file name would be, given that I don't want to use the exact name of the file.
I found http://meowbi.com/2018/02/23/getting-google-sheet-gdrive-talend/ and it is exactly what I need: read from Google Drive, check the file name, and proceed if the file name is X. What is unclear to me is what they used in tJava.
The output schema of the tGoogleDriveList component's Main row contains a name field, which holds the file name you're looking for. Using an Iterate link is less straightforward, as you need to extract the values from the GlobalMap. In the article you cited, they get the file name via the "tGoogleDriveList_1_TITLE" key of the GlobalMap; see the sketch below.
(Screenshot: a Main row connection between tGoogleDriveList and tJava.)
For more details, please look into the Talend Reference for the Google Drive components. The "Listing files and folders in Google Drive" section should be particularly relevant to your case.
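For reference, this is roughly the code involved; the component name tGoogleDriveList_1 and the expected file name are placeholders, and the GlobalMap key is the one used in the article you cited:

// Inside a tJava reached via an Iterate link from tGoogleDriveList_1:
String fileName = (String) globalMap.get("tGoogleDriveList_1_TITLE");
if ("my-expected-file.xlsx".equals(fileName)) { // placeholder file name
    System.out.println("Found it: " + fileName);
}
// With a Main row into a row-level component such as tJavaRow, the same
// value is simply the schema field: String fileName = input_row.name;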

How to call a SAS dataset by its label, or where to check its name

I have a problem with SAS Enterprise Guide running on my client's server.
I do not have access to the libraries, so the only way we can use the datasets is to store them on the computer's local C: drive and drag them into SAS.
We cannot create libraries because the server does not read local paths.
Once you drag a table, let's call it "mydata", into SAS, the table is automatically renamed to something like "mydata9865", with random numbers at the end, and "mydata" becomes its label.
If you right-click the table and go to properties, you can't find the name of the table, just the label.
The only way I found to check the real name of the dataset is to open the Query Builder and check the name in the code preview.
The problem is that I am dealing with tables of millions of records and the machine I am using is very slow, so whenever I want to open the Query Builder just to check a table's name, it sometimes takes as long as an hour.
I am not a SAS expert, so I am sure there is a smarter way to do this. Is it possible, for instance, to use the table by calling it by its label?
data mydata2;
set mydata;
run;
instead of
set mydata9865?
Or is there some place I can rapidly check the name of the table without going through the query builder?
I tried to Google it but couldn't find anything; I hope someone will be able to help me!
Thank you in advance
Hover the mouse pointer over a data node to see its attributes. The data set name is the File name: value.
For example:
In this example I had renamed the nodes created by two different queries to be the same (doable: yes; smart: maybe not). NOTE: a data node's Label: is not necessarily the same as its underlying data set's label metadata.
Regarding
use the table by calling it with its label?
Two nodes can have the same label, which is a situation that defeats this approach.
Use the COPY task to upload your data explicitly. It sounds like you're not adding your data to the project properly, so SAS automatically assigns a name; that wouldn't happen if you explicitly imported or loaded your data.
Problem solved! I should have simply uploaded the data to the server with Tasks->Data->Upload Data Sets to Server, but I didn't know about this task, so I didn't know it was possible at all!
https://communities.sas.com/t5/SAS-Enterprise-Guide/Importing-sas-data-sets-from-C-drive-into-SAS-EG-not-possible/td-p/135184
Thank you everybody for your help!

Deleting Folders From Azure Storage Containers or File Shares That are Older than X Days

I am using Azure Storage accounts and trying to write a PowerShell script to delete folders that exist in a container (I know a container is just a two-level hierarchy over blobs and that folders do not actually exist per se).
Apart from not being able to check a folder's date/time properties, the only property I could find on the blobs themselves is "last modified", which is generally OK for our purpose, although a creation property would be better.
As I understand it, the only solution for this is to create a table listing each file and its creation date and time? That seems like a lot of work for this matter.
I can enumerate a file from that folder, as they are all copied together, and then delete all blobs sharing the root "folder", but I would prefer to know the actual last modified time of the folder itself rather than that of the files in it. Is there any way to achieve this? I am not locked into using Azure storage containers; file shares are also possible, but when I tried them, enumerating the folders was possible, yet the modified date and time property was simply not filled for some reason, and that is the only property there aside from "ETag".
Thanks in advance.
As far as I know, allowing users to define expiration policies on blobs natively in storage is still only planned; you can find it in this Azure Storage feedback item.
If you'd like to delete "expired" folders/files with a PowerShell script, you can try including path information with a datetime in the blob names (such as 2017/10/test.txt); you can then list and traverse the blobs, compare the datetime part of each blob name with the current datetime, and delete the blob if it is older than x days.
Besides, if you do not want to include datetime path information in blob names, you can store the creation datetime in the blob's properties or metadata, retrieve it from there, and compare it with the current datetime to decide whether to delete the blob; see the sketch below.
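A sketch of that second approach with the legacy Azure Storage SDK for Java; the connection string, the container name, and the createdon metadata key (which you would have to set at upload time) are all assumptions:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.CloudBlob;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.ListBlobItem;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class DeleteExpiredBlobs {
    public static void main(String[] args) throws Exception {
        CloudStorageAccount account = CloudStorageAccount.parse(
                System.getenv("AZURE_STORAGE_CONNECTION_STRING"));
        CloudBlobContainer container =
                account.createCloudBlobClient().getContainerReference("mycontainer"); // placeholder
        Instant cutoff = Instant.now().minus(30, ChronoUnit.DAYS); // "older than X days"

        // A flat listing walks the whole virtual folder hierarchy.
        for (ListBlobItem item : container.listBlobs(null, true)) {
            if (!(item instanceof CloudBlob)) continue;
            CloudBlob blob = (CloudBlob) item;
            blob.downloadAttributes(); // populate metadata
            String created = blob.getMetadata().get("createdon"); // hypothetical metadata key
            Instant stamp = created != null
                    ? Instant.parse(created)                              // creation time, if recorded
                    : blob.getProperties().getLastModified().toInstant(); // fall back to last modified
            if (stamp.isBefore(cutoff)) {
                blob.deleteIfExists();
            }
        }
    }
}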

Exporting a log from iPhone application

One of the features in my application is a log to which a user can add entries. I want to make it possible for the user to export this data; however, I do not know which format I should use for this. The data looks like this:
A date, a distance, a duration, and at most four category names. I want to make it possible to send the export by mail or to open it with Dropbox via the URL scheme if the user has Dropbox.
I have read about the CSV format, but I don't know if that is a good file format for this. My main concern is that the user does not have a fixed number of categories (it could be anywhere from 1 to 4).
Seeing as the total set of columns to be exported is dynamic, it will depend on what the user selects, and there's nothing wrong with this.
I think .csv is fine for this purpose as well - but you need to ask yourself... what will the user be doing with the data? You could either offer multiple file export formats or whatever is the best-for-purpose format, depending on what your average user will do with it.
CSV (comma-separated values) is simple and adds very little overhead (just the commas), but it is not terribly flexible. It is good for importing into Microsoft Excel, for instance.
You should consider using XML (the same underlying format used for plists), which is a very flexible (and future-proof, should you wish to add additional columns later) and well-supported format.
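As an illustration of handling the variable 1-4 categories in CSV, you can simply pad to a fixed four category columns. A minimal sketch; the LogEntry fields are assumptions, and category names are assumed not to contain commas (no quoting/escaping is done here):

import java.util.List;

public class CsvExport {
    // Hypothetical log entry: date, distance, duration, and 1-4 category names.
    record LogEntry(String date, double distance, int durationSeconds, List<String> categories) {}

    static String toCsv(List<LogEntry> entries) {
        StringBuilder sb = new StringBuilder("date,distance,duration,cat1,cat2,cat3,cat4\n");
        for (LogEntry e : entries) {
            sb.append(e.date()).append(',').append(e.distance()).append(',').append(e.durationSeconds());
            for (int i = 0; i < 4; i++) { // always emit four category columns, blank when unused
                sb.append(',').append(i < e.categories().size() ? e.categories().get(i) : "");
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toCsv(List.of(
                new LogEntry("2013-05-01", 5.2, 1800, List.of("run", "outdoor")))));
    }
}

With a fixed header, every consumer (Excel, a re-import, another app) sees a stable schema regardless of how many categories a given entry actually has.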