Programmatically add/associate metadata to any file

I'm building a tool that will export files from a learning management system and store those files somewhere like S3. I need to keep track of certain pieces of information about those files that relate to the system they're coming from (e.g. course name, academic term, student's name/id). The file types will vary drastically, so I don't want to make accommodations to store the metadata based on the filetype (like pdf or doc).
What's the best way to associate filetype-agnostic metadata with a file? I'm toying with the idea of generating a JSON file for every file I bring over, but it would be great if I could attach the metadata directly to the file itself.
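For reference, here is a minimal sketch of both ideas against S3, assuming boto3; the bucket name, object keys, and metadata values are placeholders, not details from the question:

```python
import json
import boto3

s3 = boto3.client("s3")

# Example filetype-agnostic metadata (placeholder values)
metadata = {
    "course-name": "Biology 101",
    "academic-term": "2024-spring",
    "student-id": "12345",
}

# Option 1: attach it as S3 object metadata (sent as x-amz-meta-* headers)
s3.upload_file(
    "essay.pdf", "my-lms-exports", "exports/essay.pdf",
    ExtraArgs={"Metadata": metadata},
)

# Option 2: the JSON "sidecar" idea -- one small .json object next to each file
s3.put_object(
    Bucket="my-lms-exports",
    Key="exports/essay.pdf.metadata.json",
    Body=json.dumps(metadata).encode("utf-8"),
    ContentType="application/json",
)
```

Note that S3 caps user-defined object metadata at roughly 2 KB per object, so the JSON sidecar (or a small database table keyed by the object key) is the safer choice if the metadata can grow.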

Related

Akeneo 2.1: Import / export best practices to set it all up

I'm currently setting up an Akeneo (2.1) instance that needs to communicate with an e-commerce solution. I was wondering what the best practices are when it comes to importing and exporting data. The documentation is somewhat lacking here; it explains how to set things up, but I'm missing practical use cases.
Here's what I'm thinking of:
I want our customer to be able to upload their images / CSV files using an FTP connection.
Akeneo should ideally only start importing when a change in this (FTP) destination folder is detected.
Exporting should only be done once or twice a day, and upon completion the archive should be transferred via (s)FTP to a different location.
I'm currently having trouble figuring out how to implement this flow in Akeneo, because if I look at what comes out of the box I can only come up with the following:
I can set up an FTP account that ends up in `app/uploads/product/` and allow the customer to upload to that location.
Akeneo does not detect file system changes, so I can only set up a cronjob that tries to import every hour or so. The drawback of this approach is that Akeneo will copy the CSV file(s) every time to `app/archive/import`. If you have big CSV files this can lead to a noticeable increase in disk usage.
I can set up a cronjob to export twice a day, but again: Akeneo will create archives on every export, so `app/archive/export` will grow even bigger every day. Please note that my customer has 4 GB+ of assets (images, documents, etc.). Does Akeneo clean up the `app/archive` folder every now and then?
Every exported archive ends up in a new folder (with an ever-incrementing job number, `app/archive/export/csv_product_export/28/` for example), so I'm wondering how I can detect this new folder and how I can trigger the upload of the archive to the remote (S)FTP server after the export is complete.
I was just wondering how other people who work with Akeneo have handled these challenges. I know I can write my own custom bundle and hook into a ton of events or write shell scripts that do a lot of the magic for me, but I am wondering what Akeneo itself already offers regarding this subject.
Any thoughts / ideas / suggestions / experiences on this topic are welcome!
To answer your questions:
Akeneo doesn't need to have the CSV uploaded to the `app/uploads/product/` folder. You can define the CSV location in the import profile, so you can use whatever location you want.
To import images, you need to zip them together with the CSV file (to see what the structure of the archive should look like, you can export some products with media on demo.akeneo.com).
Setting up a cronjob seems to be a good idea. If the disk usage is a problem, this cronjob could also clean the folder after the import.
To export twice a day, you can use the export builder to only export products that have been updated since the last export (delta export). That way, you don't use space unnecessarily.
Again, the `app/archive/export/csv_product_export/28/` path is only for internal use. This is a working directory used by Akeneo during the export (before zipping, for example), and the final file (CSV or zip) is moved to the destination defined in the job configuration.
With all that information, here is my recommendation:
Write a simple bash/php script to detect changes in a folder and, if there are any, move the files to another location and launch the import (see the sketch after this list).
If you want to handle images, you can add to your script a way to generate the zip file in the correct format.
Then, to export to your e-commerce solution, set up a cronjob to export every hour, exporting only new or updated products to the desired destination.
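For illustration, here is a minimal Python sketch of the watch-and-import idea (the recommendation above suggests bash/php; the paths, job code, and console command are assumptions to adapt to your own installation):

```python
import shutil
import subprocess
import time
from pathlib import Path

DROP_DIR = Path("/data/ftp/incoming")                    # where the customer uploads via FTP (assumed path)
IMPORT_FILE = Path("/data/akeneo/import/products.csv")   # file path configured in the import profile (assumed)
AKENEO_ROOT = Path("/var/www/akeneo")                    # assumed Akeneo installation directory
JOB_CODE = "csv_product_import"                          # hypothetical import job code

def run_once():
    for csv_file in sorted(DROP_DIR.glob("*.csv")):
        # Move the file out of the FTP folder to the location the import profile expects
        shutil.move(str(csv_file), str(IMPORT_FILE))
        # Launch the import job from the command line
        subprocess.run(
            ["php", "bin/console", "akeneo:batch:job", JOB_CODE],
            cwd=str(AKENEO_ROOT),
            check=True,
        )
        # Optional cleanup to keep disk usage down, as suggested above
        IMPORT_FILE.unlink()

if __name__ == "__main__":
    while True:            # or drop the loop and run this script from cron
        run_once()
        time.sleep(60)
```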
Another way would be to use the new REST API which is well documented here: https://api.akeneo.com/

How to Connect CloverETL to Google Cloud Storage?

I am using CloverETL Designer for ETL operations and I want to load some CSV files from GCS into my Clover graph. I used FlatFileReader and tried to get the file using a remote File URL, but it is not working. Can someone please detail the entire process here?
The path for file in GCS is
https://storage.cloud.google.com/PATH/Write_to_a_file.csv
And I need to get this csv file into the FlatFileReader in CloverETL Designer
You should use the Google Cloud Storage API to GET the file; Clover's HTTPConnector component will allow you to pass in the appropriate parameters to make a GET request (you will presumably have to do an OAuth2 authentication first to get a token), and send the output to a local destination specified in "Output File URL." Then you can use a FlatFileReader to read from that local file.
GCS has several different ways to download files from your buckets. You can use the console and the Cloud Storage browser. Steps: open the storage browser, navigate to the object you want to download, right-click, and save to your chosen local folder. If you use Chrome, the save option appears as “Save Link As…”.
To use the gsutil command-line tool, use this command:
`gsutil cp gs://[BucketName]/[ObjectName] [ObjectDestination]`.
Or you can use the client libraries or the REST APIs to download files. With these last options you can work with a number of files or create a job to download them (see the sketch below for the client-library route). Once they are in a location known to CloverETL, the process is straightforward.
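For example, here is a minimal sketch of the client-library route using the Python google-cloud-storage package (bucket name, credentials setup, and the local destination path are assumptions):

```python
from google.cloud import storage

# Uses the credentials pointed to by GOOGLE_APPLICATION_CREDENTIALS
client = storage.Client()
bucket = client.bucket("my-bucket")                  # placeholder bucket name
blob = bucket.blob("PATH/Write_to_a_file.csv")       # object name from the question
blob.download_to_filename("/data/clover/Write_to_a_file.csv")  # local path CloverETL can read
```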
Within CloverETL Designer, in the navigation pane you can right-click a folder and choose Import; pick the folder in which you placed your GCS file. Once the file is imported, you can use its data like any other data file in Clover. Since this is a .csv file, remember to edit your metadata (right-click the component, choose Extract Metadata, then edit data types, labels, and such inside the Metadata Editor). Assign metadata to the edges of your components so they know what is coming in and going out of each step. Depending on your file, this process may be repeated many times.
Even with an ETL tool, getting the data and data types correct can be tricky. If you have questions about how to configure data types or your edges in an ETL project, a wiki may help, and the web has additional resources that may help you get to the end analysis you're looking for.

FineUploader - get metadata during upload and store it

I am using FineUploader and it works just fine for my application's needs. The files end up where I need them, etc.
However, once a file is uploaded, I really don't know what the file actually is. For instance, the file could be a resume, a cover letter, a release of information, etc. For that I will need to "attach" additional metadata that relates to that file.
Is there a way to add some sort of a select box, where the user will select a category (before the upload process begins) so that the file can be identified by that category? The filename, size and other information gathered during the upload process are already stored in the database.
Any pointers are more than appreciated.
Edit: Duplicate of Submit multiple form fields for each file in FineUploader

How do I edit files in place that were uploaded to Moodle?

I would like a better workflow for debugging uploaded SCOs. As things are, I must edit a file in the activity, repackage, upload, and test. Often, I just need to change a single line of code. It would be VERY nice to be able to edit that file, that line of code, on the server. So far, all I've found is that Moodle manages the files, so it seems impractical to locate and decipher the renamed files after upload.
Is there a way to configure Moodle so that it doesn't rename and relocate files in SCOs upon extraction? Actually, I'm open to any suggestions on the best, fastest workflow for debugging SCOs.
Problem background
Since Moodle 2.0, files are no longer stored on the server in the conventional /this/is/the/path/to/my.file way. Instead, files are rehashed and stored in Repositories (i.e., spread all over the moodledata folder as a collection of seemingly random data). This increases security and cross-OS compatibility, but complicates things for people who would like to simply upload a SCORM zip package via FTP. Here's more information on file handling in Moodle 2.0.
Path to the solution
Let's locate the file you want to update, then update it.
Run phpMyAdmin, go to the mdl_files table, and find your file by name in the filename field (let's say it's portrait.jpg).
Look at the contenthash field; it'll look something like abcde1234567890. This means your file is stored in the moodledata/filedir/ab/cd/ folder under the name abcde1234567890.
Rename the updated portrait.jpg to abcde1234567890, then upload and overwrite the existing file.
Go back to phpMyAdmin and update the filesize field in the record for portrait.jpg with the size of the updated file.
Obviously, this process can be automated. You'll have to write a script that allows you to upload a file, searches for that file in mdl_files, saves it to the correct folder, and updates all fields accordingly (a minimal sketch follows below).
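For illustration, here is a minimal Python sketch of that automation (database credentials, the moodledata path, and the mdl_ prefix are assumptions; like the manual steps above, it keeps the old contenthash-based filename and only refreshes the content and filesize):

```python
from pathlib import Path
import shutil
import pymysql

MOODLEDATA = Path("/var/moodledata")   # assumed moodledata location

def replace_moodle_file(filename, new_file):
    conn = pymysql.connect(host="localhost", user="moodle",
                           password="secret", database="moodle")  # placeholder credentials
    try:
        with conn.cursor() as cur:
            # Find the file's record by name, as in the manual steps
            cur.execute(
                "SELECT id, contenthash FROM mdl_files WHERE filename = %s",
                (filename,),
            )
            file_id, contenthash = cur.fetchone()

            # moodledata/filedir/ab/cd/<full contenthash>
            target = MOODLEDATA / "filedir" / contenthash[0:2] / contenthash[2:4] / contenthash
            shutil.copyfile(new_file, target)

            # Update the filesize field for the record
            cur.execute(
                "UPDATE mdl_files SET filesize = %s WHERE id = %s",
                (Path(new_file).stat().st_size, file_id),
            )
        conn.commit()
    finally:
        conn.close()

replace_moodle_file("portrait.jpg", "/tmp/portrait.jpg")
```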
Alternative idea
Enable the external package type (and also enable 'Update on every launch'). Go to Site administration / Plugins / Activities / SCORM and check the box down below. Now you'll be able to launch SCORM packages directly from another server, so Moodle won't mess with them. Of course, you can run into other (probably cross-domain-related) problems.
Sergey's answer is very good, with one caveat:
In his example with the contenthash of abcde1234567890, the file is stored in the moodledata/filedir/ab/cd/ folder under the name abcde1234567890. Moodle uses the full contenthash to name the file.

Own data format for the iPhone

I would like to create my own data format for an iPhone app. The files should be structured similarly to, e.g., Apple's iWork files (.pages). That means I have a folder with some files in it:
The file 'Juicy.fruit' contains:
Fruits
---> Apple.xml
---> Banana.xml
---> Pear.xml
---> PreviewPicture.png
This folder "Fruits" should be packed in a handy file 'Juicy.fruit'. Compression isn't necessary. How could I achieve this? I've discovered some open source ZIP-libraries. However, I would like to to build my own data format with the iPhones built-in libs (if possible).
Best regards,
Stefan
Okay, so there are three ways I can read your question; here's my best guess for each one:
You want your .fruit files to be associated with your app via Safari/SMS/some network connection (aka when someone wants to download files made for your app or made by your app).
In this case, you can register a protocol for your app, as discussed here:
iPhone file extension app association
You want the iPhone to globally associate .fruit files with your app, in which case you want to look into Uniform Type Identifiers. Basically, you set up this association in your app's Info.plist file.
You want to know how you can go from having a folder with files in it to that folder being a single file (package) with your .fruit extension.
If that's the case, there are many options out there and I don't see a purpose in rolling your own. Both Microsoft and Adobe simply use a standard zip compression method with their own extension (instead of .zip). If you drop any Office 2007 document, such as .docx, or Adobe's experimental .pdfxml file into an archive utility (I like 7z, but any decent one will do), you will get a folder with several XML files, just like you're describing for your situation. (This is also how Java's .jar file type works, fyi.) So unless you have a great reason to avoid standard compression methods (I vote gzip), I would follow the industry lead on this one.
I can definitely appreciate the urge to go DIY at every level possible, but you're basically asking (if it's #3) how you can create your own packaging algorithm, and after reading how some of the most basic compression methods work, I would leave that one alone. Plus I really doubt that Apple has built-in libraries for doing something that most people will just use standard methods for.
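For what it's worth, here is a language-agnostic sketch of that approach, written in Python purely to illustrate the idea (on the iPhone you'd do the equivalent with one of the ZIP libraries you found): pack the folder into a standard zip archive that simply carries the custom .fruit extension, and read a single member back without unpacking everything.

```python
import zipfile
from pathlib import Path

def pack(folder, package):
    # Write every file in the folder into an archive named e.g. Juicy.fruit
    # (ZIP_STORED, since compression isn't needed per the question)
    with zipfile.ZipFile(package, "w", zipfile.ZIP_STORED) as zf:
        for path in Path(folder).rglob("*"):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(folder))

def read_member(package, name):
    # Pull a single file out of the package without extracting the rest
    with zipfile.ZipFile(package) as zf:
        return zf.read(name)

pack("Fruits", "Juicy.fruit")
apple_xml = read_member("Juicy.fruit", "Apple.xml")
```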
One last note:
If you are really gunning to do it from scratch (I still suggest not), since your files are all XML, you could just create a new XML file that acts as a wrapper of sorts, and have each file go into that wrapper file. But this would be really redundant when it came time to unwrap, as the whole file would have to be loaded every time. It would be something like:
Juicy.fruit --
<fruit-wrapper>
<fruit>
<apple>
... content from apple.xml
</apple>
</fruit>
<fruit>
<banana>
... content from banana.xml
</banana>
</fruit>
<fruit>
<pear>
... content from pear.xml
</pear>
</fruit>
<picture>
...URL-encoded binary of preview picture
</picture>
</fruit-wrapper>
But with this idea, you either choose to unpack it, and thus risk losing track of the files, overwriting some but not all, and so on, or you always treat it like one big file, in which case, unlike with archives, you have to load all of the data each time to pull anything out, instead of just pulling the file you want from the archive.
But it could work, if you're determined.
Also, if you are interested, there is a transfer protocol intended specifically for XML over mobile called WBXML (WAP Binary XML). Not sure if it is still taken seriously, but if there is an iPhone library for it, you should research it.