Choosing right tool - data-integration

I have following need:
1) Users will upload .xls or .csv files in "uploads" folder.
2) "uploads" folder have to be constantly monitored, and with each new file added to him, a job has to be started.
3) Job will process data from .xls or .csv file so they meet DB table structure, and write this data into DB table.
This have to be automated process, and I'm looking for all-in-one solution tool.

You didn't tell on which operating system, and you didn't tell if the user upload the files on a different server, or not. If the upload goes thru a web application (using an HTTP POST request), it is also different.
And I'm not sure that your wish scales well with many users.

You should take a look at Pentaho Data Integration, a.k.a. Kettle: http://sourceforge.net/projects/pentaho/
With Kettle you can desing a Job that pools the upload directory and once a file is found makes all the needed transformation and input on the desired database table.

Related

Akeneo 2.1 : Import / export best practices to set it all up

I'm currently setting up an Akeneo (2.1) instance that needs to communicate with an e-commerce solution. I was wondering what the best practices are when it comes to importing and exporting data. The documentation kind of lacks in this; it tells how you can set it up, but I'm missing practical use cases here.
Here's what I'm thinking of:
I want our customer to be able to upload their images / CSV files using an FTP connection.
Akeneo should ideally only start importing when a mutation in this (FTP) destination folder is detected.
Exporting should only be done once or twice a day, and upon completing the archive should be transfered with (s)FTP to a different location
I'm currently having trouble on how to implement this flow in Akeneo. Because if I look at what comes out of the box I can come up with the following:
I can setup an FTP account that ends up in `app/uploads/product/` and allow the customer to upload to that location
Akeneo does not detect file system changes, so I can only setup a cronjob that tries to import every hour or something. The drawback of this approach is that Akeneo will copy the CSV file(s) every time to `app/archive/import`. If you have big CSV files this can cause for some increment in disk usage.
I can setup a cronjob to export twice a day, but again: Akeneo will create archives on every export, so `app/archive/export` will grow even bigger every day. Please note that my customer has 4GB+ assets (images, documents, etc.). Does Akeneo cleanup the `app/archive`-folder every now and then?
Every exported archive comes in a new folder (with an every incremented job number (`app/archive/export/csv_product_export/28/` for example)), so I'm kind of wondering how I can detect for this new folder and how I can trigger to uploading of the archive to the remote (S)FTP server after the export is complete.
I was just wondering how other people who work with Akeneo handled these challenges. I know I can write my own custom bundle and hook into a ton of events or write shell scripts that do lot of the magic for me, but I am wondering what Akeneo itself already offers regarding this subject.
Any thoughts / ideas / suggestions / experiences on this topic are welcome!
To answer your questions:
Akeneo doesn't need to have csv uploaded in the app/uploads/product/
folder. You can define the csv location in the import profile. That way you can use whatever location you want.
To import images, you need to zip them with the csv file (to see how the structure of the archive should look like, you can export some products with media on demo.akeneo.com)
Setting up a cronjob seems to be a good idea. If the disk usage is a problem, this cronjob could also clean the folder after the import.
To export twice a day, you can use the export builder to only export products that have been updated since last export (delta export). That way, you don't use too much space for nothing.
Again, the app/archive/export/csv_product_export/28/ path is only for internal use. This is a working directory used by Akeneo during the export (before zip for example) and the final file (csv or zip) is moved to the defined destination (in job configuration).
With all those informations, here is my recommendation:
Write a simple bash/php script to detect change in a folder and if there is one, move the file to another location and launch the import.
If you want to handle images, you can add to your script a way to generate the zip file with the good format
Then to export to your ecommerce, setup a cronjob to export every hours and export only new or updated products to the desired destination.
Another way would be to use the new REST API which is well documented here: https://api.akeneo.com/

How to Connect CloverETL to Google Cloud Storage?

I am using CloverETL Designer for ETL operations and I want to load some csv files from GCS to my Clover graph. I used FlatFileReader and tried to get file using remote File URL but it is not working. Can someone please detail the entire process here??
The path for file in GCS is
https://storage.cloud.google.com/PATH/Write_to_a_file.csv
And I need to get this csv file into the FlatFileReader in CloverETL Designer
You should use the Google Cloud Storage API to GET the file; Clover's HTTPConnector component will allow you to pass in the appropriate parameters to make a GET request (you will presumably have to do an OAuth2 authentication first to get a token), and send the output to a local destination specified in "Output File URL." Then you can use a FlatFileReader to read from that local file.
GCS has several different ways to download files from your buckets. You can use the console and the Cloud Storage browser. Steps: open the storage browser, navigate to the object you want to download, right click, and save to your chosen local folder. If you use Chrome the save appears as “Save Link As…”.
To use the GS Utility, use this command:
`gsutil cp gs://[BucketName]/[ObjectName] [ObjectDestination]`.
Or you can use client libraries or the REST APIs to download files. With these last options you could work with a number of files or create a job to download them. Once they are in a location known to Clover ETL the process is straightforward.
Within Clover designer, under the navigation pane you can right click a folder and choose import. Pick the one in which you placed your GCS file. Once the file is imported then you can use data from it like any other datafile in Clover. Since this is a .csv file, remember to edit your metadata (right click the component, choose extract metadata then edit inside the Metadata Editor -- for data types, labels and such.) Assign metadata to the edges of your components so they know what is coming in/going out of that step. Depending on your file, this process may be repeated many times.
Even with an ETL tool, getting the data and data types correct can be tricky. If you have questions about how to configure data types or your edges in an ETL project, a wiki may help. The web has additional resources may help you get the end analysis you’re looking for.

How do I edit files in place that were uploaded to Moodle?

I would like a better workflow for debugging uploaded SCOs. As things are, I must edit a file in the activity, repackage, upload, and test. Often, I just need to change a single line of code. It would be VERY nice to be able to edit that file, that line of code, on the server. So far, all I've found is that Moodle manages the files, so it seems impractical to locate and decipher the renamed files after upload.
Is there a way to configure Moodle so that it doesn't rename and relocated files in SCOs upon extraction? Actually, I'm open to any suggestions on the best, fastest workflow for debugging SCOs.
Problem background
Since Moodle 2.0, files are no longer stored on server in the conventional /this/is/the/path/to/my.file way. Instead, files are rehashed and stored in Repositories (i.e. spread all over the moodledata folder as a collection of seemingly random data). This increases security and cross-OS compatibility but complicates stuff for people who would like to simply upload a SCORM zip package via FTP. Here's more information on file handling in Moodle 2.0
Path to the soluton
Let's locate the file you want to update, then update it.
Run phpmyadmin, go to mdl_files table, find your file by name in the filename field (let's say it's portrait.jpg)
Look at the contenthash field, it'll look like abcde1234567890. This means your file is stored in moodledata/filedir/ab/cd/ folder under the name abcde1234567890.
Rename the updated portrait.jpg to abcde1234567890, upload and overwrite.
Go back to phpmyadmin and update the filesize field in record for portrait.jpg with the size of the updated file.
Obviously, this process can be automated. You'll have to write a script that allows you to upload a file, then it'll search for that file in mdl_files, save it to the correct folder and update all fields accordingly.
Alternative idea
Enable external package type (and also enable 'Update on every launch'). Go to Site administration / Plugins / Activities / SCORM and check the box down below. Now you'll be able to launch SCORM packages directly from another server, so Moodle won't mess with it. Of course, you can run in other (probably cross-domain related) problems.
Sergey's answer is very good, with one caveat:
In his example with the contenthash of abcde1234567890, the file is stored in the moodledata/filedir/ab/cd/ folder under the name abcde1234567890. Moodle uses the full contenthash to name the file.

Update a .csv file on server from iPad programmatically

I am developing a simple iPad application that should submit some text from a form to a .csv file. I could manage to update the .csv file which is saved locally in the documents folder on my computer. However, I need to keep the file on a server, probably download the file, append data, and upload it again (Export a bulk of data to the file on server). Any idea how I could do something like that?
I guess ftp might be the easiest way, see this question
your other options likely involve writing a server-side service to post data to.

Uploading files different servers at once using a single html Form

Is there a way to make a form where it can simultaneously upload to several servers at once?
Currently in my web application, I am asking the users to type in some info + select a few files to upload.
Title, Description, Info, etc
File 0
File 1
File 2
File ...
On the backend, I'm using Pylons. Currently it accepts POST of (info + all files), processes the info and the first file (file 0), and uploads them again to Amazon S3. I only need to process info and the 'file 0' on my own server, the rest of the files I can pass through directly to S3 via a POST.
Is there a way to make a form where the info+file0 will be POST'ed to one server, and the rest of the files be POST'ed directly to S3?
While you could employ JavaScript trickery to make clicking on a submit button result in multiple POST requests to multiple servers, such a solution will almost certainly cause more problems than it solves. You should probably stick with your current method.
Check this and this out.
My solution would involve ajax upload(s) to S3 as a first step. Once you're done with the files, you can submit all the other fields (not ajax) to your webserver, eventually adding the generated S3 links.