How to Connect CloverETL to Google Cloud Storage?

I am using CloverETL Designer for ETL operations, and I want to load some CSV files from GCS into my Clover graph. I used a FlatFileReader and tried to fetch the file using a remote File URL, but it is not working. Can someone please detail the entire process here?
The path to the file in GCS is
https://storage.cloud.google.com/PATH/Write_to_a_file.csv
and I need to get this CSV file into a FlatFileReader in CloverETL Designer.

You should use the Google Cloud Storage API to GET the file. Clover's HTTPConnector component will let you pass in the appropriate parameters to make a GET request (you will presumably have to perform an OAuth2 authentication first to get a token) and send the output to a local destination specified in "Output File URL." Then you can use a FlatFileReader to read from that local file.
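For reference, here is a minimal Java sketch of the equivalent GET request outside of Clover, assuming you already have an OAuth2 access token; the bucket, object, and local path are placeholders. HTTPConnector would perform essentially this request inside the graph:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class GcsHttpGet {
    public static void main(String[] args) throws Exception {
        String token = "ya29...";  // placeholder: obtain via your OAuth2 flow
        // JSON API media-download endpoint; the object name must be URL-encoded.
        String url = "https://storage.googleapis.com/storage/v1/b/my-bucket/o/"
                   + "Write_to_a_file.csv?alt=media";
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Authorization", "Bearer " + token);
        // Stream the response body to the local file that FlatFileReader will read.
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, Paths.get("/tmp/Write_to_a_file.csv"),
                       StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```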

GCS has several different ways to download files from your buckets. You can use the console's Cloud Storage browser: open the storage browser, navigate to the object you want to download, right-click, and save it to your chosen local folder. In Chrome the save option appears as "Save Link As…".
To use the gsutil command-line tool, run:
`gsutil cp gs://[BucketName]/[ObjectName] [ObjectDestination]`.
Or you can use the client libraries or the REST APIs to download files. With these options you can work with a number of files at once or create a job to download them. Once the files are in a location known to CloverETL, the process is straightforward.
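As an illustration of the client-library route, here is a hedged Java sketch using the google-cloud-storage library; the bucket and object names are placeholders, and credentials are assumed to come from Application Default Credentials:

```java
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.file.Paths;

public class GcsDownload {
    public static void main(String[] args) {
        // The client picks up Application Default Credentials from the environment.
        Storage storage = StorageOptions.getDefaultInstance().getService();
        Blob blob = storage.get(BlobId.of("my-bucket", "Write_to_a_file.csv"));
        // Save to a local path that the Clover graph can read from.
        blob.downloadTo(Paths.get("/tmp/Write_to_a_file.csv"));
    }
}
```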
Within Clover Designer, in the navigation pane you can right-click a folder and choose Import; pick the folder where you placed your GCS file. Once the file is imported, you can use its data like any other data file in Clover. Since this is a .csv file, remember to edit your metadata (right-click the component, choose Extract Metadata, then edit data types, labels, and such in the Metadata Editor). Assign metadata to the edges between your components so they know what is coming in and going out of each step. Depending on your file, this process may need to be repeated many times.
Even with an ETL tool, getting the data and data types correct can be tricky. If you have questions about how to configure data types or your edges in an ETL project, a wiki may help, and the web has additional resources that may help you get the end analysis you're looking for.

Related

PowerBI/Powershell - Get list of datasources that a PBIX file uses

I am trying to get a list of the data sources that a Power BI file uses. I have seen solutions online that use the ReportingService module to get such a list, but this only works when the Power BI report is published online. Is there a solution that would work for a local file?
Here is the situation: a user gives me a Power BI file, and to get a list of its data sources I have to go in and look at the sources manually. Ideally, I would like to use PowerShell to get this list.
There isn't an API that can access the desktop application, so you would have to brute-force it.
A PBIX file is basically a zip file which contains separate files with JSON information. You would have to follow these steps:
Use Expand-Archive to get the files out of the PBIX (you may need to change the file extension to .zip first).
Read the "Connections" file (which is JSON). It has the various connection strings used by the model.
You can also do this manually by changing the file extension to .zip, opening the zip file directly, and looking at the Connections file in Notepad.
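If you'd rather script the same extraction outside PowerShell, here is a sketch in Java using the JDK's zip APIs; the report file name is a placeholder, and "Connections" is the JSON entry the steps above refer to:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class PbixConnections {
    public static void main(String[] args) throws IOException {
        // ZipFile can open the .pbix directly; no extension rename needed.
        try (ZipFile zip = new ZipFile("report.pbix")) {
            ZipEntry entry = zip.getEntry("Connections");
            if (entry == null) {
                System.out.println("No Connections entry found in this file.");
                return;
            }
            String json = new String(zip.getInputStream(entry).readAllBytes(),
                                     StandardCharsets.UTF_8);
            System.out.println(json);  // parse with your JSON library of choice
        }
    }
}
```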

Difference between a twb and twbx

Please provide some information about the difference between a twb workbook with an extract and a twbx workbook. Also, I am facing an issue: I have a workbook (twbx) on Tableau Server which uses a published extract. The extract was refreshed today, but the workbook shows old data....
TWB - an XML file for your Tableau workbook; it contains all the selections and layout you've made, but no data. These tend to be very small.
TWBX - a zipped file that contains the TWB as well as the data used by the TWB, in an extract.
Here's some more info from the Tableau website.
http://kb.tableausoftware.com/articles/knowledgebase/sending-packaged-workbook
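Since a TWBX is just a zip archive, you can see for yourself what is bundled inside one. A minimal Java sketch (the workbook file name is a placeholder):

```java
import java.io.IOException;
import java.util.zip.ZipFile;

public class TwbxContents {
    public static void main(String[] args) throws IOException {
        // Lists every entry: the .twb itself plus any extracts and data files.
        try (ZipFile zip = new ZipFile("workbook.twbx")) {
            zip.stream().forEach(e ->
                System.out.println(e.getName() + "  " + e.getSize() + " bytes"));
        }
    }
}
```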
Try closing & opening your workbook. If that doesn't refresh the data, check the following:
Make sure that the file path or database connection Tableau Server points to is the exact source you wish to refresh from.
Remember the server may have different drives mounted and different firewall rules. If you are reading from a file like Excel or Access to create your extract, changing the version of the file elsewhere on the file system won't affect the extract on Tableau Server if that extract points elsewhere (kind of obvious, but often forgotten, especially if a copy of the Excel file is bundled up into the twbx file).
It is also often a good idea in production to publish a data source and extract separately from the workbooks that use it so that they can be updated independently. Look under the data menu to find the publish command.
A TWBX is intended for sharing. It does not link to the original source; instead it contains a copy of the data that was obtained when the file was created.
If you need to give your clients a TWBX, you can keep a TWB as a template and use it to create a new TWBX whenever your data source is updated. Your clients get the TWBX they want, and you don't have to do anything.
You can even have a batch process for that. Here is a video: https://www.youtube.com/watch?v=Odk2xr6qOoQ
As Ryan mentioned, the twbx file contains its own data extract. Since you have a twbx file that uses a published data extract as its source, you basically created an extract of the original extract. In other words, the data is not coming from the published extract anymore, but is self-contained in the workbook itself, so refreshing the published extract won't update your workbook.
You can try scheduling the workbook itself (after the refresh of the extract, of course). However, that didn't work for me, and I always have to refresh the extract manually from Tableau Desktop.

How do I edit files in place that were uploaded to Moodle?

I would like a better workflow for debugging uploaded SCOs. As things are, I must edit a file in the activity, repackage, upload, and test. Often I just need to change a single line of code. It would be VERY nice to be able to edit that file, that line of code, on the server. So far, all I've found is that Moodle manages the files, so it seems impractical to locate and decipher the renamed files after upload.
Is there a way to configure Moodle so that it doesn't rename and relocate the files in SCOs upon extraction? Actually, I'm open to any suggestions on the best, fastest workflow for debugging SCOs.
Problem background
Since Moodle 2.0, files are no longer stored on the server in the conventional /this/is/the/path/to/my.file way. Instead, files are rehashed and stored in repositories (i.e. spread all over the moodledata folder as a collection of seemingly random data). This increases security and cross-OS compatibility but complicates things for people who would like to simply upload a SCORM zip package via FTP. Here's more information on file handling in Moodle 2.0.
Path to the solution
Let's locate the file you want to update, then update it.
Run phpmyadmin, go to mdl_files table, find your file by name in the filename field (let's say it's portrait.jpg)
Look at the contenthash field, it'll look like abcde1234567890. This means your file is stored in moodledata/filedir/ab/cd/ folder under the name abcde1234567890.
Rename the updated portrait.jpg to abcde1234567890, upload and overwrite.
Go back to phpmyadmin and update the filesize field in record for portrait.jpg with the size of the updated file.
Obviously, this process can be automated. You'll have to write a script that lets you upload a file, then searches for that file in mdl_files, saves it to the correct folder, and updates all fields accordingly.
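For example, the manual steps above could be scripted roughly as in this Java sketch; the database URL, credentials, moodledata path, and file names are all placeholders you would need to adjust:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class MoodleFileReplace {
    public static void main(String[] args) throws Exception {
        String filename = "portrait.jpg";             // name as stored in mdl_files
        Path updated = Paths.get("portrait.jpg");     // the new version on disk
        Path filedir = Paths.get("/var/moodledata/filedir");  // your moodledata path

        try (Connection db = DriverManager.getConnection(
                "jdbc:mysql://localhost/moodle", "user", "pass")) {
            // Step 1: look up the contenthash, as in the phpmyadmin step above.
            PreparedStatement sel = db.prepareStatement(
                "SELECT contenthash FROM mdl_files WHERE filename = ?");
            sel.setString(1, filename);
            ResultSet rs = sel.executeQuery();
            if (!rs.next()) throw new IllegalStateException("not found in mdl_files");
            String hash = rs.getString(1);

            // Step 2: overwrite moodledata/filedir/ab/cd/<contenthash>.
            Path target = filedir.resolve(hash.substring(0, 2))
                                 .resolve(hash.substring(2, 4))
                                 .resolve(hash);
            Files.copy(updated, target, StandardCopyOption.REPLACE_EXISTING);

            // Step 3: update the filesize field to match the new file.
            PreparedStatement upd = db.prepareStatement(
                "UPDATE mdl_files SET filesize = ? WHERE filename = ?");
            upd.setLong(1, Files.size(updated));
            upd.setString(2, filename);
            upd.executeUpdate();
        }
    }
}
```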
Alternative idea
Enable the external package type (and also enable 'Update on every launch'): go to Site administration / Plugins / Activities / SCORM and check the box down below. Now you'll be able to launch SCORM packages directly from another server, so Moodle won't mess with them. Of course, you can run into other (probably cross-domain related) problems.
Sergey's answer is very good, with one caveat:
In his example with the contenthash of abcde1234567890, the file is stored in the moodledata/filedir/ab/cd/ folder under the name abcde1234567890. Moodle uses the full contenthash to name the file.

Choosing right tool

I have the following need:
1) Users will upload .xls or .csv files into an "uploads" folder.
2) The "uploads" folder has to be constantly monitored, and with each new file added to it, a job has to be started.
3) The job will process the data from the .xls or .csv file so it matches the DB table structure, and write this data into the DB table.
This has to be an automated process, and I'm looking for an all-in-one solution tool.
You didn't say which operating system you're on, and you didn't say whether the users upload the files to a different server or not. If the upload goes through a web application (using an HTTP POST request), that is also different.
And I'm not sure that your requirement scales well with many users.
You should take a look at Pentaho Data Integration, a.k.a. Kettle: http://sourceforge.net/projects/pentaho/
With Kettle you can design a job that polls the upload directory and, once a file is found, performs all the needed transformations and loads the data into the desired database table.
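To illustrate the directory-monitoring part such a job performs, here is a minimal Java sketch using the JDK's WatchService; the folder name is a placeholder, and in the Kettle setup the transformation step would replace the println:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class UploadsWatcher {
    public static void main(String[] args) throws Exception {
        Path uploads = Paths.get("uploads");  // the monitored folder
        WatchService watcher = FileSystems.getDefault().newWatchService();
        uploads.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        while (true) {
            WatchKey key = watcher.take();  // blocks until something is created
            for (WatchEvent<?> event : key.pollEvents()) {
                Path created = uploads.resolve((Path) event.context());
                String name = created.getFileName().toString().toLowerCase();
                if (name.endsWith(".csv") || name.endsWith(".xls")) {
                    // Start the job here: parse the file, map columns to the
                    // DB table structure, and insert the rows.
                    System.out.println("New upload: " + created);
                }
            }
            key.reset();  // required, or the key stops delivering events
        }
    }
}
```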

Non class files with Java Web Start

How do you distribute other files needed by your application that aren't in a jar file? For example, the application at http://www.javabeginner.com/java-swing/java-swing-shuffle-game . The download contains Shuffle.jar, Shuffle.bat, Score.dat, and an images folder with 3 images in it. I can see possibly putting the images directly in Shuffle.jar, but you wouldn't want to put Score.dat in the jar file because it changes. Is there somewhere you could identify this type of file in the jnlp?
The non-Java files should be stored as resources. For files that change, you store the original or template file as a resource in your jar. When the program starts, it checks the local system to see whether that file exists. If not, it creates the local file by copying the template file from the JAR resource. If the file already exists, it is used as-is.
To save files to the local system even when running in the sandbox (unsigned), you can use the PersistenceService (javadoc / example). If your Java application is signed, then you can use the regular File APIs to write the file to the local machine, such as in a ".yourgame" subfolder under the user's home folder.
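Here is a minimal Java sketch of the template-copy pattern described above, assuming a signed app that may use the regular File APIs; the folder and resource names are illustrative:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ScoreFile {
    // Returns the local score file, seeding it from the bundled template
    // on first run; afterwards the existing local copy is used as-is.
    static Path localScores() throws Exception {
        Path dir = Paths.get(System.getProperty("user.home"), ".yourgame");
        Path scores = dir.resolve("Score.dat");
        if (!Files.exists(scores)) {
            Files.createDirectories(dir);
            // Template Score.dat packaged at the root of the jar.
            try (InputStream template =
                     ScoreFile.class.getResourceAsStream("/Score.dat")) {
                Files.copy(template, scores);
            }
        }
        return scores;
    }
}
```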
You can put all those files (except the scores file) in your jar file and load the contents using resource loading.
I've just deleted and restarted my reply twice now, changing my answer each time; this is confusing and needs a bit more clarification.
Are you SURE that application is supposed to be a Web Start app? On the site you linked to, it doesn't appear to be. Are you trying to take an application that was not designed as a Web Start application and change it into one that can be Web Start?
If it's not a Web Start app as your tag implies, then this question is open ended. You can distribute it 100 different ways.
If you are indeed trying to convert it into a Web Start app, you can start by packaging the images into the jar; that will alleviate your first headache if you just read them from there instead of from a File. If it's going to be Web Start, then you need to decide how you want to keep scores. You have to decide what the scoring system is like before you can decide how to implement it: will all the scores be kept on the web site hosting the Web Start app, or will that part still be local? If you want access to the local file system, you need to sign the jar; then you can extract Score.dat to the file system and do whatever you want with it, if the end user accepts.
You need to figure out what you want to do before you can do it, or at least clear it up for us if you already know more than we know you know.