Automating downloads of Facebook Insights data

I'm looking for a tool or process for exporting Facebook Insights data for a Facebook Page and a Facebook app. Currently I am just manually downloading CSV files from the Insights interface, but ideally I want to automate this process and load the data into Pentaho Kettle so I can perform some operations on it.
Is there some way to automate the downloading and loading of the CSV files? Or will I have to use the Facebook Graph API Explorer? I am currently looking at a setup where I use NetBeans and RestFB to pull the data I want, and then access that data using Pentaho Kettle. I am not sure if this will work, or if it is the best approach.

As Codek says, a Kettle plugin is a very good idea, and would be very helpful to the Kettle project. However, it's also a serious effort.
If you don't want to put in that kind of effort, you can certainly download files with a Kettle Job as long as the files are available through a standard transfer method (FTP, SFTP, SSH, etc.). I've never used RestFB, so I don't know what's available. You might be able to pull the data directly from a web service with the REST Client transform step.
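For what it's worth, pulling a Page Insights metric straight from the Graph API is just an authenticated HTTP GET, whether that happens inside RestFB, the REST Client step, or a small script that drops a CSV where Kettle can pick it up. Here is a rough Python sketch of that last approach; the page ID, access token, metric name, and API version are placeholders, so check the Insights documentation for the metrics and periods your Page actually exposes.

    # Hypothetical example: fetch one Page Insights metric and flatten it to CSV
    # so a Kettle transform can load it. Replace the placeholders before running.
    import csv
    import requests

    PAGE_ID = "YOUR_PAGE_ID"            # placeholder
    ACCESS_TOKEN = "YOUR_PAGE_TOKEN"    # placeholder: Page token with read_insights
    METRIC = "page_impressions"         # placeholder metric name
    API_VERSION = "v19.0"               # placeholder Graph API version

    url = "https://graph.facebook.com/%s/%s/insights" % (API_VERSION, PAGE_ID)
    params = {"metric": METRIC, "period": "day", "access_token": ACCESS_TOKEN}

    resp = requests.get(url, params=params, timeout=30)
    resp.raise_for_status()

    with open("page_insights.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["metric", "period", "end_time", "value"])
        for series in resp.json().get("data", []):
            for point in series.get("values", []):
                writer.writerow([series["name"], series["period"],
                                 point.get("end_time"), point.get("value")])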
After downloading the files, you can send them to a transform to be loaded. You can do this with either the Execute for every input row? option on the Transformation job step, or you can get the filenames from the job's result set in the transform with Get files from result.
Then you can archive the files after loading with Copy or Move result filenames. In one job, I find only the files that are not yet in my archive using Get File Names and Merge Join steps, and then a Set files in result step in a transform, so that can be done too if needed.
To automate it, you can run your job from a scheduler using Kitchen.bat/Kitchen.sh. Since I use PostgreSQL a lot, I use pgAgent as my scheduler, but the Windows Task Scheduler or cron work too.
Hope that helps.

Related

Tableau - Auto Archiving Historic Data

I have a live data source connected to Tableau Desktop. This source overwrites as it is updated. Is there a way to automatically save off monthly data from Tableau? Preferably, save the charts I have built.
There are a few options to consider if you are using Tableau Server:
REST API Export View: If you don't need interactivity, then you can automate exporting the image using the REST API. This is analogous to using the Worksheet -> Export -> Image feature in the front-end. There are other export options such as PDF and CSV. I would recommend using the Python client library over the raw REST API, but both are valid methods to export content.
If you need interactivity, then you will need to use an extract instead of a live data source. With this, you can automate exporting the workbook using the REST API. If you want a live query for normal use, then you can duplicate the workbook, convert the data source to an extract, schedule the extract refresh, and download the workbook.
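As a rough illustration of the image export option above, using the tableauserverclient Python library; the server URL, token name/secret, site, and view name are placeholders for your environment.

    # Sketch only: sign in with a personal access token, find a view by name,
    # and save its rendered image. populate_csv/populate_pdf work the same way.
    import tableauserverclient as TSC

    auth = TSC.PersonalAccessTokenAuth("token-name", "token-secret", site_id="mysite")
    server = TSC.Server("https://tableau.example.com", use_server_version=True)

    with server.auth.sign_in(auth):
        all_views, _ = server.views.get()
        view = next(v for v in all_views if v.name == "My Dashboard")  # placeholder name
        server.views.populate_image(view)          # fetch the rendered PNG
        with open("my_dashboard_2024_01.png", "wb") as out:
            out.write(view.image)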
Scheduled Subscription: If you don't want to code, then you can schedule emailing an image of the view and manually save the images as needed. You could set up a dedicated shared resource mailbox and subscribe a user, typically a service account, to the schedule. This would allow you to consolidate all the subscriptions in a dedicated mailbox for future use.
TabCmd: If you are comfortable with tabcmd, you can automate exporting to CSV, Image, and PDF.
If you are only using Tableau Desktop, then the best option may be to convert the data source to an extract, automate refreshing the extract locally with tabcmd, and save a copy of the workbook to a folder while renaming the file to include a YYYY_MM_DD in the name. This will give you a fully functional copy of the workbook.

Fetch all metadata of Salesforce

I've been trying to implement a way to download all the changes made by a particular user in Salesforce using a PowerShell script and create a package from them. The changes could be anything, added or modified: Apex classes, profiles, Accounts, etc., based on last modified by user, component ID, timestamp, and so on. Below is the URL for the API that exposes this metadata, but it does not explain any way to do this from a script.
https://developer.salesforce.com/docs/atlas.en-us.api_meta.meta/api_meta/meta_listmetadata.htm
Does anyone know how I can implement this?
Regards,
Kramer
Salesforce orgs other than scratch orgs do not currently provide source tracking, the feature that makes it possible to pinpoint user changes in metadata and extract only those changes. That extraction is done by an SFDX/Metadata API client, like Salesforce DX or CumulusCI (disclaimer: I'm on the CumulusCI team).
I would not try to implement a Metadata API client in PowerShell; instead, harness one of the existing tools to do so.
Without source tracking, to identify user changes you can either:
Attempt to extract all metadata and diff it against your version control, which is considerably harder than it sounds and is implemented by a variety of commercial DevOps tools for Salesforce (Gearset, Copado, etc.).
Have the user manually add components to a Change Set or Unmanaged Package, and use a Metadata API client as above to retrieve the contents of that package. (Little-known fact: a Change Set can be retrieved as a package!)
To emphasize: DevOps on Salesforce does not work like other platforms. Working with the Metadata API requires a fair amount of time investment and specialization. Harness the existing work of the Salesforce community where you can, but be aware that the task you are laying out may be rather more involved than you think, and it's not necessarily something you can just throw together from off-the-shelf components.
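For illustration only, and with the caveats above, here is roughly what a single listMetadata() call (the API the question links to) looks like when scripted directly, with results filtered by lastModifiedById. This is a sketch, not a recommendation: the instance URL, session ID, API version, metadata type, and user ID are all placeholders, and in practice you would let sfdx or CumulusCI handle this.

    # Rough sketch of one Metadata API listMetadata() SOAP call; not production code.
    # listMetadata accepts up to three queries per call, so repeat per metadata type.
    import requests
    import xml.etree.ElementTree as ET

    INSTANCE = "https://yourInstance.my.salesforce.com"  # placeholder
    SESSION_ID = "YOUR_SESSION_ID"                       # placeholder: from OAuth or login()
    API_VERSION = "59.0"                                 # placeholder
    TARGET_USER_ID = "005XXXXXXXXXXXXXXX"                # placeholder user Id

    MD_NS = "http://soap.sforce.com/2006/04/metadata"
    envelope = """<soapenv:Envelope
        xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
        xmlns:met="%s">
      <soapenv:Header>
        <met:SessionHeader><met:sessionId>%s</met:sessionId></met:SessionHeader>
      </soapenv:Header>
      <soapenv:Body>
        <met:listMetadata>
          <met:queries><met:type>ApexClass</met:type></met:queries>
          <met:asOfVersion>%s</met:asOfVersion>
        </met:listMetadata>
      </soapenv:Body>
    </soapenv:Envelope>""" % (MD_NS, SESSION_ID, API_VERSION)

    resp = requests.post(
        "%s/services/Soap/m/%s" % (INSTANCE, API_VERSION),
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=UTF-8", "SOAPAction": '""'},
        timeout=60,
    )
    resp.raise_for_status()

    # Each <result> is a FileProperties record: fullName, type, lastModifiedById,
    # lastModifiedByName, lastModifiedDate, and so on.
    for result in ET.fromstring(resp.content).iter("{%s}result" % MD_NS):
        props = {child.tag.split("}")[1]: child.text for child in result}
        if props.get("lastModifiedById") == TARGET_USER_ID:
            print(props.get("type"), props.get("fullName"), props.get("lastModifiedDate"))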

Which kind of Google Cloud Platform mobile backend client is appropriate?

THE PROBLEM
I'm writing a mobile app which will allow a user to log in, save some preferences that must be stored in a database, and display congressional bills to the user.
I've only written simple RESTful services with PHP and MySQL in the past. I'd like to take advantage of newer technologies, and am a little lost on general direction.
The bill data (formatted as JSON) can be gathered by running the scrapers found here. Using Docker, I managed to set a working directory and download the files to my local machine.
I've designed a MySQL database for holding the relevant bill and user data.
I started to mess around in Google Cloud Platform and read the doc that describes the different models. I'm thinking of a few different ideas, but I'm not familiar with GCP or what I can actually accomplish.
QUESTIONS
1) What are App Engine, Compute Engine, and Container Engine each for? I get the gist that Container Engine holds different instances of stuff you load up with Docker, and that Compute Engine sets up a VM, but I don't really understand the relationships. How should I think of them?
2) When I run those scrapers from the shell, where are the files being stored, and how can I check on them? On my computer, I set a working directory, but how do directories work in GCP? Is it just a directory in the currently selected VM, or is this what Buckets are for?
IDEAS
1) Since my bill data already comes as JSON, should I skip the entire process of building a database for the bills and insert them into Firebase somehow? Is this even possible? If so, am I stuck using Firebase's NoSQL, or can I still set up a relational database?
2) I could schedule the scrapers to run periodically, detect new files, and run a script to parse the JSON and insert new bill data into a database (PostgreSQL? MySQL?). Then I would write an API.
3) Download the JSON files to a bucket, and write an API that reads from them. Not sure how the performance would compare to using a DB.
I'm open to other suggestions as well.
For your use case (a stateless web application), App Engine is probably your best choice. The Google documentation has several comparisons of your computing options.
You can use App Engine with PHP and cloud-hosted MySQL if you want, which could be a good way to get your toes wet without going in over your head.

Uploading images to a PHP app on GCE and storing them on GCS

I have a php app running on several instances of Google Compute Engine (GCE). The app allows users to upload images of various sizes, resizes the images and then stores the resized images (and their thumbnails) in the storage disk and their meta data in the database.
What I've been trying to find is a method for storing the images in Google Cloud Storage (GCS) through the PHP app running on the GCE instances. A similar question was asked here, but no clear answer was given there. Any hints or guidance on the best way to achieve this would be highly appreciated.
You have several options, all with pros and cons.
Your first decision is how users upload data to your service. You might choose to have customers upload their initial data to Google Cloud Storage, where your app would then fetch it and transform it, or you could choose to have them upload it directly to your service. Let's assume you choose the second option, and you want users to stream data directly to your service.
Your service then transforms the data into a different size. Great. You now have a new file. If this was video, you might care about streaming the data to Google Cloud Storage as you encode it, but for images, let's assume you want to process the whole thing locally and then store it in GCS afterwards.
Now we have to get a file into GCS. It's a PHP app, and so as you have identified, your main three options are:
Invoke the GCS JSON API through the Google API PHP client.
Invoke either the GCS XML or JSON API via custom code.
Use gsutil.
Using gsutil will be the easiest solution here. On GCE, it automatically picks up appropriate credentials for your service account, and it's got several useful performance optimizations and tuning that a raw use of the API might not do without extra work (for example, multithreaded uploads). Plus it's already installed on your GCE instances.
The upside of the PHP API is that it's in-process and offers more fine-grained, programmatic control. As your logic gets more complicated, you may eventually prefer this approach. Getting it to perform as well as gsutil may take some extra work, though.
This choice is comparable to copying files via SCP with the "scp" command line application or by using the libssh2 library.
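For a sense of scale on the API side: an upload through the JSON API is only a few lines with a Google Cloud client library. The sketch below is in Python rather than PHP, purely to show the shape of the call (the PHP client exposes the equivalent operation); the bucket and file names are placeholders. On a GCE instance the client library picks up the service account credentials automatically, just as gsutil does.

    # Illustrative only: upload a locally resized image to a GCS bucket.
    from google.cloud import storage

    client = storage.Client()                        # uses the GCE service account credentials
    bucket = client.bucket("my-image-bucket")        # placeholder bucket name
    blob = bucket.blob("resized/photo_800x600.jpg")  # placeholder object name

    blob.upload_from_filename("/tmp/photo_800x600.jpg")  # local file written by the resizer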
tl;dr: Using gsutil is a good idea unless you need to handle interactions with GCS more directly.

Custom export/import for Alfresco

The obvious question: is there any solution for exporting Alfresco content that matches a custom condition, for example files whose creation date falls within a given date range?
The goals of this solution are:
1- to keep the amount of data involved in each export/import action to a minimum
2- in my weekly or monthly export/import to the backup Alfresco server, I shouldn't end up with duplicate records on import
Thanks a lot for any kind of help.
One idea is to use a library like OpenCMIS (Java) or cmislib (Python), both available from the Apache Chemistry project. Then use a CMIS query to restrict the data you want to export to a certain date range. If you want examples of CMIS queries, including ones that use date ranges, take a look at this Java example.
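Here is a small cmislib sketch of that date-range idea; the Alfresco URL, credentials, and dates are placeholders, and the CMIS binding URL varies by Alfresco version.

    # Hypothetical example: query documents created in a date range and dump their content.
    from cmislib import CmisClient

    client = CmisClient(
        "http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom",
        "admin", "admin")
    repo = client.defaultRepository

    query = ("SELECT cmis:objectId, cmis:name FROM cmis:document "
             "WHERE cmis:creationDate >= TIMESTAMP '2023-01-01T00:00:00.000Z' "
             "AND cmis:creationDate < TIMESTAMP '2023-02-01T00:00:00.000Z'")

    for result in repo.query(query):
        doc = repo.getObject(result.properties["cmis:objectId"])  # re-fetch the full document
        with open(doc.name, "wb") as out:
            out.write(doc.getContentStream().read())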
Another idea would be to use CMIS change tokens. Using this approach, you ask Alfresco what has changed since the last time your code ran. Alfresco responds with a set of changes. You can then iterate over those changes and process them accordingly. The CMIS & Apache Chemistry in Action book has a change token example that uses Python to run a polling sync server between two CMIS repositories. The source code lives here.
Both of these options use CMIS. If you would rather have a native Alfresco option you could write a custom action that runs on a schedule to call the export. Or, you could use the File Transfer Service to write files to a file system on a schedule.
If what you are really trying to do is back up your repository, don't use any of these options. Instead you should follow standard practice for backing up the repo, which is to dump the database and back up the content store.
Maybe you can use the Alfresco Replication Jobs to export your contents into a different repository.
In addition, you can export the contents to a file system using the FSTR feature.
Replication jobs use Alfresco Transfer Services, which can be customised to transfer only certain kinds of content.