Programmatic export/dump/mass data retrieval (BaaS) - rest

Does anyone have experience with programmatic exports of data from BaaS providers such as parse.com or StackMob?
I am aware that both providers (as far as I can tell from the marketing talk) offer a REST API which will allow for queries against the database, not only to be used by mobile clients but also by e.g. custom web apps.
I am also aware that both providers offer a manual export of data (parse.com via their web interface, StackMob via support).
But let's say I would like to dump all data nightly so that I can import it into a reporting system, for instance, or simply to have an up-to-date backup.
In this case, I would need a programmatic way to export/replicate the data stored in the backend. Manual exports are not an option for obvious reasons.
The REST APIs offered, however, seem to be designed for specific queries, not for mass reads (performance?). Then there is the pricing: I assume neither provider would be happy about a nightly export of X gigabytes through their REST API, so there will probably be a price tag.
I just couldn't find any specific information on this topic so far, so I was wondering if anyone else has already gone through this. Also, any suggestions on StackMob/parse alternatives are welcome, especially if related to the data export topic.
Cheers, Alex

Did you see the section of the Parse REST API on Batch operations? Batch operations reduce the number of API calls needed to grab data, so you are not spending a call on every row you retrieve. Keep in mind that there is still a limit: the default is 100, and you can raise it to a maximum of 1000. That means you can still pull down at most 1000 rows per API call.
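With a per-call cap like that, a full nightly dump comes down to paging until a short page comes back. The sketch below assumes Parse-style `limit`/`skip` paging; `fetchPage(limit, skip)` is a hypothetical helper you would write yourself (for example around an HTTP client), returning an array of objects for that page.

```javascript
// Minimal sketch: read an entire class by paging with limit/skip.
// fetchPage(limit, skip) is hypothetical and must return an array of rows.
function exportAll(fetchPage, pageSize) {
  pageSize = pageSize || 1000; // 1000 = the maximum per-call limit mentioned above
  var all = [];
  var skip = 0;
  while (true) {
    var page = fetchPage(pageSize, skip);
    all.push.apply(all, page);
    // A short (or empty) page means there is nothing left to read.
    if (page.length < pageSize) break;
    skip += pageSize;
  }
  return all;
}
```

For very large classes, paging on a sorted key (for example "everything created after the last row I saw") may scale better than an ever-growing `skip`, but that depends on what the provider allows.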
I can't comment on StackMob because I haven't used it. At my present job, we are using Parse and we wrote a C# app which compares the data in a Parse class with a SQL table and pulls down any changes.
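One way to "pull down any changes" without re-reading everything is a high-water mark on `updatedAt`. This is a sketch, not the C# app described above: rows are assumed to look like Parse objects (`objectId`, ISO-8601 `updatedAt`), and in practice you would likely push the filter to the server with a `where` constraint on `updatedAt` so only changed rows cross the wire.

```javascript
// Minimal sketch: select rows changed since the last sync and compute the new watermark.
// lastSync is the ISO timestamp saved by the previous run (null/undefined on the first run).
function changesSince(remoteRows, lastSync) {
  var since = lastSync ? Date.parse(lastSync) : -Infinity;
  var changed = [];
  var newest = since;
  var watermark = lastSync || null;
  remoteRows.forEach(function (row) {
    var t = Date.parse(row.updatedAt);
    if (t > since) {
      changed.push(row);
      // Remember the newest updatedAt so the next run starts from there.
      if (t > newest) { newest = t; watermark = row.updatedAt; }
    }
  });
  return { changed: changed, watermark: watermark };
}
```

Because the comparison is strictly "greater than", rows written with exactly the watermark timestamp after the sync ran could be missed; a small overlap window plus an idempotent upsert is a common way around that.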

Related

How to get complete metadata of dataset in Palantir Foundry through API call?

I want to fetch the complete metadata of a given dataset through an API call. Can anyone suggest how to do this?
You actually already manipulate and interact with various forms of metadata inside your Transforms Python builds today, but in a way that is structured to be safe when reading and writing.
While not all forms of metadata are accessible today, this is generally because of the desire to ensure product stability and good version control of your builds.
That said, if there's a certain interaction with metadata you'd like to see in the product, I'd recommend reaching out to your support engineers with a feature request so they can understand your request more specifically and discuss with our product teams.

WebApi supporting Range requests without querying the db multiple times

Currently I have a .NET Core Web API that serves videos. The videos are stored in a SQL Server table as varbinary(MAX). This was working; however, I read that to support iOS Safari we need to honor the Range header, so I have added support for this (I think).
However now I am noticing two things (could be unrelated):
1) Whenever a call is made to this API, the CPU spikes to 100%. I can only assume that is Entity Framework querying the database for a 25 MB file. That seems crazy, but the API is doing nothing else. Can this be improved? The server just grinds.
2) Multiple requests are made to the API with different byte ranges requested. However, my API in turn queries the database on each request, which keeps the CPU in overdrive for a long period.
Is there a better way of handling range requests when querying for a large object?
If you ask me, EF is not well suited for this; it is too clunky and resource-hungry. You could write your own T-SQL using something like SUBSTRING to read only the requested bytes. That said, from a practical point of view, depending on how many files there are, how big they are, and how many users you have, I would not go with such a solution.
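Whatever storage ends up behind it, the first step is turning the Range header into concrete byte offsets. The sketch below handles a single range only (multi-range requests are not covered) and is written in JavaScript for illustration rather than C#; the function name is hypothetical.

```javascript
// Minimal sketch: parse a single-range header ("bytes=start-end", "bytes=start-",
// or "bytes=-suffixLength") against a file of `size` bytes.
// Returns inclusive { start, end }, or null when no usable single range is present.
function parseRange(header, size) {
  var m = /^bytes=(\d*)-(\d*)$/.exec((header || '').trim());
  if (!m || (m[1] === '' && m[2] === '')) return null;
  var start, end;
  if (m[1] === '') {
    // Suffix form: the last N bytes of the file.
    var suffix = parseInt(m[2], 10);
    if (suffix === 0) return null;
    start = Math.max(size - suffix, 0);
    end = size - 1;
  } else {
    start = parseInt(m[1], 10);
    // An open-ended or oversized end is clamped to the last byte.
    end = m[2] === '' ? size - 1 : Math.min(parseInt(m[2], 10), size - 1);
  }
  if (start > end || start >= size) return null;
  return { start: start, end: end };
}
```

The server would then answer 206 with `Content-Range: bytes start-end/size`. Since T-SQL's SUBSTRING is 1-based, the matching read would be roughly `SUBSTRING(column, start + 1, end - start + 1)`.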
I don't think a SQL database should be how you store your data at all for this.
You could start doing some research on how netflix does it: https://www.techhive.com/article/2158040/how-netflix-streams-movies-to-your-tv.html
You probably want something like that: a CDN and some sort of caching. Your current approach might work while you build it, with one or two users, but if this API is used by lots of people you will quickly find that it won't scale.
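As a stopgap before moving to a CDN, the repeated database hits from point 2) can be avoided by caching the blob once and slicing it per request. This is a sketch only: `loadVideo(id)` is a hypothetical function that reads the whole blob from the database, and keeping 25 MB blobs in process memory only postpones the scaling problem described above.

```javascript
// Minimal sketch: a small in-memory LRU cache in front of a hypothetical loadVideo(id),
// so follow-up Range requests for the same video are served from memory.
function createBlobCache(loadVideo, maxEntries) {
  var cache = new Map(); // Map preserves insertion order, used here as LRU order
  return function getRange(id, start, end) {
    var blob;
    if (cache.has(id)) {
      blob = cache.get(id);
      cache.delete(id); // re-inserted below to mark it most recently used
    } else {
      blob = loadVideo(id);
      if (cache.size >= maxEntries) {
        cache.delete(cache.keys().next().value); // evict the least recently used entry
      }
    }
    cache.set(id, blob);
    return blob.slice(start, end + 1); // end is inclusive, as in a Range header
  };
}
```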

Accessing and Updating External Databases From Salesforce

I need to connect Salesforce to an external database we have and keep both the database and Salesforce updated as close to real time as we can get. I have tried searching Google for possible solutions, but nearly all of them are more than a year out of date. Any ideas?
Thank You!
Without knowing your exact scenario, it is quite difficult to give you a proper answer.
However off the top of my head I would suggest two Salesforce products.
Salesforce Connect
https://www.salesforce.com/products/platform/products/salesforce-connect/
Salesforce Connect allows you to connect to various data sources (for example MySQL, Microsoft SQL Server, Oracle) and expose the tables/objects of that data source as SObjects. There are limitations, so it would be better to talk to a Certified Architect about such an implementation.
Heroku Connect
https://www.heroku.com/connect
Heroku Connect allows you to connect a Heroku data source with a Salesforce Object. The sync is not immediate but there are quite a few customisations inside the product to make the sync as "live" as possible. There are limitations and thus it would be better to talk to a Certified Architect about such an implementation.
Salesforce Connect has limitations. It's good for presenting data via the interface, but if you need to act on the data and report on it, it might not be the best bet.
For close to real time hand coded sync, look at the streaming API, or using Salesforce Platform Events.
If you want to use an ETL tool, my organization has had decent luck with DBAmp, a SQL Server add-on that is fairly inexpensive compared to many ETL tools ($1625 annually): http://www.forceamp.com/ We're able to replicate the entire SF database offline in SQL Server with DBAmp, push changes to the offline SQL copy, and upsert those changes. It's also a good backup solution, since you get a full offline copy of the data. We also got very good support from them when we encountered challenges.
Hope this helps.
Not sure if you are syncing one object or multiple objects, but there are a few options you have.
You can try the Salesforce-provided feature Salesforce Connect, which allows you to view and update data from your external source in Salesforce, but it has limitations around reporting, among other things you should weigh.
If you use Heroku, Heroku Connect is your best bet.
You can also use a middleware/ESB solution like MuleSoft, which can orchestrate keeping data in sync across multiple data sources and do batch loads; but depending on how frequently you sync changes, keep an eye on the API limits for inbound calls to Salesforce.
You can roll your own solution: use Outbound Messages in workflow to send changes from Salesforce to your homegrown service, which writes them to your database, and have that service write back to Salesforce using the SOAP or REST API. (You could instead use triggers that invoke an Apex class making the callout, but that is more cumbersome: you have to build custom error handling and retry logic yourself, which you get for free with Outbound Messages.) That would probably take some time to build, and you would still need to be aware of API limits depending on how many updates are made on the non-Salesforce side.
You could create a Canvas App that displays data from your database in Salesforce as a tab, and hook it up via SSO so users are logged in automatically. But again, there would be no reporting, nor any other Salesforce features you could take advantage of.
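If you do go the hand-coded callout route, the retry logic mentioned above usually means retrying with exponential backoff. The sketch is runtime-agnostic JavaScript rather than Apex: `send` is a hypothetical function that pushes one change and throws on failure, and `sleep(ms)` is injected so the delay mechanism stays up to the host environment.

```javascript
// Minimal sketch: call send() up to maxAttempts times, doubling the wait between attempts.
// Rethrows the last error if every attempt fails.
function withRetry(send, maxAttempts, baseDelayMs, sleep) {
  var lastError;
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return send();
    } catch (e) {
      lastError = e;
      if (attempt < maxAttempts) {
        sleep(baseDelayMs * Math.pow(2, attempt - 1)); // 1x, 2x, 4x, ... the base delay
      }
    }
  }
  throw lastError;
}
```

Retries only make sense if the receiving side treats a repeated delivery as harmless (for example by upserting on an external id), since a request can succeed on the server even when the caller sees a failure.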
But I really think you should spend some time determining which system is your source of truth, because that determines how the data should be synced. You should also investigate whether you really need the sync to be real-time or near real-time, or whether you can manage with something like an hourly true-up on the system that is not the source of truth.
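An hourly true-up can be as simple as comparing the source of truth with the copy and producing an insert/update/delete plan. This sketch assumes both sides are plain objects sharing an `id` field (for example a Salesforce external id); the change check is a naive `JSON.stringify` comparison, which is sensitive to property order.

```javascript
// Minimal sketch: work out what the non-authoritative copy needs to match the source.
function trueUp(sourceRows, targetRows) {
  var target = new Map(targetRows.map(function (r) { return [r.id, r]; }));
  var plan = { inserts: [], updates: [], deletes: [] };
  sourceRows.forEach(function (row) {
    var existing = target.get(row.id);
    if (!existing) {
      plan.inserts.push(row);
    } else if (JSON.stringify(existing) !== JSON.stringify(row)) {
      plan.updates.push(row);
    }
    target.delete(row.id); // whatever remains afterwards no longer exists in the source
  });
  plan.deletes = Array.from(target.keys());
  return plan;
}
```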

exporting data from Bluemix Presence Insights

I'm trying to export data from Presence Insights on Bluemix, I followed the following documentation:
https://presenceinsights.ng.bluemix.net/pidocs/analytics/export/
however, I can't find the export button mentioned in the document.
Data can be exported from the IBM Presence Insights Dashboard if you have data available. There are also REST APIs for exporting data. They are documented in the Floors, Sites, and Zones sections of the API Reference.
There were REST APIs in the product some time ago, but they were found to have limitations that made them less useful in production. In particular, the amount of data that builds up forces the response time on the API to grow beyond what the Bluemix infrastructure allowed. The API requests would timeout. To that end, the APIs were backed out, but it appears the documentation was left. That will be removed shortly.
Presence Insights still understands the value of exporting the data, so a new scheme is under investigation. For example, it would be ideal if the data could be exported under the covers to a production data storage facility, on a regular time frame.
In the interim, an alternative solution would be to use a Subscription to gather the backend enter/exit/dwell/timeout events directly and roll your own solution to store only what you need in whatever facility works for your application.
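"Store only what you need" could mean reducing the event stream to hourly per-zone counts as events arrive. The event shape `{ type, zoneId, timestamp }` below is a hypothetical stand-in; check the actual Presence Insights subscription payload before relying on any field names.

```javascript
// Minimal sketch: aggregate enter/exit/dwell/timeout events into per-zone, per-hour counts.
function summarize(events) {
  var summary = {};
  events.forEach(function (ev) {
    // Bucket by zone and by the UTC hour of the event, e.g. "lobby|2016-05-01T09".
    var hour = new Date(ev.timestamp).toISOString().slice(0, 13);
    var key = ev.zoneId + '|' + hour;
    if (!summary[key]) summary[key] = { enter: 0, exit: 0, dwell: 0, timeout: 0 };
    // Count only the four known event types; anything else is ignored.
    if (Object.prototype.hasOwnProperty.call(summary[key], ev.type)) summary[key][ev.type]++;
  });
  return summary;
}
```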

Google Fusion Table REST Api vs Advanced Services Fusion Table Services in app scripts

I am very confused about the correct or recommended mechanism for accessing the Google Fusion Tables API in Apps Script. There seem to be two methods, each with examples, but no discussion of which is preferred or why. Is one of these interfaces newer and preferred while the other is dying? Is one obsolete or more restricted in what it can do?
Method 1 is the REST API described here
https://developers.google.com/fusiontables/docs/v2/sql-reference#Select
Method 2 is a set of library functions sort of described here under the Apps Script/Google Advanced Services:
https://developers.google.com/apps-script/advanced/fusion-tables
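For example, using the REST API to run a SQL query, we end up with something like this:
```javascript
function runSQL(sql) {
  // The SQL must be URL-encoded, or spaces and quotes in the query break the request URL.
  var getDataURL = 'https://www.googleapis.com/fusiontables/v1/query?sql=' + encodeURIComponent(sql);
  // getUrlFetchOptions() is the author's own helper (not shown) that supplies the OAuth header / API key.
  var dataResponse = UrlFetchApp.fetch(getDataURL, getUrlFetchOptions()).getContentText();
  return dataResponse;
}
```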
And using the advanced API we use something like this:
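```javascript
// Advanced Service: no manual OAuth or URL building once the service is enabled for the script.
var result = FusionTables.Query.sql(sql, { hdrs: false });
```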
The REST API seems much harder to use, requiring complex OAuth setup and developer keys to be configured in advance and coded into the application, while the Advanced Services API handles all of this behind the scenes and allows simple calls like the one shown here.
I have seen numerous examples using each of the above with no hint as to why one author chose her mechanism instead of the other.
Your help is greatly appreciated.
The service within Apps Script is a work in progress, so the full functionality of the API might not be supported at the moment. As you mentioned, though, the big advantage of the service over the REST API is that you do not have to handle the OAuth flow; you only need to enable the service for your script (as stated here).
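For comparison, here is one possible shape of the `getUrlFetchOptions()` helper that the question's `runSQL` relies on, i.e. the part the Advanced Service hides. The name `buildFetchOptions` is hypothetical, and it assumes the token comes from `ScriptApp.getOAuthToken()` with the script already authorized for the Fusion Tables scope.

```javascript
// Minimal sketch: UrlFetchApp options that attach an OAuth 2.0 bearer token.
function buildFetchOptions(token) {
  return {
    method: 'get',
    headers: { Authorization: 'Bearer ' + token },
    // Return error responses instead of throwing, so their bodies can be inspected.
    muteHttpExceptions: true
  };
}
```

In Apps Script this would be used as `UrlFetchApp.fetch(url, buildFetchOptions(ScriptApp.getOAuthToken()))`.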
The Apps Script "advanced service" implementation still lacks some advanced functionality (like alt=media format queries or multipart / resumable uploads) -- if it actually has those features, it lacks extremely basic documentation of them, to the point that the Apps Script editor autocomplete is unaware of them. The tradeoff of these functionality gaps is that you don't need to handle keys, request building, etc.
So, if you're doing simple SQL SELECT / importRows work, the Advanced Service should cover almost all your needs. If you need to delete from your Fusion Tables, you might want to consider setting up the REST API: because deleting is one record per query, the better way to delete is to "download what you want to keep, then re-upload it via replaceRows."
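The "keep, then replace" step boils down to filtering the downloaded rows and serializing them as CSV. In this sketch, `rows` is assumed to be an array of arrays (like a query result's rows) and `keep` is your own predicate; the function name is hypothetical.

```javascript
// Minimal sketch: filter rows and serialize the survivors as CSV for re-upload.
function rowsToKeepAsCsv(rows, keep) {
  function cell(value) {
    var s = value === null || value === undefined ? '' : String(value);
    // Quote fields containing commas, quotes or newlines; double any embedded quotes.
    return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
  }
  return rows
    .filter(keep)
    .map(function (row) { return row.map(cell).join(','); })
    .join('\n');
}
```

In Apps Script the result could be wrapped with `Utilities.newBlob(csv, 'application/octet-stream')` and passed to the service's `replaceRows`; check the service documentation for the exact arguments and whether a header line is expected.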
(This worked for me for a while, but eventually what I was keeping outgrew the Apps Script service's limitations and I began receiving Empty Response errors from the call to replaceRows. My remedy was to perform my record maintenance tasks via the REST API, where I can specify resumable uploads, timeouts, etc., while more "normal" interactions are done through the Advanced Service.)