How to call HTTP endpoint with data from Data Fusion / CDAP - google-cloud-data-fusion

I'm trying to call an HTTP endpoint (service/function) from my Data Fusion / CDAP real-time (streaming) pipeline. This HTTP endpoint serves a trained machine learning model (via Google Cloud AI Platform Unified). I need to pass some data from my pipeline to this endpoint and obtain data back (i.e. send a chunk of pre-processed data and obtain a classification result back, to pass it further along my Data Fusion / CDAP pipeline). How can I do it?
I've looked into:
The HTTP plugins, but they support either Sink or Source, while I need a Transform plugin (i.e. data in -> call HTTP service -> data out);
Wrangler's invoke-http directive (https://cdap.atlassian.net/wiki/spaces/DOCS/pages/382107784/Invoke+HTTP+directive), but it does not support body formatting or nested JSON (e.g. Cloud AI Platform serves machine learning models via nested JSON, and the reply is also nested JSON); it is also not clear to me how to debug and handle errors there;
The Python transform plugin, but it is restrictive in terms of importing modules when run in Interpreted mode.

For your use case, I would recommend creating a custom plugin that uses the Google Cloud AI Platform API directly; that way you will have more flexibility in formatting the input and output data.
As an example, there is a DLP plugin that communicates with the Google Cloud DLP APIs: https://github.com/data-integrations/dlp-plugins
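Whichever way you package such a plugin, the core of the transform step is a single predict call against the AI Platform (Unified) endpoint with a nested JSON body. A rough sketch of that request/response shape (in Python rather than plugin code; the project, region and endpoint ID are placeholders, and google-auth is assumed for credentials):

```python
import google.auth
import google.auth.transport.requests
import requests

# Placeholders: substitute your own project, region and endpoint ID.
PROJECT = "my-project"
REGION = "us-central1"
ENDPOINT_ID = "1234567890"
URL = (f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
       f"/locations/{REGION}/endpoints/{ENDPOINT_ID}:predict")

def classify(record: dict) -> dict:
    """Send one pre-processed record to the model endpoint and return its prediction."""
    credentials, _ = google.auth.default()
    credentials.refresh(google.auth.transport.requests.Request())
    body = {"instances": [record]}                        # nested JSON request
    resp = requests.post(URL, json=body, timeout=30,
                         headers={"Authorization": f"Bearer {credentials.token}"})
    resp.raise_for_status()
    return resp.json()["predictions"][0]                  # nested JSON response
```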

Related

Consume a RESTful service and insert the data into a Context Broker

I have an external service that provides weather data via a RESTful API with authentication.
What would be the best option to be able to consume the service and send/insert the data into a Context Broker?
I was thinking of developing a custom IoT Agent with a JSON file to provide the external RESTful service endpoint and the configuration for the Context Broker.
Is there any other option to achieve the same functionality?
The big question here is whether you need to inject the data into the context broker, or just inform the context broker that such data exists. If you want to consider the weather station as a device, then indeed, your proposed architecture makes sense:
Create a cron job to fire periodically
Generate a file in a known format (e.g. JSON) and pass the file to a custom micro-service
The micro-service interprets the file and runs a batch upsert to send all the data as measures into the context broker (see the sketch below)
An example of this with a code walkthrough is discussed in the following webinar
The alternative would be to create a micro-service which listens to the registration endpoint(s): NGSI-v2 uses the /v2/op/query batch endpoint for this; for NGSI-LD it is a direct forwarding of the request. In this scenario, the weather-station data remains outside of the context broker itself and can be used to augment existing entities. A working example can be found within the FIWARE Tutorials.
Obviously the route you choose will depend upon what you need to do with the data. If you need to subscribe to temperature changes, for example, then it is better to treat the weather station as a device providing context data in the form of measures and go for Option 1.
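As a rough illustration of the batch upsert in Option 1 (the Orion URL, entity type and attribute names below are placeholders, not the webinar's code), the micro-service would POST something like this to the NGSI-v2 batch endpoint:

```python
import requests

ORION_URL = "http://localhost:1026/v2/op/update"   # placeholder Context Broker address

def upsert_weather_readings(readings):
    """Batch-upsert weather measures as NGSI-v2 entities (append creates or updates)."""
    entities = [
        {
            "id": f"urn:ngsi-ld:WeatherObserved:{r['station_id']}",
            "type": "WeatherObserved",
            "temperature": {"type": "Number", "value": r["temperature"]},
            "relativeHumidity": {"type": "Number", "value": r["humidity"]},
        }
        for r in readings
    ]
    resp = requests.post(ORION_URL, json={"actionType": "append", "entities": entities})
    resp.raise_for_status()

# Example: one reading parsed from the periodically generated JSON file.
upsert_weather_readings([{"station_id": "001", "temperature": 21.5, "humidity": 0.58}])
```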

How to stream data from Firebase REST API to Beckhoff's PLC using SSE (TwinCAT3)?

I can normally read data from the Firebase Realtime Database via the REST API through GET requests, and the same applies to writing data with PUT requests. But the Firebase Realtime Database REST API documentation specifies that you can also set up an SSE listener (EventSource / Server-Sent Events).
Thus far I have
Set the Accept header to "text/event-stream" as stated in the documentation (with FB_IotHttpHeaderFieldMap and its method AddField).
Set the HTTP security layer to SSL (so that the PLC communicates with the REST API through HTTPS, as required by the documentation).
But now I can't wrap my head around what I should do next ...
How would you approach this problem?
What is the next step in setting up an SSE listener?
And if there is no built-in way to do this, is it possible to code it myself?
Using: TwinCAT XAE (VS 2017) on Windows 10
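For reference, the Firebase SSE stream is just text over an HTTP response that the server keeps open, emitted as event:/data: line pairs. A minimal sketch (in Python, against a hypothetical database URL) of the parsing a listener has to do, i.e. the logic that would need to be reproduced in structured text on the PLC side:

```python
import json
import requests

# Hypothetical database URL, just to show the wire format of the SSE stream.
URL = "https://example-db.firebaseio.com/sensors.json"

with requests.get(URL, headers={"Accept": "text/event-stream"}, stream=True) as resp:
    resp.raise_for_status()
    event = None
    for line in resp.iter_lines(decode_unicode=True):
        if not line:                                   # a blank line ends one event
            event = None
            continue
        if line.startswith("event:"):
            event = line[len("event:"):].strip()       # "put", "patch", "keep-alive", ...
        elif line.startswith("data:") and event in ("put", "patch"):
            body = json.loads(line[len("data:"):].strip())   # {"path": ..., "data": ...}
            print(event, body["path"], body["data"])
```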

Real-time data update with Cloud Firestore via REST API call

I would like to create a Cloud Firestore API while keeping my business rules within Cloud Functions; I would then consume these endpoints from my Angular application and have the data update in real time.
Is it possible to do real-time updates if I create a REST API with Cloud Firestore?
If you're building your REST API on top of Cloud Functions, this will not be possible. Cloud Functions doesn't support HTTP chunked transfer encoding, nor does it support HTTP/2 streaming that would be required to let the client keep a socket open for the duration of the listen. Instead, functions are required to send their entire response to the client at once, with a size less than 10MB, within the timeout of the function (max 9 minutes).
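What a Cloud Functions based REST endpoint can do is return a one-shot snapshot in a single response. A minimal sketch, assuming a Python Cloud Function with functions_framework and the google-cloud-firestore client, and a hypothetical items collection:

```python
import functions_framework
from google.cloud import firestore

db = firestore.Client()

@functions_framework.http
def list_items(request):
    """Return a one-shot snapshot of the 'items' collection. The whole response is
    sent at once; there is no way to keep streaming later updates to the caller."""
    docs = db.collection("items").stream()
    return {doc.id: doc.to_dict() for doc in docs}
```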

Read server time from AWS DynamoDB, Swift

I need to get the current server time/timestamp from AWS DynamoDB in my iOS Swift application.
With the Firebase database we can write the current timestamp to the DB and then read it back from the app. Any suggestion about this is appreciated.
DynamoDB does not provide any sort of server time—any timestamps must be added by the client. That being said, you can emulate a server time behavior by setting up a Lambda function or an EC2 instance as a write proxy for DynamoDB and have it add a timestamp to anything being written to DynamoDB. But it’s actually even easier than that.
AWS allows you to use API Gateway to act as a proxy to many AWS services. The process is a little long to explain in detail here, but there is an in-depth AWS blog post you can follow for setting up a proxy for DynamoDB. The short version is that you can create a rest endpoint, choose “AWS Service Proxy” as the integration type, and apply a transformation to the request that inserts the time of the request (as seen by API Gateway). The exact request mapping you set up will depend on how you want to define the REST resources and on the tables you are writing to. There is a request context variable that you can use to get the API Gateway server time. It is $context.requestTimeEpoch.
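If you go down the Lambda write-proxy route mentioned first, the timestamping itself is only a couple of lines. A minimal sketch with boto3, assuming a hypothetical Readings table and event shape:

```python
import time
import boto3

table = boto3.resource("dynamodb").Table("Readings")     # hypothetical table name

def lambda_handler(event, context):
    """Write proxy: stamp every item with the server-side time before storing it."""
    item = dict(event["item"])                            # hypothetical payload shape
    item["serverTimestamp"] = int(time.time() * 1000)     # epoch millis, set server-side
    table.put_item(Item=item)
    return {"serverTimestamp": item["serverTimestamp"]}
```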

Azure message size limit and IoT

I read through the Azure documentation and found that the message size limit of Queues is 64 KB and of Service Bus is 256 KB. We are trying to develop an application which will read sensor data from some devices, call a REST service and upload it to the cloud. This data will be stored in the queues and then dumped into a cloud database.
There is a chance that the sensor data collected is more than 256 KB... In such cases what is the recommended approach... Do we need to split the data in the REST service and then put chunks of data in the queue, or is there another recommended pattern?
Any help is appreciated.
You have several conflicting technology statements. I will begin by clarifying a few.
Service Bus/IoT Hub are not POST calls. A POST call would use a RESTful service, which exists separately. IoT Hub uses a low-latency message passing system that is abstracted from you. These are intended to be high-volume small packets and fit most IoT scenarios.
In the situation in which a message is larger than 256 KB (which is very interesting for an IoT scenario; I would be interested to see why those messages are so large), you should ideally upload to blob storage. You can still post packets:
If you have access to the blob storage APIs from your devices, you should go that route (sketched below).
If you do not have access to this, you should post big packets to a REST endpoint and cross your fingers that they make it, or chop them up.
You can run post-hoc analytics on blob storage; I would recommend using the wasb prefix, as those containers are Hadoop compatible and you can stand up analytics clusters on top of those storage mechanisms.
You have no real need for a queue that I can immediately see.
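If your devices (or the REST service in front of them) can reach blob storage, the oversized readings can go straight into a container, and only a reference to the blob needs to travel through any messaging. A minimal sketch with the azure-storage-blob SDK; the connection string and container name are placeholders:

```python
import json
from datetime import datetime, timezone
from azure.storage.blob import BlobServiceClient

# Placeholders: use your own storage account connection string and container name.
CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."
CONTAINER = "sensor-readings"

def upload_reading(device_id: str, reading: dict) -> str:
    """Upload one large sensor payload to blob storage instead of queueing it."""
    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    blob_name = f"{device_id}/{datetime.now(timezone.utc).isoformat()}.json"
    blob = service.get_blob_client(container=CONTAINER, blob=blob_name)
    blob.upload_blob(json.dumps(reading), overwrite=True)
    return blob_name   # pass this reference (not the payload) through any messaging
```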
You should take a look at the patterns leveraging:
Stream Analytics: https://azure.microsoft.com/en-us/services/stream-analytics/
Azure Data Factory: https://azure.microsoft.com/en-us/services/data-factory/
Your typical ingestion will be: get your data up into the cloud into super cheap storage as easily as possible and then deal with analytics later using clusters you can stand up and tear down on demand. That cheap storage is typically blob storage and that analytics cluster is usually some form of Hadoop. Using Data Factory allows you to pipe your data around as you figure out what you are going to use specific components of it for.
Example of having used HBase as ingestion with cheap blob storage as the underlayment and Azure Machine Learning as part of my analytics solution: http://indiedevspot.com/2015/07/09/powering-azureml-with-hadoop-hbase/