Azure Data Factory Integration Runtime - azure-data-factory

I have manually created an integration runtime in Azure Data Factory. I have read a few articles saying that once we create an integration runtime in Data Factory, Microsoft bills for it even when no activity is using it, unless it is terminated.
Is this true?

The Azure integration runtime provides fully managed, serverless compute in Azure. You don't have to worry about infrastructure provisioning, software installation, patching, or capacity scaling. In addition, you only pay for the duration of actual utilization.
See the Azure documentation here:
Azure IR compute resource and scaling
Understanding Data Factory pricing through examples
To learn more about Data Factory pricing, you can refer to Data Factory Pipeline Orchestration and Execution.
If nothing is actively executing on the IR, you don't need to pay for it.
Hope this helps.

Related

AWS Proton vs CloudFormation

Recently, I looked into the AWS Proton service and tried it hands-on, but unfortunately I was not able to succeed.
What I am not able to understand is what advantage I get with Proton, because I can build the end-to-end pipeline using CodeCommit, CodeDeploy, CodePipeline, and CloudFormation.
It would be great if someone could jot down the use cases where Proton can be used compared to the components I mentioned above.
From what I understand, AWS Proton is similar to AWS Service Catalog in that it allows administrators to prepare CloudFormation (CFN) templates which developers/users can provision when they need them. The difference is that AWS Service Catalog is geared towards general users, e.g. those who just want to start an instance pre-configured by administrators, or provision entire infrastructures from a set of approved architectures (e.g. instance + RDS + lambda functions). In contrast, AWS Proton is geared towards developers, so that they can provision by themselves the entire architectures they need for development, such as CI/CD pipelines.
In both cases, CFN is the primary way in which these architectures are defined and provisioned. You can think of AWS Service Catalog and AWS Proton as high-level services, and of CFN as the low-level service that serves as a building block for the other two.
because I can build the end-to-end pipeline using CodeCommit, CodeDeploy, CodePipeline, and CloudFormation
Yes, in both cases (AWS Service Catalog and AWS Proton) you can do all of that. But not everyone wants to do it. Many AWS users and developers do not have the time and/or interest to define all the solutions they need in CFN. This is time-consuming and requires experience. Also, it's not a good security practice to allow everyone in your account to provision everything they need without any constraints.
AWS Service Catalog and AWS Proton solve these issues: you can pre-define a set of CFN templates and allow your users and developers to easily provision them, as sketched below. They also provide clear role separation in your account, so you have users who manage infrastructure and act as administrators, while the others are users/developers. This way both groups concentrate on what they know best - infrastructure as code and software development.
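To make the self-service idea concrete, here is a minimal sketch (using boto3; the product and artifact IDs, product name, and parameters are all hypothetical placeholders) of a developer provisioning a pre-approved Service Catalog product that an administrator has published from a CFN template:

```python
import boto3

# Hypothetical IDs - an administrator has already created a Service Catalog
# product backed by a CFN template and shared it with this account.
PRODUCT_ID = "prod-xxxxxxxxxxxx"
PROVISIONING_ARTIFACT_ID = "pa-xxxxxxxxxxxx"  # a specific template version

client = boto3.client("servicecatalog")

# The developer provisions the whole pre-approved stack without writing any CFN.
response = client.provision_product(
    ProductId=PRODUCT_ID,
    ProvisioningArtifactId=PROVISIONING_ARTIFACT_ID,
    ProvisionedProductName="dev-environment-1",
    ProvisioningParameters=[
        # Only the parameters the administrator exposed in the template.
        {"Key": "InstanceType", "Value": "t3.micro"},
    ],
)
print(response["RecordDetail"]["Status"])  # e.g. CREATED / IN_PROGRESS
```

The constraints (which products exist, which parameters can be changed) stay with the administrators, while developers only consume them.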

Azure Storage: Deploy Queue via ARM template

Is there a way to deploy Queues within a Storage Account via ARM templates? Haven't found an option so far.
If yes, how?
If not, is it planned to provide this capability?
If not and not planned, which approach do you recommend for automated deployments?
Thanks
Is there a way to deploy Queues within a Storage Account via ARM templates?
No, it's not supported at the moment.
ARM templates can deploy Azure Storage accounts and blob containers, but they do not support deploying queues or tables within a storage account.
Here is a link to a sample template that creates a container: https://azure.microsoft.com/en-us/resources/templates/101-storage-blob-container/
You can vote for this feedback item to help get the capability prioritized.
A few ways you can think about data plane operations in deployments/apps:
1) The code that consumes the queue can create it on init if it doesn't exist - see the sketch below.
2) If you're deploying via a pipeline, use a pipeline task to perform data plane operations.
3) In a template, use the deploymentScript resource to create it - this is a bit heavyweight for the task you need, but it will work.
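For option 1, a minimal sketch using the azure-storage-queue Python SDK could look like the following (the connection string and queue name are placeholders you'd replace with your own):

```python
from azure.core.exceptions import ResourceExistsError
from azure.storage.queue import QueueClient

# Placeholders - supply your own storage connection string and queue name.
CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
QUEUE_NAME = "orders"

def ensure_queue_exists() -> QueueClient:
    """Create the queue on application startup if it doesn't exist yet."""
    client = QueueClient.from_connection_string(CONNECTION_STRING, QUEUE_NAME)
    try:
        client.create_queue()
    except ResourceExistsError:
        pass  # queue was already created by a previous run - nothing to do
    return client

queue = ensure_queue_exists()
queue.send_message("hello from init")
```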
That help?

Spring Cloud Data Flow with Azure Event Hub limitations?

We plan to use Spring Cloud Data Flow on Azure Cloud using Azure EventHub as a messaging binder.
On Azure EventHub, there are hard limits:
100 namespaces
10 topics per namespace.
The Spring Cloud Azure Event Hub Stream Binder seems to be able to configure only one namespace, so how can we manage multiple namespaces?
Maybe we should use multiple binders, to have multiple instances of the Spring Cloud Azure Event Hub Stream Binder?
Does anyone have any ideas? or documentation we did not find?
Regards
Rémi
Spring Cloud Data Flow and Spring Cloud Skipper support the concept of "platform accounts". Using that, you can set up multiple accounts, one for each namespace or even for other K8s clusters. This opens a lot of flexibility to work around these hard limits in the Azure stack.
We have a recipe on multi-platform deployments.
When deploying streams from SCDF, you'd pick and choose the platform account (aka namespace or other configs), so the deployed stream apps (with the Azure binder on the classpath) would automatically run in different namespaces, effectively dodging the limits enforced in Azure.
Provenance tracking of where the apps run and the audit trail are also captured automatically in SCDF, so at any given time you'd know who did what and in which namespace.

Stopping Cloud Data Fusion Instance

I have production pipelines which only run for a couple of hours using Google Cloud Data Fusion. I would like to stop the Data Fusion instance and start it the next day, but I don't see an option to stop the instance. Is there any way we can stop the instance and start the same instance again?
By design, a Data Fusion instance runs in a GCP tenancy unit that gives the user a fully automated way to manage all the cloud resources and services (GKE cluster, Cloud Storage, Cloud SQL, Persistent Disk, Elasticsearch, Cloud KMS, etc.) used for storing, developing, and executing customer pipelines. Therefore, there is no possibility to stop or terminate a Data Fusion instance; the pipeline execution resources are launched on demand and cleaned up after pipeline completion. See here for the pricing concepts.

Azure Data Factory v2 and data processing in custom activity

I am migrating (extract-load) a large dataset to a LOB service and would like to use Azure Data Factory v2 (ADF v2). This would be the cloud version of the same kind of orchestration typically implemented in SSIS. My source database and dataset, as well as the target platform, are on Azure. That led me to ADF v2 with Azure Batch Service (ABS) and creating a custom activity.
https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-dotnet-custom-activity
However, I am unable to work out from the documentation or the samples provided by Microsoft how ADF v2 can create the job and tasks needed by the Batch service.
As an example, let's say I have a dataset with 10 million records and a Batch service pool with 10 cores. How do I submit 1/10 of the rows, or even row-for-row, to my command-line app running on each of the cores in the pool? How do I distribute the work? Following the default guide in the ADF v2 docs, I just get a datasets.json file, and it is the same for all my pool nodes, with no "slice" or subset information.
If ADF v2 were not involved, I would create a job in ABS and, for each row or each X rows, create a task. The nodes would then execute task after task. How do I achieve something similar with ADF v2?
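If it helps clarify what I mean, here is roughly what I would do without ADF v2, using the azure-batch Python SDK. The account details, pool id, the MyMigrationApp.exe command line, and the split into 10 slices are all placeholders/assumptions; it is just the pattern I am asking about:

```python
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels

# Placeholders - replace with your Batch account URL, credentials, and pool.
BATCH_ACCOUNT_URL = "https://<account>.<region>.batch.azure.com"
credentials = SharedKeyCredentials("<account-name>", "<account-key>")
client = BatchServiceClient(credentials, batch_url=BATCH_ACCOUNT_URL)

JOB_ID = "migrate-lob-dataset"
POOL_ID = "migration-pool"       # assumed existing pool with 10 cores
TOTAL_ROWS = 10_000_000
SLICES = 10                      # one slice per core - an arbitrary choice

# 1. Create a job on the existing pool.
client.job.add(batchmodels.JobAddParameter(
    id=JOB_ID,
    pool_info=batchmodels.PoolInformation(pool_id=POOL_ID),
))

# 2. Add one task per slice; each task tells the (hypothetical) command-line
#    app which row range of the dataset it should process.
rows_per_slice = TOTAL_ROWS // SLICES
tasks = [
    batchmodels.TaskAddParameter(
        id=f"slice-{i}",
        command_line=f"cmd /c MyMigrationApp.exe --offset {i * rows_per_slice} --count {rows_per_slice}",
    )
    for i in range(SLICES)
]
client.task.add_collection(JOB_ID, tasks)
```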