Data Factory - Azure File Linked Service - Managed Identity

When creating a new linked service for Data Factory, I am able to select "Managed Identity" for the connection to the storage account blob, but this isn't an option for a file share on the same storage account.
Is this a known limitation?
Works ok with blob:
No option for Managed Identity for file share:

The Azure File Storage connector in an Azure Data Factory linked service currently supports only the following authentication types:
Account key authentication
Shared access signature authentication
Refer to this document for more information on the linked service properties of the Azure File Storage connector.
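For reference, here is a minimal sketch of what an account-key-based Azure File Storage linked service definition looks like, written as a Python dict mirroring the JSON payload; the account, key, and share names are placeholders, and your factory may generate a slightly different shape:
import json

# Account key authentication for the Azure File Storage connector,
# expressed as the linked service JSON payload (placeholders in angle brackets).
azure_file_linked_service = {
    "name": "AzureFileStorageLinkedService",
    "properties": {
        "type": "AzureFileStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>;EndpointSuffix=core.windows.net",
            "fileShare": "<file-share-name>"
        }
    }
}

print(json.dumps(azure_file_linked_service, indent=2))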

Related

Error connecting to Azure Data Lake in Azure Data Factory

I am trying to create a linked service in Azure Data Factory to an Azure Data Lake Storage Gen2 data store. Below is my linked service configuration:
I get the following error message when I test the connection:
Error code 24200 Details ADLS Gen2 operation failed for: Storage operation '' on container 'testconnection' get failed with 'Operation returned an invalid status code 'Forbidden''. Possible root causes: (1). It's possible because some IP address ranges of Azure Data Factory are not allowed by your Azure Storage firewall settings. Azure Data Factory IP ranges please refer https://learn.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses
I have found a very similar question here, but I'm not using Managed Identity as my authentication method. Perhaps I should be using that method. How can I overcome this error?
I tried to create a linked service to my Azure Data Lake storage, and when I test its connection, it gives me the same error quoted above.
As indicated by the possible root causes in the error details, this occurs because of the Azure Data Lake storage account's firewall settings.
Navigate to your data lake storage account and go to Networking -> Firewalls and virtual networks.
Here, when public network access is either disabled or enabled only from selected virtual networks and IP addresses, the linked service creation fails with the error message above.
Change it to Enabled from all networks, save the changes, and try creating the linked service again.
Now, when we test the connection before creating the linked service, it succeeds, and we can proceed to create it.
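If you prefer scripting this change instead of using the portal, a rough sketch with the Azure Python SDK could look like the following (assuming a recent azure-mgmt-storage release that exposes public_network_access; the subscription, resource group, and account names are placeholders):
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountUpdateParameters

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Equivalent of switching the portal setting to "Enabled from all networks".
client.storage_accounts.update(
    "<resource-group>",
    "<storage-account-name>",
    StorageAccountUpdateParameters(public_network_access="Enabled"),
)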
UPDATE:
In order to create a successful connection via a linked service to a data lake storage account whose public access is enabled only from selected virtual networks and IP addresses, you can use the following approach.
Assuming your data lake storage has public network access enabled from selected virtual networks and IP addresses, first create an integration runtime in your Azure Data Factory.
In your Data Factory Studio, navigate to Manage -> Integration runtimes -> New. Select Azure, Self-Hosted as the type of integration runtime.
Select Azure in the next window and click Continue. Enter the details for the integration runtime.
In the Virtual network tab, enable the virtual network configuration and check the interactive authoring checkbox.
Now continue to create the integration runtime. Once it is up and running, start creating the linked service for the data lake storage.
In Connect via integration runtime, select the IR created above. In order to complete the creation, we also need to create a managed private endpoint (you will be prompted, as shown in the image below).
Click Create new, set the account selection method to From Azure subscription, select the data lake storage account you are creating the linked service to, and click Create.
Once you create this, a private endpoint request will be sent to your data lake storage account. Open the storage account, navigate to Networking -> Private endpoint connections, and you will see a pending request. Approve this request.
Once this is approved, you can successfully create the linked service, with your data lake storage still allowing access only from selected virtual networks and IP addresses.
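If you want to script the managed private endpoint request instead of clicking through the portal, a rough sketch with the azure-mgmt-datafactory SDK might look like this (the factory, endpoint, and storage account names are placeholders, and "default" is the usual name of the factory's managed virtual network):
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import ManagedPrivateEndpoint, ManagedPrivateEndpointResource

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Request a managed private endpoint from the factory's managed virtual
# network to the data lake account's "dfs" endpoint.
adf_client.managed_private_endpoints.create_or_update(
    "<resource-group>",
    "<data-factory-name>",
    "default",
    "<managed-private-endpoint-name>",
    ManagedPrivateEndpointResource(
        properties=ManagedPrivateEndpoint(
            private_link_resource_id=(
                "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
                "/providers/Microsoft.Storage/storageAccounts/<storage-account-name>"
            ),
            group_id="dfs",
        )
    ),
)

# The request then appears under Networking -> Private endpoint connections
# on the storage account and still needs to be approved there.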
The error has occurred because of firewall and network access restrictions. One way to overcome this error is by adding your client IP to the firewall and network settings of your storage account. Navigate to your data lake storage account and go to Networking -> Firewalls and virtual networks. Under the Firewall section, click "Add your client IP address".
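If you'd rather add the client IP rule programmatically, here is a hedged sketch using the same storage management SDK (placeholders throughout; note that this replaces the account's existing network rule set, so merge in any rules you already have):
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import IPRule, NetworkRuleSet, StorageAccountUpdateParameters

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Keep the firewall's default action as Deny, but let your client IP through.
client.storage_accounts.update(
    "<resource-group>",
    "<storage-account-name>",
    StorageAccountUpdateParameters(
        network_rule_set=NetworkRuleSet(
            default_action="Deny",
            ip_rules=[IPRule(ip_address_or_range="<your-client-ip>")],
        )
    ),
)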

How to use Azure Data Factory, Key Vaults and ADF Private Endpoints together

I've created a new ADF instance on Azure with Managed Virtual Network integration enabled.
I planned to connect to Azure Key Vault to retrieve credentials for my pipeline's source and sink systems using a Key Vault private endpoint. I was able to successfully create it using Azure Data Factory Studio. I have also created the Azure Key Vault linked service.
However, when I try to configure the other linked services for the source and destination systems, the only option available for retrieving credentials from Key Vault is the AKV linked service. I'm not able to select the related private endpoint anywhere (please see the screen below).
Am I missing something?
Are there any additional configuration steps required? Is the scenario I've described possible at all?
Any help will be appreciated!
UPDATE: Screen comparing two linked services (one with managed network and private endpoint selected, and another one where I'm not able to set these options up):
With Managed Virtual Network integration enabled, make sure to check which region you are using; unfortunately, ADF managed virtual network is not supported in the East Asia region.
I have tried this in my environment, and even there that option is not available.
From the information I have gathered, even if you create a private endpoint for Key Vault, this column is always shown as blank; it only validates the URL format and does not perform any network operation.
As per the official documentation, if you want to see this option populated for a new linked service, try it with other data store services such as Azure SQL or Azure Synapse instead of Key Vault, as shown below.
For your Reference:
Store credentials in Azure Key Vault - Azure Data Factory | Microsoft Docs
Azure Data Factory and Key Vault - Tech Talk Corner
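To make the behaviour concrete, here is a hedged sketch (as Python dicts mirroring the JSON, with placeholder names) of how the credential flow actually looks: the AKV linked service only carries the vault's base URL, and the source/sink linked services pull secrets through an AzureKeyVaultSecret reference rather than through the private endpoint column:
# The Key Vault linked service itself only needs the vault URL.
key_vault_linked_service = {
    "name": "AzureKeyVaultLinkedService",
    "properties": {
        "type": "AzureKeyVault",
        "typeProperties": {"baseUrl": "https://<your-key-vault-name>.vault.azure.net"}
    }
}

# A source/sink linked service (Azure SQL here, purely as an example) then
# references a secret stored in that vault instead of embedding the password.
sql_linked_service = {
    "name": "AzureSqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Data Source=tcp:<server>.database.windows.net,1433;Initial Catalog=<database>;",
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "AzureKeyVaultLinkedService",
                    "type": "LinkedServiceReference"
                },
                "secretName": "<secret-name>"
            }
        }
    }
}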

Connect to ADLS with Spark API in Databricks

I am trying to establish a connection to an ADLS using a Spark API. I am really new to this. I read the documentation where it says that you can establish the connection with the following code:
spark.conf.set("fs.adl.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("fs.adl.oauth2.client.id", "<application-id>")
spark.conf.set("fs.adl.oauth2.credential", dbutils.secrets.get(scope = "<scope-name>", key = "<key-name-for-service-credential>"))
spark.conf.set("fs.adl.oauth2.refresh.url", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
I can see in the Azure Portal / Azure Storage Explorer that I have Read/Write/Execute permission on the ADLS folder that I need, but I don't know where to find application-id, scope-name, and key-name-for-service-credential.
There are two ways of accessing Azure Data Lake Storage Gen1:
Mount an Azure Data Lake Storage Gen1 filesystem to DBFS using a service principal and OAuth 2.0.
Use a service principal directly.
Prerequisites:
You need to create and grant permissions to a service principal.
Create an Azure AD application and service principal that can access resources.
Note the following properties:
application-id: An ID that uniquely identifies the client application.
directory-id: An ID that uniquely identifies the Azure AD instance.
service-credential: A string that the application uses to prove its identity.
Register the service principal, granting the correct role assignment, such as Contributor, on the Azure Data Lake Storage Gen1 account.
Method 1: Mount an Azure Data Lake Storage Gen1 resource or folder to DBFS (see the sketch below).
Method 2: Access directly with Spark APIs using a service principal and OAuth 2.0.
Method 3: Access directly with Spark APIs using a service principal and OAuth 2.0, where dbutils.secrets.get(scope = "", key = "") retrieves the service credential that has been stored as a secret in a secret scope (this is what the code in the question does).
Reference: Databricks - Azure Data Lake Storage Gen1.
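As an illustration of Method 1, here is a minimal sketch of mounting an ADLS Gen1 folder to DBFS in a Databricks notebook, reusing the same placeholders as the question (<application-id>, <scope-name>, <key-name-for-service-credential>, <directory-id>); the store and mount names are likewise placeholders:
# Service principal properties for ADLS Gen1, same as in the question.
configs = {
    "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
    "fs.adl.oauth2.client.id": "<application-id>",
    "fs.adl.oauth2.credential": dbutils.secrets.get(scope="<scope-name>", key="<key-name-for-service-credential>"),
    "fs.adl.oauth2.refresh.url": "https://login.microsoftonline.com/<directory-id>/oauth2/token"
}

# Mount the Gen1 store (or a folder in it) under DBFS.
dbutils.fs.mount(
    source="adl://<datalake-store-name>.azuredatalakestore.net/<folder>",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs
)

# Files are then readable through the mount point.
df = spark.read.csv("/mnt/<mount-name>/<some-file>.csv")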

Edit SQL file to secure credentials during deployment of project in Azure DevOps

I am using an open-source tool for deploying the schema of my Snowflake warehouse. I have successfully done it for tables, views, and procedures. Currently I'm facing an issue: I have to deploy Snowflake stages the same way, but stages require a URL and an Azure SAS token when you define them in your SQL file, like this:
CREATE or replace STAGE myStage
URL = 'azure://xxxxxxxxx.blob.core.windows.net/'
CREDENTIALS = ( AZURE_SAS_TOKEN = 'xxxxxxxxxxxxxxxxxxxx' )
file_format = myFileFormat;
It is not encouraged to put credentials in a file that will be published to version control and accessed by others. Is there a way/task in Azure DevOps so I can just keep a template SQL file in the repo and change it before compilation and execution (maybe via Azure Key Vault), then change it back to the template, so these credentials and the token always remain secure?
Have you considered using a STORAGE INTEGRATION, instead? If you use the storage integration credentials and grant that to your Blob storage, then you'd be able to create STAGE objects without passing any credentials at all.
https://docs.snowflake.net/manuals/sql-reference/sql/create-storage-integration.html
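To make that concrete, here is a hedged sketch of the storage-integration approach, run through the Snowflake Python connector so that no SAS token ever has to appear in the repo; the integration name, tenant ID, and container are placeholders, and after creating the integration you still have to grant Snowflake's service principal access to the container (DESC STORAGE INTEGRATION shows the consent URL):
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account-identifier>",
    user="<user>",
    password="<password>",   # or key-pair / SSO auth in a pipeline
    role="ACCOUNTADMIN",     # creating integrations needs an elevated role
)
cur = conn.cursor()

# One-time administrator setup: the trust policy between Snowflake and Azure.
cur.execute("""
    CREATE STORAGE INTEGRATION IF NOT EXISTS myAzureIntegration
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'AZURE'
      ENABLED = TRUE
      AZURE_TENANT_ID = '<tenant-id>'
      STORAGE_ALLOWED_LOCATIONS = ('azure://xxxxxxxxx.blob.core.windows.net/<container>/')
""")

# The stage no longer contains any secret, so its SQL can live in the repo as-is.
cur.execute("""
    CREATE OR REPLACE STAGE myStage
      URL = 'azure://xxxxxxxxx.blob.core.windows.net/<container>/'
      STORAGE_INTEGRATION = myAzureIntegration
      FILE_FORMAT = myFileFormat
""")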
For this issue, you can use credential-less stages to secure your cloud storage without sharing secrets.
I agree with Mike here: storage integrations, a new object type, allow a Snowflake administrator to create a trust policy between Snowflake and the cloud provider. When Snowflake connects to the organization's cloud storage, the cloud provider authenticates and authorizes access through this trust policy.
Storage integrations and credential-less external stages put into the administrator's hands the power of connecting to storage in a secure and manageable way. This functionality is now generally available in Snowflake.
For details, please refer to this document. In addition, you can also use Azure Key Vault, which provides a secure place for storing and accessing secrets.
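For the Key Vault route, here is a small sketch of what the deployment step could do: keep the SAS token in Key Vault and substitute it into a template SQL file at release time, so the checked-in file only ever contains a placeholder (the vault name, secret name, and {{AZURE_SAS_TOKEN}} placeholder are assumptions, not a standard):
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

vault = SecretClient("https://<your-key-vault>.vault.azure.net", DefaultAzureCredential())
sas_token = vault.get_secret("<sas-token-secret-name>").value

# Render the committed template into the file that actually gets executed;
# only the generated file (never committed) contains the real token.
with open("create_stage.template.sql") as template:
    sql = template.read().replace("{{AZURE_SAS_TOKEN}}", sas_token)

with open("create_stage.sql", "w") as rendered:
    rendered.write(sql)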

Having issue determining credentials used when connecting to SoftLayer ObjectStorage using SFTP

I'm having trouble connecting to the Bluemix Object Store using the instructions presented by this link: https://knowledgelayer.softlayer.com/procedure/connect-object-storage-using-sftp
It's unclear to me what the username and account ID are, so I would appreciate it if someone could clarify.
The instructions are valid.
Where can I find the values for SLOS/IBMOS, etc.?
I do not have access to the SoftLayer customer portal, as this service was created in Bluemix.
I can confirm that an sftp server is listening at the appropriate region endpoint.
Brien, it is not possible to use SFTP to access the Bluemix Object Storage if you create it from the Services catalog area of the Bluemix UI:
https://console.ng.bluemix.net/catalog/services/object-storage
This one can be accessed via the Swift CLI or the REST API.
To use SFTP to access your Object Storage, you need to create it from the Infrastructure area of the Bluemix UI - that is the legacy SoftLayer that is now integrated with Bluemix.
https://console.ng.bluemix.net/catalog/infrastructure/object_storage/
Also, to create the Object Storage from the Infrastructure catalog, you need to first link your Bluemix and SoftLayer accounts:
https://console.ng.bluemix.net/docs/admin/softlayerlink.html