I am trying to establish a connection to ADLS (Azure Data Lake Storage) using the Spark API. I am really new to this. I read the documentation, which says that you can establish the connection with the following code:
spark.conf.set("fs.adl.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("fs.adl.oauth2.client.id", "<application-id>")
spark.conf.set("fs.adl.oauth2.credential", dbutils.secrets.get(scope = "<scope-name>", key = "<key-name-for-service-credential>"))
spark.conf.set("fs.adl.oauth2.refresh.url", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
I can see in the Azure Portal / Azure Storage Explorer that I have Read/Write/Execute permission on the ADLS folder that I need, but I don't know where to find application-id, scope-name, and key-name-for-service-credential.
There are two ways of accessing Azure Data Lake Storage Gen1:
Mount an Azure Data Lake Storage Gen1 filesystem to DBFS using a service principal and OAuth 2.0.
Use a service principal directly.
Prerequisites:
You need to create and grant permissions to a service principal.
Create an Azure AD application and service principal that can access resources.
Note the following properties:
application-id: An ID that uniquely identifies the client application.
directory-id: An ID that uniquely identifies the Azure AD instance.
service-credential: A string that the application uses to prove its identity.
Register the service principal, granting the correct role assignment, such as Contributor, on the Azure Data Lake Storage Gen1 account.
Method 1: Mount an Azure Data Lake Storage Gen1 resource or folder (a sketch follows this list).
Method 2: Access directly with Spark APIs using a service principal and OAuth 2.0.
Method 3: Access directly with Spark APIs using a service principal and OAuth 2.0, where dbutils.secrets.get(scope = "<scope-name>", key = "<key-name-for-service-credential>") retrieves the service credential that has been stored as a secret in a secret scope.
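For Method 1, here is a minimal sketch in Python on Databricks, assuming the same angle-bracket placeholders as in your question (application ID, secret scope, directory ID) plus placeholder ADLS account, directory, and mount names:

configs = {
  "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
  "fs.adl.oauth2.client.id": "<application-id>",
  "fs.adl.oauth2.credential": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name-for-service-credential>"),
  "fs.adl.oauth2.refresh.url": "https://login.microsoftonline.com/<directory-id>/oauth2/token"
}

# Mount the ADLS Gen1 folder to DBFS; afterwards it reads like an ordinary path.
dbutils.fs.mount(
  source = "adl://<adls-account-name>.azuredatalakestore.net/<directory-name>",
  mount_point = "/mnt/<mount-name>",
  extra_configs = configs)

df = spark.read.text("/mnt/<mount-name>/<file-path>")  # example read through the mount

For Method 2, the spark.conf.set block from your question is already the right shape: application-id and directory-id come from the Azure AD app registration described above, and scope-name/key-name are whatever names you chose when you stored the service credential as a Databricks secret.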
Reference: Databricks - Azure Data Lake Storage Gen1.
I've created a new ADF (Azure Data Factory) instance on Azure with Managed Virtual Network integration enabled.
I planned to connect to Azure Key Vault to retrieve credentials for my pipeline's source and sink systems using a Key Vault private endpoint. I was able to successfully create the private endpoint using Azure Data Factory Studio, and I have also created an Azure Key Vault linked service.
However, when I try to configure other linked services for the source and destination systems, the only option available for retrieving credentials from Key Vault is the AKV linked service. I'm not able to select the related private endpoint anywhere (please see the screen below).
Am I missing something?
Are there any additional configuration steps required? Is the scenario I've described possible at all?
Any help will be appreciated!
UPDATE: Screen comparing two linked services (one with managed network and private endpoint selected, and another where I'm not able to set these options up):
With Managed Virtual Network integration enabled, make sure to check which region you are using; unfortunately, the ADF managed virtual network is not supported in East Asia.
I have tried this in my environment, and even there that option is not available.
From the information I have gathered, even if you create a private endpoint for Key Vault, this column is always shown as blank; it validates the URL format but doesn't perform any network operation.
As per the official documentation, if you want to use the private endpoint with a new linked service, try creating it for other data store services such as Azure SQL or Azure Synapse instead of Key Vault, as shown below.
For your Reference:
Store credentials in Azure Key Vault - Azure Data Factory | Microsoft Docs
Azure Data Factory and Key Vault - Tech Talk Corner
I have an external identity provider that supports OpenID Connect (OIDC) and want to access Google Cloud Storage (GCS) directly, using a short-lived access token. So I'm using workload identity federation in order to provide a credential from my external identity provider and get a federated token in exchange.
I have created the workload identity pool and provider and connected a service account to it, which has write access to a certain bucket in GCS.
How can I differentiate access to specific folders in the bucket according to the token provided by my external identity provider? For example, userA should have access only to folderA in the bucket. Can I do this using one service account?
Any help would be highly appreciated.
Folders don't exist in Cloud Storage; it is blob storage, and all objects are stored at the bucket level. For human readability and representation, the / character is treated as a folder separator, by convention.
Therefore, because directories don't exist, you can't grant any permissions on them. The finest granularity is the bucket.
In your use case, you can't grant write access at the folder level, but you can create one bucket per user and grant the impersonated service account access on that bucket.
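To make the per-user-bucket option concrete, here is a minimal Python sketch, assuming hypothetical bucket, project, and service account names, that grants the impersonated service account object access on one user's bucket:

from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()

# Hypothetical names: one bucket per user, granted to the service account
# that the federated identity impersonates.
bucket = client.bucket("company-data-user-a")
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectAdmin",
    "members": {"serviceAccount:user-a-sa@my-project.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)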
When creating a new linked service for Data Factory I am able to select "Managed Identity" for a connection to a storage account's blob service, but this isn't an option for the same storage account's file service.
Is this a known limitation?
Works ok with blob:
No option for Managed Identity for file share:
The Azure File Storage connector in Azure Data Factory linked services currently supports only the authentication types below:
Account key authentication
Shared access signature authentication
Refer to this document for more information on the linked service for the Azure File Storage connector.
I am using an open-source tool to deploy schema objects to my Snowflake warehouse. I have successfully done it for tables, views, and procedures. Currently I'm facing an issue: I have to deploy Snowflake stages the same way, but stages require a URL and an Azure SAS token when you define them in your SQL file, like this:
CREATE or replace STAGE myStage
URL = 'azure://xxxxxxxxx.blob.core.windows.net/'
CREDENTIALS = ( AZURE_SAS_TOKEN = 'xxxxxxxxxxxxxxxxxxxx' )
file_format = myFileFormat;
As it is not encouraged to keep credentials in a file that will be published to version control and accessed by others, is there a way or task in Azure DevOps so that I can keep a template SQL file in the repo, substitute the real values before compilation and execution (maybe via Azure Key Vault), and then change it back to a template, so these credentials and the token always remain secure?
Have you considered using a STORAGE INTEGRATION, instead? If you use the storage integration credentials and grant that to your Blob storage, then you'd be able to create STAGE objects without passing any credentials at all.
https://docs.snowflake.net/manuals/sql-reference/sql/create-storage-integration.html
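As a rough sketch of what this looks like (the integration name, tenant ID, and container path below are placeholder assumptions), run once by an account administrator; it is shown here through the Snowflake Python connector so it can be scripted from a deployment pipeline:

import os
import snowflake.connector  # pip install snowflake-connector-python

# Connection secrets come from pipeline variables or a key vault, not from the repo.
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
)
cur = conn.cursor()

# One-time trust policy between Snowflake and the Azure tenant (placeholder names).
cur.execute("""
    CREATE STORAGE INTEGRATION IF NOT EXISTS my_azure_int
      TYPE = EXTERNAL_STAGE
      STORAGE_PROVIDER = 'AZURE'
      ENABLED = TRUE
      AZURE_TENANT_ID = '<directory-id>'
      STORAGE_ALLOWED_LOCATIONS = ('azure://xxxxxxxxx.blob.core.windows.net/<container>/')
""")

# The stage references the integration, so no SAS token appears in the SQL file.
cur.execute("""
    CREATE OR REPLACE STAGE myStage
      URL = 'azure://xxxxxxxxx.blob.core.windows.net/<container>/'
      STORAGE_INTEGRATION = my_azure_int
      FILE_FORMAT = myFileFormat
""")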
For this issue, you can use credential-less stages to secure your cloud storage without sharing secrets.
I agree with Mike here: storage integrations, a new object type, allow a Snowflake administrator to create a trust policy between Snowflake and the cloud provider. When Snowflake connects to the organization's cloud storage, the cloud provider authenticates and authorizes access through this trust policy.
Storage integrations and credential-less external stages put into the administrator’s hands the power of connecting to storage in a secure and manageable way. This functionality is now generally available in Snowflake.
For details, please refer to this document. In addition, you can also use Azure Key Vault, which provides a secure place for storing and accessing secrets.
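If you go the Key Vault route for the SAS token itself, here is a minimal sketch (vault name, secret name, template file, and placeholder token are all hypothetical) of substituting the token into a template SQL file at deployment time:

from azure.identity import DefaultAzureCredential  # pip install azure-identity
from azure.keyvault.secrets import SecretClient    # pip install azure-keyvault-secrets

# The pipeline's identity (service connection or managed identity) signs in here;
# the SAS token itself never lives in the repository.
credential = DefaultAzureCredential()
client = SecretClient(vault_url="https://<vault-name>.vault.azure.net", credential=credential)
sas_token = client.get_secret("<sas-token-secret-name>").value

# Hypothetical template file kept in the repo with a placeholder instead of the token.
sql = open("create_stage.sql.tmpl").read().replace("{{AZURE_SAS_TOKEN}}", sas_token)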
I am a new user of the Azure platform, and am having trouble understanding how different parts are connected. I have data in a Storage blob that I would like to use to make HTTPS POST requests to a web service. My question therefore is as follows: How can I send data from my Azure storage blob to a REST API endpoint?
First, let's start with a little background:
Azure Resource Manager (ARM)
ARM is the REST API that you interface with, using the Azure Portal, PowerShell module, or cross-platform (xPlat) CLI tool, in order to provision and manage cloud resources inside your Azure subscription (account). In order to provision resources, you must first create a Resource Group, essentially a management container for various cloud resource instances.
Azure Storage (Blob)
Microsoft Azure Storage offers several different services:
Blob (unstructured, flat data storage)
Files (cloud-based SMB share for Azure VMs)
Queue (simple message queues, similar to Azure Service Bus)
Table (NoSQL partitioned storage)
Of these types of storage, Blob storage is arguably the most common. In order to utilize any of these storage services, you must first provision a Storage Account inside an ARM Resource Group (see above). To specifically utilize blob storage, you create a Blob Container inside your Storage Account, and then create or upload blobs into that container. Once data is stored in an Azure Blob Container, it does not move unless a service explicitly requests the data.
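As a small illustration of that flow (the connection string, container, and file names below are placeholders), creating a container and uploading a blob with the Python SDK looks roughly like this:

from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

# Placeholder connection string, taken from the Storage Account's access keys blade.
service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")
container = service.create_container("mycontainer")  # the Blob Container inside the Storage Account
with open("data.json", "rb") as f:
    container.upload_blob(name="data.json", data=f)   # the blob stays put until something reads it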
Azure App Service
If you're deploying a Web App (with a front end) or a REST API App (no front end), you'll be using Microsoft Azure's App Service offering. One unique feature of Azure App Service's Web App offering (I know, it's a mouthful) is WebJobs. WebJobs essentially allow you to run arbitrary code in the cloud, kind of like a background worker process. You can trigger WebJobs when blobs are created or uploaded, as described in this document.
Essentially, you use the [BlobTrigger()] .NET attribute, from the Azure WebJobs SDK, to designate code that will be executed inside Azure WebJobs whenever a new blob is created. The code that executes could grab the blob data, and send it off to your REST API endpoint.
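The paragraph above describes the .NET WebJobs SDK; as a rough Python sketch of the same blob-triggered pattern, here is the closest analogue using an Azure Functions blob trigger instead (the container name and REST endpoint are hypothetical):

import logging

import azure.functions as func  # Azure Functions Python (v2 programming model)
import requests                 # assumed to be listed in requirements.txt

app = func.FunctionApp()

# Runs whenever a new blob lands in the (hypothetical) "incoming" container.
@app.blob_trigger(arg_name="blob", path="incoming/{name}", connection="AzureWebJobsStorage")
def forward_blob(blob: func.InputStream):
    payload = blob.read()
    # POST the blob's contents to your (hypothetical) REST API endpoint.
    resp = requests.post("https://example.com/api/ingest", data=payload)
    logging.info("Forwarded %s; API returned %s", blob.name, resp.status_code)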