How to mount OneDrive for Business in Databricks (PySpark)

I am trying to mount a OneDrive for Business folder in Databricks Community Edition. I am unable to use onedrivesdk because it is deprecated.
I created an app registration, assigned read and write permissions to it, and am using its client ID and secret. I tried to mount the folder using API requests, but it was not returning the access token.
First of all, I want to know whether it is possible to mount OneDrive to Databricks Community Edition. If yes, what are the ways?
Below is the code I used to try to mount OneDrive using API requests.
# Import the necessary libraries
import requests

# Set up the client
client_id = ""
client_secret = ""
tenant_id = ""
redirect_uri = "http://localhost:8080/"

# Get the access token
response = requests.post(
    "https://login.microsoftonline.com/{}/oauth2/token".format(tenant_id),
    data={
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
        "grant_type": "client_credentials",
        "resource": "https://graph.microsoft.com"
    }
)
access_token = response.json()["access_token"]

# Mount the OneDrive folder to DBFS
folder_id = ""
mount_point = "/mnt/onedrive"
dbutils.fs.mount(
    source="graph",
    mount_point=mount_point,
    extra_configs={
        "graph.access_token": access_token,
        "graph.folder_id": folder_id
    }
)

No, you can't use dbutils.fs.mount with OneDrive - DBFS mounts are supported only for specific cloud storage implementations (S3, ADLS Gen2, Azure Blob Storage, ... - see the docs).
If you need to access data in OneDrive, you need to use the Microsoft Graph REST API directly (or find another Python library).
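For example, here is a minimal sketch of that REST-API approach (not a mount), assuming you already have a valid Graph access token from the client-credentials flow above; drive_id, item_id and the target DBFS path are placeholders you would have to fill in after looking the IDs up via the Graph /drives endpoints:

# Download a single file from OneDrive via the Microsoft Graph REST API
# and write it to DBFS through the local /dbfs fuse path.
import requests

drive_id = "<drive-id>"   # placeholder: the OneDrive drive ID
item_id = "<item-id>"     # placeholder: the file's item ID

resp = requests.get(
    "https://graph.microsoft.com/v1.0/drives/{}/items/{}/content".format(drive_id, item_id),
    headers={"Authorization": "Bearer " + access_token},
)
resp.raise_for_status()

# Illustrative target path; /dbfs access may be limited on Community Edition.
with open("/dbfs/tmp/onedrive_file.bin", "wb") as f:
    f.write(resp.content)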

Related

In Azure DevOps Services where can one configure "Administer Permissions" access of a user to some Environment?

When I try to change the access of Project Administrators from Reader to Administrator, I get
Access Denied: e...4 needs the following permission(s) on the resource Environments/f...6/7 to perform this action: Administer Permissions
Where e...4 corresponds to the id of my user in Azure DevOps Services.
(I am a Project Administrator)
Who can grant these Administer Permissions rights? Is it some kind of Organization Administrator? Where is the relevant screen? Documentation? REST API?
I could not find anything.
If the UI is operated improperly, even a Project Collection Administrator (PCA) or the Organization Owner can get locked out here (the PCA/Organization Owner may lose all ability to manage the Environment from the Environment's permission-settings UI). The REST API, however, can get you out of this situation.
There is a REST API for role assignments, but there is no document specific to Environments:
Roleassignments - Set Role Assignments
You can press F12 in the browser to capture the detailed request (the UI is also based on the above REST API).
Here is a Python demo to achieve your requirement.
import requests
import json

url = "https://dev.azure.com/<Organization Name>/_apis/securityroles/scopes/distributedtask.environmentreferencerole/roleassignments/resources/<Project ID>_<Env ID>?api-version=5.0-preview.1"

payload = json.dumps([
    {
        "userId": "<User ID>",
        "roleName": "Administrator"
    }
])
headers = {
    # The PAT is sent as Basic auth: Base64-encode ":<PAT>" (empty username).
    'Authorization': 'Basic <Your Personal Access Token>',
    'Content-Type': 'application/json'
}

response = requests.request("PUT", url, headers=headers, data=payload)
print(response.text)
Use the List Projects REST API to get the project ID (a small sketch follows below).
Use the List Users REST API to get the user ID.
Get the Env ID from the Environment page URL.
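For reference, a small sketch of the List Projects call (the organization name and PAT are placeholders; the PAT is sent as Basic auth with an empty username):

import base64
import requests

org = "<Organization Name>"
pat = "<Your Personal Access Token>"
# Basic auth for Azure DevOps PATs: Base64-encode ":<PAT>" (empty username).
auth = base64.b64encode((":" + pat).encode()).decode()

resp = requests.get(
    "https://dev.azure.com/{}/_apis/projects?api-version=5.0".format(org),
    headers={"Authorization": "Basic " + auth},
)
for project in resp.json()["value"]:
    print(project["id"], project["name"])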

How to use Workload identity to access ESP in the Google Kubernetes Engine with the Google Cloud .NET SDK?

Background
On the Google Kubernetes Engine we've been using Cloud Endpoints and the Extensible Service Proxy (v2) for service-to-service authentication.
The services authenticate themselves by including a bearer JWT in the Authorization header of their HTTP requests.
The identity of the services has been maintained with GCP Service Accounts; during deployment, the JSON Service Account key is mounted into the container at a predefined location, and that location is set as the value of the GOOGLE_APPLICATION_CREDENTIALS env var.
The services are implemented in C# with ASP.NET Core, and to generate the actual JWT token, we use the Google Cloud SDK (https://github.com/googleapis/google-cloud-dotnet, and https://github.com/googleapis/google-api-dotnet-client), where we call the following method:
var credentials = GoogleCredential.GetApplicationDefault();
If the GOOGLE_APPLICATION_CREDENTIALS is correctly set to the path of the Service Account key, then this returns a ServiceAccountCredential object, on which we can call the GetAccessTokenForRequestAsync() method, which returns the actual JWT token.
var jwtToken = await credentials.GetAccessTokenForRequestAsync("https://other-service.example.com/");
var authHeader = $"Bearer {jwtToken}";
This process has been working correctly without any issues.
The situation is that we are in the process of migrating from using the manually maintained Service Account keys to using Workload Identity instead, and I cannot figure out how to correctly use the Google Cloud SDK to generate the necessary JWT tokens in this case.
The problem
When we enable Workload Identity in the container, and don't mount the Service Account key file, nor set the GOOGLE_APPLICATION_CREDENTIALS env var, then the GoogleCredential.GetApplicationDefault() call returns a ComputeCredential instead of a ServiceAccountCredential.
And if we call the GetAccessTokenForRequestAsync() method, that returns a token which is not in the JWT format.
I checked the implementation, and the token seems to be retrieved from the Metadata server, for which the expected response format seems to be the standard OAuth 2.0 model (represented in this model class):
{
  "access_token": "foo",
  "id_token": "bar",
  "token_type": "Bearer",
  ...
}
And the GetAccessTokenForRequestAsync() method returns the value of access_token. But as far as I understand, that's not a JWT token, and indeed when I tried using it to authenticate against ESP, it responded with
{
  "code": 16,
  "message": "JWT validation failed: Bad JWT format: Invalid JSON in header",
  ...
}
As far as I understand, normally the id_token contains the JWT token, which should be accessible via the IdToken property of the TokenResponse object, which is also accessible via the SDK, I tried accessing it like this:
var jwtToken = ((ComputeCredential)creds.UnderlyingCredential).Token.IdToken;
But this returns null, so apparently the metadata server does not return anything in the id_token field.
Question
What would be the correct way to get the JWT token with the .NET Google Cloud SDK for accessing ESP, when using Workload Identity in GKE?
To get an IdToken for the attached service account, you can use GoogleCredential.GetApplicationDefault().GetOidcTokenAsync(...), passing the target audience that your ESP configuration expects; the resulting OIDC token is a JWT you can send in the Authorization header.

Direct upload to Google Cloud Storage bucket from website

My company's website is managed and hosted by a third party.
We'd like to provide a portal on the website that allows our clients to upload files directly to a Google Cloud Storage bucket without the file going through the website (these uploads can span thousands of files and several GB).
I've found a good guide for how to do it on AWS (https://softwareontheroad.com/aws-s3-secure-direct-upload/) but can't even determine if the equivalent functionality exists for Google, let alone how to do it.
Has anyone done this before?
Please consider providing some more technical details on what you want to achieve: things like programming languages, server platform, cloud provider where the website is hosted, etc.
Generally speaking, Google Cloud Storage has a similar approach for uploading files, which is Signed URLs.
For example, if you are coding in Python, you can generate a signed URL that allows uploading a file to a bucket:
import datetime

from google.cloud import storage


def generate_upload_signed_url_v4(bucket_name, blob_name):
    """Generates a v4 signed URL for uploading a blob using HTTP PUT."""
    # bucket_name = 'your-bucket-name'
    # blob_name = 'your-object-name'

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)

    url = blob.generate_signed_url(
        version="v4",
        # This URL is valid for 15 minutes
        expiration=datetime.timedelta(minutes=15),
        # Allow PUT requests using this URL.
        method="PUT",
        content_type="application/octet-stream",
    )

    print("Generated PUT signed URL:")
    print(url)
    print("You can use this URL with any user agent, for example:")
    print(
        "curl -X PUT -H 'Content-Type: application/octet-stream' "
        "--upload-file my-file '{}'".format(url)
    )
    return url
Then you can implement this as needed by your organization.
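For instance, the client could then upload directly with a plain HTTP PUT; here is a minimal sketch using the requests library (the file, bucket and object names are placeholders), keeping the Content-Type identical to the one used when signing:

import requests

url = generate_upload_signed_url_v4("your-bucket-name", "your-object-name")

with open("my-file", "rb") as f:
    response = requests.put(
        url,
        data=f,
        # Must match the content_type used when the URL was signed.
        headers={"Content-Type": "application/octet-stream"},
    )
response.raise_for_status()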

Signed URL created by Google Cloud Storage Python library blob.generate_signed_url method gets "Access Denied" error

I am trying to create a signed URL for a private object stored in cloud storage.
The storage client is being created using a service account that has the Storage Admin role:
import datetime

from google.cloud import storage

storage_client = storage.Client.from_service_account_json('service.json')
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(blob_name)

url = blob.generate_signed_url(
    version="v4",
    # This URL is valid for 15 minutes
    expiration=datetime.timedelta(minutes=15),
    # Allow GET requests using this URL.
    method="GET"
)
This generates a URL that when accessed via a browser gives this error:
<Error>
  <Code>AccessDenied</Code>
  <Message>Access denied.</Message>
  <Details>Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.</Details>
</Error>
What am I missing here? The service account has no problem interacting with the bucket or blob normally - I can download it, etc. It's just the signed URL that doesn't work. I can make the object public and then download it - but that defeats the purpose of generating a signed URL.
All of the other answers I've found seem to focus on issues using application default credentials or are very old examples from the v2 API.
Clearly there's something about how I'm using the service account - do I need to explicitly give it permissions on that particular object? Is the Storage Admin role not enough in this context?
Going crazy with this. Please help!

Can't connect to GCS bucket from Python despite being logged in

I have a GCS bucket set up that contains data that I want to access remotely. As per the instructions, I have logged in via gcloud auth login, and have confirmed that I have an active, credentialed account via gcloud auth list. However, when I try to access my bucket (using the Python google.cloud.storage API), I get the following:
HttpError: Anonymous caller does not have storage.objects.list access to <my-bucket-name>.
I'm not sure why it is being accessed anonymously, since I am clearly logged in. Is there something obvious I am missing?
The Python GCP library (and others) uses a different authentication mechanism than the gcloud command.
Follow this guide to set up your environment and get access to GCS with Python.
gcloud auth login sets up the gcloud command-line tool with your credentials.
However, the way forward when executing code is to use a Service Account. Once the environment variable GOOGLE_APPLICATION_CREDENTIALS has been set, Python will use the Service Account credentials.
Edit
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_your_.json_credential_file"
Edit
And then, to download gs://my_bucket/my_file.csv to a local file (from the python-docs-samples):

from google.cloud import storage


def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)
    print('Blob {} downloaded to {}.'.format(
        source_blob_name,
        destination_file_name))


download_blob('my_bucket', 'my_file.csv', 'local/path/to/file.csv')