Create file in Google Cloud Storage with python - google-cloud-storage

This is the method that I used to save a new file in Google Cloud Storage:
cloud_storage_path = "/gs/[my_app_name].appspot.com/%s/%s" % (user_key.id(), img_title)
blobstore_key = blobstore.create_gs_key(cloud_storage_path)
cloud_storage_file = cloudstorage_api.open(
    filename=cloud_storage_path, mode="w", content_type=img_type
)
cloud_storage_file.write(img_content)
cloud_storage_file.close()
But when I execute this method, the log shows:
Path should have format /bucket/filename but got /gs/[my_app_name].appspot.com/6473924464345088/background.jpg
PS: I replaced the real name with [my_app_name]; [my_app_name].appspot.com is my bucket name.
What should I do next in this case? I cannot save the file to that path.
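The error message suggests that the two APIs expect different path formats: blobstore.create_gs_key takes a path that starts with /gs/, while the cloudstorage client's open takes /bucket/filename without that prefix. A minimal sketch of building both paths from the same parts follows; the helper name build_gcs_paths is hypothetical and not part of either library.

```python
def build_gcs_paths(bucket_name, object_name):
    """Return (cloudstorage_path, blobstore_path) for one object.

    Hypothetical helper for illustration only.
    """
    # Path in the form the error message asks for: /bucket/filename
    cloudstorage_path = "/%s/%s" % (bucket_name, object_name)
    # Same object with the /gs prefix, as passed to create_gs_key
    blobstore_path = "/gs" + cloudstorage_path
    return cloudstorage_path, blobstore_path


gcs_path, gs_key_path = build_gcs_paths(
    "[my_app_name].appspot.com", "6473924464345088/background.jpg"
)
print(gcs_path)
print(gs_key_path)
```

Under this assumption, gcs_path would go to cloudstorage_api.open and gs_key_path to blobstore.create_gs_key.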

Related

MSSQLToGCSOperator how to create empty file into bucket

We upgraded Composer (1.19.13) and Airflow (2.3.3) on GCP.
In our previous version, if the query returned no data, MSSQLToGCSOperator still created an empty file. Now, when the query returns no data, no file (not even an empty one) is created in the GCS bucket, and the following task fails with a file-not-found error.
I am using this code:
mssql_to_gcs = MSSQLToGCSOperator(
    task_id='MYSQL_TO_GCS_{0}'.format(TABLE_NAME),
    mssql_conn_id='con_mssql_dba_prd',
    gcp_conn_id='google_cloud_storage_default',
    sql='select_{0}.sql'.format(TABLE_NAME),
    bucket=SOURCE_BUCKET,
    filename='composer/{0}/gcp_{0}{1}.json'.format(TABLE_NAME, DATE_FORMAT),
    dag=dag
)
using this operator:
from airflow.providers.google.cloud.transfers.mssql_to_gcs import MSSQLToGCSOperator
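One possible workaround, sketched under assumptions rather than taken from the operator's documentation, is to run a small task after the export that creates a zero-byte object when the expected file is missing. The function names ensure_object_exists and ensure_export_file are hypothetical; the sketch assumes a google-cloud-storage Bucket object and could be called from a PythonOperator.

```python
def ensure_object_exists(bucket, object_name, content_type="application/json"):
    """Create an empty object if `object_name` is missing from `bucket`.

    `bucket` is assumed to be a google.cloud.storage.Bucket.
    Returns True if an empty object was created, False if one already existed.
    """
    blob = bucket.blob(object_name)
    if blob.exists():
        return False
    # Uploading an empty string creates a zero-byte object
    blob.upload_from_string("", content_type=content_type)
    return True


def ensure_export_file(bucket_name, object_name):
    """Example entry point, e.g. the python_callable of a PythonOperator."""
    # Imported lazily so the helper above can be tried without the library
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    return ensure_object_exists(bucket, object_name)
```

The downstream task then always finds a file, and can treat an empty one as "no rows".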

Google Storage Python ACL Update not Working

I have uploaded one image file to my google storage bucket.
#Block 1
#Storing the local file inside the bucket
blob_response = bucket.blob(cloud_path)
blob_response.upload_from_filename(local_path, content_type='image/png')
The file uploads fine; I verified that it is in the bucket.
After uploading the file, in the same method, I try to update the ACL so that the file is publicly accessible:
#Block 2
blob_file = storage.Blob(bucket=bucket20, name=path_in_bucket)
acl = blob_file.acl
acl.all().grant_read()
acl.save()
This does not make the file public.
The strange thing is that if I run the upload method above and then run the #Block 2 code separately in a Jupyter notebook, it works and the file becomes publicly available.
I have tried:
Checking that the blob exists in the bucket after the upload code.
Introducing a 5-second delay after the upload.
Any help is appreciated.
If you are making the file uploaded with upload_from_filename() public, you can reuse the blob from your upload. Also, reload the ACL before changing the permission. This was all done in one block in a Jupyter notebook on GCP AI Platform.
# Block 1
from google.cloud import storage

bucket_name = "your-bucket"
destination_blob_name = "test.txt"
source_file_name = "/home/jupyter/test.txt"
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(source_file_name)
print(blob)  # prints the bucket and the uploaded file
blob.acl.reload()  # reload the ACL of the blob
acl = blob.acl
acl.all().grant_read()
acl.save()
for entry in acl:
    print("{}: {}".format(entry["role"], entry["entity"]))
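As an alternative sketch, the blob returned by the upload can also be made public with Blob.make_public(), which grants read access to all users and saves the ACL in one call. The function name upload_public is hypothetical, and the sketch assumes the bucket uses fine-grained (ACL-based) access control rather than uniform bucket-level access.

```python
def upload_public(bucket, local_path, cloud_path, content_type=None):
    """Upload a local file and make the same blob publicly readable.

    `bucket` is assumed to be a google.cloud.storage.Bucket.
    Returns the public URL of the uploaded object.
    """
    blob = bucket.blob(cloud_path)
    blob.upload_from_filename(local_path, content_type=content_type)
    # Reuse the blob from the upload instead of constructing a new one
    blob.make_public()
    return blob.public_url


# Usage (needs credentials and an existing bucket):
# from google.cloud import storage
# bucket = storage.Client().bucket("your-bucket")
# print(upload_public(bucket, "/home/jupyter/test.txt", "test.txt"))
```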

Reading file content from Cloud Storage

I have been trying to read the contents of a JSON file from Google Cloud Storage and encountered an error. Here is my code:
from google.cloud import storage
import json
client = storage.Client()
bucket = client.get_bucket('bucket_name')
blob = bucket.get_blob('file.json')
u = blob.download_as_string()
print(u)
I see the following error:
TypeError: request() got an unexpected keyword argument 'data'
I'm pretty much lost. Help is appreciated.
You don't have to import the Client(), you just have to declare it like
client = storage.Client().
Use the code below to load the JSON file from Google Cloud Storage bucket. I have tested it myself and it is working.
from google.cloud import storage
client = storage.Client()
BUCKET_NAME = '[BUCKET_NAME]'
FILE_PATH = '[path/to/file.json]'
bucket = client.get_bucket(BUCKET_NAME)
blob = bucket.get_blob(FILE_PATH)
print('The JSON file is: ')
print(blob.download_as_string())
Replace [BUCKET_NAME] with your bucket's name and [path/to/file.json] with the path where your JSON file is located inside the bucket.
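To work with the parsed data rather than the raw bytes, the downloaded content can be passed to json.loads. A minimal sketch, assuming a google-cloud-storage version that provides Blob.download_as_text (newer releases deprecate download_as_string in favor of download_as_bytes and download_as_text); load_json_blob is a hypothetical helper name.

```python
import json


def load_json_blob(bucket, path):
    """Download `path` from `bucket` and parse it as JSON.

    `bucket` is assumed to be a google.cloud.storage.Bucket.
    """
    blob = bucket.get_blob(path)
    # get_blob returns None when the object does not exist
    if blob is None:
        raise FileNotFoundError("no such object: %s" % path)
    return json.loads(blob.download_as_text())


# Usage (needs credentials and an existing bucket):
# from google.cloud import storage
# bucket = storage.Client().get_bucket("[BUCKET_NAME]")
# data = load_json_blob(bucket, "[path/to/file.json]")
```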

Deleting all blobs inside a path prefix using google cloud storage API

I am using the Google Cloud Storage Python API. I came across a situation where I need to delete a folder that might contain hundreds of files. Is there an efficient way to do it without making recursive and multiple delete calls?
One solution that I have is to list all blob objects in the bucket with given path prefix and delete them one by one.
The other solution is to use gsutil:
$ gsutil rm -R gs://bucket/path
Try something like this:
bucket = storage.Client().bucket(bucket_name)
# list_blobs returns an iterator; the prefix filter is applied by the service
for blob in bucket.list_blobs(prefix='path/'):
    blob.delete()
And if you want to delete the contents of a bucket instead of a folder within a bucket you can do it in a single method call as such:
bucket = storage.Client().bucket(bucket_name)
# materialize the iterator into a list before passing it on
bucket.delete_blobs(list(bucket.list_blobs()))
from google.cloud import storage

def deleteStorageFolder(bucketName, folder):
    """
    This function deletes a folder from GCP Storage
    :param bucketName: The name of the bucket that contains the folder
    :param folder: Folder name to be deleted
    :return: returns nothing
    """
    cloudStorageClient = storage.Client()
    bucket = cloudStorageClient.bucket(bucketName)
    try:
        bucket.delete_blobs(blobs=list(bucket.list_blobs(prefix=folder)))
    except Exception as e:
        print(str(e))
In this case folder = "path"
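One detail worth keeping in mind: the prefix is a plain string match on object names, so prefix="path" also matches objects such as path2/file.txt. A minimal sketch that appends a trailing slash before listing follows; delete_folder is a hypothetical name and the bucket is assumed to be a google-cloud-storage Bucket.

```python
def delete_folder(bucket, folder):
    """Delete every object under `folder/` and return how many were deleted."""
    # Normalize so that "path" and "path/" both become the prefix "path/"
    prefix = folder.strip("/") + "/"
    deleted = 0
    for blob in bucket.list_blobs(prefix=prefix):
        blob.delete()
        deleted += 1
    return deleted


# Usage (needs credentials and an existing bucket):
# from google.cloud import storage
# bucket = storage.Client().bucket("your-bucket")
# print(delete_folder(bucket, "path"))
```

This still issues one delete request per object, so it trades speed for a narrower match.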

mount S3 to databricks

I'm trying to understand how mount works. I have an S3 bucket named myB, and a folder in it called test. I did a mount using
var AwsBucketName = "myB"
val MountName = "myB"
My question is: does it create a link between S3 myB and Databricks, and would Databricks access all the files, including the files under the test folder? (Or if I do a mount using var AwsBucketName = "myB/test", does it only link Databricks to the folder test and not to any other files outside of that folder?)
If so, how do I list the files in the test folder, read a file, or count() a CSV file in Scala? I did a display(dbutils.fs.ls("/mnt/myB")) and it only shows the test folder but not the files in it. I'm quite new here. Many thanks for your help!
From the Databricks documentation:
// Replace with your values
val AccessKey = "YOUR_ACCESS_KEY"
// Encode the Secret Key as that can contain "/"
val SecretKey = "YOUR_SECRET_KEY".replace("/", "%2F")
val AwsBucketName = "MY_BUCKET"
val MountName = "MOUNT_NAME"
dbutils.fs.mount(s"s3a://$AccessKey:$SecretKey@$AwsBucketName", s"/mnt/$MountName")
display(dbutils.fs.ls(s"/mnt/$MountName"))
If you are unable to see files in your mounted directory, it is possible that you have created a directory under /mnt that is not a link to the S3 bucket. If that is the case, try deleting the directory (dbutils.fs.rm) and remounting using the above code sample. Note that you will need your AWS credentials (AccessKey and SecretKey above). If you don't know them, you will need to ask your AWS account admin for them.
It only lists the folders and files directly under the bucket.
In S3
<bucket-name>/<Files & Folders>
In Databricks
/mnt/<MOUNT-NAME>/<Bucket-Data-List>
Just like below (Output for dbutils.fs.ls(s"/mnt/$MountName"))
dbfs:/mnt/<MOUNT-NAME>/Folder/
dbfs:/mnt/<MOUNT-NAME>/file1.csv
dbfs:/mnt/<MOUNT-NAME>/file2.csv
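Because dbutils.fs.ls only returns one level, seeing the files inside test means listing /mnt/myB/test directly or walking the tree. A minimal sketch in Python (Databricks notebooks expose dbutils there as well), assuming each entry returned by ls has a path attribute and an isDir() method; list_files is a hypothetical helper and takes the ls function as a parameter so it can be tried outside a notebook.

```python
def list_files(ls, path):
    """Recursively collect the paths of all files below `path`.

    `ls` is a function such as dbutils.fs.ls.
    """
    files = []
    for entry in ls(path):
        if entry.isDir():
            # Descend into sub-folders such as /mnt/myB/test/
            files.extend(list_files(ls, entry.path))
        else:
            files.append(entry.path)
    return files


# In a Databricks notebook (assumed usage):
# for f in list_files(dbutils.fs.ls, "/mnt/myB"):
#     print(f)
```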