Reading file content from Cloud Storage - google-cloud-storage

I have been trying to read the contents of a JSON file from Google Cloud Storage and encounter an error. Here is my code:
from google.cloud import storage
import json

client = storage.Client()
bucket = client.get_bucket('bucket_name')
blob = bucket.get_blob('file.json')
u = blob.download_as_string()
print(u)
I see the following error:
TypeError: request() got an unexpected keyword argument 'data'
I'm pretty much lost. Help is appreciated.

You don't have to import Client() on its own; you just have to instantiate it like
client = storage.Client().
Use the code below to load the JSON file from a Google Cloud Storage bucket. I have tested it myself and it is working.
from google.cloud import storage
client = storage.Client()
BUCKET_NAME = '[BUCKET_NAME]'
FILE_PATH = '[path/to/file.json]'
bucket = client.get_bucket(BUCKET_NAME)
blob = bucket.get_blob(FILE_PATH)
print('The JSON file is: ')
print(blob.download_as_string())
Replace [BUCKET_NAME] with your bucket's name and [path/to/file.json] with the path where your JSON file is located inside the bucket.
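If you want the parsed JSON rather than the raw bytes, a minimal sketch along the same lines (bucket name and file path are placeholders, as above):
from google.cloud import storage
import json

client = storage.Client()
bucket = client.get_bucket('[BUCKET_NAME]')
blob = bucket.get_blob('[path/to/file.json]')

# download_as_string() returns bytes; decode and parse them into a Python dict.
data = json.loads(blob.download_as_string().decode('utf-8'))
print(data)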

Related

Why is "gs" is not defined in a python script accessing google cloud storage?

The purpose of the script is to import products to Google Vision API product search by using a CSV file containing information about the products. In this example the Cloud Storage address is the default one given in the documentation (https://cloud.google.com/vision/product-search/docs/create-product-set-search-products#using_a_dataset, https://cloud.google.com/vision/product-search/docs/create-product-set#using_bulk_import_to_create_a_product_set_with_products). When I try to set the csv_file_uri variable, VS Code states that "gs" is not defined, despite google cloud storage being imported.
from google.cloud import vision
from google.protobuf import field_mask_pb2 as field_mask
from google.cloud import storage
# """Import images of different products in the product set.
# Args:
#     project_id: Id of the project.
#     location: A compute region name.
#     gcs_uri: Google Cloud Storage URI.
#         Target files must be in Product Search CSV format.
# """
client = vision.ProductSearchClient()
# A resource that represents Google Cloud Platform location.
location_path = f"projects/[project id]/locations/europe-west1"
# Set the input configuration along with Google Cloud Storage URI
############ error is on the 2 lines below
gcs_source = vision.ImportProductSetsGcsSource(
    csv_file_uri=gs://cloud-samples-data/vision/product_search/product_catalog.csv)
########################
input_config = vision.ImportProductSetsInputConfig(
    gcs_source=gcs_source)
# Import the product sets from the input URI.
response = client.import_product_sets(
    parent=location_path, input_config=input_config)
print('Processing operation name: {}'.format(response.operation.name))
# synchronous check of operation status
result = response.result()
print('Processing done.')
for i, status in enumerate(result.statuses):
    print('Status of processing line {} of the csv: {}'.format(i, status))
    # Check the status of reference image
    # `0` is the code for OK in google.rpc.Code.
    if status.code == 0:
        reference_image = result.reference_images[i]
        print(reference_image)
    else:
        print('Status code not OK: {}'.format(status.message))
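A likely cause, given that message, is that the gs:// URI on the flagged line is not quoted, so Python treats gs as an undefined name. A minimal sketch of the fix, assuming the same sample CSV and leaving everything else unchanged:
from google.cloud import vision

# Quote the URI so it is a plain string literal rather than Python syntax.
gcs_source = vision.ImportProductSetsGcsSource(
    csv_file_uri="gs://cloud-samples-data/vision/product_search/product_catalog.csv")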

moving local data to google cloud bucket using python api

I can move local data to Google Storage buckets using the following:
gsutil cp afile.txt gs://my-bucket
How do I do the same using the Python API client library?
from google.cloud import storage
storage_client = storage.Client()
# Make an authenticated API request
buckets = list(storage_client.list_buckets())
print(buckets)
I can't find anything more than the above.
There is an API Client Library code sample here. My code typically looks like the below, which is a slight variant on the code they provide:
from google.cloud import storage
client = storage.Client(project='<myprojectname>')
mybucket = storage.bucket.Bucket(client=client, name='mybucket')
mydatapath = r'C:\whatever\something' + '\\'  # raw string so the backslashes are not treated as escapes
blob = mybucket.blob('afile.txt')
blob.upload_from_filename(mydatapath + 'afile.txt')
In case it is of interest, another method is to run the "gsutil" command line as you have typed it in your original post, using the subprocess module, e.g.:
import subprocess
subprocess.call("gsutil cp afile.txt gs://mybucket/", shell=True)
In my view, there are pros and cons to both methods depending on what you are trying to achieve: the latter method allows multi-threading if you have many files to upload, whereas the former perhaps allows better control, specification of metadata for each file, etc.
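For example, a rough sketch of the multi-threaded variant using gsutil's -m flag (the local path and bucket name are only placeholders):
import subprocess

# -m runs the copy in parallel, which helps when uploading many files;
# -r recurses into the local folder. Paths here are illustrative only.
subprocess.call("gsutil -m cp -r C:\\whatever\\something gs://mybucket/", shell=True)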

Deleting all blobs inside a path prefix using google cloud storage API

I am using the google cloud storage Python API. I came across a situation where I need to delete a folder that might have hundreds of files using the API. Is there an efficient way to do it without making recursive and multiple delete calls?
One solution that I have is to list all blob objects in the bucket with given path prefix and delete them one by one.
The other solution is to use gsutil:
$ gsutil rm -R gs://bucket/path
Try something like this:
bucket = storage.Client().bucket(bucket_name)
# Listing with a prefix only returns the blobs under that "folder".
for blob in bucket.list_blobs(prefix='path/'):
    blob.delete()
And if you want to delete the contents of a bucket instead of a folder within a bucket, you can do it in a single method call, like so:
bucket = storage.Client().bucket(bucket_name)
bucket.delete_blobs(bucket.list_blobs())
from google.cloud import storage

def deleteStorageFolder(bucketName, folder):
    """
    This function deletes from GCP Storage
    :param bucketName: The bucket name in which the file is to be placed
    :param folder: Folder name to be deleted
    :return: returns nothing
    """
    cloudStorageClient = storage.Client()
    bucket = cloudStorageClient.bucket(bucketName)
    try:
        bucket.delete_blobs(blobs=bucket.list_blobs(prefix=folder))
    except Exception as e:
        print(str(e))
In this case folder = "path"

mount S3 to databricks

I'm trying to understand how mount works. I have an S3 bucket named myB, and a folder in it called test. I did a mount using
var AwsBucketName = "myB"
val MountName = "myB"
My question is: does it create a link between S3 myB and Databricks, and would Databricks access all the files, including the files under the test folder? (Or if I do a mount using var AwsBucketName = "myB/test", does it only link Databricks to that folder test but not any other files outside of that folder?)
If so, how do I list files in the test folder, read a file, or count() a CSV file in Scala? I did a display(dbutils.fs.ls("/mnt/myB")) and it only shows the test folder but not the files in it. Quite new here. Many thanks for your help!
From the Databricks documentation:
// Replace with your values
val AccessKey = "YOUR_ACCESS_KEY"
// Encode the Secret Key as that can contain "/"
val SecretKey = "YOUR_SECRET_KEY".replace("/", "%2F")
val AwsBucketName = "MY_BUCKET"
val MountName = "MOUNT_NAME"
dbutils.fs.mount(s"s3a://$AccessKey:$SecretKey#$AwsBucketName", s"/mnt/$MountName")
display(dbutils.fs.ls(s"/mnt/$MountName"))
If you are unable to see files in your mounted directory, it is possible that you have created a directory under /mnt that is not a link to the S3 bucket. If that is the case, try deleting the directory (dbutils.fs.rm) and remounting using the above code sample. Note that you will need your AWS credentials (AccessKey and SecretKey above). If you don't know them, you will need to ask your AWS account admin for them.
It only lists the folders and files directly under the bucket.
In S3
<bucket-name>/<Files & Folders>
In Databricks
/mnt/<MOUNT-NAME>/<Bucket-Data-List>
Just like below (Output for dbutils.fs.ls(s"/mnt/$MountName"))
dbfs:/mnt/<MOUNT-NAME>/Folder/
dbfs:/mnt/<MOUNT-NAME>/file1.csv
dbfs:/mnt/<MOUNT-NAME>/file2.csv

Create file in Google Cloud Storage with python

This is the method that I used to save a new file in Google Cloud Storage:
cloud_storage_path = "/gs/[my_app_name].appspot.com/%s/%s" % (user_key.id(), img_title)
blobstore_key = blobstore.create_gs_key(cloud_storage_path)
cloud_storage_file = cloudstorage_api.open(
    filename=cloud_storage_path, mode="w", content_type=img_type
)
cloud_storage_file.write(img_content)
cloud_storage_file.close()
But when I execute this method, the log file prints:
Path should have format /bucket/filename but got /gs/[my_app_name].appspot.com/6473924464345088/background.jpg
PS: I redacted [my_app_name], and [my_app_name].appspot.com is my bucket name.
So, what should I do next in this case?
I cannot save the file to that path.
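A hedged sketch of one likely fix, reusing the names from the snippet above: the App Engine cloudstorage library expects paths of the form /bucket/object (no /gs prefix), while blobstore.create_gs_key expects the /gs/bucket/object form, so the two formats need to be kept separate.
# cloudstorage_api.open() wants "/<bucket>/<object>", while
# blobstore.create_gs_key() wants "/gs/<bucket>/<object>".
gcs_path = "/[my_app_name].appspot.com/%s/%s" % (user_key.id(), img_title)
blobstore_key = blobstore.create_gs_key("/gs" + gcs_path)
cloud_storage_file = cloudstorage_api.open(
    filename=gcs_path, mode="w", content_type=img_type
)
cloud_storage_file.write(img_content)
cloud_storage_file.close()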