I have uploaded one image file to my Google Storage bucket.
#Block 1
#Storing the local file inside the bucket
blob_response = bucket.blob(cloud_path)
blob_response.upload_from_filename(local_path, content_type='image/png')
The file gets uploaded fine; I can verify the file in the bucket.
After uploading the file, in the same method, I try to update the ACL to make the file publicly accessible:
#Block 2
blob_file = storage.Blob(bucket=bucket20, name=path_in_bucket)
acl = blob_file.acl
acl.all().grant_read()
acl.save()
This does not make the file public.
The strange thing is that, after I run the above upload method, if I run the #Block 2 code separately in a Jupyter notebook, it works fine and the file becomes publicly available.
I have tried:
Checking the existence of the blob in the bucket after the upload code.
Introducing a 5-second delay after the upload.
Any help is appreciated.
If you are making the file uploaded via upload_from_filename() public, you can reuse the blob from your upload. Also, reload the ACL before changing the permissions. The following was all done in one block in a Jupyter Notebook on GCP AI Platform.
# Block 1
bucket_name = "your-bucket"
destination_blob_name = "test.txt"
source_file_name = "/home/jupyter/test.txt"
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(source_file_name)
print(blob)  # prints the bucket and the uploaded file
blob.acl.reload() # reload the ACL of the blob
acl = blob.acl
acl.all().grant_read()
acl.save()
for entry in acl:
    print("{}: {}".format(entry["role"], entry["entity"]))
I have uploaded thousands of files to Google Storage, and I found out that all of the files are missing a content type, so my website cannot serve them correctly.
I wonder if I can set some kind of policy to change the content type of all the files at the same time. For example, I have a bunch of .html files inside the bucket:
a/b/index.html
a/c/a.html
a/c/a/b.html
a/a.html
.
.
.
Is it possible to set the content type of all the .html files, in their different locations, with one command?
You could do:
gsutil -m setmeta -h Content-Type:text/html gs://your-bucket/**.html
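If you would rather do it from Python, a minimal sketch with the google-cloud-storage client could look like this (the bucket name is a placeholder; it rewrites the Content-Type of every .html object):
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-bucket")  # placeholder bucket name
for blob in bucket.list_blobs():
    if blob.name.endswith(".html"):
        blob.content_type = "text/html"
        blob.patch()  # push the metadata change to GCS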
There's no single command to achieve exactly the behavior you are looking for (one command to edit all the objects' metadata); however, there's a gsutil command to edit metadata which you could use in a bash script to loop through all the objects inside the bucket.
1.- Option (1) is to use the gsutil command "setmeta" in a bash script:
# Get the list of all object URLs in the bucket and iterate, editing the metadata of each one.
for OBJECT in $(gsutil ls gs://[BUCKET_NAME]/**)
do
  # [METADATA_KEY]:[METADATA_VALUE] would be e.g. Content-Type:text/html
  gsutil setmeta -h "[METADATA_KEY]:[METADATA_VALUE]" "$OBJECT"
done
2.- You could also create a C++ script to achieve the same thing:
namespace gcs = google::cloud::storage;
using ::google::cloud::StatusOr;
[](gcs::Client client, std::string bucket_name,
   std::string key, std::string value) {
  // List all the objects; inside the loop you can edit the metadata of each one.
  for (auto&& object_metadata : client.ListObjects(bucket_name)) {
    if (!object_metadata) continue;  // skip objects that failed to list
    std::string object_name = object_metadata->name();
    gcs::ObjectMetadata desired = *object_metadata;
    desired.mutable_metadata().emplace(key, value);
    StatusOr<gcs::ObjectMetadata> updated =
        client.UpdateObject(bucket_name, object_name, desired,
                            gcs::Generation(object_metadata->generation()));
  }
}
I am using the Google Cloud Storage Python API. I came across a situation where I need to delete a folder that might have hundreds of files using the API. Is there an efficient way to do it without making recursive and multiple delete calls?
One solution that I have is to list all blob objects in the bucket with the given path prefix and delete them one by one.
The other solution is to use gsutil:
$ gsutil rm -R gs://bucket/path
Try something like this:
bucket = storage.Client().bucket(bucket_name)
blobs = bucket.list_blobs()
for blob in blobs:
    # GCS object names have no leading slash
    if blob.name.startswith('path/'):
        blob.delete()
And if you want to delete the contents of a bucket instead of a folder within it, you can do it in a single method call:
bucket = storage.Client().bucket(bucket_name)
bucket.delete_blobs(bucket.list_blobs())
from google.cloud import storage

def deleteStorageFolder(bucketName, folder):
    """
    This function deletes a folder from GCP Storage
    :param bucketName: The name of the bucket that contains the folder
    :param folder: Folder (prefix) to be deleted
    :return: returns nothing
    """
    cloudStorageClient = storage.Client()
    bucket = cloudStorageClient.bucket(bucketName)
    try:
        bucket.delete_blobs(blobs=bucket.list_blobs(prefix=folder))
    except Exception as e:
        print(str(e))
In this case folder = "path"
I'm trying to understand how mount works. I have an S3 bucket named myB, and a folder in it called test. I did a mount using
var AwsBucketName = "myB"
val MountName = "myB"
My question is: does it create a link between S3 myB and Databricks, and would Databricks access all the files, including the files under the test folder? (Or if I do a mount using var AwsBucketName = "myB/test", does it only link Databricks to that test folder but not any other files outside of that folder?)
If so, how do I, say, list the files in the test folder, read a file, or count() a CSV file in Scala? I did display(dbutils.fs.ls("/mnt/myB")) and it only shows the test folder but not the files in it. Quite new here. Many thanks for your help!
From the Databricks documentation:
// Replace with your values
val AccessKey = "YOUR_ACCESS_KEY"
// Encode the Secret Key as that can contain "/"
val SecretKey = "YOUR_SECRET_KEY".replace("/", "%2F")
val AwsBucketName = "MY_BUCKET"
val MountName = "MOUNT_NAME"
dbutils.fs.mount(s"s3a://$AccessKey:$SecretKey@$AwsBucketName", s"/mnt/$MountName")
display(dbutils.fs.ls(s"/mnt/$MountName"))
If you are unable to see files in your mounted directory, it is possible that you have created a directory under /mnt that is not a link to the S3 bucket. If that is the case, try deleting the directory (dbutils.fs.rm) and remounting using the above code sample. Note that you will need your AWS credentials (AccessKey and SecretKey above). If you don't know them, you will need to ask your AWS account admin for them.
It only lists the folders and files directly under the bucket.
In S3
<bucket-name>/<Files & Folders>
In Databricks
/mnt/<MOUNT-NAME>/<Bucket-Data-List>
Just like below (output of dbutils.fs.ls(s"/mnt/$MountName")):
dbfs:/mnt/<MOUNT-NAME>/Folder/
dbfs:/mnt/<MOUNT-NAME>/file1.csv
dbfs:/mnt/<MOUNT-NAME>/file2.csv
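To get at the files inside the test folder from the question, you can simply address the subfolder through the mount path. A minimal sketch in a Python (PySpark) cell, assuming the mount name myB from the question and a hypothetical CSV at test/data.csv (the equivalent dbutils and spark calls also work from a Scala cell):
# List the files inside the test folder of the mounted bucket
display(dbutils.fs.ls("/mnt/myB/test"))

# Read a CSV from the mounted subfolder and count its rows
df = spark.read.option("header", "true").csv("/mnt/myB/test/data.csv")
print(df.count())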
This is the method that I used to save a new file in Google Cloud Storage:
cloud_storage_path = "/gs/[my_app_name].appspot.com/%s/%s" % (user_key.id(), img_title)
blobstore_key = blobstore.create_gs_key(cloud_storage_path)
cloud_storage_file = cloudstorage_api.open(
    filename=cloud_storage_path, mode="w", content_type=img_type
)
cloud_storage_file.write(img_content)
cloud_storage_file.close()
But when I execute this method, the log prints:
Path should have format /bucket/filename but got /gs/[my_app_name].appspot.com/6473924464345088/background.jpg
PS: I redacted [my_app_name], and [my_app_name].appspot.com is my bucket name.
So, what should I do next in this case?
I cannot save the file to that path.
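The error message points at the path format: the cloudstorage API expects /bucket/filename, while only blobstore.create_gs_key() wants the /gs/ prefix. A minimal sketch that keeps the two formats separate, reusing the placeholder names from the question:
# /bucket/filename form for the cloudstorage API
cloud_storage_path = "/[my_app_name].appspot.com/%s/%s" % (user_key.id(), img_title)
# /gs/bucket/filename form only for the blobstore key
blobstore_key = blobstore.create_gs_key("/gs" + cloud_storage_path)

cloud_storage_file = cloudstorage_api.open(
    filename=cloud_storage_path, mode="w", content_type=img_type
)
cloud_storage_file.write(img_content)
cloud_storage_file.close()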
Can I upload files into a specific folder in an S3 bucket rather than just uploading into the base folder of the bucket?
Yes, use the "path" parameter on the filepicker.store call.
filepicker.store(fpfile, {location: 'S3', path: 'myfolder/file.png'},
  function(stored_fpfile){
    console.log(stored_fpfile);
  });
Documentation at https://developers.filepicker.io/docs/web/#store