I am using Google App Engine PHP SDK.
Google Cloud Storage lets users check a "publicly shared?" box in the storage manager, which exposes a direct URL to the data.
I'm using Google App Engine and sending data to storage, but I would like uploads to be publicly shared by default.
This is the code I am using to upload files:
require_once 'google/appengine/api/cloud_storage/CloudStorageTools.php';
use google\appengine\api\cloud_storage\CloudStorageTools;
$options = [ 'gs_bucket_name' => 'my_bucket' ];
$upload_url = CloudStorageTools::createUploadUrl('/test.php', $options);
$gs_name = $_FILES['sample']['tmp_name'];
move_uploaded_file($gs_name, 'gs://my_bucket/' . $_FILES['sample']['name']);
How can I do this? Their docs do not seem to mention anything about this, other than doing it manually.
You can define the permissions that your new uploaded files will have by default with the command:
gsutil defacl ch -u allUsers:R gs://<bucket>
And after you upload your files with your code they should be publicly shared.
Visit the following links for more information about the command:
https://cloud.google.com/storage/docs/gsutil/commands/defacl
https://cloud.google.com/storage/docs/gsutil/commands/acl
Hope it helps.
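Once the default ACL grants allUsers read access, each newly uploaded object is reachable at a predictable public URL. A minimal sketch of building that link (Python; the bucket and object names here are hypothetical):

```python
import urllib.parse

def public_url(bucket, object_name):
    """Build the public download URL for an object in a publicly readable bucket."""
    return "https://storage.googleapis.com/{}/{}".format(
        bucket, urllib.parse.quote(object_name))

print(public_url("my_bucket", "uploads/test image.png"))
```

Note that urllib.parse.quote keeps "/" unescaped by default, so nested "directory" prefixes survive in the URL.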
I want to use a dataset (170+GB) in Google Colab. I have two questions:
Since the available space in Colab is about 66GB, is there a way to use the data from GCS directly in colab, if the data is hosted in GCS? If not, what is a possible solution?
How can I upload the dataset to GCS directly from a downloadable link, since I cannot wget into colab due to the limited available space?
Any help is appreciated.
Authenticate:
from google.colab import auth
auth.authenticate_user()
Install the Google Cloud SDK:
!curl https://sdk.cloud.google.com | bash
Initialize the SDK to configure the project settings:
!gcloud init
1. Download a file from Cloud Storage to Google Colab:
!gsutil cp gs://your-bucket/your-file.csv .
2. Upload a file from Google Colab to Cloud Storage:
!gsutil cp your-file.csv gs://your-bucket/
Hope it helps.
I have a working example that uses tf.io.gfile.copy (doc).
import tensorflow as tf
# Getting file names based on patterns
gcs_pattern = 'gs://flowers-public/tfrecords-jpeg-331x331/*.tfrec'
filenames = tf.io.gfile.glob(gcs_pattern)
#['gs://flowers-public/tfrecords-jpeg-331x331/flowers00-230.tfrec',
# 'gs://flowers-public/tfrecords-jpeg-331x331/flowers01-230.tfrec',
# 'gs://flowers-public/tfrecords-jpeg-331x331/flowers02-230.tfrec',
#...
# Downloading the first file
origin = filenames[0]
dest = origin.split("/")[-1]
tf.io.gfile.copy(origin, dest)
After that if I run the ls command, I can see the file (flowers00-230.tfrec).
In some cases you may need authentication (from G.MAHESH's answer):
from google.colab import auth
auth.authenticate_user()
I would like to download publicly available data from Google Cloud Storage. However, because I need to be in a Python 3.x environment, it is not possible to use gsutil. I can download individual files with wget as:
wget http://storage.googleapis.com/path-to-file/output_filename -O output_filename
However, commands like
wget -r --no-parent https://console.cloud.google.com/path_to_directory/output_directoryname -O output_directoryname
do not seem to work; they just download an index file for the directory. Initial attempts with rsync and curl did not work either. Any idea how to download publicly available data on Google Cloud Storage as a directory?
The approach you mentioned above does not work because Google Cloud Storage doesn't have real "directories". As an example, "path/to/some/files/file.txt" is the entire name of that object. A similarly named object, "path/to/some/files/file2.txt", just happens to share the same naming prefix.
As for how you could fetch these files: The GCS APIs (both XML and JSON) allow you to do an object listing against the parent bucket, specifying a prefix; in this case, you'd want all objects starting with the prefix "path/to/some/files/". You could then make individual HTTP requests for each of the objects specified in the response body. That being said, you'd probably find this much easier to do via one of the GCS client libraries, such as the Python library.
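To make that listing step concrete, here is a rough sketch of building the JSON API listing request for a prefix. The bucket and prefix names are placeholders; for a public bucket, the (commented) HTTP call would return a JSON body whose items carry the object names:

```python
import urllib.parse

def listing_url(bucket, prefix):
    """Build a GCS JSON API URL listing objects whose names start with prefix."""
    return ("https://storage.googleapis.com/storage/v1/b/"
            + urllib.parse.quote(bucket, safe="")
            + "/o?" + urllib.parse.urlencode({"prefix": prefix}))

print(listing_url("my-public-bucket", "path/to/some/files/"))

# For a public bucket you could then fetch the listing with the standard library:
# import json, urllib.request
# with urllib.request.urlopen(listing_url("my-public-bucket", "path/to/some/files/")) as resp:
#     names = [item["name"] for item in json.load(resp).get("items", [])]
# ...and download each object from https://storage.googleapis.com/<bucket>/<name>
```

A client library hides the pagination and escaping details, which is why it is usually the easier route.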
Also, gsutil currently has a GitHub issue open to track adding support for Python 3.
I have uploaded a file to Google Cloud Storage via their PHP API.
I am now trying to use the following code to access it:
$bucket = $storage->bucket('my-bucket');
$object = $bucket->object('filename.json');
$string = $object->downloadAsString();
echo $string;
I am just trying to retrieve a JSON file, but the problem is that it's getting cached and keeps giving me the old file, which I modified 30 minutes ago.
How do I disable caching while using downloadAsString? Something like:
https://storage.googleapis.com/my-bucket/filename.json?id=randomtimestamp
P.S.: I know disabling caching while uploading the JSON would work, but I just want to disable caching in a few of my PHP scripts.
I have an app that uploads photos regularly to a GCS bucket. When those photos are uploaded, I need to add thumbnails and do some analysis. How do I set up notifications for the bucket?
The way to do this is to create a Cloud Pub/Sub topic for new objects and to configure your GCS bucket to publish messages to that topic when new objects are created.
First, let's create a bucket PHOTOBUCKET:
$ gsutil mb gs://PHOTOBUCKET
Now, make sure you've activated the Cloud Pub/Sub API.
Next, let's create a Cloud Pub/Sub topic and wire it to our GCS bucket with gsutil:
$ gsutil notification create \
-t uploadedphotos -f json \
-e OBJECT_FINALIZE gs://PHOTOBUCKET
The -t specifies the Pub/Sub topic. If the topic doesn't already exist, gsutil will create it for you.
The -e specifies that you're only interested in OBJECT_FINALIZE messages (objects being created). Otherwise you'll get every kind of message in your topic.
The -f specifies that you want the payload of the messages to be the object metadata for the JSON API.
Note that this requires a recent version of gsutil, so be sure to update to the latest version of gcloud, or run gsutil update if you use a standalone gsutil.
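With -f json, each Pub/Sub message carries the new object's metadata as its payload, and the event details ride along as message attributes. An illustrative, abridged message (all values hypothetical):

```json
{
  "attributes": {
    "eventType": "OBJECT_FINALIZE",
    "bucketId": "PHOTOBUCKET",
    "objectId": "photos/2016/01/vacation.jpg",
    "payloadFormat": "JSON_API_V1"
  },
  "data": {
    "kind": "storage#object",
    "name": "photos/2016/01/vacation.jpg",
    "bucket": "PHOTOBUCKET",
    "contentType": "image/jpeg",
    "size": "1048576"
  }
}
```

The eventType, bucketId, and objectId attributes are the same ones the Python example below reads.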
Now we have notifications configured and pumping, but we'll want to see them. Let's create a Pub/Sub subscription:
$ gcloud beta pubsub subscriptions create processphotos --topic=uploadedphotos
Now we just need to read these messages. Here's a Python example of doing just that. Here are the relevant bits:
from google.cloud import pubsub

def poll_notifications(subscription_id):
    client = pubsub.Client()
    subscription = pubsub.subscription.Subscription(
        subscription_id, client=client)
    while True:
        pulled = subscription.pull(max_messages=100)
        for ack_id, message in pulled:
            print('Received message {0}:\n{1}'.format(
                message.message_id, summarize(message)))
            subscription.acknowledge([ack_id])

def summarize(message):
    # Parse the event details out of the message attributes.
    data = message.data
    attributes = message.attributes
    event_type = attributes['eventType']
    bucket_id = attributes['bucketId']
    object_id = attributes['objectId']
    return "A user uploaded %s, we should do something here." % object_id
Here is some more reading on how this system works:
https://cloud.google.com/storage/docs/reporting-changes
https://cloud.google.com/storage/docs/pubsub-notifications
GCP also offers an earlier version of the Pub/Sub cloud storage change notifications called Object Change Notification. This feature will directly POST to your desired endpoint(s) when an object in that bucket changes. Google recommends the Pub/Sub approach.
https://cloud.google.com/storage/docs/object-change-notification
While using this example, keep two things in mind:
1) The sample code has been upgraded to Python 3.6 and pubsub_v1, so it might not run on Python 2.7.
2) When calling poll_notifications(project_id, subscription_name), pass your GCP project ID (e.g. bold-idad) and your subscription name (e.g. asrtopic).
I'm trying to store some data on Bluemix. I searched many wiki pages but couldn't work out how to proceed. Can anyone tell me how to store images and files in Bluemix storage through code in any language (Java, Node.js)?
You have several options at your disposal for storing files in your app. None of them include doing it in the app container file system as the file space is ephemeral and will be recreated from the droplet each time a new instance of your app is created.
You can use services like MongoLab, Cloudant, Object Storage, and Redis to store all kinds of blob data.
Assuming that you're using Bluemix to write a Cloud Foundry application, another option is sshfs. At your app's startup time, you can use sshfs to create a connection to a remote server that is mounted as a local directory. For example, you could create a ./data directory that points to a remote SSH server and provides a persistent storage location for your app.
Here is a blog post explaining how this strategy works and a source repo showing it used to host a Wordpress blog in a Cloud Foundry app.
Note that as others have suggested, there are a number of services for storing object data. Go to the Bluemix Catalog [1] and select "Data Management" in the left hand margin. Each of those services should have sufficient documentation to get you started, including many sample applications and tutorials. Just click on a service tile, and then click on the "View Docs" button to find the relevant documentation.
[1] https://console.ng.bluemix.net/?ace_base=true/#/store/cloudOEPaneId=store
Check out https://www.ng.bluemix.net/docs/#services/ObjectStorageV2/index.html#gettingstarted. The storage service in Bluemix is OpenStack Swift running in Softlayer. Check out this page (http://docs.openstack.org/developer/swift/) for docs on Swift.
Here is a page that lists some clients for Swift.
https://wiki.openstack.org/wiki/SDKs
As I searched, there was a service named Object Storage that was created by IBM. But at the moment I can't see it in the Bluemix Catalog. I guess they withdrew it and will publish a new service in the future.
Be aware that Object Storage in Bluemix is now S3 compatible. So, for instance, you can use boto or boto3 (for the Python folks); it is 100% API compatible.
see some example here : https://ibm-public-cos.github.io/crs-docs/crs-python.html
This script lists all objects in all buckets recursively:
import boto3

endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource('s3', endpoint_url=endpoint)
for bucket in s3.buckets.all():
    print(bucket.name)
    for obj in bucket.objects.all():
        print(" - %s" % obj.key)
If you want to specify your credentials explicitly:
import boto3

endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource(
    's3',
    endpoint_url=endpoint,
    aws_access_key_id=YOUR_ACCESS_KEY,      # generated on your Bluemix dashboard
    aws_secret_access_key=YOUR_SECRET_KEY,  # the secret that comes with your access key
    use_ssl=True)
for bucket in s3.buckets.all():
    print(bucket.name)
    for obj in bucket.objects.all():
        print(" - %s" % obj.key)
If you want to create a "hello.txt" file in a new bucket:
import boto3

endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource(
    's3',
    endpoint_url=endpoint,
    aws_access_key_id=YOUR_ACCESS_KEY,
    aws_secret_access_key=YOUR_SECRET_KEY,
    use_ssl=True)
my_bucket = s3.create_bucket(Bucket='my-new-bucket')
s3.Object('my-new-bucket', 'hello.txt').put(Body=b"I'm a test file")
If you want to upload a file to a new bucket:
import time
import boto3

endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource(
    's3',
    endpoint_url=endpoint,
    aws_access_key_id=YOUR_ACCESS_KEY,
    aws_secret_access_key=YOUR_SECRET_KEY,
    use_ssl=True)
my_bucket = s3.create_bucket(Bucket='my-new-bucket')
timestampstr = str(int(time.time()))
my_bucket.upload_file(
    <location of yourfile>, <your file name>,
    ExtraArgs={
        "ACL": "public-read",
        "Metadata": {"METADATA1": "resultat", "METADATA2": "1000",
                     "gid": "blabala000", "timestamp": timestampstr},
    })