Can we rename cloud storage files using dataflow - google-cloud-storage

I am trying to rename Cloud Storage files using a Dataflow program. Is it possible to do that? If yes, then how?

While the Apache Beam SDK does not contain a ready-made PTransform for renaming files, there is nothing preventing you from doing it yourself: pipelines can contain arbitrary code in DoFns, and you can either use the standard Google Cloud Storage Java APIs or, more conveniently, Beam's FileSystems API. For example:
import java.io.IOException;
import java.util.Arrays;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

class RenameFn extends DoFn<KV<String, String>, Void> {
  @ProcessElement
  public void process(ProcessContext c) throws IOException {
    // Each element maps a source path (key) to its destination path (value).
    ResourceId src = FileSystems.matchNewResource(c.element().getKey(), false /* isDirectory */);
    ResourceId dest = FileSystems.matchNewResource(c.element().getValue(), false /* isDirectory */);
    FileSystems.rename(Arrays.asList(src), Arrays.asList(dest));
  }
}

Related

Using the Azure Management Fluent APIs, I would like to deploy an Azure Function, including the zipped code. Is that possible?

I know how to create the App Service plan. I need to be able to deploy the code and the configuration.
Yes, it is possible to create a function app using the Azure Management Fluent API.
You can use the IFunctionApp interface below to create the function app with the Fluent APIs:
public interface IFunctionApp :
Microsoft.Azure.Management.AppService.Fluent.IWebAppBase,
Microsoft.Azure.Management.ResourceManager.Fluent.Core.IBeta,
Microsoft.Azure.Management.ResourceManager.Fluent.Core.IGroupableResource<Microsoft.Azure.Management.AppService.Fluent.IAppServiceManager,Microsoft.Azure.Management.AppService.Fluent.Models.SiteInner>,
Microsoft.Azure.Management.ResourceManager.Fluent.Core.IHasInner<Microsoft.Azure.Management.AppService.Fluent.Models.SiteInner>,
Microsoft.Azure.Management.ResourceManager.Fluent.Core.IHasManager<Microsoft.Azure.Management.AppService.Fluent.IAppServiceManager>,
Microsoft.Azure.Management.ResourceManager.Fluent.Core.ResourceActions.IRefreshable<Microsoft.Azure.Management.AppService.Fluent.IFunctionApp>,
Microsoft.Azure.Management.ResourceManager.Fluent.Core.ResourceActions.IUpdatable<Microsoft.Azure.Management.AppService.Fluent.FunctionApp.Update.IUpdate>
using the "IWithPackageUri.WithPackageUri(String) Method" by specifing the zipped packed to deploy
public Microsoft.Azure.Management.AppService.Fluent.WebDeployment.Definition.IWithExecute WithPackageUri (string packageUri);

Load model from Google Cloud Storage without downloading

Is there a way to serve a model from Google Cloud Storage without actually downloading a copy of it, i.e. by streaming the data directly?
I'm trying to load a fastText model that is hosted on Google Cloud Storage. Every time I run the program, it has to fetch and download a copy of the model from the bucket.
language_model_filename = 'lid.176.bin'  # filename in GCS
language_model_local = 'lid.176.bin'     # local file name when downloaded
bucket = storage_client.get_bucket(CLOUD_STORAGE_BUCKET)
blob = bucket.blob(language_model_filename)
blob.download_to_filename(language_model_local)
language_model = FastText.load_model(language_model_local)
You can use Streaming Transfers for that purpose. As explained in the documentation, you can use the third-party boto client library plugin for Cloud Storage.
A streaming download example would look like this:
import sys
import boto

downloaded_file = 'saved_data_file'
MY_BUCKET = 'my_app_bucket'
object_name = 'data_file'
src_uri = boto.storage_uri(MY_BUCKET + '/' + object_name, 'gs')
# Stream the object's contents directly to stdout instead of saving a local file.
src_uri.get_key().get_file(sys.stdout)
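A streaming upload with boto works the same way in the other direction; here is a minimal sketch under the same assumptions (the bucket and object names are placeholders), piping whatever arrives on stdin straight into the object:

import sys
import boto

MY_BUCKET = 'my_app_bucket'
object_name = 'data_file'
# Create the destination URI and stream stdin into it without a local temp file.
dst_uri = boto.storage_uri(MY_BUCKET + '/' + object_name, 'gs')
dst_uri.new_key().set_contents_from_stream(sys.stdin)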

How can I replace 'system' libraries on IBM Analytics Engine?

To help debug an issue with a YARN application, I need to modify some of the system code on IAE to provide more debug output.
I have retrieved this jar file from the cluster to my local machine:
/usr/hdp/current/hadoop-client/hadoop-aws.jar
I've modified the bytecode to log more information when an exception is thrown in checkOpen():
public class S3AOutputStream extends OutputStream {
  ...
  void checkOpen() throws IOException {
    if (closed.get()) {
      // some log4j statements added to the bytecode here ...
      throw new IOException("Output Stream closed");
    }
  }
  ...
}
However, I'm unable to save the library with my changes back to the cluster because I don't have root access.
How can I deploy my modified jar files to the cluster? Assume that I need to install the libraries on the name node and compute nodes.
This is not currently possible with IBM Analytics Engine.
Please raise a support ticket describing your issue.

Change storage class of (existing) objects in Google Cloud Storage

I recently learnt of the new storage tiers and reduced prices announced on the Google Cloud Storage platform/service.
So I wanted to change the default storage class for one of my buckets from Durable Reduced Availability to Coldline, as that is what is appropriate for the files that I'm archiving in that bucket.
I got this note though:
Changing the default storage class only affects objects you add to this bucket going forward. It does not change the storage class of objects that are already in your bucket.
Any advice/tips on how I can change class of all existing objects in the bucket (using Google Cloud Console or gsutil)?
The easiest way to synchronously move the objects to a different storage class in the same bucket is to use rewrite. For example, to do this with gsutil, you can run:
gsutil -m rewrite -s coldline gs://your-bucket/**
Note: make sure gsutil is up to date (version 4.22 and above support the -s flag with rewrite).
Alternatively, you can use the new SetStorageClass action of the Lifecycle Management feature to asynchronously (usually takes about 1 day) modify storage classes of objects in place (e.g. by using a CreatedBefore condition set to some time after you change the bucket's default storage class).
To change the storage class from NEARLINE to COLDLINE, create a JSON file with the following content:
{
  "lifecycle": {
    "rule": [
      {
        "action": {
          "type": "SetStorageClass",
          "storageClass": "COLDLINE"
        },
        "condition": {
          "matchesStorageClass": [
            "NEARLINE"
          ]
        }
      }
    ]
  }
}
Name it lifecycle.json or something, then run this in your shell:
$ gsutil lifecycle set lifecycle.json gs://my-cool-bucket
The changes may take up to 24 hours to go through. As far as I know, this change will not cost anything extra.
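If you would rather set the same rule programmatically instead of through gsutil, here is a minimal sketch using the google-cloud-storage Python client (this assumes a reasonably recent client version; the bucket name matches the example above):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-cool-bucket')

# Add a lifecycle rule that moves NEARLINE objects to COLDLINE
# (the same rule as the JSON above), then save it on the bucket.
bucket.add_lifecycle_set_storage_class_rule('COLDLINE', matches_storage_class=['NEARLINE'])
bucket.patch()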
I did this:
gsutil -m rewrite -r -s <storage-class> gs://my-bucket-name/
(-r for recursive, because I want all objects in my bucket to be affected).
You can now use "Data Transfer" to change a storage class by moving your bucket objects to a new bucket.
Access this from the left panel of Storage.
You may not be able to use gsutil at all, for example in a Google Cloud Functions environment, because Cloud Functions server instances don't have gsutil installed (it works on your local machine because you have it installed and configured there). For such cases, I suggest you look at the update_storage_class() blob method in Python. The method is called on an individual blob, in other words it applies to a specific object inside your bucket. Here is an example:
from google.cloud import storage

storage_client = storage.Client()

# The storage class names accepted by the API:
all_classes = ['NEARLINE', 'COLDLINE', 'ARCHIVE', 'STANDARD', 'MULTI_REGIONAL', 'REGIONAL']
new_class = all_classes[my_index]

blobs = storage_client.list_blobs(bucket_name)
for blob in blobs:
    print(blob.name)
    print(blob.storage_class)
    blob.update_storage_class(new_class)  # rewrites the object with the new class
References:
Blobs / Objects documentation: https://googleapis.dev/python/storage/latest/blobs.html#google.cloud.storage.blob.Blob.update_storage_class
Storage classes: https://cloud.google.com/storage/docs/storage-classes

How do you use storage service in Bluemix?

I'm trying to store some data on Bluemix. I searched many wiki pages but couldn't work out how to proceed. Can anyone tell me how to store images and files in Bluemix storage through any language code (Java, Node.js)?
You have several options at your disposal for storing files in your app. None of them include doing it in the app container file system as the file space is ephemeral and will be recreated from the droplet each time a new instance of your app is created.
You can use services like MongoLab, Cloudant, Object Storage, and Redis to store all kinds of blob data.
Assuming that you're using Bluemix to write a Cloud Foundry application, another option is sshfs. At your app's startup time, you can use sshfs to create a connection to a remote server that is mounted as a local directory. For example, you could create a ./data directory that points to a remote SSH server and provides a persistent storage location for your app.
Here is a blog post explaining how this strategy works and a source repo showing it used to host a Wordpress blog in a Cloud Foundry app.
Note that as others have suggested, there are a number of services for storing object data. Go to the Bluemix Catalog [1] and select "Data Management" in the left hand margin. Each of those services should have sufficient documentation to get you started, including many sample applications and tutorials. Just click on a service tile, and then click on the "View Docs" button to find the relevant documentation.
[1] https://console.ng.bluemix.net/?ace_base=true/#/store/cloudOEPaneId=store
Check out https://www.ng.bluemix.net/docs/#services/ObjectStorageV2/index.html#gettingstarted. The storage service in Bluemix is OpenStack Swift running in Softlayer. Check out this page (http://docs.openstack.org/developer/swift/) for docs on Swift.
Here is a page that lists some clients for Swift.
https://wiki.openstack.org/wiki/SDKs
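For example, with the python-swiftclient library from that list, uploading a file could look roughly like the sketch below. This is only an illustration: every connection value is a placeholder that you would replace with the credentials of your Object Storage service.

from swiftclient.client import Connection

# All connection details below are placeholders; take the real values
# from your Object Storage service credentials.
conn = Connection(authurl='https://identity.example.com/v2.0',
                  user='your_user',
                  key='your_password',
                  tenant_name='your_project',
                  auth_version='2')

conn.put_container('my-container')
with open('image.png', 'rb') as f:
    conn.put_object('my-container', 'image.png', contents=f, content_type='image/png')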
As far as I can see, there was an Object Storage service created by IBM, but at the moment I can't find it in the Bluemix Catalog. I guess it was withdrawn and a new service will be published in the future.
Be aware that the object store in Bluemix is now S3 compatible, so you can for instance use Boto or boto3 (for the Python folks); it is 100% API compatible.
See some examples here: https://ibm-public-cos.github.io/crs-docs/crs-python.html
This script lists all objects in all buckets recursively:
import boto3

endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource('s3', endpoint_url=endpoint)
for bucket in s3.buckets.all():
    print(bucket.name)
    for obj in bucket.objects.all():
        print(" - %s" % obj.key)
If you want to specify your credentials explicitly, this would be:
import boto3

endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource('s3', endpoint_url=endpoint, aws_access_key_id=YouRACCessKeyGeneratedOnYouBlueMixDAShBoard, aws_secret_access_key=TheSecretKeyThatCOmesWithYourAccessKey, use_ssl=True)
for bucket in s3.buckets.all():
    print(bucket.name)
    for obj in bucket.objects.all():
        print(" - %s" % obj.key)
If you want to create a "hello.txt" file in a new bucket:
import boto3

endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource('s3', endpoint_url=endpoint, aws_access_key_id=YouRACCessKeyGeneratedOnYouBlueMixDAShBoard, aws_secret_access_key=TheSecretKeyThatCOmesWithYourAccessKey, use_ssl=True)
my_bucket = s3.create_bucket(Bucket='my-new-bucket')
s3.Object(my_bucket.name, 'hello.txt').put(Body=b"I'm a test file")
If you want to upload a file to a new bucket:
import boto3
import time

endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource('s3', endpoint_url=endpoint, aws_access_key_id=YouRACCessKeyGeneratedOnYouBlueMixDAShBoard, aws_secret_access_key=TheSecretKeyThatCOmesWithYourAccessKey, use_ssl=True)
my_bucket = s3.create_bucket(Bucket='my-new-bucket')
timestampstr = str(time.time())  # any timestamp value will do here
my_bucket.upload_file(<location of yourfile>, <your file name>, ExtraArgs={"ACL": "public-read", "Metadata": {"METADATA1": "resultat", "METADATA2": "1000", "gid": "blabala000", "timestamp": timestampstr}})
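Downloading an object back to a local file works the same way; here is a small sketch under the same assumptions (endpoint, credentials, and names as in the examples above):

import boto3

endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
s3 = boto3.resource('s3', endpoint_url=endpoint, aws_access_key_id=YouRACCessKeyGeneratedOnYouBlueMixDAShBoard, aws_secret_access_key=TheSecretKeyThatCOmesWithYourAccessKey, use_ssl=True)
# Fetch 'hello.txt' from the bucket created above into a local file.
s3.Bucket('my-new-bucket').download_file('hello.txt', 'hello-local.txt')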