How can I download a file using the WebHDFS knox groovy library? - ibm-cloud

The WebHDFS examples show how to list files and folders in HDFS, make a directory, and upload a file using BigInsights WebHDFS.
How can I adapt the examples to download a file from BigInsights WebHDFS?

The Knox API documentation provides many more examples, e.g. the following downloads a file over WebHDFS and writes it to a local file:
import groovy.json.JsonSlurper
import org.apache.hadoop.gateway.shell.Hadoop
import org.apache.hadoop.gateway.shell.hdfs.Hdfs
gateway = "https://localhost:8443/gateway/sample"
username = "bob"
password = "bob-password"
dataFile = "README"
session = Hadoop.login( gateway, username, password )  // authenticate against the Knox gateway
text = Hdfs.get( session ).from( "/tmp/example/README" ).now().string  // fetch the file contents over WebHDFS
file = new File('README')
file << text  // write the contents to a local file
session.shutdown()

Related

Download tableau files from tableau server in a specific Project name folder on your local machine

I am trying to download all the Tableau files from the server onto my local machine.
The code below downloads the files from a specific folder onto my C://Users/account/.
However, I want it to do something like the following (see the sketch after the snippet below):
1. Iterate through all the project names on the Tableau server.
2. Create a folder under C://Users/Account for each Tableau server project name.
3. Download all the .twb/.twbx files into their respective project folders.
# First
import tableauserverclient as TSC
import getpass
import os
# Second
user_login = os.getenv('USERNAME')
pw = getpass.getpass('Please enter the password to login to the tableau server\nPassword: ')
# Third
site = 'Global-IT'
server_name = 'http://metrics-it.corp.amazon.com/'
server = TSC.Server(server_name)
server.version = '3.9'
with server.auth.sign_in(TSC.TableauAuth(username=user_login, password=pw, site=site)):
    for wb in [w for w in TSC.Pager(server.workbooks)
               if w.project_name.startswith('Temporary - To be Decommisioned')]:
        file_path = server.workbooks.download(wb.id)
        print(wb.name, ': ', wb.project_name)
        print("Successfully downloaded workbook with workbook id: " + wb.id)
        print("\nDownloaded the file to {0}.".format(file_path))
        print()
        print('-------------------------')
        print()
Can someone help me with the code?
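One way to cover steps 1-3, sketched here as an untested assumption rather than a verified answer: recent tableauserverclient releases accept a filepath argument to workbooks.download(), so each workbook can be written into a folder named after its project. base_dir is an assumed local target directory; user_login, pw, site and server are reused from the snippet above.
import os
import tableauserverclient as TSC

base_dir = r'C:\Users\Account'  # assumed local target directory
with server.auth.sign_in(TSC.TableauAuth(username=user_login, password=pw, site=site)):
    for wb in TSC.Pager(server.workbooks):
        # step 1: each workbook object carries its project name
        project_dir = os.path.join(base_dir, wb.project_name)
        os.makedirs(project_dir, exist_ok=True)  # step 2: one folder per project
        # step 3: save the .twb/.twbx into that project's folder
        file_path = server.workbooks.download(wb.id, filepath=project_dir)
        print(wb.project_name, '->', file_path)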

Google Storage Python ACL Update not Working

I have uploaded one image file to my google storage bucket.
#Block 1
#Storing the local file inside the bucket
blob_response = bucket.blob(cloud_path)
blob_response.upload_from_filename(local_path, content_type='image/png')
The file is uploaded fine; I can see it in the bucket.
After uploading the file, in the same method, I try to update the ACL so the file is publicly accessible:
#Block 2
blob_file = storage.Blob(bucket=bucket20, name=path_in_bucket)
acl = blob_file.acl
acl.all().grant_read()
acl.save()
This does not make the file public.
The strange thing is that, after I run the upload method above, if I then run the #Block 2 code separately in a Jupyter notebook, it works fine and the file becomes publicly available.
I have tried:
checking that the blob exists in the bucket after the upload code;
introducing a 5-second delay after the upload.
Any help is appreciated.
If you are making the file uploaded with upload_from_filename() public, you can reuse the blob object from the upload. Also, reload the ACL before changing the permission. The following was all done in one block in a Jupyter notebook on GCP AI Platform.
# Block 1
from google.cloud import storage

bucket_name = "your-bucket"
destination_blob_name = "test.txt"
source_file_name = "/home/jupyter/test.txt"
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(source_file_name)
print(blob)  # prints the bucket and the uploaded file
blob.acl.reload()  # reload the ACL of the blob
acl = blob.acl
acl.all().grant_read()
acl.save()
for entry in acl:
    print("{}: {}".format(entry["role"], entry["entity"]))
Output: the loop prints each ACL entry as role: entity, which should now include a READER grant for allUsers.
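A shorter variant, assuming per-object ACLs are enabled on the bucket (i.e. uniform bucket-level access is off), is the blob.make_public() helper, which grants READER to allUsers and saves the ACL in one call:
blob.make_public()      # same effect as acl.all().grant_read() + acl.save()
print(blob.public_url)  # public URL of the object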

Scala config lookup secrets

I have an application.conf file which contains secrets (DB password etc.), and the secret will be mounted as a file (whose content is the actual secret) in the running pod. How can the Scala config library be tweaked to handle this? I.e.,
instead of the normal application.conf
db {
  user = "username"
  password = "xxx"
}
I would have something like this...
db {
  user = "username"
  password = "${file_location}"
}
As the file is parsed, it should recognize that the value of the key password needs to be resolved by reading that file and loading its contents.
A simple function can be written to load the content of this file; how can it be integrated seamlessly with the Scala config library, so that the rest of the code can keep using
config.getString("db.password")
I assume you are using configuration in HOCON format and the Typesafe config library for it.
I don't think it has such a feature out of the box, but as a possible alternative you can take a look at the include feature - you can include the content of another file in your application.conf:
db {
  user = "username"
}
include "/path/to/pod.conf" // include the env-specific configuration file
and put this inside /path/to/pod.conf:
db {
  password = "pod_db_pass"
}
So eventually the contents of both files will be merged during loading, and your final config will contain the password at the path db.password.
UPDATE
Another possible option is to load the password from the file and merge it into the config with the withFallback method. Example:
import com.typesafe.config._
val password = "password_from_file" // in practice, read this from the mounted secret file
val passwordConfig = ConfigFactory.parseString(s"db.password=$password")
val applicationConfig = ConfigFactory.parseString("db.user=db_user") // replace this with `ConfigFactory.load()`
val config = applicationConfig.withFallback(passwordConfig)
println(config)
Printout result:
Config(SimpleConfigObject({"db":{"password":"password_from_file","user":"db_user"}}))
Scastie: https://scastie.scala-lang.org/WW3weuqiT9WRUKfdrZgwcw

moving local data to google cloud bucket using python api

I can move local data to Google Storage buckets using the following:
gsutil cp afile.txt gs://my-bucket
How do I do the same using the Python API client library?
from google.cloud import storage
storage_client = storage.Client()
# Make an authenticated API request
buckets = list(storage_client.list_buckets())
print(buckets)
I can't find anything beyond the above.
There is an API client library code sample here. My code typically looks like the below, which is a slight variant of the code they provide:
from google.cloud import storage
client = storage.Client(project='<myprojectname>')
mybucket = storage.bucket.Bucket(client=client, name='mybucket')
mydatapath = r'C:\whatever\something' + '\\'  # etc
blob = mybucket.blob('afile.txt')
blob.upload_from_filename(mydatapath + 'afile.txt')
In case it is of interest, another method is to run the "gsutil" command line, as you typed it in your original post, via the subprocess module, e.g.:
import subprocess
subprocess.call("gsutil cp afile.txt gs://mybucket/", shell=True)
In my view, there are pros and cons to both methods depending on what you are trying to achieve - the latter allows multi-threading if you have many files to upload, whereas the former perhaps allows better control and specification of metadata for each file, etc.
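To make both points concrete, here is a rough sketch (the bucket name, local paths and metadata key are illustrative, not from the post above): the -m flag makes gsutil copy in parallel, while the client library lets a custom metadata entry be attached to each blob before uploading.
import subprocess
from google.cloud import storage

# shell route: -m parallelises the copy when there are many files
subprocess.call("gsutil -m cp C:\\whatever\\*.txt gs://mybucket/", shell=True)

# API route: per-file control, e.g. custom metadata on the blob
client = storage.Client(project='<myprojectname>')
blob = client.bucket('mybucket').blob('afile.txt')
blob.metadata = {'source': 'laptop'}  # illustrative custom metadata key/value
blob.upload_from_filename(r'C:\whatever\something\afile.txt')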

Create file in Google Cloud Storage with python

This is the method that I used to save a new file to Google Cloud Storage:
cloud_storage_path = "/gs/[my_app_name].appspot.com/%s/%s" % (user_key.id(), img_title)
blobstore_key = blobstore.create_gs_key(cloud_storage_path)
cloud_storage_file = cloudstorage_api.open(
    filename=cloud_storage_path, mode="w", content_type=img_type
)
cloud_storage_file.write(img_content)
cloud_storage_file.close()
But when I execute this method, the log file prints:
Path should have format /bucket/filename but got /gs/[my_app_name].appspot.com/6473924464345088/background.jpg
PS: I replaced my app's name with [my_app_name]; [my_app_name].appspot.com is my bucket name.
So, what should I do next in this case?
I cannot save the file to that path.
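Judging from the error message, a likely fix (a hedged sketch reusing the variable names from the question, not tested code) is to pass the path without the /gs prefix to the cloudstorage client, and keep the /gs prefix only for blobstore.create_gs_key():
import cloudstorage as cloudstorage_api
from google.appengine.ext import blobstore

bucket_name = "[my_app_name].appspot.com"
# the cloudstorage client expects "/<bucket>/<object>" - no "/gs" prefix here
cloud_storage_path = "/%s/%s/%s" % (bucket_name, user_key.id(), img_title)
cloud_storage_file = cloudstorage_api.open(
    filename=cloud_storage_path, mode="w", content_type=img_type
)
cloud_storage_file.write(img_content)
cloud_storage_file.close()
# only blobstore.create_gs_key() wants the "/gs/<bucket>/<object>" form
blobstore_key = blobstore.create_gs_key("/gs" + cloud_storage_path)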