Can I include a variable in an `sh` command in Zeppelin? - pyspark

I'm using Zeppelin with Hadoop on a Spark cluster.
I'd like to run a command to check files on S3, and I'd like to use a variable for the path.
This is my code:
%sh
aws s3 ls s3://my-bucket/my_folder/
Can I replace my-bucket/my_folder/ with a variable?

What do you mean by "a variable"? A Python variable? If so, I'm not sure. But if you just want to pull the path out onto another line, you can use a shell variable:
%sh
export AWS_FOLDER=my-bucket/my_folder/
aws s3 ls s3://$AWS_FOLDER
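If you do mean a Python variable, one option (a sketch, assuming the aws CLI is installed on the host running the interpreter) is to skip %sh and shell out from a %pyspark paragraph instead:
%pyspark
import subprocess

# Hypothetical path held in a Python variable
s3_path = "s3://my-bucket/my_folder/"

# Run the same listing the %sh paragraph would run
print(subprocess.check_output(["aws", "s3", "ls", s3_path]).decode())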

Related

Azure Databricks: How to delete files of a particular extension outside of DBFS using python

I am able to delete files with a particular extension from the directory /databricks/driver using a bash command in Databricks.
%%bash
rm /databricks/driver/file*.xlsx
But I am unable to figure out how to access and delete a file outside of DBFS in a Python script.
I think we cannot access files outside of DBFS using dbutils, and the command below outputs False as it's looking in DBFS.
dbutils.fs.rm("/databricks/driver/file*.xlsx")
I am eager to be corrected.
Not sure how to do it using dbutils, but I am able to delete the files using glob:
import os
from glob import glob
for file in glob('/databricks/driver/file*.xlsx'):
    os.remove(file)
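If you would rather stay with dbutils, local driver paths can usually be addressed with the file:/ scheme; note that dbutils.fs.rm does not expand wildcards, so glob is still needed to match the files. A sketch, assuming a Databricks notebook where dbutils is in scope:
from glob import glob

# file:/ points dbutils.fs at the driver's local filesystem instead of DBFS
for path in glob('/databricks/driver/file*.xlsx'):
    dbutils.fs.rm('file:' + path)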

Riak-KV: how to create bucket in docker-compose file?

I'm trying to use the original riak-kv image in docker-compose, and I want to add one bucket on init, but docker-compose up won't start. How can I edit volumes/schemas to add a bucket on init?
Does the original image allow adding a riak.conf file via docker-compose? If yes, how can I do that?
Creating a bucket type with a custom datatype
I assume you want to create a bucket type when starting your container. You have to create a file in the /etc/riak/schemas directory named after the bucket type, e.g. bucket_name.dt. The file should contain a single line with the datatype you would like to create (e.g. counter, set, map, hll).
You can also use the following command to create the file:
echo "counter" > schemas/bucket_name.dt
After that you have to just mount the schemas folder with the file to the /etc/riak/schemas directory in the container:
docker run -d -P -v $(pwd)/schemas:/etc/riak/schemas basho/riak-ts
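Since the question is about docker-compose, the same schemas mount can be expressed as a compose service. A minimal sketch (the image name, ports, and paths are assumptions, not taken from the original answer):
version: "3"
services:
  riak:
    image: basho/riak-kv
    ports:
      - "8087:8087"
      - "8098:8098"
    volumes:
      - ./schemas:/etc/riak/schemas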
Creating a bucket type with default datatype
Currently, creating a bucket type with the default datatype is only possible by adding a custom post-start script under the /etc/riak/poststart.d directory.
Create a shell script with the command you would like to run. An example can be found here.
You have to mount it as a read-only file into the /etc/riak/poststart.d folder:
docker run -d -P -v $(pwd)/poststart.d/03-bootstrap-my-datatype.sh:/etc/riak/poststart.d/03-bootstrap-my-datatype.sh:ro basho/riak-ts
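For illustration, such a post-start script might look like the following sketch (the bucket-type name is hypothetical, and the riak-admin syntax is assumed from the standard Riak tooling rather than taken from the linked example):
#!/bin/sh
# Create a bucket type with the default datatype and activate it
riak-admin bucket-type create my_bucket_type
riak-admin bucket-type activate my_bucket_type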
References
See the full documentation for the Docker images here; the rest can be found on GitHub.
The available datatypes can also be found here.

File path from within Azure CLI task

I have an Azure CLI task which references a PowerShell script (via build artifact) running az commands. Most of these commands work successfully, but when attempting to execute the following command:
az appconfig kv import --name $resourceName -s file --path appconfig.json --format json
I've noticed that the information was not applied to the Azure resource, and the log shows "File is not available".
I must be referencing the file incorrectly from the build artifact, but if anyone could provide some clarity around this, that would be great.
I must be referencing the file incorrectly from the build artifact
You can try to add $(System.ArtifactsDirectory) to the json file path. For example: --path $(System.ArtifactsDirectory)/appconfig.json.
System.ArtifactsDirectory: The directory to which artifacts are downloaded during deployment of a release. Example: C:\agent\_work\r1\a
For details, please refer to predefined variables.
This can be a little tricky to figure out.
System.ArtifactsDirectory is the default variable that indicates the directory to which artifacts are downloaded during deployment of a release.
However, to use a default variable in your script, you must first replace the . in the default variable names with _. For example, to print the value of artifact variable System.ArtifactsDirectory in a PowerShell script, you would have to use $env:SYSTEM_ARTIFACTSDIRECTORY.
I have a similar setup and do it this way within my PowerShell script:
# Define the path to the file
$appSettingsFile="$env:SYSTEM_ARTIFACTSDIRECTORY\<rest_of_the_path>\appconfig.json"
# Pass it to the Azure CLI command
az appconfig kv import -n $appConfigName -s file --path $appSettingsFile --format json --separator . --yes
It is also helpful to view the current values of all variables to see what they contain before using them.
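For example, a quick way to do that from the same PowerShell script (a sketch):
# Dump every environment variable the agent exposes, including the SYSTEM_* ones
Get-ChildItem Env: | Sort-Object Name | Format-Table Name, Value -AutoSize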
References:
Default variables - System
Using default variables

GCS - multiple credentials in a single boto file

New to GCS (just got started with it today). Looks very promising.
Is there any way to use multiple S3 (or GCS) accounts in a single boto file? I only see the option to assign keys to one S3 and one GCS account in a single file. I'd like to use multiple credentials.
We'd like to copy from S3 to S3, or GCS to GCS, with each of those buckets using different keys.
You should be able to set up multiple profiles within your .boto file.
You could add something like:
[profile prod]
gs_access_key_id=....
gs_secret_access_key=....
[profile dev]
gs_access_key_id=....
gs_secret_access_key=....
And then from your code you can add a profile_name= parameter to the connection call:
conn = boto.connect_gs(profile_name="dev")
You can definitely use multiple boto files; just make sure that the credentials in each of them are valid. Every time you need to switch between them, run the following command with the right path:
$ BOTO_CONFIG=/path/to_boto gsutil cp SOME_FILE gs://bucket
Example:
BOTO_CONFIG=/etc/boto.cfg gsutil -m cp text.txt gs://bucket
Additionally, you can have aliases for your different profiles. Just create an alias for each command and you are set!
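For instance (with hypothetical paths), the aliases could look like this:
# Each alias points gsutil at a different boto config
alias gsutil-dev='BOTO_CONFIG=/path/to/boto_dev gsutil'
alias gsutil-prod='BOTO_CONFIG=/path/to/boto_prod gsutil'
# Usage: gsutil-dev ls gs://bucket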

A script that deletes all tables in Hbase

I can tell hbase to disable and delete particular tables using:
disable 'tablename'
drop 'tablename'
But I want to delete all the tables in the database without hardcoding the names of any of the tables. Is there a way to do this? I want to do it through the command-line utility ./hbase shell, not through Java or Thrift.
disable_all and drop_all have been added as commands in the HBase ruby shell (see HBASE-3506). These commands take a regex of tables to disable/drop, and they will ask for confirmation before continuing. That should make dropping lots of tables pretty easy without requiring outside libraries or scripting.
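For example, to disable and then drop every table whose name starts with a hypothetical prefix my_prefix:
disable_all 'my_prefix.*'
drop_all 'my_prefix.*'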
I have a handy script that does exactly this, using the Python Happybase library:
import happybase
c = happybase.Connection()
for table in c.tables():
    c.disable_table(table)
    c.delete_table(table)
    print("Deleted: " + table)
You will need Happybase installed to use this script, and you can install it as:
sudo easy_install happybase
You can pipe commands to the bin/hbase shell command. From there you can use some scripting to grab the table names and pipe the disable/delete commands back to hbase.
i.e.
echo "list" | bin/hbase shell | ./filter_table_names.pl > table_names.txt
./turn_table_names_into_disable_delete_commands.pl table_names.txt | bin/hbase shell
There is a hack.
Open the $HBASE_HOME/lib/ruby/shell/commands/list.rb file and add the line below at the bottom of the command method.
return list
After that, the list command returns an array of the names of all tables.
Then you can do this:
list.each {|t| disable t; drop t}
I'm not deleting tables through the hbase shell; I'm deleting them from the command line by:
- deleting my Hadoop distributed filesystem directory, then
- creating a new, clean Hadoop distributed filesystem directory, then
- formatting my Hadoop distributed filesystem with 'hadoop namenode -format', then
- running start-all.sh and start-hbase.sh
Reference:
http://hadoop.apache.org/common/docs/r0.20.1/api/overview-summary.html#overview_description
If you're looking for something that will do this in a 'one-liner' via a shell script you can use this method:
$ echo 'list.each {|t| disable t; drop t}; quit;' | hbase shell
NOTE: The above was run from a Bash shell prompt. It echoes the commands into the hbase shell, loops through all the tables returned by the list command, disabling and dropping each one as it iterates through the array, and then quits once it's done.