What parameter(s) do I have to pass `gsutil` to access a Google Cloud local storage? (storage-testbench) - google-cloud-storage

For test purposes, I want to run the storage-testbench simulator. It allows me to send REST commands to a local server which is supposed to work like a Google Cloud Storage facility.
In my tests, I want to copy 3 files from my local hard drive to that local GCS-like storage facility using gsutil cp .... I found out that in order to connect to that specific server, I need additional options on the command line, as follows:
gsutil \
-o "Credentials:gs_json_host=127.0.0.1" \
-o "Credentials:gs_json_port=9000" \
-o "Boto:https_validate_certificates=False" \
cp -p test my-file.ext gs://bucket-name/my-file.ext
See .boto for details on defining the credentials.
Unfortunately, I get this error:
CommandException: No URLs matched: test
The name at the end (test) is the project identifier (-p test). There is an example in the README.md of the storage-testbench project, although it's just a variable in a URI.
How do I make the cp command work?
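A likely culprit: gsutil cp has no -p <project> flag (for cp, -p means "preserve ACLs"), so test is parsed as a source URL, which matches the error. A sketch of the command without it, passing the project as a Boto option instead (assuming the testbench accepts any project id):
gsutil \
-o "Credentials:gs_json_host=127.0.0.1" \
-o "Credentials:gs_json_port=9000" \
-o "Boto:https_validate_certificates=False" \
-o "GSUtil:default_project_id=test" \
cp my-file.ext gs://bucket-name/my-file.ext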
Note:
The gunicorn process shows that the first GET from the cp command works as expected. It returns a 200. So the issue seems to be inside gsutil. Also, I'm able to create the bucket just fine:
gsutil \
-o "Credentials:gs_json_host=127.0.0.1" \
-o "Credentials:gs_json_port=9000" \
-o "Boto:https_validate_certificates=False" \
mb -p test gs://bucket-name
Trying the mb a second time gives me a 409 as expected.
More links:
gsutil global options
gsutil cp ...

Related

Copy a file into kubernetes pod without using kubectl cp

I have a use case where my pod is run as a non-root user and it's running a Python app.
Now I want to copy a file from the master node to the running pod. But when I try to run
kubectl cp app.py 103000-pras-dev/simplehttp-777fd86759-w79pn:/tmp
this command hangs, but when I run the pod as the root user and then run the same command, it executes successfully. I was going through the code of kubectl cp, where it internally uses the tar command.
The tar command has multiple flags like --overwrite, --no-same-owner, --no-preserve and a few others, but from kubectl cp we can't pass those flags to tar. Is there any way I can copy the file using the kubectl exec command, or some other way?
kubectl exec simplehttp-777fd86759-w79pn -- cp app.py /tmp/ **flags**
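One way to keep control over those tar flags is to build the tar stream yourself and pipe it into kubectl exec; a sketch, assuming GNU tar is available inside the container (the flags shown are just examples):
# stream app.py into /tmp of the pod, choosing the tar extraction flags yourself
tar cf - app.py | kubectl exec -i -n 103000-pras-dev simplehttp-777fd86759-w79pn -- tar xf - -C /tmp --no-same-owner --overwrite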
If the source file is a simple text file, here's my trick:
#!/usr/bin/env bash
# Encode the source file locally, then decode it on the other side of kubectl exec.
function copy_text_to_pod() {
  namespace=$1
  pod_name=$2
  src_filename=$3
  dest_filename=$4
  base64_text=$(base64 < "$src_filename")
  kubectl exec -n "$namespace" "$pod_name" -- bash -c "echo \"$base64_text\" | base64 -d > $dest_filename"
}
copy_text_to_pod my-namespace my-pod-name /path/of/source/file /path/of/target/file
Maybe base64 is not necessary. I put it here in case there is some special character in the source file.
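If the file is not plain text, a variant of the same idea that skips the base64 round-trip is to stream the file over stdin (reusing the names from the example above):
# stream the file via stdin; this works for binary files too
kubectl exec -i -n my-namespace my-pod-name -- sh -c 'cat > /path/of/target/file' < /path/of/source/file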
Meanwhile I found a hack; disclaimer: this is not the exact kubectl cp, just a workaround.
I have written a Go program in which a goroutine reads the file and attaches it to stdin, then runs the kubectl exec tar command with the proper flags. Here is what I did:
// Build "kubectl exec ... tar xmf -" and feed it a tar stream over stdin.
reader, writer := io.Pipe()
copy := exec.CommandContext(ctx, "kubectl", "exec", pod.Name, "--namespace", pod.Namespace, "-c", container.Name, "-i",
    "--", "tar", "xmf", "-", "-C", "/", "--no-same-owner") // pass all the flags you want to
copy.Stdin = reader
go func() {
    defer writer.Close()
    // Write the local files into the pipe as a tar archive; kubectl exec consumes it.
    if err := util.CreateMappedTar(writer, "/", files); err != nil {
        logrus.Errorln("Error creating tar archive:", err)
    }
}()
Helper function definition
func CreateMappedTar(w io.Writer, root string, pathMap map[string]string) error {
    tw := tar.NewWriter(w)
    defer tw.Close()
    for src, dst := range pathMap {
        if err := addFileToTar(root, src, dst, tw); err != nil {
            return err
        }
    }
    return nil
}
Obviously, this thing doesn't work because of the permission issue, but I was able to pass tar flags.
If it is only a text file, it can also be "copied" via netcat.
1) You have to be logged in to both pods
$ kubectl exec -ti <pod_name> -- bash
2) Make sure netcat is available; if not, install it
$ apt-get update
$ apt-get install netcat-openbsd
3) Go to a folder where you have write permissions, e.g.
/tmp
4) Inside the container where you have the Python file, run
$ cat app.py | nc -l <random_port>
Example
$ cat app.py | nc -l 1234
It will start listening on the provided port.
5) Inside the container where you want to have the file, run
$ nc <PodIP_where_you_have_py_file> <random_port> > app.py
Example
$ nc 10.36.18.9 1234 > app.py
It must be the pod IP; it will not recognize the pod name. To get the IP, use kubectl get pods -o wide
It will copy the content of the app.py file into the file in the other container. Unfortunately, you will need to fix the permissions manually, or you can use a script like this (the sleep gives the "copy" time to finish):
#!/bin/sh
nc 10.36.18.9 1234 > app.py; sleep 2; chmod 770 app.py
kubectl cp is a bit of a pain to work with. For example:
installing kubectl and configuring it (you might need it on multiple machines). In our company, most people only have restricted kubectl access through the Rancher web GUI; no CLI access is provided for most people.
network restrictions in enterprises
large file downloads/uploads may sometimes stop or freeze, probably because traffic goes through the k8s API server
weird tar-related errors keep popping up, etc.
One of the reasons for the lack of support for copying files from a pod (or the other way around) is that k8s pods were never meant to be used like VMs. They are meant to be ephemeral, so the expectation is to not store/create any files on the pod/container disk.
But sometimes we are forced to do this, especially while debugging issues or using external volumes.
Below is the solution we found effective. This might not be right for you/your team.
We now instead use Azure Blob Storage as a mediator to exchange files between a Kubernetes pod and any other location. The container image is modified to include the azcopy utility (the Dockerfile RUN instruction below installs azcopy in your container).
RUN /bin/bash -c 'wget https://azcopyvnext.azureedge.net/release20220511/azcopy_linux_amd64_10.15.0.tar.gz && \
tar -xvzf azcopy_linux_amd64_10.15.0.tar.gz && \
cp ./azcopy_linux_amd64_*/azcopy /usr/bin/ && \
chmod 775 /usr/bin/azcopy && \
rm azcopy_linux_amd64_10.15.0.tar.gz && \
rm -rf azcopy_linux_amd64_*'
Check out this SO question for more on azcopy installation.
When we need to download a file, we simply use azcopy to copy the file from within the pod to Azure Blob Storage. This can be done either programmatically or manually. Then we download the file to the local machine with Azure Storage Explorer, or some job/script picks the file up from the blob container.
A similar thing is done for uploads: the file is first placed in the blob storage container (manually via Storage Explorer or programmatically), and then, from within the pod, azcopy pulls the file from blob storage and places it inside the pod.
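For illustration, both directions boil down to a single azcopy invocation run inside the pod; the storage account, container and SAS token below are placeholders:
# pod -> blob storage: push a file out of the pod
azcopy copy /tmp/debug-dump.log "https://<account>.blob.core.windows.net/<container>/debug-dump.log?<sas-token>"
# blob storage -> pod: pull a file into the pod
azcopy copy "https://<account>.blob.core.windows.net/<container>/input.csv?<sas-token>" /tmp/input.csv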
The same can be done with AWS (S3), GCP, or any other cloud provider.
Probably even SCP, SFTP, or rsync could be used.

How to get output of gcloud composer command?

I'm executing gcloud composer commands:
gcloud composer environments run airflow-composer \
--location europe-west1 --user-output-enabled=true \
backfill -- -s 20171201 -e 20171208 dags.my_dag_name
kubeconfig entry generated for europe-west1-airflow-compos-007-gke.
It's a regular Airflow backfill. The command above prints the results only at the end of the whole backfill range. Is there any way to get the output in a streaming manner, so that each time a DAG run is backfilled it is printed to standard output, like with the regular Airflow CLI?
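A possible workaround is to tail the environment's Airflow logs from Cloud Logging in a second terminal while the backfill runs; the filter below is only a sketch, with label values you would adapt to your environment:
# poll recent Airflow log entries for this Composer environment
gcloud logging read \
'resource.type="cloud_composer_environment" AND resource.labels.environment_name="airflow-composer"' \
--project <my-project> --freshness=10m --order=asc --format='value(textPayload)'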

How to setup gsutil to run from Anacron?

As user, gsutil works nice.
gsutil works nice when called from crontab (user).
As root, gsutil says:
Caught non-retryable exception while listing gs://....: ServiceException: 401 Anonymous users does not have storage.objects.list access to bucket ...."
gsutil does not work when called from Anacron (root).
Other scripts called from Anacron run nice.
The ~/.boto file contains credentials and is located in the user's HOME directory, so maybe that is causing the exception.
I tried setting BOTO_CONFIG, but it didn't change results:
$ gsutil -D ls 2>&1 | grep config_file_list
config_file_list: ['/home/wolfv/.boto']
$ sudo gsutil -D ls 2>&1 | grep config_file_list
config_file_list: []
$ BOTO_CONFIG="/root/.boto"
$ sudo gsutil -D ls 2>&1 | grep config_file_list
config_file_list: []
How to setup gsutil to run from Anacron?
$ gsutil -D
gsutil version: 4.22
checksum: 2434a37a663d09ae21d1644f64ce60ca (OK)
boto version: 2.42.0
python version: 2.7.13 (default, Jan 12 2017, 17:59:37) [GCC 6.3.1 20161221 (Red Hat 6.3.1-1)]
OS: Linux 4.9.11-200.fc25.x86_64
multiprocessing available: True
using cloud sdk: True
config path: /home/wolfv/.boto
gsutil path: /home/wolfv/Downloads/google-cloud-sdk/platform/gsutil/gsutil
compiled crcmod: True
installed via package manager: False
editable install: False
Command being run: /home/wolfv/Downloads/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=redacted -D
config_file_list: ['/home/wolfv/.config/gcloud/legacy_credentials/redacted/.boto', '/home/wolfv/.boto']
config: [('debug', '0'), ('working_dir', '/mnt/pyami'), ('https_validate_certificates', 'True'), ('debug', '0'), ('working_dir', '/mnt/pyami'), ('content_language', 'en'), ('default_api_version', '2'), ('default_project_id', 'redacted')]
UPDATE_1
export BOTO_CONFIG worked for the terminal:
$ sudo -s
[root] # export BOTO_CONFIG=/home/wolfv/.boto
[root] # gsutil -D ls 2>&1 | grep config_file_list
config_file_list: ['/home/wolfv/.boto']
[root] # vi /root/.bashrc
add this line to end of .bashrc:
export BOTO_CONFIG=/home/wolfv/.boto
exit
open a new terminal and test the new BOTO_CONFIG from .bashrc
$ sudo -s
[root] # gsutil -D ls 2>&1 | grep config_file_list
config_file_list: ['/home/wolfv/.boto']
exit
Unfortunately, exporting BOTO_CONFIG in /root/.bashrc did not help Anacron call gsutil.
The backup log shows that Anacron called the backup script, and the backup script's call to gsutil failed.
Does it matter which initialization script sets BOTO_CONFIG?
To make the path permanently available to Anacron (root), in which file should BOTO_CONFIG be set?:
/etc/profile
/root/.bash_profile
/root/.bashrc
UPDATE_2
My credentials are now invalid, probably from some change I made.
Here is my attempt at houglum's suggestions for BOTO_CONFIG.
First authorize login to get that out of the way:
$ gcloud auth login
Your browser has been opened to visit:
https://accounts.google.com/o/oauth2/auth?redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&prompt=select_account&response_type=code&client_id=redacted.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fappengine.admin+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&access_type=offline
Created new window in existing browser session.
WARNING: `gcloud auth login` no longer writes application default credentials.
If you need to use ADC, see:
gcloud auth application-default --help
You are now logged in as [redacted].
Your current project is [redacted]. You can change this setting by running:
$ gcloud config set project PROJECT_ID
Defining BOTO_CONFIG inline does not work:
$ BOTO_CONFIG=/home/wolfv/.boto gsutil ls
Your credentials are invalid. Please run
$ gcloud auth login
Exporting BOTO_CONFIG does not work:
$ export BOTO_CONFIG=/home/wolfv/.boto; gsutil ls
Your credentials are invalid. Please run
$ gcloud auth login
Sourcing bashrc does not work:
$ ls /home/wolfv/.bashrc
/home/wolfv/.bashrc
$ . /home/wolfv/.bashrc; gsutil ls
Your credentials are invalid. Please run
$ gcloud auth login
UPDATE_3
My credentials work if I remove them from .boto and use auth login instead (based on the message "Your credentials are invalid. Please run $ gcloud auth login").
$ gcloud auth login redacted@email.com
WARNING: `gcloud auth login` no longer writes application default credentials.
If you need to use ADC, see:
gcloud auth application-default --help
You are now logged in as [redacted@email.com].
Your current project is [redacted-123]. You can change this setting by running:
$ gcloud config set project PROJECT_ID
After using auth login, gsutil works from the terminal:
$ gsutil ls
gs://redacted/
gs://redacted/
gs://redacted/
And the backup script that calls gsutil also works from the terminal:
$ ~/scripts/backup_to_gcs/backup_to_gcs.sh
backup_to_gcs.sh in progress ...
backup_to_gcs.sh completed successfully
However, backup_to_gcs.sh fails when called from crontab.
How to run gsutil from crontab?
UPDATE_4
This is in my anacron file:
1 10 anacron_test_id BOTO_PATH=/home/wolfv/.config/gcloud/legacy_credentials/wolfvolpi@gmail.com/.boto:/home/wolfv/.boto /home/wolfv/scripts/backup_to_gcs/backup_to_gcs.sh
anacron runs the backup_to_gcs.sh script as expected, but the backup fails.
When backup_to_gcs.sh script is called from command line, it works fine.
Probably because gsutil runs as user, but does not run as root:
$ gsutil ls
gs://wolfv/
gs://wolfv-test-log/
gs://wolfv2/
gs://wolfvtest/
$ BOTO_PATH=/home/wolfv/.config/gcloud/legacy_credentials/wolfvolpi@gmail.com/.boto:/home/wolfv/.boto gsutil ls
gs://wolfv/
gs://wolfv-test-log/
gs://wolfv2/
gs://wolfvtest/
$ sudo BOTO_PATH=/home/wolfv/.config/gcloud/legacy_credentials/wolfvolpi@gmail.com/.boto:/home/wolfv/.boto gsutil ls
sudo: gsutil: command not found
$ sudo gsutil ls
sudo: gsutil: command not found
Two days ago root was able to run gsutil.
Since then I used dnf history rollback to uninstall a different piece of software.
Could that have affected gsutil authentication?
UPDATE_5
I followed the instructions on https://cloud.google.com/storage/docs/authentication#gsutilauth
USING SERVICE ACCOUNT
$ gcloud auth activate-service-account --key-file=/home/wolfv/REDACTED.json
Activated service account credentials for: [REDACTED@appspot.gserviceaccount.com]
But still, root could not run gsutil:
$ sudo gsutil ls
sudo: gsutil: command not found
$ gsutil ls -la gs://wolfvtest/test_lifecycle/
CommandException: You have multiple types of configured credentials (['Oauth 2.0 User Account', 'OAuth 2.0 Service Account']), which is not supported. One common way this happens is if you run gsutil config to create credentials and later run gcloud auth, and create a second set of credentials. Your boto config path is: ['/home/wolfv/.boto', '/home/wolfv/.config/gcloud/legacy_credentials/my-project@appspot.gserviceaccount.com/.boto']. For more help, see "gsutil help creds".
The help refers to a page that no longer mentions "auth": https://developers.google.com/cloud/sdk/gcloud/#gcloud.auth
So I have one too many credentials:
$ gsutil -D
...
config_file_list: ['/home/wolfv/.boto', '/home/wolfv/.config/gcloud/legacy_credentials/my-project@appspot.gserviceaccount.com/.boto']
Are any of these credentials used by root (for anacron)?
They are not in the root directory.
Should credentials needed for anacron be in the root directory?
UPDATE_6
After installing Fedora 26 I tried again; see How to authorize root to run gsutil?
When you execute BOTO_CONFIG=<value> in the shell, you're not actually defining an environment variable, but rather a local shell variable (see this thread for more details). You want to either define the variable inline with the command:
BOTO_CONFIG=/path/to/config gsutil ls
or first export the BOTO_CONFIG environment variable, then run the gsutil command:
export BOTO_CONFIG=/path/to/config; gsutil ls
EDIT:
I just noticed that in addition to your own $HOME/.boto file, you're relying on gcloud's credentials that get set up from gcloud auth login. When you run this, gcloud creates another .boto file for you, and when you run gsutil from gcloud's wrapper script, it loads that .boto file first, followed by whatever .boto file(s) you specify with either the BOTO_CONFIG or BOTO_PATH environment variable.
If you want to run as root (which the cron job does) and use both those .boto files, you'll need to instead use the BOTO_PATH variable to list them, separated by colons, also making sure the BOTO_CONFIG environment variable is not set (BOTO_CONFIG takes precedence over BOTO_PATH... the gsutil docs mention this briefly):
BOTO_PATH=/home/wolfv/.config/gcloud/legacy_credentials/REDACTED/.boto:/home/wolfv/.boto gsutil ls
EDIT 2:
1) When you get the error "sudo: gsutil: command not found", it means that the root user cannot find the gsutil executable in its PATH. You should use the absolute path to the gsutil executable instead -- from your post, it looks like this is /home/wolfv/Downloads/google-cloud-sdk/platform/gsutil/gsutil.
2) When you activate service account credentials, the gcloud wrapper for gsutil will create a separate .boto file (with a path containing legacy_credentials/myproject@appspot[...]), and prefer to use this one if it's present. It contains the attribute gs_service_key_file, while your other .boto file probably contains gs_oauth2_refresh_token -- loading multiple .boto files with multiple credentials attributes like this will result in the error you're seeing.
If you want to use gcloud to manage your auth credentials, you generally shouldn't put anything under the [Credentials] section of your $HOME/.boto file.
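Putting EDIT and EDIT 2 together, the anacrontab entry from UPDATE_4 would become something along these lines (a sketch built from the paths that appear in the question; adjust to your install):
# /etc/anacrontab entry: period delay job-id command
# BOTO_PATH points at both .boto files; PATH lets the bare gsutil call inside the
# backup script resolve to the Cloud SDK copy, since root's PATH does not include it.
1 10 backup_to_gcs BOTO_PATH=/home/wolfv/.config/gcloud/legacy_credentials/wolfvolpi@gmail.com/.boto:/home/wolfv/.boto PATH=/home/wolfv/Downloads/google-cloud-sdk/platform/gsutil:/usr/bin:/bin /home/wolfv/scripts/backup_to_gcs/backup_to_gcs.sh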

How to share entire Google Cloud Bucket with GSUTIL

Is there a command using GSUTIL that will allow me to share publicly everything in a specific Bucket? Right now, I'm forced to go through and check "share publicly" individually on EVERY SINGLE FILE in the console.
The best way to do this is:
gsutil -m acl ch -u 'AllUsers:R' gs://your-bucket/**
This will update the ACL of each existing object in the bucket.
If you want newly created objects in this bucket to also be public, you should also run:
gsutil defacl ch -u 'AllUsers:R' gs://your-bucket
This question was also asked here but the answer recommends using acl set public-read which has the downside of potentially altering your existing ACLs.
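Note that on buckets with uniform bucket-level access enabled, object ACLs are disabled and the acl/defacl commands will fail; in that case the IAM equivalent is (a sketch):
# grant read access on all objects in the bucket to everyone via IAM
gsutil iam ch allUsers:objectViewer gs://your-bucket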
$> gsutil acl ch -g All:R -r gs://bucketName
gsutil is the command-line utility for GCS.
"acl ch" means "Modify an ACL."
"-g All:R" means "include read permissions for all users."
"-r" means "recursively"
and the rest is the path.
If you have a whole lot of files and you want MORE SPEED, you can use -m to mean "and also do this multithreaded!", like so:
$> gsutil -m acl ch -g All:R -r gs://bucketName

Google Cloud Storage: bulk edit ACLs

We are in the process of moving our servers into Google Cloud Compute Engine and starting to look at Cloud Storage as a CDN option. I uploaded about 1,000 files through the Developer Console, but the problem is that the Object Permissions for All Users are all set to None. I can't find any way to edit all the permissions to give All Users Reader access. Am I missing something?
You can use the gsutil acl ch command to do this as follows:
gsutil -m acl ch -R -g All:R gs://bucket1 gs://bucket2/object ...
where:
-m sets multi-threaded mode, which is faster for a large number of objects
-R recursively processes the bucket and all of its contents
-g All:R grants all users read-only access
See the acl documentation for more details.
You can use Google Cloud Shell as your console via a web browser if you just need to run a single command via gsutil, as it comes preinstalled in your console VM.
In addition to using the gsutil acl command to change the existing ACLs, you can use the gsutil defacl command to set the default object ACL on the bucket as follows:
gsutil defacl set public-read gs://«your bucket»
You can then upload your objects in bulk via:
gsutil -m cp -R «your source directory» gs://«your bucket»
and they will have the correct ACLs set. This will all be much faster than using the web interface.
You can set the access control permission by using "predefinedAcl"; the code (Java client, where bucketName, storageObject and mediaContent stand in for your own variables) is as follows:
Storage.Objects.Insert insertObject = client.objects().insert(bucketName, storageObject, mediaContent);
insertObject.setPredefinedAcl("publicRead");
This will work fine.
Do not forget to put wildcard characters after the bucket path to apply the changes to each file - example:
gsutil -m acl ch -R -g All:R gs://bucket/files/*
for all files inside the 'files' folder, or:
gsutil -m acl ch -R -g All:R gs://bucket/images/*.jpg
for each jpg file inside the 'images' folder.