how to download a file using gsutil - powershell

I'm starting to use gsutil on Windows XP. I have Python 2.7 in c:\Python27. I have set up and can successfully run Python scripts, including gsutil, in my Windows PowerShell IDE to list my files, e.g. gsutil ls -L gs://mybucket shows my files present and correct. The developer's guide suggests the following example to download a file from storage:
gsutil cp gs://cats/*.jpg file://pets/
I don't understand the syntax here. I have a file in storage, gs://pussy/debug.txt, and I want to download it to c:\test\debug.txt. How should I write this command?
I tried
gsutil cp gs://pussy/debug.txt file c:\test\
but it gives me the following error
At line:1 char:7
+ gsutil <<<< cp gs://pussy/debug.txt file c:\test\
+ CategoryInfo : NotSpecified: (Copying gs://pussy/debug.txt...:String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
Traceback (most recent call last):
File "c:\gsutil\gsutil.py", line 88, in <module>
sys.exit(gslib.__main__.main())
File "c:\gsutil\gslib\__main__.py", line 199, in main
parallel_operations)
File "c:\gsutil\gslib\__main__.py", line 287, in _RunNamedCommandAndHandleExceptions
parallel_operations)
File "c:\gsutil\gslib\command_runner.py", line 188, in RunNamedCommand
return command_inst.RunCommand()
File "c:\gsutil\gslib\commands\cp.py", line 2273, in RunCommand
shared_attrs)
File "c:\gsutil\gslib\command.py", line 803, in Apply
use_thr_exc_handler=ignore_subprocess_failures)
File "c:\gsutil\gslib\command.py", line 908, in _ApplyThreads
return_value = func(args)
File "c:\gsutil\gslib\commands\cp.py", line 2143, in _CopyFunc
self._PerformCopy(exp_src_uri, dst_uri))
File "c:\gsutil\gslib\commands\cp.py", line 1560, in _PerformCopy
src_key = src_uri.get_key(False, download_headers)
File "c:\gsutil\third_party\boto\boto\storage_uri.py", line 189, in get_key
key = bucket.get_key(self.object_name, headers, version_id)
File "c:\gsutil\third_party\boto\boto\file\bucket.py", line 92, in get_key
fp = open(key_name, 'rb')
IOError: [Errno 2] No such file or directory: u'file'
Can anyone help?

The guide's example assumes Unix-style paths, and the stray file token in your command was treated as a separate local source to copy, which is why the traceback ends with IOError ... No such file or directory: u'file'. On Windows you'll need one of the following forms instead:
gsutil cp gs://folder/filename c:\destfolder\file
OR
gsutil cp gs://folder/filename file:///c|/destfolder/file
Or possibly even
gsutil cp 'gs://folder/filename' 'file:///c|/destfolder/file'
Or with variables
$src = 'gs://folder/filename';
$dest = 'file:///c:/destfolder/file'
gsutil cp $src $dest
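If you're ever unsure how a Windows path maps onto a file:// URI, Python's pathlib can generate one for you. A minimal sketch (the c:\test\debug.txt path is just the asker's example):

```python
from pathlib import PureWindowsPath

# Convert a Windows destination path into the file:// URI form gsutil accepts.
dest = PureWindowsPath(r"c:\test\debug.txt")
print(dest.as_uri())  # → file:///c:/test/debug.txt
```

Note that as_uri() requires an absolute path (one with a drive letter), which is exactly what you want to pass to gsutil.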

I've had the same issue.
The way I make it work is simple: do not include ":", since colons are not allowed in this type of path.
For example try something like this:
python gsutil cp -r gs://pubsite_prod_rev_XXXXXXXXXXXXXXXXXX/reviews/reviews_com.YYYYY.ZZZZ_201501.csv 'test/'
It will create a folder called test and save the file inside it.
Hope it helps.

Related

Why am I getting a read-only file system from github and an error when trying to install apache airflow?

I am working on VirtualBox 6.0 with Python 3.5. I am trying to install airflow from github using the requirements-python3.5.txt file (https://raw.githubusercontent.com/apache/airflow/v1-10-stable/requirements/requirements-python3.5.txt). However, when I try to download this file from the command line, I get a read-only file system:
vagrant@learnairflow:~$ source .sandbox/bin/activate
(.sandbox) vagrant@learnairflow:~$ wget https://raw.githubusercontent.com/apache/airflow/v1-10-stable/requirements/requirements-python3.5.txt
--2020-06-13 15:47:54-- https://raw.githubusercontent.com/apache/airflow/v1-10-stable/requirements/requirements-python3.5.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.48.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6210 (6.1K) [text/plain]
requirements-python3.5.txt: Read-only file system
Cannot write to ‘requirements-python3.5.txt’ (Success).
Subsequently, when I try to install airflow I get the following error:
(.sandbox) vagrant@learnairflow:~$ pip install "apache-airflow[celery, crypto, mysql, rabbitmq, redis]"==1.10.10 --constraint requirements-python3.5.txt
WARNING: The directory '/home/vagrant/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
ERROR: Exception:
Traceback (most recent call last):
File "/home/vagrant/.sandbox/lib/python3.5/site-packages/pip/_internal/cli/base_command.py", line 188, in _main
status = self.run(options, args)
File "/home/vagrant/.sandbox/lib/python3.5/site-packages/pip/_internal/cli/req_command.py", line 185, in wrapper
return func(self, options, args)
File "/home/vagrant/.sandbox/lib/python3.5/site-packages/pip/_internal/commands/install.py", line 288, in run
wheel_cache = WheelCache(options.cache_dir, options.format_control)
File "/home/vagrant/.sandbox/lib/python3.5/site-packages/pip/_internal/cache.py", line 296, in __init__
self._ephem_cache = EphemWheelCache(format_control)
File "/home/vagrant/.sandbox/lib/python3.5/site-packages/pip/_internal/cache.py", line 265, in __init__
globally_managed=True,
File "/home/vagrant/.sandbox/lib/python3.5/site-packages/pip/_internal/utils/temp_dir.py", line 137, in __init__
path = self._create(kind)
File "/home/vagrant/.sandbox/lib/python3.5/site-packages/pip/_internal/utils/temp_dir.py", line 185, in _create
tempfile.mkdtemp(prefix="pip-{}-".format(kind))
File "/usr/local/lib/python3.5/tempfile.py", line 358, in mkdtemp
prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)
File "/usr/local/lib/python3.5/tempfile.py", line 130, in _sanitize_params
dir = gettempdir()
File "/usr/local/lib/python3.5/tempfile.py", line 296, in gettempdir
tempdir = _get_default_tempdir()
File "/usr/local/lib/python3.5/tempfile.py", line 231, in _get_default_tempdir
dirlist)
FileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/home/vagrant']
I've tried using the sudo command but it doesn't work either. Do you have any idea of what might be causing this error and how to fix it? Thank you in advance!

gcloud Object metadata supplied for destination object had no object name

I am trying to rsync a local folder with a Google Cloud bucket. However, I get the following exception from gcloud:
ArgumentException: Object metadata supplied for destination object had no object name.
Does anybody know a workaround for this?
StackTrace:
2018-04-15T08:30:06.3806055Z - [2 files][ 9.0 MiB/ 16.3 MiB] 69.2 KiB/s
2018-04-15T08:30:06.3806130Z DEBUG: Exception stack trace:
2018-04-15T08:30:06.3806196Z Traceback (most recent call last):
2018-04-15T08:30:06.3806287Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\__main__.py", line 571, in _RunNamedCommandAndHandleExceptions
2018-04-15T08:30:06.3806398Z user_project=user_project)
2018-04-15T08:30:06.3806489Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\command_runner.py", line 319, in RunNamedCommand
2018-04-15T08:30:06.3806582Z return_code = command_inst.RunCommand()
2018-04-15T08:30:06.3806672Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\commands\rsync.py", line 1462, in RunCommand
2018-04-15T08:30:06.3806763Z fail_on_error=True, seek_ahead_iterator=seek_ahead_iterator)
2018-04-15T08:30:06.3807079Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\command.py", line 1383, in Apply
2018-04-15T08:30:06.3807172Z arg_checker, should_return_results, fail_on_error)
2018-04-15T08:30:06.3807263Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\command.py", line 1454, in _SequentialApply
2018-04-15T08:30:06.3807353Z worker_thread.PerformTask(task, self)
2018-04-15T08:30:06.3807449Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\command.py", line 2120, in PerformTask
2018-04-15T08:30:06.3807540Z results = task.func(cls, task.args, thread_state=self.thread_gsutil_api)
2018-04-15T08:30:06.3807636Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\commands\rsync.py", line 1252, in _RsyncFunc
2018-04-15T08:30:06.3807736Z gzip_exts=cls.gzip_exts, preserve_posix=cls.preserve_posix_attrs)
2018-04-15T08:30:06.3807835Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\copy_helper.py", line 3515, in PerformCopy
2018-04-15T08:30:06.3807922Z allow_splitting=allow_splitting, gzip_encoded=gzip_encoded)
2018-04-15T08:30:06.3808025Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\copy_helper.py", line 2021, in _UploadFileToObject
2018-04-15T08:30:06.3808115Z parallel_composite_upload, logger)
2018-04-15T08:30:06.3808208Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\copy_helper.py", line 1872, in _DelegateUploadFileToObject
2018-04-15T08:30:06.3808308Z elapsed_time, uploaded_object = upload_delegate()
2018-04-15T08:30:06.3808401Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\copy_helper.py", line 2004, in CallNonResumableUpload
2018-04-15T08:30:06.3808489Z gzip_encoded=gzip_encoded_file)
2018-04-15T08:30:06.3808591Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\copy_helper.py", line 1583, in _UploadFileToObjectNonResumable
2018-04-15T08:30:06.3808684Z fields=UPLOAD_RETURN_FIELDS, gzip_encoded=gzip_encoded)
2018-04-15T08:30:06.3808778Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\cloud_api_delegator.py", line 287, in UploadObject
2018-04-15T08:30:06.3808970Z gzip_encoded=gzip_encoded)
2018-04-15T08:30:06.3809064Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\gcs_json_api.py", line 1376, in UploadObject
2018-04-15T08:30:06.3809148Z gzip_encoded=gzip_encoded)
2018-04-15T08:30:06.3809245Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\gcs_json_api.py", line 1152, in _UploadObject
2018-04-15T08:30:06.3809332Z ValidateDstObjectMetadata(object_metadata)
2018-04-15T08:30:06.3809425Z File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\cloud_api_helper.py", line 40, in ValidateDstObjectMetadata
2018-04-15T08:30:06.3809534Z 'Object metadata supplied for destination object had no object name.')
2018-04-15T08:30:06.3809621Z ArgumentException: ArgumentException: Object metadata supplied for destination object had no object name.
2018-04-15T08:30:06.3809696Z
2018-04-15T08:30:06.3809769Z ArgumentException: Object metadata supplied for destination object had no object name.
Used version on Windows:
Google Cloud SDK 197.0.0
bq 2.0.31
core 2018.04.06
gsutil 4.30
The above error, per our discussion in this issue tracker report, is specific to using the gsutil tool on Windows, and is caused by using a forward-slash path ./ rather than a backslash path .\ on both the command line and in PowerShell. It is related to the fact that Windows uses backslashes to separate directories in file paths (C:\someDirectory\anotherDirectory), not forward slashes like other operating systems. Hence, the following command should work:
gsutil rsync -d .\ gs://bucketname/folder

gsutil - no locks available

Has anyone seen this error from gsutil or know how to fix it? I get it when I try to run any gsutil command, but here is an example trying to use ls on a bucket in my google cloud project.
$ gsutil ls gs://BUCKET/FOLDER
Traceback (most recent call last):
File "/home/gmcinnes/bin/google-cloud-sdk/bin/bootstrapping/gsutil.py", line 68, in <module>
bootstrapping.PrerunChecks(can_be_gce=True)
File "/home/gmcinnes/bin/google-cloud-sdk/bin/bootstrapping/bootstrapping.py", line 279, in PrerunChecks
CheckCredOrExit(can_be_gce=can_be_gce)
File "/home/gmcinnes/bin/google-cloud-sdk/bin/bootstrapping/bootstrapping.py", line 167, in CheckCredOrExit
cred = c_store.Load()
File "/home/gmcinnes/bin/google-cloud-sdk/bin/bootstrapping/../../lib/googlecloudsdk/core/credentials/store.py", line 206, in Load
cred = store.get()
File "/home/gmcinnes/bin/google-cloud-sdk/bin/bootstrapping/../../lib/oauth2client/client.py", line 350, in get
self.acquire_lock()
File "/home/gmcinnes/bin/google-cloud-sdk/bin/bootstrapping/../../lib/oauth2client/multistore_file.py", line 222, in acquire_lock
self._multistore._lock()
File "/home/gmcinnes/bin/google-cloud-sdk/bin/bootstrapping/../../lib/oauth2client/multistore_file.py", line 281, in _lock
self._file.open_and_lock()
File "/home/gmcinnes/bin/google-cloud-sdk/bin/bootstrapping/../../lib/oauth2client/locked_file.py", line 370, in open_and_lock
self._opener.open_and_lock(timeout, delay)
File "/home/gmcinnes/bin/google-cloud-sdk/bin/bootstrapping/../../lib/oauth2client/locked_file.py", line 211, in open_and_lock
raise e
IOError: [Errno 37] No locks available
Thanks
Figured it out. The filesystem on that machine was full. I cleaned it up and it works now.
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 10079084 9678804 0 100% /
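For anyone hitting the same lock error, the diagnosis can be scripted: df shows overall usage and du pinpoints the biggest offenders before you clean up. A hedged sketch (the home directory is just an illustrative starting point):

```shell
# How full is the filesystem holding the home directory (and the credential store)?
df -h ~

# Which entries under it take the most space? (largest first)
du -sh ~/* 2>/dev/null | sort -rh | head -n 5
```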

How to properly run gsutil from crontab?

This is my entry in /etc/crontab, CentOS 6.6:
0 0 */1 * * fredrik /home/fredrik/google-cloud-sdk/bin/gsutil -d -m rsync -r -C [src] [dst] &> [log]
And I'm getting this error: OSError: [Errno 13] Permission denied: '/.config'
The command runs fine if executed in the shell. I've noticed I cannot run 0 0 */1 * * fredrik gsutil ... without the full path to gsutil, so I'm assuming I'm missing something in the environment in which cron is running...?
Here's the full traceback:
Traceback (most recent call last):
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/gsutil.py", line 68, in <module>
bootstrapping.PrerunChecks(can_be_gce=True)
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/bootstrapping.py", line 279, in PrerunChecks
CheckCredOrExit(can_be_gce=can_be_gce)
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/bootstrapping.py", line 167, in CheckCredOrExit
cred = c_store.Load()
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/../../lib/googlecloudsdk/core/credentials/store.py", line 195, in Load
account = properties.VALUES.core.account.Get()
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/../../lib/googlecloudsdk/core/properties.py", line 393, in Get
return _GetProperty(self, _PropertiesFile.Load(), required)
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/../../lib/googlecloudsdk/core/properties.py", line 618, in _GetProperty
value = callback()
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/../../lib/googlecloudsdk/core/properties.py", line 286, in <lambda>
'account', callbacks=[lambda: c_gce.Metadata().DefaultAccount()])
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/../../lib/googlecloudsdk/core/credentials/gce.py", line 179, in Metadata
_metadata_lock.lock(function=_CreateMetadata, argument=None)
File "/usr/lib64/python2.6/mutex.py", line 44, in lock
function(argument)
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/../../lib/googlecloudsdk/core/credentials/gce.py", line 178, in _CreateMetadata
_metadata = _GCEMetadata()
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/../../lib/googlecloudsdk/core/credentials/gce.py", line 73, in __init__
_CacheIsOnGCE(self.connected)
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/../../lib/googlecloudsdk/core/credentials/gce.py", line 186, in _CacheIsOnGCE
config.Paths().GCECachePath()) as gcecache_file:
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/../../lib/googlecloudsdk/core/util/files.py", line 465, in OpenForWritingPrivate
MakeDir(full_parent_dir_path, mode=0700)
File "/home/fredrik/google-cloud-sdk/bin/bootstrapping/../../lib/googlecloudsdk/core/util/files.py", line 44, in MakeDir
os.makedirs(path, mode=mode)
File "/usr/lib64/python2.6/os.py", line 150, in makedirs
makedirs(head, mode)
File "/usr/lib64/python2.6/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/.config'
Thanks to Mike and jterrace for helping me get this working. In the end, I had to set these environment variables explicitly: PATH, HOME and BOTO_CONFIG (on top of the default ones).
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/home/fredrik/google-cloud-sdk/bin
HOME=/home/fredrik
BOTO_CONFIG="/home/fredrik/.config/gcloud/legacy_credentials/[your-email-address]/.boto"
# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | | | |
# * * * * * user-name command to be executed
0 0 */1 * * fredrik gsutil -d -m rsync -r -C /local-folder/ gs://my-bucket/my-folder/ > /logs/gsutil.log 2>&1
The > gsutil.log 2>&1 redirects both stdout and stderr to the same file, and it will overwrite the log file the next time gsutil runs. To make it append to the log file instead, use >> gsutil.log 2>&1. This should be safe on both Linux and OS X.
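The truncate-vs-append difference is easy to verify with plain echo standing in for gsutil:

```shell
log=/tmp/gsutil-demo.log

echo "first run"  >  "$log"   # > truncates: the file now holds only this line
echo "second run" >  "$log"   # previous contents are gone
echo "third run"  >> "$log"   # >> appends: both lines survive

cat "$log"   # prints "second run" then "third run"
```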
I'm noticing that the debug flag -d creates enormous log files on large data volumes, so I might opt out on that flag, personally.
You're probably getting a different boto config file when running from cron. Please try running the following both ways (as root, and then via cron), and see if you get different config file lists for the two cases:
gsutil -D ls 2>&1 | grep config_file_list
The reason this happens is that cron unsets most environment variables before running jobs, so you need to manually set the BOTO_CONFIG environment variable in your cron script before running gsutil, i.e.,:
BOTO_CONFIG="/root/.boto"
gsutil rsync ...
I believe you're getting this error because the HOME environment variable is not set when running under cron. Try setting HOME=/home/fredrik.
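In /etc/crontab, environment variables can also be set at the top of the file rather than per job. A sketch along those lines (paths follow the question's setup; [src] and [dst] are the question's own placeholders):

```crontab
HOME=/home/fredrik

0 0 */1 * * fredrik /home/fredrik/google-cloud-sdk/bin/gsutil rsync -r -C [src] [dst]
```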
Because cron runs in a very limited environment, you need to source your .bash_profile to get your environment configuration:
* * * * * source ~/.bash_profile && your_cmd_here
For anyone trying to manage images with gsutil from PHP running under Apache:
I made a new directory called apache-shared and chgrp/chown'd it to www-data (or whichever user your Apache runs as; run "top" to check), copied the .boto file into it, and ran the following without issue:
shell_exec('export BOTO_CONFIG=/apache-shared/.boto && export PATH=/sbin:/bin:/usr/sbin:/usr/bin:/home/user/google-cloud-sdk/bin && gsutil command image gs://bucket');

Doxyclean Error

I'm trying to run doxyclean but can't get it to work; any help would be appreciated.
I'm running from the terminal:
./doxyclean.py --input=./xml/ --output=./clean/ --name="MyProject" --phone -v
I have my doxygen XML in the folder xml, in the same directory as doxyclean.py.
The result is :
Checking arguments
Cleaning XML files:
Traceback (most recent call last):
File "./doxyclean.py", line 1220, in <module>
sys.exit(main())
File "./doxyclean.py", line 1171, in main
cleanXML(filePath, xmlOutputDirectory)
File "./doxyclean.py", line 93, in cleanXML
if not fileIsDocumented(filePath):
File "./doxyclean.py", line 62, in fileIsDocumented
originaldoc = minidom.parse(filePath)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/minidom.py", line 1918, in parse
return expatbuilder.parse(file)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/expatbuilder.py", line 924, in parse
result = builder.parseFile(fp)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/xml/dom/expatbuilder.py", line 207, in parseFile
parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 17, column 155
thanks
Did you try other approaches? My suggestions:
use a different version of doxyclean
change the arguments to a different style, and maybe give full paths to the folders:
./doxyclean.py -i Users/xxx/doxyclean/xml/ -o ./clean/ -p
regenerate the XML from doxygen
doxyclean works fine. I got this error too; I tried my suggestion 2 and it worked.
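The ExpatError means one of the generated XML files is not well-formed (the traceback points at line 17, column 155 of whichever file it choked on). Before regenerating everything, a short scan with minidom can report exactly which files fail. A minimal sketch, assuming the files live in ./xml/:

```python
import glob
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Try to parse every doxygen-generated XML file and report the ones
# that fail, with the line/column that expat complains about.
for path in sorted(glob.glob("xml/*.xml")):
    try:
        minidom.parse(path)
    except ExpatError as err:
        print("%s: %s" % (path, err))
```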