New to GCS (just got started with it today). Looks very promising.
Is there anyway to use multiple S3 accounts (or GCS) in a single boto file? I only see the option to assign keys to one S3 and one GCS account in a single file. I'd like to use multiple credentials.
We're like to copy from S3 to S3, or GCS to GCS, with each of those buckets using different keys.
You should be able to setup multiple profiles within your .boto file.
You could add something like:
[profile prod]
gs_access_key_id=....
gs_secret_access_key=....
[profile dev]
gs_access_key_id=....
gs_secret_access_key=....
And then from your code you can add a profile_name= parameter to the connection call:
con = boto.gs.connection(profile_name="dev")
You can definitely use multiple boto files, just make sure that the credentials in each of them are valid. Every time you need to switch between them, run the following command with the right path.
$ BOTO_CONFIG=/path/to_boto gsutil cp SOME_FILE gs://bucket
Example :
BOTO_CONFIG=/etc/boto.cfg gsutil -m cp text.txt gs://bucket
Additionally, you can have aliases for your different profiles. Just create an alias for each command and you are set !
Related
I've setup some Nearline buckets and enabled versioning and object lifecycle management. The use-case is to replace my current backup solution, Crashplan.
Using gsutil I can see the different versions of a file using a command like gsutil ls -al gs://backup/test.txt.
First, is there any way of finding files that don't have a live version (e.g. deleted) but still have a version attached?
Second, is there any easier way of managing versions? For instance if I delete a file from my PC, it will no longer have a live version in my bucket but will still have the older versions associated. Say, if I didn't know the file name would I just have to do a recursive ls on the entire bucket and sift through the output?
Would love a UI that supported versioning.
Thanks.
To check if the object currently has no life version use x-goog-if-generation-match header equal to 0, for example :
gsutil -h x-goog-if-generation-match:0 cp file.txt gs://bucket/file.txt
will fail (PreconditionException: 412 Precondition Failed) if file has a live version and will succeed if it has only archived versions.
In order to automatically synchronize your local folder and folder in the bucket (or the other way around) use gcloud rsync:
gcloud rsync -r -d ./test gs://bucket/test/
notice the trailing / in gs://bucket/test/, without it you will receive
CommandException: arg (gs://graham-dest/test) does not name a directory, bucket, or bucket subdir.
-r synchronize all the directories in ./test recursively to gs://bucket/test/`
-d will delete all files from gs://bucket/test/that are not found in./test`
Regarding UI, there already exists a future request. I don't know anything about third party applications however.
I have datewise folders in the form of root-dir/yyyy/mm/dd
under which there are so many files present.
I want to update the timestamp of all the files falling under certain date-range,
for example 2 weeks ie. 14 folders, so that these these files can be picked up by my file-Streaming Data Ingestion process.
What is the easiest way to achieve this?
Is there a way in UI console? or is it through gsutil?
please help
GCS objects are immutable, so the only way to "update" the timestamp would be to copy each object on top of itself, e.g., using:
gsutil cp gs://your-bucket/object1 gs://your-bucket/object1
(and looping over all objects you want to do this to).
This is a fast (metadata-only) operation, which will create a new generation of each object, with a current timestamp.
Note that if you have versioning enabled on the bucket doing this will create an extra version of each file you copy this way.
When you say "folders in the form of root-dir/yyyy/mm/dd", do you mean that you're copying those objects into your bucket with names like gs://my-bucket/root-dir/2016/12/25/christmas.jpg? If not, see Mike's answer; but if they are named with that pattern and you just want to rename them, you could use gsutil's mv command to rename every object with that prefix:
$ export BKT=my-bucket
$ gsutil ls gs://$BKT/**
gs://my-bucket/2015/12/31/newyears.jpg
gs://my-bucket/2016/01/15/file1.txt
gs://my-bucket/2016/01/15/some/file.txt
gs://my-bucket/2016/01/15/yet/another-file.txt
$ gsutil -m mv gs://$BKT/2016/01/15 gs://$BKT/2016/06/20
[...]
Operation completed over 3 objects/12.0 B.
# We can see that the prefixes changed from 2016/01/15 to 2016/06/20
$ gsutil ls gs://$BKT/**
gs://my-bucket/2015/12/31/newyears.jpg
gs://my-bucket/2016/06/20/file1.txt
gs://my-bucket/2016/06/20/some/file.txt
gs://my-bucket/2016/06/20/yet/another-file.txt
I have backup files in different directories in one drive. Files in those directories can be quite big up to 800GB or so. So I have a batch file with a set of scripts which upload/syncs files to S3.
See example below:
aws s3 sync R:\DB_Backups3\System s3://usa-daily/System/ --exclude "*" --include "*/*/Diff/*"
The upload time can vary but so far so good.
My question is, how do I edit the script or create a new one which checks in the s3 bucket that the files have been uploaded and ONLY if they have been uploaded then deleted them from the local drive, if not leave them on the drive?
(Ideally it would check each file)
I'm not familiar with aws s3, or aws cli command that can do that? Please let me know if I made myself clear or if you need more details.
Any help will be very appreciated.
Best would be to use mv with --recursive parameter for multiple files
When passed with the parameter --recursive, the following mv command recursively moves all files under a specified directory to a specified bucket and prefix while excluding some files by using an --exclude parameter. In this example, the directory myDir has the files test1.txt and test2.jpg:
aws s3 mv myDir s3://mybucket/ --recursive --exclude "*.jpg"
Output:
move: myDir/test1.txt to s3://mybucket2/test1.txt
Hope this helps.
As the answer by #ketan shows, Amazon aws client cannot do batch move.
You can use WinSCP put -delete command instead:
winscp.com /log=S3.log /ini=nul /command ^
"open s3://S3KEY:S3SECRET#s3.amazonaws.com/" ^
"put -delete C:\local\path\* /bucket/" ^
"exit"
You need to URL-encode special characters in the credentials. WinSCP GUI can generate an S3 script template, like the one above, for you.
Alternatively, since WinSCP 5.19, you can use -username and -password switches, which do not need any encoding:
"open s3://s3.amazonaws.com/ -username=S3KEY -password=S3SECRET" ^
(I'm the author of WinSCP)
Is there a way to remove files from GoogleCloudStorage, using the CLI by their creation date?
For example:
I would like to remove all files in a specific path, which their creation date is lower than 2016-12-01
There's no built-in way in the CLI to delete by date. There are a couple ways to accomplish something like this. One possibility is to use an object naming scheme that prefixes object names by their creation date. Then it is easy to remove them with wildcards, for example:
gsutil -m rm gs://your-bucket/2016-12-01/*
Another approach would be to write a short parser for gsutil ls -L gs://your-bucket that filters object names by their creation date, then call gsutil -m rm -I with the resulting object names.
If you just want to automatically delete objects older than a certain age, then there is a much easier way than using the CLI: you can configure an Object Lifecycle Management policy on your bucket.
So I run the following:
gsutil -m cp -R file.png gs://bucket/file.png
And I get the following error message:
Copying file://file.png [Content-Type=application/pdf]...
Uploading file.png: 42.59 KiB/42.59 KiB
AccessDeniedException: 401 Login Required
CommandException: 1 files/objects could not be transferred.
I'm not sure what the problem is since I ran config and I can see all my buckets. Does anyone know what I need to do?
Note: I do not have gcloud, I just installed gsutil and ran the config.
Login to Google Cloud is needed for accessing any Cloud service. You need to use below command which will guide you through login steps like typing verification code you generate by opening browser link given in console.
gcloud auth login
I was getting a similar response, and was able to solve this problem by looking at the read permissions on the .boto file. In my case, I was using a service account and the .boto file that was created by
gsutil config -e
only had read permissions set for user. Since it was being read by a service running as a different user, it wasn't able to read the file and yielding a 401 Login Required error. I fixed it by adding read permissions for the service's group.
In the least sophisticated case, you could fix it by giving any user read permission with
chmod a+r .boto
A more detailed explanation for troubleshooting
To get more information, run the same command with a -D flag, like:
gsutil -m -D cp ....
In the debug output, look at:
Command being run: /path/to/gsutil
config_file_list: /path/to/boto/config
Create your login credentials using the executable at /path/to/gsutil, not gcloud auth or any other gsutil executable on the machine, using:
/path/to/gsutil config
For a service account, use:
/path/to/gsutil config -e
These should create a .boto config file in your home directory, $HOME/.boto. If you are running the gsutil command this file should be referenced in the config_file_list variable in the debug output. If not, see below to change it.
Running gsutil under a service account or as another user
If you are running as another user, and need to reference a newly-created config file, set the environment variable BOTO_CONFIG (don't forget to export it):
BOTO_CONFIG=/path/to/$HOME/.boto
export BOTO_CONFIG
By setting this variable, when you execute gsutil, it will reference the config file you have placed in BOTO_CONFIG. You can confirm that you are referencing the correct config file by looking at the config_file_list variable in the gsutil -D command's output.
make sure the referenced .boto file is readable by the user who is executing the gsutil command
Running the /path/to/gsutil with the BOTO_CONFIG variable set allowed me to execute gsutil as another user, referencing an arbitrary BOTO_CONFIG file that was set up with a service-account's credentials.
To set up the service account:
https://console.cloud.google.com/permissions/serviceaccounts
The key file from the service account set-up process needs to be downloaded, and the path to it is requested during the gsutil config -e step.
This may be an issue with how gsutil/boto handles the OS path separators on Windows, as referenced here. This should eventually get merged into the sdk tools, but until then the following should work:
Go to
google-cloud-sdk\platform\gsutil\third_party\boto\boto\pyami\config.py
and replace the line:
for path in os.environ['BOTO_PATH'].split(':'):
with:
for path in os.environ['BOTO_PATH'].split(os.path.pathsep):
Next, go to
google-cloud-sdk\bin\bootstrapping\gsutil.py
replace the lines that use ':'
if boto_config:
boto_path = ':'.join([boto_config, gsutil_path])
elif boto_path:
# this is ':' for windows as well, hardcoded into the boto source.
boto_path = ':'.join([boto_path, gsutil_path])
else:
path_parts = ['/etc/boto.cfg',
os.path.expanduser(os.path.join('~', '.boto')),
gsutil_path]
boto_path = ':'.join(path_parts)
with
if boto_config:
boto_path = os.path.pathsep.join([boto_config, gsutil_path])
elif boto_path:
# this is ':' for windows as well, hardcoded into the boto source.
boto_path = os.path.pathsep.join([boto_path, gsutil_path])
else:
path_parts = ['/etc/boto.cfg',
os.path.expanduser(os.path.join('~', '.boto')),
gsutil_path]
boto_path = os.path.pathsep.join(path_parts)
Restart cmd and now the error should go away.