Copying directly from S3 to Google Cloud Storage

I can migrate data from Amazon AWS S3 to Azure using the AWS SDK for Java and the Azure SDK for Java. Now I want to migrate data from Amazon AWS S3 to Google Cloud Storage using Java.

The gsutil command-line tool supports S3. After you've configured gsutil, you'll see this in your ~/.boto file:
# To add aws credentials ("s3://" URIs), edit and uncomment the
# following two lines:
#aws_access_key_id =
#aws_secret_access_key =
Fill in the aws_access_key_id and aws_secret_access_key settings with your S3 credentials and uncomment the variables.
Once that's set up, copying from S3 to GCS is as easy as:
gsutil cp -R s3://bucketname gs://bucketname
If you have a lot of objects, run with the -m flag to perform the copy in parallel with multiple threads:
gsutil -m cp -R s3://bucketname gs://bucketname
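Once filled in, the credentials section of ~/.boto looks roughly like this (placeholder values):
[Credentials]
aws_access_key_id = ACCESS_KEY_ID
aws_secret_access_key = SECRET_ACCESS_KEY
A quick sanity check before the full copy is to list the S3 bucket through gsutil:
gsutil ls s3://bucketname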

Use the Google Cloud Storage Transfer Service.
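As a hedged sketch, assuming a reasonably recent Cloud SDK, a Storage Transfer Service job can also be created from the command line; the bucket names and credentials file below are placeholders, and the exact flags may vary by gcloud version:
# aws-creds.json is assumed to contain {"accessKeyId": "...", "secretAccessKey": "..."}
gcloud transfer jobs create s3://source-bucket gs://destination-bucket \
  --source-creds-file=aws-creds.json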

The answer suggested by jterrace (AWS key and secret in the .boto file) is correct and worked for me in many regions, but not in regions that only accept AWS Signature Version 4. For instance, while connecting to the Mumbai (ap-south-1) region I got this error:
BadRequestException: 400 InvalidRequest
The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256
To overcome this problem (make gsutil use AWS Signature v4) I had to add the following additional lines to the ~/.boto file. They create a new [s3] section in the config file:
[s3]
host = s3.ap-south-1.amazonaws.com
use-sigv4 = True
Reference:
Interoperability support for AWS Signature Version 4
Gsutil cannot copy to s3 due to authentication
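Putting both answers together, a complete ~/.boto for a Signature V4 region would look roughly like this (placeholder keys; the host should match the bucket's region):
[Credentials]
aws_access_key_id = ACCESS_KEY_ID
aws_secret_access_key = SECRET_ACCESS_KEY

[s3]
host = s3.ap-south-1.amazonaws.com
use-sigv4 = True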

Create a new .boto file
[Credentials]
aws_access_key_id = ACCESS_KEY_ID
aws_secret_access_key = SECRET_ACCESS_KEY
and run this command:
BOTO_CONFIG=.boto gsutil -m cp s3://bucket-name/filename gs://bucket-name
or this one for the reverse direction:
BOTO_CONFIG=.boto gsutil -m cp gs://bucket-name/filename s3://bucket-name
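To script this end to end, a minimal sketch (placeholder credentials and bucket names) can write the .boto on the fly and remove it afterwards:
cat > migration.boto <<'EOF'
[Credentials]
aws_access_key_id = ACCESS_KEY_ID
aws_secret_access_key = SECRET_ACCESS_KEY
EOF
BOTO_CONFIG=migration.boto gsutil -m cp -R s3://bucket-name gs://bucket-name
rm migration.boto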

AWS_ACCESS_KEY_ID=XXXXXXXX AWS_SECRET_ACCESS_KEY=YYYYYYYY gsutil -m cp s3://bucket-name/filename gs://bucket-name
This approach lets you copy data from S3 to GCS without the need for a .boto file. There are situations where storing a credentials file on the running virtual machine is not recommended. With this approach you can integrate GCP Secret Manager, generate the above command at runtime, and execute it, avoiding the need to store the credentials permanently as a file on the machine.
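For example, a hedged sketch that pulls the keys from Secret Manager at runtime (the secret names are hypothetical) and never writes them to disk:
# Fetch the AWS keys from GCP Secret Manager (hypothetical secret names)
export AWS_ACCESS_KEY_ID="$(gcloud secrets versions access latest --secret=aws-access-key-id)"
export AWS_SECRET_ACCESS_KEY="$(gcloud secrets versions access latest --secret=aws-secret-access-key)"
gsutil -m cp -R s3://bucket-name gs://bucket-name
# Drop the keys from the environment when done
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY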

Related

Difference between aws mv & aws cp object

I have two commands in the AWS CLI:
aws s3 cp test1 s3://buckettest1
&
aws s3 mv test_data s3://buckettest1
It seems that both do an HTTP PUT request to the S3 server and add an object to my bucket. What's the difference?
Just like cp and mv differ on any other platform.
aws s3 cp will copy a local file or S3 object to another location, either locally or in S3.
aws s3 mv will move a local file or S3 object to another location, either locally or in S3; i.e. it deletes the object from the source and puts it at the target path.
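A quick illustration with hypothetical local files:
aws s3 cp test1.csv s3://buckettest1/test1.csv            # upload; test1.csv stays on disk
ls test1.csv                                              # still present locally
aws s3 mv test_data.csv s3://buckettest1/test_data.csv    # upload, then delete the source
ls test_data.csv                                          # no longer present locally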

Running Script In AWS?

I don't know whether this is a relevant question or not. I have one CSV file and one shapefile on my own drive, and I run a script from the command line that combines the two and stores the result in PostgreSQL using pgfutter. I want to do the same thing in AWS. If I keep these two files in an S3 bucket, is it possible to do the same with the command below that I used locally?
shp2pgsql -I "<Shape file directory>" <Tablename> | psql -U <username> -d <DatabaseName>
Example: shp2pgsql -I "C:\Test\filep" public.geo | psql -U postgres -d gisDB
If yes, please help me get this working. If not, please let me know the reason. (Please note that I am new to AWS.)
You can do it in two ways:
You plan to do it only once or a few times: download the files locally using the AWS CLI (aws s3 cp or aws s3 sync) and then pass those files as input, as sketched below.
You will be accessing the files multiple times: use another AWS service to expose your S3 objects as files. Check AWS Storage Gateway and choose the file gateway option. Once configured, you can refer to the S3 objects as files.
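For the first option, a minimal sketch (bucket, paths, table, and database names are all placeholders), run on any machine with the AWS CLI, shp2pgsql, and psql installed:
# Pull the shapefile (and the CSV) down from S3, then load it exactly as before
aws s3 cp s3://my-bucket/gis-input/ ./gis-input --recursive
shp2pgsql -I "./gis-input/filep.shp" public.geo | psql -U postgres -d gisDB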

How can we run gcloud/gsutil/bq commands for different accounts in parallel on one server?

I have installed the gcloud/bq/gsutil command-line tools on one Linux server, and we have several accounts configured on this server:
gcloud config configurations list
NAME  IS_ACTIVE  ACCOUNT    PROJECT  DEFAULT_ZONE  DEFAULT_REGION
gaa   True       a#xxx.com  a
gab   False      b#xxx.com  b
Now I have a problem running gaa and gab on this server at the same time, because they have different access controls on BigQuery and Cloud Storage.
I will use the commands below (bq and gsutil):
Set up the account:
gcloud config set account a#xxx.com
Copy data from BigQuery to Cloud Storage:
bq extract --compression=GZIP --destination_format=NEWLINE_DELIMITED_JSON 'nl:82421.ga_sessions_20161219' gs://ga-data-export/82421/82421_ga_sessions_20161219_*.json.gz
Download data from Cloud Storage to the local system:
gsutil -m cp gs://ga-data-export/82421/82421_ga_sessions_20161219*gz .
If only one account runs, it is not a problem. But several accounts need to run on one server at the same time, and I have no idea how to deal with this case.
Per the gcloud documentation on configurations, you can switch your active configuration via the --configuration flag for any gcloud command. However, gsutil does not have such a flag; you must set the environment variable CLOUDSDK_ACTIVE_CONFIG_NAME:
$ # Shell 1
$ export CLOUDSDK_ACTIVE_CONFIG_NAME=gaa
$ gcloud # ...
$ # Shell 2
$ export CLOUDSDK_ACTIVE_CONFIG_NAME=gab
$ gsutil # ...
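Building on that, a hedged variant is to set the variable per command instead of per shell, so both accounts can run from the same session (the second bucket name is a placeholder):
CLOUDSDK_ACTIVE_CONFIG_NAME=gaa gsutil -m cp "gs://ga-data-export/82421/82421_ga_sessions_20161219*gz" .
CLOUDSDK_ACTIVE_CONFIG_NAME=gab gsutil ls gs://gab-export-bucket
gcloud itself also accepts the configuration inline via its global flag:
gcloud --configuration=gab config list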

How to download a file from AWS S3 with version in Command line?

I have created an AWS S3 bucket with versioning enabled and a file with multiple versions. Is there any way I can download a particular version of the file using the command line or an API?
To list the versions of a particular file, use the command below; the VersionId field in the output is the value you need:
aws s3api list-object-versions --bucket bucketname --prefix folder/test/default.json --output json
VersionId: "Cu9ksraX_OOpbAtobdlYuNPCoJFY4N3S"
Then pass that id to get-object:
aws s3api get-object --bucket bucketname --key folder/test/default.json D:/versions/default.json --version-id Cu9ksraX_OOpbAtobdlYuNPCoJFY4N3S
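Putting those two steps together, a hedged sketch that resolves the id with the AWS CLI's --query option and downloads exactly that version (bucket, key, and output path are placeholders):
# Grab the VersionId of the latest version (adjust the filter to pick an older one)
VERSION_ID=$(aws s3api list-object-versions \
  --bucket bucketname --prefix folder/test/default.json \
  --query 'Versions[?IsLatest].VersionId' --output text)
# Download exactly that version
aws s3api get-object --bucket bucketname --key folder/test/default.json \
  --version-id "$VERSION_ID" ./default.json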
Via command line:
aws s3api get-object --version-id ...
To first get a list of the available versions:
aws s3api list-object-versions ...
There are similar methods in the respective AWS SDKs.
The AWS CLI also has several high-level s3 commands: aws s3 cp, aws s3 ls, aws s3 mv, aws s3 rm, and aws s3 sync. You can use aws s3 cp like any other copy, from a source to a destination:
aws s3 cp s3://BUCKET-NAME/PATH/FILE.EXTENSION (LOCAL_PATH/)FILE.EXTENSION
Note, however, that the high-level aws s3 cp always fetches the latest version; to download a specific version you need aws s3api get-object with --version-id.

Is it possible to automate Gsutil login?

Is it possible to automate a gsutil-based file upload to Google Cloud Storage so that user intervention is not required for login?
My use case is a Jenkins job that polls an SCM location for changes to a set of files. If it detects any changes, it uploads all the files to a specific Google Cloud Storage bucket.
After you configure your credentials once, gsutil requires no further intervention. I suspect that you ran gsutil config as user X but Jenkins runs as user Y. As a result, ~jenkins/.boto does not exist. If you place the .boto file in the right location you should be all set.
Another alternative is to use multiple .boto files and then tell gsutil which one to use with the BOTO_CONFIG environment variable:
gsutil config # complete oauth flow
cp ~/.boto /path/to/existing.boto
# detect that we need to upload
BOTO_CONFIG=/path/to/existing.boto gsutil -m cp files gs://bucket
I frequently use this pattern to use gsutil with multiple accounts:
gsutil config # complete oauth flow for user A
mv ~/.boto user-a.boto
gsutil config # complete oauth flow for user B
mv ~/.boto user-b.boto
BOTO_CONFIG=user-a.boto gsutil cp a-file gs://a-bucket
BOTO_CONFIG=user-b.boto gsutil cp b-file gs://b-bucket
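If the Jenkins agent can use a service account instead of a user login, a hedged alternative (key path and bucket are placeholders) is to activate a service-account key non-interactively; when gsutil is installed as part of the Cloud SDK it picks up those credentials automatically:
gcloud auth activate-service-account --key-file=/var/lib/jenkins/gcs-uploader-key.json
gsutil -m cp build/artifacts/* gs://my-release-bucket/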