I am new to Google Cloud. We have historically used AWS for online backups -- essentially, our local servers ran rsync to an EC2 instance at AWS and it all worked fine. I'm now trying to migrate from AWS to Google, and of course the setup is pretty different. With gsutil rsync it looked to me as though I wouldn't need to spin up a Compute Engine instance at all; I could just push stuff straight into the gs://aws_mnt bucket.
Having installed the SDK on our AWS instance, I was able to push all our backups to the gs://aws_mnt bucket very easily using gsutil cp -n.
But going forward I want to run a cron job on the local server which uses rsync rather than cp for obvious reasons.
I have two issues:
Despite reading the appropriate documentation (here), I can't figure out how to permanently authorise the local server so that I don't have to run gcloud auth login and get a code from a browser each session -- for a cron job that's not really going to work.
When I try to use gsutil rsync from the local server to the gs://aws_mnt bucket that was pre-populated from AWS, I get an error:
gsutil rsync /mnt/archive/backups gs://aws_mnt/kahless
Building synchronization state...
Skipping cloud sub-directory placeholder object gs://aws_mnt/kahless/
Starting synchronization
There is some discussion of this error on github and I've produced detailed output from
gsutil -D -m rsync /mnt/archive/backups gs://aws_mnt/kahless
But since this is a brand-new install of the SDK, I can't imagine that issue hasn't already been dealt with, so I must be doing something wrong?
Rus
In response to your questions:
Once you have configured credentials using gcloud auth, the gcloud auth login command will cause them to be selected until you log in with a different credential. That state persists and will not require you to go through the browser session again unless/until you revoke those credentials. Note: if you're thinking of running commands from an unattended script (e.g., via cron), please consider using service account credentials. For more details please see https://developers.google.com/cloud/sdk/gcloud/#gcloud.auth
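To make that concrete for a cron setup: a minimal sketch, assuming you have already created a service account in the Developers Console and downloaded its JSON key (the key-file path here is hypothetical):
gcloud auth activate-service-account --key-file=/path/to/service-account-key.json
gsutil rsync /mnt/archive/backups gs://aws_mnt/kahless
Once activated, the service account credentials persist on the machine, so subsequent gsutil runs from cron need no browser interaction.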
That "skipping..." message is not an error - it's just informing you that gsutil is skipping trying to download the placeholder object, because such objects aren't needed in (and would interfere with) directories in the local file system. I'll update the message in the next version of gsutil to make this more clear. So, what you saw was that the second run of gsutil rsync found nothing to do after comparing the source and destination, and completed normally.
I followed the instructions in the MS docs guide, and the agent started without any issues. However, it never showed up in my agent pool. I tried a different version of the start.sh script found on GitHub and it connected immediately. Is there anything else I can do to troubleshoot this? Logs from the non-working agent are below.
❯ kubectl logs azpagent-55864668dc-zgdrn
1. Determining matching Azure Pipelines agent...
2. Downloading and installing Azure Pipelines agent...
3. Configuring Azure Pipelines agent...
>> End User License Agreements:
Building sources from a TFVC repository requires accepting the Team Explorer Everywhere End User License Agreement. This step is not required for building sources from Git repositories.
A copy of the Team Explorer Everywhere license agreement can be found at:
/azp/agent/externals/tee/license.html
>> Connect:
Connecting to server ...
>> Register Agent:
Scanning for tool capabilities.
Connecting to the server.
Successfully replaced the agent
Testing agent connection.
2019-08-03 04:22:56Z: Settings Saved.
4. Running Azure Pipelines agent...
Starting Agent listener interactively
Started listener process
Started running service
Scanning for tool capabilities.
Connecting to the server.
2019-08-03 04:23:08Z: Agent connect error: The signature is not valid.. Retrying until reconnected.
Not really sure what else to try -- has anyone else seen this issue, or had success with the Linux agent guide?
Looking at the error message:
The signature is not valid.
There might be a problem with the provided PAT. I'd suggest generating a new PAT, as described by this guide, and trying again.
Let me know if this has helped.
Update
According to the error info "The signature is not valid.":
Are you building sources from a TFVC repository? That requires accepting the Team Explorer Everywhere End User License Agreement; this step is not required for building sources from Git repositories.
If so, try building from a Git repo instead.
The doc you referred to uses a different version of the start.sh script, which is deprecated; it's for an old build agent.
Given this and the related error "The signature is not valid.. Retrying until reconnected.", a few things I would suggest:
You may be on a pretty old agent version; try the latest agent release:
https://github.com/microsoft/azure-pipelines-agent/releases
You need to restart the agent process for those environment changes to take effect.
Check with your IT department and make sure the network between your build machine and the TFS server/Azure DevOps Service is reliable; see whether there has been any change in your network.
Also make sure your build machine/VM hasn't run out of resources.
In case this or a similar issue occurs for anyone else: the suggestion from #juliobbv was very helpful. If you comment out the last line of the script and replace it with
./bin/Agent.Listener run & wait $!
you can get a clearer view of any error messages.
In my case, I didn't realize that AGENT_NAME and POOL were no longer the same variable, and the original error message didn't indicate that the issue was my lack of permissions on the default pool.
My final changes to the script are below -- I defaulted the agent name to the hostname, and maintained the previous behavior of using a custom pool:
./config.sh --unattended \
--agent "$(hostname)" \
--url "$AZP_URL" \
--auth PAT \
--token "$(cat "$AZP_TOKEN_FILE")" \
--pool "${AZP_POOL:-Default}" \
--work "${AZP_WORK:-_work}" \
--replace \
--acceptTeeEula & wait $!
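For context, this is roughly the environment the script above expects; the values are illustrative, and in the official start.sh the token file is normally written out from an AZP_TOKEN variable rather than exported directly:
export AZP_URL="https://dev.azure.com/yourorg"
export AZP_TOKEN_FILE="/azp/.token"
export AZP_POOL="MyPool"
export AZP_WORK="_work"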
Installed this (https://github.com/BanzaiMan/openshift-origin-cartridge-phppgadmin) in my Tiny Tiny RSS application on OpenShift to manage my database.* However, after the installation and a few restarts, the .../phppgadmin/ URL gives me a 404 error. Any ideas? Could it be that the GitHub cartridge is using old environment variables? Thanks!
*The reason I want to install phppgadmin in the first place is to vacuum my ever-expanding database on the Tiny Tiny RSS application. vacuumdb and vacuumdb -f -a only reclaim ~50 MB and app-tidy does ~100 MB, as opposed to ~600 MB previously. So I need to find another solution, like phppgadmin, to address my quota limitations.
Instead of using phppgadmin, it would be a much better idea to download pgAdmin onto your computer and then use the rhc port-forward command to connect to your database and do your maintenance. That cartridge is pretty old and likely will not be updated anytime soon (or ever).
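A minimal sketch of that flow, assuming your app is named ttrss (substitute your real app name):
rhc port-forward -a ttrss
The command prints the forwarded local address/port for PostgreSQL; point pgAdmin at that address and run your vacuum from there.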
I'm trying to automate some gsutil commands, but struggling to see where the authentication files are kept and how to re-use them (if that's what happens).
I've gone through the gcloud init process in bash...
curl https://sdk.cloud.google.com | bash
gcloud init
All works well when I run
'gsutil ls'
Now I'm trying to automate the process, so this would work on a new server added into a crontab (rather than creating a new config each time).
I saw a mention of setting the env variable GOOGLE_APPLICATION_CREDENTIALS, so I copied my credentials from the web login to a file and tried it, e.g. trying as a different user to test:
export GOOGLE_APPLICATION_CREDENTIALS=/home/user/.gsutil/mycreds
and then gsutil ls, but it fails.
So I assume I've got the whole credentials thing a bit wrong. I'm assuming there is a file somewhere that was originally created by gcloud which I could use, but I can't see it anywhere?
I've looked at the answer here, but it doesn't seem up to date now, as per the last comment.
Edit: I have followed Zachary's steps: gcloud auth activate-service-account --key-file=myfilelocation
However, with 'gsutil ls' I now get:
You are attempting to perform an operation that requires a project id, with none configured. Please re-run gsutil config and make sure to follow the instructions for finding and entering your default project id.
So my next question would be: where is it looking for the project id? If I run gsutil config, it seems to create a new set of auth credentials, which then causes another error, so I have removed that.
You should be able to do this without diving too deep into the implementation of authentication for gsutil.
If you're using standalone gsutil (if you installed via this method), the instructions in the linked question are still valid (as Travis points out).
If you'd like to continue using the gsutil supplied via the Cloud SDK, you should use service accounts. Service accounts are the preferred method of authenticating on headless machines or in non-interactive contexts.
Your flow would look something like the following:
Create a service account via the Google Cloud Developers Console.
On the remote machine, install the Cloud SDK and gsutil. If you're not installing interactively, it's better to skip the curl ... | bash method. Instead, download this install archive, extract it, and run the install.sh script. This script has options (visible with --help); if you specify choices to all of these options, it won't prompt you.
Copy the service account to the remote machine. Run gcloud auth activate-service-account --key-file=/path/to/service-account.json.
Run gsutil. You should be appropriately authenticated.
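Putting those steps together, a rough sketch of the whole flow; the installer flag and archive URL are my assumptions from the SDK install docs (check install.sh --help), and the key-file path is hypothetical:
curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/google-cloud-sdk.tar.gz
tar xzf google-cloud-sdk.tar.gz
./google-cloud-sdk/install.sh --disable-prompts
gcloud auth activate-service-account --key-file=/path/to/service-account.json
gcloud config set project your-project-id
gsutil ls
Setting the default project with gcloud config set project also avoids the "requires a project id" error you hit after activating the service account.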
You have to set the default project and user for gsutil. Run the following command:
gcloud init
Choose option 1; it shows you the different users. Select the user and then select the project.
I was trying to create a bucket with the project id as its name:
$ gsutil mb -l eu gs://PROJECT-ID
Creating gs://root****/...
Error: You are attempting to perform an operation that requires a project id, with none configured. Please re-run gsutil config and make sure to follow the instructions for finding and entering your default project id.
Steps that resolved it for me:
gcloud auth login
gcloud config set project <PROJECT-ID>
gsutil mb -l eu gs://<PROJECT-ID>
Creating gs://root***/...
The error goes away and it works as expected.
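Alternatively, if you'd rather not set a default project, gsutil mb accepts a -p flag to supply the project per-command, e.g.:
gsutil mb -p <PROJECT-ID> -l eu gs://<PROJECT-ID>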
We use an ETL process to pull data from Google Cloud Storage, but annoyingly it hangs every time Google releases updates to gsutil, because it sits at a prompt asking if you want to update the library. Fine if you are doing this manually, but not cool when it's being run in an automated SSIS package, as jobs don't finish for days and you keep wasting time on the same stupid cause.
I thought I was going to be clever and add "python gsutil update -n" to the top of the bash script whose building/execution I'm automating in my SSIS package, in the hope of curbing this problem, but when I run this command from the prompt on either Windows Server 2008r2 or Windows 7 I get the following:
C:\gsutil>python gsutil update -f -n
Copying gs://pub/gsutil.tar.gz...
OSError: The process cannot access the file because it is being used by another process.
Any help?
P.S. - Also, Google engineers... can you PLEASE remove these prompts for all of us using these tools in automated processes? I have other things to work on, instead of constantly coming back to things like this every few days/weeks.
What version of gsutil are you running?
Also, to be clear: Are you talking about the fact that gsutil checks for available software updates periodically, and if it finds them it then prompts you whether you want to update? Or are you talking about the fact that the gsutil update command asks if you want to perform the update?
If the former, gsutil shouldn't be performing this check/prompt if you are running gsutil from a script not connected to a TTY. If that's not working correctly, we'd like to know.
And also, if that's the problem you're having, you can completely disable automated software update checks by setting software_update_check_period=0 in the [GSUtil] section of your .boto config file.
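For reference, the relevant stanza in the .boto file would look like this (leave the rest of the file as-is):
[GSUtil]
software_update_check_period = 0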
I need to download files from GCS to a local machine. I tried gsutil config and the download on my machine. On my Win 8 64-bit machine, everything worked fine. However, when I try the same setup on another dedicated machine running Win Vista 32-bit, entering the authentication code just shows Failure.
python gsutil.py config -b pops up the browser with the request URL; I got the authentication code in the browser... pasting that gives Failure.
I am new to Python and could not trace the problem. Does gsutil have any limitation on the number of machines we can configure for the same login/project?
Appreciate any help in debugging this.
Also, is it possible to move GCS data into Google Cloud SQL? I am assuming running a script on Google App Engine that reads from GCS, parses the data and inserts it into Google Cloud SQL is possible. Are there any documentation/tools to do this?
Thanks
Dhurka