Restore Entire Bucket - google-cloud-storage

I have a few buckets on Google Cloud Storage, and I'm guessing my lifecycle configuration was not set up correctly, because all of my buckets are now empty.
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "Delete"},
        "condition": {
          "age": 60,
          "isLive": false
        }
      },
      {
        "action": {"type": "Delete"},
        "condition": {
          "numNewerVersions": 3
        }
      }
    ]
  }
}
Since I have versioning turned on, and I can see the files using ls -la on the bucket, is there any way to restore ALL of the files, and how can I keep them from being deleted again?

Given the lifecycle configuration above, your live objects shouldn't have been deleted, at least not by the GCS lifecycle management process.
Regardless, the fastest way to restore your objects that I can think of is to follow gsutil's instructions for copying versioned buckets:
$ gsutil mb gs://mynewbucket
# This step is only necessary if you want to keep all archived versions.
$ gsutil versioning set on gs://mynewbucket
$ gsutil cp -r -A gs://oldbucket/* gs://mynewbucket
This will copy all of the archived versions of each object, in order of oldest to newest, from the old bucket to the new bucket. If you don't enable versioning in the new bucket, then you'll only end up with one copy of each object in the end (each copied version of an object will overwrite the previous version of that object). Additionally, for any object that may have had no live version, the most recent archived version will become the live version of it in the new bucket.
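If you prefer to script the restore with the Python client library instead of gsutil, a minimal sketch along the same lines might look like the following (the bucket names are placeholders; it assumes the google-cloud-storage package is installed and that versioning is already enabled on the destination bucket):
from google.cloud import storage

# Placeholder bucket names -- substitute your own.
SOURCE_BUCKET = "oldbucket"
DEST_BUCKET = "mynewbucket"

client = storage.Client()
source = client.bucket(SOURCE_BUCKET)
dest = client.bucket(DEST_BUCKET)

# versions=True also returns archived (noncurrent) generations; the generations
# of each object are listed oldest first, so the newest copy ends up as the
# live version in the destination bucket.
for blob in client.list_blobs(SOURCE_BUCKET, versions=True):
    source.copy_blob(
        blob,
        dest,
        new_name=blob.name,
        source_generation=blob.generation,
    )
    print("Copied {} (generation {})".format(blob.name, blob.generation))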

Related

AWS CDK asset path is incorrect

On September 6, I ran a build using CodePipeline. It generates a CloudFormation template for a project's stack using CDK. The stack has assets (a Lambda Layer), and the assets are correctly placed in the cdk.out folder. This can be seen in the CloudFormation template:
"Metadata": {
"aws:cdk:path": "MyStack/MyLayer/Resource",
"aws:asset:path": "asset.ccb8fd8b4259a8f517879d7aaa083679461d02b9d60bfd12725857d23567b70f",
"aws:asset:property": "Content"
}
Starting yesterday, builds were failing with "Uploaded file must be a non-empty zip". When I investigated further, I noticed that the template was no longer correct. It has the asset path set to the source code of the Lambda instead:
"Metadata": {
"aws:cdk:path": "MyStack/MyLayer/Resource",
"aws:asset:path": "/codebuild/output/src216693626/src/src/lambdas/layers",
"aws:asset:property": "Content"
}
I've added additional commands to the buildspec file, and when I build, their output shows that the assets.abcdef folder has the layer and its dependencies, while the src folder does not. Yet the template is now different.
No code was changed in this time period, and I've tried both CDK version 1.105.0 and 1.119.0.
This code declares the Layer:
new lambdapython.PythonLayerVersion(this.stack, 'MyLayer', {
  entry: path.join(__dirname, '../../src/lambdas/layers'),
  description: 'Common utilities for the Lambdas',
  compatibleRuntimes: [lambda.Runtime.PYTHON_3_8],
  layerVersionName: `${Aws.STACK_NAME}Utils`,
});
Is there a known way for me to force the stack to use the assets in the cdk.out folder? Has something changed in the last couple of days with respect to how CDK generates the template's asset path?
It turns out that I had added a cdk ls to print out additional debugging information while troubleshooting another problem. That command re-synthesized the stack, but with the incorrect asset path.
build: {
  commands: [
    'cd ' + config.cdkDir,
    'cdk synth',
    'cdk ls --long'
  ]
}
The solution was to delete the cdk ls --long from the buildspec definition.

Why does the Google Drive REST API files.list not return all the files?

I am using the following curl command to retrieve all my Google Drive files; however, it only lists a small part of them. Why?
curl -H "Authorization: Bearer ya29.hereshouldbethemaskedaccesstokenvalue" https://www.googleapis.com/drive/v3/files
Result:
{
  "kind": "drive#fileList",
  "incompleteSearch": false,
  "files": [
    {
      "kind": "drive#file",
      "id": "2fileidxxxxxxxx",
      "name": "testnum",
      "mimeType": "application/vnd.google-apps.folder"
    },
    {
      "kind": "drive#file",
      "id": "1fileidxxxxxxx",
      "name": "test2.txt",
      ...
    }
The token scope includes:
https://www.googleapis.com/auth/drive.file
https://www.googleapis.com/auth/drive.appdata
I am also facing the same issue when using the Android SDK.
Any help would be appreciated.
Results from files.list are paginated -- your response should include a "nextPageToken" field, and you'll have to make another call for the next page of results. See the documentation for the files.list call. You may want to use one of the client libraries to make this call (see the examples at the bottom of that page).
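As an illustration, a minimal pagination loop with the Python client library (google-api-python-client) might look like the sketch below; creds stands in for whatever authorized credentials object you already have:
from googleapiclient.discovery import build

# `creds` is assumed to be an authorized credentials object you already obtained.
service = build("drive", "v3", credentials=creds)

files = []
page_token = None
while True:
    response = service.files().list(
        pageSize=100,
        fields="nextPageToken, files(id, name, mimeType)",
        pageToken=page_token,
    ).execute()
    files.extend(response.get("files", []))
    page_token = response.get("nextPageToken")
    if page_token is None:
        break  # no more pages to fetch

print("Retrieved {} files".format(len(files)))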
I had the same problem when trying to get the list of files in a Google Drive folder. The folder has more than 5000 files, but the API returned only two of them. The problem is that when a folder is shared with "anyone with the link", it isn't actually shared with you until you open it. The owner of the folder must add you as a viewer explicitly.

Heroku Review Apps: copy DB to review app

Trying to fully automate Heroku's Review Apps (beta) for an app. Heroku wants us to use db/seeds.rb to seed the recently spun up instance's DB.
We don't have a db/seeds.rb with this app. We'd like to set up a script to copy the existing DB from the current parent (staging) and use that as the DB for the new app under review.
This I can do manually:
heroku pg:copy myapp::DATABASE_URL DATABASE_URL --app myapp-pr-1384 --confirm myapp-pr-1384
But I can't figure out how to get the app name that Heroku creates into the postdeploy script.
Anyone tried this and know how it might be automated?
I ran into this same issue and here is how I solved it.
Set up the database URL you want to copy from as an environment variable on the base app for the pipeline. In my case this is STAGING_DATABASE_URL. The URL format is postgresql://username:password@host:port/db_name.
In your app.json file make sure to copy that variable over.
In your app.json provision a new database which will set the DATABASE_URL environment variable.
Use the following script to copy over the database: pg_dump $STAGING_DATABASE_URL | psql $DATABASE_URL
Here is my app.json file for reference:
{
  "name": "app-name",
  "scripts": {
    "postdeploy": "pg_dump $STAGING_DATABASE_URL | psql $DATABASE_URL && bundle exec rake db:migrate"
  },
  "env": {
    "STAGING_DATABASE_URL": {
      "required": true
    },
    "HEROKU_APP_NAME": {
      "required": true
    }
  },
  "formation": {
    "web": {
      "quantity": 1,
      "size": "hobby"
    },
    "resque": {
      "quantity": 1,
      "size": "hobby"
    },
    "scheduler": {
      "quantity": 1,
      "size": "hobby"
    }
  },
  "addons": [
    "heroku-postgresql:hobby-basic",
    "papertrail",
    "rediscloud"
  ],
  "buildpacks": [
    {
      "url": "heroku/ruby"
    }
  ]
}
An alternative is to share the database between review apps. You can inherit DATABASE_URL in your app.json file.
PS: This is enough for my case, which is a small team; keep in mind that it may not be enough for yours. And I keep my production and test (or staging, or dev, whatever you call it) data separated.
Alternatively:
Another solution using pg_restore, thanks to https://gist.github.com/Kalagan/1adf39ffa15ae7a125d02e86ede04b6f:
{
  "scripts": {
    "postdeploy": "pg_dump -Fc $DATABASE_URL_TO_COPY | pg_restore --clean --no-owner -n public -d $DATABASE_URL && bundle exec rails db:migrate"
  }
}
I ran into problem after problem trying to get this to work. This postdeploy script finally worked for me:
pg_dump -cOx $STAGING_DATABASE_URL | psql $DATABASE_URL && bundle exec rails db:migrate
I see && bundle exec rails db:migrate as part of the postdeploy step in a lot of these responses.
Should that actually just be bundle exec rails db:migrate in the release section of app.json?

Set Google Storage Bucket's default cache control

Is there any way to set a bucket's default cache control? (I'm trying to override the default public, max-age=3600 at the bucket level every time a new object is created.)
Similar to defacl, but for the cache control.
If someone is still looking for an answer, one needs to set the metadata while adding the blob.
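For example, with the Python client library a minimal sketch might be the following (the bucket, object, and file names are placeholders):
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-bucket-name")   # placeholder bucket name

blob = bucket.blob("path/to/object.txt")     # placeholder object name
# Set the metadata before uploading; it is stored together with the object.
blob.cache_control = "public, max-age=86400"
blob.upload_from_filename("local-file.txt")  # placeholder local file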
For those who want to update the metadata for all existing objects in the bucket, you can use setmeta from gsutil - https://cloud.google.com/storage/docs/gsutil/commands/setmeta
You just need to do the following:
gsutil setmeta -r -h "Cache-control:public, max-age=12345" gs://bucket_name
Using gsutil
-h: Allows you to specify certain HTTP headers
-r: Recursive
-m: Runs the operations in parallel, which may be significantly faster.
gsutil -m setmeta -r -h "Cache-control:public, max-age=259200" gs://bucket-name
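If you would rather do the same bulk update of existing objects from Python instead of gsutil, a rough equivalent (the bucket name is a placeholder) would be:
from google.cloud import storage

client = storage.Client()

# Placeholder bucket name -- substitute your own.
for blob in client.list_blobs("bucket-name"):
    blob.cache_control = "public, max-age=259200"
    blob.patch()  # writes the updated metadata back to the object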
It is possible to write a Cloud Function triggered by Google Cloud Storage.
This function sets the Cache-Control metadata field for every new object in a bucket:
from google.cloud import storage

CACHE_CONTROL = "private"

def set_cache_control_private(data, context):
    """Background Cloud Function to be triggered by Cloud Storage.
    This function changes Cache-Control meta data.
    Args:
        data (dict): The Cloud Functions event payload.
        context (google.cloud.functions.Context): Metadata of triggering event.
    Returns:
        None; the output is written to Stackdriver Logging
    """
    print('Setting Cache-Control to {} for: gs://{}/{}'.format(
        CACHE_CONTROL, data['bucket'], data['name']))
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(data['bucket'])
    blob = bucket.get_blob(data['name'])
    blob.cache_control = CACHE_CONTROL
    blob.patch()
You also need a requirements.txt file in the same directory for the storage import. It lists the google-cloud-storage package:
google-cloud-storage==1.10.0
You have to deploy the function to a specific bucket:
gcloud beta functions deploy set_cache_control_private \
--runtime python37 \
--trigger-resource gs://<your_bucket_name> \
--trigger-event google.storage.object.finalize
For debugging purposes, you can retrieve the logs with the gcloud command as well:
gcloud functions logs read --limit 50
I know that this is quite an old question and you're after a default action (which I'm not sure exists), but the below worked for me on a recent PHP project after much frustration:
$object = $bucket->upload($tempFile, [
    'predefinedAcl' => "PUBLICREAD",
    'name' => $destination,
    'metadata' => [
        'cacheControl' => 'Cache-Control: private, max-age=0, no-transform',
    ]
]);
The same can be applied in Node:
const storage = new Storage();
const bucket = storage.bucket(BUCKET_NAME);
const blob = bucket.file(FILE_NAME);

const uploadProgress = new Promise((resolve, reject) => {
  const blobStream = blob.createWriteStream();
  blobStream.on('error', err => {
    reject(err);
  });
  blobStream.on('finish', () => {
    resolve();
  });
  blobStream.end(file.buffer);
});

await uploadProgress;

if (isPublic) {
  await blob.makePublic();
}

await blob.setMetadata({ cacheControl: 'public, max-age=31536000' });
There is no way to specify a default cache control. It must be set when creating the object.
If you're using a python app, you can use the option "default_expiration" in your app.yaml to set a global default value for the Cache-Control header: https://cloud.google.com/appengine/docs/standard/python/config/appref
For example:
runtime: python27
api_version: 1
threadsafe: yes
default_expiration: "30s"

JSDoc: Lookup tutorials from different directories

Is there a way to ask JSDoc (either on the command line or through the grunt-jsdoc plugin) to look up tutorials from different directories?
As per the documentation, -u allows you to specify the Directory in which JSDoc should search for tutorials (it says the Directory, not Directories).
I tried the following with no luck:
specifying different strings separated by spaces or commas
specifying one string with a shell/Ant regular expression
As suggested by @Vasil Vanchuk, a solution would be to create links to all tutorial files within a single directory. JSDoc3 will then be happy and will proceed with the generation of all tutorials.
Creating/maintaining links manually would be a tedious task. Hence, for people using Grunt, the grunt-contrib-symlink plugin comes in handy. Using this plugin, the solution is reduced to a config task.
My Gruntfile.js looks like the following:
clean: ['tmp', 'doc'],
symlink: {
  options: {
    overwrite: false,
  },
  tutorials: {
    files: [{
      cwd: '../module1/src/main/js/tut',
      dest: 'tmp/tutorial-generation-workspace',
      expand: true,
      src: ['*'],
    }, {
      cwd: '../module2/src/main/js/tut',
      dest: 'tmp/tutorial-generation-workspace',
      expand: true,
      src: ['*'],
    }]
  }
},
jsdoc: {
  all: {
    src: [
      '../module1/src/main/js/**/*.js',
      '../module2/src/main/js/**/*.js',
      './README.md',
    ],
    options: {
      destination: 'doc',
      tutorials: 'tmp/tutorial-generation-workspace',
      configure: "jsdocconf.json",
      template: 'node_modules/grunt-jsdoc/node_modules/ink-docstrap/template',
    },
  }
},
grunt.loadNpmTasks('grunt-contrib-symlink');
grunt.loadNpmTasks('grunt-jsdoc');
grunt.registerTask('build', ['clean', 'symlink', 'jsdoc']);
grunt.registerTask('default', ['build']);
Integrating a new module translates to updating the symlink and jsdoc tasks.
You could just copy the files instead of linking to a bunch of directories.
E.g. create a dir in your project for documentation into which you'll copy over all relevant tutorials from wherever they live.
Then, in your npm scripts you can have something like this:
"copy:curry": "cp node_modules/#justinc/jsdocs/tutorials/curry.md doc/tutorials",
"predocs": "npm run copy:curry",
The docs script (not shown) runs jsdoc. predocs runs automatically before docs and, in this case, copies a tutorial from one of my packages over to doc/tutorials. You can then pass doc/tutorials as the single directory housing all your tutorials.
In predocs you can keep adding things to copy with bash's && - or if that's not available for whatever reason, you'll find npm packages which let you do this (therefore not relying on whatever shell you're using).
Now that I think about it, it's best to also delete doc/tutorials in predocs:
"predocs": "rm -rf doc/tutorials && mkdir -p doc/tutorials && npm run copy:tutorials",
That way any tutorials you once copied there (but are now not interested in) will be cleared each time you generate the docs.
btw, I opened an issue for this: https://github.com/jsdoc3/jsdoc/issues/1330