google cloud python kubernetes service performing action on bucket write - kubernetes

I am trying to write a Python service, deployed on Kubernetes, that does something similar to a Cloud Function triggered by the google.storage.object.finalize event on a bucket. In essence, I need to replace a Cloud Function that was created with the following parameters:
--trigger-resource YOUR_TRIGGER_BUCKET_NAME
--trigger-event google.storage.object.finalize
However, I can't find any resource online on how to do this. What would be the best way for a Python script deployed in Kubernetes to observe actions performed on a bucket and do something when a new file gets written to it? Thank you

You just need to enable Pub/Sub notifications on the bucket so that it publishes to a Pub/Sub topic: https://cloud.google.com/storage/docs/pubsub-notifications
Then have your Python application listen to a subscription on the topic that you picked, in either a pull or a push setup: https://cloud.google.com/pubsub/docs/pull
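A minimal sketch of the pull setup, assuming the google-cloud-pubsub package is installed and that a subscription already exists on the notification topic. The function names and the project and subscription IDs are hypothetical; the attribute names (eventType, bucketId, objectId) are the ones Cloud Storage attaches to its Pub/Sub notifications.

```python
def describe_finalize(attributes):
    """Return (bucket, object_name) for an OBJECT_FINALIZE notification, else None."""
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    return attributes.get("bucketId"), attributes.get("objectId")


def callback(message):
    """Pub/Sub callback: handle finalize events, then acknowledge the message."""
    target = describe_finalize(message.attributes)
    if target is not None:
        bucket, name = target
        print(f"new object: gs://{bucket}/{name}")  # replace with the real work
    message.ack()


def listen(project_id, subscription_id):
    """Block on a streaming pull until the process is stopped."""
    # Imported lazily so the helpers above can be used without the package.
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    path = subscriber.subscription_path(project_id, subscription_id)
    future = subscriber.subscribe(path, callback=callback)
    with subscriber:
        future.result()
```

Calling listen("my-project", "my-subscription") from the container entrypoint would keep the pod listening; both IDs are placeholders.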

Related

How can I install connector config in kafka connect

Is there any other way to deploy connector config rather than POSTing connector config to kafka connect REST api? https://docs.confluent.io/platform/current/connect/references/restapi.html#tasks
I am thinking of some persistent approach, such as a volume or S3, from which Connect would grab those configs during bootstrap. I don't know, and can't find, whether that is available anywhere.
regards
The REST API is the only way.
You can use abstractions like Terraform or Kubernetes resources, however, which wrap an HTTP client.
If you use other storage, that'll require you to write extra code to download files and call the REST API.
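The "extra code" mentioned above could look roughly like the following sketch, which reads connector configs from a mounted directory and sends each one to the REST API with PUT /connectors/{name}/config (that endpoint creates the connector or updates it if it exists). The directory layout, the use of the file name as the connector name, and the function names are assumptions made for illustration.

```python
import json
import urllib.request
from pathlib import Path


def build_put_request(connect_url, name, config):
    """Build a PUT /connectors/{name}/config request for one connector."""
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors/{name}/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def apply_configs(connect_url, config_dir):
    """Send every *.json file in config_dir; the file stem is the connector name."""
    for path in sorted(Path(config_dir).glob("*.json")):
        config = json.loads(path.read_text())
        request = build_put_request(connect_url, path.stem, config)
        with urllib.request.urlopen(request) as response:
            print(path.stem, response.status)
```

Something like apply_configs("http://localhost:8083", "/etc/connectors") could run as a sidecar or init step once the Connect worker is reachable; both arguments are placeholders.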

Is there a way to deploy scheduled queries to GCP directly through a github action, with a configurable schedule?

Currently using GCP BigQuery UI for scheduled queries, everything is manually done.
Wondering if there's a way to automatically deploy to GCP using a config JSON that contains the scheduled query's parameters and scheduled times through github actions?
So far, this is one option I've found that makes it more "automated":
- store query in a file on Cloud Storage. When invoking Cloud Function, you read the file content and you perform a bigQuery job on it.
- have to update the file content to update the query
- con: read file from storage, then call BQ: 2 api calls and query file to manage
Currently using DBT in other repos to automate and make this process quicker: https://docs.getdbt.com/docs/introduction
Would prefer the GitHub Actions version though; I just haven't found good documentation yet :)

Trigger a dataflow job deployed through Cloud Run on object creation in GCP Storage Bucket

I have created a Dataflow pipeline which reads a file from a GCS bucket and processes it. It works when I execute the job from my local machine.
I deployed the Dataflow job in Cloud Run with a trigger on storage.objects.create.
But when I upload a file to the GCS bucket, no trigger message shows in the log and the Dataflow job is not executed.
Trigger config
Ingress:Allow traffic
Authentication:Allow authentication
Event source:Cloud Storage
Event type:google.cloud.audit.log.v1.written
Create time:2021-02-12 (16:05:25)
Receive events from:All regions (global)
Service URL path:/
Service account:sdas-pipeline@sdas-demo-project.iam.gserviceaccount.com
Service name:storage.googleapis.com
Method name:storage.objects.create
What am I missing here? Please suggest.
Your Cloud Run service is probably not triggered because no audit logs are written when an object is created in or uploaded to your bucket. An Eventarc trigger fires whenever a matching event is written to the audit logs, and audit logging for Cloud Storage is disabled by default.
The solution is to enable audit logs for Cloud Storage. This can be done in two ways:
Enable them the first time you create an Eventarc trigger.
Or go to IAM & Admin > Audit Logs and make sure that all fields are checked for Cloud Storage.
For reference, audit logs can be seen under Home > Activity.
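Once audit logs are enabled, the Cloud Run service should start receiving events. As a rough sketch of the receiving side, the helper below pulls the bucket and object out of the JSON body of an audit-log event. It assumes the body carries a protoPayload with methodName and resourceName, and that resourceName has the shape projects/_/buckets/<bucket>/objects/<object>; the function name is hypothetical.

```python
def parse_storage_audit_event(event):
    """Extract (bucket, object_name) from an audit-log event body, else None."""
    payload = event.get("protoPayload", {})
    if payload.get("methodName") != "storage.objects.create":
        return None
    # Assumed resourceName shape: projects/_/buckets/<bucket>/objects/<object>
    resource = payload.get("resourceName", "")
    prefix, sep, object_name = resource.partition("/objects/")
    if not sep or "/buckets/" not in prefix:
        return None
    bucket = prefix.split("/buckets/", 1)[1]
    return bucket, object_name
```

Logging the result of this function at the top of the request handler is a quick way to confirm that the trigger is firing before the Dataflow job is launched.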

A way to script automatically starting and stopping the SQL database in GCP

I want to run a job in Cloud Scheduler in GCP to start and stop the SQL database on weekdays during working hours.
I have tried triggering a Cloud Function and using Pub/Sub, but I can't find a proper way to do it.
You can use the Cloud SQL Admin API to start or stop an instance. Depending on your language, there are clients available to help you do this. This page contains examples using curl.
Once you've created two Cloud Functions (one to start, and one to stop), you can configure the Cloud Scheduler to send a pub/sub trigger to your function. Check out this tutorial which walks you through the process.
To achieve this, you can use a Cloud Function that calls the Cloud SQL Admin API to start and stop your Cloud SQL instance (you will need two Cloud Functions). You can see my code on how to use a Cloud Function to start a Cloud SQL instance and stop a Cloud SQL instance.
After creating your Cloud Functions, you can configure Cloud Scheduler to trigger the HTTP address of each Cloud Function.
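As an illustration of the Admin API call itself, here is a minimal sketch that builds the PATCH request which sets the instance's activationPolicy to ALWAYS (start) or NEVER (stop). The endpoint URL reflects my understanding of the v1 Admin API, and the function name, the project and instance IDs, and the way the access token is obtained are all assumptions.

```python
import json
import urllib.request


def build_activation_request(project, instance, start, access_token):
    """Build a PATCH request that starts (ALWAYS) or stops (NEVER) an instance."""
    policy = "ALWAYS" if start else "NEVER"
    body = {"settings": {"activationPolicy": policy}}
    return urllib.request.Request(
        # Assumed endpoint of the Cloud SQL Admin API (v1).
        url=f"https://sqladmin.googleapis.com/v1/projects/{project}/instances/{instance}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


def set_instance_state(project, instance, start, access_token):
    """Send the request and return the HTTP status code."""
    request = build_activation_request(project, instance, start, access_token)
    with urllib.request.urlopen(request) as response:
        return response.status
```

One Cloud Function could call set_instance_state with start=True and the other with start=False, each scheduled by its own Cloud Scheduler job.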

Is it possible to get a notification if kubernetes job fails

I would like to know whether it is possible to send a notification via YAML config if a Kubernetes job fails.
For example, I have a Kubernetes job which runs once a day. Currently I run a Jenkins job to check it and send a notification if the job fails. Are there any options to get a notification directly from a Kubernetes job when it fails? It should be something that we can add to the job YAML.
I'm not sure about any built in notification support. That seems like the kind of feature you can find in external dedicated monitoring/notification tools such as Prometheus or Logstash output.
For example, you can try this tutorial to leverage the Prometheus metrics generated by default in many Kubernetes clusters: https://medium.com/@tristan_96324/prometheus-k8s-cronjob-alerts-94bee7b90511
Or you can theoretically set up Logstash to monitor incoming logs sent by Filebeat and conditionally send alerts as part of the output stage of the pipelines via the "email output plugin".
Other methods exist as well as mentioned in this similar issue: How to send alerts based on Kubernetes / Docker events?
For reference, you may also wish to read this request as discussed in github: https://github.com/kubernetes/kubernetes/issues/22207
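If a full monitoring stack is more than you need, a small polling script is another option. The sketch below is not a built-in Kubernetes feature: it assumes kubectl is available and authorized wherever the script runs, reads the Job list as JSON, and reports Jobs whose status has a condition of type Failed with status "True". The function names are hypothetical, and how the notification is sent is left open.

```python
import json
import subprocess


def job_failed(job):
    """True if the Job has a condition of type Failed with status "True"."""
    conditions = job.get("status", {}).get("conditions") or []
    return any(
        c.get("type") == "Failed" and c.get("status") == "True"
        for c in conditions
    )


def failed_job_names(job_list):
    """Return the names of all failed Jobs in a `kubectl get jobs -o json` result."""
    return [
        item["metadata"]["name"]
        for item in job_list.get("items", [])
        if job_failed(item)
    ]


def fetch_jobs(namespace):
    """Read the Job list for one namespace through kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "jobs", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Running failed_job_names(fetch_jobs("default")) on a schedule and forwarding a non-empty result to mail or chat would replace the Jenkins check; the namespace is a placeholder.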