A way to automatically script starting and stopping a Cloud SQL database in GCP - google-cloud-sql

I want to run a job in Cloud Scheduler in GCP to start and stop the SQL database on weekdays during working hours.
I have tried triggering a Cloud Function via Pub/Sub, but I have not found a proper way to do it.

You can use the Cloud SQL Admin API to start or stop an instance. Depending on your language, there are client libraries available to help you do this. This page contains examples using curl.
Once you've created two Cloud Functions (one to start, and one to stop), you can configure Cloud Scheduler to send a Pub/Sub trigger to your function. Check out this tutorial, which walks you through the process.
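For reference, a minimal sketch of such a function in Python, assuming Cloud Scheduler publishes a Pub/Sub message carrying the project, instance name and desired action (the function name and message format below are placeholders, not from the tutorial):

# Pub/Sub-triggered Cloud Function that starts or stops a Cloud SQL instance
# by patching its activationPolicy via the Cloud SQL Admin API.
import base64
import json

import googleapiclient.discovery  # google-api-python-client

sqladmin = googleapiclient.discovery.build("sqladmin", "v1beta4")

def toggle_instance(event, context):
    # Expected message: {"project": "my-proj", "instance": "my-db", "action": "start"} (assumption)
    msg = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    policy = "ALWAYS" if msg["action"] == "start" else "NEVER"  # ALWAYS = running, NEVER = stopped
    body = {"settings": {"activationPolicy": policy}}
    sqladmin.instances().patch(
        project=msg["project"], instance=msg["instance"], body=body
    ).execute()
    print(f"Set activationPolicy={policy} on {msg['instance']}")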

In order to achieve this, you can use a Cloud Function to make a call to the Cloud SQL Admin API to start and stop your Cloud SQL instance (you will need two Cloud Functions). You can see my code on how to use a Cloud Function to start a Cloud SQL instance and stop a Cloud SQL instance.
After creating your Cloud Functions, you can configure Cloud Scheduler to trigger the HTTP address of each Cloud Function.

Related

google cloud python kubernetes service performing action on bucket write

I am trying to write a Python service, to be deployed on Kubernetes, that does something similar to a Cloud Function triggered by the google.storage.object.finalize action listening on a bucket. In essence, I need to replace a Cloud Function that was created with the following parameters:
--trigger-resource YOUR_TRIGGER_BUCKET_NAME
--trigger-event google.storage.object.finalize
However, I can't find any resources online on how to do this. What would be the best way for a Python script deployed in Kubernetes to observe actions performed on a bucket and do something when a new file gets written into it? Thank you
You just need to enable Pub/Sub notifications on the bucket to publish to a Pub/Sub topic: https://cloud.google.com/storage/docs/pubsub-notifications
Then have your Python application listen to a subscription on the topic that you picked, either in a pull or push setup: https://cloud.google.com/pubsub/docs/pull.
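For the pull setup, a minimal sketch of such a subscriber might look like this (the project ID, subscription name and handler logic are placeholders, and it assumes the bucket notification was created with the JSON payload format):

# Long-running pull subscriber that reacts to OBJECT_FINALIZE notifications.
import json

from google.cloud import pubsub_v1  # google-cloud-pubsub

PROJECT_ID = "my-project"              # placeholder
SUBSCRIPTION_ID = "bucket-events-sub"  # placeholder

def handle_message(message):
    # GCS notifications carry the event type as a message attribute.
    if message.attributes.get("eventType") == "OBJECT_FINALIZE":
        payload = json.loads(message.data.decode("utf-8"))
        print(f"New object: gs://{payload['bucket']}/{payload['name']}")
        # ... do whatever the Cloud Function used to do here ...
    message.ack()

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)
streaming_pull = subscriber.subscribe(subscription_path, callback=handle_message)
print(f"Listening on {subscription_path}...")
try:
    streaming_pull.result()  # block the main thread while messages arrive
except KeyboardInterrupt:
    streaming_pull.cancel()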

Consume a REST API and store into BigQuery

We are trying to ingest data from Zoho Creator, which exposes data through REST APIs.
Currently we are manually running a Python script which retrieves the data and converts it into an AVRO file. Then we invoke a BQ load command to load it into BigQuery as large string objects per request.
What are the options to deploy this on GCP? Looking for minimal coding/configuration options.
We are thinking of a Python operator on Composer or Cloud functions.
Thanks.
You can do exactly the same thing with Cloud Functions or Cloud Run.
With Cloud Functions, you have to wrap your script in the Cloud Functions function pattern.
With Cloud Run, you have to wrap your script in a function, then deploy a web server which calls this function, and finally build a container that you can deploy on Cloud Run. It seems like more boilerplate, but it offers more flexibility and runtime customization.
Do you want to call it on a schedule? Use Cloud Scheduler for that.
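As a rough illustration of the Cloud Functions wrapping pattern (the Zoho endpoint, table name and the direct JSON load are assumptions, not your current script):

# HTTP-triggered Cloud Function that pulls the REST API and loads the rows
# straight into BigQuery; Cloud Scheduler can call its URL on a schedule.
import requests
from google.cloud import bigquery  # google-cloud-bigquery

ZOHO_URL = "https://creator.zoho.com/api/v2/..."  # placeholder endpoint
TABLE_ID = "my-project.my_dataset.zoho_raw"       # placeholder table

def ingest_zoho(request):
    records = requests.get(ZOHO_URL, timeout=60).json().get("data", [])  # assumes a "data" array
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(autodetect=True)
    # load_table_from_json skips the intermediate AVRO file entirely
    job = client.load_table_from_json(records, TABLE_ID, job_config=job_config)
    job.result()  # wait for the load job to finish
    return f"Loaded {len(records)} rows into {TABLE_ID}", 200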
EDIT 1
If you use a Cloud Composer Python operator, it can also work. But I don't like this solution:
You execute your business logic inside Composer, a workflow orchestrator. In terms of separation of concerns, it's not great.
The runtime isolation isn't guaranteed. I prefer Cloud Functions for that.
If tomorrow you need to perform 100 requests in parallel, Composer isn't very scalable; Cloud Functions is.
There is a good chance that the Cloud Functions processing will be free. Cloud Composer costs at least $400 per month. If you already have a cluster, why not, but if not...
Cloud Run (more than Cloud Functions) is portable. If you need to deploy it elsewhere, you can. With Composer you are tied to the Python operator and you need rework to achieve portability.

Stopping Cloud Data Fusion Instance

I have production pipelines which only run for a couple of hours using Google Data Fusion. I would like to stop the Data Fusion instance and start it the next day. I don't see an option to stop the instance. Is there any way we can stop the instance and start the same instance again?
By design, a Data Fusion instance runs in a GCP tenancy unit that gives the user a fully automated way to manage all the cloud resources and services (GKE cluster, Cloud Storage, Cloud SQL, Persistent Disk, Elasticsearch, Cloud KMS, etc.) for storing, developing, and executing customer pipelines. Therefore, there is no way to stop a Data Fusion instance; the services that execute pipelines are launched on demand and cleaned up after pipeline completion. See here for the pricing concepts.

How to execute Amazon Lambda functions on dedicated EC2 server?

I am currently developing the backend for my app based on Amazon Web Services. I intended to use DynamoDB to store the user's data, but finally opted for MongoDB, which I have already installed on my EC2 instance.
I have some code written in Python to update/query... the DB. When a Cognito event triggers my Lambda function, I want this code to be executed directly on my instance so I can access my DB. Any ideas how I can accomplish this?
As mentioned by Gustavo Tavares, "the whole point of lambda is to run code without the need to deploy EC2 instances". And you do not have to put your EC2 with database to "public" subnets for Lambda to access them. Actually, you should never do that.
When creating/editing the Lambda configuration, you may select to run it in any of your VPCs (Configuration -> Advanced Settings -> VPC). Then select the subnet(s) to run your Lambda in. This will create ENIs (Elastic Network Interfaces) for the virtual machines your Lambdas will run on.
Your subnets must have routing/ACLs configured to access the subnets where the database resides. At least one of the security groups associated with the Lambda must also allow outbound traffic to the database subnet on the appropriate port (27017).
Since you mentioned that your Lambdas are "back-end", you should probably put them in the same "private" subnets as your MongoDB and avoid any access/routing headaches.
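For illustration, a minimal sketch of a VPC-attached Lambda talking to MongoDB over its private address (the host, database name and the Cognito event field are assumptions):

# Lambda handler that writes to MongoDB on the instance's private IP.
import os
from pymongo import MongoClient  # bundle pymongo with the deployment package

client = MongoClient(
    host=os.environ.get("MONGO_HOST", "10.0.1.25"),  # private IP of the EC2 instance (placeholder)
    port=27017,
    serverSelectionTimeoutMS=5000,
)

def lambda_handler(event, context):
    db = client["appdb"]                    # placeholder database name
    db.users.update_one(
        {"sub": event["userName"]},         # field from the Cognito trigger event (assumption)
        {"$set": {"status": "CONFIRMED"}},
        upsert=True,
    )
    return event  # Cognito triggers expect the event to be returned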
One way to accomplish this is to give the Lambda a SAM Template, then use sam local invoke inside of the EC2 instance to execute locally.
OK BUT WHY OH WHY WOULD ANYONE DO THIS?
If your Lambda requires access to both a VPC and the Internet, and doesn't use a lot of memory and doesn't really require scalability, and you already wrote the code (*), it's actually 10x cheaper(**) and higher-performing to launch a t3.nano EC2 Spot Instance on a public subnet than to add a NAT Gateway to the Lambda function.
(*) if you have not written the code yet, don't even bother to make it a Lambda.
(**) 10x cheaper as in $3 vs $30, so this really only applies to hobbyist projects on a shoestring budget. Don't do this at work, because the cost of engineers' time to manage and maintain an EC2 instance will far exceed $30/month over the long term.
If you want Lambda to execute code on your EC2 instances, you'll need to use the SDK for the language you're writing your Lambda in. Then you can simply use the AWS API to run commands on your EC2 instance.
See: http://docs.aws.amazon.com/systems-manager/latest/userguide/run-command.html
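For example, a hedged sketch of that approach with boto3 and SSM Run Command (the instance ID and script path are placeholders; the instance needs the SSM agent and an instance profile that allows Systems Manager):

# Lambda handler that asks SSM to run a script already present on the instance.
import boto3

ssm = boto3.client("ssm")

def lambda_handler(event, context):
    response = ssm.send_command(
        InstanceIds=["i-0123456789abcdef0"],  # placeholder instance ID
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": ["python3 /opt/app/update_db.py"]},  # placeholder path
    )
    return response["Command"]["CommandId"]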
I think you misunderstood the idea of AWS lambda.
The whole point of lambda is to run code without the need to deploy EC2 instances. You upload the code and the infrastructure is provisioned on the fly. If your application does not need the infrastructure anymore (after a brief period), it vanishes and you will not be charged for the idle time. If you need it again a new infrastructure is provisioned.
If you have a service, like your MongoDB, running on EC2 instances, your Lambda functions can access it like any other code. You just need to configure your Lambda code to connect to the EC2 instance, as you would if your database were installed on any other internet-facing server.
For example: you can put your MongoDB server in a public subnet of your VPC and assign an elastic IP to your server. In your Python Lambda code, you configure your driver to connect to this elastic IP and update the database.
It will work as if every service were deployed on different servers across the internet: Cognito connects to the Lambda function over the internet, and then the Python code deployed in Lambda connects to your MongoDB over the internet.
If I can give you one piece of advice, try DynamoDB a little more. With DynamoDB it will be even simpler to make all this work, because you will not need to configure a public subnet or request an elastic IP. And the API for DynamoDB is not very different from the MongoDB API.
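For comparison, the equivalent write with DynamoDB needs no VPC wiring or elastic IP at all, only IAM permissions. A minimal sketch (table and attribute names are placeholders):

# Lambda handler writing the same record to a DynamoDB table via boto3.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("users")  # placeholder table name

def lambda_handler(event, context):
    table.put_item(Item={"sub": event["userName"], "status": "CONFIRMED"})
    return event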

Using Presto on Cloud Dataproc with Google Cloud SQL?

I use both Hive and MySQL (via Google Cloud SQL) and I want to use Presto to connect to both easily. I have seen there is a Presto initialization action for Cloud Dataproc but it does not work with Cloud SQL out of the box. How can I get that initialization action to work with Cloud SQL so I can use both Hive/Spark and Cloud SQL with Presto?
The easiest way to do this is to edit the initialization action installing Presto on the Cloud Dataproc cluster.
Cloud SQL setup
Before you do this, however, make sure to configure Cloud SQL so it will work with Presto. You will need to:
Create a user for Presto (or have a user ready)
Adjust any necessary firewall rules so your Cloud Dataproc cluster can connect to the Cloud SQL instance
Changing the initialization action
In the Presto initialization action there is a section which sets up the Hive configuration and looks like this:
cat > presto-server-${PRESTO_VERSION}/etc/catalog/hive.properties <<EOF
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
EOF
Below it, you can add a new section which sets up the MySQL properties. Add something like this:
cat > presto-server-${PRESTO_VERSION}/etc/catalog/mysql.properties <<EOF
connector.name=mysql
connection-url=jdbc:mysql://<ip_address>:3306
connection-user=<username>
connection-password=<password>
EOF
You will obviously want to replace <ip_address>, <username>, and <password> with your correct values. Moreover, if you have multiple Cloud SQL instances to connect to, you can add multiple sections and give them different names, so long as the filename ends in .properties.
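Once the cluster comes up with both catalogs, you can query Hive and Cloud SQL side by side, for example from Python with the presto-python-client package (a sketch only; the port, schema and table names are assumptions):

# Join a Hive table with a Cloud SQL (MySQL) table through their Presto catalogs.
import prestodb  # pip install presto-python-client

conn = prestodb.dbapi.connect(
    host="localhost",  # run on the Dataproc master where Presto is installed
    port=8080,         # assumed Presto HTTP port configured by the init action
    user="presto",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()
cur.execute("""
    SELECT h.id, m.name
    FROM hive.default.events h    -- placeholder Hive table
    JOIN mysql.mydb.customers m   -- placeholder Cloud SQL table
      ON h.customer_id = m.id
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)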