How to schedule compute instances/VM start/stop and backup of instances/VM
Do we have any scheduler in google cloud or provided by third-party?
With kubernetes and jenkins you can deploy a container (most likely triggered by cron) then shut it down or back it up on some interval.
https://github.com/GoogleCloudPlatform/kube-jenkins-imager
You can apparently bundle your app and configuration into an immutable custom image for improved boot time and reliability:
https://cloud.google.com/solutions/automated-build-images-with-jenkins-kubernetes
But no handy start/stop gui at the moment it seems.
I would like a tool to : at every (x) am/pm; spin up debian with my app, start node with pm2, send out messages, spin down debian into immutable "archive" on disk. save money by not leaving it on all day. Programming the infrastructure is more difficult than programming the app from my perspective, atm, but it definitely seems like a great opportunity to save time and/or make money by creating such a "streamlined gui" for this.
Related
I have a BE service in NestJS that is deployed in Vercel.
I need several schedulers, so I have used #nestjs/schedule lib, which is super easy to use.
Locally, everything works perfectly.
For some reason, the only thing that is not working in my production environment is those schedulers. Everything else is working - endpoints, data base access..
Does anyone has an idea why? is it something with my deployment? maybe Vercel has some issue with that? maybe this schedule library requires something the Vercel doesn't have?
I am clueless..
Cold boot is the process of starting a computer from shutdown or a powerless state and setting it to normal working condition.
Which means that the code you deployed in a serveless manner, will run when the endpoint is called. The platform you are using spins up a virtual machine, to execute your code. And keeps the machine running for a certain period of time, incase you get another API hit, it's cheaper and easier on them to keep the machine running for lets say 5 minutes or 60 seconds, than to redeploy it on every call after shutting the machine when function execution ends.
So in your case, most likely what is happening is that the machine that you are setting the cron on, is killed after a period of time. Crons are system specific tasks which run in the kernel. But if the machine is shutdown, the cron dies with it. The only case where the cron would run, is if the cron was triggered at a point of time, before the machine was shut down.
Certain cloud providers give you the option to keep the machines alive. I remember google cloud used to follow the path of that if a serveless function is called frequently, it shifts from cold boot to hot start, which doesn't kill the machine entirely, and if you have traffic the machines stay alive.
From quick research, vercel isn't the best to handle crons, due to the nature of the infrastructure, and this is what you are looking for. In general, crons aren't for serveless functions. You can deploy the crons using queues for example or another third party service, check out this link by vercel.
I have a python app that builds a dataset for a machine learning task on GCP.
Currently I have to start an instance of a VM that we have, and then SSH in, and run the app, which will complete in 2-24 hours depending on the size of the dataset requested.
Once the dataset is complete the VM needs to be shutdown so we don't incur additional charges.
I am looking to streamline this process as much as possible, so that we have a "1 click" or "1 command" solution, but I'm not sure the best way to go about it.
From what I've read about so far it seems like containers might be a good way to go, but I'm inexperienced with docker.
Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down? How would I pass information to the container such as where to get the config file etc? It's conceivable that we will have multiple datasets being generated at the same time based on different config files.
Is there a better gcloud feature that suits our purpose more effectively than containers?
I'm struggling to get information regarding these basic questions, it seems like container tutorials are dominated by web apps.
It would be useful to have a batch-like container service that runs a container until its process completes. I'm unsure whether such a service exists. I'm most familiar with Google Cloud Platform and this provides a wealth of compute and container services. However -- to your point -- these predominantly scale by (HTTP) requests.
One possibility may be Cloud Run and to trigger jobs using Cloud Pub/Sub. I see there's async capabilities too and this may be interesting (I've not explored).
Another runtime for you to consider is Kubernetes itself. While Kubernetes requires some overhead in having Google, AWS or Azure manage a cluster for you (I strongly recommend you don't run Kubernetes yourself) and some inertia in the capacity of the cluster's nodes vs. the needs of your jobs, as you scale the number of jobs, you will smooth these needs. A big advantage with Kubernetes is that it will scale (nodes|pods) as you need them. You tell Kubernetes to run X container jobs, it does it (and cleans-up) without much additional management on your part.
I'm biased and approach the container vs image question mostly from a perspective of defaulting to container-first. In this case, you'd receive several benefits from containerizing your solution:
reproducible: the same image is more probable to produce the same results
deployability: container run vs. manage OS, app stack, test for consistency etc.
maintainable: smaller image representing your app, less work to maintain it
One (beneficial!?) workflow change if you choose to use containers is that you will need to build your images before using them. Something like Knative combines these steps but, I'd stick with doing-this-yourself initially. A common solution is to trigger builds (Docker, GitHub Actions, Cloud Build) from your source code repo. Commonly you would run tests against the images that are built but you may also run your machine-learning tasks this way too.
Your containers would container only your code. When you build your container images, you would pip install, perhaps pip install --requirement requirements.txt to pull the appropriate packages. Your data (models?) are better kept separate from your code when this makes sense. When your runtime platform runs containers for you, you provide configuration information (environment variables and|or flags) to the container.
The use of a startup script seems to better fit the bill compared to containers. The instance always executes startup scripts as root, thus you can do anything you like, as the command will be executed as root.
A startup script will perform automated tasks every time your instance boots up. Startup scripts can perform many actions, such as installing software, performing updates, turning on services, and any other tasks defined in the script.
Keep in mind that a startup script cannot stop an instance but you can stop an instance through the guest operating system.
This would be the ideal solution for the question you posed. This would require you to make a small change in your Python app where the Operating system shuts off when the dataset is complete.
Q1) Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down?
A1) Medium has a great article on installing a package from a private git repo inside a container. You can execute the dataset build before shutting down.
Q2) How would I pass information to the container such as where to get the config file etc?
A2) You can use ENV to set an environment variable. These will be available within the container.
You may consider looking into Docker for more information about container.
I'm comparing salt-cloud and terraform as tools to manage our infrastructure at GCE. We use salt stack to manage VM configurations, so I would naturally prefer to use salt-cloud as an integral part of the stack and phase out terraform as a legacy thing.
However my use case is critical on VM deployment time because we offer PaaS solution with VMs deployed on customer request, so need to deliver ready VMs on a click of a button within seconds.
And what puzzles me is why salt-cloud takes so long to deploy basic machines.
I have created neck-to-neck simple test with deploying three VMs based on default CentOS7 image using both terraform and salt-cloud (both in parallel mode). And the time difference is stunning - where terraform needs around 30 seconds to deploy requested machines (which is similar to time needed to deploy through GCE GUI), salt-cloud takes around 220 seconds to deploy exactly same machines under same account in the same zone. Especially strange is that first 130 seconds salt-cloud does not start deploying and does seemingly nothing at all, and only after around 130 seconds pass it shows message deploying VMs and those VMs appear in GUI as in deployment.
Is there something obvious that I'm missing about salt-cloud that makes it so slow? Can it be sped up somehow?
I would prefer to user full salt stack, but with current speed issues it has I cannot really afford that.
Note that this answer is a speculation based on what I understood about terraform and salt-cloud, I haven't verified with an experiment!
I think the reason is that Terraform keeps state of the previous run (either locally or remotely), while salt-cloud doesn't keep state and so queries the cloud before actually provisioning anything.
These two approaches (keeping state or querying before doing something) are needed, since both tools are idempotent (you can run them multiple times safely).
For example, I think that if you remove the state file of Terraform and re-run it, it will assume there is nothing in the cloud and will actually instantiate a duplicate. This is not to imply that terraform does it wrong, it is to show that state is important and Terraform docs say clearly that when operating in a team the state should be saved remotely, exactly to avoid this kind of problem.
Following my line of though, this should also mean that if you either run salt-cloud in verbose debug mode or look at the network traffic generated by it, in the first 130 secs you mention (before it says "deploying VMs"), you should see queries from salt-cloud to the cloud provider to dynamically construct the state.
Last point, the fact that salt-cloud doesn't store the state of a previous run doesn't mean that it is automatically safe to use in a team environment. It is safe to use as long as no two team members run it at the same time. On the other hand, terraform with remote state on Consul allows for example to lock, so that team concurrent usage will always be safe.
This is a question of integration:
I would like to run Jenkins on Google Compute Engine. I can do this, but I will quickly break my budget if I leave an 8-core virtual machine running at all times. As a solution, I think I can leave a micro instance with a low amount of memory powered on and acting as the jenkins master running at all times. It seems that I should be able to configure github to startup a jenkins slave (with 8 cores) whenever a push is performed. How do I connect github post-commit hooks to Google Compute Engine to achieve this? A complete answer is probably asking too much, but even just pointers to the relevant documentation would be helpful.
Alternatively, how would you solve my problem?
You can run an AppEngine instance and use the URL it provides as the target of your GitHub on-commit web hook. This way, you won't be charged unless the instance is actually running, which may even be cheaper than running a micro instance 24x7 on Compute Engine.
You can then start/stop instances on Compute Engine or trigger actions on them from your code running on App Engine.
Here's a related question which has an answer for how to authenticate to Compute Engine from code running on AppEngine.
I ended up using a preemptible instance that automatically gets restarted every few minutes. I had to setup the instance manager to perform this restart, and I had to use the API, since this is a bit of an advanced and peculiar use of the features.
I'm trying to understand why it can take from 20-60min to deploy a small application to Azure (using the configuration/package upload method, not from within VS).
I've read through this situation and this one but I'm still a little unclear - is there a weird non-technology ritual that occurs while the instances are distributing, like somebody over at Microsoft lighting a candle or doing a dance?
As a fellow Azure user, I share your pain - deploying isn't "quick"/"painless" - and this hurts especially when you're in a development cycle and want to test dev iterations on Azure. However, in general deployments should take much less than 60 minutes - and less than 20 minutes too.
Steve Marx provided a brief overview of the steps involved in deployment:
http://blog.smarx.com/posts/what-happens-when-you-deploy-on-windows-azure
And he references a deeper level explanation at: http://channel9.msdn.com/blogs/pdc2008/es19
There's a lot that goes on behind the scenes when you deploy an application to the Azure cloud. I don't have any special insight into what's going on behind the curtain, but having worked on the VS tools to upload projects to the Azure cloud, these are my impressions as an outsider looking in:
Among other things:
Hardware must be allocated from the available pool of servers
The VHD of the core OS must be uploaded to the machine
A VM instance must be initialized and booted off that VHD image
Your application package must be copied to the VM and installed
The VM monitor must wait for your service to start up, or fail
The data center load balancer and firewall must be made aware of your application's service endpoints
Once all of that has synchronized, your app is accessible from the web.
The VHD image is probably gigabytes in size, much larger than your app upload. Even on a superfast datacenter network, it takes time to move that much stuff into the VM, unpack it, and boot from it. Also, the load balancer and firewall are probably optimized to make routing requests the highest priority. Reconfiguring the firewall and load balancer is lower priority, and has to be done without interrupting traffic flow.
Also note that all this work only has to be done for a new deployment. Updating an existing deployment rolls out much faster - 2 to 3 minutes instead of 20 to 30 minutes.
Check out this PDC10 video by Mark Russinovich. He goes into great detail on what's going on inside Azure with some insights into the (admittedly slow) deployment process.
Original link is no longer working. Here's another link to a version of the same presentation: https://channel9.msdn.com/events/Build/BUILD2011/SAC-853T