loading models in FastAPI projects at startup - deployment

so I'm currently working on a FastAPI project that serves multiple NLP services, and I want to serve different models from spaCy as well as Hugging Face.
Since those models are quite big, loading them for each POST request makes response times quite long. My idea is to load all the models on FastAPI startup (in app/main.py). However, I'm not sure whether this is a good idea or if there are drawbacks to this approach, since the models will then sit in memory the whole time(?). (Info: I want to dockerize the project and deploy it on a virtual machine afterwards.)
So far I wasn't able to find any guidance on the internet, so I hope to get a good answer here :)
Thanks in advance!

If you are deploying your app using the gunicorn + uvicorn worker stack, you can use gunicorn's --preload flag.
From the documentation of gunicorn
preload_app
--preload Default: False
Load application code before the worker processes are forked.
By preloading an application you can save some RAM resources as well
as speed up server boot times. Although, if you defer application
loading to each worker process, you can reload your application code
easily by restarting workers.
You just need to add the --preload flag to your usual run options:
gunicorn --workers 2 --preload --worker-class=uvicorn.workers.UvicornWorker my_app:app
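For example, in app/main.py you can load the models once at import time; with --preload they are loaded in the gunicorn master process and then shared copy-on-write by the forked workers. A minimal sketch (the model names and the endpoint are placeholders, not from the question):

import spacy
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()

# Loaded once per process - or once in the master when using --preload,
# then shared copy-on-write across the forked workers.
nlp = spacy.load("en_core_web_sm")          # placeholder spaCy model
sentiment = pipeline("sentiment-analysis")  # placeholder HF pipeline

@app.post("/analyze")
def analyze(text: str):
    doc = nlp(text)
    return {
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
        "sentiment": sentiment(text),
    }

One caveat: copy-on-write sharing works best for CPU inference; if you use a GPU, initialize CUDA after the fork (i.e. per worker), since CUDA contexts don't survive forking.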

Best way to deploy long-running high-compute app to GCP

I have a python app that builds a dataset for a machine learning task on GCP.
Currently I have to start an instance of a VM that we have, and then SSH in, and run the app, which will complete in 2-24 hours depending on the size of the dataset requested.
Once the dataset is complete the VM needs to be shutdown so we don't incur additional charges.
I am looking to streamline this process as much as possible, so that we have a "1 click" or "1 command" solution, but I'm not sure the best way to go about it.
From what I've read so far, it seems like containers might be a good way to go, but I'm inexperienced with Docker.
Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down? How would I pass information to the container such as where to get the config file etc? It's conceivable that we will have multiple datasets being generated at the same time based on different config files.
Is there a better gcloud feature that suits our purpose more effectively than containers?
I'm struggling to get information regarding these basic questions, it seems like container tutorials are dominated by web apps.
It would be useful to have a batch-like container service that runs a container until its process completes. I'm unsure whether such a service exists. I'm most familiar with Google Cloud Platform and this provides a wealth of compute and container services. However -- to your point -- these predominantly scale by (HTTP) requests.
One possibility may be Cloud Run, triggering jobs using Cloud Pub/Sub. I see there are async capabilities too, which may be interesting (I've not explored them).
Another runtime for you to consider is Kubernetes itself. Kubernetes requires some overhead in having Google, AWS or Azure manage a cluster for you (I strongly recommend you don't run Kubernetes yourself), and there is some inertia in matching the capacity of the cluster's nodes to the needs of your jobs, but as you scale the number of jobs these costs smooth out. A big advantage of Kubernetes is that it will scale (nodes|pods) as you need them: you tell Kubernetes to run X container jobs, and it does it (and cleans up) without much additional management on your part.
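As a sketch, one of these jobs could be expressed as a Kubernetes Job manifest along the following lines (the image name and config variable are assumptions, not from your setup):

apiVersion: batch/v1
kind: Job
metadata:
  name: dataset-build-42
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: build
        image: gcr.io/your-project/dataset-builder:latest   # hypothetical image
        env:
        - name: CONFIG_URI                                  # hypothetical config location
          value: gs://your-bucket/configs/run-42.yaml

kubectl apply -f job.yaml runs it to completion, and several Jobs with different configs can run side by side.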
I'm biased and approach the containers vs. VMs question mostly from a default of container-first. In this case, you'd get several benefits from containerizing your solution:
reproducible: the same image is more likely to produce the same results
deployable: you run a container instead of managing an OS and app stack and testing for consistency
maintainable: a smaller image representing just your app means less work to maintain it
One (beneficial!?) workflow change, if you choose to use containers, is that you will need to build your images before using them. Something like Knative combines these steps, but I'd stick with doing this yourself initially. A common solution is to trigger builds (Docker, GitHub Actions, Cloud Build) from your source code repo. Commonly you would run tests against the images that are built, but you may also run your machine-learning tasks this way too.
Your containers would contain only your code. When you build your container images, you would pip install, perhaps pip install --requirement requirements.txt, to pull the appropriate packages. Your data (models?) is better kept separate from your code when this makes sense. When your runtime platform runs containers for you, you provide configuration information (environment variables and|or flags) to the container.
A startup script seems to fit the bill better than containers. The instance always executes startup scripts as root, so the script can do anything you like.
A startup script will perform automated tasks every time your instance boots up. Startup scripts can perform many actions, such as installing software, performing updates, turning on services, and any other tasks defined in the script.
Keep in mind that a startup script cannot stop an instance but you can stop an instance through the guest operating system.
This would be the ideal solution for the question you posed. It would require a small change in your Python app so that the operating system shuts down when the dataset build is complete.
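As a sketch, that change could be as small as the following (build_dataset is a hypothetical stand-in for your existing entry point; since startup scripts run as root, shutdown needs no sudo):

import subprocess

def build_dataset():
    ...  # hypothetical: your existing dataset build

def main():
    build_dataset()
    # On GCE, shutting down the guest OS moves the instance to the
    # TERMINATED state, so the VM stops incurring compute charges
    # (persistent disks are still billed).
    subprocess.run(["shutdown", "-h", "now"], check=True)

if __name__ == "__main__":
    main()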
Q1) Can I setup a container that will pip install the latest app from our private GitHub and execute the dataset build before shutting down?
A1) Medium has a great article on installing a package from a private git repo inside a container. You can execute the dataset build before shutting down.
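A minimal Dockerfile sketch of that approach (the repo URL, token handling and module name are assumptions; note that build args are recorded in the image history, so prefer build secrets for real use):

FROM python:3.10-slim
ARG GIT_TOKEN
# install the app straight from the private repo (hypothetical URL)
RUN pip install "git+https://${GIT_TOKEN}@github.com/your-org/your-app.git"
# the container stops when the build process completes
ENTRYPOINT ["python", "-m", "your_app.build_dataset"]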
Q2) How would I pass information to the container such as where to get the config file etc?
A2) You can use ENV to set an environment variable. These will be available within the container.
You may consider looking into Docker for more information about containers.
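Inside the app, picking up that configuration is just an environment lookup; a sketch (CONFIG_URI is a hypothetical variable name):

import os

# e.g. docker run -e CONFIG_URI=gs://your-bucket/configs/run-42.yaml your-image
config_uri = os.environ["CONFIG_URI"]

Running several containers with different CONFIG_URI values gives you multiple dataset builds in parallel.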

NixOS within NixOS?

I'm starting to play around with NixOS deployments. To that end, I have a repo with some packages defined, and a configuration.nix for the server.
It seems like I should then be able to test this configuration locally (I'm also running NixOS). I imagine it's a bad idea to change my global configuration.nix to point to the deployment server's configuration.nix (who knows what that will break); but is there a safe and convenient way to "try out" the server locally - i.e. build it and either boot into it or, better, start it as a separate process?
I can see docker being one way, of course; maybe there's nothing else. But I have this vague sense Nix could be capable of doing it alone.
There is a fairly standard way of doing this that is built into the default system.
Namely nixos-rebuild build-vm. This will take your current configuration file (by default /etc/nixos/configuration.nix), build it, and create a script allowing you to boot the configuration into a virtual machine.
Once the script has finished, it will leave a symlink in the current directory. You can then boot the virtual machine by running ./result/bin/run-$HOSTNAME-vm and play around with it.
TLDR;
nixos-rebuild build-vm
./result/bin/run-$HOSTNAME-vm
nixos-rebuild build-vm is the easiest way to do this; however, you could also import the configuration into a NixOS container (see Chapter 47. Container Management in the NixOS manual and the nixos-container command).
This would be done with something like:
containers.mydeploy = {
  privateNetwork = true;
  config = import ../mydeploy-configuration.nix;
};
Note that you would not want to specify the network configuration in mydeploy-configuration.nix if it's static as that could cause conflicts with the network subnet created for the container.
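Assuming the declarative container above, something like this should build it and drop you into a shell (check the nixos-container man page for your release):

sudo nixos-rebuild switch                  # builds and starts the declarative container
sudo nixos-container root-login mydeploy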
As you may already know, system configurations can coexist without any problems in the Nix store. The problem here is running more than one system at once. For this, you need isolation or virtualization tools like Docker, VirtualBox, etc.
NixOS Containers
NixOS provides an efficient implementation of the container concept, backed by systemd-nspawn instead of an image-based container runtime.
These can be specified declaratively in configuration.nix or imperatively with the nixos-container command if you need more flexibility.
Docker
Docker was not designed to run an entire operating system inside a container, so it may not be the best fit for testing NixOS-based deployments, which expect and provide systemd and some services inside their units of deployment. While you won't get a good NixOS experience with Docker, Nix and Docker are a good fit.
UPDATE: Both 'raw' Nix packages and NixOS run in Docker. For example, Arion supports images from plain Nix, NixOS modules and 'normal' Docker images.
NixOps
To deploy NixOS inside NixOS, it is best to use a technology that is designed to run a full Linux system inside another.
It helps to have a program that manages the integration for you. In the Nix ecosystem, NixOps is the first candidate for this. You can use NixOps with its multiple backends, such as QEMU/KVM, VirtualBox, the (currently experimental) NixOS container backend, or you can use the none backend to deploy to machines that you have created using another tool.
Here's a complete example of using NixOps with QEMU/KVM.
Tests
If your goal is to run automated integration tests, you can make use of the NixOS VM testing framework. This uses Linux KVM virtualization (expose /dev/kvm in the sandbox) to run integration tests on networks of virtual machines, and it runs them as a derivation. It is quite efficient because it does not have to create virtual machine images; it mounts the Nix store in the VM instead. These tests are "built" like any other derivation, making them easy to run.
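A minimal sketch of such a test (the exact entry point varies between nixpkgs releases; this follows the make-test-python.nix style):

import <nixpkgs/nixos/tests/make-test-python.nix> ({ pkgs, ... }: {
  name = "trivial-nginx";
  machine = { ... }: {
    services.nginx.enable = true;
  };
  testScript = ''
    machine.wait_for_unit("nginx.service")
    machine.wait_for_open_port(80)
  '';
})

Building the file with nix-build runs the VM and the test script as part of realizing the derivation.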
Nix store optimization
A unique feature of Nix is that you can often reuse the host Nix store, so being able to mount a host filesystem in the container/VM is a nice feature to have in your solution. If you are creating your own solution, depending on your needs, you may want to postpone this optimization, because it becomes a bit more involved if you want the container/VM to be able to modify the store. NixOS tests solve this with an overlay file system in the VM. Another approach may be to bind mount the Nix store and forward the Nix daemon socket.

Managing resources (database, elasticsearch, redis, etc) for tests using Docker and Jenkins

We need to use Jenkins to test some web apps that each need:
a database (postgres in our case)
a search service (ElasticSearch in our case, but only sometimes)
a cache server, such as redis
So far, we've just had these services running on the Jenkins master, but this causes problems when we want to upgrade Postgres, ES or Redis versions. Not all apps can move in lockstep, and we want to run the tests on new versions before committing to move an app to production.
What we'd like to do is have these services provided on a per-job-run basis, each one running in its own container.
What's the best way to orchestrate these containers?
How do you start up these ancillary containers and tear them down, regardless of whether the job succeeds or not?
How do you prevent port collisions between, say, the database in a run of a job for one web app and the database in the job for another web app?
Check out docker-compose and write a docker-compose file for your tests.
The latest network features of Docker (private networks) will help you isolate builds running in parallel.
However, start by learning docker-compose as if you only had one build at a time. Once you're confident with that, look at the advanced Docker documentation around networking.
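A minimal sketch of both pieces (image tags and the test command are illustrative):

# docker-compose.yml - one throwaway copy of each service per job run
version: "2"
services:
  db:
    image: postgres:9.5
  es:
    image: elasticsearch:2.4
  redis:
    image: redis:3

In the Jenkins job, the -p project name keeps each run's containers and network separate, and nothing is published to the host, so there are no port collisions; tests reach the services as db, es and redis on the compose network:

project=$(echo "$BUILD_TAG" | tr '[:upper:]' '[:lower:]')   # BUILD_TAG is set by Jenkins
docker-compose -p "$project" up -d
./run-tests || status=$?                                    # hypothetical test entry point
docker-compose -p "$project" down -v                        # tear down even on failure
exit ${status:-0}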

Production sailsjs app with no downtime with pm2

I have a sailsjs app running in cluster mode with pm2 and two instances. One of the main reasons for wanting the two instances was so I could restart/update the app without having to bring the entire app down.
However, in the middle of a restart of one instance (pm2 restart 4), the site is all wonky (that's the technical term) if I refresh it. I'm assuming this is because grunt is doing its thing and the .tmp folder gets destroyed for both instances?
Is the only real approach with sailsjs to have two complete instances running on different ports and use something like nginx as the load balancer, or am I missing something with PM2 that would allow for staged restarts without any downtime or hiccups in the resources being available?
There are a few issues here.
You need to provide the versions of sails.js/node.js/pm2 you're running; in short, describe your environment as completely as possible. Describing your issue more completely helps people write more concise and clear answers.
node.js cluster mode may change (as of v0.12.4) and is still considered "Unstable": https://nodejs.org/api/cluster.html#cluster_cluster
In the following thread, "mikermcneil commented on Dec 3, 2014" says to disable Grunt for production with pm2: https://github.com/balderdashy/sails/pull/1716
Let me clarify by saying I've used pm2 until just recently. In addition to the Grunt issue, it has problems with socket connections, while nginx handles them just fine. Trust me, chasing down that bug was not fun. Here's a link to the thread: https://github.com/Unitech/PM2/issues/389
As an alternative solution I chose to use nginx with parallel sails.js apps, using redis for sockets and sessions. Use forever to keep the apps running, and disable grunt. Point nginx at the assets folder to serve static files quickly, bypassing sails.js, and add caching to those assets.
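A sketch of the nginx side (ports, paths and names are all assumptions):

upstream sails_app {
    server 127.0.0.1:1337;   # sails instance 1
    server 127.0.0.1:1338;   # sails instance 2
}

server {
    listen 80;
    server_name example.com;

    location / {
        # serve pre-built static assets directly when they exist
        root /var/www/myapp/.tmp/public;   # hypothetical path
        expires 7d;
        try_files $uri @sails;
    }

    location @sails {
        proxy_pass http://sails_app;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;    # websocket upgrades
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}

Restarting one sails instance at a time then leaves the other in rotation, which gives you the zero-downtime restarts you were after.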
Hope this helps!

Dotcloud multiple services

I'm new to dotcloud, and am confused about how multiple services work together.
My YAML build file is:
www:
  type: python
db:
  type: postgresql
worker:
  type: python-worker
broker:
  type: rabbitmq
And my supervisord file contains commands to start django celery & celerycam.
When I push my code out to my app, I can see that both the www & worker services start up their own instances of celery & celerycam, and, for example, the log files will be different. This makes sense (although it isn't made very clear in the dotcloud documentation, IMO - the docs talk about setting up a worker service, but not about how to combine it with other services), but it does raise the question of how to configure an application where the python service mainly serves the web pages, while the python-worker service works on background tasks, e.g. celery.
The dotcloud documentation on daemons mentions this:
"However, you should be aware that when you scale your application,
the cron tasks will be scheduled in all scaled instances – which is
probably not what you need! So in many cases, it will still be better
to use a separate service.
Similarly, a lot of (non-worker) services already run Supervisor, so
you can run additional background jobs in those services. Then again,
remember that those background jobs will run in multiple instances if
you scale your application. Moreover, if you add background jobs to
your web service, it will get less resources to serve pages, and your
performance will take a significant hit."
How do you configure dotcloud & your application to run just the webserver on one service, and background tasks on the worker service? Would you scale workers by increasing the concurrency setting in celery (and scaling the one service vertically), by adding extra worker services, or both?
Would you do this so that firstly the webserver service doesn't have to use resources in processing background tasks, and secondly so that you could scale the worker services independently of the webserver service?
There are two tricks.
First you could use different approots for your www and worker services to separate the code they will run:
www:
  type: python
  approot: frontend
  # ...
worker:
  type: python-worker
  approot: backend
  # ...
Second, since your postinstall script is different for each approot, you can copy a file out to become the correct supervisord.conf for that particular service.
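For example, the worker's postinstall (run from the backend approot) might be as simple as this sketch (the filenames are assumptions):

#!/bin/sh
# backend/postinstall - put the worker-specific supervisor config in place
cp conf/supervisord-worker.conf supervisord.conf

with a matching copy in the frontend approot's postinstall.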
You may also want to look at the dotCloud tutorial and sample code for django-celery.
/Andy