Improved deployment strategies for high memory Azure App Service - deployment

I have a Flask API running in an Azure App Service. The API loads quite a lot of data on startup and uses about 60-70% of the memory on an 8 GB plan (P1V3). I'm planning to scale the App Service plan out to 3-5 instances depending on traffic.
Now I also want to release new versions of the API without downtime, but using a staging slot requires me to scale the plan up to 16 GB so that two versions of the API can run simultaneously before swapping.
This is a very inefficient use of resources, as our API then runs at around 30% memory for double the cost, so I'm looking for ways to optimize our approach.
I've tried to manually scale up from 8 to 16 GB on release, but this takes down the API even when we have multiple instances and "Always On" enabled.
Does App Service support deploying to one instance at a time (rolling deployment), or other deployment strategies that don't require us to scale our App Service plan to 16 GB?
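The memory arithmetic behind this question can be sketched roughly as follows. This is only an illustration: the 65% figure is an assumed midpoint of the 60-70% range above, and the helper name is made up for the example.

```python
# Rough memory arithmetic for running two copies of the API side by side.
# Assumption: each running copy needs about 65% of 8 GB (midpoint of 60-70%).

def slot_memory_usage(plan_gb: float, app_gb: float, copies: int) -> float:
    """Return the fraction of plan memory used by `copies` copies of the app."""
    return copies * app_gb / plan_gb


app_gb = 0.65 * 8  # roughly 5.2 GB per running copy

for plan_gb in (8, 16):
    for copies in (1, 2):
        usage = slot_memory_usage(plan_gb, app_gb, copies)
        fits = "fits" if usage <= 1.0 else "does not fit"
        print(f"{plan_gb} GB plan, {copies} copy/copies: {usage:.1%} ({fits})")
```

Two copies exceed an 8 GB plan, which is why the staging slot forces the larger plan, while a single copy on 16 GB sits at roughly a third of the memory.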


Web application deployment approach using Google Cloud - GKE

I am deploying a Python + TensorFlow + Flask application using a fully managed Google Cloud Run service (1 vCPU and 4 GB RAM).
The system works fine but it is really slow, so I am evaluating ways of making it faster (it needs to run 20-30 times faster than it does now).
What would be the best approach?
To use a Kubernetes Cluster with one or two powerful machines
To use a Kubernetes Cluster with 3-5 weaker machines
To forget about Kubernetes/Docker and run everything on a single powerful VM
Something else maybe?
For now I don't expect to have more than 10 users at a time but I want to be able to scale it up eventually.
You might want to evaluate according to your use case
Per this article, Fully managed Cloud Run is an ideal serverless platform for stateless containerized microservices that don’t require Kubernetes features like namespaces, co-location of containers in pods (sidecars) or node allocation and management.
GKE is a great choice if you are looking for a container orchestration platform that offers advanced scalability and configuration flexibility.
You mentioned you are looking for the cheaper/easier method to develop, but this will probably not be as scalable, efficient or manageable. You might want to take a closer look at all the cloud compute options in GCP to see which would benefit your use case the most.
You mentioned your use case is CPU intensive, so you might want to leverage the high-CPU machine types. These can be used directly by creating a VM or an instance group, or through other services like GKE or App Engine.
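Before picking a machine type, it may help to confirm that the workload really is CPU bound. A minimal sketch, assuming the slow part can be called as a plain Python function; `fake_inference` is a hypothetical stand-in for the real model call:

```python
import time


def cpu_bound_ratio(fn, *args, **kwargs):
    """Run fn once and return (wall_seconds, cpu_seconds / wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time used by this process
    fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, (cpu / wall if wall > 0 else 0.0)


def fake_inference(n=200_000):
    # Hypothetical stand-in for the real prediction code: pure CPU work.
    return sum(i * i for i in range(n))


wall, ratio = cpu_bound_ratio(fake_inference)
print(f"wall={wall:.3f}s cpu/wall={ratio:.2f}")
```

A ratio close to 1 suggests the call is limited by a single core, a ratio well above 1 suggests it already uses several cores, and a ratio well below 1 suggests it spends most of its time waiting (I/O, network), in which case more CPU is unlikely to help much.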

Mongo database in GCP app engine

I'm currently looking into GCP App Engine and trying to figure out how I would deploy a very large application with multiple services. I also want to use MongoDB. The GCP docs say that App Engine allows Dockerfiles and images. What would happen if I used the mongo Docker image as a service on App Engine? How would it scale its instances? What will happen to consistency? I'm aware GCP has a third-party solution for MongoDB, but since they allow Docker images, what stops me from using it?
App Engine routinely tears down and creates new instances. If your instance is running MongoDB, then all the data stored in that instance will be lost.
This is why Google Cloud offers other, permanent places to store state, like Datastore and Cloud SQL. You can also run MongoDB yourself on Google Compute Engine.
What would happen if I used the mongo docker image as a service on app engine?
The App Engine flexible environment allows you to use Docker images to build your own application, as mentioned in this document [1]: "App Engine flexible environment instances are Compute Engine virtual machines, which means that you can take advantage of custom libraries, use SSH for debugging, and deploy your own Docker containers."
So there is no problem with using your own Docker image in the App Engine flexible environment.
How would it scale its instances?
Each active version in App Engine must have at least one instance to handle requests. There are two ways to scale instances in App Engine: automatic and manual.
As mentioned in the document [2]:
Automatic scaling creates instances based on request rate, response latencies, and other application metrics. You can specify thresholds for each of these metrics, as well as a minimum number instances to keep running at all times.
Manual scaling specifies the number of instances that continuously run regardless of the load level. This allows tasks such as complex initializations and applications that rely on the state of the memory over time.
You configure these features through the app.yaml file. I suggest you read this document [3].
What will happen to consistency?
Since App Engine scaling can be configured depending on its load, this allows for good performance in service execution and provides consistency in operations and optimization of resources.

AWS EB should create new instance once my docker reached its maximum memory limit

I have deployed my dockerized microservices, written using Akka-HTTP and Scala, to AWS using Elastic Beanstalk.
I have allocated 512 MB of memory to each Docker container and am seeing performance problems. CPU usage increases as the server receives more requests (20%, 23%, 45%, ... depending on load) and then automatically comes back down to its normal level (0.88%). Memory usage, however, keeps increasing with every request and is not released even after CPU usage returns to normal. It eventually reaches 100%, at which point the container is killed and restarted.
I have also enabled auto scaling in EB to handle a large number of requests, but it only creates another instance after the CPU usage of the running instance reaches its maximum.
How can I set up auto scaling to create another instance once memory usage reaches its limit (i.e. 500 MB out of 512 MB)?
Please suggest a way to resolve these problems as soon as possible, as this is a very critical problem for us.
CloudWatch doesn't natively report memory statistics. But there are some scripts that Amazon provides (usually just referred to as the "CloudWatch Monitoring Scripts for Linux") that will get the statistics into CloudWatch so you can use those metrics to build a scaling policy.
The Elastic Beanstalk documentation provides some information on installing the scripts on the Linux platform at
However, this will come with another caveat in that you cannot use the native Docker deployment JSON as it won't pick up the .ebextensions folder (see Where to put ebextensions config in AWS Elastic Beanstalk Docker deploy with dockerrun source bundle?). The solution here would be to create a zip of your application that includes the JSON file and .ebextensions folder and use that as the deployment artifact.
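To illustrate the idea of pushing memory statistics into CloudWatch yourself, here is a minimal Python sketch. It is not the Amazon monitoring scripts. It assumes boto3 is installed and AWS credentials are configured, and the namespace "Custom/Memory" is a made-up name for the example; the metric name matches the MeasureName used in this answer.

```python
def memory_utilization_percent(meminfo_text: str) -> float:
    """Compute the used-memory percentage from the contents of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    # MemAvailable is missing on older kernels; fall back to MemFree there.
    available = fields.get("MemAvailable", fields.get("MemFree", 0))
    return 100.0 * (total - available) / total


def publish_memory_metric(percent: float, namespace: str = "Custom/Memory") -> None:
    """Send one data point to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing code runs without it

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace,  # hypothetical namespace, pick your own
        MetricData=[{
            "MetricName": "MemoryUtilization",
            "Value": percent,
            "Unit": "Percent",
        }],
    )


# Example use on the instance, e.g. from a cron job:
#     with open("/proc/meminfo") as f:
#         publish_memory_metric(memory_utilization_percent(f.read()))
```

Note that /proc/meminfo describes the whole host, not a single container's 512 MB limit, so this measures instance-level memory rather than per-container usage.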
There is also one thing I am unclear on: whether these metrics will be available to choose from under the Configuration -> Scaling section of the application. You may need to create another .ebextensions config file to set the custom metric, such as:
BreachDuration: 3
LowerBreachScaleIncrement: -1
MeasureName: MemoryUtilization
Period: 60
Statistic: Average
Threshold: 90
UpperBreachScaleIncrement: 2
Now, even if this works, if the application does not lower its memory usage once load goes down after scaling, the scaling policy will just continue to trigger and eventually reach the maximum number of instances.
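That runaway behaviour can be shown with a toy simulation. This is a simplified sketch, not how Elastic Beanstalk evaluates triggers: it ignores BreachDuration and cooldowns, treats each list entry as the average memory reading for one period, and the lower threshold and instance limits are assumed values.

```python
def simulate_trigger(memory_by_period, threshold=90, upper_increment=2,
                     lower_threshold=30, lower_increment=-1,
                     min_instances=1, max_instances=4):
    """Return the instance count after each period for a simple threshold trigger."""
    instances = min_instances
    history = []
    for memory in memory_by_period:
        if memory > threshold:
            instances += upper_increment   # upper breach: scale out
        elif memory < lower_threshold:
            instances += lower_increment   # lower breach: scale in
        # Keep the count inside the (assumed) configured limits.
        instances = max(min_instances, min(max_instances, instances))
        history.append(instances)
    return history


# Memory is never released, so the reading stays above the threshold.
leaky = simulate_trigger([95, 95, 95, 95])
# Memory drops again once the load goes away.
healthy = simulate_trigger([95, 60, 20, 20])
print(leaky)    # [3, 4, 4, 4]
print(healthy)  # [3, 3, 2, 1]
```

In the leaky case the group climbs to the maximum and stays there, while in the healthy case it scales back in once memory falls.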
I'd first see if you can get some garbage collection statistics for the JVM and maybe tune the JVM to do garbage collection more often to help bring memory down faster after application load goes down.

High availability on Bluemix

I've seen several status updates on Bluemix saying that applications are being restarted and there will be issues logging in, e.g.
During this time, you might experience temporary errors logging into
Bluemix or managing applications, such as starting, staging, and so
on. If this situation occurs, retry the operation later. The latest
status will be available at throughout
the upgrade process.
Existing applications will see a brief restart of instances, but near
continuous availability is expected.
Is it possible then to build a high-availability application on Bluemix?
IBM Bluemix supports deploying applications in multiple different regions.
Minimising downtime during platform issues can be achieved by hosting your application in multiple regions simultaneously and using an external load-balancer to move traffic between the instances depending on availability.
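The routing decision such a load balancer makes can be sketched as below. This is an illustration only: the region names and health results are hypothetical, and a real setup would normally rely on the load balancer's or DNS provider's own health checks rather than custom code.

```python
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to url answers with a 2xx status within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections and timeouts.
        return False


def pick_region(regions, is_healthy):
    """Return the first healthy region in priority order, or None."""
    for region in regions:
        if is_healthy(region):
            return region
    return None


# Hypothetical region names and health results, for illustration only.
status = {"region-a": False, "region-b": True}
print(pick_region(["region-a", "region-b"], lambda r: status[r]))  # region-b
```

In practice `is_healthy` could wrap `http_ok` with each region's health endpoint URL.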
Replicating application data between regions will depend on the individual services you're using. For example, Cloudant supports multi-master replication, allowing you to fail over without any manual intervention.

Why does Azure deployment take so long?

I'm trying to understand why it can take 20 to 60 minutes to deploy a small application to Azure (using the configuration/package upload method, not from within VS).
I've read through this situation and this one but I'm still a little unclear - is there a weird non-technology ritual that occurs while the instances are distributing, like somebody over at Microsoft lighting a candle or doing a dance?
As a fellow Azure user, I share your pain - deploying isn't "quick"/"painless" - and this hurts especially when you're in a development cycle and want to test dev iterations on Azure. However, in general deployments should take much less than 60 minutes - and less than 20 minutes too.
Steve Marx provided a brief overview of the steps involved in deployment:
And he references a deeper level explanation at:
There's a lot that goes on behind the scenes when you deploy an application to the Azure cloud. I don't have any special insight into what's going on behind the curtain, but having worked on the VS tools to upload projects to the Azure cloud, these are my impressions as an outsider looking in:
Among other things:
Hardware must be allocated from the available pool of servers
The VHD of the core OS must be uploaded to the machine
A VM instance must be initialized and booted off that VHD image
Your application package must be copied to the VM and installed
The VM monitor must wait for your service to start up, or fail
The data center load balancer and firewall must be made aware of your application's service endpoints
Once all of that has synchronized, your app is accessible from the web.
The VHD image is probably gigabytes in size, much larger than your app upload. Even on a superfast datacenter network, it takes time to move that much stuff into the VM, unpack it, and boot from it. Also, the load balancer and firewall are probably optimized to make routing requests the highest priority. Reconfiguring the firewall and load balancer is lower priority, and has to be done without interrupting traffic flow.
Also note that all this work only has to be done for a new deployment. Updating an existing deployment rolls out much faster - 2 to 3 minutes instead of 20 to 30 minutes.
Check out this PDC10 video by Mark Russinovich. He goes into great detail on what's going on inside Azure with some insights into the (admittedly slow) deployment process.
Original link is no longer working. Here's another link to a version of the same presentation: