Monitor a NestJS backend server

In the old days, when we wanted to monitor a "daemon" / service, we would ask the software vendor for the list of all the services running in the background on Windows.
If a daemon / service went down, it would be restarted.
On top of that, we would use software like Nagios or Centreon to monitor that particular daemon / service.
I have a team of software developers in charge of implementing a nice NestJS backend.
Here is what we are going to implement:
2 different VMs running on a high-availability VMware cluster with a SAN
the two VMs have vMotion / high-availability settings
an HAProxy is set up to provide load balancing and additional high availability
Our questions are: how can we detect that
one of our backends is down?
one of our backends moves from a 50 ms average response time to 800 ms?
one of our backends consumes more than 15 GB of RAM?
etc.
When we were using "old school" daemons, that was enough; when it comes to a JS backend, I am a bit clueless.
Cheers
Kynes
NB: the datacenter in charge of our infrastructure is not "Docker / Kubernetes / Ansible etc." compliant.

To be fair, all of these seem doable out of the box with Centreon/Nagios. I'd say check the documentation...
one of our backends is down?
VM DOWN: the centreon-vmware plugin provides monitoring of VM status.
VM UP but backend DOWN: use the native HTTP/HTTPS URL checks provided by Centreon/Nagios to load the web page (see the sketch below).
Or use the native SNMP plugins to monitor the status of your node process.
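For the URL check to have something cheap and unambiguous to poll, the NestJS backend can expose a dedicated health route. Here is a minimal sketch assuming a standard NestJS setup; the route name and payload are illustrative, not from the original question:

// health.controller.ts -- minimal health route for the Centreon/Nagios URL check to poll
import { Controller, Get } from '@nestjs/common';

@Controller('health')
export class HealthController {
  @Get()
  check() {
    // The URL check only needs an HTTP 200; the body is a bonus for humans.
    return { status: 'ok', uptimeSeconds: Math.round(process.uptime()) };
  }
}

Register the controller in a module and point the HTTP check at /health: if the Node process is down, the check fails and the alert fires.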
one of our backends moving from a 50 ms average response time to 800 ms?
Ping response time: use the native ping check.
Status of the network interfaces of the VM: the centreon-vmware plugin has network interface checks for VMs.
Page loading time: use the native HTTP/HTTPS URL checks provided by Centreon/Nagios.
You may go even further and use a browser automation tool like Selenium to run scenarios on your pages and monitor the time of each step.
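If the native URL check's thresholds are not flexible enough, a small probe script can time the request itself and warn past a threshold. A rough sketch (the URL and threshold are placeholders; assumes Node 18+ for the built-in fetch):

// latency-probe.ts -- time one request against the backend and warn past a threshold
const URL_TO_CHECK = 'https://backend.example.com/health'; // placeholder
const WARN_MS = 800;

async function probe(): Promise<void> {
  const start = performance.now();
  const res = await fetch(URL_TO_CHECK);
  const elapsedMs = performance.now() - start;
  if (!res.ok || elapsedMs > WARN_MS) {
    // A real plugin would emit Nagios-style output and exit codes (0=OK, 1=WARNING, 2=CRITICAL).
    console.error(`WARNING: ${URL_TO_CHECK} answered ${res.status} in ${elapsedMs.toFixed(0)} ms`);
    process.exitCode = 1;
  } else {
    console.log(`OK: ${elapsedMs.toFixed(0)} ms`);
  }
}

probe();

Run it from cron or wrap it as a Nagios plugin; averaging over several requests would better match the "50 ms average" framing of the question.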
one of our backends consumes more than 15 GB of RAM?
Total RAM consumed on the server: use the native SNMP memory checks from Centreon/Nagios.
RAM consumed by a specific process: possible through the native SNMP process check.
Like so:
/usr/lib/centreon/plugins/centreon_linux_snmp.pl --plugin os::linux::snmp::plugin --mode processcount --hostname=127.0.0.1 --process-name="centengine" --memory --cpu
OK: Number of current processes running: 1 - Total memory usage: 8.56 MB - Average memory usage: 8.56 MB - Total CPU usage: 0.00 % | 'nbproc'=1;;;0; 'mem_total'=8978432B;;;0; 'mem_avg'=8978432.00B;;;0; 'cpu_total'=0.00%;;;0;
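If SNMP is awkward in your environment, the backend can also report on itself. A small illustrative sketch (route and field names are invented for the example) that a URL check or script could scrape and compare against the 15 GB threshold:

// metrics.controller.ts -- expose the Node process's own memory usage over HTTP
import { Controller, Get } from '@nestjs/common';

@Controller('metrics')
export class MetricsController {
  @Get('memory')
  memory() {
    const m = process.memoryUsage();
    return {
      rssMb: Math.round(m.rss / (1024 * 1024)),           // resident set size of the process
      heapUsedMb: Math.round(m.heapUsed / (1024 * 1024)), // V8 heap currently in use
    };
  }
}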

Related

Google cloud VM instance CPU usage very high, and growing month by month

I have configured a VM instance in Google Cloud with Node-RED, MQTT, Grafana, InfluxDB, and a WireGuard tunnel.
It is used only by me, for my home automation system.
Only a few devices.
CPU usage is growing month by month.
I have already read that you can't compare the TOP command with the CPU utilization shown in the Google Cloud monitoring service because: "The CPU usage shown in Google Cloud Console is not that of the instance, but the CPU usage of the container managing the instance. This container is in charge of providing the virtualization services to the instance and collecting all the metrics used for load balancing, auto-scaling, cloud monitoring, etc. As such, high numbers of I/O or network operations will cause the CPU utilization shown in Cloud Console to spike."
But how can I track down the real reason?
With TOP I can't see anything, as the CPU is under 5%, but the VM instance shows over 80% (a month ago, about 50%).
Thanks a lot.
NOTE: for example, the last 30 days: [CPU usage chart]

Redis Simple Production Server Specification

Currently I use Redis Labs to host my Redis server, but their cloud is not available in my web server's hosting (SoftLayer), so my web server's performance suffers from network latency (~20 ms per round trip).
For that reason, I want to create a VPS to host Redis in SoftLayer so my web server can connect to the Redis server over the LAN.
From Redis Labs I know that it consumes ~400 MB of memory and does ~250 ops/sec on a normal day, but it can go up to ~1500 ops/sec when we have an event like a flash sale.
The question is: which server specification can handle that kind of traffic?
Is a VPS with 1 CPU and 4 GB of memory enough?
Thank you
When ordering a VPS in the SoftLayer control portal there are many options with the characteristics you want; we cannot give you the specific characteristics for your requirements because we do not know whether they would fulfill your expectations.
I would suggest ordering an hourly VPS with the characteristics you want and trying it; if it does not work, you can cancel it immediately and avoid the larger costs of a monthly server.
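To make the trial meaningful, it is worth measuring the LAN round trip you are hoping to gain before committing. A rough sketch using the ioredis client (the host address is a placeholder):

// latency-check.ts -- average round-trip time of PING against the trial Redis VPS
import Redis from 'ioredis';

async function main(): Promise<void> {
  const redis = new Redis({ host: '10.0.0.5', port: 6379 }); // placeholder LAN address
  const rounds = 1000;
  const start = process.hrtime.bigint();
  for (let i = 0; i < rounds; i++) {
    await redis.ping(); // one full round trip per PING
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`average round trip: ${(elapsedMs / rounds).toFixed(3)} ms`);
  redis.disconnect();
}

main().catch(console.error);

If the LAN round trip comes in well under the ~20 ms you see today, even a modest VPS should comfortably absorb ~1500 ops/sec; Redis executes commands on a single thread, so one CPU is rarely the bottleneck at that rate.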

AWS EB should create a new instance once my Docker container reaches its maximum memory limit

I have deployed my Dockerized microservices on AWS using Elastic Beanstalk; they are written with Akka-HTTP (https://github.com/theiterators/akka-http-microservice) and Scala.
I have allocated 512 MB of memory to each container and am seeing performance problems. I have noticed that CPU usage increases as the server receives more requests (20%, 23%, 45%... depending on load), then automatically comes back down to a normal state (0.88%). But memory usage keeps increasing with every request and fails to release unused memory even after CPU usage has returned to normal; it reaches 100%, and the container is killed and restarted.
I have also enabled the auto-scaling feature in EB to handle a large number of requests, but it only creates a duplicate instance once the CPU usage of the running instance reaches its maximum.
How can I set up auto scaling to create another instance once memory usage reaches its maximum limit (i.e. 500 MB out of 512 MB)?
Please suggest a way to resolve these problems as soon as possible; this is a very critical problem for us.
CloudWatch doesn't natively report memory statistics. But there are some scripts that Amazon provides (usually just referred to as the "CloudWatch Monitoring Scripts for Linux") that will get the statistics into CloudWatch, so you can use those metrics to build a scaling policy.
The Elastic Beanstalk documentation provides some information on installing the scripts on the Linux platform at http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-cw.html.
However, this will come with another caveat in that you cannot use the native Docker deployment JSON as it won't pick up the .ebextensions folder (see Where to put ebextensions config in AWS Elastic Beanstalk Docker deploy with dockerrun source bundle?). The solution here would be to create a zip of your application that includes the JSON file and .ebextensions folder and use that as the deployment artifact.
There is also one thing I am unclear on: whether these metrics will be available to choose from under the Configuration -> Scaling section of the application. You may need to create another .ebextensions config file to set the custom metric, such as:
option_settings:
  aws:elasticbeanstalk:customoption:
    BreachDuration: 3
    LowerBreachScaleIncrement: -1
    MeasureName: MemoryUtilization
    Period: 60
    Statistic: Average
    Threshold: 90
    UpperBreachScaleIncrement: 2
Now, even if this works: if the application does not lower its memory usage after scaling out and the load goes down, the scaling policy will just keep triggering and eventually hit the maximum instance count.
I'd first see if you can get some garbage collection statistics out of the JVM, and maybe tune the JVM to collect garbage more often, to help bring memory down faster after application load drops.

How to make a RESTful service truly highly available with a hardware load balancer

When we have a cluster of machines behind a load balancer (LB), hardware load balancers generally keep persistent connections.
Now, when we need to deploy an update to all machines (a rolling update), the way to do it is to take one machine out of rotation and watch until no more requests are sent to that server via the LB. When the app reaches the no-requests state, update it manually.
With 70-80 servers in the picture this becomes very painful.
Does someone have a better way of doing it?
70-80 servers is a very horizontally scaled implementation... good job! "Better" is a very relative term; hopefully one of these suggestions counts as "better".
1. Implement an intelligent health check for the application, with the ability to adjust the health check while the application is running (see the first sketch after this list). What we do is make the health check start failing while the application is still running just fine. This lets the load balancer automatically take the system out of rotation. Our stop scripts query the load balancer to make sure the machine is out of rotation, then shut down normally, which allows existing connections to drain.
2. Batch multiple groups of systems together. I am assuming you have 70 servers to handle peak load, which means you should be able to restart several at a time. A standard way to do this is to implement a simple token-granting service with a maximum of 10 tokens, and have your shutdown scripts check out a token before continuing (see the second sketch after this list).
3. Another way to do this is with blue/green deploys: you run an entire second server farm, and once the second farm is updated, you switch load balancing to point to the new farm.
4. This is an alternative to option 3. Install both versions of the app on the same servers, and have an internal proxy service (like haproxy) switch connections between the deployed versions. For example:
haproxy listening on 8080
app version 0.1 listening on 9001
app version 0.2 listening on 9002
Once you are happy with the deploy of app version 0.2, switch haproxy to send traffic to 9002. When you release version 0.3, switch load balancing back to 9001, and so on.
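A toggleable health check like the one in suggestion 1 might look like this in a Node/NestJS service (the route names and the in-memory flag are illustrative, not from the answer):

// drain.controller.ts -- health check that can be flipped to failing so the LB drains the node
import { Controller, Get, HttpCode, Post, ServiceUnavailableException } from '@nestjs/common';

let draining = false;

@Controller()
export class DrainController {
  @Get('health')
  health() {
    if (draining) {
      // A 503 makes the load balancer mark this node down while in-flight
      // requests on other routes continue to completion.
      throw new ServiceUnavailableException('draining');
    }
    return { status: 'ok' };
  }

  @Post('drain')
  @HttpCode(204)
  drain() {
    draining = true; // called by the stop script before shutdown
  }
}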
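And a toy version of the token-granting service from suggestion 2, using only Node's standard library (the port, paths, and limit of 10 are placeholders):

// token-server.ts -- cap how many servers may be restarting at once
import http from 'node:http';

const MAX_TOKENS = 10;
let inUse = 0;

http.createServer((req, res) => {
  if (req.method === 'POST' && req.url === '/checkout') {
    if (inUse < MAX_TOKENS) {
      inUse += 1;
      res.writeHead(200);
      res.end('granted'); // shutdown script may proceed
    } else {
      res.writeHead(429);
      res.end('busy');    // shutdown script sleeps and retries
    }
  } else if (req.method === 'POST' && req.url === '/release') {
    inUse = Math.max(0, inUse - 1);
    res.writeHead(204);
    res.end();
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8090);

A production version would also need token expiry, so a crashed script cannot leak tokens forever.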

Queuing in Rackspace Cloud [closed]

I've been using EC2 for deployment all along, and now I want to give Rackspace a try. My application has to be scalable, so I use RabbitMQ as the main queuing system. Actions on the front end can lead to a very large number of jobs that need execution, which I want to queue somewhere.
Due to the expected load profile of the application, it makes sense to use scalable infrastructure like the Rackspace cloud. Now I am wondering where it would be best to queue the jobs. Queuing them on the front-end servers means the number of front-end servers can only be scaled back down once the queues are processed, which is a waste of resources: once the peak load on the front end is over, we want to scale it down and scale up the machines that process the queue items.
If we queue them on the database server, we add load to a single machine which, in the current setup, is already the most likely bottleneck. How would you design this?
Is there any built-in queuing for Rackspace, something like Amazon SQS?
They don't have anything like SQS, but there are a few good services you may be able to take advantage of:
Cloud Files
With the Akamai CDN, you can push all your static content right out to your clients. (I'm on the Gold Coast in Australia, and Cloud Files public content comes to me from a server in Brisbane: 13 ms vs. 250 ms ping times for US servers.) Given the effect of distance on download speed, that means faster download times for your users, plus absolutely no clogging of the pipes on the web server during the Christmas rush.
The way I use it is:
I create a Cloud files container; this gets a unique hostname.
I create a CNAME DNS record (for example: cdn.supa.ws) pointing to that unique hostname.
I use cloudfuse to mount the directory both on my cloud server and on my home linux box.
Then just copy or upload files straight into that directory and serve them from http://cdn.yourdomain.com.
Load balancers as a service
http://www.rackspace.com/cloud/cloud_hosting_products/loadbalancers/ - Basically a bunch of Zeus load balancers that you can use to push requests to your back end servers. Cool because they're API programmable, so you can scale on the fly and add more backend servers as needed. They also have nice weighting algorithms, so you can send more traffic to certain servers if needed.
Internal VLAN
I would recommend using the 'internal IPs' (10.x.y.z) (the eth1 interface) for message queuing and DB data between Cloud Servers as they give you a higher outgoing bandwidth cap.
Outgoing bandwidth (speed) caps:
256 MB RAM - 10 Mb/s eth0 - 20 Mb/s eth1
512 MB RAM - 20 Mb/s eth0 - 40 Mb/s eth1
1 GB RAM - 30 Mb/s eth0 - 60 Mb/s eth1
2 GB RAM - 40 Mb/s eth0 - 80 Mb/s eth1
4 GB RAM - 50 Mb/s eth0 - 100 Mb/s eth1
8 GB RAM - 60 Mb/s eth0 - 120 Mb/s eth1
15.5 GB RAM - 70 Mb/s eth0 - 140 Mb/s eth1
eth1 is called an internal VLAN, but it is shared with other customers, so it is best to firewall off your eth1 as well as your eth0; for example, only allow MySQL connections from your own Cloud Servers, and if you have really sensitive data, maybe use MySQL with SSL, just in case :)
MySQL as a service
There is also a MySQL-as-a-service private beta. I haven't tried it yet, but it looks like it has a lot of potential coolness: http://www.rackspace.com/cloud/blog/2011/12/01/announcing-the-rackspace-mysql-cloud-database-private-beta/
Rackspace doesn't offer a hosted queuing system.
I've been running RabbitMQ on their Cloud Servers for more than 2 years and things are good.
I haven't tried clustering, though, so I don't know how easy it would be to set up over there, nor how stable it would be in their environment.
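For what it's worth, pushing jobs to a RabbitMQ node running on a Cloud Server is only a few lines with a client such as amqplib (a common Node.js AMQP library); the 10.x host and queue name below are placeholders for your internal VLAN setup:

// enqueue.ts -- queue a job on RabbitMQ over the internal VLAN using amqplib
import amqp from 'amqplib';

async function enqueue(job: object): Promise<void> {
  const conn = await amqp.connect('amqp://10.0.0.10'); // placeholder internal VLAN address
  const ch = await conn.createChannel();
  await ch.assertQueue('jobs', { durable: true });     // queue survives broker restarts
  ch.sendToQueue('jobs', Buffer.from(JSON.stringify(job)), { persistent: true });
  await ch.close();
  await conn.close();
}

enqueue({ task: 'resize-image', id: 42 }).catch(console.error);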
Beanstalkd just rocks: tubes function as pub-sub, and it works like a charm on any cloud vendor. It takes 3-7 minutes to set up and is blazingly fast, since it uses a memcached-like in-memory queue.
You can write workers in any language you choose. You cannot go wrong with this one.
Link:
http://kr.github.com/beanstalkd/