How to identify the network performance issue? - server

I am a little confused about my message server's network bottleneck. I can clearly see that the problem is caused by a large number of network operations, but I am not sure why, or how to identify it.
Currently we are using a GCP VM with 4 cores and 8 GB of RAM for our message server. Redis and Cassandra run on other servers in the same location. The problem occurs in the network operations to the Redis and Cassandra servers.
I need to handle 3000+ requests at once to save data to Redis and 12000+ requests to the Cassandra server.
My task was consuming all my CPU power, and the CPU usage dropped right after I merged the Redis and Cassandra requests into a kind of batch request. The penalty is that I have to delay saving my data.
What I want to know is how I can determine the network capacity of my system. How many requests per second is a reasonable load? From my testing it seems obvious that the bottleneck is the network operations, but I can't prove it. I don't even know how to estimate a reasonable network usage for my system. Are there tools or other techniques that can help me confirm that the network is the problem? Or is this just a misconfiguration of my GCP setup?

There is a "monitoring" label in each instance where you can check through graphs values like instance CPU, Network and RAM usage.
But to check the performance of your instance in more depth you should use Stackdriver Logging and Monitoring. They store a lot of information from the internal servers and about system performance. For that you will need to install the agent on the instance. They also store information about your Load Balancer, in case you are using one with your web application, which is very advisable since it can scale your resources up or down with Autoscaling.
But in order to test your network you will need to use a third-party tool to put load on it. There are multiple tools for this, such as JMeter.
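To get a rough feel for how many requests per second is reasonable before running a load test, a back-of-the-envelope bound can help. The sketch below is only an illustration: it assumes each synchronous request costs one network round trip, and the 0.5 ms round-trip time, the batch sizes, and the function names are hypothetical values picked for the example, not measurements from your system.

```python
import math


def max_requests_per_second(rtt_ms: float, connections: int = 1, batch_size: int = 1) -> float:
    """Upper bound on requests/s when every round trip carries `batch_size` requests.

    Assumes each connection waits for a reply before sending the next batch.
    """
    if rtt_ms <= 0 or connections < 1 or batch_size < 1:
        raise ValueError("rtt_ms must be > 0, connections and batch_size must be >= 1")
    round_trips_per_second = 1000.0 / rtt_ms
    return round_trips_per_second * connections * batch_size


def round_trips_needed(requests: int, batch_size: int) -> int:
    """Number of network round trips needed to send `requests` in batches."""
    return math.ceil(requests / batch_size)


if __name__ == "__main__":
    # Hypothetical 0.5 ms round trip between VMs in the same zone.
    print(max_requests_per_second(0.5))                  # one request per round trip
    print(max_requests_per_second(0.5, batch_size=100))  # 100 requests per round trip
    print(round_trips_needed(3000, 100))                 # Redis burst from the question
    print(round_trips_needed(12000, 500))                # Cassandra burst from the question
```

If the measured round-trip time (for example from `ping` between the VMs) puts the unbatched bound close to your actual request rate, that supports the idea that per-request round trips, rather than raw bandwidth, are the limit, which would also fit the improvement you saw after batching.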


Eclipse milo - Performance/scalability when deploying an OPCUA server in the cloud

I have created an OPCUA server with eclipse milo that is installed in the same machine where the clients are installed, so the communication works fast and reliably.
I did a bit of sniffing with Wireshark to see how much communication is involved under the hood, and apparently there is a lot going on when monitoring variables, alarms, etc.
So I am wondering what issues I may expect in terms of performance and scalability if the server gets deployed in the cloud. I have seen people talk about OPCUA cloud services, but since this is not a hot topic it is hard to foresee what challenges may come up, and how well it scales and performs.
I would imagine that OPCUA uses sticky sessions, which means that you can only support a maximum number of users/requests, so dynamic scaling may not be an option, right?
I tried the samples provided by Eclipse Milo, which are hosted somewhere on the network, and it took a long time to connect to them. If that is the performance one may expect, then the perception of the service among non-technical users would be that it does not work well.
Is the cloud a right place to use OPCUA considering the network overhead? Any recommendation to stick to local networks (intranet) only and skip the cloud?
Any feedback would be appreciated, thanks!
If you wanted to get into more detail and share Wireshark captures we might be able to go over parameters that would reduce traffic.
If bandwidth is a concern because you're using cellular or other constrained connections then sure, OPC UA may not be the best fit.
I'm curious what kind of delays or latency you experienced running the examples - connecting over the internet generally does not take very long, so perhaps you were also measuring the time it took to compile and start the example or there was something going on with your network.

Distributed systems with large number of different types of jobs

I want to create a distributed system that can support around 10,000 different types of jobs. One single machine can host only 500 such jobs, as each job needs some data to be pre-loaded into memory, which can't be kept in a cache. Each job must have redundancy for availability.
I have explored open-source tools like ZooKeeper and Hadoop, but none of them solves my problem.
The easiest solution I can think of is to maintain a map from each job type to the machines hosting it. But how can I support dynamic allocation of job types across my fleet? And how do I handle machine failures, to make sure that each job type is available on at least 1 machine at any point in time?
Based on the answers you gave in the comments, I suggest you go for an MQ-based (Message Queue) architecture. What I propose in this answer is the following:
Get the input from users and push it into a distributed message queue. This means you should set up a message queue (such as ActiveMQ or RabbitMQ) on several servers. The MQ technology helps you replicate the input requests for fault tolerance. It also gives you a fully asynchronous end-to-end system.
After preparing this MQ layer, you can set up your computing server layer. This means that some computing servers (~20 servers in your case) will read the requests from the message queue and start a job based on each request. Because the MQ is distributed, you can get a good level of load balancing across your computing servers. In addition, each server can run as many jobs as you want (~500 in your case) based on the requests it reads from the MQ.
Regarding failures, a computing server should remove a request from the MQ only once the job is completed. If a server crashes, the job is still in the MQ and another server can work on it. If the job saves some state somewhere or updates something, you will then have to handle the possibility of it running twice.
The good point about this approach is that it is very scalable. If in the future you have more jobs to handle, you can add a computing server and connect it to the MQ to process more requests without any change to the system. In addition, some nice MQ features such as priority-based queuing help you prioritize the requests and process them based on the job type.
P.S. Your question does not provide any details about the type and parameters of the system. This is a draft solution that I can propose. If you provide more details, maybe the community can help you more.
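The "remove only after completion" rule and the duplicate-run caveat can be sketched roughly as follows. This is a toy in-memory model, not the API of ActiveMQ or RabbitMQ: `AckQueue`, `run_worker`, the `completed` set and the `crash_before_ack` switch are hypothetical names made up for the illustration, and in a real deployment the record of completed message ids would have to live in a durable store shared by all workers.

```python
from collections import OrderedDict


class AckQueue:
    """Toy queue: a message stays queued until it is explicitly acknowledged."""

    def __init__(self):
        self._pending = OrderedDict()  # message id -> payload, oldest first

    def publish(self, msg_id, payload):
        self._pending[msg_id] = payload

    def peek(self):
        """Return (msg_id, payload) of the oldest unacknowledged message, or None."""
        for msg_id, payload in self._pending.items():
            return msg_id, payload
        return None

    def ack(self, msg_id):
        """Remove a message once its job has completed."""
        self._pending.pop(msg_id, None)

    def __len__(self):
        return len(self._pending)


def run_worker(queue, handler, completed, crash_before_ack=None):
    """Drain the queue, acknowledging each message only after its job is done.

    `completed` holds ids of jobs that already ran, so a redelivered message
    is acknowledged without running the job a second time.
    """
    while (item := queue.peek()) is not None:
        msg_id, payload = item
        if msg_id not in completed:  # idempotency check against duplicate runs
            handler(payload)
            completed.add(msg_id)
        if msg_id == crash_before_ack:
            # Simulate a server dying after the job ran but before the ack.
            raise RuntimeError("simulated crash before ack")
        queue.ack(msg_id)


if __name__ == "__main__":
    queue, results, completed = AckQueue(), [], set()
    for msg_id, payload in [("a", 1), ("b", 2), ("c", 3)]:
        queue.publish(msg_id, payload)
    try:
        run_worker(queue, results.append, completed, crash_before_ack="b")
    except RuntimeError:
        pass  # first worker died; "b" and "c" are still queued
    run_worker(queue, results.append, completed)  # a second worker takes over
    print(results, len(queue))
```

The second worker sees message "b" again because it was never acknowledged, but skips the job since its id is already recorded as completed.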

How much load can Parse Server on Compute Engine instance handle

The server will run on its own on a single Compute Engine instance. What could limit its serving capacity, and how much load can a single instance (4 vCPUs and 15 GB of memory) handle?
Note : I've already looked at Kubernetes and even load-balancing multiple instances but accessing the database from multiple clients is a little too complicated for me right now. So please keep in mind if you're going to suggest containerisation, that I'm a beginner.
Any and all advice is welcome. Thanks!
The serving capacity of the server depends on a number of factors, including the requests you receive from clients and any additional applications running on it. For a 4-core CPU, as per this help center article, you will get a peak performance of 8 Gb/s, which is good for a single instance. Since you are running a single Parse Server alone on the VM, it should work very well with the above-mentioned configuration.
A container is a tool for developers. It packages all the dependencies and libraries required to run or test a particular application. Applications running in containers are easily portable.
This help center article might give a more precise idea of containers and their usage, while Kubernetes Engine will help you deploy and manage these containerized applications.

Optimized environment for mongo

I have a RHEL Linux server (VM) with a 4-core processor and 8 GB of RAM running the following applications:
- an Apache Karaf container
- an Apache tomcat server
- an ActiveMQ server
- and the mongod server (either primary or secondary).
Often I see that mongo consumes nearly 80% of the CPU. Now I see that my CPU and memory usage is overshooting most of the time, and this has made me doubt whether my hardware config is too low for running this many components.
Please let me know if it is OK to run mongo like this on a shared server.
The question is too broad and the answer depends on too many variables, but I'll try to give you an overall sense of it.
Can you use all these services together on the same machine at a minimum load? - for sure. It's not clear where other shards reside though, but it will work either way. You didn't provide your HDD specs which is quite important for a DB server, but again it will work at a minimum load.
Can you use this setup under heavy load - not the best idea. Perhaps it's better to have separate servers handling these services.
Monitor the overall server load: CPU, memory, and IO. Check the mongo logs for slow queries. If your queries are supposed to run fast and they don't, you'll need more hardware.
Nobody will really be able to tell you how much load a specific server configuration can handle. You need at least 512 MB of RAM and 1 CPU to get going these days, but very soon you will hit the limits. It all depends on how many users you have, what kinds of queries they run, and how much data they cover.
Can you run MongoDB along other applications on a single server? Well it would appear that if you are having memory issues or CPU issues in your current configuration then you will likely need to address something. But "Can You?", well if it is not going to affect you then of course you can.
Should you do this? Most people would firmly agree that you should not, and that also holds for most of the other applications you are running on the one machine.
There are various reasons, including process isolation, resource allocation, and security, and far too many for a short response to go into why you should not have this kind of configuration. And certainly where it becomes a problem, you should address the issue by moving to a new configuration.
For Mongo alone, most people would not think twice about running their SQL database on dedicated hardware. The choice for Mongo should likely be no different.
I have also suggested this be moved to ServerFault, as it is not a programming question suited to Stack Overflow.

The benefits of deploying multiple instances for serving/data/cache

Although I have a lot of experience writing code, I don't really have much experience deploying things. I am writing a project that uses MongoDB for persistence, Redis for meta-caching, and Play for serving pages. I am deciding between buying a dedicated server and buying multiple small/medium instances from Amazon/Linode (one each for Mongo, Redis, and Play). I have listed the trade-offs as I see them below, and I wonder if anyone can add to the list or provide further insights. I am leaning toward (B), buying two sets of instances from Linode and Amazon, so that if one of them has an outage it will fail over to the other provider. Also, if anyone has any tips or tools for deploying a Scala/Maven cluster, they would be much appreciated.
A. put everything in one instance
faster speed between database and page servlet (same host).
fewer endpoints to secure.
harder to manage. (in my opinion)
harder to upgrade a single module. if there are installation issues, it might bring down the whole system.
B. put each module (mongo,redis,play) in different instances
sharding is easier.
easier to create cluster for a single purpose. (i.e. cluster of redis)
easier to allocate resources between modules.
less likely everything will fail at once.
bandwidth between modules -> $
secure each connection and end point.
I can only comment about the technical aspects (not cost, serviceability, etc ...)
It is not mentioned whether the dedicated instance is a physical box, or simply a large VM. If the application generates a lot of roundtrips to MongoDB or Redis, then the difference will be quite significant.
With a VM, the cost of I/Os, OS scheduling and system calls is higher. These elements tend to represent an important part in the performance cost of efficient remote data stores like MongoDB or Redis, and the virtualization toll is higher for them.
From a system point of view, I would not put MongoDB and Redis/Play on the same box if the MongoDB database is expected to be larger than the available memory. MongoDB maps data files in memory, and relies on the OS to perform memory swapping. It is designed for this. The other processes are not. Swapping induced by MongoDB will have catastrophic consequences on Redis and Play response time if they are all on the same box. So I would at least separate MongoDB from Redis/Play.
If you plan to use Redis for caching, it makes sense to keep it on the same box as the Play server. Redis will use memory, but little CPU. Play will use CPU, but not much memory. So it seems a good fit. Also, I'm not sure it is possible from Play, but if you use a unix domain socket to connect to Redis instead of the TCP loopback, you can achieve about 50% more throughput for free.
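If you want to check the unix domain socket gain on your own box rather than take the figure on trust, a rough comparison can be done with the standard library alone. The sketch below does not talk to Redis at all: it times simple ping/echo round trips over a unix domain socket and over the TCP loopback against a throwaway echo thread. The function names, the 4-byte payload and the round count are assumptions made for the illustration, it needs a Unix-like OS, and the ratio you measure will depend on the kernel and hardware.

```python
import os
import socket
import tempfile
import threading
import time


def _echo_server(listener: socket.socket) -> None:
    """Accept one client and echo everything back until it disconnects."""
    conn, _ = listener.accept()
    with conn:
        while data := conn.recv(64):
            conn.sendall(data)


def time_round_trips(family: int, address, rounds: int = 2000) -> float:
    """Return the seconds taken for `rounds` ping/echo round trips."""
    listener = socket.socket(family, socket.SOCK_STREAM)
    listener.bind(address)
    listener.listen(1)
    server = threading.Thread(target=_echo_server, args=(listener,), daemon=True)
    server.start()

    client = socket.socket(family, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    start = time.perf_counter()
    for _ in range(rounds):
        client.sendall(b"PING")
        received = b""
        while len(received) < 4:  # read until the full echo has arrived
            chunk = client.recv(4 - len(received))
            if not chunk:
                raise ConnectionError("echo server closed the connection")
            received += chunk
    elapsed = time.perf_counter() - start

    client.close()   # makes the echo loop see end-of-stream and exit
    server.join()
    listener.close()
    return elapsed


def compare(rounds: int = 2000) -> dict:
    """Time the same workload over a unix domain socket and the TCP loopback."""
    with tempfile.TemporaryDirectory() as tmp:
        unix_seconds = time_round_trips(socket.AF_UNIX, os.path.join(tmp, "echo.sock"), rounds)
    tcp_seconds = time_round_trips(socket.AF_INET, ("127.0.0.1", 0), rounds)
    return {"unix": unix_seconds, "tcp": tcp_seconds}


if __name__ == "__main__":
    timings = compare()
    print(f"unix: {timings['unix']:.4f}s  tcp loopback: {timings['tcp']:.4f}s")
```

This only measures transport overhead for tiny messages, so treat it as a sanity check; the benchmark that matters is your real Redis workload run both ways.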