Eclipse milo - Performance/scalability when deploying an OPCUA server in the cloud - opc-ua

I have created an OPCUA server with eclipse milo that is installed in the same machine where the clients are installed, so the communication works fast and reliably.
I did a bit of sniffing with wireshark to see how much communication involves under the hood and apparently there is a lot going on when monitoring variable, alarms, etc....
So I am thinking what issues I may expect in terms of performance and scalability if the server gets deployed in the cloud. I have seen that people talks about OPCUA cloud services, but not being this a hot topic is hard to foresee what challenges may come, and how well it scales and performs.
I would imagine that OPCUA uses sticky sessions, which means that you only can support a max number of users/requests, so dynamic scaling may not be an alternative right?
I tried the samples provides by eclipse milo, which are stored somewhere in the network, and it took long timeto connect to it. If that is the performance one may expect then the perception of the service for non-technical users would be that it does not work well.
Is the cloud a right place to use OPCUA considering the network overhead? Any recommendation to stick to local networks (intranet) only and skip the cloud?
Any feedback would be appreciated, thanks!

If you wanted to get into more detail and share Wireshark captures we might be able to go over parameters that would reduce traffic.
If bandwidth is a concern because you're using cellular or other constrained connections then sure, OPC UA may not be the best fit.
I'm curious what kind of delays or latency you experienced running the examples - connecting over the internet generally does not take very long, so perhaps you were also measuring the time it took to compile and start the example or there was something going on with your network.

Related

Server Architecture for hosting Java PLAY application in the cloud

This is rather a set of questions than one very specific question. In the last couple weeks/days I puzzled together information regarding how to properly host a JAVA PLAY application "in the cloud", as lots of this information is scattered over different services, I felt like gathering up all these small pieces to one, because lots of things are important to be seen in full context. However, I moved my considerations to the bottom of the question, as they are mainly my opinions and subjective findings, which I don't want to be held responsible for. If I got something wrong, please don't hesitate to point that out.
Hosting Java PLAY + MySQL on AWS for world wide accessibility
Our Scenario: we have a quite straight forward application written within the Java PLAY framework (https://www.playframework.com/), working on iOS and Android as well as with a backend-system (for administration, content management and API), storing data in a MySQL DB. While most of the users' interactions with the server is quick and easy (login, sync some data) there are also some more data-intensive tasks (download some <100mb data zips to the mobile phone, upload a couple of mb to the server). Therefore we were looking for a solution to properly provide users far away from our servers with reasonable response times. The obvious next step was hosting in the cloud.
Hosting setup within AWS:
Horizontal scaling: for the start, only 1 EC2 instance with our app will be running in eu-1a. We will need to evaluate how much resources one instance actually requires, if more instances are needed and if more instances would actually benefit to quicker response times.
Horizontal scaling across regions: once the app generates heavy user load from another region, the whole EC2 instance should be duplicated and put to another region, running a db read replica (see Setting up a globally available web app on amazon web services and https://aws.amazon.com/de/blogs/aws/cross-region-read-replicas-for-amazon-rds-for-mysql/ ).
Vertical scaling of EC2 instances: in recent tests of the old hosting setup, the database proved to be the bottleneck rather than the play app and its server's hardware specifications. Therefore it is not yet fully clear how much vertical scaling would affect response times. If a t2.micro instance serves as good as a m3.xlarge instance, of course we would rather climb our way up from the bottom here.
Vertical scaling of RDS: we will need to estimate how much traffic hits the DB server and what CPU/RAM/etc will be required. Probably we will work our way up here aswell.
Global Redirection: done using Amazon Route 53 (?). A user from Tokio should be redirected to the EC2 instance running in Asia; a user from Rome to the EC2 instance in Europe. This does not only affect API calls within the app, but also content delivery (in both directions).
Open Questions regarding the setup
Is this setup conclusive? Am I missing crucial components?
Regarding global redirection: is Amazon Route 53 the right tool? How does it differ from CloudFront (which strikes me to be purely for content / media distribution?).
How do I define correct data/api endpoints for my app? Of course I don't want to define the database endpoint of a db read replica during app deployment. Will this also happen during the AR53 (question 2) setup? Same goes for API calls, of course the app should direct it's calls to https://myurl.com/api and from there it should be redirected. Is this realistic?
I would highly appreciate all kinds of thoughts (!), also regarding the background info written below. If you can point me to further reading to solve my questions on my own, I am also very thankful - there is simply a huge load of information regarding this, but this makes it hard to narrow the answers down. I do have knowledge in hosting/servers, but I am pretty sure there are true experts out there waiting to slap me with knowledge. :)
Background-Information
Current Hosting Setup: a load balancer distributes the traffic on 2 root linux servers, both of them running the PLAY app, one of them also holding the MySQL installation.
The current hosting setup has 3 big flaws:
No vertical scalability: the hosting company would take money for each scaling step. Currently the servers are running idle, but if the app booms, we could run short on capacity quickly. Running idle is still paid as if permanently under full load. This is expensive!
No deployment support: currently, we connect through SSH, manually deploy the correct folders to the file system, recompile on the server, set privileges, apply database evolutions; do the same for the second server (with different db connection parameters). What could possibly go wrong. ;)
No worldwide availability: to set up another server in another region of the world would mean a huge effort. To have a synchronized replica of our DB can be done, but once again deploying would mean downtime, room for errors and therefore time and money.
Hosting Options for Java PLAY:
There are lot of different blog posts about this. In short:
AWS: Amazon Web Services is one of the first places you start looking. Here you get everything that's possible, at a flexible price. You set yourself up an EC2 instance, a MySQL RDS and you're good to go - all of this in the free tier, so you can experiment, play around, test your stuff.
Microsoft Azure: similar to AWS regarding pricing and possibilities. However, I did not dive into setting up and deploying our application for test purposes.
Heroku: super easy deployment from within PLAY, scalable servers. However (on the first glance?) lacks possibility to supply remote regions with high speed content.
Jelastic: even easier deployment from within PLAY / IntelliJ IDEA. You push your app image to jelastic, jelastic distributes it further to their infrastructure providers.
RedHat OpenShift (https://www.openshift.com/): sounds promising, yet not as complete as AWS.
Lots of choices and possible setups/prices. Especially after finding out about deployment using boxfuse (https://cloudcaptain.sh/) I made my choice for AWS, as it offers absolutely all we need from 1 source. Boxfuse has low monthly costs but is perfectly integrated into AWS. Scaling is supported as well as the 3 common environments (dev/test/prod). Support is outstanding.
The setup looks good. I would however make one change: your large up- & downloads. As mobile speeds may not be ideal, have your app serve long-running requests is something you should avoid as this will needlessly tie up server threads. Instead consider having users upload and download straight from S3 using presigned URLs. You can then later add CloudFront to the mix when it makes financial sense to do so.
R53 will work just fine for picking the best server(s) for each end user.
For EC2 consider having an ELB + Auto-Scaling Group setup. Even just for a single instance you get the benefit of permanent health monitoring and auto-respawns. If you expect more load you can then auto-scale based on your expected bottleneck (cpu, network i/o). This will give you a more autonomous and robust setup than manually having to scale up and down based on your own monitoring analysis (even though the scaling part is very easy if you stick with immutable infrastructure & blue/green deployments like what Boxfuse offers).
Your focus on vertical server scaling might not serve you well on AWS. I would start thinking about horizontal scaling of app servers behind an Elastic Load Balancer, and possibly look into Elastic Beanstalk.
I'm not sure you can setup a read replica in another region via RDS, you might have to set that up via MySQL servers running on standard EC2 instances. And even if you can, that's going to be some expensive and high-latency data transfer.
If file uploads and downloads are all you are worried about, you just need to put CloudFront (Amazon's CDN service) in front of your application, and allow it to handle file uploads and downloads via its global edge servers. You could even do this without moving your entire application into AWS. I would recommend reading this blog post as a start.

Is using Erlang's gen_tcp a scalable way to construct a high traffic socket server

I am trying to learn Erlang to do some simple but scalable network programming. I basically want to write a program that does what servers on the backbone of the internet do--but on a smaller scale. I want to try to set up an intranet with web accessible servers which would act as gateways to the intranet [sic] and route data to connected clients and/or other gateways.
The high traffic would come from the fact that data would not only flow from client to gateway to client, but might have to bounce around a few gateways to get to the destination (like how data travels on the internet). This means that the gateways would have to not only handle traffic from their clients, but traffic from other gateways' clients.
I figured this would lead to unusually high levels of traffic, even for a medium number of clients and gateways.
Coming from a background in Python and, to a lesser extent, other scripting languages, I am used to digging for a customized module to solve my problems. I understand Erlang is designed for high traffic network programming, but all I could find in terms of libraries/modules for this kind of thing was gen_tcp.
Does this mean that Erlang is already so optimized for this kind of thing that you can fire it up with its most basic modules and expect it to scale nicely?
You can expect gen_tcp to perform extremely well, even under conditions of massive load. If you are just going to pass around data and not process it much, then my guess is you will be able to scale quite nicely - effectively you will just be passing around pointers.
All of the known scalable solutions written in Erlang uses gen_tcp:
Cowboy, Mochiweb, Yaws, ...
Riak
Etorrent
RabbitMQ
and so on. When using it, there is a hint worth mentioning though: Make sure you run erl as erl +K true so you get access to the kernel polling. That is, epoll() on Linux, kqueue()/kevent() on BSD and /dev/poll on Solaris. Also note that you can give commands to TCP ports to set their options w.r.t. buffer size and so on. Finally, for certain types of packets, you can have the C-layer parse the packet for you, see erl -man inet and the setopts/2 call. An example would be {packet, 4} which is quite popular.
In general, Erlang has a quite fast I/O sublayer. You can expect it to perform really quickly, even for large complex interactions.

simulate server load with BSD sockets

I'm using blocking TCP sockets in C and I want to simulate a high load on the server when there are many simultaneous connections and then I want to measure the time necessary to access the server via a browser during this high load time (the server understands HTTP headers).
Also each client request ends fast (sends a HTTP header - gets text).
How do I do this (without crashing my local machine -> I tried using fork to make many clients; also, I have a virtual machine too).
If anyone has an idea or some general directions about how to do this, it would mean a lot.
Edit: I need to run this with my own client, which uses a modified version of the OpenSSL library to connect to my SSL/TLS server, so I can't use external test tools.
I want to know how to build the client and the server. I don't know too much about other sockets than the blocking ones, I'm just skimming through the UNIX Network Programming book of Richard Stevens, but I was wondering if anyone could point out the exact solution.
Thank you !
The easiest resolution to this would be to download an existing stress testing framework such as fwptt ( http://fwptt.sourceforge.net/ ).
If you want to implemennt your own stress testing framework, I'd suggest you lose the blocking nature of your code and go with a parallel design that will scale beautifully. The limiting factor is pretty much your CPU then.
Having two physical servers would be ideal, so that then your stress test isn't affecting the CPU (and therefore the response times) of the server. Also that VM of yours drains up precious CPU time.

Scala + Akka: How to develop a Multi-Machine Highly Available Cluster

We're developing a server system in Scala + Akka for a game that will serve clients in Android, iPhone, and Second Life. There are parts of this server that need to be highly available, running on multiple machines. If one of those servers dies (of, say, hardware failure), the system needs to keep running. I think I want the clients to have a list of machines they will try to connect with, similar to how Cassandra works.
The multi-node examples I've seen so far with Akka seem to me to be centered around the idea of scalability, rather than high availability (at least with regard to hardware). The multi-node examples seem to always have a single point of failure. For example there are load balancers, but if I need to reboot one of the machines that have load balancers, my system will suffer some downtime.
Are there any examples that show this type of hardware fault tolerance for Akka? Or, do you have any thoughts on good ways to make this happen?
So far, the best answer I've been able to come up with is to study the Erlang OTP docs, meditate on them, and try to figure out how to put my system together using the building blocks available in Akka.
But if there are resources, examples, or ideas on how to share state between multiple machines in a way that if one of them goes down things keep running, I'd sure appreciate them, because I'm concerned I might be re-inventing the wheel here. Maybe there is a multi-node STM container that automatically keeps the shared state in sync across multiple nodes? Or maybe this is so easy to make that the documentation doesn't bother showing examples of how to do it, or perhaps I haven't been thorough enough in my research and experimentation yet. Any thoughts or ideas will be appreciated.
HA and load management is a very important aspect of scalability and is available as a part of the AkkaSource commercial offering.
If you're listing multiple potential hosts in your clients already, then those can effectively become load balancers.
You could offer a host suggestion service and recommends to the client which machine they should connect to (based on current load, or whatever), then the client can pin to that until the connection fails.
If the host suggestion service is not there, then the client can simply pick a random host from it internal list, trying them until it connects.
Ideally on first time start up, the client will connect to the host suggestion service and not only get directed to an appropriate host, but a list of other potential hosts as well. This list can routinely be updated every time the client connects.
If the host suggestion service is down on the clients first attempt (unlikely, but...) then you can pre-deploy a list of hosts in the client install so it can start immediately randomly selecting hosts from the very beginning if it has too.
Make sure that your list of hosts is actual host names, and not IPs, that give you more flexibility long term (i.e. you'll "always have" host1.example.com, host2.example.com... etc. even if you move infrastructure and change IPs).
You could take a look how RedDwarf and it's fork DimDwarf are built. They are both horizontally scalable crash-only game app servers and DimDwarf is partly written in Scala (new messaging functionality). Their approach and architecture should match your needs quite well :)
2 cents..
"how to share state between multiple machines in a way that if one of them goes down things keep running"
Don't share state between machines, instead partition state across machines. I don't know your domain so I don't know if this will work. But essentially if you assign certain aggregates ( in DDD terms ) to certain nodes, you can keep those aggregates in memory ( actor, agent, etc ) when they are being used. In order to do this you will need to use something like zookeeper to coordinate which nodes handle which aggregates. In the event of failure you can bring the aggregate up on a different node.
Further more, if you use an event sourcing model to build your aggregates, it becomes almost trivial to have real-time copies ( slaves ) of your aggregate on other nodes by those nodes listening for events and maintaining their own copies.
By using Akka, we get remoting between nodes almost for free. This means that which ever node handles a request that might need to interact with an Aggregate/Entity on another nodes can do so with RemoteActors.
What I have outlined here is very general but gives an approach to distributed fault-tolerance with Akka and ZooKeeper. It may or may not help. I hope it does.
All the best,
Andy

Separated servers or virtualization, which is more efficient?

My company is hosting a few separate, but related, moderately hit, web sites. Accordingly, a production database server, staging database server, production web server, staging web server, etc are needed. My question is, should we invest in physically separate servers for each of our needs, or should we put that money together and invest in a much higher end server and virtualize all of the aforementioned servers? Which route would you guys decide on, and why?
That depends on a lot of things, here are the main considerations.
If you have a lot of servers with low to moderate usage, virtualization should generally save you money on hardware, power, and floorspace. There is a tipping point, however, based on the overhead of the VM layer itself. Honestly, you will have to experiment to find the right cost/performance balance on this. I am sure the VM vendors would be happy to help you with the math.
The downside is that virtualization creates a single point of failure. If that box fails, you have downtime for all of your servers. Having them separate makes it far less likely for everything to take a dive at once.
You certainly want physical separation between the development and the production servers. You shouldn't ever have to worry that something you do in dev could bring down the machine on which production is hosted. And, there are some problems in development that really require either a hard reset of the physical machine or a ludicrous work-around to avoid a hard reset.
As for production web server and production database, you're not really introducing any new points of failure by virtualizing them on the same machine, particularly if you can colocate a static version of the site on another server. For any modern web site of even moderate complexity, database failure is web site failure anyway.
From my experience, for low or moderate usage a VM is the way to go - if you get just one very powerful server instead on several moderately powerful servers you save money, power and space and make the application faster at the same time because it's running on faster hardware.
A VM also have same another nice advantages, if the server hardware fails you can load the same VM on different physical hardware and continue running like nothing happened (you do have full backups, don't you?) and you can take a copy of the actual production server and run it in isolation on a development machine to debug those annoying bug that only appear in production.
But I would put the development (and maybe testing) servers on a different physical machine than production, you need to make sure no matter what stupid mistake you made in development it wouldn't take down the production server.