Communicating between ZMQ sockets in a distributed system over the internet

I've been working on a distributed system project; my system is partially P2P. My issue is probably simple, but I don't know how people usually solve it, because I have no experience in this domain and I am very new to all of this.
I want to communicate over the internet between two clients, which both have ZMQ sockets.
On my local network, or on a single machine, they seem to work fine, but when communicating over the internet I never get my message. I have narrowed this down to two possible reasons:
1) The NAT - it is not letting my message reach the client host. Does anyone know how to solve the NAT issue within ZMQ? I have heard of TCP hole punching and the like; how do web developers and other people who deal with this often manage it?
2) ZMQ sockets cannot communicate over the internet, even if the communication is strictly between two ZMQ sockets and not raw BSD sockets, etc. I am not sure about this one, though.
If anyone has expertise in this area I would be grateful for any help that lets me move forward. Thanks!

2) ZMQ sockets cannot communicate over the internet, even if the communication is strictly between two ZMQ sockets
Well, 2) is the easy one: ZeroMQ sockets obviously do work over the Internet.
There is not much to add to this.
1) The NAT - it is not letting my message reach the client host
Point 1) deserves a bit more attention:
Right, NAT may well be in place whenever local LAN routers sit behind a single public (registered, coordinated) IPv4/IPv6 address.
Next, there could be another show-stopper in the game: a firewall, placed in one or more spots (!) -- be it a local one (operated by the O/S, which you can check if you have administrator-grade access to the localhost) or one integrated into any of the gateway/proxy/policy-enforcement devices along the path.
In any case, a thorough design review ought to take place with the local administrators responsible for the localhost O/S and the network-infrastructure elements, and with the network-wide security manager(s)/engineer(s).
The "HOW ?" part :
This ( principal ) complexity is exactly why Game Studios try to avoid user's headaches on solving these complexities and try to provide some escape strategy.
For the trivial 1-to-1 case:
One may indeed use a set of port-forwarding rules (if the firewall + gateway engineering permits), and your ZeroMQ connectivity can then be directed at a
tcp://<public-IP>:<_a_port#_known_to_be_FWDed_to_a_target_HOST_> address.
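As a minimal sketch of that trivial case, assuming pyzmq (the public IP 203.0.113.10 and port 5555 below are placeholders, standing in for whatever your router actually forwards):

```python
# On the target host behind the NAT: bind to a local port that the router
# forwards from its public side (e.g. <public-IP>:5555 -> this-host:5555).
import zmq

ctx = zmq.Context()
rep = ctx.socket(zmq.REP)
rep.bind("tcp://0.0.0.0:5555")
print(rep.recv())          # arrives only if the port-forward is in place
rep.send(b"pong")
```

```python
# On the remote peer, somewhere on the Internet: connect to the router's
# public address; the forwarding rule delivers the stream to the REP socket.
import zmq

ctx = zmq.Context()
req = ctx.socket(zmq.REQ)
req.connect("tcp://203.0.113.10:5555")   # placeholder public IP
req.send(b"ping")
print(req.recv())                        # b"pong" if NAT/firewall let it through
```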
For more connections, but still just a handful:
This scenario may seem easy for a single connection, yet if you need units or tens of target hosts, the gateway (router) / firewall admins will have rather limited enthusiasm for opening more and more ports on the wild side of the security perimeter. Here another trick may help: using the standard ssh tools, where connections can harness so-called local-port-forwarding and remote-port-forwarding, so the interconnects need only a single port to pass the firewall + gateway, and the content is encryption-protected on top. Sure, more administrative effort is needed on both sides, yet it is a known and smart way to go, if that effort and the slightly increased latency (encryption + decryption processing added) do not spoil your in-game UX latency plans.
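A minimal sketch of the ssh-tunnel variant, again assuming pyzmq (the ssh command, host name, user and ports are placeholders):

```python
# First, on the client machine, establish a local port forward, e.g.:
#   ssh -L 5555:localhost:5555 user@gateway.example.com
# (user, host and ports are placeholders). Only the single ssh port has to be
# allowed through the firewall, and the traffic is encrypted by ssh itself.
import zmq

ctx = zmq.Context()
req = ctx.socket(zmq.REQ)

# Connect to the local end of the tunnel; sshd delivers the stream to
# localhost:5555 on the far side, where the peer's REP socket is bound.
req.connect("tcp://127.0.0.1:5555")
req.send(b"hello through the tunnel")
print(req.recv())
```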
For more than a few:
There is the option of re-using an ad-hoc, yet security-wise double-edged sword: a multi-player shared (!) VPN, which solves the global "visibility" issue (most often with some central - published or not - service-provisioning, mapping and authentication coordinator). The localhost-side applications simply start to see another "local" interface, uncoordinated with respect to its IPv4/IPv6 address(es), yet this private shared VPN appears to join all the players so that they look as if they share one common IP network, having effectively bypassed all the security/firewalling efforts of common network practice -- which is at the same time its biggest risk when put in place (not to mention the principal SPOF in the remote, central authorisation/mapping service provider, be their set of motivations published or hidden).
The gaming industry has wrestled with all these issues since multi-player games started to be sold (which, for 2018/Q2 gamers, may look like since forever), and it has tried to avoid exactly these complexity-related pains, because a dominant fraction of game-buying teenagers could not be expected to have acquired both the patient persistence (to systematically craft the proper setup) and the deep-enough knowledge (to know all the system-wide details of where and what to set up or re-configure, so as to unlock secure end-to-end online in-game visibility).
For indeed many-to-many cases :
A nice example started in the late 1990s / early 2000s, when IL-2 Sturmovik's community of pilots and virtual squadrons went operational 24/7/365 "since ever". They used the community-sponsored HyperLobby as a mediating node, which solved all these complexities once and for all for every interested member. The lightweight HyperLobby client/server infrastructure did all the port-forwarding mapping setups and the other server-side mediation dirty hacks invisibly to the pilots, and provided added means for administering many connected multi-player game theatres for IL-2, F/A-18, Su-27 Flanker, CFS, Medal Of Honor and more titles than I can remember today (topping a few tens of thousands of connected pilots, IIRC, in peak hours). A great piece of sound design and distinguished effort, sustained for decades, for a truly global online community (I had the honour to serve in a pseudo-historical VFSQ, with members spanning 13 time zones - from Hawaii, Brazil, the U.S., the U.K., France, Germany, Italy and Greece to Turkey - flying with similarly minded friends from Japan, Australia, New Zealand and many other places around the globe, be it globally coordinated East/West Front + Pacific Theater weekend events with missions reconstructed down to the very historical details, or the Memorial Parade Flyovers on V-Day anniversaries) -- ~Salute~ Jiri Fojtasek, well done indeed! The idea illustrates the way to go / follow (as many younger game portals have indeed followed this path).

Related

Does a computer have a physical component for each port/socket?

I'm sorry if my question is too easy or obvious. I'm a Computer Science student (it's my 6th term).
I'm trying to combine the knowledge I'm gaining from the 'Computer Networks and Security', 'Computer Organization' and 'Operating Systems' lectures this term.
So, I cannot figure this out:
Does a computer or a phone have a physical component for each of the 65535 ports/sockets?
Or
Does the machine have just one physical component, meaning that port numbers are logical representations, carried as fields (like header entities or JSON attributes) of a request, that tell the computer how to handle the request?
I have used ports for connecting back-end, Android and front-end code. I know that a socket is a physical component - I worked 4 months as a phone repairer, so I know those. But this is what confuses me.
Thanks in advance.
Think of it this way: "the network interface is the phone system, IP addresses are phone numbers, and socket/port numbers are like telephone extensions." (The network interface is the only physical device.)
Network traffic is carried by so-called "packets", which have various fields that tell the network how they should be routed. The IP address gets the packet to the proper destination, then the port/socket number specifies exactly which software process at that destination should actually handle the packet.
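To make the "only the interface is physical" point concrete, here is a minimal sketch (Python standard library; the port numbers are arbitrary examples) that opens several listening sockets on the same interface - each one is just an entry in an OS table keyed by (address, port), not a separate piece of hardware:

```python
# Several ports, one physical interface: each listener is only an OS-level
# table entry keyed by (IP, port). Port numbers are arbitrary examples.
import socket

listeners = []
for port in (8001, 8002, 8003):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", port))   # same interface, different "extension"
    s.listen()
    listeners.append(s)

print([s.getsockname() for s in listeners])
# e.g. [('127.0.0.1', 8001), ('127.0.0.1', 8002), ('127.0.0.1', 8003)]

for s in listeners:
    s.close()
```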

Microservice Adapter: one for many countries, or one per country? Architectural/deployment decision

Say I have System1 that connects to System2 through an adapter microservice between them.
System1 -> rest-calls --> Adapter (converts request-response + some extra logic, like validation) -> System2
System1 is more like a monolith and exists for many countries (but that may change).
The question is: from the perspective of microservice architecture and deployment, should the Adapter be one per country (say Adapter-UK, Adapter-AU, etc.), or should it be just one Adapter that can handle many countries at the same time?
I mean:
To have a single system/adapter service:
Advantage: one code base in one place; the adapting code logic is ~90% the same across countries. Easy to introduce new changes.
Disadvantage: once we deploy the system and there is a bug, it could affect many countries at the same time. Not safe.
To have a separate system per country:
Disadvantage: once some generic change has been introduced to one system, it has to be "copy-pasted" for all other countries/services. Repetitive, not-so-smart work, from the developer's point of view.
Advantage:
Safer to change/deploy.
Q: What is the preferable way from the point of view of microservice architecture?
I would suggest the following:
the adapter covers all countries, in order to maintain a single codebase and improve code reusability
unit and/or integration tests to cope with the bugs
spawn multiple identical instances of the adapter with a load balancer in front
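As a minimal sketch of that single-adapter idea (all class and function names below are hypothetical; the real adapter would sit behind your REST layer):

```python
# One code base, one deployment: the small country-specific differences are
# isolated behind a registry, the ~90% common logic lives in the base class.
class BaseCountryRules:
    def validate(self, request: dict) -> None:
        if "amount" not in request:
            raise ValueError("missing amount")

    def convert(self, request: dict) -> dict:
        # shared System1 -> System2 mapping
        return {"payload": request}

class UkRules(BaseCountryRules):
    def validate(self, request: dict) -> None:
        super().validate(request)
        # UK-only extra validation would go here

RULES = {"UK": UkRules(), "AU": BaseCountryRules()}

def adapt(country: str, request: dict) -> dict:
    rules = RULES[country]      # routed per country, single deployment
    rules.validate(request)
    return rules.convert(request)

print(adapt("UK", {"amount": 10}))
```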
Since Prabhat Mishra asked in the comments:
After two years... (it took some time to understand what I had asked).
Back then it was quite critical for me to have a resilient system, i.e. if I changed the code in one adapter I did not want all my countries to go down (it is an SAP enterprise system, millions of clients). I wanted only one country to go down (still millions of clients, but fewer millions :)).
So, for this case I would create many adapters, one per country, BUT I would use some code-generated common solution to create them, like a common jar, so I would not repeat my infrastructure or communication layers - a scaffolding kind of thing (see the sketch after these steps):
country-adapter new "country-1"
add some country-specific code (without changing any generated code: less code to repeat, less to support)
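A minimal sketch of that scaffolding idea, transplanted to Python (the answer talks about a shared jar; here the shared part is a module, and all names are hypothetical):

```python
# adapter_core: generated/shared infrastructure, never edited per country.
def run_adapter(country: str, convert):
    """Common plumbing (transport, logging, retries) would live here."""
    request = {"country": country, "amount": 10}   # stand-in for an incoming call
    return convert(request)

# adapter_uk: the only part a country team touches.
def convert_uk(request: dict) -> dict:
    # UK-specific mapping layered on top of the generated skeleton
    return {"payload": request, "vat": 0.2}

if __name__ == "__main__":
    print(run_adapter("UK", convert_uk))
```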
Otherwise, if you feel safe (e.g. code-reviewing your change, making sure you do not touch other countries' code), then one adapter is OK.
Another solution is to start with one adapter and split into more if it becomes critical (i.e. you do not feel safe about the potential damage/cost if it fails).
In general it all seems to boil down to the same problem: WHEN to split a "monolith" into pieces. The answer is always: when being as big as it is starts causing problems. But if you know your system well, you know in advance whether that WHEN is now (or not).

SOA Service Boundary and production support

Our development group is working towards building up a service catalog.
Right now, we have two groups: one to sell a product, another to service that product.
We have one particular service that calculates if the price of the product is profitable. When a sale occurs, the sale can be overridden by a manager. This sale must also be represented in another system to track various sales and the numbers must match. The parameters of profitability also change, and are different from month to month, but a sale may be based on the previous set of parameters.
Right now the sale-profitability service only calculates the profit; it also provides a RESTful URI.
One group of developers has suggested that the profitability service also support these "manager overrides" and support a date parameter to calculate based on a previous date. Of course the sales group of developers disagrees. If the service won't support this, the servicing developers will have to do an ETL between the two systems for each product, instead of just using the profitability service. Right now, since we do not have a set of services to handle this, production support gets the request and then has to update the one or more systems associated with that given product.
So, if a service works for a narrow slice, but an exception-based business process breaks it, does that mean the boundaries of the service are incorrect and need to account for the change in business process?
Two, does adding a date parameter extend the boundary of the service too much, or should it be expected that if the service already has the parameters, it would also have a history of those parameters? At this moment, we don't have a service that only stores the parameters, as no one has expressed a need for it.
If there is any clarification needed before an answer can be given, please let me know.
I think the key here is: how much pain would be incurred by both teams if an ETL were introduced between the two?
Not that I think you're over-analysing this, but if I may, you probably have an opinion that adding a date parameter into the service contract is bad, but also dislike the idea of the ETL.
Well, strategy aside, I find these days my overriding instinct is to focus less on the technical ins and outs and more on the purely practical.
At the end of the day, ETL is simple, reliable, and relatively pain-free to implement; however, it comes with side effects. The main one is that you'll be coupling changes to your service's DB schema with an outside party, which will limit your options to evolve the service in the future.
On the other hand, allowing consumer demand to dictate service evolution is easy and low-friction, but also a rocky road, as you may become beholden to that consumer at the expense of others.
Another possibility is to allow the extra service parameters to be delivered to the consumer via a message, rather than through the service itself. This would allow you to keep your service boundary intact and let the consumer hold the necessary parameters locally.
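A minimal sketch of what such a message could carry (the field names, values and publish call are purely illustrative assumptions, not your actual contract):

```python
# The profitability service emits a snapshot of its parameters whenever they
# change; the consumer stores snapshots keyed by effective date and can
# recalculate past sales locally. Field names here are hypothetical.
import json
from datetime import date

parameter_snapshot = {
    "effective_from": date(2018, 6, 1).isoformat(),
    "margin_floor": 0.12,
    "overhead_rate": 0.05,
}

message = json.dumps(parameter_snapshot)
# publish(message) on whatever broker/topic you use
print(message)
```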
Apologies if this does not address your question directly.

What's a unique, persistent alternative to MAC address?

I need to be able to repeatably, non-randomly, uniquely identify a server host, which may be arbitrarily virtualized and over which I have no control.
A MAC address doesn't work because in some virtualized environments, network interfaces don't have hardware addresses.
Generating a state file and saving it to disk doesn't work because the virtual machine may be cloned, thus duplicating the file.
The server's SSH host keys may be a candidate. They can be cloned like a state file, but in practice they generally aren't, because reusing host keys is such a well-known security problem that it's a mistake not often made.
There's also /var/lib/dbus/machine-id, but that's dependent on dbus. (Thanks Preetam).
There's a cpuid but that's apparently deprecated. (Thanks Bruno Aguirre on Twitter).
Hostname is worth considering. Many systems like Chef already require unique hostnames. (Thanks Alfie John)
I'd like the solution to persist for a long time, certainly across server reboots and software restarts. Ultimately, I also know that users of my software will deprecate a host and want to replace it with another while keeping continuity of the data associated with it, so there are reasons a UUID might be considered mutable over the long term; but I don't particularly want a host to start considering itself unknown and re-register itself for no reason.
Are there any alternative persistent, unique identifiers for a host?
It really depends on what is meant by "persistent". For example, two VMs can't each open the same network socket to you, so even if they are bit-level clones of each other it is possible to tell them apart.
So, all that is required is sufficient information to tell the machines apart for whatever the duration of the persistence is.
If the duration of the persistence is the length of a network connection, then you don't need any identifiers at all -- the sockets themselves are unique.
If the persistence needs to be longer -- say, for the length of a boot -- then you can regenerate UUIDs whenever the system boots. (Note that a VM that is cloned would still have to reboot, unless you're hot-copying it.)
If it needs to be longer than that -- say, indefinitely -- then you can generate a UUID identifier on boot and save it to disk, but only use this as part of the identifying information of the machine. If the virtual machine is subsequently cloned, you will know this since you will have two machines reporting the same ID from different sources -- for instance, two different network sockets, different boot times, etc. Since you can tell them apart, you have enough information to differentiate the two cloned machines, which means you can take a subsequent action that forces further differentiation, like instructing each machine to regenerate its state file.
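A minimal sketch of that last option (the file path, field names and the Linux-specific /proc/uptime read are assumptions for illustration):

```python
# Persist a generated UUID, but treat it as only one ingredient of the
# machine's identity; combine it with live facts so a clone can be detected.
import json, os, time, uuid

STATE_FILE = "/var/lib/myapp/machine-id.json"   # hypothetical location

def load_or_create_machine_id() -> dict:
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    ident = {"uuid": str(uuid.uuid4()), "created_at": time.time()}
    os.makedirs(os.path.dirname(STATE_FILE), exist_ok=True)
    with open(STATE_FILE, "w") as f:
        json.dump(ident, f)
    return ident

identity = load_or_create_machine_id()
# Add facts the server can cross-check: two machines reporting the same UUID
# but different boot times / source addresses are a cloned pair.
identity["boot_time"] = time.time() - float(open("/proc/uptime").read().split()[0])
print(identity)
```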
Ultimately, if a machine is perfectly cloned, then by definition you cannot tell which one was the "real one" to begin with, only that there are now two distinguishable machines.
Implying that you can tell the difference between the "real one" and the "cloned one" means that there is some state you can use to record the difference between the two, like the timestamp of when the virtual machine itself was created, in which case you can incorporate that into the state record.
It looks like simple solutions have been ruled out.
So that could lead to complex solutions, like this protocol:
- Client sends tuple [ MAC addr, SSH public host key, sequence number ]
- If server receives this tuple as expected, server and client both increment sequence number.
- Otherwise server must determine what happened (was client cloned? did client move?), perhaps reaching a tentative conclusion and alerting a human to verify it.
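A minimal, server-side sketch of that protocol (storage and transport are stubbed out; all names and values are hypothetical):

```python
# Track the last sequence number seen per (MAC, SSH host key) identity; a
# repeated or out-of-order counter hints at a clone and is escalated to a human.
known_hosts = {}   # (mac, ssh_host_key) -> last sequence number accepted

def register_report(mac: str, ssh_host_key: str, seq: int) -> str:
    key = (mac, ssh_host_key)
    last = known_hosts.get(key)
    if last is None or seq == last + 1:
        known_hosts[key] = seq
        return "ok"          # both sides now increment their counter
    return "conflict: expected %d, got %d" % (last + 1, seq)

print(register_report("aa:bb:cc:dd:ee:ff", "ssh-ed25519 AAAA...", 1))  # ok
print(register_report("aa:bb:cc:dd:ee:ff", "ssh-ed25519 AAAA...", 2))  # ok
print(register_report("aa:bb:cc:dd:ee:ff", "ssh-ed25519 AAAA...", 2))  # conflict -> possible clone
```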
I don't think there is a straightforward "use X" solution based on the info available, but here are some general suggestions that might get you to a better spot.
If cloning from a "gold image", consider using some "first boot" logic to generate a unique ID. Config-management systems like Chef, Puppet or CFEngine provide some scaffolding to achieve this.
Consider a global state manager like ZooKeeper, specifically its atomic counter functionality. The same system could get a new ID over time, but it would still be unique.
Also, this Stack Overflow question might give you some other direction; it references Twitter's approach to a similar problem.
If I understand correctly, you want a durable, globally unique identifier under these conditions:
An OS installation that can be cloned while running, so any state inside the VM won't work, and
Could be running in an arbitrary virtualization environment, so any state outside the VM won't work.
I realize this doesn't directly answer your question, but it really seems like either the design or the constraints need some substantial adjustment to accommodate a solution.

syslog - log line classifications

A very generic question, asked from a programmer's perspective, with the operational aspects of the process (program) in mind.
Is there any sort of best practice / guide for classifying log messages, particularly in the context of a SaaS / multi-tenant (server) software environment that generates errors and warnings due to user actions or misconfiguration? Due to the nature of the software, most modules that I have to deal with are stateless; i.e. when an error happens due to a user error, it is quite hard to distinguish it from an operational error (like a network misconfiguration, etc.).
What I want to know from the experienced folks here is: what is the sensible logic to employ in order to make it easy for the operations people to classify these messages and identify problems?
Just three aspects from an admin and log analysis/classification perspective:
Make the tag field/program name configurable. Then one can configure multiple instances to use log tags like app/user_1, app/user_2 etc., allowing for fast and simple filters on the syslog level.
Structure your messages from left to right, so one can filter different categories of log lines with simple search patterns or regular expressions, e.g. config error - cannot parse line 123 or runtime warning - lost connection to DB xyz.
For very structured logs you might also take a look at the 'structured data' field in syslog-protocol. So far it is rarely used and without tool support, but it allows for application log messages with namespaces and very clear key-value-attributes.
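A minimal sketch of the first two points (Python standard library only; the tag and category strings are illustrative, and the /dev/log socket path assumes a typical Linux syslog setup):

```python
# A configurable tag plus left-to-right message structure, so coarse filters
# can work on prefixes at the syslog level.
import logging
from logging.handlers import SysLogHandler

def make_logger(tag: str) -> logging.Logger:
    handler = SysLogHandler(address="/dev/log")          # local syslog socket
    handler.setFormatter(logging.Formatter(f"{tag}: %(message)s"))
    logger = logging.getLogger(tag)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

log = make_logger("app/user_1")
# category first, then detail: easy to grep "config error -" across all tags
log.error("config error - cannot parse line 123")
log.warning("runtime warning - lost connection to DB xyz")
```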
Identify the servers and server types (name, ip address, etc.)
Classify by severity, and make sure all the clocks are in sync so the messages are ordered correctly.
Put a message/error code to filter/create some rules in your monitoring tool.
Include the module name (useful if several modules run on one server)
Put a category for addressing general services like networking, etc.
I guess you will gather the logs from the different machines with their syslog daemons to a central machine in charge of supervision/monitoring.
Most *nix processes log to syslog (or should, at least) using a semi-standard format: "Month Day 24H-Time host process_name[pid]: message". Syslog incorporates ways to indicate the message's severity; use them (though keep in mind that the severity is from the system's perspective, not the application's).
If the message is a debugging message, then it's usually "Function_Name File_Name Line_No Error_Code Error_Desc"; otherwise the format of the message is entirely program-dependent.
For multi-tenant systems it's pretty common for the "message" part to start with some form of tenant identification, followed by the actual log message.
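A minimal sketch of that tenant-prefix convention (standard library; the tenant IDs, program name and /dev/log path are illustrative assumptions):

```python
# Keep the syslog envelope standard and put a tenant identifier at the start
# of the free-form message part, so ops can split or filter per tenant.
import logging
from logging.handlers import SysLogHandler

handler = SysLogHandler(address="/dev/log")
handler.setFormatter(logging.Formatter("billing[%(process)d]: %(message)s"))
logger = logging.getLogger("billing")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def log_for_tenant(tenant_id: str, severity: int, message: str) -> None:
    # tenant first, then the actual log message
    logger.log(severity, "tenant=%s %s", tenant_id, message)

log_for_tenant("acme-corp", logging.WARNING, "quota at 90%")
log_for_tenant("globex", logging.ERROR, "invoice export failed")
```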