Ganglia - security when polling metrics over TCP (xml format) from nodes - ganglia

Context: I am a student preparing a proof of concept for quick network monitoring.
Our imaginary setup has multiple clusters on different subnets. I have read a lot of Ganglia documentation, and what I really want to find out is this: during node polling, assuming gmetad is on a different subnet from the node, is any security measure used to protect the XML data sent over TCP?

It's not entirely clear whether you mean to ask about TCP or UDP transport here, but I assume TCP since that's how gmetad-gmetad and gmetad-gmond communication is done.
The only security measures are the trusted_hosts configuration attribute for gmetad and the access control lists that can be specified for gmond's tcp_accept_channel configuration.
You could perhaps consider a secure tunneled route between the hosts if you're looking to avoid eavesdropping?

Related

Apache NiFi - Listen UDP/TCP ranges

I'm trying to configure the ListenUDP or ListenTCP processors to get input from multiple, but very specific, IPs. I'm trying to find out whether IP ranges can be used instead of a single IP, so that all my Palo Altos go to one processor, and so on.
ListenTCP & ListenUDP do not filter incoming traffic.
You can choose which network interface is used to listen for incoming traffic by setting Local Network Interface - so you can apply normal filtering techniques such as iptables or a firewall to filter what traffic is allowed to reach that network interface.
In theory, each received message gets a tcp.sender or udp.sender attribute written to its FlowFile. So you could technically filter after ListenTCP by comparing the value of this attribute and dropping messages that are not valid, but this is a lot less efficient than filtering network traffic outside of NiFi.
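The attribute comparison described above boils down to a range-membership check. The following is a minimal standalone sketch of that check in Python, not NiFi code: the ranges, the function name sender_allowed, and the assumption that the sender value may carry a leading slash are all placeholders for illustration.

```python
import ipaddress

# Hypothetical allow-list; these ranges are placeholders, not from the post.
ALLOWED_RANGES = [
    ipaddress.ip_network("10.1.0.0/24"),
    ipaddress.ip_network("192.168.50.0/28"),
]


def sender_allowed(sender: str) -> bool:
    """Return True if the sender address falls inside any allowed range."""
    # Assumption: the attribute value may have surrounding whitespace or a
    # leading slash; strip both before parsing.
    host = sender.strip().lstrip("/")
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a parseable IP address: treat as not allowed.
        return False
    return any(addr in net for net in ALLOWED_RANGES)
```

A message whose sender fails this check would be the one to drop. Doing the same filtering in iptables or a firewall, as suggested above, avoids accepting the traffic at all.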

How to effectively establish point to point channel using ZeroMQ?

I have trouble with establishing asynchronous point to point channel using ZeroMQ.
My approach to building a point-to-point channel was to create as many ZMQ_PAIR sockets as there are peers in the network. Because a ZMQ_PAIR socket ensures an exclusive connection between two peers, one socket is needed per peer. My first attempt is shown in the following diagram, which represents the pairing connections between two peers.
The problem with this approach is that each pairing socket needs a distinct bind address. For example, if four peers are in the network, then each peer needs at least three ( TCP ) addresses to bind for the rest of the peers, which is very unrealistic and inefficient.
( I assume that peer has exactly one unique address among others. Ex. tcp://*:5555 )
It seems that there is no way other than using different patterns that involve some set of message brokers, such as XREQ/XREP.
( I intentionally avoid a broker-based approach, because my application will exchange messages heavily between peers, which will often result in a performance bottleneck at the broker processes. )
But I wonder whether anybody uses ZMQ_PAIR sockets to build a point-to-point channel efficiently. Or is there a way to avoid needing distinct host IP addresses for multiple ZMQ_PAIR sockets to bind?
Q: How to effectively establish ... well,
Given the above narrative, the story of "How to effectively ..." ( where a metric of what and how actually measures the desired effectivity may get some further clarification later ), turns into another question - "Can we re-factor the ZeroMQ Signalling / Messaging infrastructure, so as to work without using as many IP-addresses:port#-s as would the tcp://-transport-class based topology actually need?"
Given an explicitly expressed limit of not more than just one IP:PORT# per host/node ( making it a very expensive, if not the most expensive, resource of the architecture / design ), one will have to overcome a lot of trouble on such a way forward.
It is fair to note that any such attempt will come at an extra cost to be paid. There will not be any magic wand to "bypass" such a principal limit expressed above. So get ready to indeed pay the costs.
It reminds me of one project in TELCO, where a distributed system was operated in a similar manner with a similar original motivation. Each node had an ssh/sshd service set up, where local-port forwarding made it possible to expose just one publicly accessible IP:PORT# access point. All the rest was implemented "inside" a mesh of topological links going through ssh tunnels, not just because of the encryption service, but for the comfort of being able to maintain all the local-port forwarding towards specific remote ports as a means of setting up and operating such exclusive peer-to-peer links between all the service nodes, while still having just a single public access IP:PORT# per node.
If no other approach seems feasible, the above-mentioned "stone-age" ssh/sshd port forwarding, with ZeroMQ running against such local ports only, may save you. PUB/SUB is ruled out either way: in older ZeroMQ/API versions, topic filtering is processed only on the SUB side, so traffic actually flows to each terminal node, which both security and network departments will not like to support; in newer ZeroMQ/API versions, the topic filter is processed on the sender's side, which concentrates workloads and immense resource needs on the PUB side. Addressing, dynamic network peer (re-)discovery, maintenance, resource planning, fault resilience, ...: no easy shortcut seems to be anywhere near to just grab and (re-)use.
Anyway - Good Luck on the hunt!

Why exactly is binding a socket to multiple ports not allowed?

Why does this limitation exist? What is the technical reason for it?
AFAIU, ports were introduced to distinguish between facilities (services, connections, etc.) of the same host, so logically the limitation is reasonable. However, SO_REUSEADDR exists to allow one-port-to-many-sockets binding, but not the other way round. Binding one socket to many ports seems practical, because it would spare a system call wasted on multiplexing; many SO questions seek (fruitlessly) a way to do it. But the lack of an implementation suggests there are some obstacles I cannot figure out.
The reason is that UDP and TCP connections are keyed based on the IP-Port Pair. This is how the stack figures out what goes with what internally.
If we had many ports mapped to one socket, it would require some other mechanism to key the connection so that the proper data would be delivered to the proper application thread/session.
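The keying idea can be pictured with a toy lookup table. This is only an illustrative Python sketch, not how any real kernel is implemented; the class DemuxTable and its methods are made up for the example, and it keys on the local (IP, port) pair only.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (local IP, local port)


class DemuxTable:
    """Toy model of a stack's lookup table: one owner per (IP, port) key."""

    def __init__(self) -> None:
        self._owners: Dict[Endpoint, str] = {}

    def bind(self, endpoint: Endpoint, owner: str) -> None:
        # A key can have only one owner, otherwise delivery is ambiguous.
        if endpoint in self._owners:
            raise OSError("address already in use")
        self._owners[endpoint] = owner

    def deliver(self, endpoint: Endpoint) -> Optional[str]:
        # Incoming data is matched to its owner purely by the key.
        return self._owners.get(endpoint)
```

In this model, data arriving for ("10.0.0.1", 5555) is handed to whoever bound that exact key; nothing in the key says which of several ports a shared owner should treat the data as belonging to, which is the extra mechanism the answer refers to.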

Sending large files between erlang nodes

I have a setup with two Erlang nodes on the same physical machine, and I wanna be able to send large files between the nodes.
From the symptoms I see, it looks like there is only one TCP connection between the nodes, and sending the large binary across it stops all other traffic. Is this the case?
And, even more interesting, is there a way of making the VM use several connections between the nodes?
Yeah, you only get 1 connection, according to the manual:
The handshake will continue, but A is informed that B has another
ongoing connection attempt that will be shut down (simultaneous
connect where A's name is greater than B's name, compared literally).
Not sure what "big" means in the question, but generally speaking (and imho), it might be good to set up a separate TCP port to handle the payloads and use the standard Erlang messages only as a signaling method (to negotiate ports, set up a listener, etc.), for example to announce a new incoming payload and negotiate anything needed.
Btw, there's an interesting thread on the same subject, and you might try tuning the net_* variables to see if they help with the issues.
hope it helps!
It is not recommended to send large messages between Erlang nodes; see
http://learnyousomeerlang.com/distribunomicon
Refer to the "bandwidth is infinite" section. I would recommend using something else, like GFS, so that you don't lose the distribution feature of Erlang.

At the level of IP, does "leave the connection open" have a specific technical meaning - such as intermediate gateways storing an IP map entry?

I am an experienced socket-level programmer in C++, but I do not understand what happens at the IP network level when a socket connection is left open (vs. being closed by calling the close function on the socket from within code).
I have studied the IP header and tried to understand if leaving a socket open has any implications at the IP level.
At the TCP level, leaving a socket open could make sense to me, because perhaps that means the "sequence number" field in the TCP header continues to increment. However, that would be a purely endpoint-based implementation, and therefore could not cut down on transit time for TCP packets. It is my understanding that leaving a connection open generally means that transit time between endpoints across the internet is decreased for packets.
The question is, does it mean anything at the IP level to leave a socket connection open?
The best guess I have is that if a socket connection remains open, that intervening gateways along the complete IP network path will attempt to leave an entry in their mapping table so that the next hop can be executed immediately, without needing to do a broadcast to all connected gateways in order to determine the next hop.
(Perhaps the overhead of DNS lookup is also avoided in this fashion.)
Am I correct in guessing that "leaving a connection open" corresponds to map entries remaining in place on intermediate IP gateways (which speeds up packet transfer)?
Direct answer: No.
Your question suggests that you don't fully understand the purpose of TCP, which is to establish a data stream between two hosts. Keeping that in mind, the purpose of leaving a connection open should be obvious: if you close the connection, the stream will end.
The status of a TCP connection is not visible on the IP level; it's only of relevance to TCP. With the exception of NAT gateways, intermediate hosts do not generally keep track of the status of TCP connections passing through them. (In many cases, it'd be impossible for them to do so -- large routers have far more connections running through them than they could possibly track.)
The best guess I have is that if a socket connection remains open, that intervening gateways along the complete IP network path will attempt to leave an entry in their mapping table so that the next hop can be executed immediately, without needing to do a broadcast to all connected gateways in order to determine the next hop.
This guess is incorrect. A router will have some sort of algorithm for picking a route based on the destination IP, based on a set of routing tables it keeps internally. Read up on BGP for details on how this is determined on large routers; on smaller routers, the routing table is typically defined by the administrator.
First of all, let's clear up a misconception:
that intervening gateways along the complete IP network path will attempt to leave an entry in their mapping table so that the next hop can be executed immediately, without needing to do a broadcast to all connected gateways in order to determine the next hop.
Routers never "broadcast to all connected gateways" in order to determine the next hop. If a packet arrives and the router does not already know how to route it, the packet is simply dropped (possibly with an ICMP error message being sent back to the source). The job of the routing protocols that run on routers is to prepopulate the router's routing table with routes learned from peers so that they are then prepared to receive packets and route them.
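To make the "look it up or drop it" behaviour concrete, here is a minimal Python sketch of a forwarding lookup using longest-prefix match against a prepopulated table. The table contents and the function name next_hop are hypothetical placeholders; a real router fills this table from its routing protocols or from static configuration.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: (destination prefix, next hop).
# The addresses are documentation placeholders, not real routes.
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "192.0.2.1"),
    (ipaddress.ip_network("10.20.0.0/16"), "192.0.2.2"),
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.3"),
]


def next_hop(destination: str) -> Optional[str]:
    """Return the next hop for a destination, or None if there is no route."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    if not matches:
        # No matching route: the packet would be dropped, not broadcast.
        return None
    # The most specific (longest) matching prefix wins.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop
```

The lookup depends only on the destination address of the packet in hand. Nothing about an open or closed TCP connection enters into it.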
Also, "the complete IP network path" is not well-defined. The network path can change at any time as links fail on the network or new links become available. It can even change from one packet to the next in the absence of routing changes due to load balancing.
Back to your question: no, whether or not a socket is closed has no impact on IP. IP is stateless in the sense that every packet is self-contained and routed independently.
Whether or not a socket is closed does make a difference to TCP, but, as you note, that concerns only the two nodes at the endpoints of the connection.
The impact of "leaving a connection open" on speed, such as it is, is that establishing a connection in TCP requires a round trip. But more to the point, a connection also has semantic meaning to most protocols running on TCP. Two bits of data sent on the same connection are related in a way that two bits of data sent on different connections are not.
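The round-trip cost can be put into rough numbers. The following Python sketch is a back-of-the-envelope model under loud assumptions: connection setup costs one round trip before data can be sent, each request/response exchange costs one round trip, and everything else (teardown, slow start, TLS, processing time) is ignored. The function name and the figures are made up for illustration.

```python
def total_latency_ms(requests: int, rtt_ms: float, reuse_connection: bool) -> float:
    """Estimate total latency for a series of request/response exchanges.

    Assumes one round trip per connection setup and one per exchange.
    """
    if requests == 0:
        return 0.0
    # One handshake in total when the connection stays open,
    # otherwise one handshake per request.
    handshakes = 1 if reuse_connection else requests
    return (handshakes + requests) * rtt_ms
```

Under these assumptions, ten exchanges over a 50 ms round-trip path cost about 550 ms on one open connection versus about 1000 ms when reconnecting each time. The saving comes entirely from the endpoints skipping repeated handshakes, not from anything the network remembers.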