Is there a way to tell if an RV inbox has a valid, active endpoint?
I have a system where clients create RV Inboxes. These are then passed to other components in the system which can use the Inbox to send messages to the client.
I would like a Monitor process in my system to know if a client has died. The Monitor will have the client Inbox.
I can implement a heartbeating mechanism, but I wonder whether there is a mechanism within RV that can tell me whether the Inbox is still valid, i.e. whether messages sent on it will be routed to an active client.
I assume that RV itself must know this, since it knows whether it can deliver a message to the Inbox or not. Is there a way for my code to access this information, or to test whether an Inbox is valid at a given time?
Inboxes use direct connections. From the documentation:
Transport objects can create inbox names, designating a destination that is unique to that transport object and its process. Rendezvous software uses point-to-point techniques to deliver messages with inbox subject names.
And direct communication is really just UDP:
Direct communication uses RPTP over a UDP channel.
So you have two problems: first, mapping the inbox to an IP address and UDP port; and second, telling whether that endpoint still belongs to a live client.
I haven't been able to find any way to solve the first one - those inboxes are opaque, though presumably someone clever could figure it out.
For the second one, it seems it's not an easy problem to solve in general. From our sister site:
UDP ports only have two states: listening or not. That usually translates to "having a socket open on it by a process" or "not having any socket open". The latter case should be easy to detect since the system should respond with an ICMP Destination Unreachable packet with code=3 (Port unreachable). Unfortunately many firewalls could cut those packets so if you don't get anything back you don't know for sure if the port is in this state or not. And let's not forget that ICMP is session less too and doesn't do retransmissions: the Port Unreachable packet could very well be lost somewhere on the net.
I would guess the easiest way to do this is with a heartbeating mechanism. In the past, we had a library around RV that would publish all of the clients, and any subjects subscribed to (including inboxes), out to various management processes.
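For what it's worth, here is a generic sketch of the monitor side of such a heartbeat scheme, deliberately not using the (proprietary) RV API; the heartbeats channel is a hypothetical feed supplied by whatever subscribes to the clients' heartbeat subject:

package monitor

import (
    "log"
    "time"
)

// watch flags clients whose heartbeats stop arriving. Each heartbeat is
// assumed to carry a client identifier, e.g. its inbox name.
func watch(heartbeats <-chan string, timeout time.Duration) {
    lastSeen := map[string]time.Time{} // inbox name -> last heartbeat time
    ticker := time.NewTicker(timeout)
    defer ticker.Stop()
    for {
        select {
        case inbox := <-heartbeats:
            lastSeen[inbox] = time.Now() // client is alive
        case <-ticker.C:
            for inbox, t := range lastSeen {
                if time.Since(t) > timeout {
                    log.Println("client presumed dead:", inbox)
                    delete(lastSeen, inbox)
                }
            }
        }
    }
}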
(Not really answering my own question - this is from a work colleague.)
I came across this when I was working on an RV bridge that we had to write last year, as it was proxying and rewriting inboxes. It turns out RV creates “throwaway” inboxes for certain operations (the blocking ones), and no message is sent (on that multicast, anyway) to say that the inbox is gone. The way inboxes work seems to be that:
The inbox encodes the IP address of the host, a unique process ID for the client on that host/transport, and an incrementing counter
The sender just sends a unicast packet to the right IP and port (service)
The daemon (if it has clients on that service) checks if it has any active subscriptions on that subject (just like for multicast), and passes it on to the client
In other words, it’s the receiving daemon which has to discard messages sent on closed inboxes, and there is no equivalent to the “LISTEN.STOP” messages that accompany regular subscription ends.
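To make that concrete, here is a purely illustrative sketch of what an inbox name apparently carries. The real wire format is opaque and proprietary; these field names and types are made up:

package rvsketch

// inboxName is illustrative only; the actual encoding is opaque.
type inboxName struct {
    hostIP    [4]byte // IP address of the client's host
    sessionID uint32  // unique per client process/transport on that host
    counter   uint64  // incremented for each inbox the client creates
}

The point of the description above is that nothing in this name tells a sender whether the endpoint is still alive; only the receiving daemon knows.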
Related
I've been using golang for quite some time on a project. In my project I have to implement a TCP server which responds to TCP clients. The server has to send a number of messages to a client.
The problem is that when the server writes a message to the client connection, it has to wait until the client has read that message from the buffer before sending another message (the server has to wait until the client calls the reader.ReadString('\n') method).
In my server code I wrote:
for {
    data := <-client.outgoing // block until there is a message to send
    client.writer.WriteString(data + "\n")
    client.writer.Flush() // hands the bytes to TCP; says nothing about the client reading them
}
but the server sends all the messages to the client without waiting for ReadString in the client.
How can I make the server wait until the client has read a message before sending the next one?
I think that either the assignment is ambiguous or you're misinterpreting it and solving the XY problem.
The short answer is that you can never know whether the client has read a message just by looking at the TCP conversation. You have to implement this "protocol" in your application.
Here are a few problems:
From your application you don't really have access to what TCP is doing. You get a stream on which you can perform I/O.
The fact that a write to your stream "succeeds" only means that TCP has agreed to try to transport your stuff and has an independent copy. It doesn't say anything about whether the data has been received, and it doesn't even mean the data has been sent yet.
You may find certain mechanisms to peer into TCP's inner workings (such as the SIOCINQ and SIOCOUTQ ioctls, or various setsockopt options): these won't help.
Even if you find out what your TCP is doing, that at most tells you what the remote TCP is doing. So even if you have full control over your TCP and can see the acknowledgments from the peer, you still don't know what the remote application is doing. It's entirely possible the application hasn't read the data yet (it might not have requested it, the TCP might be withholding it in a buffer for some reason, the scheduler might not have scheduled the remote process, etc.).
Going back to your question, the way to really know whether the remote application has received your message is to have the remote application tell you. This means you have to restructure your protocol as follows (see the sketch after this list):
Send stuff from the server
Wait for a message from the application telling you it received your stuff
Send more stuff (because you know from point 2 it's safe to do so)
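Here is a minimal sketch of that restructured protocol on the server side, assuming the client writes back a literal "ACK\n" line after it has processed each message (the ACK token and the handleConn name are illustrative, not from the original post):

package server

import (
    "bufio"
    "log"
    "net"
)

// handleConn sends each queued message and waits for an application-level
// acknowledgment before sending the next one.
func handleConn(conn net.Conn, outgoing <-chan string) {
    defer conn.Close()
    reader := bufio.NewReader(conn)
    writer := bufio.NewWriter(conn)

    for data := range outgoing {
        // 1. Send stuff from the server.
        if _, err := writer.WriteString(data + "\n"); err != nil {
            log.Println("write:", err)
            return
        }
        if err := writer.Flush(); err != nil {
            log.Println("flush:", err)
            return
        }

        // 2. Wait for the client to tell us it received the message.
        ack, err := reader.ReadString('\n')
        if err != nil || ack != "ACK\n" {
            log.Println("missing ack:", err)
            return
        }
        // 3. Loop around: point 2 tells us it is now safe to send more.
    }
}

The client side then becomes a ReadString('\n') followed by writing "ACK\n" for each message it processes.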
The following is my recent interview experience with a reputed network software company. I was asked questions that tied the TCP level and web requests together, and that confused me a lot. I really would like to know expert opinions on the answers. It is not just about the interview, but also about a fundamental understanding of how networking works (and how the application layer and transport layer cross-talk, if they do at all).
Interviewer: Tell me the process that happens behind the scenes when
I open a browser and type google.com in it.
Me: The first thing that happens is that a socket is created, identified by {SRC-IP, SRC-PORT, DEST-IP, DEST-PORT, PROTOCOL}. The SRC-PORT number is a random number given by the browser. Then a TCP connection is established via the three-way handshake. Now both the client (my browser) and the server (Google) are ready to handle requests.
Interviewer: Wait, when does the name resolution happen?
Me: Yep, I am sorry. It should have happened before even the socket is created.
DNS name resolve happens first to get the IP address of Google to
reach at.
Interviewer: Is a socket created for DNS name resolution?
Me: Hmm, I actually do not know. But all I know is that DNS name resolution is connectionless; that is, it's UDP, not TCP, and only a single request-response cycle happens. (So there is a new socket created for DNS name resolution.)
Interviewer: google.com is open for other requests from other
clients. So is establishing your connection with Google blocking
other users?
Me: I am not sure how Google handles this. But in a typical socket
communication, it is blocking to a minimal extent.
Interviewer: How do you think this can be handled?
Me: I guess the process forks a new thread and creates a socket to handle my
request. From now on, my socket endpoint of communication with
Google is this child socket.
Interviewer: If that is the case, is this child socket’s port number
different than the parent one?
Me: The parent socket is listening at 80 for new requests from
clients. The child must be listening at a different port number.
Interviewer: How is your TCP connection maintained if your destination port number has changed? (That is, the src-port number on Google's packets.)
Me: The dest-port that I see as a client is always 80. When a response is sent back, it also comes from port 80. I guess the OS/the parent process sets the source port back to 80 before it sends back the response.
Interviewer: How long is your socket connection established with
Google?
Me: If I don’t make any requests for a period of time, the
main thread closes its child socket and any subsequent requests from
me will be like I am a new client.
Interviewer: No, Google will not keep a dedicated child socket for
you. It handles your request and discards/recycles the sockets right
away.
Interviewer: Although Google may have many servers to serve
requests, each server can have only one parent socket opened at port 80. The number of clients to access Google's webpage must be larger than the number of servers they have. How is this usually handled?
Me: I am not sure how this is handled. The only way I can see it working is to spawn a thread for each request it receives.
Interviewer: Do you think the way Google handles this is different from
any bank website?
Me: At the TCP/IP socket level, it should be similar. At the request level, it is slightly different, because a session is maintained to keep state between requests on banks' websites.
If someone can give an explanation of each of the points, it will be very helpful for many beginners in networking.
How many sockets does Google open for every request it receives?
This question doesn't actually appear in the interview, but it's in your title so I'll answer it. Google doesn't open any sockets at all. Your browser does that. Google accepts connections, in the form of new sockets, but I wouldn't describe that as 'opening' them.
Interviewer: Tell me the process that happens behind the scenes when I open a browser and type google.com in it.
Me: The first thing that happens is that a socket is created, identified by {SRC-IP, SRC-PORT, DEST-IP, DEST-PORT, PROTOCOL}.
No. The connection is identified by the tuple. The socket is an endpoint to the connection.
The SRC-PORT number is a random number given by the browser.
No. By the operating system.
Usually a TCP connection is then established via the three-way handshake. Now both the client (my browser) and the server (Google) are ready to handle requests. (TCP connection is established.)
Interviewer: Wait, when does the name resolution happen?
Me: Yep, I am sorry. It should have happened before the socket is even created. DNS name resolution happens first to get the IP address of Google to reach it at.
Interviewer: Is a socket created for DNS name resolution?
Me: Hmm, I actually do not know. But all I know is that DNS name resolution is connectionless; that is, it's UDP, not TCP, and only a single request-response cycle happens. (So a new socket is created for DNS name resolution.)
Any rationally implemented browser would delegate the entire thing to the operating system's Sockets library, whose internal functioning depends on the OS. It might look at an in-memory cache, a file, a database, an LDAP server, several things, before going out to a DNS server, which it can do via either TCP or UDP. It's not a great question.
Interviewer: google.com is open for other requests from other clients. So is establishing your connection with Google blocking other users?
Me: I am not sure how Google handles this. But in a typical socket communication, it is blocking to a minimal extent.
Wrong again. It has very little to do with Google specifically. A TCP server has a separate socket per accepted connection, and any rationally constructed TCP server handles them completely independently, whether via multithreading, multiplexed/non-blocking I/O, or asynchronous I/O. They don't block each other.
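A minimal sketch of that pattern in Go (the port and the echo behavior are illustrative): one listening socket, and one accepted socket served by its own goroutine per client, so connections don't block each other:

package main

import (
    "bufio"
    "log"
    "net"
)

func main() {
    ln, err := net.Listen("tcp", ":8080") // the single listening socket
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := ln.Accept() // a new socket per accepted connection
        if err != nil {
            continue
        }
        go handle(conn) // handled independently of other clients
    }
}

func handle(conn net.Conn) {
    defer conn.Close()
    line, err := bufio.NewReader(conn).ReadString('\n')
    if err != nil {
        return
    }
    conn.Write([]byte("echo: " + line)) // same local port as the listener
}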
Interviewer: How do you think this can be handled?
Me: I guess the process forks a new thread and creates a socket to handle my request. From now on, my socket endpoint of communication with Google is this child socket.
Threads are created, not 'forked'. Forking a process creates another process, not another thread. The socket isn't 'created' so much as accepted, and this would normally precede thread creation. It isn't a 'child' socket.
Interviewer: If that is the case, is this child socket's port number different from the parent one?
Me: The parent socket is listening at 80 for new requests from clients. The child must be listening at a different port number.
Wrong again. The accepted socket uses the same port number as the listening socket, and the accepted socket isn't 'listening' at all, it is receiving and sending.
Interviewer: How is your TCP connection maintained since your dest-port number has changed? (That is, the src-port number on Google's packets.)
Me: The dest-port that I see as a client is always 80. When a response is sent back, it also comes from port 80. I guess the OS/the parent process sets the src port back to 80 before it sends back the response.
This question was designed to explore your previous wrong answer. Your continuation of your wrong answer is still wrong.
Interviewer: How long is your socket connection established with Google?
Me: If I don't make any requests for a period of time, the main thread closes its child socket and any subsequent requests from me will be like I am a new client.
Wrong again. You don't know anything about threads at Google, let alone which thread closes the socket. Either end can close the connection at any time. Most probably the server end will beat you to it, but it isn't set in stone, and neither is which if any thread will do it.
Interviewer: No, Google will not keep a dedicated child socket for you. It handles your request and discards/recycles the sockets right away.
Here the interviewer is wrong. He doesn't seem to have heard of HTTP keep-alive, or the fact that it is the default in HTTP 1.1.
Interviewer: Although Google may have many servers to serve requests, each server can have only one parent socket opened at port 80. The number of clients accessing the Google webpage must be far larger than the number of servers they have. How is this usually handled?
Me: I am not sure how this is handled. The only way I can see it working is to spawn a thread for each request it receives.
Here you haven't answered the question at all. He is fishing for an answer about load balancers or round-robin DNS or something in front of all those servers. However, his point that the number of clients accessing the Google webpage must be far larger than the number of servers they have has already been answered by the existence of what you are both incorrectly calling 'child sockets'. Again, not a great question, unless you've reported it inaccurately.
Interviewer: Do you think the way Google handles this is different from any bank website?
Me: At the TCP/IP socket level, it should be similar. At the request level, slightly different, because a session is maintained to keep state between requests on banks' websites.
You almost got this one right. HTTP sessions to Google exist, just as they do to bank websites. It isn't much of a question; he should be asking for facts, not your opinion.
Overall, (a) you failed the interview completely, and (b) you indulged in far too much guesswork. You should have simply stated 'I don't know' and let the interview proceed to things that you do know about.
For point #6, here is how I understand it: if both endpoints of one connection were the same as those of another connection, there would indeed be no way to distinguish the two sockets; but if even a single endpoint differs, it is easy to distinguish them. So there is no need to switch the destination port 80 (the default) back and forth, since the source ports differ.
This question got me thinking, and I now realize that I don't know anything about the internals of MTAs.
What exactly does an MTA do? Everything after the SMTP protocol seems like dark magic to me. Let's say I wanted to code a minimalistic MTA (or MDA) just for sending emails: what would I need to learn/do?
Edit: I don't actually plan on writing an MTA, I just want to understand how it works internally.
--- edit after somehow noticing you talked about possibly writing an MTA ---
To write an MTA, you need to open a server socket. When someone connects, you need to send and receive text (ASCII) data on that socket in compliance with the SMTP protocol. SMTP is very chatty, so you can expect a few rounds of communication.
The initial round of communication typically tells you whether SMTP or ESMTP is supported. The second (optional) round of communication is to determine security / encryption / feature support. Eventually the "client" side will ask to send a message to a particular address / set of addresses. When done, the server will indicate that it's ready to get the body of the email message. When the body of the message (and its optional attachments) has all been transmitted, the MTA will tell you it received the message fine. At that point in time, the MTA will act as a client to other MTAs, discovered via DNS MX records, to get your email closer to its destination MTA, which will copy it into someone's inbox.
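A toy sketch of the server side of that conversation, assuming port 2525 (to avoid needing root for port 25); it speaks only bare-minimum SMTP and skips ESMTP, TLS, and multi-line replies:

package main

import (
    "bufio"
    "fmt"
    "net"
    "strings"
)

func main() {
    ln, err := net.Listen("tcp", ":2525")
    if err != nil {
        panic(err)
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            continue
        }
        go serve(conn)
    }
}

func serve(conn net.Conn) {
    defer conn.Close()
    fmt.Fprintf(conn, "220 toy-mta ready\r\n") // the initial greeting round
    r := bufio.NewReader(conn)
    for {
        line, err := r.ReadString('\n')
        if err != nil {
            return
        }
        cmd := strings.ToUpper(strings.TrimSpace(line))
        switch {
        case strings.HasPrefix(cmd, "HELO"), strings.HasPrefix(cmd, "EHLO"):
            fmt.Fprintf(conn, "250 hello\r\n")
        case strings.HasPrefix(cmd, "MAIL"), strings.HasPrefix(cmd, "RCPT"):
            fmt.Fprintf(conn, "250 ok\r\n")
        case cmd == "DATA":
            fmt.Fprintf(conn, "354 end data with <CRLF>.<CRLF>\r\n")
            for { // consume the body until the lone-dot terminator
                body, err := r.ReadString('\n')
                if err != nil || strings.TrimSpace(body) == "." {
                    break
                }
            }
            fmt.Fprintf(conn, "250 message accepted\r\n")
        case cmd == "QUIT":
            fmt.Fprintf(conn, "221 bye\r\n")
            return
        default:
            fmt.Fprintf(conn, "502 command not implemented\r\n")
        }
    }
}

A real MTA would, of course, then queue the accepted message and relay it onward, as described below.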
So an MTA is needed because mail delivery on the client side is the equivalent of handing a physical letter to a post office. Post offices are responsible for inter-post-office routing (which parallels MTA-to-MTA transmission). The destination post office is then responsible for delivery of the letter to the post office box or home address (which parallels one's computer inbox).
They don't call it e-mail for nothing.
--- original post follows ---
An MTA will accept a mail message, see if it can forward or deliver it, respond to say whether it can be forwarded or delivered, and then forward or deliver it if it indicated it could.
How the message gets closer to its final destination usually has a bit to do with DNS. MX (mail exchange) records in DNS indicate servers which are responsible (or at least closer to the responsible server) for particular email domain names. It is not possible to fully understand how a mail message gets closer to its destination without understanding how DNS works.
An MTA typically looks at the delivery address, and either is configured to be the "end point" of the email address's mail domain, or knows that server XYZ is one hop closer to the email address's mail domain. If it's an endpoint, it will copy the message from the wire into someone's inbox. If it's relaying, it will "forward" the message to the next MTA.
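Since the MX lookup is the crux of that relaying step, here is a quick illustration using Go's standard net.LookupMX (the domain is just an example):

package main

import (
    "fmt"
    "net"
)

func main() {
    mxs, err := net.LookupMX("example.com")
    if err != nil {
        panic(err)
    }
    for _, mx := range mxs {
        // Records with a lower Pref value are tried first when relaying.
        fmt.Printf("host=%s pref=%d\n", mx.Host, mx.Pref)
    }
}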
Here ya go: http://en.wikipedia.org/wiki/Message_transfer_agent
Quickly: the MTA receives the raw message, decides where its ultimate destination is, and then forwards the message on to that destination.
A very simple MTA can be written that delivers only to local inboxes. The MTA is an "easier" part of the system to write because you can behave badly but still be functional, so your interoperability with other systems is less of an issue (that's where much of the complexity of email lies nowadays, that and spam/virus checking).
The real contract of an MTA is simply that if you accept the message from the system sending it to you, you accept responsibility to deliver that message. Thus, when that socket closes with an acknowledgement of acceptance, the delivering system's job is done and it's all in your hands.
If you happen to do a crummy job, mail is lost, and it's your problem. But it's still fun to play around with.
Edit: The original tutorial I linked to has gone 404. Here's another that's ok: https://troubleshootguru.wordpress.com/2014/07/06/mail-server-components-mta-mda-mua/
In short, an MUA is a user client that uses SMTP to send an email to an MTA. The MTA is a server that is responsible for routing the email toward its destination. If that destination is another server, the MTA hands the email to an MDA. The MDA is a client on the server that uses SMTP to forward the email to the other server, which is also an MTA.
So what do you need to learn? If you want to write an MUA or MDA, you need to learn how to open a socket to another computer, send SMTP commands, and receive SMTP responses. If you want to write an MTA, you need to learn how to listen for socket connections on a port, receive SMTP commands, and send SMTP responses.
If you like Java, try the code on this page as a starting point for a client.
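If Go is more your thing, here is a rough sketch of the client (MUA/MDA) side, assuming a plain, non-TLS SMTP server listening on localhost:2525; the command sequence follows RFC 5321, error handling is trimmed, and multi-line replies are ignored:

package main

import (
    "bufio"
    "fmt"
    "net"
)

func main() {
    conn, err := net.Dial("tcp", "localhost:2525")
    if err != nil {
        panic(err)
    }
    defer conn.Close()
    r := bufio.NewReader(conn)

    send := func(cmd string) {
        fmt.Fprintf(conn, "%s\r\n", cmd)
        reply, _ := r.ReadString('\n') // each command gets a status line back
        fmt.Print(reply)
    }

    r.ReadString('\n')                 // 220 greeting from the server
    send("HELO client.example.com")    // identify ourselves
    send("MAIL FROM:<me@example.com>") // envelope sender
    send("RCPT TO:<you@example.com>")  // envelope recipient
    send("DATA")                       // server replies 354: go ahead
    send("Subject: test\r\n\r\nHello!\r\n.") // body, terminated by a lone dot
    send("QUIT")
}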
I have clients that need to all connect to a single server process. I am using UDP discovery for the clients to find the server. I have the client and server exchange IP address and port number, so that a TCP/IP connection can be established after completion of the discovery. This way the packet size is kept small. I see that this could be done in one of two ways using UDP:
Each client sends out its own multicast message in search of the server, which the server then responds to. The client can repeat sending this multicast message in regular intervals (in the case that the server is down) until the server responds.
The server sends out a multicast message beacon at regular intervals. The clients subscribe to the multicast group and in this way receive the server's multicast message and complete the discovery.
In #1, if there are many clients, then initially there would be many multicast messages transmitted (one from each client). Only the server would subscribe and receive the multicast messages from the clients. Once the server has responded to a client, that client ceases to send out its multicast message. Once all clients have completed their discovery of the server, no further multicast messages are transmitted on the network. If, however, the server is down, then each client would be sending out a multicast beacon at intervals until the server is back up and can respond.
In #2, only the server would send a multicast beacon at regular intervals. This message would end up getting routed to all clients that are subscribed to the multicast group. Once a client receives the packet, the client's UDP listening socket gets closed and it is no longer subscribed to the multicast group. However, the server must continue to send the beacon so that new clients can discover it. It would continue sending out the beacon at regular intervals regardless of whether any clients are out there requiring discovery or not.
So, I see pros and cons either way. It seems to me that #1 would result in heavier load initially, but this load eventually reduces down to zero. In #2 the server would continue sending out a beacon forever.
UDP and multicast is a fairly new topic to me, so I am interested in finding out which would be the preferred approach and which would result in less network load.
I've used option #2 in the past several times. It works well for simple network topologies. We did see some throughput problems when UDP datagrams exceeded the Ethernet MTU resulting in a large amount of fragmentation. The largest problem that we have seen is that multicast discovery breaks down in larger topologies since many routers are configured to block multicast traffic.
The issue that Greg alluded to is rather important to consider when you are designing your protocol suite. As soon as you move beyond simple network topologies, you will have to find solutions for address translation, IP spoofing, and a whole host of other issues related to the handoff from your discovery layer to your communications layer. Most of them have to do specifically with how your server identifies itself and ensuring that the identification is something that a client can make use of.
If I could do it over again (how many times have we uttered this phrase), I would look for standards-based discovery mechanisms that fit the bill and start solving the other protocol suite problems. The last thing that you really want to do is come up with a really good discovery scheme that breaks the week after you deploy it because of some unforeseen network topology. Google service discovery for a starting list. I personally tend towards DNS-SD but there are a lot of other options available.
I would recommend method #2, as it is likely (depending on the application) that you will have far more clients than you will servers. By having the server send out a beacon, you only send one packet every so often, rather than one packet for each client.
The other benefit of this method, is that it makes it easier for the clients to determine when a new server becomes available, or when an existing server leaves the network, as they don't have to maintain a connection to each server, or keep polling each server, to find out.
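For what it's worth, a minimal Go sketch of method #2, assuming the illustrative group address 239.0.0.1:9999 and an illustrative server TCP address (neither is from the question):

package main

import (
    "log"
    "net"
    "time"
)

const group = "239.0.0.1:9999"

// beacon announces the server's TCP address to the group at intervals.
func beacon(tcpAddr string) {
    conn, err := net.Dial("udp", group)
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn.Write([]byte(tcpAddr)) // e.g. "192.168.1.5:8080"
        time.Sleep(5 * time.Second)
    }
}

// discover joins the multicast group and waits for one announcement.
func discover() string {
    addr, _ := net.ResolveUDPAddr("udp", group)
    conn, err := net.ListenMulticastUDP("udp", nil, addr)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close() // closing the socket leaves the group
    buf := make([]byte, 1024)
    n, _, err := conn.ReadFromUDP(buf)
    if err != nil {
        log.Fatal(err)
    }
    return string(buf[:n]) // the server's TCP address
}

func main() {
    go beacon("192.168.1.5:8080")
    log.Println("server is at", discover())
}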
Both are equally viable methods.
The argument for method #1 would be that, as a general principle, clients initiate requests, and servers listen and respond to them.
The argument for method #2 would be that the point of multicast is so that one host can send a packet and it can be received by many clients (one-to-many), so it's meant to be the reverse of #1.
OK, as I think about this I'm actually drawn to #2, the server-initiated beacon. The problem with #1 is this: say the clients broadcast beacons and hook up with the server, but then the server goes offline or changes its IP address.
With #2, when the server comes back up and sends its first beacon, all the clients are notified at the same time to reconnect, and your entire system is back up immediately. With #1, all of the clients would have to individually realize that the server is gone, and they would all start multicasting at the same time until they reconnected to the server. If you had 1000 clients and 1 server, your network load would literally be 1000x greater than with method #2.
I know these messages are most likely small, and 1000 packets at a time is nothing to a UDP network, but just from a design standpoint #2 feels better.
Edit: I feel like I'm developing a split-personality disorder here, but I just thought of a strong point in favor of #1... If you ever wanted to implement some sort of natural load balancing or scaling with multiple servers, design #1 works well for this. That way the first "available" server can respond to the client's beacon and connect to it, as opposed to #2, where all the clients jump to the beaconing server.
Your option #2 has a big limitation in that it assumes the server can communicate more or less directly with every possible client. Depending on the exact network architecture of your operational system, this may not be the case. For example, you may be depending on all the routers, VPN software, WANs, NATs, and whatever else people connect networks together with, actually handling the multicast beacon packets.
With #1, you are assuming that the clients can send a UDP packet to the server. This is an entirely reasonable expectation, especially considering the very next thing the client will do is make a TCP connection to the same server.
If the server goes down and the client wants to find out when it's back up, be sure to use exponential backoff; otherwise you will take the network down with a packet storm someday!
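A tiny sketch of that backoff, where tryDiscover is a hypothetical stand-in for one probe/response attempt:

package main

import (
    "log"
    "time"
)

// tryDiscover is a placeholder: it should perform one discovery attempt
// and return true once the server has answered.
func tryDiscover() bool { return false }

func main() {
    delay := time.Second
    for !tryDiscover() {
        log.Println("server not found, retrying in", delay)
        time.Sleep(delay)
        if delay < time.Minute {
            delay *= 2 // 1s, 2s, 4s, ... capped so retries never stop
        }
    }
}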
UDP does not send any ACK back, but will it send any response?
I have set up a client-server UDP program. If the client sends data to a non-existent server, will the client receive any response?
My assumption is as follows:
Client --> Broadcast server address (ARP)
Server --> Reply to client with its MAC address (ARP)
Client --> Send data to server (UDP)
In any case the client will only receive the ARP response. Whether the server exists or not, it will not get any UDP response?
The client is using the sendto function to send data. We can get error information after the sendto call.
So my question is: how is this info available when the client doesn't get any response?
The error code can be obtained from WSAGetLastError.
I tried to send data to a non-existent host and the sendto call succeeded. As per the documentation it should fail with the return value SOCKET_ERROR.
Any thoughts??
You can never count on receiving an error or notification for a UDP packet that did not reach its destination.
The sendto call didn't fail. The datagram was sent to the destination.
The recipient of the datagram or some router on the way to it might return an error response (host unreachable, port unreachable, TTL exceeded). But the sendto call will be history by the time your system receives it. Some operating systems do provide a way to find out this occurred, often with a getsockopt call. But since you can't rely on getting an error reply anyway since it depends on network conditions you have no control over, it's generally best to ignore it.
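In Go, for example, one place this can surface is on a connected UDP socket: on many systems (Linux in particular) a returned ICMP port-unreachable shows up as an error on a later call on that socket, though as just noted you cannot rely on it. Here 192.0.2.1 is a documentation address standing in for the non-existent host:

package main

import (
    "fmt"
    "net"
    "time"
)

func main() {
    conn, err := net.Dial("udp", "192.0.2.1:9") // a *connected* UDP socket
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    if _, err := conn.Write([]byte("ping")); err != nil {
        fmt.Println("write failed:", err) // rare: local errors only
    }

    conn.SetReadDeadline(time.Now().Add(2 * time.Second))
    buf := make([]byte, 64)
    if _, err := conn.Read(buf); err != nil {
        // Either a timeout (nothing came back) or something like
        // "connection refused" if a port-unreachable was delivered.
        fmt.Println("read:", err)
    }
}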
Sensible protocols layered on top of UDP use replies. If you don't get a reply, then either the other end didn't get your datagram or the reply didn't make it back to you.
"UDP is a simpler message-based connectionless protocol. In connectionless protocols, there is no effort made to set up a dedicated end-to-end connection. Communication is achieved by transmitting information in one direction, from source to destination without checking to see if the destination is still there, or if it is prepared to receive the information."
The machine to which you're sending packets may reply with an ICMP UDP port unreachable message.
The UDP protocol is implemented on top of IP. You send UDP packets to hosts identified by IP addresses, not MAC addresses.
And as pointed out, UDP itself will not send a reply; you will have to add code to do that yourself. Then you will have to add code to expect the reply, and to take the proper action if the response is lost (typically resend on a timer, until you decide the other end is "dead"), and so on.
If you need reliability on top of UDP, such as the ordering or verification that TCP/IP gives you, take a look at RUDP, or Reliable UDP. Sometimes you do need verification, but a mixture of UDP and TCP can be held up by the TCP reliability, causing a bottleneck.
For most large-scale MMOs, for instance, UDP and Reliable UDP are the means of communication and reliability. All RUDP does is add a smaller portion of TCP/IP's machinery to validate and order certain messages, but not all of them.
A common game development networking library is Raknet which has this built in.
RUDP
http://www.javvin.com/protocolRUDP.html
An example of RUDP using Raknet and Python
http://pyraknet.slowchop.com/