Fast HTTP communication - sockets

I want an HTTP message to be sent and processed quickly by a remote server, via an already established persistent TCP connection.
How can I optimize the communication?
I have a few ideas, but I am not knowledgeable enough about networking to know if they make sense:
HTTP sits on top of TCP. But how does it work exactly? Specifically, if I send 1 HTTP message, does it translate into only 1 TCP message? (I know the initial handshake takes 3 round trips, but I do not care about this, as the connection is already established.) I guess it depends on the Maximum Segment Size that the server can accept?
Can I ask the server for a bigger maximum segment size if needed? How can I do it? (I use Python with the httplib and socket modules, so an answer in this language would be ideal.)
The remote server works with TCP, but could I try sending it UDP messages? I know UDP is faster, but could this idea work?

I'll allow myself to comment/answer in-text:
I have a few ideas, but I am not knowledgeable enough about networking
to know if they make sense:
Exactly! Read up on TCP and HTTP on Wikipedia, for a start. It will make things much easier to discuss. Probably also faster than asking on Stack Overflow ;)
HTTP sits on top of TCP. But how does it work exactly?
Well, exactly like protocols work over each other in a layered protocol stack. Read Wikipedia's TCP article, and about the OSI/ISO layer model.
Specifically, if I send 1 HTTP message, does it translate into only 1 TCP message?
No. HTTP itself doesn't care (and doesn't have to) how the communication gets split into lower-level packets; one HTTP message may end up in a single TCP segment or in many.
(I know the initial handshake takes 3 round trips, but I do not care about this, as the connection is already established.)
3 round trips? That's not quite right: the three-way handshake is three messages, which costs about one round trip before data can flow. Read about the TCP handshake and how HTTP asks for a document.
I guess it depends on the Maximum Segment Size that the
server can accept?
Among a lot of other factors; really, HTTP doesn't care in the least!
Can I ask the server for a bigger maximum segment size if needed?
No. Your network stack will most probably automatically use the biggest MTU that works.
How can I do it? (I use Python with the httplib and socket modules, so an answer in this language would be ideal.)
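Not directly: the MSS is negotiated during connection setup and bounded by the path MTU, so it is not something the server can simply grant you. From Python you can at least inspect the value the kernel negotiated and disable Nagle's algorithm, which usually matters more for request latency on an already-established persistent connection. A minimal sketch (assumes Linux, where the TCP_MAXSEG option is exposed; the host name is only a placeholder):

    import socket

    # Placeholder host; in practice this would be your existing persistent connection.
    sock = socket.create_connection(("example.com", 80))

    # The kernel negotiated the MSS during the handshake; you can read it,
    # but you cannot push it beyond what the path MTU allows.
    mss = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
    print("negotiated MSS:", mss)

    # For small request/response exchanges on a persistent connection,
    # disabling Nagle's algorithm tends to help latency far more.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)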
The remote server works with TCP, but could I try sending it UDP messages?
There are some specialized HTTP-over-UDP protocols, but they are not HTTP. Generally, HTTP is spoken over TCP, but again, the internet works on a layered protocol stack, and higher-level protocols usually don't care what transports their data -- you could perfectly well have an HTTP session over carrier pigeons!
I know UDP is faster, but could this idea work?
It's not, as a blanket statement; that's a misconception. UDP doesn't automatically re-request packets that got lost along the way, which for things like multimedia or games may be acceptable, but TCP gives you an ordered, reliable session, which is what HTTP needs.

Related

UDP sockets Multiplexing/Demultiplexing

There are already some questions regarding this, but I want to ask something very specific that does not seem to be covered by the others. So, I'm aware that the same UDP port (server) can be used by different users - the server still knows where to deliver the appropriate packets. However, since the UDP port on the server is exactly the same for both users, does this mean that some delay can occur? For instance, if at the exact same time 5 users want to use the same server socket and the communication is UDP based, will there be a delay because the socket can't deal with the 5 senders at the same time? Is this correct? I know that, in practice, this would only happen with a great number of clients, given that processing a UDP datagram is faster than handling a TCP connection, but it could potentially happen. Or am I wrong?
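To make the single-socket model concrete: one UDP socket serves every sender because the kernel queues incoming datagrams and tags each one with its source address, so "at the same time" only means the datagrams wait briefly in the receive buffer. A minimal echo sketch in Python (the port number is an arbitrary example):

    import socket

    # One UDP socket for all clients; no per-client connection state exists.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("0.0.0.0", 9999))

    while True:
        # Datagrams from any number of senders are read one at a time;
        # client_addr tells us where each came from, so replies can be routed back.
        data, client_addr = server.recvfrom(2048)
        server.sendto(b"ack: " + data, client_addr)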

TCP or UDP for lots of connections?

I want to create a P2P network with the following characteristics:
low latency is not really important
losing packets is okay
the nodes would only send tiny amounts of data around
there will be no NAT/firewall issues, every node has an open port on its public ip
every node is connected to every other node
Usually I would use TCP for anything not time-critical, but the last requirement causes the nodes to have lots of open connections for a long time. If I remember correctly, using TCP to connect to 1000 servers would mean I had to use 1000 ports to handle these connections. UDP, on the other hand, would only require a single port for each node.
So my question is: Is TCP able to handle the above requirements in a network with e.g. 1000 nodes without tweaking the system? Would UDP be better suited in this case? Is there anything else that would be a deal-breaker for either protocol?
With UDP you control the "connection state" yourself, and it is pretty much the best way to do anything peer-to-peer related IF you have a high number of nodes or care about bandwidth, memory and CPU overhead. By moving all the control over each node's "connection state" into your application, you minimize wasted resources by making it fit your needs exactly.
You will also bypass a lot of operating-system-specific weirdness that limits the effectiveness of TCP with high numbers of connections: TIME_WAIT bloat and tens to hundreds of OS-specific settings which will need tweaking for every user of your P2P app if it needs those high numbers. A test app I made, which could run either over UDP with ACKs or over TCP, showed only a 10% performance difference between operating systems when using UDP. TCP performance was always lower than the best UDP result, and it varied wildly, by over 600%, depending upon the OS. With tweaks you can make most OSes perform roughly the same using TCP, but by default most are not properly tuned.
So in my opinion it is harder to make a reliable UDP P2P network than a TCP one, but it is often needed. However, I would only advise that route if you are quite experienced with networking, as there are a lot of "gotchas" to deal with. There are libraries which help with this, like RakNet or ENet. They provide ways to do reliable UDP, but it still takes more networking knowledge to understand how this all ties together, whereas with TCP it is mostly hidden from you.
In a peer-to-peer network you often have messages like NODE PINGs where you don't care if each one is received, you just care whether you have received one recently. I.e. you may send a ping every 10 seconds and disconnect a node after 60 seconds of no ping. That would require 6 ping packets in a row to fail, which is highly unlikely unless the node is really down. If you received even one ping in that 60-second window, the node is still active. A TCP implementation of this would involve more latency and bandwidth, as it makes sure EACH ping message gets through and will block any other data going out until it does. And since you cannot rely on TCP to reliably tell you if a connection is dead, you are forced to add similar PING features for TCP anyway, on top of all the other things TCP is already doing with your packets.
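As a rough illustration of that bookkeeping (the 10 s / 60 s numbers are just the ones from the paragraph above, not a prescribed design):

    import time

    PING_INTERVAL = 10   # seconds between outgoing pings
    NODE_TIMEOUT = 60    # forget a node after this long without hearing from it

    last_seen = {}       # node address -> time of last received ping

    def on_ping(addr):
        """Record that a PING datagram just arrived from `addr`."""
        last_seen[addr] = time.monotonic()

    def reap_dead_nodes():
        """Drop nodes that have been silent for NODE_TIMEOUT seconds."""
        now = time.monotonic()
        for addr in [a for a, t in last_seen.items() if now - t > NODE_TIMEOUT]:
            del last_seen[addr]
            print("node considered down:", addr)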
Games also often have data where a missed packet is no big deal, because more packets arrive a few milliseconds later and supersede the missed ones. I.e. the player is moving from A to Z over a span of 1 second, and his client sends out a packet roughly every 40 milliseconds: ABCDEFG__I__KLMNOPQRSTUVWXYZ. Do we really care if we miss "H" and "J" when we are receiving updates every 40 ms? Not really; this is where prediction can come into it, but that is usually not relevant to most P2P projects. If that was TCP instead of UDP, it would increase bandwidth requirements and add latency to the rest of the packets being received, as the data would be resent until it arrives, on top of the extra latency TCP already adds by ACKing everything.
Essentially you can lower latency and network overhead for many messages in a peer-to-peer network by using UDP. However, there will always be some messages which NEED to be sent reliably, and that requires you to implement some reliable way to get packets to that node, similar to what TCP does. This is where you need some level of expertise if you want a reliable peer-to-peer network. Some things to look into include sequencing packets with a number, message ACKs, etc.
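For the messages that do need reliability, the simplest scheme is stop-and-wait: number each packet, wait for a matching ACK, and retransmit on timeout. A hedged sketch (the peer address, retry count and timeout are arbitrary placeholders; real libraries like RakNet/ENet do far more than this):

    import socket
    import struct

    PEER = ("198.51.100.7", 9000)   # placeholder peer address

    def send_reliable(sock, seq, payload, retries=5, timeout=1.0):
        """Send one datagram tagged with a sequence number; retransmit until ACKed."""
        packet = struct.pack("!I", seq) + payload
        sock.settimeout(timeout)
        for _ in range(retries):
            sock.sendto(packet, PEER)
            try:
                data, _addr = sock.recvfrom(64)
            except socket.timeout:
                continue                                   # no ACK yet, resend
            if len(data) >= 4 and struct.unpack("!I", data[:4])[0] == seq:
                return True                                # peer acknowledged this seq
        return False                                       # give up after `retries` tries

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    ok = send_reliable(sock, seq=1, payload=b"message that must arrive")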
If you care a lot about efficiency or really need tens of thousands of connections, then implementing your specific protocol on UDP will always be better than TCP. But there are cases to be made for TCP, like if the time to make the project matters or if you are new to network programming.
If I remember correctly, using TCP to connect to 1000 servers would mean I had to use 1000 ports to handle these connections.
You remember wrong.
Take a web server which is listening on port 80 and can handle 1000s of connections at the same time on this single port. This is because a connection is defined by the tuple of {client-ip,client-port,server-ip,server-port}. And while server-ip and server-port are the same for all connections to this server the client-ip and client-port are not. Even if the client-ip is the same (i.e. same client) the client would pick a different source port.
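You can see the tuple at work from any client: open two connections to the same server port and compare the local addresses (the host below is just a placeholder):

    import socket

    # Both connections target the same server ip:port, yet they are distinct
    # because each one gets its own ephemeral local (source) port.
    a = socket.create_connection(("example.com", 80))
    b = socket.create_connection(("example.com", 80))
    print(a.getsockname(), "->", a.getpeername())   # e.g. ('10.0.0.5', 51712) -> (..., 80)
    print(b.getsockname(), "->", b.getpeername())   # e.g. ('10.0.0.5', 51713) -> (..., 80)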
... with e.g. 1000 nodes without tweaking the system?
This depends on the system since each of the open connections needs to preserve the state and thus needs memory. This might be a problem for embedded systems with only little memory.
In any case: if your protocol is just sending small messages and if packet loss, reordering or duplication are acceptable, then UDP might be the better choice because the overhead (connection setup, ACKs...) is smaller and it takes less memory. You could also use a single socket to exchange data with all 1000 nodes, whereas with TCP you would need a separate socket for each connection (a socket is not the same as a port!). Using only a single socket might allow for a simpler application design.
I want to amend the answer by Steffen with a few points:
1000 connections are nothing for any normal computer and OS.
UDP fits your requirements. It might be easier to program because it is message oriented, whereas TCP provides a stream of bytes and you need to layer a messaging protocol on top of that, which is not that easy (see the framing sketch after this list). Also, you need to handle broken TCP connections by reconnecting.
Ports are not scarce. No problem with consuming 1000 ports.
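To illustrate the framing point from above: with UDP each datagram is already one message, while over TCP you typically prefix each message with its length so the receiver knows where it ends. A minimal Python sketch of such length-prefix framing (one possible convention, not the only one):

    import struct

    def send_message(sock, payload: bytes) -> None:
        # 4-byte big-endian length prefix, then the message body.
        sock.sendall(struct.pack("!I", len(payload)) + payload)

    def recv_exactly(sock, n: int) -> bytes:
        # TCP may deliver the bytes in arbitrary chunks, so keep reading until done.
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed the connection")
            buf += chunk
        return buf

    def recv_message(sock) -> bytes:
        (length,) = struct.unpack("!I", recv_exactly(sock, 4))
        return recv_exactly(sock, length)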

TCP server vs HTTP server in Vert.x

What is the difference between a TCP server/net server in Vert.x and an HTTP server?
What are the use cases for each?
I tried googling and went through the official website, none of them have a clear explanation.
First off, in general networking there are two common ways of handling communication: over TCP (Transmission Control Protocol) or over UDP (User Datagram Protocol). The most important difference between the two is that UDP keeps sending out datagrams without checking whether the packets made it to the other side of the line. This is useful in situations where guaranteed delivery isn't much of an issue and speed is important. Many VoIP services (Skype, Hangouts) and streaming platforms (YouTube, I think) use UDP for their real-time traffic, as it brings huge performance gains and it doesn't matter all that much if a frame is lost along the way, as the person could simply repeat themselves.
TCP, on the other hand, is reliable by default. It performs a handshake to establish the connection and continuously acknowledges received data, so it can keep the connection alive and make sure that all packets are received on the other side of the line.
Now, there are a LOT of protocols out there in the Wild Wild West called Internet.
List of TCP and UDP port numbers
As you can see, a lot of protocols run over either TCP or UDP. HTTP on its own is a TCP-based protocol that conventionally uses port 80 (as you might know). Therefore, HTTPServer is pretty much just an extension of a TCPServer, but with add-ons such as HTTP parsing and REST-style routing. These add-ons are very welcome, as HTTP processing is a pretty common use case. Without HTTPServer, you would need to write loads of that functionality on your own.
There are many articles online explaining the difference between HTTP and TCP, so here is: http://www.differencebetween.net/technology/internet/difference-between-tcp-and-http/
Vert.x naturally offers the capacity to do "raw" networking at the TCP level or at the HTTP level, the latter offering facilities to deal with the protocol, including decoding the incoming TCP byte stream into HTTP requests, supporting the creation of HTTP responses, and so on.
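The same layering can be shown outside Vert.x; in Python, for instance, a raw TCP client has to write and parse the HTTP bytes itself, while an HTTP client library does that protocol work for you (the host below is a placeholder):

    import socket
    import http.client

    # Raw TCP: you write the request bytes and get the response bytes back unparsed.
    raw = socket.create_connection(("example.com", 80))
    raw.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
    print(raw.makefile("rb").read()[:80])   # status line, headers, body -- all raw bytes
    raw.close()

    # HTTP layer: same TCP connection underneath, but parsed into a structured response.
    conn = http.client.HTTPConnection("example.com", 80)
    conn.request("GET", "/")
    resp = conn.getresponse()
    print(resp.status, resp.getheader("Content-Type"))
    conn.close()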

Should I be using sockets or packet capture? (Perl)

I'm trying to spec out the foundations for a server application whose purpose will be to:
1. 'receive' TCP and/or UDP packets
2. interpret the contents (i.e. header values)
To add more detail, this server will receive SIP INVITEs and respond with a '302 redirect'.
I have experience with Net::Pcap and perl, and know I could achieve this by looping for filtered packets, decoding and then using something like Net::SIP to respond.
However, there's a lot of bloat in both of these modules/applications that I don't need. The server will be under heavy load, and if I run tcpdump on its own it loses packets in the kernel due to the load, so I worry this won't be appropriate :(
Should I be able to achieve the same thing by 'listening' on a socket (using IO::Socket for example) and decoding a packet?
Unfortunately, from debugging it's hard to tell whether IO::Socket will give me the opportunity to see a raw packet, or whether it automatically decodes the message into a readable format.
tl;dr: I want to capture lots of SIP INVITEs, analyse the header values, and respond with a SIP 302 redirect. Is there a better way than using tcpdump (via Net::Pcap) to achieve this?
Thanks,
Moose
Is there a better way than using tcpdump (via Net::Pcap) to achieve this?
Yes. Using libpcap (that's what you meant instead of tcpdump in that question) is a bad way to implement a TCP-based service, as you will have to reimplement much of TCP yourself (libpcap gives you raw network-layer packets), and the packets your program gets will also get delivered to the Internet protocol stack on your machine, so:
if there's nothing on your machine listening on the TCP port to which the other machines are trying to connect, the connection requests will get a RST from the TCP code and think the connection attempt failed;
if there is something on your machine listening on that port, it'll probably accept the connection, and it and your program will both try to communicate with the other machine, which will probably confuse its TCP stack and cause various bad and random things to happen.
It's not much better for UDP:
if there's nothing on your machine listening on the UDP port to which the other machines are sending, the incoming datagrams will probably trigger an ICMP Port Unreachable message from the UDP code, which may make the sender think the request failed;
if there is something on your machine listening on that port, it'll probably receive the datagrams, and it and your program will both try to communicate with the other machine, which will probably confuse its SIP stack and cause various bad and random things to happen.
IO::Socket will probably not give you raw packets, and that's a good thing; you won't have to implement your own IP and TCP/UDP stack. If your goal is to implement a redirect server on your machine, you have no need to receive raw packets; you want to receive SIP INVITEs with all the lower-level processing done for you by your machine's IP/TCP/UDP stack.
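The question is Perl/IO::Socket, but the socket-level mechanics are the same everywhere; here is a rough Python sketch of receiving INVITEs on a bound UDP socket (not a compliant SIP responder: building a correct 302 reply still means copying Via/From/To/Call-ID/CSeq headers and adding a Contact, which is what Net::SIP handles for you):

    import socket

    SIP_PORT = 5060   # standard SIP port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", SIP_PORT))

    while True:
        # The kernel hands us the complete application payload of each datagram;
        # no packet capture, no IP/UDP header parsing required.
        data, addr = sock.recvfrom(65535)
        request_line = data.split(b"\r\n", 1)[0]
        if request_line.startswith(b"INVITE"):
            print("INVITE from", addr, ":", request_line.decode(errors="replace"))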
If you already have a SIP implementation on your machine, and you want to act as a "firewall" for it, so that, for some INVITEs, you send back a 302 redirect and prevent the SIP implementation on your machine from ever seeing the INVITEs in question, you will need to use the same mechanism that your particular OS uses to implement firewalls. There is no libpcap-like wrapper for those mechanisms, as far as I know.

Single source pushing: how to send 5 KB every 5 minutes to 50,000 clients

I need to implement a client-server architecture where the server sends the same message to many clients over the internet.
I need to send a single message about every 5 minutes.
The message won't exceed 5 KB.
I need the solution to scale to a big number of connected clients (50,000-100,000).
I considered a bunch of solutions:
TCP Sockets
UDP Multicast
WCF http duplex service (comet)
I think I have to discard the UDP solution because it is only good for clients on the same network and it won't work over the internet.
I read somewhere that WCF multicast will cause a bottleneck if I have many clients connected, but I can't find any documentation showing performance statistics.
TCP sockets seem to me the solution to choose.
What do you think about? Am I correct?
I was certainly wrong when I said UDP doesn't work on the internet... I thought this because I read some articles pointing out that you need properly configured routers in the network to support multicasting... I read about the UDP multicast address range and thought it was meant for local use only.
Instead, the range 224.0.0.1 - 239.255.255.255 (the Class D address group) can be reached over the internet.
Considering that in my case reliability is not a crucial point, UDP multicast is a good choice.
The .net framework offers really helpful classes to accomplish this.
I can easily start a UdpClient and begin sending data to a multicast address with two lines of code.
On the client side it is really easy too.
There is the UdpSingleSourceMulticastClient class that does exactly what I need.
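Not .NET, but the same two-line idea in Python, just to show how little is involved on the sending side (the group address, port and TTL below are placeholders):

    import socket
    import struct

    MCAST_GROUP = "239.1.2.3"   # example address from the administratively scoped range
    MCAST_PORT = 5007

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL limits how far the datagrams may travel (1 = stay on the local subnet).
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", 8))
    sock.sendto(b"update payload, up to ~5 KB", (MCAST_GROUP, MCAST_PORT))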
As for reliability and security, the .NET framework has a smart and simple way of handling DoS attacks, DNS rebinding attacks and reverse tunnel attacks, described here: http://msdn.microsoft.com/en-us/library/ee707325(v=vs.95).aspx
The main question is: Do you care if the updates get to the clients?
If you DO, then you will need to build something on top of UDP to add reliability. UDP datagrams are NOT reliable, so you should expect that some won't get to the destination. This is more likely if you are pushing UDP datagrams out quickly. Note that your clients might also get multiple copies of the same datagram in some situations with UDP.
50-100k connections with this level of traffic shouldn't be that difficult to achieve with TCP if you have a decent architecture.
See here for some blog posts that I've done on the subject.
http://www.serverframework.com/asynchronousevents/2010/10/how-to-support-10000-concurrent-tcp-connections.html
http://www.serverframework.com/asynchronousevents/2010/10/how-to-support-10000-or-more-concurrent-tcp-connections---part-2---perf-tests-from-day-0.html
http://www.serverframework.com/asynchronousevents/2010/12/one-million-tcp-connections.html
And here's some example code that deals with sending data to many clients.
http://www.serverframework.com/ServerFramework/latest/Docs/examples-datadistributionservers.html
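As a rough illustration of the unicast fan-out approach (not the framework from the links above), an asyncio sketch in Python might look like this (the port, payload size and interval are placeholders matching the question):

    import asyncio

    clients = set()   # StreamWriter for every connected client

    async def handle_client(reader, writer):
        clients.add(writer)
        try:
            await reader.read()              # park here until the client disconnects
        finally:
            clients.discard(writer)

    async def broadcast_forever(message: bytes, interval: float = 300.0):
        while True:
            await asyncio.sleep(interval)    # one message every 5 minutes
            writers = list(clients)
            for w in writers:
                w.write(message)
            # drain() applies back-pressure; ignore clients that dropped mid-send.
            await asyncio.gather(*(w.drain() for w in writers), return_exceptions=True)

    async def main():
        server = await asyncio.start_server(handle_client, "0.0.0.0", 8888)
        asyncio.create_task(broadcast_forever(b"x" * 5000))   # ~5 KB payload
        async with server:
            await server.serve_forever()

    asyncio.run(main())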
Unicast (TCP sockets) will work fine for a relatively small amount of traffic such as this, but keep on top of multicasting technology; the situation is changing every year.