What are the benefits of removing fragmentation from IPv6? - sockets

I am working on a project that includes developing an application using Java sockets. While reading some fundamentals of the newly emerging IPv6 paradigm, I was motivated to ask the question below:
What are the benefits of removing fragmentation from IPv6?
It would be helpful if someone could explain why it was removed.
I have searched the internet but haven't found a useful explanation.

It is a common misunderstanding that there is no IPv6 fragmentation because the IPv6 header doesn't have the fragment-offset field that IPv4 does; however, that's not exactly accurate. IPv6 doesn't allow routers to fragment packets; however, end nodes may insert an IPv6 fragmentation header [1].
As RFC 5722 states [2], one of the problems with fragmentation is that it tends to create security holes. During the late 1990s there were several well-known attacks on Windows 95 that exploited overlapping IPv4 fragments [3]; furthermore, in-line fragmentation of packets is risky to burn into internet router silicon due to the long list of issues that must be handled. One of the biggest issues is that overlapping fragments buffered in a router (awaiting reassembly) could potentially cause a security vulnerability on that device if they are mishandled. The end result is that most router implementations push packets requiring fragmentation to software, and that doesn't scale at large speeds.
The other issue is that if you reassemble fragments, you must buffer them for a period of time until the rest are received. It is possible for someone to exploit this and send very large numbers of unfinished IP fragments, forcing the device in question to spend many resources waiting for an opportunity to reassemble. Intelligent implementations limit the number of outstanding fragments to prevent a denial of service from this; however, limiting outstanding fragments can also limit how many legitimate fragments get reassembled.
In short, there are just too many hairy issues to allow a router to handle fragmentation. If IPv6 packets require fragmentation, host implementations should be smart enough to use TCP Path MTU Discovery. That also implies that several ICMPv6 messages need to be permitted end-to-end; interestingly, many IPv4 firewall admins block ICMP to guard against hostile network mapping (and then naively block all ICMPv6), not realizing that blocking all ICMPv6 breaks things in subtle ways [4].
**END-NOTES:**
1. See Section 4.5 of the Internet Protocol, Version 6 (IPv6) Specification.
2. From RFC 5722: Handling of Overlapping IPv6 Fragments:
   "Commonly used firewalls use the algorithm specified in [RFC1858] to weed out malicious packets that try to overwrite parts of the transport-layer header in order to bypass inbound connection checks. [RFC1858] prevents an overlapping fragment attack on an upper-layer protocol (in this case, TCP) by recommending that packets with a fragment offset of 1 be dropped. While this works well for IPv4 fragments, it will not work for IPv6 fragments. This is because the fragmentable part of the IPv6 packet can contain extension headers before the TCP header, making this check less effective."
3. See the Teardrop attack (Wikipedia).
4. See RFC 4890: Recommendations for Filtering ICMPv6 Messages in Firewalls.
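To make the host-side behaviour concrete, here is a minimal sketch (assuming Linux-specific socket options; the destination uses the 2001:db8::/32 documentation prefix and the port number is arbitrary) that forbids local fragmentation on a UDP socket, so an oversized send fails with EMSGSIZE and the application has to respect the path MTU:

```c
/* Minimal sketch (Linux-specific options assumed): apply Path MTU Discovery
 * on an IPv6 UDP socket instead of letting the host emit fragments.
 * Oversized sends then fail with EMSGSIZE and the application must shrink
 * its datagrams -- the host, not the router, deals with the MTU. */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int s = socket(AF_INET6, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    int pmtud = IPV6_PMTUDISC_DO;                  /* never fragment locally */
    if (setsockopt(s, IPPROTO_IPV6, IPV6_MTU_DISCOVER,
                   &pmtud, sizeof pmtud) < 0)
        perror("setsockopt(IPV6_MTU_DISCOVER)");

    struct sockaddr_in6 dst;
    memset(&dst, 0, sizeof dst);
    dst.sin6_family = AF_INET6;
    dst.sin6_port = htons(9);                      /* arbitrary example port */
    inet_pton(AF_INET6, "2001:db8::1", &dst.sin6_addr);  /* documentation prefix */

    char payload[4000] = {0};                      /* bigger than a 1500-byte MTU */
    if (sendto(s, payload, sizeof payload, 0,
               (struct sockaddr *)&dst, sizeof dst) < 0 && errno == EMSGSIZE)
        printf("datagram exceeds the path MTU; send something smaller\n");

    close(s);
    return 0;
}
```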

I don't have the "official" answer for you, but just based on reading how IPv6 handles datagrams that are too large, my guess would be that it reduces the load on routers. Fragmentation and reassembly incur overhead at the router. IPv6 moves this burden to the end nodes and requires that they perform MTU discovery to determine the maximum datagram size they can send. It stands to reason that the end nodes are better suited for the task because they have less data to process. Effectively, the routers have enough on their plates; it makes sense to force the nodes to deal with it and allow the routers to simply drop anything that exceeds the MTU of the outgoing link.
Ideally, the end result would be that routers can handle a larger load under IPv6 (all things being equal) than they did under IPv4 because there is no fragmentation/reassembly that they have to worry about. That processor power can be dedicated to routing traffic.
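As a small illustration of the end node's role, a hedged sketch (Linux-specific: the IPV6_MTU option only reports a value once the socket is connected; the destination address is again just the documentation prefix) that asks the stack which path MTU it currently assumes:

```c
/* Sketch (Linux-specific): after connect(), ask the stack which path MTU it
 * currently assumes for this destination.  The address below is from the
 * 2001:db8::/32 documentation prefix and is illustrative only. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int s = socket(AF_INET6, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in6 dst;
    memset(&dst, 0, sizeof dst);
    dst.sin6_family = AF_INET6;
    dst.sin6_port = htons(9);
    inet_pton(AF_INET6, "2001:db8::1", &dst.sin6_addr);

    if (connect(s, (struct sockaddr *)&dst, sizeof dst) == 0) {
        int mtu = 0;
        socklen_t len = sizeof mtu;
        if (getsockopt(s, IPPROTO_IPV6, IPV6_MTU, &mtu, &len) == 0)
            printf("path MTU currently assumed for this destination: %d\n", mtu);
    }
    close(s);
    return 0;
}
```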

IPv4 has a guaranteed minimum MTU of 576 bytes, while IPv6 guarantees 1,280 bytes, with 1,500 bytes recommended; the difference is basically performance. Since most end-user LAN segments are 1,500 bytes, this reduces the network-infrastructure overhead of storing fragmentation state caused by what are effectively legacy networks that require smaller sizes.
For UDP, the IPv4 standards say nothing about reconstruction of fragmented packets, which means every platform can handle it differently. IPv6 asserts that fragmentation and reassembly always occur in the IP stack, and fragments are never presented to applications.
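A short sketch of that last point, assuming an arbitrary port number: the receiving application's code is identical whether or not the datagram was fragmented in flight, because the stack only ever hands up complete datagrams (or nothing, if reassembly fails):

```c
/* Sketch: a UDP/IPv6 receiver never sees fragments.  recvfrom() delivers
 * either the whole reassembled datagram or nothing at all (if reassembly
 * fails or times out); the code looks the same whether or not the datagram
 * was fragmented in flight.  Port 5000 is an arbitrary example. */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    int s = socket(AF_INET6, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in6 me;
    memset(&me, 0, sizeof me);
    me.sin6_family = AF_INET6;
    me.sin6_addr = in6addr_any;
    me.sin6_port = htons(5000);
    if (bind(s, (struct sockaddr *)&me, sizeof me) < 0) { perror("bind"); return 1; }

    char buf[65536];                       /* room for a maximum-size datagram */
    ssize_t n = recvfrom(s, buf, sizeof buf, 0, NULL, NULL);
    if (n >= 0)
        printf("received one complete datagram of %zd bytes\n", n);

    close(s);
    return 0;
}
```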

Related

Are PCI "CF8h/CFCh" I/O port addresses only applicable to processors with an I/O address space?

Some CPUs, like x86 processors, have two address spaces: one for memory and one for I/O, with different instructions to access each.
The PCI 3.0 spec also mentions some important I/O addresses:
Two DWORD I/O locations are used to generate configuration
transactions for PC-AT compatible systems. The first DWORD location
(CF8h) references a read/write register that is named CONFIG_ADDRESS.
The second DWORD address (CFCh) references a read/write register named
CONFIG_DATA.
So it seems the PCI 3.0 spec is tightly coupled to processors that implement an I/O address space, and that is the a priori knowledge that SW/FW writers are expected to have.
So what about other processor architectures that don't have an I/O address space, like ARM? How can they interact with the PCI configuration space?
The paragraph immediately preceding the one quoted in the question directly addresses the question. It says:
Systems must provide a mechanism that allows software to generate PCI configuration transactions. ...
For PC-AT compatible systems, the mechanism for generating configuration transactions is defined and specified in this section. ...
For other system architectures, the method of generating configuration
transactions is not defined in this specification.
In other words, systems that are not PC-AT compatible must provide a mechanism, but it is specified elsewhere. The PCI spec isn't tightly coupled to PC-AT systems, but it doesn't define the mechanism for other types of systems.
The paragraph in the question only applies to PC-AT compatible systems.
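For PC-AT compatible systems, the CF8h/CFCh mechanism quoted above can be sketched roughly as follows (user-space Linux/x86 shown purely for illustration; it needs iopl() privileges, real code of this kind lives in firmware or the kernel, and bus 0, device 0, function 0 is just an example target):

```c
/* Sketch of PCI Configuration Mechanism #1 (CF8h/CFCh), x86 only.
 * Needs I/O privilege (iopl(3) as root); normally done in firmware/kernel. */
#include <stdint.h>
#include <stdio.h>
#include <sys/io.h>

#define CONFIG_ADDRESS 0xCF8
#define CONFIG_DATA    0xCFC

static uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off)
{
    uint32_t addr = (1u << 31)              /* enable bit                  */
                  | ((uint32_t)bus << 16)
                  | ((uint32_t)dev << 11)
                  | ((uint32_t)fn  << 8)
                  | (off & 0xFCu);          /* dword-aligned register      */
    outl(addr, CONFIG_ADDRESS);             /* select bus/device/function  */
    return inl(CONFIG_DATA);                /* read the selected dword     */
}

int main(void)
{
    if (iopl(3) != 0) { perror("iopl"); return 1; }

    /* Vendor ID / Device ID of bus 0, device 0, function 0 (often the host
     * bridge on a PC -- shown only as an example target). */
    uint32_t id = pci_cfg_read32(0, 0, 0, 0x00);
    printf("vendor=0x%04x device=0x%04x\n", id & 0xFFFF, id >> 16);
    return 0;
}
```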
The quote below (from here) clears things up:
The method for generating configuration cycles is host dependent. In
IA machines, special I/O ports are used. On other platforms, the PCI
configuration space can be memory-mapped to certain address locations
corresponding to the PCI host bridge in the host address domain.
And
I/O space can be accessed differently on different platforms.
Processors with special I/O instructions, like the Intel processor
family, access the I/O space with in and out instructions. Machines
without special I/O instructions will map to the address locations
corresponding to the PCI host bridge in the host address domain. When
the processor accesses the memory-mapped addresses, an I/O request
will be sent to the PCI host bridge, which then translates the
addresses into I/O cycles and puts them on the PCI bus.
So for non-IA platforms, MMIO can simply be used instead, and the platform spec should document the memory-mapped addresses for the PCI host bridge as the a priori knowledge for SW/FW writers.
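On such platforms the configuration registers are typically reached through a memory-mapped window (ECAM on PCIe). A hedged sketch, assuming a made-up base address ECAM_BASE that would really come from ACPI's MCFG table or the device tree:

```c
/* Sketch of a PCIe ECAM config access: each function gets a 4 KiB window at
 * base + (bus << 20 | device << 15 | function << 12).  ECAM_BASE is a made-up
 * placeholder; the real value comes from ACPI MCFG or the device tree, and
 * the window must actually be mapped (kernel/firmware context) before use. */
#include <stdint.h>
#include <stdio.h>

#define ECAM_BASE ((uintptr_t)0xE0000000u)       /* hypothetical example address */

static volatile uint32_t *ecam_reg(uint8_t bus, uint8_t dev,
                                   uint8_t fn, uint16_t off)
{
    uintptr_t p = ECAM_BASE
                | ((uintptr_t)bus << 20)         /* 256 buses               */
                | ((uintptr_t)dev << 15)         /* 32 devices per bus      */
                | ((uintptr_t)fn  << 12)         /* 8 functions, 4 KiB each */
                | (off & 0xFFCu);                /* dword-aligned register  */
    return (volatile uint32_t *)p;
}

int main(void)
{
    /* Just compute and print the register address; actually dereferencing it,
     * e.g.  uint32_t id = *ecam_reg(0, 0, 0, 0x00);  requires running where
     * the ECAM window is really mapped. */
    printf("config space of 00:00.0, offset 0 -> %p\n",
           (void *)ecam_reg(0, 0, 0, 0x00));
    return 0;
}
```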
ADD 1 - 14:36 2023/2/5
From a digital-design perspective, the host CPU and the PCIe subsystem are just two separate IP blocks, and the communication between them is achieved by a bunch of digital signals in the form of address/data/control lines. As long as the signals can be conveyed, the communication can happen.
For x86 CPUs, the memory address space and the I/O address space are, at bottom, just different uses of the address lines. I don't think there's any strong reason that memory addresses cannot be used to communicate with the PCIe subsystem. Using I/O addresses was probably just the more natural choice back then, because PCIe is regarded as I/O.
So the really critical thing, I think, is to convey the digital signals in the proper format between the IP blocks. PCIe is independent of CPU architectures and does not care which lines are used. For ARM, there's nothing unnatural about using memory addresses, i.e., MMIO. After all, these are digital signals, and they can carry the necessary information either way.

Is transmitting a file over multiple sockets faster than just using one socket?

In this old project (from 2002), it says that if you split a file into multiple chunks and then transmit each chunk over a different socket, the file will arrive much faster than transmitting it as a whole over one socket. I also remember reading (many years ago) that some download managers use this technique. How accurate is this?
Given that a single TCP connection with large windows or small RTT can saturate any network link, I don't see what benefit you expect from multiple TCP sessions. Each new piece will begin with slow-start and so have a lower transfer-rate than an established connection would have.
TCP already has code for high-throughput, high-latency connections ("window scale option") and dealing with packet loss. Attempting to improve upon this with parallel connections will generally have a negative effect by having more failure cases and increased packet loss (due to congestion which TCP on a single connection can manage).
Multiple TCP sessions are only beneficial if you're doing simultaneous fetches from different peers and the network bottleneck is outside your local network (as with BitTorrent), or if the server applies bandwidth limits per connection (at which point you're optimizing around the server, not TCP or the network).
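If the goal is simply to keep one connection's window large on a high bandwidth-delay-product path, a minimal sketch (the 4 MiB buffer size is an arbitrary illustrative value, and modern Linux kernels usually autotune this anyway):

```c
/* Sketch: enlarge the socket buffers so TCP can advertise a large window
 * (window scaling itself is negotiated automatically).  4 MiB is an
 * arbitrary illustrative value; Linux normally autotunes buffers anyway. */
#include <stdio.h>
#include <sys/socket.h>

int main(void) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    int buf = 4 * 1024 * 1024;
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &buf, sizeof buf) < 0)
        perror("SO_RCVBUF");
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &buf, sizeof buf) < 0)
        perror("SO_SNDBUF");

    /* ... connect() and transfer as usual; a single connection can then keep
       the pipe full instead of opening several and multiplying slow-starts. */
    return 0;
}
```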

bandwidth overheads for mail with ssl

I have been reading that the overhead of encryption in the HTTP protocol is negligible. Is the same true for SMTP?
If I send an encrypted mail, will the bandwidth consumption be significantly larger?
I did a research project on this a few years ago: 1700 data points, changing every parameter we could think of, including SSL vs plaintext. We discovered, rather to my surprise, that over practical Internet links SSL was 33% as fast as plaintext. I expected it to be much slower.
There are too many dependencies and variables: key size, chain length, protocol used, session-key (re-)negotiation, timing issues, etc. It is probably best to test this with a good packet analyzer like Wireshark. I would not say the overhead is negligible, especially not if you send many small messages. In that case you might consider creating a tunnel instead, to avoid the per-connection setup overhead; key renegotiation will then play a part as well.
To minimize SSL traffic you might also look into ECC, stream ciphers, single-certificate chains, etc., but I would advise you to go that direction only if absolutely necessary, as it is a more complex and dangerous route.
[EDIT] If you send large messages of 400K each, then I would think the overhead is not that large. The ciphertext size won't be much different from the plaintext size, so the overhead is minimized.
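As a rough back-of-the-envelope check (assuming TLS's usual maximum record payload of about 16 KB and a few dozen bytes of header, MAC and padding per record): a 400K message splits into roughly 25 records, so the steady-state record overhead is on the order of 1 KB, i.e. well under 1% of the message size, with the handshake adding a few more kilobytes once per connection.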

what is the difference between memory mapped io and io mapped io

Please explain the difference between memory-mapped I/O and I/O-mapped I/O.
Uhm... unless I misunderstood, you're talking about two completely different things. I'll give you two very short explanations so you can google up what you need to know.
Memory-mapped I/O means mapping I/O hardware devices' memory into the main memory map. That is, there will be addresses in the computer's memory that won't actually correspond to your RAM, but to internal registers and memory of peripheral devices. This is the machine architecture Pointy was talking about.
There's also memory-mapped file I/O, which means taking (say) a file and having the OS map portions of it into memory for faster access later on. In Unix, this can be accomplished through mmap().
I hope this helped.
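For the second sense, a minimal sketch of memory-mapped file I/O (the file name "example.txt" is just a placeholder):

```c
/* Sketch: memory-mapped *file* I/O with mmap() -- the other sense of
 * "mapped I/O" mentioned above.  "example.txt" is a placeholder name. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("example.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return 1; }

    /* Map the whole file read-only; the OS pages it in on demand. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    fwrite(p, 1, st.st_size, stdout);     /* use it like ordinary memory */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```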
On x86 there are two different address spaces, one for memory, and another one for I/O ports.
The port address space is limited to 65536 ports, and is accessed using the IN/OUT instructions.
As an example, a video card's VGA functionality can be accessed using some I/O ports, but the framebuffer is memory-mapped.
Other CPU architectures only have one address space. In those architectures, all devices are memory-mapped.
Memory mapped I/O is mapped into the same address space as program memory and/or user memory, and is accessed in the same way.
Port mapped I/O uses a separate, dedicated address space and is accessed via a dedicated set of microprocessor instructions.
As 16-bit processors slowly become obsolete and are replaced by 32-bit and 64-bit processors in general use, reserving ranges of the memory address space for I/O is less of a problem, because the processor's memory address space is usually much larger than the space required for all memory and I/O devices in a system.
Therefore, it has become more frequently practical to take advantage of the benefits of memory-mapped I/O.
The disadvantage to this method is that the entire address bus must be fully decoded for every device. For example, a machine with a 32-bit address bus would require logic gates to resolve the state of all 32 address lines to properly decode the specific address of any device. This increases the cost of adding hardware to the machine.
The advantage of IO Mapped IO system is that less logic is needed to decode a discrete address and therefore less cost to add hardware devices to a machine. However more instructions could be needed.
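To make the contrast concrete, a hedged sketch (the MMIO register address is fictitious and never dereferenced; port 80h is the traditional PC diagnostic port, used here as a harmless target; the port-I/O half is x86/Linux-specific and needs root):

```c
/* Sketch contrasting the two schemes.  DEVICE_REG is a made-up MMIO address
 * and is never dereferenced here (it would fault without a real mapping);
 * port 0x80 is the traditional PC diagnostic port, used as a harmless target.
 * The port-I/O half is x86/Linux-specific (sys/io.h, needs ioperm/root). */
#include <stdint.h>
#include <stdio.h>
#include <sys/io.h>

#define DEVICE_REG ((volatile uint32_t *)0x10000000u)  /* hypothetical MMIO register */
#define DIAG_PORT  0x80                                /* real but harmless I/O port */

int main(void)
{
    /* Memory-mapped I/O: the register is just an address, reached with
     * ordinary loads/stores through a volatile pointer, e.g.:
     *     uint32_t v = *DEVICE_REG;     *DEVICE_REG = v | 1u;
     * (not executed here because the address is fictitious).              */

    /* Port-mapped I/O: a separate address space, reached only with the
     * dedicated IN/OUT instructions (wrapped by inb/outb).                */
    if (ioperm(DIAG_PORT, 1, 1) == 0) {
        outb(0x55, DIAG_PORT);             /* OUT: write one byte           */
        uint8_t v = inb(DIAG_PORT);        /* IN : read one byte back       */
        printf("read 0x%02x from port 0x%02x\n", v, DIAG_PORT);
    } else {
        perror("ioperm (needs root on Linux/x86)");
    }
    return 0;
}
```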
One more concrete difference between the two: a memory-mapped I/O device is one that responds when the processor's IO/M signal is low, while an I/O-mapped (peripheral-mapped) device is one that responds when IO/M is high.
The difference between the two schemes lies in the microprocessor/microcontroller architecture: Intel has, for the most part, used the I/O-mapped scheme for its microprocessors, while Motorola has used the memory-mapped scheme.
https://techdhaba.com/2018/06/16/memory-mapped-i-o-vs-i-o-mapped-i-o/

Are there applications where the number of network ports is not enough?

In TCP/IP, the port number is specified by a 16-bit field, yielding a total of 65,536 port numbers. However, the lower range (I don't really know how far it goes) is reserved for the system and cannot be used by applications. Assuming that 60,000 port numbers are available, that should be more than plenty for most network applications. However, MMORPG games often have tens of thousands of concurrently connected users at a time.
This got me wondering: Are there situations where a network application can run out of ports? How can this limitation be worked around?
You don't need one port per connection.
A connection is uniquely identified by a tuple of (host address, host port, remote address, remote port). It's likely your host IP address is the same for each connection, but you can still service 100,000 clients on a single machine with just one port. (In theory: you'll run into problems, unrelated to ports, before that.)
The canonical starter resource for this problem is Dan Kegel's C10K page from 1999.
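A minimal sketch of that point (port 8080 is arbitrary): every connection accepted below shares the one listening port, and getsockname() on each accepted socket reports that same local port; only the remote address/port half of the tuple differs.

```c
/* Sketch: one listening port, many connections.  Every socket returned by
 * accept() still has local port 8080; clients are told apart by their remote
 * address and port, not by extra local ports. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                 /* the single local port */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0) { perror("bind"); return 1; }
    if (listen(lfd, 128) < 0) { perror("listen"); return 1; }

    for (;;) {
        struct sockaddr_in peer;
        socklen_t plen = sizeof peer;
        int cfd = accept(lfd, (struct sockaddr *)&peer, &plen);
        if (cfd < 0) continue;

        struct sockaddr_in local;
        socklen_t llen = sizeof local;
        getsockname(cfd, (struct sockaddr *)&local, &llen);
        printf("peer %s:%u -> local port %u\n",
               inet_ntoa(peer.sin_addr), ntohs(peer.sin_port),
               ntohs(local.sin_port));           /* always prints 8080 */
        close(cfd);
    }
}
```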
The lower range you refer to is probably the range below 1024 on most Unix like systems. This range is reserved for privileged applications. An application running as a normal user can not start listening to ports below 1024.
An upper range is often used by the OS for return ports and NAT when creating connections.
In short, because of how TCP works, ports can run out if a lot of connections are made and then closed. The limitation can be mitigated to some extent by using long-lived connections, one for each client.
In HTTP, this means using HTTP 1.1 and keep-alive.
There are 2^16 = 65,536 ports per IP address. In other words, for a computer with one IP address to run out of ports, it would have to use more than 65,536 ports at once, which will almost never happen naturally.
You have to understand that a socket (IP address + port) is the endpoint of the end-to-end communication.
IPv4 addresses are 32 bits, so let's say they can address around 2^32 computers publicly (regardless of NAT).
So there are 2^16 * 2^32 = 2^48 possible public sockets (on the order of 10^14), so there will be no conflict (again regardless of NAT).
However, IPv6 was introduced to allow more public IP addresses.