IPv6 address entropy and hashing

I have one thread that delegates incoming IPv6 connections to X worker threads, where connections with the same IPv6 source address should always be sent to the same worker thread. However, given a large amount of connections from all over the Internet, I also want to minimize the differences in load all X worker threads have to take on (i.e. in any given large time interval all worker threads should be entrusted with connections with approximately the same number of unique IPv6 addresses).
The most naive approach would be to interpret all or a section of the IPv6 address as a number N and send the connection to the worker thread with index [N mod X]. However, IP addresses have low entropy (some bits are set or unset far more often than others, so they are far from uniformly random), and that would result in poor load balancing among the worker threads.
Clearly some kind of low-cost hashing needs to be performed on the addresses to get keys with higher entropy. So are there documented solutions for this problem? It doesn't need to produce fantastic randomness, just something known to work reasonably well with IPv6 addresses.
Edited to remove the sentence about avoiding collisions (because in this scenario, collisions are actually totally fine).
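One common pattern that matches this requirement is to hash the 16 raw address bytes with a cheap keyed or cryptographic hash and take the result mod X. A minimal sketch (the worker count and hash choice here are illustrative assumptions, not a prescribed solution):

```python
import hashlib
import ipaddress

NUM_WORKERS = 8  # X, chosen arbitrarily for this sketch

def worker_index(ipv6_str: str) -> int:
    """Map an IPv6 source address to a worker thread index.

    Hashing all 16 raw address bytes spreads the structured
    (low-entropy) bits across the whole digest, so taking the
    result mod X gives a near-uniform assignment while staying
    deterministic per address.
    """
    raw = ipaddress.IPv6Address(ipv6_str).packed  # 16 bytes
    digest = hashlib.blake2b(raw, digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_WORKERS

# The same address always maps to the same worker:
assert worker_index("2001:db8::1") == worker_index("2001:db8::1")
```

In a C implementation the same idea is usually done with a non-cryptographic keyed hash (e.g. a SipHash- or Jenkins-style function), since collisions are fine here and only distribution quality matters.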

Related

I want to know the exact concept of virtual address

Virtual address is described as linear address in some places, and logical address in others.
I'd like to know which one is right with the clear concept of virtual address.
The concept of virtual addresses is that you have a fake/pretend address space and convert/map it somehow onto the real/physical address space for one or more reasons (to improve flexibility, to improve portability, to improve security, etc.). How this is implemented in practice doesn't really affect the theoretical concept.
For the implementation of the concept on 80x86; virtual addresses are converted into linear addresses using segmentation, then linear addresses are converted into physical addresses using paging. However; segmentation can be configured so that "virtual = linear" (by setting segment bases to zero and segment limits to max., including in 64-bit code if FS and GS are configured so that they do nothing); and paging can be disabled resulting in "linear = physical"; and if neither segmentation nor paging are used you end up with "virtual = linear = physical".
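The two-stage pipeline can be illustrated with a toy model (this ignores segment limits, protection checks, and multi-level page tables; page size and the example mapping are arbitrary assumptions):

```python
PAGE_SHIFT = 12  # 4 KiB pages, as on 80x86

def translate(virtual: int, seg_base: int, page_table: dict) -> int:
    """Toy model of the 80x86 pipeline: segmentation first
    (virtual -> linear), then paging (linear -> physical)."""
    linear = seg_base + virtual
    frame = page_table[linear >> PAGE_SHIFT]  # linear page -> physical frame
    offset = linear & ((1 << PAGE_SHIFT) - 1)
    return (frame << PAGE_SHIFT) | offset

# With segment base 0 (how most OSes configure it), virtual == linear:
table = {0: 5}  # linear page 0 lives in physical frame 5
assert translate(0x123, seg_base=0, page_table=table) == 0x5123
```

With `seg_base=0` the first stage is the identity, which is exactly the "virtual = linear" configuration described above.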
Most operating systems for 80x86 don't use segmentation but do use paging; so virtual addresses can be described as linear addresses for most operating systems (and most applications) on 80x86; but "technically can" isn't a good reason for increasing confusion and almost nobody would call them linear addresses (instead of virtual addresses) without a reason - normally you'd only see the word "linear" used if the difference might matter.
For logical addresses, I have no idea where you saw that, and without context I'd (correctly or incorrectly) assume it's related to storage space and has nothing to do with memory (e.g. "logical block address" as an alternative to "cylinder, head, sector addressing" for old hard disks).
The three basic concepts you need to know:
Physical - An actual, specific device
Logical - A redirection to a device
Virtual - A simulated device
In ye olde days before large-memory systems, virtual and logical were often conflated in regard to addresses. In reality, there is no such thing as a virtual address. A logical address can map to nothing at all, to a physical address, or to memory that is simulated virtually.
You can have virtual memory that is accessed by logical addresses.

Beacon size vs message size in Wireless Ad-Hoc Networks

I'm working on neighbor discovery protocols in wireless ad-hoc networks. There are many protocols that rely only on beacon messages between nodes when the discovery phase is going on. On the other hand, there are other approaches that try to transmit more information (like a node's neighbor table) during the discovery, in order to accelerate it. Depending on the time needed to listen to those messages the discovery latency and power consumption varies. Suppose that the same hardware is used to transmit them and that there aren't collisions.
I read that beacons can be sent extremely fast (easily under 1 ms), but I haven't found anything about how long it takes to send/receive a bigger message. Let's say a message carrying around 50-500 numbers representing all the info about your neighbors. How much extra power is needed?
Update
Can this bigger message be divided into a bunch of beacon-sized messages? If so, then I suppose the power used to transmit/listen grows linearly.
One possible solution is to divide the transmission into N different beacon-like messages, each carrying a small amount of extra information so they can be put back together. In this way, the power used grows linearly with N.
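Under the stated assumptions (same radio, no collisions), the linear growth can be sketched with a back-of-the-envelope energy model; the bitrate, per-fragment header size, and transmit power below are illustrative assumptions, not measured values:

```python
BITRATE_BPS = 250_000   # e.g. an 802.15.4-class radio (assumed)
TX_POWER_MW = 30.0      # radio power draw while transmitting (assumed)
HEADER_BYTES = 8        # per-fragment header incl. reassembly info (assumed)

def tx_energy_mj(payload_bytes: int, fragments: int) -> float:
    """Energy (millijoules) to transmit a payload split into
    `fragments` beacon-sized pieces, each with its own header."""
    total_bits = (payload_bytes + fragments * HEADER_BYTES) * 8
    airtime_s = total_bits / BITRATE_BPS
    return TX_POWER_MW * airtime_s

# 500 two-byte neighbor entries, sent whole vs. in 10 fragments:
whole = tx_energy_mj(1000, 1)
split = tx_energy_mj(1000, 10)
# Fragmentation only costs the extra headers, so energy grows
# roughly linearly in N, matching the intuition above.
```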

k means clustering of ip address data in matlab

I have IP address data and I want to apply k-means clustering to it. How can I apply it? Can I map the data into a 4-dimensional space? I.e. the data has
10.0.11.4
10.0.7.4
10.0.8.4
10.0.14.4
then can I map the data as 4 dimensions, i.e.
10 0 11 4
10 0 7 4
10 0 8 4
10 0 14 4
That sounds like a terrible idea. It will lead to quite meaningless clusters (close IPs are often unrelated, and one IP can host multiple sites, so the same IP might host a legal car shop and illegal material).
Did you know that every IP is one number?
The four-numbers-with-dots is just a bit easier to use for manual network management. But what you see is simply a four-byte integer. The IP 127.0.0.1 written in hex is 0x7F000001 and as decimal 2130706433.
I haven't recently verified this, but I'm pretty sure all browsers still have to support the decimal notation of IPs, too. If you have a webserver on your localhost, try accessing it via http://2130706433/, or try ping 2130706433 on the command line.
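The dotted-quad/integer equivalence is easy to check with Python's standard library:

```python
import ipaddress

ip = ipaddress.IPv4Address("127.0.0.1")
print(int(ip))       # 2130706433
print(hex(int(ip)))  # 0x7f000001

# And back again:
assert str(ipaddress.IPv4Address(2130706433)) == "127.0.0.1"
```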
K-means on the four-byte data space would only make sense if there were some pattern to how IP addresses are assigned, i.e. 10.1.123.45 and 10.2.123.45 would always need to have as much in common as two IPs that differ only in the last byte.
It depends on why you want to do that.
For example, if you want to do this to check which IP addresses are in the same subnet or are geographically closest, then k-means will clearly fail. Because the following values (for example), as per k-means,
10.0.4.1
9.0.4.1
are close to each other, but in reality they might be geographically far apart. So as I said, it all depends on why you want to run k-means on IP addresses.
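The point can be seen directly from the distance k-means actually minimizes; treating octets as coordinates, those two addresses come out nearly identical even though they belong to entirely different networks (a plain-Python sketch, no MATLAB needed):

```python
import math

def octets(ip: str) -> list:
    """Split a dotted-quad IP into its four octets as integers."""
    return [int(p) for p in ip.split(".")]

a = octets("10.0.4.1")
b = octets("9.0.4.1")     # a different /8 network entirely
c = octets("10.0.4.200")  # same /24 subnet as a

# Euclidean distance is the metric k-means minimizes:
print(math.dist(a, b))  # 1.0   -> "close" per k-means, unrelated networks
print(math.dist(a, c))  # 199.0 -> "far" per k-means, same subnet
```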

Graphs: Nodes with max degree of 4, each node tries to connect to 4 nearest nodes - how many connections lost?

I am currently looking into spatially embedded networks, and am having trouble finding an answer for the following. Let's say I have a network with N nodes. Each of these nodes is located in space and has 4 ports from which a connection can be made (meaning each node has a maximum degree of 4). Each node tries to connect with its 4 closest neighbors. (Links are bidirectional.)
The following problem occurs:
As connections are added, certain nodes will reach their maximum degree. Another node may have this node as one of its closest neighbors, but can't connect to it because the other node cannot accept anymore connections. These 'attempts' are highlighted in red.
Is there any way to generalize this problem and determine how many connections are possible within the graph? How does the ordering in which connections are made affect the results?
The order in which connections are made is irrelevant. Suppose the connections are all established at once, without the maximum-degree restriction. After that, some of the nodes may have more connections than allowed, so you have to remove some of the newly added connections to satisfy the restriction. The number of connections that have to be removed is determined, though which particular connections are removed is not.
If you have a criterion that evaluates your network, you can use a dynamic programming approach to remove invalid connections from the newly established network so as to maximize the value of the network.
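One way to experiment with the order-dependence question is a small simulation. This sketch greedily lets each node attempt links to its 4 nearest neighbors, counting attempts that fail because a port is full (the node positions and the processing order are arbitrary assumptions):

```python
import math
import random

MAX_DEGREE = 4  # each node has 4 ports

def build_network(positions):
    """Each node tries to link to its MAX_DEGREE nearest neighbors;
    an attempt is lost when either endpoint's ports are already full."""
    n = len(positions)
    degree = [0] * n
    edges = set()
    lost = 0
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(positions[i], positions[j]))
        for j in nearest[:MAX_DEGREE]:
            edge = (min(i, j), max(i, j))
            if edge in edges:
                continue  # link already made from the other side
            if degree[i] >= MAX_DEGREE or degree[j] >= MAX_DEGREE:
                lost += 1  # a port is full: the attempt fails
                continue
            edges.add(edge)
            degree[i] += 1
            degree[j] += 1
    return edges, degree, lost

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(30)]
edges, degree, lost = build_network(pts)
assert max(degree) <= MAX_DEGREE
```

Shuffling the outer loop's node order and re-running shows how much the edge count (as opposed to which edges survive) actually varies.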

Bandwidth measurement by minimum data transfer

I intend to write an application where I will need to calculate the network bandwidth along with latency and packet loss rate. One of the constraints is to measure the bandwidth passively (using the application data itself).
What I have read online and understood from a few existing applications is that almost all of them use active probing techniques (that is, generating a flow of probe packets) and use the time difference between arrival of the first and last packets to calculate the bandwidth.
The main problems with such a technique are that it floods the network with probe packets, takes longer to run, and is not scalable (since we need to run the application at both ends).
One of the suggestions was to calculate the RTT of a packet by echoing it back to the sender and calculate the bandwidth using the following equation:
Bandwidth <= (Receive Buffer size)/RTT.
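To get a feel for the numbers this bound produces (the buffer size and RTT below are illustrative, not from any real setup):

```python
# Illustrative numbers only (assumed, not measured):
recv_buffer_bytes = 64 * 1024  # 64 KiB receive buffer
rtt_s = 0.100                  # 100 ms round-trip time

# Bandwidth <= (receive buffer size) / RTT
bound_bps = recv_buffer_bytes * 8 / rtt_s
print(bound_bps / 1e6)  # ~5.24 Mbit/s
```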
I am not sure how accurate this could be as the receiver may not always echo back the packet on time to get the correct RTT. Use of ICMP alone may not always work as many servers disable it.
My main application runs over a TCP connection so I am interested in using the TCP connection to measure the actual bandwidth offered over a particular period of time. I would really appreciate if anybody could suggest a simple technique (reliable formula) to measure the bandwidth for a TCP connection.
It is only possible to know the available bandwidth by probing the network. This is because an 80% utilized link will still send echo packets without delay, i.e. it will appear to be 0% occupied.
If you instead just wish to measure the bandwidth your application is using, it is much easier. E.g. keep a record of the amount of data you have transferred in the last second divided into 10ms intervals.
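The record-keeping described above can be sketched as a small meter; the bucket and window sizes follow the suggestion, everything else is an assumption for illustration:

```python
import time
from collections import deque

class ThroughputMeter:
    """Track bytes the application has sent in 10 ms buckets over the
    last second, giving a passive measure of used (not available)
    bandwidth."""
    BUCKET_S = 0.010  # 10 ms buckets
    WINDOW_S = 1.0    # look back one second

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.buckets = deque()  # (bucket_start_time, byte_count)

    def record(self, nbytes: int) -> None:
        now = self.clock()
        if self.buckets and now - self.buckets[-1][0] < self.BUCKET_S:
            t, b = self.buckets[-1]
            self.buckets[-1] = (t, b + nbytes)  # same bucket: accumulate
        else:
            self.buckets.append((now, nbytes))  # start a new bucket
        # Drop buckets that have fallen out of the one-second window.
        while self.buckets and now - self.buckets[0][0] > self.WINDOW_S:
            self.buckets.popleft()

    def bytes_per_second(self) -> float:
        return sum(b for _, b in self.buckets) / self.WINDOW_S
```

Calling `record(n)` from the application's send path and reading `bytes_per_second()` periodically gives the used bandwidth without injecting any probe traffic.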
Active probing techniques and their variants are bandwidth estimation algorithms. You don't want to use these algorithms to measure bandwidth. Note the difference between 'measure' and 'estimate'.
If you want to use TCP to measure bandwidth, you should be aware that TCP throughput is influenced by latency.
The easiest way to measure bandwidth using TCP is by sending TCP packets and measuring the transferred bandwidth, but that floods the network. None of the non-flooding algorithms are reliable in high-speed networks. In addition, non-flooding algorithms assume the channel is clear of other traffic; if there is other traffic on the channel, the result will be skewed. I'm not surprised if the result doesn't make sense.