K-means clustering of IP address data in MATLAB

I have IP address data and want to apply k-means clustering to it. How should I do that? Can I map the data into a 4-dimensional space? For example, the data has
10.0.11.4
10.0.7.4
10.0.8.4
10.0.14.4
Can I then map each address to a 4-dimensional point, i.e.
10 0 11 4
10 0 7 4
10 0 8 4
10 0 14 4

That sounds like a terrible idea. It will lead to quite meaningless clusters: close IPs are often unrelated, and a single IP can host multiple sites, so the same IP might host a legal car shop and illegal material.
Did you know that every IP is one number?
The four-numbers-with-dots notation is just a bit easier to use for manual network management. But what you see is simply a four-byte integer. The IP 127.0.0.1 is 0x7F000001 in hex and 2130706433 in decimal.
I haven't verified this recently, but I'm pretty sure all browsers still have to support the decimal notation of IPs, too. If you have a web server on your localhost, try accessing it via http://2130706433/, or try ping 2130706433 on the command line.
K-means on the four-byte data space would only make sense if there were some pattern to how IP addresses are assigned. I.e., 10.1.123.45 and 10.2.123.45 would always need to have as much in common as two IPs that are consecutive in the last byte.
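The one-number view of an IP described in this answer can be checked with a short Python sketch. It uses only the standard `ipaddress` module; the helper names `ip_to_int` and `ip_to_octets` are made up for illustration.

```python
import ipaddress


def ip_to_int(ip):
    """Return the IPv4 address as the single 32-bit integer it really is."""
    return int(ipaddress.IPv4Address(ip))


def ip_to_octets(ip):
    """Return the four bytes of the address, i.e. the proposed 4-D point."""
    return list(ipaddress.IPv4Address(ip).packed)


print(ip_to_int("127.0.0.1"))        # 2130706433
print(hex(ip_to_int("127.0.0.1")))   # 0x7f000001
print(ip_to_octets("10.0.11.4"))     # [10, 0, 11, 4]
```

Both representations carry exactly the same information; the four-octet form only changes how distances between addresses come out.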

It depends on why you want to do that.
For example, if you want to check which IP addresses are in the same subnet or are geographically closest, then k-means will clearly fail, because as far as k-means is concerned the following values (for example)
10.0.4.1
9.0.4.1
are close to each other, but in reality they might be geographically far apart. So, as I said, it all depends on why you want to run k-means on IP addresses.
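To make the point concrete, here is a small Python sketch, assuming k-means uses the usual Euclidean distance on the four octets. The helper names are hypothetical. It compares the two addresses above with a third address, 10.0.14.1, which sits in the same 10.0.x.x range.

```python
import ipaddress
import math


def octets(ip):
    """Four-byte representation of an IPv4 address."""
    return list(ipaddress.IPv4Address(ip).packed)


def octet_distance(a, b):
    """Euclidean distance between two addresses in the 4-D octet space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(octets(a), octets(b))))


def integer_gap(a, b):
    """How many addresses apart the two IPs are as 32-bit integers."""
    return abs(int(ipaddress.IPv4Address(a)) - int(ipaddress.IPv4Address(b)))


print(octet_distance("10.0.4.1", "9.0.4.1"))    # 1.0
print(octet_distance("10.0.4.1", "10.0.14.1"))  # 10.0
print(integer_gap("10.0.4.1", "9.0.4.1"))       # 16777216
print(integer_gap("10.0.4.1", "10.0.14.1"))     # 2560
```

In octet space, 9.0.4.1 looks ten times closer to 10.0.4.1 than 10.0.14.1 does, even though it lies in a completely different address block.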

Related

Use of Binomial Theorem in IP address distribution

I am currently making a project on the binomial theorem/distribution for my semester. I need some very interesting real-life applications to add to my project (with an in-depth explanation of each application). I came across these applications:
1. Distribution of Internet Protocol Address (or IP Address)
This method in IP distribution condition where you have been given IP address of the fixed host and number of host are more than total round off then you may use this theorem to distribute bits so that all host may be covered in IP addressing. This method is known as variable sub netting.
2. Weather forecasting
Moreover binomial theorem is used in forecast services. The disaster forecast also depends upon the use of binomial theorems
But I couldn't find an explanation for point 1 anywhere. I know this is somewhat lame, but if any of you can explain it in detail, or simply explain some other real-life application of the binomial theorem/distribution, I would really appreciate it!

PCIe TLP write packet address only 31:2 bits

Let's take a sample write packet : Suppose that the CPU wrote the
value 0x12345678 to the physical address 0xfdaff040 using 32-bit
addressing
This example is from this site (I didn't understand the explanations in the original post)
Why does the address start at the second bit, [31:2]?
Why isn't the address the same?
The address of an aligned 32-bit chunk always has two zero bits at the end. You can think of this either as writing the address of the chunk to the 32-bit slot, or as writing the address divided by four to bits 2 through 31 of that slot. The result is the same either way, since dividing by four is equivalent to shifting two bit positions to the right.
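The arithmetic can be checked with the address from the question. This is a minimal Python sketch, assuming the header field simply holds bits [31:2] of the byte address as described above; the function names are made up for illustration.

```python
def address_field(byte_address):
    """Return bits [31:2] of a 4-byte-aligned address (the address divided by four)."""
    if byte_address & 0b11:
        raise ValueError("address is not aligned to a 32-bit chunk")
    return byte_address >> 2


def byte_address_from_field(field):
    """Recover the full byte address by restoring the two zero bits."""
    return field << 2


addr = 0xFDAFF040
print(bin(addr & 0b11))                                # 0b0, the low two bits are zero
print(hex(address_field(addr)))                        # 0x3f6bfc10
print(hex(byte_address_from_field(address_field(addr))))  # 0xfdaff040
```

Because the two dropped bits are always zero for an aligned chunk, no information is lost by storing only bits 31 down to 2.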

Ethernet cable to DB15 connector

Morning Overflowers,
For a specific in-house application for my company, I need to be able to make a Gigabit Ethernet connection go through DB15 connectors, as seen below.
Here is what I'm trying to achieve:
For the first version, I just cut a Cat 5e Ethernet cable in half. I did not care too much about the pinout from the cable to the DB15 connector, and I ended up with a 10 Mbit/s data rate, which is super low. My cable was also super short, 2 m in total.
For the second version I used a 5 m Cat 6 cable for one side, and the remainder of the other Cat 5e cable (resoldered) for the other side. I was more careful about the pinout and used the 4 leftmost pins to place the Ethernet pairs, as seen here:
This time the data rate is 100 Mbit/s, but still not 1 Gbit/s.
Before going through a third version I thought I'd use my brain a little. I noticed while soldering that although there are 4 pairs inside a Cat 5e/Cat 6 cable, not all of them sit side by side on the RJ45 socket, as seen in the figure below, where the blue and green wires are a bit mixed.
There is probably a reason for that arrangement, and putting pairs together other than inside the cable itself is probably not a good idea, which leads to my question.
For version 3, should I just keep pins 1 to 8 in that order and solder them to adjacent pins on the DB15 connector?
More generally, I am aware that unless the DB15 section is super short I won't be able to maintain Gigabit Ethernet, due to noise and other problems caused by unmatched pairs on that section.
I am open to any suggestion or tips or anything :)
Thanks in advance
After some trial and error, it turned out that it works fine if you arrange the pairs to match an RJ45 connector (like on the figure "ethernet plug wiring"). The quality of the cable is probably not that of a perfect Cat 5e/6 cable, but my computer can negotiate a Gigabit connection and transfer files over the network at speeds way above 10 MB/s, reaching 50 MB/s. I also soldered more sections with various connectors and it worked fine too.

Graphs: Nodes with max degree of 4, each node tries to connect to 4 nearest nodes - how many connections lost?

I am currently looking into spatially embedded networks, and am having trouble finding an answer for the following. Let's say I have a network with N nodes. Each of these nodes is located in space and has 4 ports from which a connection can be made (meaning each node has a maximum degree of 4). Each node tries to connect with its 4 closest neighbors. (Links are bidirectional.)
The following problem occurs:
As connections are added, certain nodes will reach their maximum degree. Another node may have such a node as one of its closest neighbors, but can't connect to it because that node cannot accept any more connections. These 'attempts' are highlighted in red.
Is there any way to generalize this problem and determine how many connections are possible within the graph? How does the ordering in which connections are made affect the results?
The order in which connections are made is irrelevant. Suppose the connections are all established at once, without the maximum node degree restriction. After that, some of the nodes can have more connections than allowed, so you have to remove some of the newly added connections to satisfy the restriction. How many connections have to be removed is determined, although which connections are removed is not.
If you have a criterion that evaluates your network, you can use a dynamic programming approach to remove invalid connections from the newly established network so as to maximize the value of the network.
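For experimenting with the setup from the question, here is a rough Python sketch. It is only one possible reading of the problem: nodes are 2-D points, each node attempts its nearest neighbors in a given node order, and an attempt is rejected when either endpoint is already at its maximum degree. The function names and the tiny three-point example (with a degree limit of 1 to keep it readable) are assumptions for illustration, not part of the original posts.

```python
import math


def nearest_neighbors(points, k):
    """For each node, list the indices of its k closest other nodes."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.hypot(p[0] - points[j][0],
                                             p[1] - points[j][1]))
        result.append(others[:k])
    return result


def connect(points, max_degree=4, order=None):
    """Greedily add links; return (accepted edges, rejected node pairs)."""
    wanted = nearest_neighbors(points, max_degree)
    if order is None:
        order = range(len(points))
    degree = [0] * len(points)
    edges, rejected = set(), set()
    for i in order:
        for j in wanted[i]:
            pair = (min(i, j), max(i, j))
            if pair in edges:
                continue  # link already exists (links are bidirectional)
            if degree[i] < max_degree and degree[j] < max_degree:
                edges.add(pair)
                degree[i] += 1
                degree[j] += 1
            else:
                rejected.add(pair)  # a "red" attempt in the question's figure
    return edges, rejected


# Three collinear nodes, each allowed a single link.
pts = [(0, 0), (1, 0), (3, 0)]
print(connect(pts, max_degree=1, order=[0, 1, 2]))  # ({(0, 1)}, {(1, 2)})
print(connect(pts, max_degree=1, order=[2, 0, 1]))  # ({(1, 2)}, {(0, 1)})
```

In this small case the number of accepted and rejected links is the same for both orders while the actual links differ; running it on larger random point sets is one way to probe how far that holds.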

IPv6 address entropy and hashing

I have one thread that delegates incoming IPv6 connections to X worker threads, where connections with the same IPv6 source address should always be sent to the same worker thread. However, given a large number of connections from all over the Internet, I also want to minimize the differences in load across the X worker threads (i.e. in any given large time interval, all worker threads should be entrusted with connections from approximately the same number of unique IPv6 addresses).
The most naive way to do so would be to interpret all or a section of the IPv6 address as a number N, and send it to the worker thread with index [N mod X]. However, IP addresses have low entropy (i.e. some bits are (un)set more often than others and thus are quite unrandom), so that would result in poor load balancing among the worker threads.
Clearly some kind of low-cost hashing needs to be performed on the addresses to get keys with higher entropy. So are there documented solutions for this problem? It doesn't need to result in fantastic randomness, just something known to work reasonably well with IPv6 addresses.
Edited to remove the sentence about avoiding collisions (because in this scenario, collisions are actually totally fine).
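As a rough illustration of the kind of scheme being asked about, here is a Python sketch that hashes the 16 address bytes with 64-bit FNV-1a and then takes the result modulo the number of workers. FNV-1a is only one cheap, widely used hash picked for this example; nothing in the question says it is the right choice, and how evenly it spreads real IPv6 traffic is not tested here. The function names are hypothetical.

```python
import ipaddress

FNV_OFFSET_64 = 0xCBF29CE484222325  # standard FNV-1a 64-bit offset basis
FNV_PRIME_64 = 0x100000001B3        # standard FNV 64-bit prime
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data):
    """64-bit FNV-1a: XOR in each byte, then multiply by the FNV prime."""
    h = FNV_OFFSET_64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & MASK_64
    return h


def worker_index(source_address, num_workers):
    """Map an IPv6 source address to a worker; same address, same worker."""
    packed = ipaddress.IPv6Address(source_address).packed  # 16 raw bytes
    return fnv1a_64(packed) % num_workers


# Different spellings of one address parse to the same bytes and worker.
print(worker_index("2001:db8::1", 8) == worker_index("2001:0db8:0:0:0:0:0:1", 8))  # True
```

Hashing the parsed bytes rather than the text form keeps the mapping independent of how the address happens to be written.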