I'd like to store the ip2location database in a Postgres database.
The guide on the ip2location website suggests storing the IP addresses that specify the range (ip_from and ip_to) in columns of type DECIMAL(39, 0).
I would have expected this to be of type INET.
Is there any advantage (e.g. in terms of speed, size, ...) to using DECIMAL(39, 0) instead of INET? That is, other than the obvious one: the ip2location database format already contains the IPs converted to integers, so one would have to convert them back to IP addresses.
The advantages of using the inet data type are:
it takes up only 7 bytes to store an IPv4 address (19 bytes for IPv6)
you can make use of all the useful built-in functions and operators for IP addresses and CIDRs
There is no advantage in using NUMERIC(39), except that it seems to be required by the software you want to use.
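To make the "convert that back to IP addresses" step concrete, here is a minimal Python sketch using only the standard ipaddress module. The function names are made up for illustration, and it assumes the integers follow the usual big-endian numbering (so 16777216 is 1.0.0.0); note that ipaddress treats any integer below 2**32 as IPv4.

```python
import ipaddress


def int_to_ip(value):
    """Turn an integer (as stored in ip_from / ip_to) back into address text.

    ipaddress.ip_address() interprets integers below 2**32 as IPv4 and
    larger ones as IPv6.
    """
    return str(ipaddress.ip_address(value))


def range_to_cidrs(ip_from, ip_to):
    """Express an inclusive integer range as the CIDR blocks that cover it.

    The resulting strings are the kind of value an inet/cidr column accepts.
    Both ends must be of the same IP version.
    """
    first = ipaddress.ip_address(ip_from)
    last = ipaddress.ip_address(ip_to)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


if __name__ == "__main__":
    print(int_to_ip(16777216))
    print(range_to_cidrs(16777216, 16777471))
```

A range that does not line up with a single CIDR block comes back as several blocks, so one imported row may turn into more than one row if you store CIDRs rather than from/to pairs.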
I'm trying to create an API that I can send an IP address to and the response will contain the subnet that the IP belongs to (if it belongs to any in the table).
I have a list of subnets all stored in a table in DynamoDB like such:
subnet
45.221.27.0/24
102.215.216.0/23
192.168.0.0/16
etc...
I can't seem to figure out how I could efficiently query the table to determine which subnet an IP belongs to. I am using a Lambda to make the request so I am trying to avoid reading all the subnets in because that will use a lot of memory. I'm also trying to avoid scanning the table rather than querying because that can become too expensive.
I've been thinking about different ways of storing the subnets in the table such that it becomes possible to get more granular with queries but I also feel like I'm overcomplicating something that shouldn't be so complex.
How funny, I'm actually writing a blog on this. I'll add the link once it's published. There's a lot of interesting scaling topics related to this problem for how to load and query with max efficiency. Here's the simplest approach:
Use a singular Partition Key value (that is, the same for all items). Use the range start IP address as the Sort Key. But make it the 32-bit numeric value of the IP address, not the string value, because we need to sort by it and sorting by the string value is problematic. (All IPv4 addresses are really just 32-bit numbers underneath.) The other attributes will be the metadata you want to retrieve.
The lookup then is to issue a Query where the PK is the singular value and the SK is <= the lookup IP address (in numeric form), reading in descending sort-key order with a limit of 1 (ScanIndexForward set to false, Limit set to 1) so that only the closest range start at or below the address comes back.
The one caveat is that any gaps in the IP address range data set must be filled during the load with marker items saying "gap here", otherwise a lookup that hits the gap will return the range just before the gap.
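The scheme above can be tried out without DynamoDB. The following is only an in-memory model in Python, not DynamoDB code: a sorted list stands in for the items under the single partition key, and bisect stands in for the "SK <= ip, descending, limit 1" Query. The names build_items and lookup are hypothetical, and the sketch assumes IPv4 subnets that do not overlap.

```python
import bisect
import ipaddress


def ip_to_int(ip):
    """32-bit numeric value of a dotted IPv4 address (the sort key)."""
    return int(ipaddress.IPv4Address(ip))


def build_items(cidrs):
    """Return (start, subnet) items sorted by start, with gap markers.

    A marker item has subnet None and means "gap here".
    Assumes the subnets do not overlap.
    """
    nets = sorted(
        (ipaddress.IPv4Network(c) for c in cidrs),
        key=lambda n: int(n.network_address),
    )
    items = []
    next_free = 0  # first address not yet covered by an item
    for net in nets:
        start = int(net.network_address)
        end = int(net.broadcast_address)
        if start > next_free:
            items.append((next_free, None))  # fill the gap before this range
        items.append((start, str(net)))
        next_free = max(next_free, end + 1)
    if next_free <= 0xFFFFFFFF:
        items.append((next_free, None))  # gap after the last range
    return items


def lookup(items, ip):
    """Model of: SK <= ip, descending order, limit 1."""
    keys = [start for start, _ in items]
    idx = bisect.bisect_right(keys, ip_to_int(ip)) - 1
    return items[idx][1] if idx >= 0 else None


if __name__ == "__main__":
    items = build_items(["45.221.27.0/24", "102.215.216.0/23", "192.168.0.0/16"])
    print(lookup(items, "102.215.217.9"))
    print(lookup(items, "45.221.28.1"))
```

Without the marker items, the second lookup would land on the 45.221.27.0/24 item even though the address is outside it, which is exactly the caveat described above.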
I'm interested to know, how does WSAIoctl() with the SIO_ROUTING_INTERFACE_QUERY control code create the list of IP addresses of my host machine? In particular, what criteria does it use to order the IP addresses?
It only returns one! From MSDN (emphasis mine):
SIO_ROUTING_INTERFACE_QUERY (opcode setting: I, O, T==1)
To obtain the address of the local interface (represented as sockaddr structure) that should be used to send to the remote address specified in the input buffer ...
It's true that multiple routes to the destination address might exist, in which case it will no doubt pick the cheapest (routing table entries each contain a cost, or metric, see here).
Or did you mean SIO_ADDRESS_LIST_QUERY? In that case, Windows knows full well what network interfaces you have installed on your machine, and the order in which they are returned is supremely unimportant.
Let's say we have a collection of documents where each document represents a network subnet like this:
{'network':'10.0.0.0/24','someattr':'aaa'}
{'network':'192.168.0.0/16','someattr':'bbb'}
{'network':'172.16.0.0/16','someattr':'ccc'}
Is there any way I can lookup a single ip address (e.g. '10.0.100.50') in the MongoDB collection and identify the subnet/document it belongs to?
Remember that IPs are just ints and subnets are just ranges of ints - they're displayed as they are so that humans can read them (among other reasons). What I would do is add two additional fields to my MongoDB schema: startAddr and endAddr. Represent them as ints and they will be the starting and ending addresses for your subnet.
Next, you can convert the IP to an int in your application. See here for information on converting an IP to an integer. You didn't mention what language you're writing your application in, so I can't help you there.
Finally, query MongoDB with the following:
db.collection.find({startAddr: {$lte: <ip>}, endAddr: {$gte: <ip>}})
You can also combine this new schema with some application level code to easily do things like find documents that have overlapping subnets if you need to do that.
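Since the language wasn't specified, here is one possible way to compute the two fields, sketched in Python with the standard ipaddress module. The helper names are invented for this example, and the list comprehension in find_subnets only mimics the $lte/$gte query in memory; it is not MongoDB driver code. IPv4 is assumed throughout.

```python
import ipaddress


def subnet_bounds(cidr):
    """Return (startAddr, endAddr) as ints for a CIDR string."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


def add_bounds(doc):
    """Copy a document and add the startAddr / endAddr fields."""
    start, end = subnet_bounds(doc["network"])
    return {**doc, "startAddr": start, "endAddr": end}


def find_subnets(docs, ip):
    """In-memory stand-in for: startAddr <= ip AND endAddr >= ip."""
    value = int(ipaddress.ip_address(ip))
    return [d for d in docs if d["startAddr"] <= value <= d["endAddr"]]


if __name__ == "__main__":
    docs = [
        add_bounds({"network": "10.0.0.0/24", "someattr": "aaa"}),
        add_bounds({"network": "192.168.0.0/16", "someattr": "bbb"}),
        add_bounds({"network": "172.16.0.0/16", "someattr": "ccc"}),
    ]
    print(find_subnets(docs, "10.0.0.50"))
    print(find_subnets(docs, "10.0.100.50"))
```

Note that 10.0.100.50 from the question matches nothing here, because 10.0.0.0/24 only spans 10.0.0.0 to 10.0.0.255.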
I have a PostgreSQL table with a column (of type Text) 'ip_port'. This column has values like:
167.123.24.25:59874
This is IP:Port
And I'm trying to write a query to list all the subnets from those IP and a count of IPs in each subnet. Is there an elegant way of doing this in Postgres?
Thanks in advance!
You cannot do this - in PostgreSQL or any other tool - if the only information you have is the IP address. You must also know:
The broadcast address for the network;
The network subnet mask or mask prefix length; or
that pre-CIDR class-based address allocation is being used
Consider that a given IP is actually a member of multiple networks of increasing size. For the purpose of local area Ethernet-based IP networking there's a broadcast domain defined by the subnet mask, but there are also a nested set of increasingly broad routing aggregation domains for that address, some of which are visible on the public Internet and some of which are private to an internal network. The line between "public" and "private" can be blurry; for example, within my ISP's network their large network blocks are split into smaller chunks that are not visible from the outside world but are visible to the ISP's customers.
For example, a non-exhaustive list of networks for 167.123.24.25 might be, according to some quick calculations with the sipcalc tool:
167.123.24.25/32 (the host itself)
167.123.24.25/28: (a common ISP allocation): usable 167.123.24.17 - 167.123.24.30, netmask 255.255.255.240
167.123.24.25/24: (a common LAN subnet size): usable 167.123.24.1 - 167.123.24.254, netmask 255.255.255.0
167.123.24.25/20: (a midlevel routing aggregation): usable 167.123.16.1 - 167.123.31.254, netmask 255.255.240.0
167.123.24.25/16: (the top-level APNIC allocation of this address block according to whois): usable 167.123.0.1 - 167.123.255.254, netmask 255.255.0.0
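The figures in the list above can be reproduced, and the counting done, once you are willing to pick a prefix length yourself. The Python sketch below uses the standard ipaddress module; the function names are hypothetical, and the prefix length passed to count_by_subnet is purely an assumption supplied by the caller, since the data alone cannot tell you the real subnet boundaries.

```python
import ipaddress
from collections import Counter


def containing_network(ip, prefix_len):
    """The network of the given prefix length that contains ip."""
    return ipaddress.ip_interface(f"{ip}/{prefix_len}").network


def describe(ip, prefix_len):
    """Summarize a network; meaningful for IPv4 prefix lengths up to 30."""
    net = containing_network(ip, prefix_len)
    first = net.network_address + 1
    last = net.broadcast_address - 1
    return f"{net} usable {first} - {last} netmask {net.netmask}"


def count_by_subnet(ip_ports, prefix_len):
    """Count distinct IPs per network, ASSUMING every IP uses prefix_len.

    Each input value looks like "167.123.24.25:59874" (IPv4 only).
    """
    distinct_ips = {value.rsplit(":", 1)[0] for value in ip_ports}
    return Counter(str(containing_network(ip, prefix_len)) for ip in distinct_ips)


if __name__ == "__main__":
    for prefix_len in (28, 24, 20, 16):
        print(describe("167.123.24.25", prefix_len))
    sample = ["167.123.24.25:59874", "167.123.24.30:80", "167.123.25.1:443"]
    print(count_by_subnet(sample, 24))
```

Changing the assumed prefix length changes the grouping entirely, which is the point being made here: the same addresses fall into one /16, one /20, or two /24 networks.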
If you traceroute that address, consider that any of those routers might (but also might not) be the point at which the address becomes part of a narrower, more specific subnet. See Classless inter-domain routing on Wikipedia.
All that is before you even consider NAT (blech) and internal WAN routing. 167.123.24.25 might be a router for a whole hidden network of NATed hosts. These hosts are invisible to you from the outside without extremely sophisticated passive mapping that only someone close to them along their routing path can do.
To learn more about the address's network you have to do some serious probing to find out what addresses might be treated as broadcast addresses. Of course, most routers will filter packets destined for broadcast addresses from outside the local routing domain, so you might not be able to tell the difference between "host doesn't exist", "is a broadcast address" and "host exists but filters the packet I'm using for probing". Such probing is slow, and it'll also tend to make the network owners grumpy at you. Since this network is owned by the US Department of Public Works, I'd recommend not making them grumpy.
Below is a sample invocation of getaddrinfo():
status = getaddrinfo("www.example.net","1234", &hints, &server_info);
After that, the server_info will point to a linked list with all kinds of address information.
I have the following questions:
Since I have clearly specified the host name and port number, the only address infos I can think of are IPv4 and IPv6 addresses. So, the length of the linked list should be 2. Is there any other kind of address info besides them?
Thanks.
The name could resolve to more than one IPv4 or IPv6 address; nothing says that only one IPv4 address will be returned (try it with "www.google.com", for example, and you'll likely get more than one IPv4 address).
But in any case, I think the basic premise of your question is wrong. Even if there was no possibility to return more than one IPv4 and one IPv6 address today, the function is documented to return an arbitrarily-long linked list of addrinfo objects. Therefore, even if your code worked today in every situation, there is no guarantee that it would continue to work tomorrow. If the function is documented to return an arbitrarily-long linked list, then you need to be able to handle that.
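As an illustration of handling a result of any length, here is a small sketch in Python, whose socket.getaddrinfo() wraps the same resolver facility and returns a list instead of a linked list. The function name resolve_all is made up for this example, and restricting the query to SOCK_STREAM is a choice made here to avoid one entry per socket type.

```python
import socket


def resolve_all(host, port):
    """Return every distinct (family, address) pair for host, in order.

    No assumption is made about how many entries come back.
    """
    results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    addresses = []
    for family, _socktype, _proto, _canonname, sockaddr in results:
        entry = (labels.get(family, str(family)), sockaddr[0])
        if entry not in addresses:  # skip duplicates, keep resolver order
            addresses.append(entry)
    return addresses


if __name__ == "__main__":
    # A numeric host needs no DNS lookup; a real name may yield many entries.
    for label, address in resolve_all("127.0.0.1", 1234):
        print(label, address)
```

In C the equivalent is to walk the ai_next pointers until NULL rather than reading a fixed number of nodes.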
You want to disconnect names from the physical configuration of machines in your mind. DNS merely maps a name to a set of addresses. A lot of hosts will have only one interface; many hosts will have several (called multi-homing). DNS doesn't care about the configuration of the machine or machines behind the addresses that a name maps to. As simple examples: often a server will have interfaces on multiple networks, with different addresses that all map to the same name. Sometimes, when services from different machines are collapsed onto one, different names will map to the same address. So don't assume any sort of 1:1 mapping between names and machines, much less interfaces.