length of dbName is max 10 - fiware-orion

Why is there such a short maximum length for the database name?
I'm considering a pull request to make it larger, but I would like to know the reason for this.

MongoDB limits DB name length to 64 characters, as stated in its documentation:
Database names cannot be empty and must have fewer than 64 characters.
When Orion runs in -multiservice mode, each service is associated with a database whose name is as follows:
<db_prefix>-<service_name>
where <db_prefix> is the value of the -db CLI parameter (orion by default) and <service_name> is the name of the service (i.e. the one that comes in the Fiware-Service header in requests).
On the other hand, service names are limited to 50 characters (as stated in Orion documentation).
Thus, if the <db_prefix> maximum length is 10, then the maximum length for a DB name is: 10 (max db prefix) + 50 (max service name) + 1 (for the -) = 61, which is below the maximum of 64 allowed at DB level.
We could have chosen 12 as the maximum db prefix (for a total maximum of 63 at DB level), but we liked 10 as a round number :)
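The arithmetic above can be sketched in a few lines; the constant names here are illustrative, not taken from Orion's source code:

```javascript
// Illustrative constants, not Orion's actual identifiers
const MAX_DB_PREFIX = 10;        // -db CLI parameter limit
const MAX_SERVICE_NAME = 50;     // Fiware-Service header limit
const MONGO_DB_NAME_LIMIT = 64;  // "must have fewer than 64 characters"

// prefix + "-" separator + service name
const maxDbNameLength = MAX_DB_PREFIX + 1 + MAX_SERVICE_NAME;

console.log(maxDbNameLength, maxDbNameLength < MONGO_DB_NAME_LIMIT);
```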

Related

Is there a possibility of the same ObjectIds being generated by MongoDBs on 2 different machines?

The Object Id in MongoDB has 3 parts as per the official documentation:
a 4-byte timestamp value, representing the ObjectId’s creation, measured in seconds since the Unix epoch
a 5-byte random value
a 3-byte incrementing counter, initialized to a random value
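A quick way to see those three parts is to slice the 12 bytes yourself. This is a minimal sketch (not the official driver code), assuming Node's Buffer:

```javascript
// Split a 12-byte ObjectId hex string into its three documented parts
function decodeObjectId(hexId) {
  const buf = Buffer.from(hexId, "hex");
  if (buf.length !== 12) throw new Error("ObjectId must be 12 bytes");
  return {
    timestamp: buf.readUInt32BE(0),             // 4-byte seconds since the Unix epoch
    random: buf.subarray(4, 9).toString("hex"), // 5-byte random value
    counter: buf.readUIntBE(9, 3),              // 3-byte incrementing counter
  };
}

console.log(decodeObjectId("507f1f77bcf86cd799439011"));
```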
Some other blogs and documentation say that the 5-byte random value is a combination of a 3-byte machine id and a 2-byte process id, to ensure uniqueness.
As per my Observation:
On my local machine, whenever I write to MongoDB through an application, the 5-byte random value only changes if I restart the application and then write again, which suggests that the 5-byte random number depends on the process id. But it's not the case that the first 3 bytes (machine id) stay fixed and only the last 2 bytes (process id) change. Instead, the complete 5 bytes change.
I want to know whether it is just a random number, or whether it depends on the machine id and process id. If it depends on machine id + process id, can we assume that it is highly unlikely for 2 ObjectIds on different machines to be the same at a given time?
The "random value" part of ObjectId used to be machine id + process id, now it is simply a random number. (See the rationale in the spec for the official statement.)
When using virtualization it is not uncommon to end up with the same machine id across multiple servers. See for example here.
Process id can also be the same across multiple servers if they all launch from the same image, thus have exactly the same boot sequence.
For these reasons ObjectId generation now uses a random counter.
The random value (5 bytes) is a combination of the Machine Identifier (3 bytes) and the Process Id (2 bytes).
To get a duplicate ObjectId in MongoDB across two different machines, you would need the same hash value for the Machine Identifier and Process Id combination, with the same random value, on both machines at the exact same second.
To answer your question, it is highly unlikely but not impossible.
The same has been described in this MongoDB Blog where it is mentioned that it may not be globally unique in some edge cases.
Refer to point 3 of the accepted answer to this question: Possibility of duplicate Mongo ObjectId's being generated in two different collections?

MongoDB network subnet CIDR filter [duplicate]

Currently, in order to save an IP address, I convert it to a number and store it in the collection. Basically I am doing this for logging purposes, which means that I care about storing information as fast as possible and in the smallest amount of space.
I will be rarely using it for querying.
My thoughts are that:
Storing as strings is for sure inefficient.
Storing as 4 separate numbers will be slower and will take more space.
Nonetheless I think that this is an adequate method, but is there a better one for my purpose?
Definitely save IP addresses as numbers, if you don't mind the extra bit of work that it takes, especially if you need to do queries on the addresses and you have large tables/collections.
Here's why:
Storage
An IPv4 address is 4 bytes if stored as unsigned integer.
An IPv4 address varies between 10 bytes and 18 bytes when written out as a string in dotted octet form. (Let's assume the average is 14 bytes.)
That is 7-15 bytes for the characters, plus 2-3 bytes if you're using a variable length string type, which varies based on the database you're using. If you have a fixed length string representation available, then you must use a 15-character fixed width field.
Disk storage is cheap, so that's not a factor in most use cases. Memory, however, is not as cheap, and if you have a large table/collection and you want to do fast queries, then you need an index. The 2-3x storage penalty of string encoding drastically reduces the amount of records you can index while still keeping the index resident in memory.
An IPv6 address is 16 bytes if stored as an unsigned integer. (Likely as multiple 4 or 8 byte integers, depending on your platform.)
An IPv6 address ranges from 6 bytes to 42 bytes when encoded as a string in abbreviated hex notation.
On the low end, a loopback address (::1) is 3 bytes plus the variable-length string overhead. On the high end, an address like 2002:4559:1FE2:1FE2:4559:1FE2:4559:1FE2 uses 39 bytes plus the variable-length string overhead.
Unlike with IPv4, it's not safe to assume the average IPv6 string length will be the mean of 6 and 42, because the number of addresses with a significant run of consecutive zeroes is a very small fraction of the overall IPv6 address space. Only some special addresses, like loopback and autoconf addresses, are likely to be compressible in this way.
Again, this is a storage penalty of >2x for string encoding versus integer encoding.
Network Math
Do you think routers store IP addresses as strings? Of course they don't.
If you need to do network math on IP addresses, the string representation is a hassle. E.g. if you want to write a query that searches for all addresses on a specific subnet ("return all records with an IP address in 10.7.200.104/27"), you can easily do this by masking an integer address with an integer subnet mask. (Mongo doesn't support this particular query, but most RDBMSs do.) If you store addresses as strings, then your query will need to convert each row to an integer and then mask it, which is several orders of magnitude slower. (Bitwise masking for an IPv4 address can be done in a few CPU cycles using 2 registers. Converting a string to an integer requires looping over the string.)
Similarly, range queries ("return all records between 192.168.1.50 and 192.168.50.100") with integer addresses will be able to use indexes, whereas range queries on string addresses will not.
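For illustration, the masking trick can be sketched in a few lines of JS; `ipToInt` and `inCidr` are hypothetical helper names, and this runs client-side, not as a MongoDB query:

```javascript
// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer
function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + parseInt(octet, 10), 0);
}

// Membership test: mask both addresses with the prefix mask and compare
function inCidr(ip, cidr) {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}

console.log(inCidr("10.7.200.100", "10.7.200.104/27")); // same /27 block
```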
The Bottom Line
It takes a little more work, but not much (there are a million aton() and ntoa() functions out there). If you're building something serious and solid, and you want to future-proof it against future requirements and the possibility of a large dataset, you should store IP addresses as integers, not strings.
If you're doing something quick and dirty and don't mind the possibility of remodeling in the future, then use strings.
For the OP's purpose, if you are optimizing for speed and space and you don't think you want to query it often, then why use a database at all? Just print IP addresses to a file. That would be faster and more storage efficient than storing it in a database (with associated API and storage overhead).
An IPv4 address is four bytes, so you can store it in a 32-bit integer (BSON type 16).
See http://docs.mongodb.org/manual/reference/bson-types
An efficient way is to save an IP address as an int. If you want to tag an IP with a CIDR filter, here is a demo:
> db.getCollection('iptag').insert({tags: ['office'], hostmin: 2886991873, hostmax: 2887057406, cidr: '172.20.0.0/16'})
> db.getCollection('iptag').insert({tags: ['server'], hostmin: 173867009, hostmax: 173932542, cidr: '10.93.0.0/16'})
> db.getCollection('iptag').insert({tags: ['server'], hostmin: 173932545, hostmax: 173998078, cidr: '10.94.0.0/16'})
Create tags index.
> db.getCollection('iptag').ensureIndex({tags: 1})
Filter an IP within a CIDR range; ip2int('10.94.25.32') == 173938976.
> db.getCollection('iptag').find({hostmin: {$lte: 173938976}, hostmax: {$gte: 173938976}})
The simplest way for IPv4 is to convert it to an int using the interesting maths provided here.
I use the following function (js) to convert before matching with the db:
ipv4Number: function (ip) {
    var iparray = ip.split(".");
    // Combine the four octets, most significant first; radix 10 avoids octal surprises
    var ipnumber = parseInt(iparray[3], 10) +
        parseInt(iparray[2], 10) * 256 +
        parseInt(iparray[1], 10) * Math.pow(256, 2) +
        parseInt(iparray[0], 10) * Math.pow(256, 3);
    return ipnumber > 0 ? ipnumber : 0;
}

pgSQL Character Limitations

I am still relatively new to pgSQL after switching away from mySQL completely.
I am trying to find the character limitations, if any, that pgSQL may or may not have. Specifically I am curious if there is a character limit on the following?
Database Name Length (mySQL is 64 characters)
Username Length (mySQL is 16 characters)
Password Length
I've been searching Google, I've read the pgSQL FAQ, and a few random other posts but I haven't found a solid answer to any of these. Perhaps pgSQL does not have these limitations like mySQL does. If anyone could shed some light on this that would be great!
I am currently using pgSQL 9.3.1
The length of any identifier is limited to 63 bytes:
http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
By default, NAMEDATALEN is 64 so the maximum identifier length is 63 bytes
As username and database name are identifiers, that limit should apply to them.
I'm not aware of any length limitation on passwords (although I'm sure there is one).


Need a 9 char length unique ID

My application uses a 9-digit number (it can also be alphanumeric). I can start with any number and increment it at the beginning. But my application is not a single-instance application, so if I run the exe as another instance, it should increment the latest value, and the previous instance should again increment the latest value when it next needs one. I mean that at all times, the value should be the latest incremented value among all running instances.
This is half of the problem. The other half is that the exes can run on any machine on the network, and each instance should keep incrementing (just as time never goes back) for another 2 years. My restriction is that I can't use files to store and retrieve the latest value in a common place.
How can I do that?
A 9-char/digit UNIQUE NUMBER also works, for sure. The whole idea is to assign a number (a string of 9 chars) to each "confidential file" (and encrypt it and whatever, which is not my job).
I tried with:
A GUID, which is unique across its full 128 bits, but not in its first or last 9 chars
A tick count, which is longer than 9 chars
A MAC address, which is unique only at 12 chars
ISBN (the book numbering system)
And so on ...
I think the best approach might be to have a unique-number server which each instance of your application queries over the network to get unique numbers.
First, you need to remove the distributed aspect from the problem. Like user Hugo suggested, using the last 2 or 3 bytes of the IP address should work. Your problem is now reduced to a local problem for each single machine.
Your algorithm probably needs to be able to deal with a restart, and not start handing out the same numbers after a reboot. You state that you do not have the option to use a file to store and retrieve information about this mechanism via a file system. This means that a random number generator alone would not be good enough, and you need a time-based component in your number generator as well. If you use 4 bytes containing the number of seconds elapsed since some date you will have more than 100 years of uniqueness in that. However, ideally the time-scale to use here depends on the expected handout-frequency of your numbers. Your problem is now reduced to a local problem for each single machine for each single second.
The final 2 or 3 bytes are then available to ensure local uniqueness for the second. Depending on your requirements and operating system, there are multiple IPC mechanisms to manage this, like pipes, sockets or shared memory. Or you could think of more creative ways. If you know the number of participating processes on a node, you could assign a sequence number to each process at startup or configuration time, and 1 of the 2 or 3 bytes is used for that. Your uniqueness problem has now become local to your process to one second only, which should be doable.
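One hedged sketch of the layout described above, assuming a custom epoch, a per-node host byte, a per-process slot, and a per-second counter. All names and bit widths here are illustrative choices, not a standard:

```javascript
// Pack: 8-bit host id | 28-bit seconds since custom epoch | 4-bit process slot | 6-bit counter
// 46 bits total, which always fits in 9 base-36 characters (36^9 > 2^46)
function makeId(hostByte, processSlot, counter) {
  // 28 bits of seconds covers roughly 8.5 years from the custom epoch
  const seconds = Math.floor((Date.now() - Date.UTC(2020, 0, 1)) / 1000);
  const value =
    (BigInt(hostByte & 0xff) << 38n) |
    (BigInt(seconds & 0xfffffff) << 10n) |
    (BigInt(processSlot & 0xf) << 6n) |
    BigInt(counter & 0x3f);
  return value.toString(36).padStart(9, "0"); // always 9 alphanumeric chars
}

console.log(makeId(42, 0, 1));
```

The counter would be reset each second and handed out via whatever IPC mechanism coordinates the processes on a node.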
Why does it have to be EXACTLY 9? UUIDs would be great if that didn't limit you.
In any case, your best shot is to generate a random number. If all your PCs are in the same network, use the host digits of the IP address at the beginning to avoid collisions. These should be no more than 16 or 24 bits in most cases anyway, so you have 6 remaining digits.