Is there any phone number representation guaranteed to be unique? - telephony

I'm aware that lots of local representations of phone numbers clash between countries, e.g. there's no guarantee that a US phone number written in local conventions (such as (401) 555-8675) doesn't clash with a different phone number written according to its own locality's conventions.
Is the conventional internationalized format used in the US and Europe (e.g. +1 (401) 555-8675) guaranteed to be globally unique if stored as, say, a normalized string ("+14015558675")?
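For what it's worth, the normalization mentioned in the question could be sketched like this. It is only an illustration, not a full phone-number parser: it assumes the input already carries its country code with a leading "+", and the function name normalize_phone is made up for the example. The 15-digit cap is the maximum length allowed by ITU-T E.164.

```python
import re

def normalize_phone(raw: str) -> str:
    """Reduce an internationally formatted number to '+' followed by digits only."""
    text = raw.strip()
    if not text.startswith("+"):
        raise ValueError("expected a leading '+' and country code")
    digits = re.sub(r"\D", "", text)  # drop spaces, parentheses, dashes, and the '+'
    if not 1 <= len(digits) <= 15:
        raise ValueError("E.164 numbers have at most 15 digits")
    return "+" + digits

print(normalize_phone("+1 (401) 555-8675"))  # +14015558675
```

With this, two differently punctuated spellings of the same number compare equal as strings.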

Related

API/REST request. Limit a number to two fractional digits. Strings are not supported

I am trying to make a RESTful request to an API. The endpoint needs a number (actually a money value), which cannot exceed the normal number of fractional digits for that currency.
I cannot use a String -> 500 - Bad request
If I use a double, I get an error for doubles that have rounding errors (and thus too many digits).
So in the end, if I have a value like "23.42", I need to pass it in my JSON as 23.42 without quotation marks.
I am using Java and the original value is stored as a BigDecimal.
I am using GSON, if this is important.
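The underlying idea is language-neutral, so here is a rough sketch of it in Python rather than Java/GSON: keep the value as a decimal the whole way, round it to the currency's fraction digits, and write its plain decimal text as a bare JSON number. The field name "amount" and the function money_to_json are invented for the example, and the half-up rounding mode is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

def money_to_json(amount: Decimal, fraction_digits: int = 2) -> str:
    """Emit a decimal as an unquoted JSON number with a fixed number of fraction digits."""
    step = Decimal(10) ** -fraction_digits            # Decimal('0.01') for two digits
    rounded = amount.quantize(step, rounding=ROUND_HALF_UP)
    # format(..., "f") gives plain notation, never scientific notation
    return '{"amount": ' + format(rounded, "f") + "}"

print(0.1 + 0.2)                        # binary floating point: 0.30000000000000004
print(money_to_json(Decimal("23.42")))  # {"amount": 23.42}
```

The value never passes through a binary double, so no extra digits can appear.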

What value ranges are supported by major datetime libraries?

I'm looking for an overview of the internal data representation and earliest/latest dates supported by typical time libraries in different programming languages.
I can remember reading a webpage about that a while back but can't find it any more after dozens of Google search term refinements.
I haven't read the code of all the major libraries, so I can't be sure, but I'd guess many of them store the data as separate fields, most of them numeric, such as year, month, day, hour, minute, etc.
The limits are either the upper/lower bounds of the respective numerical types, or some artificial value (such as "year 1 million" to represent a "very far future").
I think it also depends on the types being represented:
If the type represents only a local date (day/month/year, without hours and no timezone/offset), then the limits should be the upper/lower bounds of the type used to store the year value, or some artificial value.
If the type represents a timestamp (number of seconds/milliseconds/whatever-precision-of-seconds-the-API-supports since unix epoch), the limit can be the maximum value of the type used to store this value, or some artificial limit value.
And so on. But each of the cases above might give you a different limit, and the consistency between limits from different types will depend on the API.
Is that what you're asking?
Some APIs, such as Java's, have documented limits for each type:
https://docs.oracle.com/javase/8/docs/api/java/time/LocalDateTime.html#MIN
https://docs.oracle.com/javase/8/docs/api/java/time/Instant.html#MIN
And note how the limit is different for each type.
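Python's standard datetime module is another case of documented, artificial limits (years 1 through 9999) rather than limits imposed by the underlying integer type. A quick way to inspect them:

```python
import datetime

print(datetime.MINYEAR, datetime.MAXYEAR)    # 1 9999
print(datetime.date.min, datetime.date.max)  # 0001-01-01 9999-12-31
print(datetime.datetime.max)                 # 9999-12-31 23:59:59.999999
print(datetime.timedelta.max.days)           # 999999999
```

Here too, each type has its own limit: a timedelta can span far more days than the distance between date.min and date.max.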

String pattern matching using unreliable data

I know that there are several threads about string pattern matching, but I feel like my circumstances are slightly different.
I have a list of user-entered claim numbers; each claim number is unique. Claim numbers come in varying formats, depending on the user and how much of the full claim number they give us. We don't know the insurance carriers' claim formats, and they won't share them with us.
Let's assume that our users have entered the following claim numbers:
Insurance Carrier X claim numbers:
04756G215
04759Q696
04760G279
04760T844
00631F546
006G34549
006J73029
Insurance Carrier Y claim numbers:
000628948-014
01-VK4994-0
01-VW6183-4
01-WC20436
12082356
01VL0063-6
01WB16121
03-016298-2
03-165476-3
1000-66-0792
1000-66-3808
1000-67-8667
1000-68-1360
1000-68-1686
1000-68-8494
1000-69-5647
1000-69-6905
Insurance Carrier Z:
42RBB903752
444F09799
51RBB672507
51RBC153279
55RBB120866
55RBB339718
As you can see, the formats differ. Also, I can't rely on users to enter the complete number; they often omit parts of it, perhaps because that part holds a claim office code. We simply don't know.
Knowing this, I want to enter a claim number into a system that tells me which carrier it most likely belongs to.
So 55RBB339719 would most likely give me carrier Z.
Are neural networks the way to go? Fuzzy logic?
UPDATE:
Here's a string pattern to match
51RAB435220
As you can see, it's the same pattern (2 digits, 3 letters, 6 digits).
However, a user may enter RAB435220 because the first two digits may be insignificant: they may be a department code rather than part of the actual claim number. It's possible that only the last 6 digits are significant.
What makes it difficult is that we don't know what the significant digits are.
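Before reaching for neural networks, one simple heuristic that might be worth trying is to reduce every known claim number to its "shape" (digit or letter per position) and count how many known shapes per carrier end with the shape of the entered value, which tolerates a dropped prefix. This is only a sketch of one possible approach, not something suggested in the question; the names shape and likely_carriers are made up, and KNOWN holds just a subset of the sample numbers above.

```python
from collections import Counter

def shape(claim: str) -> str:
    """Replace digits with '9' and letters with 'A'; keep punctuation as-is."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in claim)

KNOWN = {
    "X": ["04756G215", "04759Q696", "04760G279", "04760T844", "006G34549"],
    "Y": ["01-VK4994-0", "01WB16121", "12082356", "1000-66-0792"],
    "Z": ["42RBB903752", "444F09799", "51RBB672507", "51RBC153279",
          "55RBB120866", "55RBB339718"],
}

def likely_carriers(claim: str, known=KNOWN):
    """Rank carriers by how many known shapes end with the entered shape."""
    target = shape(claim)
    scores = Counter()
    for carrier, samples in known.items():
        for sample in samples:
            if shape(sample).endswith(target):  # suffix match tolerates omitted prefixes
                scores[carrier] += 1
    return scores.most_common()

print(likely_carriers("55RBB339719"))  # [('Z', 5)]
print(likely_carriers("RAB435220"))    # [('Z', 5)]
```

Shapes can be ambiguous (444F09799 from carrier Z has the same shape as 006G34549 from carrier X), so the result is a ranking to review rather than a definite answer.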

How can SystemInfo.deviceUniqueIdentifier be unique?

According to http://docs.unity3d.com/ScriptReference/SystemInfo-deviceUniqueIdentifier.html, SystemInfo.deviceUniqueIdentifier
It is guaranteed to be unique for every device (Read Only).
If you delete and reinstall your application, this value will change. How can it guarantee that there isn't another iPhone in the world that happened to generate the same identifier?
(I know that the identifier is pretty long so it is quite unlikely for this to happen - but I'd like to know why)
As the description says, it's typically some form of device ID, set by the vendor:
iOS: on pre-iOS7 devices it will return hash of MAC address. On iOS7
devices it will be UIDevice identifierForVendor or, if that fails for
any reason, ASIdentifierManager advertisingIdentifier.
Windows Store
Apps: uses AdvertisingManager::AdvertisingId for returning unique
device identifier, if option in 'PC Settings -> Privacy -> Let apps
use my advertising ID for experiences across apps (turning this off
will reset your ID)' is disabled, Unity will fallback to
HardwareIdentification::GetPackageSpecificToken().Id.
Even if these values were random, they come out in the same format as a GUID, with 32 hexadecimal characters (128 bits). The number of possibilities is rather large.
From Wikipedia:
The total number of unique such GUIDs is 2^122 (approximately 5.3×10^36). This number is so large that the probability of the same number being
generated randomly twice is negligible; however other GUID versions
have different uniqueness properties and probabilities, ranging from
guaranteed uniqueness to likely duplicates. Assuming uniform
probability for simplicity, the probability of one duplicate would be
about 50% if every person on earth as of 2014 owned 600 million GUIDs.
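To get a feel for these numbers, the usual birthday-problem approximation can be evaluated directly. This is a back-of-the-envelope sketch assuming uniformly random identifiers drawn from the 2^122 values mentioned in the quote; the function names are made up for the example.

```python
import math

SPACE = 2 ** 122  # number of distinct random GUIDs, as in the quote

def collision_probability(count: int, space: int = SPACE) -> float:
    """Birthday approximation: P ~= 1 - exp(-n(n-1) / (2N))."""
    return -math.expm1(-count * (count - 1) / (2 * space))

def count_for_probability(p: float, space: int = SPACE) -> float:
    """Approximate number of identifiers needed to reach collision probability p."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))

print(f"{count_for_probability(0.5):.2e}")  # 2.71e+18
print(collision_probability(10 ** 12))      # roughly 9.4e-14
```

So a 50% chance of a single duplicate needs on the order of 10^18 identifiers, i.e. hundreds of millions per person for a population of about 7 billion, which is the same order of magnitude as the figure quoted above. Even a trillion identifiers leave the collision probability around 10^-13.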

How can I generate a unique, small, random, and user-friendly key?

A few months back I was tasked with implementing a unique and random code for our web application. The code would have to be user friendly and as small as possible, but still be essentially random (so users couldn't easily predict the next code in the sequence).
It ended up generating values that looked something like this:
Af3nT5Xf2
Unfortunately, I was never satisfied with the implementation. GUIDs were out of the question; they were simply too big and too difficult for users to type in. I was hoping for something more along the lines of 4 or 5 characters/digits, but our particular implementation would generate noticeably patterned sequences if we encoded to fewer than 9 characters.
Here's what we ended up doing:
We pulled a unique sequential 32-bit ID from the database. We then inserted it into the center bits of a 64-bit RANDOM integer. We created a lookup table of easily typed and recognized characters (A-Z, a-z, 2-9, skipping easily confused characters such as L, l, 1, O, 0, etc.). Finally, we used that lookup table to base-54 encode the 64-bit integer. The high bits were random, the low bits were random, but the center bits were sequential.
The final result was a code that was much smaller than a GUID and looked random, even though it absolutely wasn't.
I was never satisfied with this particular implementation. What would you guys have done?
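For readers who want to picture the scheme, here is a rough reconstruction. Several details are guesses, not the original implementation: the 16/32/16 split between random and sequential bits, and the exact 54-character alphabet (the description ends with "etc.", so this sketch drops L, l, O, o, I and i to arrive at 54 symbols).

```python
import secrets
import string

# Assumed alphabet: letters plus 2-9, minus six easily confused letters (54 symbols).
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.ascii_lowercase + "23456789"
    if c not in "LlOoIi"
)
BASE = len(ALPHABET)  # 54

def encode(value: int) -> str:
    """Base-54 encode a non-negative integer using ALPHABET."""
    if value == 0:
        return ALPHABET[0]
    digits = []
    while value:
        value, remainder = divmod(value, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

def make_code(seq_id: int) -> str:
    """Place a 32-bit sequential ID between 16 random high bits and 16 random low bits."""
    if not 0 <= seq_id < 2 ** 32:
        raise ValueError("seq_id must fit in 32 bits")
    value = (secrets.randbits(16) << 48) | (seq_id << 16) | secrets.randbits(16)
    return encode(value)

def decode_id(code: str) -> int:
    """Recover the sequential ID hidden in the middle bits."""
    value = 0
    for char in code:
        value = value * BASE + ALPHABET.index(char)
    return (value >> 16) & 0xFFFFFFFF

code = make_code(12345)
print(code, decode_id(code))
```

The sequential ID is what keeps codes unique; the random bits only disguise it, and anyone who knows the layout can strip them off again.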
Here's how I would do it.
I'd obtain a list of common English words with usage frequency and some grammatical information (like is it a noun or a verb?). I think you can look around the intertubes for some copy. Firefox is open-source and it has a spellchecker... so it must be obtainable somehow.
Then I'd run a filter on it so obscure words are removed and that words which are too long are excluded.
Then my generation algorithm would pick 2 words from the list, concatenate them, and add a random 3-digit number.
I can also randomize word selection pattern between verb/nouns like
eatCake778
pickBasket524
rideFlyer113
etc..
The case needn't be camel case; you can randomize that as well. You can also randomize the placement of the number and the verb/noun.
And since that's a lot of randomizing, Jeff's The Danger of Naïveté is a must-read. Also make sure to study dictionary attacks well in advance.
And after I'd implemented it, I'd run a test to check how often the algorithm produces collisions. If the collision rate was high, I'd play with the parameters (number of nouns used, number of verbs used, length of the random number, total number of words, different kinds of casing, etc.).
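A toy version of the verb + noun + number idea might look like this. The two word lists are tiny placeholders; a real list would come from a filtered dictionary as described above.

```python
import secrets

# Placeholder word lists, just for illustration.
VERBS = ["eat", "pick", "ride", "bake", "find"]
NOUNS = ["Cake", "Basket", "Flyer", "River", "Lamp"]

def word_code() -> str:
    """Return something like 'eatCake778': verb + capitalized noun + 3-digit number."""
    number = secrets.randbelow(900) + 100  # 100..999
    return secrets.choice(VERBS) + secrets.choice(NOUNS) + str(number)

def keyspace() -> int:
    """Number of distinct codes this generator can produce."""
    return len(VERBS) * len(NOUNS) * 900

print(word_code(), keyspace())
```

The keyspace figure is what drives the collision rate, so it is the number to watch when tuning the list sizes and the length of the numeric part.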
In .NET you can use the RNGCryptoServiceProvider method GetBytes(), which will "fill an array of bytes with a cryptographically strong sequence of random values" (from the Microsoft documentation).
// requires: using System.Security.Cryptography;
byte[] randomBytes = new byte[4];
using (RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider())
    rng.GetBytes(randomBytes); // fills the array with cryptographically strong random bytes
You can increase the length of the byte array and pluck out the character values you want to allow.
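The "pluck out the characters you want" step can be sketched as follows, shown in Python with os.urandom standing in for the .NET generator. The allowed character set is an assumption chosen for readability. Discarding unwanted bytes, rather than mapping them with a modulo, keeps every allowed character equally likely.

```python
import os

# Assumed set of allowed characters: upper-case letters and digits minus look-alikes.
ALLOWED = b"ABCDEFGHJKMNPQRSTUVWXYZ23456789"

def random_code(length: int = 5) -> str:
    """Draw random bytes and keep only those that are allowed characters."""
    out = bytearray()
    while len(out) < length:
        for byte in os.urandom(16):
            if byte in ALLOWED and len(out) < length:
                out.append(byte)
    return out.decode("ascii")

print(random_code())
```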
In C#, I have used the 'System.IO.Path.GetRandomFileName() : String' method... but I was generating salt for debug file names. This method returns stuff that looks like your first example, except with a random '.xyz' file extension too.
If you're in .NET and just want a simpler (but not 'nicer' looking) solution, I would say this is it... you could remove the random file extension if you like.
At the time of this writing, this question's title is:
How can I generate a unique, small, random, and user-friendly key?
To that, I should note that it's not possible in general to create a random value that's also unique, at least if each random value is generated independently of any other. In addition, there are many things you should ask yourself if you want to generate unique identifiers (which come from my section on unique random identifiers):
Can the application easily check identifiers for uniqueness within the desired scope and range (e.g., check whether a file or database record with that identifier already exists)?
Can the application tolerate the risk of generating the same identifier for different resources?
Do identifiers have to be hard to guess, be simply "random-looking", or be neither?
Do identifiers have to be typed in or otherwise relayed by end users?
Is the resource an identifier identifies available to anyone who knows that identifier (even without being logged in or authorized in some way)?
Do identifiers have to be memorable?
In your case, you have several conflicting goals: You want identifiers that are—
unique,
easy to type by end users (including small), and
hard to guess (including random).
Important points you don't mention in the question include:
How will the key be used?
Are other users allowed to access the resource identified by the key, whenever they know the key? If not, then additional access control or a longer key length will be necessary.
Can your application tolerate the risk of duplicate keys? If so, then the keys can be completely randomly generated (such as by a cryptographic RNG). If not, then your goal will be harder to achieve, especially for keys intended for security purposes.
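When duplicates cannot be tolerated and the application can check existing identifiers, the usual pattern is generate, check, retry. A minimal sketch, with an in-memory set standing in for the database lookup and an assumed alphabet and key length:

```python
import secrets

ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"  # assumed: no I, L, O, 0, 1

def new_unique_key(existing: set, length: int = 5, max_tries: int = 100) -> str:
    """Generate random keys until one is not already in `existing`, then record it."""
    for _ in range(max_tries):
        key = "".join(secrets.choice(ALPHABET) for _pos in range(length))
        if key not in existing:
            existing.add(key)  # in a real system: an insert guarded by a unique constraint
            return key
    raise RuntimeError("key space looks too crowded; increase the length")

issued = set()
print(new_unique_key(issued), new_unique_key(issued))
```

In a multi-user system the check and the insert have to be atomic, otherwise two requests can still end up with the same key.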
Note that I don't go into the issue of formatting a unique value into a "user-friendly key". There are many ways to do so, and they all come down to mapping unique values one-to-one with "user-friendly keys" — if the input value was unique, the "user-friendly key" will likewise be unique.
If by user friendly you mean that a user could type the code in, then I think you would want to look in a different direction. I've seen and done implementations for initial random passwords that pick random words and numbers, which gives an easier and less error-prone string.
If, though, you're looking for a way to encode a random code in the URL string, which is an issue I've dealt with for a while, then what I have done is use Base64-encoded GUIDs.
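Assuming this means Base64-encoding the GUID's 16 raw bytes, a URL-safe version could be sketched like this (function names are made up for the example):

```python
import base64
import uuid

def guid_to_token(value: uuid.UUID) -> str:
    """Encode the 16 raw bytes with the URL-safe Base64 alphabet, dropping '=' padding."""
    return base64.urlsafe_b64encode(value.bytes).rstrip(b"=").decode("ascii")

def token_to_guid(token: str) -> uuid.UUID:
    """Restore the padding and decode back to a UUID."""
    padded = token + "=" * (-len(token) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))

token = guid_to_token(uuid.uuid4())
print(len(token))  # 22
```

That brings the 36-character textual GUID down to 22 characters, which is fine for URLs but still too long for anyone to type.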
You could load your list of words, as chakrit suggested, into a data table or XML file with a unique sequential key. When getting your random words, use a random number generator to determine which words to fetch by their key. If you concatenate 2 of them, I don't think you need to include the numbers in the string unless "true randomness" is part of the goal.