Is there any speed benefit to performing your own algorithm to scramble IDs for security purposes? [duplicate]

I am planning to implement my own very simple "hashing" formula to add a layer of security to an app with multiple users. My current plan is as follows:
User creates an account at which point an ID is generated on the backend. The ID is run through a formula (let's say ID * 57 + 8926 - 36 * 7, or something equally random). I then send back to the frontend the new user ID and the new "hashed" number and store them in localStorage.
User tries to access a secured area (let's say a settings page so they can change their own settings).
I send the backend two values: their ID and the hashed number. I run the ID through the same formula to check it matches the hashed value I've received. If the check passes, they can get in. So if someone has tried, say, changing their ID in localStorage to get access to another user's settings page, the only way they could achieve that is if they guessed what the formula was. They could easily guess a user ID, but guessing that the corresponding number is the result of ID * 57 + 8926 - 36 * 7 seems pretty unlikely.
I'm doing this because it would be quicker/cheaper than a db lookup for an actual hashed value... I think? Would it make more sense to use a package to create some kind of primary key/uuid instead of "hashing" my own value and doing a db lookup each time?
Tech stack: React on FE, Python on BE, SQL db.

I see a lot of posts saying "don't roll your own" -- is this absolute?
Yes, it is. The reason is that whenever a non-cryptographer tries their hand at developing their own algorithms, they invariably fall into a multitude of pitfalls which render the security of the algorithm next to useless.
Your particular scheme, for example, can be trivially broken given two consecutive ID and "hash" pairs. (It's a simple arithmetic sequence; deriving the formula of an arithmetic sequence from two consecutive values is grade ~6 level math.)
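To make that concrete, here is a short Python sketch (using the example formula from the question) of how an attacker who sees two consecutive (ID, "hash") pairs recovers the whole formula:

# Two consecutive (id, "hash") pairs leaked from localStorage:
pairs = [(1000, 1000 * 57 + 8926 - 36 * 7), (1001, 1001 * 57 + 8926 - 36 * 7)]

(id_a, hash_a), (id_b, hash_b) = pairs
slope = hash_b - hash_a          # 57: the common difference of the arithmetic sequence
offset = hash_a - slope * id_a   # 8674: the constant term (8926 - 36 * 7)

# The attacker can now forge the "hash" for any user ID they like:
forged = slope * 42 + offset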
I'm doing this because it would be quicker/cheaper than a db lookup for an actual hashed value...
The performance difference would probably be negligible. Don't worry about it.
If the information is not particularly sensitive, just assign each user a randomly generated 128-bit number. The chances of someone guessing a valid user's number are practically zero.
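A minimal sketch in Python (the question's backend language), using the standard library's secrets module:

import secrets

# A random 128-bit token per user; guessing another user's token is practically impossible.
token = secrets.token_hex(16)  # 16 random bytes = 128 bits, rendered as 32 hex characters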

Two properties of real hashes that you are missing with this are:
a simple change in the input causes a large change in the output
all hashes have the same length
This could be a problem if a user somehow knows their own id and hash.
With your self-made hash I could easily find out the hash of any other user by reverse-engineering the formula.
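For comparison, a real cryptographic hash (SHA-256 here, purely as an illustration) shows both properties:

import hashlib

# A one-character change in the input flips roughly half the output bits,
# and every digest has the same fixed length (64 hex characters for SHA-256).
print(hashlib.sha256(b"user-1000").hexdigest())
print(hashlib.sha256(b"user-1001").hexdigest())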

Related

How do I generate a unique id from an auto incremented integer?

I have an auto-incremented id (an int) that I want to convert into something less "mine-able". Basically I don't want people to be able to access data/0, data/1, data/2, etc. and rip through our entire database. I was thinking of just hashing the ID but I wasn't sure if I could guarantee uniqueness.
Let's say the value range is from 1 to a couple hundred million. It may be that one of the hash algorithms can guarantee uniqueness within those parameters.
If not, what would be a good approach to take?
I did consider hashing and then appending the ID.
I'm trying to avoid using a GUID because it would require a lot of changes to existing code so I'd prefer to transform the data I have.
EDIT:
To further explain the situation - these are static resources that are being hit. I don't have to go to a database and reverse it or look it up against something else. Imagine a listing of products - a user might have a link to a specific page, but I don't want them to be able to programmatically go through every page, so I need a non-incrementing ID.
As far as I know, hashing is intended to create a unique ID based on some concrete data (e.g. name, surname, etc.). Hashing an auto-incremented ID won't help you much. If someone searches through your database by entering an auto-incremented ID, that ID will be passed to the hash function as a parameter and they will still get the data they want. So I think a better solution would be to hash some other data in order to get a unique ID. If you do so, then a person who searches through your database would have to know the exact data that is stored in there (e.g. they would have to know the exact name of your employee, or their SSN).
Hope that helps!
Use something pseudo-random to salt the value before hashing if there is no need for reverse lookup.
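One way to read that suggestion, as a Python sketch: keep a server-side secret (SECRET_SALT here is an assumed name) and derive the public ID with a keyed hash, so neighbouring IDs can't be enumerated:

import hashlib
import hmac

SECRET_SALT = b"replace-with-a-server-side-secret"  # assumption: never exposed to clients

def public_id(internal_id: int) -> str:
    # Keyed hash of the auto-incremented ID; distinct for each input and not reversible
    # without the secret, so /data/<public_id> URLs can't be walked sequentially.
    return hmac.new(SECRET_SALT, str(internal_id).encode(), hashlib.sha256).hexdigest()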

Algorithm to generate a user unique, 6-character confirmation code?

I'd like to be able to create an algorithm that generates a 6 character confirmation code (e.g. A1JU2Z) that will be unique for a given (user, code) pair. The reason is, I'd like to keep the code at 6 characters, but using a trimmed set of alphanumerics (to avoid confusion with 1 and I, etc) only allows for ~300 million codes before collisions occur. Sure I may never need 300 million codes, but if I do, it will be a huge pain to go back and fix this.
So is there a way to utilize the user ... say their username, to generate unique codes such that if the same user wants to generate another code, it's guaranteed to be unique for them? (This is of course assuming a single user doesn't generate over 300 million codes)
Thanks!
If the ID is unique only to the current user, you can just generate each character of the ID randomly. As long as the user is not expected to generate a large number of such IDs, you will have a reasonable chance of not generating the same ID more than once (you need to do some math to get exact numbers for the expected chance of collision as the number of generated IDs grows).
If you must avoid collisions at all costs, you need to either keep all previously generated IDs and compare each new one against them, or keep a count of the generated IDs (this requires a scheme where the ID generation is deterministic based on the count, but also unique -- a very simple case would be {ID=count; ++count;}).
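As a rough Python sketch of the "do some math" part: generate codes from a trimmed alphabet and estimate the collision probability with the usual birthday approximation (the alphabet below is an assumption):

import math
import secrets

ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"  # 31 unambiguous characters (an assumed set)

def random_code(length: int = 6) -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def collision_probability(k: int, space: int = len(ALPHABET) ** 6) -> float:
    # Birthday approximation: probability of at least one collision after k codes.
    return 1.0 - math.exp(-k * (k - 1) / (2 * space))

print(random_code())                  # a random 6-character code
print(collision_probability(10_000))  # roughly 0.05 for this alphabet and length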
I think you can use a simple password generator like this: http://www.webtoolkit.info/php-random-password-generator.html
in combination with a check algorithm to be sure it is not already used.
// Keep generating until we find a code that isn't already in the database.
$pass = generate_password();
$found = find_password($pass);
while ($found) {
    $pass = generate_password();
    $found = find_password($pass);
}
save_password($user, $pass);
generate_password() is the function referred to in the link.
find_password() is a function you have to write to check already generated codes in a database.
save_password() is a function you have to write to store the generated code in a database.
The code is in PHP, but the logic is here.
The password generator in the link is easy to understand; you can make it 6 characters long, with the character rules you want.

How to fetch the continuous list with PostgreSQL in web

I am making an API over HTTP that fetches many rows from PostgreSQL with pagination. In ordinary cases, I usually implement such pagination through a naive OFFSET/LIMIT clause. However, there are some special requirements in this case:
There are so many rows that I believe users cannot reach the end (imagine a Twitter timeline).
Pages do not have to be randomly accessible, only sequentially.
The API would return a URL which contains a cursor token that directs to the page of continuous chunks.
Cursor tokens do not have to exist permanently, only for some time.
The ordering fluctuates frequently (like Reddit rankings); however, continued cursors should keep their consistent ordering.
How can I achieve the mission? I am ready to change my whole database schema for it!
Assuming it's only the ordering of the results that fluctuates and not the data in the rows, Fredrik's answer makes sense. However, I'd suggest the following additions:
store the id list in a PostgreSQL table using the array type rather than in memory. Doing it in memory, unless you carefully use something like Redis with auto expiry and memory limits, is setting yourself up for a DoS memory consumption attack. I imagine it would look something like this:
create table foo_paging_cursor (
    cursor_token ...,       -- probably a uuid is best or timestamp (see below)
    result_ids integer[],   -- or text[] if you have non-integer ids
    expiry_time TIMESTAMP
);
You need to decide if the cursor_token and result_ids can be shared between users to reduce your storage needs and the time needed to run the initial query per user. If they can be shared, choose a cache window, say 1 or 5 minute(s), and then upon a new request create the cache_token for that time period and then check to see if the result ids have already been calculated for that token. If not, add a new row for that token. You should probably add a lock around the check/insert code to handle concurrent requests for a new token.
Have a scheduled background job that purges old tokens/results and make sure your client code can handle any errors related to expired/invalid tokens.
Don't even consider using real db cursors for this.
Keeping the result ids in Redis lists is another way to handle this (see the LRANGE command), but be careful with expiry and memory usage if you go down that path. Your Redis key would be the cursor_token and the ids would be the members of the list.
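A rough sketch of the Redis variant, assuming the redis-py client (the key and function names are illustrative):

import redis

r = redis.Redis()

def store_cursor(token: str, ids: list, ttl_seconds: int = 300) -> None:
    # Keep the result ids in a Redis list keyed by the cursor token, with an expiry.
    r.rpush(token, *ids)
    r.expire(token, ttl_seconds)

def fetch_page(token: str, page: int, page_size: int = 50) -> list:
    # LRANGE is inclusive on both ends, hence the -1.
    start = page * page_size
    return r.lrange(token, start, start + page_size - 1)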
I know absolutely nothing about PostgreSQL, but I'm a pretty decent SQL Server developer, so I'd like to take a shot at this anyway :)
How many rows/pages do you expect a user would maximally browse through per session? For instance, if you expect a user to page through a maximum of 10 pages for each session [each page containing 50 rows], you could take that max and set up the web service so that when the user requests the first page, you cache 10*50 rows (or just the IDs of the rows, depending on how much memory and how many simultaneous users you have).
This would certainly help speed up your web service, in more ways than one. And it's quite easy to implement, too. So:
When a user requests data from page #1, run a query (complete with order by, join checks, etc.), and store all the IDs into an array (but a maximum of 500 IDs). Return the data rows that correspond to the IDs in the array at positions 0-49.
When the user requests page #2-10, return the data rows that correspond to the IDs in the array at positions (page-1)*50 to page*50-1.
You could also bump up the numbers; an array of 500 ints would only occupy 2 KB of memory, but it also depends on how fast you want your initial query/response to be.
I've used a similar technique on a live website, and when the user continued past page 10, I just switched to queries. I guess another solution would be to continue to expand/fill the array (running the query again, but excluding already included IDs).
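A minimal Python sketch of that idea (the helper names and constants are assumptions):

PAGE_SIZE = 50
MAX_CACHED_PAGES = 10

def first_page(run_query):
    # Run the full query once (order by, joins, etc.) and keep up to 500 ids.
    cached_ids = run_query(limit=PAGE_SIZE * MAX_CACHED_PAGES)
    return cached_ids, cached_ids[:PAGE_SIZE]

def nth_page(cached_ids, page):
    # Pages 2-10 are just slices of the cached id array.
    start = (page - 1) * PAGE_SIZE
    return cached_ids[start:start + PAGE_SIZE]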
Anyway, hope this helps!

Hashes vs Numeric id's

When creating a web application that somehow displays a unique identifier for a recurring entity (videos on YouTube, or book sections on a site like mine), would it be better to use a uniform-length identifier like a hash, or the unique key of the item in the database (1, 2, 3, etc.)?
Besides revealing a little (and, I think, immaterial) information about the internals of your app, why would using a hash be better than just using the unique id?
In short: Which is better to use as a publicly displayed unique identifier - a hash value, or a unique key from the database?
Edit: I'm opening up this question again because Dmitriy brought up the good point of not tying down the naming to db specific property. Will this sort of tie down prevent me from optimizing/normalizing the database in the future?
The platform uses PHP/Python with ISAM and MySQL.
Unless you're trying to hide the state of your internal object ID counter, hashes are needlessly slow (to generate and to compare), needlessly long, needlessly ugly, and needlessly capable of colliding. GUIDs are also long and ugly, making them just as unsuitable for human consumption as hashes are.
For inventory-like things, just use a sequential (or sharded) counter instead. If you migrate to a different database, you will just have to initialize the new counter to a value at least as large as your largest existing record ID. Pretty much every database server gives you a way to do this.
If you are trying to hide the state of your counter, perhaps because you're counting users and don't want competitors to know how many you have, I suggest avoiding the display of your internal IDs. If you insist on displaying them and don't want the drawbacks of a hash, you might consider using a maximal-period linear feedback shift register to generate IDs.
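For illustration, here is a tiny 16-bit Galois LFSR in Python (a standard maximal-period example; a real deployment would use a wider register and keep the tap mask and seed private):

def lfsr_sequence(seed: int = 0xACE1):
    # 16-bit maximal-period Galois LFSR (tap mask 0xB400).
    # It visits every nonzero 16-bit value exactly once before repeating,
    # so successive "IDs" look scrambled but never collide within the period.
    state = seed & 0xFFFF
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400
        yield state

gen = lfsr_sequence()
first_ids = [next(gen) for _ in range(5)]  # a non-obvious, non-repeating sequence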
I typically use hashes if I don't want the user to be able to guess the next ID in the series. But for your book sections, I'd stick with numerical id's.
Using hashes is preferable in case you need to rebuild your database for some reason, for example, and the ordering changes. The ordinal numbers will move around -- but the hashes will stay the same.
Not relying on the order you put things into a box, but on properties of the things, just seems.. safer.
But watch out for collisions, obviously.
With hashes you
Are free to merge the database with a similar one (or a backup), if necessary
Are not doing something that could help some guessing attacks even a bit
Are not disclosing more private information about the user than necessary, e.g. if somebody sees user number 2 of your current database log in, they learn that this user is an oldie.
(Provided that you use a long hash or a GUID,) greatly helping yourself in case you're bought by YouTube and they decide to integrate your databases.
Helping yourself in case there appears a search engine that indexes by GUID.
Please let us know if the last 6 months brought you some clarity on this question...
Hashes aren't guaranteed to be unique, nor, I believe, consistent.
Will your users have to remember/use the value? Or are you looking at it from a security POV?
From a security perspective, it shouldn't matter - since you shouldn't just be relying on people not guessing a different but valid ID of something they shouldn't see in order to keep them out.
Yeah, I don't think you're looking for a hash - you're more likely looking for a GUID. If you're on the .NET platform, try System.Guid.
However, the most important reason not to use a GUID is performance. Doing database joins and lookups on (long) strings is very suboptimal. Numbers are fast. So, unless you really need it, don't do it.
Hashes have the advantage that you can check if they are valid or not BEFORE performing any check to your database whether they exist or not. This can help you to fend off attacks with random hashes as you don't need to burden your database with fake lookups.
Therefore, if your hash has some kind of well-defined format, with for example a checksum at the end, you can check if it's correct without needing to go to the database.
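A sketch of that idea in Python, assuming a server-side secret and a "body-checksum" token format (the names and format are assumptions):

import hashlib
import hmac
import secrets

SECRET = b"server-side-secret"  # assumption: never sent to clients

def issue_token() -> str:
    body = secrets.token_hex(8)
    checksum = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()[:8]
    return body + "-" + checksum

def looks_valid(token: str) -> bool:
    # Cheap structural check before any database lookup: recompute the checksum.
    try:
        body, checksum = token.split("-")
    except ValueError:
        return False
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()[:8]
    return hmac.compare_digest(checksum, expected)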

How can I generate a unique, small, random, and user-friendly key?

A few months back I was tasked with implementing a unique and random code for our web application. The code would have to be user friendly and as small as possible, but still be essentially random (so users couldn't easily predict the next code in the sequence).
It ended up generating values that looked something like this:
Af3nT5Xf2
Unfortunately, I was never satisfied with the implementation. GUIDs were out of the question; they were simply too big and difficult for users to type in. I was hoping for something more along the lines of 4 or 5 characters/digits, but our particular implementation would generate noticeably patterned sequences if we encoded to less than 9 characters.
Here's what we ended up doing:
We pulled a unique sequential 32-bit id from the database. We then inserted it into the center bits of a 64-bit random integer. We created a lookup table of easily typed and recognized characters (A-Z, a-z, 2-9, skipping easily confused characters such as L, l, 1, O, 0, etc.). Finally, we used that lookup table to base-54 encode the 64-bit integer. The high bits were random, the low bits were random, but the center bits were sequential.
The final result was a code that was much smaller than a guid and looked random, even though it absolutely wasn't.
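For concreteness, a rough Python sketch of that kind of construction (the exact bit layout and alphabet below are guesses, not the original implementation):

import secrets

# 54 easily typed characters: A-Z and a-z without I/L/O/i/l/o, plus 2-9.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz23456789"

def encode(n: int) -> str:
    base = len(ALPHABET)
    out = []
    while True:
        n, r = divmod(n, base)
        out.append(ALPHABET[r])
        if n == 0:
            break
    return "".join(reversed(out))

def make_code(sequential_id: int) -> str:
    # Random high bits and low bits, with the sequential 32-bit id in the middle.
    hi = secrets.randbits(8)
    lo = secrets.randbits(8)
    value = (hi << 40) | ((sequential_id & 0xFFFFFFFF) << 8) | lo
    return encode(value)  # looks random, but the middle bits are sequential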
I was never satisfied with this particular implementation. What would you guys have done?
Here's how I would do it.
I'd obtain a list of common English words with usage frequency and some grammatical information (like is it a noun or a verb?). I think you can look around the intertubes for some copy. Firefox is open-source and it has a spellchecker... so it must be obtainable somehow.
Then I'd run a filter on it so obscure words are removed and that words which are too long are excluded.
Then my generation algorithm would pick 2 words from the list and concatenate them and add a random 3 digits number.
I can also randomize the word selection pattern between verbs/nouns, like
eatCake778
pickBasket524
rideFlyer113
etc..
The case needn't be camel case; you can randomize that as well. You can also randomize the placement of the number and the verb/noun.
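A minimal sketch of that generator in Python (the word lists here are placeholders; a real one would come from a filtered dictionary):

import secrets

VERBS = ["eat", "pick", "ride", "carry", "paint"]
NOUNS = ["cake", "basket", "flyer", "lamp", "river"]

def word_code() -> str:
    verb = secrets.choice(VERBS)
    noun = secrets.choice(NOUNS)
    number = secrets.randbelow(900) + 100  # a random 3-digit number
    return verb + noun.capitalize() + str(number)  # e.g. 'pickBasket524'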
And since that's a lot of randomizing, Jeff's The Danger of Naïveté is a must-read. Also make sure to study dictionary attacks well in advance.
And after I'd implemented it, I'd run a test to see how often the algorithm collides. If the collision rate was high, then I'd play with the parameters (number of nouns used, number of verbs used, length of the random number, total number of words, different kinds of casings, etc.).
In .NET you can use the RNGCryptoServiceProvider method GetBytes(), which will "fill an array of bytes with a cryptographically strong sequence of random values" (from the MS documentation).
using System.Security.Cryptography;

// Fill a 4-byte array with cryptographically strong random values.
byte[] randomBytes = new byte[4];
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
rng.GetBytes(randomBytes);
You can increase the length of the byte array and pluck out the character values you want to allow.
In C#, I have used the 'System.IO.Path.GetRandomFileName() : String' method... but I was generating salt for debug file names. This method returns stuff that looks like your first example, except with a random '.xyz' file extension too.
If you're in .NET and just want a simpler (but not 'nicer' looking) solution, I would say this is it... you could remove the random file extension if you like.
At the time of this writing, this question's title is:
How can I generate a unique, small, random, and user-friendly key?
To that, I should note that it's not possible in general to create a random value that's also unique, at least if each random value is generated independently of any other. In addition, there are many things you should ask yourself if you want to generate unique identifiers (which come from my section on unique random identifiers):
Can the application easily check identifiers for uniqueness within the desired scope and range (e.g., check whether a file or database record with that identifier already exists)?
Can the application tolerate the risk of generating the same identifier for different resources?
Do identifiers have to be hard to guess, be simply "random-looking", or be neither?
Do identifiers have to be typed in or otherwise relayed by end users?
Is the resource an identifier identifies available to anyone who knows that identifier (even without being logged in or authorized in some way)?
Do identifiers have to be memorable?
In your case, you have several conflicting goals: You want identifiers that are—
unique,
easy to type by end users (including small), and
hard to guess (including random).
Important points you don't mention in the question include:
How will the key be used?
Are other users allowed to access the resource identified by the key, whenever they know the key? If not, then additional access control or a longer key length will be necessary.
Can your application tolerate the risk of duplicate keys? If so, then the keys can be completely randomly generated (such as by a cryptographic RNG). If not, then your goal will be harder to achieve, especially for keys intended for security purposes.
Note that I don't go into the issue of formatting a unique value into a "user-friendly key". There are many ways to do so, and they all come down to mapping unique values one-to-one with "user-friendly keys" — if the input value was unique, the "user-friendly key" will likewise be unique.
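One simple way to build such a one-to-one mapping is a reversible base conversion into a reduced, typable alphabet (a sketch; the alphabet is an assumption):

ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"  # 31 characters: digits 2-9 and uppercase letters without I, L, O

def to_key(n: int) -> str:
    # Each unique non-negative integer maps to a distinct string...
    base = len(ALPHABET)
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(ALPHABET[r])
        if n == 0:
            break
    return "".join(reversed(digits))

def from_key(key: str) -> int:
    # ...and back again, so uniqueness of the input carries over to the key.
    n = 0
    for ch in key:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n

assert from_key(to_key(123456789)) == 123456789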
If by user-friendly you mean that a user could type the answer in, then I think you would want to look in a different direction. I've seen and done implementations for initial random passwords that pick random words and numbers as an easier and less error-prone string.
If, though, you're looking for a way to encode a random code in the URL string, which is an issue I've dealt with for a while, then what I have done is use base64-encoded GUIDs.
You could load your list of words, as chakrit suggested, into a data table or XML file with a unique sequential key. When getting your random word, use a random number generator to determine which words to fetch by their key. If you concatenate two of them, I don't think you need to include the numbers in the string unless "true randomness" is part of the goal.