I am generating random OTP-style strings that serve as a short-term identifier to link two otherwise unrelated systems (which have authentication at each end). These need to be read and re-entered by users, so in order to reduce the error rate and reduce the opportunities for forgery, I'd like to make one of the digits a check digit. At present my random string conforms to the pattern (removing I and O to avoid confusion):
^[ABCDEFGHJKLMNPQRSTUVWXYZ][0-9]{4}$
I want to append one extra decimal digit for the check. So far I've implemented this as a BLAKE2 hash (from libsodium) that's converted to decimal and truncated to 1 char. This gives only 10 possibilities for the check digit, which isn't much. My primary objective is to detect single character errors in the input.
This approach kind of works, but it seems that one digit is not enough to detect single char errors, and undetected errors are quite easy to find, for example K37705 and K36705 are both considered valid.
I do not have a time value baked into this OTP; instead it's purely random and I'm relying on keeping a record of the OTPs that have been generated recently for each user, which are deleted periodically, and I'm reducing opportunities for brute-forcing by rate and attempt-count limiting.
I'm guessing that BLAKE2 isn't a good choice here, but given there are only 10 possibilities for the result, I don't know that others will be better. What would be a better algorithm/approach to use?
Frame challenge
Why do you need a check digit?
It doesn't improve security, and a five digits is trivial for most humans to get correct. Check if server side and return an error message if it's wrong.
Normal TOTP tokens are commonly 6 digits, and actors such as google has determined that people in general manage to get them orrect.
I have a product that is basically a USB flash drive based on an NXP LPC18xx microcontroller. I'm using a library provided from the manufacturer (LPCOpen) that handles the USB MSC and the SD card media (which is where I store data).
Here is the problem: Internally the LPC18xx has a 64kB (limited by hardware) buffer used to cache reads/writes which means it can only cache up to 128 blocks(512B) of memory. The SCSI Write-10 command has a total-blocks field that can be up to 256 blocks (128kB). When originally testing the product on Windows 7 it never writes more than 128 blocks at a time but when tested on Linux it sometimes writes more than 128 blocks, which causes the microcontroller to crash.
Is there a way to tell the host OS not to request more than 128 blocks? I see references[1] to a Read-Block-Limit command(05h) but it doesn't seem to be widely supported. Also, what sense key would I return on the Write-10 command to tell Linux the write is too large? I also see references to a block limit VPD page in some device spec sheets but cannot find a lot of documentation about how it is implemented.
[1]https://en.wikipedia.org/wiki/SCSI_command
Let me offer a disclaimer up front that this is what you SHOULD do, but none of this may work. A cursory search of the Linux SCSI driver didn't show me what I wanted to see. So, I'm not at all sure that "doing the right thing" will get you the results you want.
Going by the book, you've got to do two things: implement the Block Limits VPD and handle too-large transfer sizes in WRITE AND READ.
First, implement the Block Limits VPD page, which you can find in late revisions of SBC-3 floating around on the Internet (like this one: http://www.13thmonkey.org/documentation/SCSI/sbc3r25.pdf). It's probably worth going to the t10.org site, registering, and then downloading the last revision (http://www.t10.org/cgi-bin/ac.pl?t=f&f=sbc3r36.pdf).
The Block Limits VPD page has a maximum transfer length field that specifies the maximum number of blocks that can be transferred by all the READ and WRITE commands, and basically anything else that reads or writes data. Of course the downside of implementing this page is that you have to make sure that all the other fields you return are correct!
Second, when handling READ and WRITE, if the command's transfer length exceeds your maximum, respond with an ILLEGAL REQUEST key, and set the additional sense code to INVALID FIELD IN CDB. This behavior is indicated by a table in the section that describes the Block Limits VPD, but only in late revisions of SBC-3 (I'm looking at 35h).
You might just start with returning INVALID FIELD IN CDB, since it's the easiest course of action. See if that's enough?
In wikipedia the image of ethernet frame includes a field "inter-packet gap". Other sited I've looked in don't have that field. I don't seem to understand if the gap is predefined according to the protocol or it can be changed using this field.
The default gap between packets is 96 "bit times" (the time taken to send 96 bits on the medium used).
This is sometimes too big or too small in very specialised circumstances, so organisations are allowed to override this by specifying their own.
By including this field, you're effectively telling the recipient, "I'm not going to send anything for n bit times now, please do the same".
Apparently, it's too small for some Ethernet connections on MS Windows and can be changed
here
In the movie The Social Network, when Mark Zuckberg was in class, the teacher asked this question:
Suppose we're given a computer, with a 16-bit virtual address, and a page size of 256-bytes,the system uses one-level page tables that start at address hex 400, may be you want DMA (Direct Memory Access) on your 16-bit system. Who knows? The first pages are reserved for hardware flags, etc. Assume page-table entries have eight status bits. The eight status bits would then be ...
Mark Zuckberg answered:
One valid bit, one modified bit, one reference bit and five permission bits.
How did he get this?
http://chomaloma.blogspot.com.au/2011/02/social-network-inaccuracies-regarding.html
That does explain it a little
Intel nomenclature in parentheses. The 'valid' (present), 'modified' (dirty) and 'reference' (accessed) bits are the minimum set of bits you need for a demand paging manager and MMU.
The 'valid' (present) bit is used by the MMU to know whether the page is mapped to a valid physical address.
The 'modified' (dirty) bit is used by the demand paging manager to determine if the page being evicted needs to be written to backing media. As accessing backing media can be considered an expensive operation, you really want to keep this to a minimum--especially when writing to it as that is generally slower than reading from it.
The 'reference' (accessed) bit is useful to the demand paging manager to figure out how to age the pages it controls. You don't want to evict the most frequently used pages as that would require saving and/or loading them repeatedly from backing store (which has already been stated as SLOW).
The remaining five bits are gravy. They are free to use as permission and/or option bits. For example, can the page be accessed by supervisor and/or user threads? Is the page available for write, or is it read-only? What is the caching strategy to be used on the page?
Hope this helps.
Sparky
How did he get the answer?---That is just movie BS.
If you take the number of bits in the address and subtract the number of bits used to represent a page, you get the number of bits available for the processor to use as system status bits.
With that information, he could identify the number of system status bits
The usage of those bits is another story. The allocation of system status bits is system dependent. Maybe they exist, but I don't know of any 16-bit virtual addressing system. So he's not referring to any specific type of system.
A reference bit is not used by all systems (e.g. VMS). That's not even mandatory.
Hollywood magic.
We have a system where we want to prevent the same credit card number being registered for two different accounts. As we don't store the credit card number internally - just the last four digits and expiration date - we cannot simply compare credit card numbers and expiration dates.
Our current idea is to store a hash (SHA-1) in our system of the credit card information when the card is registered, and to compare hashes to determine if a card has been used before.
Usually, a salt is used to avoid dictionary attacks. I assume we are vulnerable in this case, so we should probably store a salt along with the hash.
Do you guys see any flaws in this method? Is this a standard way of solving this problem?
Let's do a little math: Credit card numbers are 16 digits long. The first seven digits are 'major industry' and issuer numbers, and the last digit is the luhn checksum. That leaves 8 digits 'free', for a total of 100,000,000 account numbers, multiplied by the number of potential issuer numbers (which is not likely to be very high). There are implementations that can do millions of hashes per second on everyday hardware, so no matter what salting you do, this is not going to be a big deal to brute force.
By sheer coincidence, when looking for something giving hash algorithm benchmarks, I found this article about storing credit card hashes, which says:
Storing credit cards using a simple single pass of a hash algorithm, even when salted, is fool-hardy. It is just too easy to brute force the credit card numbers if the hashes are compromised.
...
When hashing credit card number, the hashing must be carefully designed to protect against brute forcing by using strongest available cryptographic hash functions, large salt values, and multiple iterations.
The full article is well worth a thorough read. Unfortunately, the upshot seems to be that any circumstance that makes it 'safe' to store hashed credit card numbers will also make it prohibitively expensive to search for duplicates.
People are over thinking the design of this, I think. Use a salted, highly secure (e.g. "computationally expensive") hash like sha-256, with a per-record unique salt.
You should do a low-cost, high accuracy check first, then do the high-cost definitive check only if that check hits.
Step 1:
Look for matches to the last 4 digits (and possibly also the exp. date, though there's some subtleties there that may need addressing).
Step 2:
If the simple check hits, use the salt, get the hash value, do the in depth check.
The last 4 digits of the cc# are the most unique (partly because it includes the LUHN check digit as well) so the percentage of in depth checks you will do that won't ultimately match (the false positive rate) will be very, very low (a fraction of a percent), which saves you a tremendous amount of overhead relative to the naive "do the hash check every time" design.
Do not store a simple SHA-1 of the credit card number, it would be way too easy to crack (especially since the last 4 digits are known). We had the same problem in my company: here is how we solved it.
First solution
For each credit card, we store the last 4 digits, the expiration date, a long random salt (50 bytes long), and the salted hash of the CC number. We use the bcrypt hash algorithm because it is very secure and can be tuned to be as CPU-intensive as you wish. We tuned it to be very expensive (about 1 second per hash on our server!). But I guess you could use SHA-256 instead and iterate as many times as needed.
When a new CC number is entered, we start by finding all the existing CC numbers that end with the same 4 digits and have the same expiration date. Then, for each matching CC, we check whether its stored salted hash matches the salted hash calculated from its salt and the new CC number. In other words, we check whether or not hash(stored_CC1_salt+CC2)==stored_CC1_hash.
Since we have roughly 100k credit cards in our database, we need to calculate about 10 hashes, so we get the result in about 10 seconds. In our case, this is fine, but you may want to tune bcrypt down a bit. Unfortunately, if you do, this solution will be less secure. On the other hand, if you tune bcrypt to be even more CPU-intensive, it will take more time to match CC numbers.
Even though I believe that this solution is way better than simply storing an unsalted hash of the CC number, it will not prevent a very motivated pirate (who manages to get a copy of the database) to break one credit card in an average time of 2 to 5 years. So if you have 100k credit cards in your database, and if the pirate has a lot of CPU, then he can can recover a few credit card numbers every day!
This leads me to the belief that you should not calculate the hash yourself: you have to delegate that to someone else. This is the second solution (we are in the process of migrating to this second solution).
Second solution
Simply have your payment provider generate an alias for your credit card.
for each credit card, you simply store whatever you want to store (for example the last 4 digits & the expiration date) plus a credit card number alias.
when a new credit card number is entered, you contact your payment provider and give it the CC number (or you redirect the client to the payment provider, and he enters the CC number directly on the payment provider's web site). In return, you get the credit card alias! That's it. Of course you should make sure that your payment provider offers this option, and that the generated alias is actually secure (for example, make sure they don't simply calculate a SHA-1 on the credit card number!). Now the pirate has to break your system plus your payment provider's system if he wants to recover the credit card numbers.
It's simple, it's fast, it's secure (well, at least if your payment provider is). The only problem I see is that it ties you to your payment provider.
Hope this helps.
PCI DSS states that you can store PANs (credit card numbers) using a strong one-way hash. They don't even require that it be salted. That said you should salt it with a unique per card value. The expiry date is a good start but perhaps a bit too short. You could add in other pieces of information from the card, such as the issuer. You should not use the CVV/security number as you are not allowed to store it. If you do use the expiry date then when the cardholder gets issued a new card with the same number it will count as a different card. This could be a good or bad thing depending on your requirements.
An approach to make your data more secure is to make each operation computationally expensive. For instance if you md5 twice it will take an attacker longer to crack the codes.
Its fairly trivial to generate valid credit card numbers and to attempt a charge through for each possible expiry date. However, it is computationally expensive. If you make it more expensive to crack your hashes then it wouldn't be worthwhile for anyone to bother; even if they had the salts, hashes and the method you used.
#Cory R. King
SHA 1 isn't broken, per se. What the article shows is that it's possible to generate 2 strings which have the same hash value in less than brute force time. You still aren't able to generate a string that equates to a SPECIFIC hash in a reasonable amount of time. There is a big difference between the two.
I believe I have found a fool-proof way to solve this problem. Someone please correct me if there is a flaw in my solution.
Create a secure server on EC2, Heroku, etc. This server will serve ONE purpose and ONLY one purpose: hashing your credit card.
Install a secure web server (Node.js, Rails, etc) on that server and set up the REST API call.
On that server, use a unique salt (1000 characters) and SHA512 it 1000 times.
That way, even if hackers get your hashes, they would need to break into your server to find your formula.
Comparing hashes is a good solution. Make sure that you don't just salt all the credit card numbers with the same constant salt, though. Use a different salt (like the expiration date) on each card. This should make you fairly impervious to dictionary attacks.
From this Coding Horror article:
Add a long, unique random salt to each password you store. The point of a salt (or nonce, if you prefer) is to make each password unique and long enough that brute force attacks are a waste of time. So, the user's password, instead of being stored as the hash of "myspace1", ends up being stored as the hash of 128 characters of random unicode string + "myspace1". You're now completely immune to rainbow table attack.
Almost a good idea.
Storing just the hash is a good idea, it has served in the password world for decades.
Adding a salt seems like a fair idea, and indeed makes a brute force attack that much harder for the attacker. But that salt is going to cost you a lot of extra effort when you actually check to ensure that a new CC is unique: You'll have to SHA-1 your new CC number N times, where N is the number of salts you have already used for all of the CCs you are comparing it to. If indeed you choose good random salts you'll have to do the hash for every other card in your system. So now it is you doing the brute force. So I would say this is not a scalable solution.
You see, in the password world a salt adds no cost because we just want to know if the clear text + salt hashes to what we have stored for this particular user. Your requirement is actually pretty different.
You'll have to weigh the trade off yourself. Adding salt doesn't make your database secure if it does get stolen, it just makes decoding it harder. How much harder? If it changes the attack from requiring 30 seconds to requiring one day you have achieved nothing -- it will still be decoded. If it changes it from one day to 30 years you have achived someting worth considering.
Yes, comparing hashes should work fine in this case.
A salted hash should work just fine. Having a salt-per-user system should be plenty of security.
SHA1 is broken. Course, there isn't much information out on what a good replacement is. SHA2?
If you combine the last 4 digits of the card number with the card holder's name (or just last name) and the expiration date you should have enough information to make the record unique. Hashing is nice for security, but wouldn't you need to store/recall the salt in order to replicate the hash for a duplicate check?
I think a good solution as hinted to above, would be to store a hash value of say Card Number, Expiration date, and name. That way you can still do quick comparisons...
Sha1 broken is not a problem here.
All broken means is that it's possible to calculate collisions (2 data sets that have the same sha1) more easily than you would expect.
This might be a problem for accepting arbitrary files based on their sha1 but it has no relevence for an internal hashing application.
If you are using a payment processor like Stripe / Braintree, let them do the "heavy lifting".
They both offer card fingerprinting that you can safely store in your db and compare later to see if a card already exists:
Stripe returns fingerprint string - see doc
Braintree returns unique_number_identifier string - see doc