Using MD5 checksum to uniquely address a binary content in DB - hash

I need to take binary (images and pdf's) from one environment to another .
These binaries are referenced in a main document mostly HTML Doc as Title and Version No: .
The problem is that we have a versioning ,So an HTML DOC might reead to img src=(Logo1 + Version 2) . The Title is good for me , but the Version is system generated for the host system's use .
I need take the HTML Doc to another system - I can ofcourse insert the Logo assosiated - I don't want to just insert the image(or pdf) , if it is already available in the destination system . Can I use a combination of Title + MD5 Checksum to check if the destination system already has the same content possibly with a different Version No:. I think the chances of a collision is bare minmal with this approach ? We have Md5's stored in our document manager system

The chances for collisions depend on the number of documents you have to store, but should be sufficiently low.
But this assumes nobody actually tries to create collisions. MD5 is considered broken, so if somebody could benefit from causing collisions on your end he/she might be able to pull that of.
Therefore I'd recommend a more secure hash function. It shouldn't make much of a difference for your effort which one you use.
See also this question and answer: What is the clash rate for md5?

Related

Verilog bit metadata

is there a way to easily add a Metadata to a verilog bit? My goal is to be able to identify certain bits that are well known prior to encryption, after an ethernet frame is being encrypted. I'd like to easily identify these bits location in the encrypted frame. I'd like this Metadata to be transparent to the actual design rtl (i.e. Allow it to flow naturally through external IPs that are not mine, and be recovered and analyzed on the other end).
Thanks
There is absolutely no way to do this using the original RTL path.
You were not clear about your reasoning for this, but sometimes people use a watermark which is encoding something into your data which is inconsequential to the design, but has meaning to your verification environment. For example, instead of sending completely random data in a packet, you send data with a specific checksum that has meaning to your verification environment.

How can I limit the number of blocks written in a Write_10 command?

I have a product that is basically a USB flash drive based on an NXP LPC18xx microcontroller. I'm using a library provided from the manufacturer (LPCOpen) that handles the USB MSC and the SD card media (which is where I store data).
Here is the problem: Internally the LPC18xx has a 64kB (limited by hardware) buffer used to cache reads/writes which means it can only cache up to 128 blocks(512B) of memory. The SCSI Write-10 command has a total-blocks field that can be up to 256 blocks (128kB). When originally testing the product on Windows 7 it never writes more than 128 blocks at a time but when tested on Linux it sometimes writes more than 128 blocks, which causes the microcontroller to crash.
Is there a way to tell the host OS not to request more than 128 blocks? I see references[1] to a Read-Block-Limit command(05h) but it doesn't seem to be widely supported. Also, what sense key would I return on the Write-10 command to tell Linux the write is too large? I also see references to a block limit VPD page in some device spec sheets but cannot find a lot of documentation about how it is implemented.
[1]https://en.wikipedia.org/wiki/SCSI_command
Let me offer a disclaimer up front that this is what you SHOULD do, but none of this may work. A cursory search of the Linux SCSI driver didn't show me what I wanted to see. So, I'm not at all sure that "doing the right thing" will get you the results you want.
Going by the book, you've got to do two things: implement the Block Limits VPD and handle too-large transfer sizes in WRITE AND READ.
First, implement the Block Limits VPD page, which you can find in late revisions of SBC-3 floating around on the Internet (like this one: http://www.13thmonkey.org/documentation/SCSI/sbc3r25.pdf). It's probably worth going to the t10.org site, registering, and then downloading the last revision (http://www.t10.org/cgi-bin/ac.pl?t=f&f=sbc3r36.pdf).
The Block Limits VPD page has a maximum transfer length field that specifies the maximum number of blocks that can be transferred by all the READ and WRITE commands, and basically anything else that reads or writes data. Of course the downside of implementing this page is that you have to make sure that all the other fields you return are correct!
Second, when handling READ and WRITE, if the command's transfer length exceeds your maximum, respond with an ILLEGAL REQUEST key, and set the additional sense code to INVALID FIELD IN CDB. This behavior is indicated by a table in the section that describes the Block Limits VPD, but only in late revisions of SBC-3 (I'm looking at 35h).
You might just start with returning INVALID FIELD IN CDB, since it's the easiest course of action. See if that's enough?

consistent hashing on Multiple machines

I've read the article: http://n00tc0d3r.blogspot.com/ about the idea for consistent hashing, but I'm confused about the method on multiple machines.
The basic process is:
Insert
Hash an input long url into a single integer;
Locate a server on the ring and store the key--longUrl on the server;
Compute the shorten url using base conversion (from 10-base to 62-base) and return it to the user.(How does this step work? In a single machine, there is a auto-increased id to calculate for shorten url, but what is the value to calculate for shorten url on multiple machines? There is no auto-increased id.)
Retrieve
Convert the shorten url back to the key using base conversion (from 62-base to 10-base);
Locate the server containing that key and return the longUrl. (And how can we locate the server containing the key?)
I don't see any clear answer on that page for how the author intended it. I think this is basically an exercise for the reader. Here's some ideas:
Implement it as described, with hash-table style collision resolution. That is, when creating the URL, if it already matches something, deal with that in some way. Rehashing or arithmetic transformation (eg, add 1) are both possibilities. This means, naively, a theoretical worst case of having to hit a server n times trying to find an available key.
There's a lot of ways to take that basic idea and smarten it, eg, just search for another available key on the same server, eg, by rehashing iteratively until you find one that's on the server.
Allow servers to talk to each other, and coordinate on the autoincrement id.
This is probably not a great solution, but it might work well in some situations: give each server (or set of servers) separate namespace, eg, the first 16 bits selects a server. On creation, randomly choose one. Then you just need to figure out how you want that namespace to map. The namespaces only really matter for who is allowed to create what IDs, so if you want to add nodes or rebalance later, it is no big deal.
Let me know if you want more elaboration. I think there's a lot of ways that this one could go. It is annoying that the author didn't elaborate on this point; my experience with these sorts of algorithms is that collision resolution and similar problems tend to be at the very heart of a practical implementation of a distributed system.

Facebook File Names

I've been checking out Facebook code lately and all of their images and files have names comprised of just random letters and numbers like "FSEB6oLTK3I.png", "cWd6w4ZgtPx.png", "GsNJNwuI-UM.gif". What do these names mean? Are they using some sort of naming system (if so, what is it?) or are the names just random?
They are generated completely randomly. And probably done for good reasons too. If this name was predicable then you could see someone's random upload by just knowing their name or id.
After generating a file name, they store the image on disk and store the image name in the database. Again this purely done for security reasons.
I think the names are generated completely random. If that's not the case, one would need a lot more data regarding the images/files and their uploaders, not to mention additional data about... well, anything that might be relevant for an upload.
I think that it is just random. They probably have a database that has all the random filenames

Encrypting SQLite Database file in iPhone OS

Any SQLite database on the iPhone is simply a file bundled with the application. It is relatively simple for anyone to extract this file and query it.
What are your suggestions for encrypting either the file or the data stored within the database.
Edit: The App is a game that will be played against other users. Information about a users relative strengths and weaknesses will be stored in the DB. I don't want a user to be able to jail-break the phone up their reputation/power etc then win the tournament/league etc (NB: Trying to be vague as the idea is under NDA).
I don't need military encryption, I just don't want to store things in plain text.
Edit 2: A little more clarification, my main goals are
Make it non-trivial to hack sensitive data
Have a simple way to discover if data has been altered (some kind of checksum)
You cannot trust the client, period. If your standalone app can decrypt it, so will they. Either put the data on a server or don't bother, as the number of people who actually crack it to enhance stats will be minuscule, and they should probably be rewarded for the effort anyway!
Put a string in the database saying "please don't cheat".
There are at least two easier approaches here (both complimentary) that avoid encrypting values or in-memory databases:
#1 - ipa crack detection
Avoid the technical (and legal) hassle of encrypting the database and/or the contents and just determine if the app is pirated and disable the network/scoring/ranking aspects of the game. See the following for more details:
http://thwart-ipa-cracks.blogspot.com/2008/11/detection.html
#2 - data integrity verification
Alternatively store a HMAC/salted hash of the important columns in each row when saving your data (and in your initial sqlite db). When loading each row, verify the data against the HMAC/hash and if verification fails act accordingly.
Neither approach will force you to fill out the encryption export forms required by Apple/US government.
Score submission
Don't forget you'll need to do something similar for the actual score submissions to protect against values coming from something other than your app. You can see an implementation of this in the cocos2d-iphone and cocoslive frameworks at http://code.google.com/p/cocos2d-iphone/ and http://code.google.com/p/cocoslive/
Response to comments
There is no solution here that will 100% prevent data tampering. If that is a requirement, the client needs to be view only and all state and logic must be calculated on a trusted server. Depending on the application, extra anti-cheat mechanisms will be required on the client.
There are a number of books on developing massively-multiplayer games that discuss these issues.
Having a hash with a known secret in the code is likely a reasonable approach (at least, when considering the type of applications that generally exist on the App Store).
Like Kendall said, including the key on the device is basically asking to get cracked. However, there are folks who have their reasons for obfuscating data with a key on-device. If you're determined to do it, you might consider using SQLCipher for your implementation. It's a build of SQLite that provides transparent, page-level encryption of the entire DB. There's a tutorial over on Mobile Orchard for using it in iPhone apps.
How likely do you think it is that your normal user will be doing this? I assume you're going through the app store, which means that everything is signed/encrypted before getting on to the user's device. They would have to jailbreak their device to get access to your database.
What sort of data are you storing such that it needs encryption? If it contains passwords that the user entered, then you don't really need to encrypt them; the user will not need to find out their own password. If it's generic BLOB data that you only want the user to access through the application, it could be as simple as storing an encrypted blob using the security API.
If it's the whole database you want secured, then you'd still want to use the security api, but on the whole file instead, and decrypt the file as necessary before opening it. The issue here is that if the application closes without cleanup, you're left with a decrypted file.
You may want to take a look at memory-resident databases, or temporary databases which you can create either using a template db or a hard-coded schema in the program (take a look at the documentation for sqlite3_open). The data could be decrypted, inserted into the temporary database, then delete the decrypted database. Do it in the opposite direction when closing the connection.
Edit:
You can cook up your own encryption scheme I'm sure with just a very simple security system by XOR-ing the data with a value stored in the app, and store a hash somewhere else to make sure it doesn't change, or something.
SQLCipher:
Based on my experience SQLCipher is the best option to encrypt the data base.
Once the key("PRAGMA key") is set SQLCipher will automatically encrypt all data in the database! Note that if you don't set a key then SQLCipher will operate identically to a standard SQLite database.
The call to sqlite3_key or "PRAGMA key" should occur as the first operation after opening the database. In most cases SQLCipher uses PBKDF2, a salted and iterated key derivation function, to obtain the encryption key. Alternately, an application can tell SQLCipher to use a specific binary key in blob notation (note that SQLCipher requires exactly 256 bits of key material), i.e.
Reference:
http://sqlcipher.net/ios-tutorial
I hope someone would save time on exploring about this
Ignoring the philosophical and export issues, I'd suggest that you'd be better off encrypting the data in the table directly.
You need to obfuscate the decryption key(s) in your code. Typically, this means breaking them into pieces and encoding the strings in hex and using functions to assemble the pieces of the key together.
For the algorithm, I'd use a trusted implementation of AES for whatever language you're using.
Maybe this one for C#:
http://msdn.microsoft.com/en-us/magazine/cc164055.aspx
Finally, you need to be aware of the limitations of the approach. Namely, the decryption key is a weak link, it will be available in memory at run-time in clear text. (At a minimum) It has to be so that you can use it. The implementation of your encryption scheme is another weakness--any flaws there are flaws in your code too. As several other people have pointed out your client-server communications are suspect too.
You should remember that your executable can be examined in a hex editor where cleartext strings will leap out of the random junk that is your compiled code. And that many languages (like C# for example) can be reverse-compiled and all that will be missing are the comments.
All that said, encrypting your data will raise the bar for cheating a bit. How much depends on how careful you are; but even so a determined adversary will still break your encryption and cheat. Furthermore, they will probably write a tool to make it easy if your game is popular; leaving you with an arms-race scenario at that point.
Regarding a checksum value, you can compute a checksum based on the sum of the values in a row assuming that you have enough numeric values in your database to do so. Or, for an bunch of boolean values you can store them in a varbinary field and use the bitwise exclusive operator ^ to compare them--you should end up with 0s.
For example,
for numeric columns,
2|3|5|7| with a checksum column | 17 |
for booleans,
0|1|0|1| with a checksum column | 0101 |
If you do this, you can even add a summary row at the end that sums your checksums. Although this can be problematic if you are constantly adding new records. You can also convert strings to their ANSI/UNICODE components and sum these too.
Then when you want to check the checksum simple do a select like so:
Select *
FROM OrigTable
right outer join
(select pk, (col1 + col2 + col3) as OnTheFlyChecksum, PreComputedChecksum from OrigTable) OT on OrigTable.pk = OT.pk
where OT.OnTheFlyChecksum = OT.PreComputedChecksum
It appears to be simplest to sync all tournament results to all iPhones in the tournament. You can do it during every game: before a game, if the databases of two phones contradict each other, the warning is shown.
If the User A falsifies the result if his game with User B, this result will propagate until B eventually sees it with the warning that A's data don't match with his phone. He then can go and beat up explain to A that his behavior isn't right, just the way it is in real life if somebody cheats.
When you compute the final tournament results, show the warning, name names, and throw out all games with contradictory results. This takes away the incentive to cheat.
As said before, encryption won't solve the problem since you can't trust the client. Even if your average person can't use disassembler, all it takes is one motivated person and whatever encryption you have will be broken.
Yet, if on windows platform, you also can select SQLiteEncrypt to satisfy your needs.SQLiteEncrypt extends sqlite encryption support, but you can treat it as original sqlite3 c library.