I've been checking out Facebook code lately and all of their images and files have names comprised of just random letters and numbers like "FSEB6oLTK3I.png", "cWd6w4ZgtPx.png", "GsNJNwuI-UM.gif". What do these names mean? Are they using some sort of naming system (if so, what is it?) or are the names just random?
They are generated completely randomly, and probably for good reasons too. If the name were predictable, you could see someone's upload just by knowing their name or ID.
After generating a file name, they store the image on disk and store the image name in the database. Again, this is done purely for security reasons.
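As a rough illustration (not Facebook's actual scheme, which the answers here can only guess at), a random, URL-safe file name could be generated like this in Python. The helper name `random_file_name` and the 8-byte length are assumptions; 8 random bytes happen to encode to 11 characters, the same length as the names in the question.

```python
import secrets

def random_file_name(extension, num_bytes=8):
    """Return an unguessable file name made of a random token plus an extension."""
    # token_urlsafe draws from the OS CSPRNG and uses only A-Z, a-z, 0-9, '-' and '_'.
    return f"{secrets.token_urlsafe(num_bytes)}.{extension}"

name = random_file_name("png")
print(name)
```

The generated name would then be stored in the database next to the upload's owner, as the answer describes.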
I think the names are generated completely randomly. If that's not the case, one would need a lot more data regarding the images/files and their uploaders, not to mention additional data about... well, anything that might be relevant for an upload.
I think that it is just random. They probably have a database that holds all the random filenames.
I have a large number of XML based data files with complex contents. Currently I am validating the contents at every use, and that is slow. I started thinking I could have a utility to validate the XML, then get an MD5 hash of the file and save it to the file meta data. Then, at use I can compare saved hash with current hash and only validate those files that are different.
At least, I can do a performance comparison and see if that will actually be any faster.
That said, I am not finding any way to add a custom Hash property to the file meta data. And I wonder if there is a better way to do this?
For some other XML files I am using code signing, but those are program resource XML files that I provide. These other XML files are modified by the customer for use, so I can't use code signing.
I could also include a text file that lists the XML files and their associated hashes, but storing the hash in the file seems a more elegant solution. It just seems like Windows is less than forthcoming with custom metadata options, at least for local files. Of course there are all sorts of metadata options when files are on SharePoint, or AWS S3, etc. And indeed, I need to be able to hash files and save that as metadata on the file, and have it survive a round trip through a cloud repository too, since that is the solution I am looking at for solving the Work From Home problem. A company would create and validate their XML files, then upload them to an S3 bucket, then code on the user machine would download and use them.
Am I on the right track, or is this a dead end? And if so, might a self-signed certificate solve the issue? Create your certificate and share the public key with users. Then sign your XML with it. That feels... not ideal.
I determined that this approach was indeed a dead end, due to the fact that I can't ensure that files will always be hosted on an NTFS formatted drive. Especially in smaller firms a NAS is a common location, and with Work from Home becoming a thing, so is a local external FAT32 formatted drive.
The solution is to prevalidate the XML, get a hash of the XML as a string, and then add that hash to the root node as an attribute. The XML load code can then pass the loaded XML to a method that compares the value of that hash to a rehash of the same XML as a string, with the attribute removed. Net result: a universally applicable way to verify whether the XML has changed since prevalidation, which was really the goal.
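A minimal sketch of that idea in Python, under a few assumptions: the attribute name `contentHash` is made up, SHA-256 stands in for whichever hash is actually used, and the XML is canonicalized (Python 3.8+) before hashing so that attribute order or quoting style does not change the result.

```python
import hashlib
import xml.etree.ElementTree as ET

HASH_ATTR = "contentHash"  # hypothetical attribute name

def _digest(root):
    # Hash a canonical (C14N) serialization of the tree without the hash attribute.
    canonical = ET.canonicalize(ET.tostring(root, encoding="unicode"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def stamp(xml_text):
    """Return the XML with a hash of its own content stored on the root node."""
    root = ET.fromstring(xml_text)
    root.attrib.pop(HASH_ATTR, None)
    root.set(HASH_ATTR, _digest(root))
    return ET.tostring(root, encoding="unicode")

def is_unchanged(xml_text):
    """True if the stored hash matches a rehash with the attribute removed."""
    root = ET.fromstring(xml_text)
    stored = root.attrib.pop(HASH_ATTR, None)
    return stored is not None and stored == _digest(root)

stamped = stamp("<config><item id='1'>a</item></config>")
print(is_unchanged(stamped))
```

Files for which `is_unchanged` returns False would go through full validation again. Note that anyone who edits the file can also recompute the hash, so this detects change, not tampering.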
We're developing a REST API for our platform. Let's say we have organisations and projects, and projects belong to organisations.
After reading this answer, I would be inclined to use numerical IDs in the URL, so that some of the URLs would become (say with a prefix of /api/v1):
/organisations/1234
/organisations/1234/projects/5678
However, we want to use the same URL structure for our front end UI, so that if you type these URLs in the browser, you will get the relevant webpage in the response instead of a JSON file. Much in the same way you see relevant names of persons and organisations in sites like Facebook or Github.
Using this, we could get something like:
/organisations/dutchpainters
/organisations/dutchpainters/projects/nightwatch
It looks like Github actually exposes their API in the same way.
The advantages and disadvantages I can come up with for using names instead of IDs for URL definitions, are the following:
Advantages:
More intuitive URLs for end users
1 to 1 mapping of front end UI and JSON API
Disadvantages:
Have to use unique names
Have to take care of conflict with reserved names, such as count, so later on, you can still develop an API endpoint like /organisations/count and actually get the number of organisations instead of the organisation called count.
Especially the latter one seems to become a potential pain in the rear. Still, after reading this answer, I'm almost convinced to use the string identifier, since it doesn't seem to make a difference from a convention point of view.
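The two disadvantages listed above (unique names and reserved names) could be enforced when a name is registered. This is only a sketch: the reserved list, the naming pattern, and the function name `is_allowed_name` are all assumptions for illustration.

```python
import re

RESERVED_NAMES = {"count", "new", "search"}  # hypothetical; list your own endpoints
NAME_PATTERN = re.compile(r"^[a-z0-9][a-z0-9-]{0,38}$")  # hypothetical naming rule

def is_allowed_name(name, existing_names):
    """Reject names that are malformed, reserved, or already taken."""
    candidate = name.lower()
    return (
        NAME_PATTERN.match(candidate) is not None
        and candidate not in RESERVED_NAMES
        and candidate not in existing_names
    )

print(is_allowed_name("dutchpainters", set()))
print(is_allowed_name("count", set()))
```

The catch remains that the reserved list has to anticipate endpoints you have not built yet.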
My questions are:
Did I miss important advantages / disadvantages of using strings instead of numerical IDs?
Did Github develop their string-based approach after their platform matured, or did they know from the start that it would imply some limitations (like the one I mentioned earlier, it seems that they did not implement such functionality)?
It's common to use a combination of both:
/organisations/1234/projects/5678/nightwatch
where the last part is simply ignored but used to make the url more readable.
In your case, with multiple levels of collections you could experiment with this format:
/organisations/1234/dutchpainters/projects/5678/nightwatch
If somebody writes
/organisations/1234/germanpainters/projects/5678/wanderer
it would still map to the Rembrandt, but that should be OK. This leaves room for editing the names without breaking URLs already out there. Also, names don't have to be unique if you don't really need that.
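One way to sketch this "ID plus ignored name" routing in Python is a pattern that captures the numeric IDs and treats the name segments as optional decoration. The pattern and the `resolve` helper are illustrative assumptions, not any particular framework's API.

```python
import re

ROUTE = re.compile(
    r"^/organisations/(?P<org_id>\d+)(?:/(?P<org_slug>[^/]+))?"
    r"/projects/(?P<project_id>\d+)(?:/(?P<project_slug>[^/]+))?$"
)

def resolve(path):
    """Return (organisation id, project id); the name segments are ignored."""
    match = ROUTE.match(path)
    if match is None:
        return None
    return int(match["org_id"]), int(match["project_id"])

print(resolve("/organisations/1234/dutchpainters/projects/5678/nightwatch"))  # (1234, 5678)
print(resolve("/organisations/1234/germanpainters/projects/5678/wanderer"))   # (1234, 5678)
```

A handler could additionally redirect to the current canonical name when the name in the URL is out of date.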
Reserved HTTP characters: such as “:”, “/”, “?”, “#”, “[”, “]” and “@” – These characters and others are “reserved” in the HTTP protocol to have “special” meaning in the implementation syntax so that they are distinguishable from other data in the URL. If a variable value within the path contains one or more of these reserved characters then it will break the path and generate a malformed request. You can work around reserved characters in query string parameters by URL encoding them or sometimes by double escaping them, but you cannot in path parameters.
https://www.serviceobjects.com/blog/path-and-query-string-parameter-calls-to-a-restful-web-service
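For reference, this is what URL encoding (percent-encoding) of reserved characters looks like with Python's standard library. The helper name is an assumption, and whether a given server or framework accepts an encoded slash inside a path is a separate matter that this sketch does not address.

```python
from urllib.parse import quote, unquote

def encode_path_segment(value):
    # safe="" also encodes "/", which quote() leaves alone by default.
    return quote(value, safe="")

segment = encode_path_segment("night/watch?#1")
print(segment)           # night%2Fwatch%3F%231
print(unquote(segment))  # night/watch?#1
```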
Numerical consecutive IDs are not recommended anymore because it is very easy to guess records in your database and some might use that to obtain info they do not have access to.
Numerical IDs are used because in the database they have fixed-length storage, which makes indexing easy. For example, INT takes 4 bytes in MySQL and BIGINT takes 8 bytes, so every number has the same length in memory (100 as an INT takes as much space as 200), which makes it easy to index and search for records.
If you have a lot of entries in the database, then using a VARCHAR field to index is a bad idea. You should use a fixed-width field like CHAR(32) and fill the difference with spaces, but then you have to add logic in your program to handle the padding when searching the database.
Another idea would be to use slugs, but here you should take into consideration that some records might end up with the same slug, depending on what you use to form it. https://en.wikipedia.org/wiki/Semantic_URL#Slug
I would recommend using UUIDs since they have the same length and resolve this issue easily.
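A small sketch of generating such an identifier with Python's standard `uuid` module; the function name `new_public_id` is an assumption.

```python
import uuid

def new_public_id():
    # uuid4 is random; its hex form is always 32 characters long,
    # so it fits a fixed-width column such as CHAR(32).
    return uuid.uuid4().hex

public_id = new_public_id()
print(public_id)
```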
I am wondering what the best approach would be to check whether or not a common first name is contained within an NSString on an iPhone app. I've got a sorted flat text file of ~5500 common American first names delimited by new lines. The NSString I am searching within for a name is not very long, most likely the size of a normal sentence.
My original plan was to load the sorted list into memory and then iterate over every word in the NSString performing a binary search of the list to determine whether or not that word was a common name.
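The plan described above is language-independent; here is a sketch of it in Python rather than Objective-C, using the standard `bisect` module for the binary search. The three-name list and the punctuation handling are simplifying assumptions.

```python
from bisect import bisect_left

def contains_name(sorted_names, word):
    """Binary search: O(log n) comparisons against the sorted list."""
    index = bisect_left(sorted_names, word)
    return index < len(sorted_names) and sorted_names[index] == word

def find_names(sorted_names, sentence):
    # Strip common punctuation and compare in lowercase.
    words = (w.strip(".,!?;:\"'").lower() for w in sentence.split())
    return [w for w in words if contains_name(sorted_names, w)]

names = sorted(n.lower() for n in ["Paul", "Anna", "Maria"])
print(find_names(names, "Yesterday Paul met Anna."))  # ['paul', 'anna']
```

With about 5500 names, each lookup needs at most 13 comparisons, so the search itself is unlikely to be the bottleneck.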
Am I better off trying to put this name list into CoreData or a SQLite table and performing a query with that? My understanding is I would not have to load the entire list into memory if I went that route.
I am guessing this situation is a common problem with word dictionaries for word games, so I'm just wondering what the best practice is for fast lookups. Thanks!
SQLite sounds ideal for this in terms of both speed of lookup and minimising memory usage. It would also make it potentially possible to update the first name list over the internet if so desired.
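To show the shape of the SQLite approach, here is a sketch using Python's built-in `sqlite3` module instead of the iPhone C API. The table and column names are assumptions; the SQL itself would carry over unchanged.

```python
import sqlite3

def build_name_db(names, path=":memory:"):
    conn = sqlite3.connect(path)
    # The PRIMARY KEY gives SQLite an index; NOCASE makes 'paul' match 'Paul'.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS names (name TEXT COLLATE NOCASE PRIMARY KEY)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO names (name) VALUES (?)", [(n,) for n in names]
    )
    conn.commit()
    return conn

def is_common_name(conn, word):
    row = conn.execute(
        "SELECT 1 FROM names WHERE name = ? LIMIT 1", (word,)
    ).fetchone()
    return row is not None

conn = build_name_db(["Paul", "Anna"])
print(is_common_name(conn, "paul"))
```

In an app, the database file would be built once ahead of time and shipped with the bundle, so only the pages touched by a query are read.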
Using Core Data (which is in effect an elaborate wrapper around SQLite) would be overkill in this instance, especially as you don't require the ORM-like capabilities.
An NSSet might be useful as well. Dave DeLong's answer for another question demonstrates that NSSets have constant look-up times, i.e. O(1).
Load your names into an NSMutableSet one by one. This will be the slowest part but will only need to be done once. If your file is a simple line-delimited file of names, it may be easier to use the standard C library for reading the file, since line-by-line input is not well-supported by Cocoa.
After that, simply use [nameSet containsObject:name] to check whether it is in the list.
A couple of drawbacks to this approach:
The name you want to test must be in the same case as the name in the set, that is “paul” and “Paul” are different strings. You can circumvent this by converting all names to lowercase before inserting them into the set, and then also converting the name you want to check into lowercase before checking it against the set.
It might be easier just to go with the already-accepted answer.
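The set-based lookup from this answer, including the lowercase workaround, can be sketched in Python, where the built-in `set` plays the role of NSMutableSet. The helper names and sample data are assumptions.

```python
def load_names(lines):
    """Build the set once; each later membership test is O(1) on average."""
    return {line.strip().lower() for line in lines if line.strip()}

def names_in(sentence, name_set):
    # Lowercase each word so that "paul" and "Paul" compare equal.
    words = (w.strip(".,!?;:\"'").lower() for w in sentence.split())
    return [w for w in words if w in name_set]

# In real use the lines would come from the names file, read line by line.
name_set = load_names(["Paul\n", "Anna\n", "\n", "Maria\n"])
print(names_in("Has anyone seen PAUL or Maria?", name_set))  # ['paul', 'maria']
```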
Any SQLite database on the iPhone is simply a file bundled with the application. It is relatively simple for anyone to extract this file and query it.
What are your suggestions for encrypting either the file or the data stored within the database?
Edit: The app is a game that will be played against other users. Information about a user's relative strengths and weaknesses will be stored in the DB. I don't want a user to be able to jailbreak the phone, up their reputation/power etc., then win the tournament/league etc. (NB: Trying to be vague as the idea is under NDA).
I don't need military encryption, I just don't want to store things in plain text.
Edit 2: A little more clarification, my main goals are
Make it non-trivial to hack sensitive data
Have a simple way to discover if data has been altered (some kind of checksum)
You cannot trust the client, period. If your standalone app can decrypt it, so will they. Either put the data on a server or don't bother, as the number of people who actually crack it to enhance stats will be minuscule, and they should probably be rewarded for the effort anyway!
Put a string in the database saying "please don't cheat".
There are at least two easier approaches here (both complementary) that avoid encrypting values or in-memory databases:
#1 - ipa crack detection
Avoid the technical (and legal) hassle of encrypting the database and/or the contents and just determine if the app is pirated and disable the network/scoring/ranking aspects of the game. See the following for more details:
http://thwart-ipa-cracks.blogspot.com/2008/11/detection.html
#2 - data integrity verification
Alternatively store a HMAC/salted hash of the important columns in each row when saving your data (and in your initial sqlite db). When loading each row, verify the data against the HMAC/hash and if verification fails act accordingly.
Neither approach will force you to fill out the encryption export forms required by Apple/US government.
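The row-level integrity check from approach #2 might look like the following Python sketch. The key, the column values, and the helper names are assumptions, and the secret still ships inside the app, so this raises the bar rather than preventing tampering.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-app-secret"  # hypothetical; embedded in the app

def row_mac(values):
    """HMAC-SHA256 over the important columns of one row."""
    # Length-prefix each field so ("ab", "c") and ("a", "bc") give different MACs.
    message = b"".join(
        len(data).to_bytes(4, "big") + data
        for data in (str(v).encode("utf-8") for v in values)
    )
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def row_is_intact(values, stored_mac):
    # compare_digest avoids leaking where the first difference is.
    return hmac.compare_digest(row_mac(values), stored_mac)

stored = row_mac([42, "alice", 7])          # saved alongside the row
print(row_is_intact([42, "alice", 7], stored))
print(row_is_intact([42, "alice", 99], stored))
```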
Score submission
Don't forget you'll need to do something similar for the actual score submissions to protect against values coming from something other than your app. You can see an implementation of this in the cocos2d-iphone and cocoslive frameworks at http://code.google.com/p/cocos2d-iphone/ and http://code.google.com/p/cocoslive/
Response to comments
There is no solution here that will 100% prevent data tampering. If that is a requirement, the client needs to be view only and all state and logic must be calculated on a trusted server. Depending on the application, extra anti-cheat mechanisms will be required on the client.
There are a number of books on developing massively-multiplayer games that discuss these issues.
Having a hash with a known secret in the code is likely a reasonable approach (at least, when considering the type of applications that generally exist on the App Store).
Like Kendall said, including the key on the device is basically asking to get cracked. However, there are folks who have their reasons for obfuscating data with a key on-device. If you're determined to do it, you might consider using SQLCipher for your implementation. It's a build of SQLite that provides transparent, page-level encryption of the entire DB. There's a tutorial over on Mobile Orchard for using it in iPhone apps.
How likely do you think it is that your normal user will be doing this? I assume you're going through the app store, which means that everything is signed/encrypted before getting on to the user's device. They would have to jailbreak their device to get access to your database.
What sort of data are you storing such that it needs encryption? If it contains passwords that the user entered, then you don't really need to encrypt them; the user will not need to find out their own password. If it's generic BLOB data that you only want the user to access through the application, it could be as simple as storing an encrypted blob using the security API.
If it's the whole database you want secured, then you'd still want to use the security api, but on the whole file instead, and decrypt the file as necessary before opening it. The issue here is that if the application closes without cleanup, you're left with a decrypted file.
You may want to take a look at memory-resident databases, or temporary databases which you can create either using a template db or a hard-coded schema in the program (take a look at the documentation for sqlite3_open). The data could be decrypted and inserted into the temporary database, and the decrypted copy then deleted. Do it in the opposite direction when closing the connection.
Edit:
You can cook up your own encryption scheme I'm sure with just a very simple security system by XOR-ing the data with a value stored in the app, and store a hash somewhere else to make sure it doesn't change, or something.
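For illustration only, the XOR-plus-hash idea from the edit above could look like this in Python. The key and helper names are assumptions, and this is obfuscation rather than real encryption: anyone who extracts the key from the app can reverse it.

```python
import hashlib
from itertools import cycle

APP_KEY = b"hypothetical-key"  # a value stored in the app

def xor_bytes(data, key=APP_KEY):
    # XOR with a repeating key; applying it twice restores the input.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def pack(plaintext):
    """Return the obfuscated bytes plus a hash to store somewhere else."""
    return xor_bytes(plaintext), hashlib.sha256(plaintext).hexdigest()

def unpack(obfuscated, expected_hash):
    plaintext = xor_bytes(obfuscated)
    if hashlib.sha256(plaintext).hexdigest() != expected_hash:
        raise ValueError("data was modified")
    return plaintext

blob, digest = pack(b"strength=10")
print(unpack(blob, digest))  # b'strength=10'
```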
SQLCipher:
Based on my experience, SQLCipher is the best option for encrypting the database.
Once the key("PRAGMA key") is set SQLCipher will automatically encrypt all data in the database! Note that if you don't set a key then SQLCipher will operate identically to a standard SQLite database.
The call to sqlite3_key or "PRAGMA key" should occur as the first operation after opening the database. In most cases SQLCipher uses PBKDF2, a salted and iterated key derivation function, to obtain the encryption key. Alternately, an application can tell SQLCipher to use a specific binary key in blob notation (note that SQLCipher requires exactly 256 bits of key material), i.e.
Reference:
http://sqlcipher.net/ios-tutorial
I hope this saves someone some time exploring the topic.
Ignoring the philosophical and export issues, I'd suggest that you'd be better off encrypting the data in the table directly.
You need to obfuscate the decryption key(s) in your code. Typically, this means breaking them into pieces and encoding the strings in hex and using functions to assemble the pieces of the key together.
For the algorithm, I'd use a trusted implementation of AES for whatever language you're using.
Maybe this one for C#:
http://msdn.microsoft.com/en-us/magazine/cc164055.aspx
Finally, you need to be aware of the limitations of the approach. Namely, the decryption key is a weak link, it will be available in memory at run-time in clear text. (At a minimum) It has to be so that you can use it. The implementation of your encryption scheme is another weakness--any flaws there are flaws in your code too. As several other people have pointed out your client-server communications are suspect too.
You should remember that your executable can be examined in a hex editor where cleartext strings will leap out of the random junk that is your compiled code. And that many languages (like C# for example) can be reverse-compiled and all that will be missing are the comments.
All that said, encrypting your data will raise the bar for cheating a bit. How much depends on how careful you are; but even so a determined adversary will still break your encryption and cheat. Furthermore, they will probably write a tool to make it easy if your game is popular; leaving you with an arms-race scenario at that point.
Regarding a checksum value, you can compute a checksum based on the sum of the values in a row, assuming that you have enough numeric values in your database to do so. Or, for a bunch of boolean values, you can store them in a varbinary field and use the bitwise exclusive-or operator ^ to compare them with the stored copy; you should end up with 0s.
For example,
for numeric columns,
2|3|5|7| with a checksum column | 17 |
for booleans,
0|1|0|1| with a checksum column | 0101 |
If you do this, you can even add a summary row at the end that sums your checksums. Although this can be problematic if you are constantly adding new records. You can also convert strings to their ANSI/UNICODE components and sum these too.
Then when you want to check the checksum, simply do a select like so:
-- Returns the rows whose recomputed checksum matches the stored one;
-- use <> instead of = to list the rows that fail the check.
SELECT *
FROM OrigTable
RIGHT OUTER JOIN
    (SELECT pk, (col1 + col2 + col3) AS OnTheFlyChecksum, PreComputedChecksum
     FROM OrigTable) OT ON OrigTable.pk = OT.pk
WHERE OT.OnTheFlyChecksum = OT.PreComputedChecksum
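To try the sum-based checksum end to end, here is a small Python sketch against an in-memory SQLite database. It reuses the table and column names from the query above, the sample rows are made up, and it selects the rows that fail the check rather than the ones that pass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrigTable ("
    "pk INTEGER PRIMARY KEY, col1 INT, col2 INT, col3 INT, PreComputedChecksum INT)"
)
# Row 3 has been tampered with: 4 + 4 + 4 is 12, not 99.
rows = [(1, 2, 3, 5, 10), (2, 7, 1, 1, 9), (3, 4, 4, 4, 99)]
conn.executemany("INSERT INTO OrigTable VALUES (?, ?, ?, ?, ?)", rows)

tampered = [
    pk
    for (pk,) in conn.execute(
        "SELECT pk FROM OrigTable "
        "WHERE col1 + col2 + col3 <> PreComputedChecksum ORDER BY pk"
    )
]
print(tampered)  # [3]
```

A plain sum is easy to defeat (raise one column, lower another), which is why a keyed hash is the sturdier choice when cheating is the concern.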
It appears to be simplest to sync all tournament results to all iPhones in the tournament. You can do it during every game: before a game, if the databases of two phones contradict each other, the warning is shown.
If User A falsifies the result of his game with User B, this result will propagate until B eventually sees it, with the warning that A's data don't match his phone. He can then go and explain to A that his behavior isn't right, just the way it is in real life if somebody cheats.
When you compute the final tournament results, show the warning, name names, and throw out all games with contradictory results. This takes away the incentive to cheat.
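The comparison step could be sketched like this in Python, assuming each phone keeps a mapping from a game ID to the recorded winner. The data layout and function names are assumptions for illustration.

```python
def contradictions(results_a, results_b):
    """Game IDs that both phones know about but record differently."""
    shared = results_a.keys() & results_b.keys()
    return sorted(g for g in shared if results_a[g] != results_b[g])

def trusted_results(*phone_results):
    """Keep only games on which every phone that recorded them agrees."""
    seen = {}
    disputed = set()
    for results in phone_results:
        for game, outcome in results.items():
            if game in seen and seen[game] != outcome:
                disputed.add(game)
            seen.setdefault(game, outcome)
    return {g: o for g, o in seen.items() if g not in disputed}

phone_a = {"g1": "A", "g2": "A"}            # A claims to have won g2
phone_b = {"g1": "A", "g2": "B", "g3": "C"}
print(contradictions(phone_a, phone_b))     # ['g2']
print(trusted_results(phone_a, phone_b))    # {'g1': 'A', 'g3': 'C'}
```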
As said before, encryption won't solve the problem since you can't trust the client. Even if your average person can't use disassembler, all it takes is one motivated person and whatever encryption you have will be broken.
Still, if you are on the Windows platform, you can also use SQLiteEncrypt to satisfy your needs. SQLiteEncrypt adds encryption support to SQLite, but you can treat it like the original sqlite3 C library.
When creating a web application that somehow displays a unique identifier for a recurring entity (videos on YouTube, or book sections on a site like mine), would it be better to use a uniform-length identifier like a hash or the unique key of the item in the database (1, 2, 3, etc.)?
Besides revealing a little, what I think is immaterial, information about the internals of your app, why would using a hash be better than just using the unique id?
In short: Which is better to use as a publicly displayed unique identifier - a hash value, or a unique key from the database?
Edit: I'm opening up this question again because Dmitriy brought up the good point of not tying down the naming to db specific property. Will this sort of tie down prevent me from optimizing/normalizing the database in the future?
The platform uses php/python with ISAM /w MySQL.
Unless you're trying to hide the state of your internal object ID counter, hashes are needlessly slow (to generate and to compare), needlessly long, needlessly ugly, and needlessly capable of colliding. GUIDs are also long and ugly, making them just as unsuitable for human consumption as hashes are.
For inventory-like things, just use a sequential (or sharded) counter instead. If you migrate to a different database, you will just have to initialize the new counter to a value at least as large as your largest existing record ID. Pretty much every database server gives you a way to do this.
If you are trying to hide the state of your counter, perhaps because you're counting users and don't want competitors to know how many you have, I suggest avoiding the display of your internal IDs. If you insist on displaying them and don't want the drawbacks of a hash, you might consider using a maximal-period linear feedback shift register to generate IDs.
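To make the shift-register suggestion concrete, here is a Python sketch of a 16-bit Fibonacci LFSR using taps 16, 14, 13 and 11, a commonly cited maximal-length configuration. The seed and function names are assumptions, and a real system would need a wider register, since 16 bits allow only 65535 IDs.

```python
def lfsr16_next(state):
    """Advance a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def id_sequence(seed=0xACE1):
    # Any non-zero seed works; zero is the one state the register never leaves.
    state = seed
    while True:
        yield state
        state = lfsr16_next(state)
        if state == seed:
            return

ids = list(id_sequence())
print(len(ids), len(set(ids)))
```

Each ID appears once per cycle and consecutive values look unrelated to a casual observer, but the sequence is fully predictable to anyone who recognizes the register, so it hides the counter without securing anything.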
I typically use hashes if I don't want the user to be able to guess the next ID in the series. But for your book sections, I'd stick with numerical IDs.
Using hashes is preferable in case you need to rebuild your database for some reason, for example, and the ordering changes. The ordinal numbers will move around -- but the hashes will stay the same.
Not relying on the order you put things into a box, but on properties of the things, just seems.. safer.
But watch out for collisions, obviously.
With hashes you
Are free to merge the database with a similar one (or a backup), if necessary
Are not doing something that could help some guessing attacks even a bit
Are not disclosing more private information about the user than necessary, e.g. if somebody sees a user number 2 in your current database log in, they're getting information that he is an oldie.
(Provided that you use a long hash or a GUID,) greatly helping yourself in case you're bought by YouTube and they decide to integrate your databases.
Helping yourself in case there appears a search engine that indexes by GUID.
Please let us know if the last 6 months brought you some clarity on this question...
Hashes aren't guaranteed to be unique, nor, I believe, consistent.
will your users have to remember/use the value? or are you looking at it from a security POV?
From a security perspective, it shouldn't matter - since you shouldn't just be relying on people not guessing a different but valid ID of something they shouldn't see in order to keep them out.
Yeah, I don't think you're looking for a hash - you're more likely looking for a Guid. If you're on the .Net platform, try System.Guid.
However, the most important reason not to use a Guid is for performance. Doing database joins and lookups on (long) strings is very suboptimal. Numbers are fast. So, unless you really need it, don't do it.
Hashes have the advantage that you can check if they are valid or not BEFORE performing any check to your database whether they exist or not. This can help you to fend off attacks with random hashes as you don't need to burden your database with fake lookups.
Therefore, if your hash has some kind of well-defined format, with for example a checksum at the end, you can check whether it's correct without needing to go to the database.
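A sketch of such a self-checking identifier in Python: a random body followed by a short keyed checksum, so that malformed or made-up IDs can be rejected before any database lookup. The secret, the lengths, and the function names are assumptions.

```python
import hashlib
import hmac
import secrets

SECRET = b"hypothetical-server-secret"
BODY_LEN, CHECK_LEN = 16, 6  # hex characters

def _check(body):
    # Keyed checksum: only the server can compute a valid suffix.
    digest = hmac.new(SECRET, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:CHECK_LEN]

def new_id():
    body = secrets.token_hex(BODY_LEN // 2)
    return body + _check(body)

def looks_valid(candidate):
    """Cheap format check that needs no database access."""
    if len(candidate) != BODY_LEN + CHECK_LEN:
        return False
    body, check = candidate[:BODY_LEN], candidate[BODY_LEN:]
    return hmac.compare_digest(_check(body).encode("utf-8"), check.encode("utf-8"))

public_id = new_id()
print(public_id, looks_valid(public_id))
```

An ID that passes this check may still not exist in the database; the check only filters out random guesses cheaply.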