I've seen multiple sizes, and I don't want to waste server memory on a MySQL field that reserves space for more characters than necessary. What's the biggest they can get, and will this ever change?
This is how integer overflows, integer-to-string migrations, etc. happen: by making data types too restrictive. Splash out on a few bytes for a VARCHAR(128) and save yourself the hassle down the road. If your user base gets so massive that you need to worry about how many bytes you can save by crunching the data types of UIDs, consider yourself a huge success; that is a problem you will be happy to solve.
Short answer: I don't think anyone will be able to answer your question. "Ever" is a long time, and who knows how many entities Facebook will have enslaved by then.
I'll end with a quote from one who said it best:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
—Donald E. Knuth
How could you ever be sure that this will never change? Better make it a varchar.
Currently, Facebook UIDs are 64-bit integers. But I can't guarantee that won't change one day.
The Facebook UID will never change, because it is a unique identifier in their database. If it changed, then Facebook would cease to work.
OK, so every player in my game has a document in my players collection, and each player has one string that is a serialized hash of their game state. So this string can be way long or way short, and it varies a lot from player to player.
I had somebody who doesn't have a ton of Mongo experience tell me that I should pad every single string in the collection so that they are all the same length. So, like, add tons of zeros at the end of all the short and medium game-state strings.
So A) is this a good idea?
B) I'm not even totally sure how to find out the longest length of a game, so I'm not sure how far to pad them, and what if later on game states exceed my padding length?
My friend said he had a mongo collection keep blowing up because of fragmentation and when he implemented padding all of his issues went away.
Oh, I doubt it matters, but my code is in PHP and obviously uses the PHP PECL Mongo driver.
Thanks for any thoughts or input!!!!!
-dave
MongoDB allocates space for documents at creation time. If the size of the document increases, the document will need to be moved to a new location to accommodate the larger size. The original space is not released to the operating system. Instead, MongoDB will eventually reuse this space. Until this happens, it may appear that the database is over-allocated, or what is sometimes called fragmented.
So, what probably happened to your friend:
documents were inserted
when fields were updated, their sizes sometimes increased, and the documents therefore grew
documents were moved as they grew, and the database became over-allocated (what your friend called fragmented)
And by padding the fields in the documents your friend was able to ensure documents never grew in size and therefore his database never became over-allocated.
The padding approach is valid but it also adds complexity to the application. Typically padding is performed for fields that will eventually be created, rather than fixing the size of the values themselves, but the idea is the same. In your case it doesn't sound like padding is a great option because you cannot predict the field size.
Instead, you might consider using usePowerOf2Sizes: http://docs.mongodb.org/manual/reference/command/collMod/
This configuration will automatically pad the space allocated for documents and will increase the chances that space is reused efficiently by MongoDB, at the cost of a slightly larger database.
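For illustration, here is a minimal sketch of issuing that collMod command from the Python driver (the PHP driver's command method works the same way); the connection string, database name, and the "players" collection name are assumptions based on your description:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["game"]                                # assumed database name

# Switch the players collection to powers-of-two record allocation so that
# freed record space is much more likely to be reused when documents grow
# and have to move on disk.
result = db.command({"collMod": "players", "usePowerOf2Sizes": True})
print(result)  # should report ok: 1 on the MMAPv1-era servers this applies to
```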
So A) is this a good idea?
Depends. If the game documents were updated so frequently that they moved on disk a lot, then you might find that padding does help; however, considering that the entire works of Shakespeare can fit into a 4 MB document with some room left, I doubt very much that any string you have will cause a heavy amount of fragmentation; in fact, I will be quite surprised if it does.
The problem that could, in theory, occur is that you get a lot of record slots within your freelists and deleted buckets that cannot be reused, causing fragmentation.
Not only that, but the I/O of moving documents around on disk can be a killer if it happens persistently.
B) I'm not even totally sure how to find out the longest length of a game, so I'm not sure how far to pad them, and what if later on game states exceed my padding length?
Then the idea is useless; in fact, the idea is useless 90% of the time anyway, and you would be better off using powers-of-2 size allocation on your documents if this were to be a problem: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes
Using this option would be a far more optimal approach to solving fragmentation issues.
My friend said he had a mongo collection keep blowing up because of fragmentation and when he implemented padding all of his issues went away.
A friend of a friend, of a cousin, of a niece of mine said something similar too...you would be better off testing this for yourself.
I would bet that the bigger problem he had was with indexes and the queries he performed. It is extremely rare for string lengths to cause such a heavy amount of I/O from documents moving on disk that you would actually need artificial padding.
From your question I understand those strings are just blobs, i.e. they are not structured in a way that allows DB queries/filtering on their contents. If that is the case, store them in files, and store the file names in the Mongo document.
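A minimal sketch of that idea, in Python for brevity (the same pattern applies with the PHP driver); the directory name, database name, and "state_file" field are assumptions for illustration:

```python
import os
import uuid
from pymongo import MongoClient

db = MongoClient()["game"]      # assumed database name
STATE_DIR = "game_states"       # assumed directory for the blobs
os.makedirs(STATE_DIR, exist_ok=True)

def save_state(player_id, serialized_state):
    # Write the large, variable-length blob to its own file...
    filename = os.path.join(STATE_DIR, uuid.uuid4().hex + ".state")
    with open(filename, "w") as f:
        f.write(serialized_state)
    # ...and keep only a small, fixed-size reference in the document, so the
    # document itself never grows and never has to be moved on disk.
    # (Cleanup of any previously referenced file is omitted here.)
    db.players.update_one({"_id": player_id},
                          {"$set": {"state_file": filename}})
```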
I know, think in the "denormalized way" or the "NoSQL way", but tell me about this simple use case.
db.users
db.comments
A user posts a comment, and I want to fetch some user data while fetching the comment.
Say I want to show dynamic data, like "userlevel", and static data, like "username".
With the static data I will never have problems, but what about the dynamic data?
userlevel lives in the users collection; I need the denormalized data duplicated into comments to achieve read performance, but I also need the userlevel to stay up to date.
Is this achievable in some way?
EDIT:
Just found an answer from Brendan McAdams, a guy from 10gen who is obviously far more authoritative than me, and he recommends embedding documents.
older text:
The first one is to manually include in each comment the ObjectId of the user it belongs to.
comment: {
    text:  "...",
    date:  "...",
    user:  ObjectId("4b866f08234ae01d21d89604"),
    votes: 7
}
The second, and more clever, way is to use DBRefs.
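For illustration, a minimal sketch of the DBRef approach in Python (the database, collection, and field names are just placeholders); the driver's dereference call performs the second query for you:

```python
from pymongo import MongoClient
from bson.dbref import DBRef

db = MongoClient()["forum"]   # assumed database name

user_id = db.users.insert_one({"username": "dave", "userlevel": 3}).inserted_id

# Instead of a bare ObjectId, store a DBRef so the driver can dereference it.
db.comments.insert_one({
    "text": "...",
    "user": DBRef("users", user_id),
    "votes": 7,
})

comment = db.comments.find_one()
author = db.dereference(comment["user"])   # the extra query, done by the driver
print(author["userlevel"])                 # always the current value, no duplication
```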
We add extra I/O to our disk, losing performance, am I right? (I'm not sure how this works internally.) Therefore we need to avoid linking if possible, right?
Yes, there would be one more query, but the driver will do it for you; you can think of this as a kind of syntactic sugar. Does it affect performance? Actually, it depends too :) One of the reasons Mongo is so freaking fast is that it uses memory-mapped files and tries its best to keep the whole working set (plus indexes) directly in RAM. And every 60 seconds (by default) it syncs the RAM snapshot with the disk-based file.
When I say working set, I mean the things you are working with: you can have three collections, foo, bar, and baz, but if right now you are only working with foo and bar, they ought to be loaded into RAM, while baz stays on disk, abandoned. Moreover, memory-mapped files allow us to load only part of a collection. So if you're building something like Engadget or TechCrunch, there is a high probability that the working set would be the comments for the last few days, and old pages will be revisited way less frequently (their comments would be swapped into memory on demand), so it doesn't affect performance significantly.
So to recap: as long as you keep the working set in memory (you may think of it as read/write caching), fetching those things is super fast and one more query won't be a problem. If you're working with slices of data that don't fit into memory, there will be speed degradation, but I don't know your circumstances -- it could be acceptable. So in both cases I tend to choose linking.
I am assigned the task of implementing functionality to shorten typed text.
For example, when I type text like "you" and highlight it, it has to change to "u".
I will have a table with a list of words: the longer version of the text and the text to replace it with. So whenever a user types a word and highlights it, I want to query the DB for a match; if a match is found, I want to replace the word with the shortened word.
This is not my idea; I am just assigned to the implementation.
I think this functionality will slow down the app's responsiveness, and it has some disadvantages for the user-friendliness of the application.
So I'd like to hear your opinions on what disadvantages it has and how I can implement it in a better manner. Or is it OK to have this kind of functionality? Won't it affect the app's speed?
It's hard to imagine that you'll see a noticeable decrease in performance. Even the iPhone 3G's processor runs at around 400MHz, and someone typing really fast on an iPhone might get four or five characters entered in a second. A simple implementation of the sort of thing you're talking about would involve a lookup in a data structure such as a dictionary, tree, or database, and you ought to be able to do that pretty quickly.
Why not try it? Implement the simplest thing you can think of and measure its performance. For the purpose of measuring, you might want to use a loop to repeatedly look up words from an array. Count the number of lookups you can do in, say, 20 seconds, and divide by 20 to get the average number per second.
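As a rough sketch of that measurement (in Python purely for brevity; the same loop translates directly to Objective-C on the device, and the abbreviation table here is made-up sample data, not your real DB):

```python
import time

# Hypothetical abbreviation table; in the real app this comes from your DB.
abbreviations = {"you": "u", "are": "r", "for": "4"}
words = ["you", "are", "for", "hello", "world"] * 2000   # sample input words

start = time.time()
lookups = 0
while time.time() - start < 20:      # run for roughly 20 seconds
    for w in words:
        abbreviations.get(w)         # one lookup per word
        lookups += 1

print("average lookups per second:", lookups / 20)
```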
I don't think it will cost a lot of performance. Anyway, you can use the profiler to check how long every method takes. As for the functionality, I believe you should give the user the opportunity to "undo" and keep their own word (same as Apple's auto-correction).
I have to store about 10k text lines in an Array. Each line is stored as a separate encrypted entry. When the app runs I only need to access a small number and decrypt them - depending on user input. I thought of some kind of lazy evaluation but don't know how to do it in this case.
This is how I build up my array: [allElements addObject: @"wdhkasuqqbuqwz"]. The string is encrypted. Accessing is like txt = [[allElements objectAtIndex:n] decrypt].
The problem currently is that this uses lots of memory from the very start, and most of the items I don't need anyway; I just don't know which ones ;). Also, I am hesitant to store the text externally, e.g. in a text file, since this would make it easier to access.
Is there a way to minimize memory usage in such a case?
PS: initialization is very fast, so no issue there.
So it's quite a big array, although not really big enough to be triggering any huge memory warnings (unless my maths has gone horribly wrong, I reckon your array of 10,000 40-character strings is about 0.76 MB). Perhaps there are other things going on in your app causing these warnings: are you loading any large images or many assets?
What I'm a little confused about is how you're currently storing these elements before you initialise the array. You say you don't want to store the text externally in a text file, but you must be holding them in some kind of file before initialising your array, unless of course your values are generated on the fly.
If you've encrypted correctly, you shouldn't need to care whether your values are stored in plain-sight or not. Hopefully you're using an established standard and not rolling your own encryption, so really I think worrying about users getting hold of the file is a moot point. After all, the whole point of encryption is being able to hide data in plain sight.
I would recommend, as a couple of your commenters already have, that you just use some form of database storage. Core Data was made for this purpose: handling large amounts of data with minimal memory impact. But again, I'm not sure how that array alone could trigger a memory warning, so I suspect there's other stuff going on in your app that's eating up your memory.
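To make the lazy-access idea concrete, here is a small sketch in Python using SQLite (Core Data or SQLite via the C API would be the iOS equivalents); the table layout and the decrypt stand-in are assumptions, not your real scheme:

```python
import sqlite3

def decrypt(ciphertext):
    # Stand-in only; substitute whatever real decryption you already use.
    return ciphertext[::-1]

conn = sqlite3.connect("lines.db")
conn.execute("CREATE TABLE IF NOT EXISTS lines "
             "(id INTEGER PRIMARY KEY, ciphertext TEXT)")

# One-time load: every encrypted line lives on disk, not in an in-memory array.
sample = ["wdhkasuqqbuqwz", "zqbuqwdhkasuq"]
conn.executemany("INSERT INTO lines (ciphertext) VALUES (?)",
                 [(c,) for c in sample])
conn.commit()

def line_at(n):
    # Fetch and decrypt a single line only when the user actually needs it.
    row = conn.execute("SELECT ciphertext FROM lines WHERE id = ?",
                       (n,)).fetchone()
    return decrypt(row[0]) if row else None

print(line_at(1))
```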
What is the best way to get this thing done:
I have a huge table with 40k+ records (TV show titles) in SQLite, and I want to do real-time lookups against this table. For example, if a user searches for a show, then as the user enters search terms I read SQLite and filter records after every keystroke (like Google search suggestions).
My performance benchmark is 100 milliseconds. A few things I have thought of are: creating indexes, splitting the data into multiple tables.
However, I would really appreciate any suggestions for achieving this in the fastest possible time so I can avoid any UI refresh delays; it would be awesome to have feedback from coders who have already done something similar.
Things to do:
Index fields appropriately.
Limit yourself to only 10-15 records on the initial query—that should be enough to populate the top of the table view.
If you don't need to sort, don't. If you do need to sort, sort on an indexed field.
Do as much as you can in SQLite rather than your own code.
Do as little as you can overall.
You'll likely find what I have: SQLite and the iPhone are actually amazingly capable as long as you don't do anything really dumb.
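To sketch what that advice looks like in practice (Python's sqlite3 module is used here only for brevity; the SQL is the same through the iPhone's SQLite C API, and the file, table, and column names are assumptions):

```python
import sqlite3

conn = sqlite3.connect("shows.db")
conn.execute("CREATE TABLE IF NOT EXISTS shows "
             "(id INTEGER PRIMARY KEY, title TEXT)")

# Index the searched column with NOCASE collation so that SQLite's LIKE
# optimization can use it for case-insensitive prefix matches.
conn.execute("CREATE INDEX IF NOT EXISTS idx_shows_title "
             "ON shows (title COLLATE NOCASE)")

def suggest(prefix, limit=15):
    # A prefix pattern with no leading wildcard can use the index, and the
    # LIMIT keeps the result just big enough to fill the visible rows.
    return conn.execute(
        "SELECT title FROM shows WHERE title LIKE ? "
        "ORDER BY title COLLATE NOCASE LIMIT ?",
        (prefix + "%", limit),
    ).fetchall()

print(suggest("th"))
```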
Keep "perceived performance" in mind - doing lookups right after a key is hit is could be somewhat expensive. How many milliseconds does it take a user to hit a key, though? You can probably get away with not updating the resultlist until the user hasn't typed anything for several hundred milliseconds. (For really fast users, perhaps update every X hundred millisecodns while he's still typing).
How do you know the performance will be bad? 40k rows is not that much, even for an iPhone... try it on the phone before you optimize.
Avoid doing any joins, and try to use paging so that you keep the amount of data returned to a minimum. Perhaps you should try loading the whole thing into memory, then sorting and doing a binary search? If it is just a list of show titles, it should fit.
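And a tiny sketch of that in-memory variant (Python again for brevity; the titles here are sample data, and both the list and the prefix are assumed to be lowercased ahead of time so case doesn't break the comparison):

```python
import bisect

titles = sorted(["the wire", "the office", "breaking bad", "mad men"])

def prefix_matches(prefix, limit=15):
    # Binary-search for the first title >= prefix, then walk forward while
    # titles still start with that prefix.
    i = bisect.bisect_left(titles, prefix)
    out = []
    while i < len(titles) and titles[i].startswith(prefix) and len(out) < limit:
        out.append(titles[i])
        i += 1
    return out

print(prefix_matches("the"))
```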