Can I tell Lucene.net not to create files over a certain size? - lucene.net

I have an upload limit on file sizes of 100MB but my index files are coming out larger than that. Is there an option somewhere I can set so that at a specific file size the index will spill out into a secondary file?

Sort of. From Java Lucene FAQ:
Is there a way to limit the size of an index?
This question is sometimes brought up because of the 2GB file size
limit of some 32-bit operating systems.
This is a slightly modified answer from Doug Cutting:
The easiest thing is to use IndexWriter.setMaxMergeDocs().
If, for instance, you hit the 2GB limit at 8M documents set
maxMergeDocs to 7M. That will keep Lucene from trying to merge an
index that won't fit in your filesystem. It will actually effectively
round this down to the next lower power of Index.mergeFactor.
So with the default mergeFactor set to 10 and maxMergeDocs set to 7M
Lucene will generate a series of 1M document indexes, since merging 10
of these would exceed the maximum.

Related

GridFS disk management

In my environments I can have DB of 5-10 GB or DB of 10 TB (video recordings).
Focusing on the 5-10 GB: if I keep default settings for prealloc an small-files I can actually loose 20-40% of the disk space because of allocations.
In my production environments, the disk size can be 512G, but user can limit DB allocation to only 10G.
To implement this, I have a scheduled task that deletes the old documents from the DB when DB dataSize reached a certain threshold.
I can't use capped-collection (GridFS, sharding limitation, cannot delete random documents..), I can't use --no-prealloc/small-files flags, cause i need the files insert to be efficient.
So what happens, is this: if dataSize gets to 10G, the fileSize would be at least 12G, so I need to take that in consideration and lower the threshold in 2GB (and lose a lot of disk space).
What I do want, is to tell mongo to pre-allocate all the 10 GB the user requested, and disable further pre-alloc.
For example, running mongod with --no-prealloc and --small-files, but pre-allocate in advance all the 10 GB.
Another protection I gain here, is protecting the user against sudden disk-full errors. If he regularly downloads Game of Thrones episodes to the same drive, he can't take space from the DB 10G, since it's already pre-allocated.
(using C# driver)
I think I found a solution: You might want to look at the --quota and --quotafiles command line opts. In your case, you also might want to add the --smalfiles option. So
mongod --smallfiles --quota --quotafiles 11
should give you a size of exactly 10224 MB for your data, which, adding the default namespace file size of 16MB equals your target size of 10GB, excluding indices.
The following applies to regular collections as per documentation. But since metadata can be attached to files, it might very well apply to GridFS as well.
MongoDB uses what is called a record to store data. A record consists of two parts: the actual data and something which is called "padding". The padding is basically unused data which is used if the document grows in size. The reason for that is that a document or file chunk in GridFS respectively never gets fragmented to enhance query performance. So what would happen when the document or a file chunk grows in size is that it had to be moved to a different location in the datafile(s) every time the file is modified, which can be a very costly operation in terms of IO and time. So with the default settings, if the document or file chunk grows in size is that the padding is used instead of moving the file, thus reducing the need of moving around data in the data file and thereby improving performance. Only if the growth of the data exceeds the preallocated padding the document or file chunk is moved within the datafile(s).
The default strategy for preallocating padding space is "usePowerOf2Sizes", which determines the padding size by taking the document size and uses the next power of two size as the size preallocated for the document. Say we have a 47 byte document, the usePowerOf2Sizes strategy would preallocate 64 bytes for that document, resulting in a padding of 17 bytes.
There is another preallocation strategy, however. It is called "exactFit". It determines the padding space by multiplying the document size with a dynamically computed "paddingFactor". As far as I understood, the padding factor is determined by the average document growth in the respective collection. Since we are talking of static files in your case, the padding factor should always be 0, and because of this, there should not be any "lost" space any more.
So I think a possible solution would be to change the allocation strategy for both the files and the chunks collection to exactFit. Could you try that and share your findings with us?

Does MongoDB reuse deleted space?

First off, I know about this question:
Auto compact the deleted space in mongodb?
My question is not about shrinking DB file sizes though, but more about the reuse of deleted space. Say I have 100K documents in a collection, I then delete 50K of those. Will Mongo reuse the space within its data file that the deleted documents have freed? Or are they simply "marked" as deleted?
I don't care so much about the actual size of the file on disk, its more about "does it just grow and grow".
Update (Mar 2015): As of the 3.0 release, there are multiple storage engines available in MongoDB. This answer applies to the MMAP storage engine (still the default in MongoDB 3.0), the answer for other engines (WiredTiger for example) is quite different and may well be tunable and adjustable. Hence if you are using another engine, please read the relevant docs for that storage engine to determine what your space re-use defaults and options are.
With the MMAP storage engine, when documents are deleted the space left behind is put into a free list. However, to use the space there will need to be similarly sized documents inserted later, and MongoDB will need to find an appropriate space for that document within a certain time frame (once it times out looking at the list, it will just append) otherwise the space re-use is not going to happen very often. This deletion is done within the data files, so there is no disk space reclamation happening here - all of this is done internally within the existing data files.
If you subsequently do a repair, or resync a secondary from scratch, the data files are rewritten and the space on disk will be reclaimed (any padding on docs is also removed). This is where you will see actual space reclamation on-disk. For any other actions (compact included) the on disk usage will not change and may even increase.
With 2.2+ you can now use the collMod command and the usePowersOf2Sizes option to make the re-use of deleted space more likely (note that this is the default in 2.6+). This means that the initial space allocation for a document is a bit less efficient (512 bytes for a 400 byte doc for example) but means that when a new doc is inserted it is more likely to be able to re-use that space. If you are deleting (or growing and hence moving) documents a lot, then this will be more efficient in the long term.
For anyone that is interested, one of the people that wrote a lot of the storage code (Mathias Stearn) has a great presentation about the storage internals, which can be found here

How to check the available free space in mongodb

I am newly starting on mongodb. I know that by default, mongodb pre allocated data files in the order of 64MB, 128MB, 256MB, ....2GB, 2GB, 2GB....etc
Now let us supposed i have a mongo instance that is preallocated with
64MB and 128MB for each of the different collections.
Now I would like to know the below
How to check the available free space in the 128MB file of each collections?
How to check the total size of all the collections, data bases?
How to check the total available free space out of allocated?
Checking free space in the internal storage files of mongo is not something you normally (or at all) do.
Mongo provides a number of functions which you can use to check the size of collections and some helpful scripts demonstrating how to run the functions on each database on the server.
You can also wrap such functions in a script to make reading easier, such as this script which lists the size and counts for all collections in a database.

newbie to mongodb: about initial files size

Just wanted to understand.
I've just installed mongodb to test it on Windows OS. For each DB it creates 2 files: dbname.0 and dbname.ns
these db files have constant initial size (dbname.0 - 67MB and dbname.ns 16MB)
Is it normal and if yes why?
Thanks!
Yes, this is normal - these are the pre-allocated datafile and the namespace file.
dbname.0 is the pre-allocated initial datafile , which starts with 64MB
dbname.ns is for book-keeping. ns stands for namespace. The default limit for the 16MB .ns file supports 24,000 namespaces (collections + indexes) (see: --nssize parameter)
whenever MongoDB grows beyond the size of the last dbname.x file, it allocates a new data file with twice the size, up to size 2GB. Once the file size reaches 2GB, each successive file is also 2GB.
See:
http://www.mongodb.org/display/DOCS/Using+a+Large+Number+of+Collections
http://www.mongodb.org/display/DOCS/Developer+FAQ
http://www.mongodb.org/display/DOCS/Excessive+Disk+Space
http://comments.gmane.org/gmane.comp.db.mongodb.user/49819
Also:
How many collections are possible in a MongoDB without losing performance?

How much can SQLite store on the iPhone?

I have an idea for a webapp for the iPhone but its unknown to me how much data can be stored in mobile Safari's SQLite db. I tried searching through the Apple docs but found nothing:
Safari Client-Side Storage and Offline Applications Programming Guide: Using the JavaScript Database
Most of these answers are totally wrong. Safari will not allow you to create SQLite databases over 50MB (or expand existing databases beyond that size).
This is a limit imposed by Safari - as other people have noted, SQLite itself supports much larger databases that you can use from native apps. But webapps are limited to 50MB.
It might be useful to note that this is per database - if you really need the extra space, you can create multiple databases, although this would obviously cause a lot of hassle.
It's as the other posters say. You're only limited by the drive space on the device.
You also need to consider your in memory footprint though. There is a finite amount of memory on the iphone, and in general it's quiet small, so the amount of data/hydrated objects you'll be able to have in memory is another potential limitation for your app.
There are a LOT of people answering that have clearly never tested it. I am on the latest version of iOS (4.3.3) and have set up a system to create multiple databases and keep them under 45 MB but found that the 50 MB cap is for the site as a whole. So, no matter how much you split the data up, it still restricts it to an aggregated cap of 50 MB.
The database size limit on safari mobile, is 50 mb per site not per database. i have tested this. even if you have an extra empty database you cannot add to it if the total size of all databases on a single site is 50 mb
whats worth noting as well is that characters are saved as double bytes on websql, that is 2 million characters will be 4 megabytes not 2 megabytes on disk.
You are only limited by the amount of free space on the device.
I'm not sure. If you were doing your own application you'd be limited by free space on the device and to some extent in memory footprint (as Bryan McLemore points out).
However since you're looking at using JavaScript inside of Safari there's no easy way to tell. According to the document you found it looks like it may be limited by site, but there's nothing telling you how much. I'd suggest writing a quick script to fill up the database and figure out how much it actually is. After that, I'd probably halve that value and assume I'd be always be able to use that much.
Be sure to report back so we'll all know!
It's most likely 32 terabytes... which is well over the available disk space.
I reached this number by multiplying the maximum page size by the maximum page count listed at the bottom of the SQLite limits page.
Limits In SQLite
"Limits" in the context of this article means sizes or quantities that can not be exceeded. We are concerned with things like the maximum number of bytes in a BLOB or the maximum number of columns in a table.
SQLite was originally designed with a policy of avoiding arbitrary limits. Of course, every program that runs on a machine with finite memory and disk space has limits of some kind. But in SQLite, those limits were not well defined. The policy was that if it would fit in memory and you could count it with a 32-bit integer, then it should work.
Unfortunately, the no-limits policy has been shown to create problems. Because the upper bounds were not well defined, they were not tested, and bugs (including possible security exploits) were often found when pushing SQLite to extremes. For this reason, newer versions of SQLite have well-defined limits and those limits are tested as part of the test suite.
As of version 3.6.19 (all statistics in the report are against that release of SQLite), the SQLite library consists of approximately 65.7 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 690 times as much test code and test scripts - 45409.7 KSLOC.
The default storage limit on iPhone seems to be 5mb
davibe has done some work to raise the limit up to 1GB with his PhoneGap plugin.
https://github.com/davibe/Phonegap-SQLitePlugin
The plugin calls the native sqlite3 API, with a wrapper on the Javascript side.
The relevant code extracted from sqlite.js are:
update origins set quota = '999999999999' where origin = 'file__0';
"update databases set estimatedSize = '999999999999' where name = '" + dbName + "';'";
Caution: my iphone is jailbroken! But I don't suspect that this changes anything.
The limit of 50MB is no longer correct.
On my iPhone 4S with iOS 6.1 I have a database of 58.66 MB (448496 records) for my webclip (website pinned to the springboard).
No special tricks, just standard HTML5 usage.
Maximum Database Size
Please refer Official Sqlite site
Every database consists of one or more "pages". Within a single database, every page is the same size, but different database can have page sizes that are powers of two between 512 and 65536, inclusive. The maximum size of a database file is 2147483646 pages. At the maximum page size of 65536 bytes, this translates into a maximum database size of approximately 1.4e+14 bytes (140 terabytes, or 128 tebibytes, or 140,000 gigabytes or 128,000 gibibytes).
This particular upper bound is untested since the developers do not have access to hardware capable of reaching this limit. However, tests do verify that SQLite behaves correctly and sanely when a database reaches the maximum file size of the underlying filesystem (which is usually much less than the maximum theoretical database size) and when a database is unable to grow due to disk space exhaustion.