I need to convert a real-time index to a disk-based index without reindexing the data. Is that possible?
Just stop updating the RT index; it is then effectively disk-based. (An RT index is a RAM chunk plus a series of disk chunks.)
Use OPTIMIZE INDEX to consolidate the disk chunks and flush the RAM chunk.
http://sphinxsearch.com/docs/current.html#sphinxql-optimize-index
(In theory you could take the RT index files after the optimize and rename them to match the naming pattern of a plain disk index, with a suitable definition for it in the config file. No idea whether that works in practice, though.)
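Both statements can be issued over SphinxQL, since searchd speaks the MySQL wire protocol. A minimal sketch, assuming the default SphinxQL port 9306 and an RT index named rt_products (placeholder name):

import pymysql

# searchd speaks the MySQL protocol, so a plain MySQL client can talk to it.
conn = pymysql.connect(host="127.0.0.1", port=9306, user="")
try:
    with conn.cursor() as cur:
        cur.execute("FLUSH RAMCHUNK rt_products")   # persist the RAM chunk as a new disk chunk
        cur.execute("OPTIMIZE INDEX rt_products")   # merge the disk chunks together
finally:
    conn.close()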
Do column-store databases such as Cassandra or HBase (or Postgres with a column store) maintain their advantages when using SSDs as storage? One of the main benefits of column stores is that range queries are very fast, because in theory they require only one seek and are then read contiguously from the storage medium. Even for spinning disks this is very fast, because you're reading at full speed along the fastest dimension of the medium. But is there a concept of a "contiguous read" on an SSD? I.e., is reading along one dimension faster than others? If not, are column stores still relevant?
If used correctly, columnar storage will use less I/O than fetching the whole rows. Less I/O always means better performance, no matter how fast your I/O subsystem is.
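A quick back-of-the-envelope sketch (the row and column widths are made up) of why this holds even when seeks are free:

rows = 10_000_000
row_width = 200          # assumed average row size in a row store, bytes
column_width = 8         # the single column the range query actually needs, bytes

row_store_bytes = rows * row_width        # ~2.0 GB touched
column_store_bytes = rows * column_width  # ~0.08 GB touched

print(f"row store scans    ~{row_store_bytes / 1e9:.2f} GB")
print(f"column store scans ~{column_store_bytes / 1e9:.2f} GB")
# Even with zero seek cost on an SSD, the column store moves ~25x less data.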
In my environments I can have a DB of 5-10 GB or a DB of 10 TB (video recordings).
Focusing on the 5-10 GB case: if I keep the default settings for prealloc and small files, I can actually lose 20-40% of the disk space because of allocations.
In my production environments, the disk size can be 512 GB, but the user can limit the DB allocation to only 10 GB.
To implement this, I have a scheduled task that deletes the oldest documents from the DB when the DB dataSize reaches a certain threshold.
I can't use a capped collection (GridFS and sharding limitations, can't delete arbitrary documents, ...), and I can't use the --noprealloc/--smallfiles flags, because I need file inserts to be efficient.
So what happens is this: if dataSize gets to 10 GB, fileSize will be at least 12 GB, so I need to take that into consideration and lower the threshold by 2 GB (and lose a lot of disk space).
What I do want is to tell Mongo to pre-allocate the whole 10 GB the user requested, and disable further pre-allocation.
For example, running mongod with --noprealloc and --smallfiles, but pre-allocating the entire 10 GB in advance.
Another protection I gain here is protecting the user against sudden disk-full errors. If he regularly downloads Game of Thrones episodes to the same drive, he can't take space away from the DB's 10 GB, since it's already pre-allocated.
(using C# driver)
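For reference, the scheduled check boils down to something like this (sketched with pymongo for brevity rather than the C# driver I actually use; the database name and the limits are placeholders):

from pymongo import MongoClient

LIMIT = 10 * 1024 ** 3           # the user-configured 10 GB quota
MARGIN = 2 * 1024 ** 3           # headroom for preallocated-but-unused file space

db = MongoClient("localhost", 27017)["recordings"]   # database name is a placeholder
stats = db.command("dbstats")    # dataSize = documents, fileSize = allocated data files

if stats["dataSize"] >= LIMIT - MARGIN:
    print("threshold reached - delete the oldest documents")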
I think I found a solution: you might want to look at the --quota and --quotaFiles command-line options. In your case, you might also want to add the --smallfiles option. So
mongod --smallfiles --quota --quotaFiles 11
should give you a size of exactly 10224 MB for your data, which, adding the default namespace file size of 16 MB, equals your target size of 10 GB, excluding indices.
The following applies to regular collections as per the documentation. But since metadata can be attached to files, it might very well apply to GridFS as well.
MongoDB uses what is called a record to store data. A record consists of two parts: the actual data and something called "padding". The padding is basically unused space reserved in case the document grows. The reason for it is that a document (or a file chunk in GridFS) never gets fragmented, in order to keep query performance up. Without padding, whenever a document or file chunk grew in size it would have to be moved to a different location in the data file(s), which can be a very costly operation in terms of I/O and time. With the default settings, if the document or file chunk grows, the padding is used instead of moving the data, thus reducing the need to shuffle data around the data file(s) and thereby improving performance. Only if the growth exceeds the preallocated padding is the document or file chunk moved within the data file(s).
The default strategy for preallocating padding space is "usePowerOf2Sizes", which determines the padding by taking the document size and using the next power of two as the size preallocated for the document. Say we have a 47-byte document: the usePowerOf2Sizes strategy would preallocate 64 bytes for it, resulting in 17 bytes of padding.
There is another preallocation strategy, however, called "exactFit". It determines the padding space by multiplying the document size by a dynamically computed "paddingFactor". As far as I understood, the padding factor is driven by how much documents in the respective collection tend to grow. Since we are talking about static files in your case, the padding factor should stay at its minimum of 1 (i.e., no extra padding), and because of this there should not be any "lost" space any more.
So I think a possible solution would be to change the allocation strategy for both the files and the chunks collection to exactFit. Could you try that and share your findings with us?
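On a 2.x/mmapv1 deployment that switch could look roughly like this with pymongo: disabling usePowerOf2Sizes via collMod puts the collection on the paddingFactor-based allocation described above (the database name and the default GridFS bucket "fs" are assumptions):

from pymongo import MongoClient

db = MongoClient("localhost", 27017)["recordings"]   # database name is a placeholder

# Disable the powers-of-two allocation on the GridFS collections so they fall
# back to the paddingFactor-based ("exact fit") allocation described above.
for coll in ("fs.files", "fs.chunks"):
    db.command("collMod", coll, usePowerOf2Sizes=False)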
I want to design my cluster and set proper sizes for key_cache and row_cache,
depending on the size of the tables/column families.
Similar to MySQL, do we have something like this in Cassandra/CQL?
SELECT table_name AS "Tables",
round(((data_length + index_length) / 1024 / 1024), 2) "Size in MB"
FROM information_schema.TABLES
WHERE table_schema = "$DB_NAME";
Or is there any other way to look up the data size and the indexes' size separately?
Or, what configuration would each node need in order to hold my table completely in memory,
ignoring any replication factor?
The key cache and row caches work rather differently. It's important to understand the difference for calculating sizes.
The key cache is a cache of offsets within files for the locations of rows. It is basically a map from (key, file) to offset. Therefore the key cache size scales with the number of rows, not the overall data size. You can find the number of rows from the 'Number of keys' parameter in 'nodetool cfstats'. Note this is per node, not a total, but that's what you want when deciding on cache sizes. The default size is min(5% of the heap (in MB), 100 MB), which is probably sufficient for most applications. A subtlety here is that rows may exist in multiple files (SSTables), the number depending on your write pattern. However, this duplication is accounted for (approximately) in the estimated count from nodetool.
The row cache caches the actual row. To get a size estimate for this you can use the 'Space used' parameter in 'nodetool cfstats'. However, the row cache caches deserialized data and only the latest copy so the size could be quite different (higher or lower).
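If it helps, here is a rough sketch that just pulls those two numbers out of nodetool cfstats and compares them with the default key cache size mentioned above (the exact label wording varies between Cassandra versions; the keyspace/table names and heap size are placeholders):

import re
import subprocess

# Per-node numbers for one table; keyspace/table names are placeholders.
out = subprocess.run(["nodetool", "cfstats", "my_keyspace.my_table"],
                     capture_output=True, text=True, check=True).stdout

number_of_keys = int(re.search(r"Number of keys.*?:\s*(\d+)", out).group(1))
space_used = int(re.search(r"Space used \(live\).*?:\s*(\d+)", out).group(1))

heap_mb = 8192                                   # assumed 8 GB heap
default_key_cache_mb = min(0.05 * heap_mb, 100)  # the min(5% of heap, 100 MB) default

print(f"rows on this node:            {number_of_keys}")
print(f"on-disk table size (bytes):   {space_used}")
print(f"default key cache size (MB):  {default_key_cache_mb:.0f}")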
There is also a third, less configurable cache: your OS filesystem cache. In most cases this is actually better than the row cache. It avoids duplicating data in memory, because when using the row cache the data will most likely be in the filesystem cache too. And reading from an SSTable in the filesystem cache was only about 30% slower than the row cache in my experiments (a while ago, probably not valid any more, but unlikely to be significantly different). The main use case for the row cache is when you have one relatively small CF that you want to make sure stays cached. Otherwise, using the filesystem cache is probably best.
In conclusion, the Cassandra defaults of a large key cache and no row cache are the best for most setups. You should only play with the caches if you know your access pattern won't work with the defaults or if you're having performance issues.
Can anyone explain how index files are loaded into memory while searching?
Is the whole file (.fnm, .tis, .fdt, etc.) loaded at once or in chunks?
How are individual segments loaded, and in which order?
How to encrypt Lucene index?
The main point of having index segments is that you can rarely load the whole index into memory.
The most important limitation taken into account while designing the index format is that disk seek time is relatively long (on platter-based hard drives, which are still the most widely used). A good estimate is that the transfer time per byte is about 0.01 to 0.02 μs, while the average seek time of the disk head is about 5 ms!
So the part that is kept in memory is typically only the dictionary, used to find the starting block of the postings list on disk*. The other parts are loaded only on demand and then purged from memory to make room for other searches.
As for encryption, it depends on whether you need to keep the index encrypted all the time (even in memory) or whether it suffices to encrypt only the index files. As for the latter, I think an encrypted file system will be enough. As for the former, it is certainly also possible, as different index compression techniques are already in place; however, I don't think it's widely used, because the first and foremost requirement for a full-text engine is speed.
[*] It's not really that simple, since we're performing binary searches against the dictionary, so we need to ensure that all entries in the first structure have equal length. As that's clearly not the case for normal dictionary words, and applying padding would be too costly (think of the word lengths of some chemical substances), we actually maintain two levels of dictionary: the first one (which needs to fit in memory and is stored in .tii files) keeps a sorted list of the starting positions of terms in the second index (.tis files). The second index is then a concatenated array of all terms in increasing order, along with pointers into the .frq file. The second index often fits in memory and is loaded at start-up, but that can be impossible, e.g. for bigram indexes. Also note that for some time now Lucene has, by default, used not individual files but so-called compound files (with the .cfs extension) to cut down the number of open files.
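Purely as a conceptual sketch, not Lucene's actual API (the term list and block size are made up), the two-level lookup works roughly like this:

from bisect import bisect_right

terms = sorted(f"term{i:06d}" for i in range(100_000))   # stands in for the .tis term list
BLOCK = 128                                              # index every 128th term (made up)
index_terms = terms[::BLOCK]                             # small .tii-like index, kept in memory

def lookup(term):
    # 1) binary search the small in-memory index to pick a block of the big term list
    block_no = bisect_right(index_terms, term) - 1
    if block_no < 0:
        return None
    # 2) one "seek": scan just that block of the on-disk term list for the exact term
    block = terms[block_no * BLOCK:(block_no + 1) * BLOCK]
    return term if term in block else None

print(lookup("term012345"))   # found
print(lookup("zzz"))          # not found -> None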
As far as I know, MongoDB is optimized for the situation when all data fits into memory. And as I understand it, GridFS uses standard collections and all the standard storage mechanisms. Is that right?
Does that mean that storing a large set of data (images, in my case) that is bigger than the current amount of memory will force my real data out of memory?
Or is MongoDB smart enough to give lower priority to the GridFS collections?
MongoDB uses memory-mapped files to manage its data files. If you use data, it will stay in memory. If you don't use it, it will eventually be flushed to disk (and read back when you request it the next time). If you need to read all your data, you'd better fit it all in RAM, or your system might enter the deadly swap spiral (depending on your load, of course).
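A quick way to see how much of the mapped data actually sits in RAM is the mem section of serverStatus; a pymongo sketch (fields as reported by mmapv1-era MongoDB):

from pymongo import MongoClient

# 'mem' reports sizes in MB for the memory-mapped storage engine.
mem = MongoClient("localhost", 27017).admin.command("serverStatus")["mem"]
print("resident MB:", mem["resident"])    # data the OS currently keeps in physical RAM
print("mapped MB:  ", mem.get("mapped"))  # total size of the memory-mapped data files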
If you just store data and don't do much with it, MongoDB will use only a fraction of the memory. For example, in one of my projects the total dataset size is over 300 GB and mongo takes only 800 MB of RAM (because I almost never read the data, only write it).