Why DJB CDB (constant database) was designed to use 256 hashtables?
Why not single bigger 252 * 256 hashtable?
Is it only to save space or there are some other reason?
DJB CDB uses two levels of hash tables. The first table is fixed size 2K at the beginning of the file. The second set of tables are at the end of the file, and is built in memory as data is being streamed into the cdb. Once all data has been streamed into the cdb, the second set of hash tables are streamed out to disk, and then the first table (at the beginning of the file) is populated with the offsets to each of the tables in the second set.
In other words, the multi-level hash tables allow streaming creation of the cdb with the simple exception of writing the beginning 2K of the file at the end of the cdb creation.
Access to the cdb is fast, hitting the first table (2K at beginning of the file) to find the offset of the second table (among the second set of tables) at the end of the cdb file, which provides the location of the data in the cdb.
Further information can be found in the NOTES at https://github.com/gstrauss/mcdb/ which is a rewrite of DJB's venerable cdb. mcdb is faster than cdb and removes the 4GB cdb limitation, among other benefits.
Related
I know that postgres does not allow for in memory structures. But for temp table to work efficiently it is important to have in memory structure, otherwise it would have to flow to disk and would not be that efficient. So my question is does postgres allow in memory storage for temp table? My hunch is that it does not. Just wanted to confirm it with someone.
Thanks in advance!
Yes, Postgres can keep temp tables in memory. The amount of memory available for that is configured through the property temp_buffers
Quote from the manual:
Sets the maximum number of temporary buffers used by each database session. These are session-local buffers used only for access to temporary tables. The default is eight megabytes (8MB). The setting can be changed within individual sessions, but only before the first use of temporary tables within the session; subsequent attempts to change the value will have no effect on that session.
A session will allocate temporary buffers as needed up to the limit given by temp_buffers. The cost of setting a large value in sessions that do not actually need many temporary buffers is only a buffer descriptor, or about 64 bytes, per increment in temp_buffers. However if a buffer is actually used an additional 8192 bytes will be consumed for it (or in general, BLCKSZ bytes).
So if you really need that, you can increase temp_buffers.
Via the command
select
relname as "Table",
pg_size_pretty(pg_total_relation_size(relid)) as "Size",
pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as "External Size"
from pg_catalog.pg_statio_user_tables order by pg_total_relation_size(relid) desc;
I can retrieve the size of my tables. (according to this article) which works. But I just came to a weird conclusion. Inserting multiple values which contain approx 30,000 characters each, doesn't change the size.
When executing before inserting I get
tablename | size text |external size text
-------------------------------------------
participant | 264kb | 256kb
After inserting (btw they are base64 encoded images) and executing the select command, I get the exact same sizes returned.
I figured this couldn't be correct so I was wondering, is the command wrong? Or does PostgreSQL do something special with very large strings?
(In pgadminIII the strings do not show in the 'view data' view but do are shown when executing select base64image from participant).
And next to this was I wondering (not my main question but would be nice to have answered) if this is the best practice (since my app generates base64 images) or should I f.e. convert them to an image on the backend and store the images remotely on my server instead of in the database?
Storage management
When you insert (or update) data that requires more space on disk then it currently uses, Postgres (or actually any DBMS) will allocate that space to store the new data.
When you delete data either by setting a column to a smaller or by deleting rows, the space is not immediately released to the operating system. The assumption is that that space will most probably be re-used by subsequent updates or inserts and extending a file is a relatively expensive operation so the database tries to avoid that (again this is something that all DBMS do).
If the space allocated is much bigger then the space that is actually stored, this can influence the speed of the retrieval - especially for table scans ("Seq Scan" in the execution plan) as more blocks then necessary need to be read from the harddisk. This is also known as "table bloat".
It is possible to shrink the space used using the statement VACUUM FULL. But that should only be used if you do suspect a problem with "bloat". This blog post explains this in more details.
Storing binary data in the database
If you want to store images in the database, then by all means use bytea instead of a string value. An image encoded in Base64 takes twice as much spaces as the raw data would.
There are pros and cons regarding the question if binary data (images, documents) should be stored in the database or not. This is a somewhat subjective decision to make and depends on a lot of external factors.
See e.g. here: Which is the best method to store files on the server (in database or storing the location alone)?
A < 4 GB csv became a 7.7 GB table in my AWS Postgres instance. And a 14 GB csv wouldn't load into 22 GB of space, I'm guessing because it is also going to double in size! Is this factor of two normal? And if so, why, and is it reliable?
There are many possible reasons:
Indexes take up space. If you have lots of indexes, especially multi-column indexes or GiST / GIN indexes, they can be a big space hog.
Some data types are represented more compactly in text form than in a table. For example, 1 consumes 1 byte in csv (or 2 if you count the comma delimiter) but if you store it in a bigint column it requires 8 bytes.
If there's a FILLFACTOR set, PostgreSQL will intentionally waste space so make later UPDATEs and INSERTs faster. If you don't know what FILLFACTOR is, then there isn't one set.
PostgreSQL has a much larger per-row overhead than CSV. In CSV, the per-row overhead is 2 bytes for a newline and carriage return. Rows in a PostgreSQL table require 24 to 28 bytes, plus data values, mainly because of the metadata required for multiversion concurrency control. So a CSV with very many narrow rows will produce a significantly bigger table than one the same size in bytes that has fewer wider rows.
PostgreSQL can do out-of-line storage and compression of values using TOAST. This can make big text strings significantly smaller in the database than in CSV.
You can use octet_size and pg_column_size to get PostgreSQL to tell you how big rows are. Because of TOAST out-of-line compressed storage, the pg_column_size might be different for a tuple produced by a VALUES expression vs one SELECTed from a table.
You can also use pg_total_relation_size to find out how big the table for a given sample input is.
I have a large set of data on S3 in the form of a few hundred CSV files that are ~1.7 TB in total (uncompressed). I am trying to copy it to an empty table on a Redshift cluster.
The cluster is empty (no other tables) and has 10 dw2.large nodes. If I set a sort key on the table, the copy commands uses up all available disk space about 25% of the way through, and aborts. If there's no sort key, the copy completes successfully and never uses more than 45% of the available disk space. This behavior is consistent whether or not I also set a distribution key.
I don't really know why this happens, or if it's expected. Has anyone seen this behavior? If so, do you have any suggestions for how to get around it? One idea would be to try importing each file individually, but I'd love to find a way to let Redshift deal with that part itself and do it all in one query.
Got an answer to this from the Redshift team. The cluster needs free space of at least 2.5x the incoming data size to use as temporary space for the sort. You can upsize your cluster, copy the data, and resize it back down.
Each dw2.large box has 0.16 TB disk space. When you said you you have cluster of 10 nodes, total space available is around 1.6 TB.
You have mentioned that you have around 1.7 TB raw data ( uncompressed) to be loaded in redshift.
When you load data to redshift using copy commands redshift automatically compresses you data and load it table.
once you load any db table you can see compression encoding by below query
Select "column", type, encoding
from pg_table_def where tablename = 'my_table_name'
Once you load your data when table has no sort key. See what are compression are being applied.
I suggested you drop and create table each time when you load data for your testing So that compressions encoding will be analysed each time.Once you load your table using copy commands see below link and fire script to determine table size
http://docs.aws.amazon.com/redshift/latest/dg/c_analyzing-table-design.html
Since when you apply sort key for your table and load data , sort key also occupies some disk space.
Since table with out sort key need less disk space than table with sort key.
You need to make sure that compression are being applied to table.
When we have sort key applied it need more space to store. When you apply sort key you need to check if you are loading data in sorted order as well,so that data will be stored in sorted fashion. This we need to avoid vacuum command to sort table after data being loaded.
Can anyone explain how index files are loaded in memory while searching?
Is the whole file (fnm, tis, fdt etc) loaded at once or in chunks?
How individual segments are loaded and in which order?
How to encrypt Lucene index?
The main point of having the index segments is that you can rarely load the whole index in the memory.
The most important limitation that is taken into account while designing the index format is that disk seek time is relatively long (on plate-base hard drives, that are still most widely used). A good estimation is that the transfer time per byte is about 0.01 to 0.02 μs, while average seek time of disk head is about 5 ms!
So the part that is kept in memory is typically only the dictionary, used to find out the beginning block of the postings list on the disk*. The other parts are loaded only on-demand and then purged from the memory to make room for other searches.
As for encryption, it depends on whether you need to keep the index encrypted all the time (even when in memory) or if it suffices to encrypt only the index files. As for the latter, I think that an encrypted file system will be enough. As for the former, it is also certainly possible, as different index compression techniques are already in place. However, I don't think it's widely used, as the first and foremost requirement for full-text engine is speed.
[*] It's not really such simple, as we're performing binary searches against the dictionary, so we need to ensure that all entries in the first structure have equal length. As it's clearly not the case with normal words in dictionary and applying padding is too much costly (think of word lengths for some chemical substances), we actually maintain two levels of dictionary, the first one (which needs to fit in the memory and is stored in .tii files) keeps sorted list of starting positions of terms in the second index (.tis files). The second index is then a concatenated array of all terms in an increasing order, along with pointer to the sector in the .frq file. The second index often fits in the memory and is loaded at the start, but it can be impossible e.g. for bigram indexes. Also note that for some time Lucene by default doesn't use individual files, but so called compound files (with .cfs extension) to cut down the number of open files.